Showing posts with label datawarehouse. Show all posts

Thursday, 9 October 2025

Medallion Architecture in Fabric High Level Organisation Design Pattern

Microsoft Fabric is excellent! We still need to follow the good practices we have been using for years, such as making data accessible and secure. Possibly the most used architecture for Big Data is the Medallion Architecture pattern, where data is normally ingested in a fairly raw format into the bronze layer, then transformed into more meaningful and usable information in the silver layer. Lastly, the gold layer exposes data relationally to reporting tools using semantic models.
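As a rough illustration of the raw-to-refined flow described above, here is a minimal in-memory Python sketch. It does not call any Fabric API; the sample records, the `customer` and `amount` fields, and the function names `to_silver` and `to_gold` are all made up for the example.

```python
from collections import defaultdict

# Bronze: raw records as they arrive (hypothetical sample data, untyped and messy).
bronze = [
    {"customer": " Alice ", "amount": "10.50"},
    {"customer": "bob", "amount": "4"},
    {"customer": "Alice", "amount": "not-a-number"},  # bad row, dropped in silver
    {"customer": "Bob", "amount": "6.00"},
]


def to_silver(rows):
    """Clean and type the raw rows; skip rows that cannot be parsed."""
    cleaned = []
    for row in rows:
        try:
            name = row["customer"].strip().title()
            amount = float(row["amount"])
        except (KeyError, ValueError):
            continue
        cleaned.append({"customer": name, "amount": amount})
    return cleaned


def to_gold(rows):
    """Aggregate the cleaned rows into a reporting-friendly shape."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

In a real workspace each step would typically write to its own lakehouse or warehouse rather than a Python list; the point here is only the shape of the flow.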

Overview: This document outlines my attempt to organise enterprise data in MS Fabric using a Medallion Architecture based on Fabric Workspaces. Shortcuts are generally preferable to imported data, but the choice depends on factors such as the data source, what data we need, how up-to-date the data must be, and the performance requirements of the systems involved.

The reports and semantic models can get data from other workspaces at any of the medallion layers. This architecture lends itself well to using the new Direct Lake mode.

Summary of a Design used by a Large Enterprise:

Medallion architecture using Fabric Workspaces.

Wednesday, 22 September 2021

Azure Synapse Intro

I have not done any serious big data work in many years and I'm using this post to understand the Azure options and update my skills.   

Azure Synapse Analytics - built for limitless scale across unstructured and structured data for big data; supports petabyte-scale analytics. Ingest and prepare data for BI. An instance can be a dedicated or serverless PaaS service. A Data Lake Gen 2 account is assigned to each Synapse instance. GA Dec 2020.


There used to be a Data Lake (massive, semi-structured data) and a Data Warehouse; Azure Synapse Analytics can be referred to as a Data Lakehouse. Basically, all the separate pieces are under one umbrella, i.e. ADF, SSIS, Data Warehouse, ...
  • Serverless SQL pools are pay per use (pay per query). You can use SSMS to manage the data. Good for small or new data warehouses. T-SQL works well, so it is simple for smaller projects or PoCs.
  • Go from 3NF relational data to fact and dimension tables, putting the data into a star schema for MPP (massively parallel processing).
  • Data Warehouse Units (DWUs) - Similar to DTUs, made up of CPU, memory and IO. The lowest is DW100 and the top end is huge; it is simple to scale up and down. More DWUs means more VMs/worker processes working on the data.
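The move from normalised rows to fact and dimension tables mentioned in the list above can be sketched in plain Python. This is only an illustration of the star-schema idea, not Synapse code; the `orders` rows, the column names and the `build_dimension` helper are hypothetical.

```python
# Hypothetical flattened order rows, as they might come out of a join over 3NF tables.
orders = [
    {"order_id": 1, "customer": "Alice", "product": "Widget", "qty": 2, "price": 5.0},
    {"order_id": 2, "customer": "Bob", "product": "Widget", "qty": 1, "price": 5.0},
    {"order_id": 3, "customer": "Alice", "product": "Gadget", "qty": 3, "price": 7.5},
]


def build_dimension(rows, column):
    """Assign a surrogate key (1, 2, 3, ...) to each distinct value of `column`."""
    dimension = {}
    for row in rows:
        if row[column] not in dimension:
            dimension[row[column]] = len(dimension) + 1
    return dimension


dim_customer = build_dimension(orders, "customer")
dim_product = build_dimension(orders, "product")

# The fact table keeps only surrogate keys and numeric measures.
fact_sales = [
    {
        "order_id": row["order_id"],
        "customer_key": dim_customer[row["customer"]],
        "product_key": dim_product[row["product"]],
        "qty": row["qty"],
        "revenue": row["qty"] * row["price"],
    }
    for row in orders
]
```

Keeping descriptive text in the dimensions and only keys and measures in the fact table is what lets the engine spread the large fact table across many workers.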

Azure Synapse Studio - SaaS development experience that provides code-free and code-first (C#, Scala, Python, SQL, Java) options; notebooks are used for working with data.




Sunday, 6 October 2019

Common Azure Services

Azure Key Vault - Secure config storage and retrieval
There are SDKs for working with Azure Key Vault, such as the "Azure Key Vault secret client library for .NET (SDK v4)". It is extremely easy to get secrets from the secure vault using C#.
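The post refers to the .NET library; as a hedged sketch of the same idea in Python, the snippet below assumes the `azure-identity` and `azure-keyvault-secrets` packages for the real client. The vault URL and the secret name `DbPassword` are placeholders, and `FakeClient`/`FakeSecret` are stand-ins I have invented so the sketch can run without an Azure subscription.

```python
def build_client(vault_url):
    """Create a real client; assumes azure-identity and azure-keyvault-secrets are installed."""
    # Imported lazily so the rest of the sketch runs without the Azure packages.
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    return SecretClient(vault_url=vault_url, credential=DefaultAzureCredential())


def read_secret(client, name):
    """Return the secret's value; `client` only needs a get_secret(name) method."""
    return client.get_secret(name).value


class FakeSecret:
    """Invented stand-in for the object returned by get_secret."""

    def __init__(self, value):
        self.value = value


class FakeClient:
    """In-memory stand-in for local testing (not part of the Azure SDK)."""

    def __init__(self, secrets):
        self._secrets = secrets

    def get_secret(self, name):
        return FakeSecret(self._secrets[name])


# Real usage would look something like this (placeholder URL and secret name):
#   client = build_client("https://<your-vault>.vault.azure.net")
#   password = read_secret(client, "DbPassword")
demo = read_secret(FakeClient({"DbPassword": "s3cret"}), "DbPassword")
```

Passing the client in rather than creating it inside `read_secret` keeps the calling code testable without touching a real vault.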

Azure Storage
Microsoft Azure Storage Explorer is a great tool for reviewing your Azure Storage, and in the case below I used it to add some Azure Table storage for a demo customer list.
There is also a web edition of Storage Explorer that is in preview as of 18 Nov 2020.

App Service - Host websites or Web APIs

Azure Artifacts - Create and share NuGet and npm packages with Azure Artifacts for more reliable and scalable builds

Azure Data Factory (ADF) - Basically PaaS, fully managed Azure ETL/SSIS. It has many connectors for ingesting data, which can then be sent to Azure Synapse Analytics. Comparable to AWS Glue. GCP has two decent ETL tools, Cloud Data Fusion (the main ETL tool) and Cloud Dataflow, along with other options. Update 2025: Fabric's Data Factory is even better than ADF.

Azure Big Data


Azure Synapse Analytics - a managed PaaS solution that brings together ADF, Data Lake (both Storage and Analytics) and Azure Data Warehouse under a single managed solution. It is easier than the individual pieces and scales as you need, with almost unlimited capability. Azure Synapse simplifies analytics and is sold as PaaS, either serverless or dedicated. The easiest way to draw data out of Azure Synapse is Power BI. It is easy to bring data into Azure Synapse from Cosmos DB and SQL databases: the data can be pushed into Synapse automatically with no effect on the performance of the source, and with no need for ADF. The data is available in near real time.

Azure Purview - discovers and analyses all your data; integrates with AIP.



Azure App Configuration - Feature toggles (aka feature flags) are extremely useful in code. This service is great for turning on experimental features, operational features, environment/release features, and security features. See "Feature Toggles (aka Feature Flags)" (martinfowler.com). Use it for feature flags, whereas Key Vault is for secrets.
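To make the feature-toggle idea concrete, here is a small plain-Python sketch. It does not use the Azure App Configuration SDK; the flag names, the `FLAGS` dictionary and the `is_enabled` helper are hypothetical, and in practice the flag values would be loaded from the configuration service rather than hard-coded.

```python
# Hypothetical flag store; real values would come from a configuration service.
FLAGS = {
    "new-checkout": {"enabled": True, "environments": {"dev", "test"}},
    "beta-search": {"enabled": False, "environments": {"dev", "test", "prod"}},
}


def is_enabled(flag_name, environment, flags=FLAGS):
    """A flag is on only if it exists, is enabled, and targets this environment."""
    flag = flags.get(flag_name)
    if flag is None:
        return False  # unknown flags default to off
    return flag["enabled"] and environment in flag["environments"]


def checkout(environment):
    """Pick a code path based on a release toggle."""
    if is_enabled("new-checkout", environment):
        return "new checkout flow"
    return "classic checkout flow"
```

Defaulting unknown flags to off means a missing or misspelled flag fails safe instead of exposing an unfinished feature.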



Azure Resource Explorer - Documentation on the Azure APIs and the ability to call them.

Azure Policy - Custom Azure Policy templates can be created to apply rules to your subscription. There are also a lot of pre-canned policies. You can enforce naming conventions, tagging standards, deployment of resources into specific regions, and so on.
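The kinds of rules listed above can be pictured as simple checks against a resource description. The sketch below is plain Python, not the Azure Policy definition language (which is JSON based); the resource shape, the `rg-` prefix, the tag names and the region list are all invented for illustration.

```python
ALLOWED_REGIONS = {"uksouth", "ukwest"}  # hypothetical allowed regions
REQUIRED_TAGS = {"costCentre", "owner"}  # hypothetical tagging standard
NAME_PREFIX = "rg-"                      # hypothetical naming convention


def violations(resource):
    """Return a list of human-readable rule violations for one resource."""
    found = []
    if not resource["name"].startswith(NAME_PREFIX):
        found.append("name must start with " + NAME_PREFIX)
    missing = REQUIRED_TAGS - set(resource.get("tags", {}))
    if missing:
        found.append("missing tags: " + ", ".join(sorted(missing)))
    if resource["location"] not in ALLOWED_REGIONS:
        found.append("region not allowed: " + resource["location"])
    return found


good = {"name": "rg-sales", "location": "uksouth",
        "tags": {"costCentre": "42", "owner": "data"}}
bad = {"name": "sales", "location": "westus", "tags": {"owner": "data"}}

print(violations(good))
print(violations(bad))
```

A real policy would be assigned at subscription or resource group scope and evaluated by Azure itself; this only shows the style of rule being enforced.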