
Friday, 26 September 2025

European Fabric Conference in Vienna Sept 2025 takeaways

FabConEurope25 in Vienna last week was terrific.  It was a great opportunity to meet Fabric and data experts and speak to the product teams, and the presentations were fantastic.  The hardest part was deciding which session to attend, as so many run at the same time.

My big takeaways:
  • Fabric SQL is excellent.  The HA, managed service, redundancy, and log shipping keep OneLake in near real time.  Fabric SQL supports new native geospatial types.  SQL has had temporal tables for a while (old news), but row-, column- and object-level (incl. table) security is now part of OneLake.  There are a couple of things security reviewers will query, but they are addressed.
  • Fabric Data Agent is interesting.  Connect to your SQL relational data and work with it.
  • User-defined functions (UDFs), including Translytical (write-back) functions, inbound/outbound HTTP, wrapping stored procedures, notebooks, ... - amazing.
  • OneLake security is complex but can be understood, especially in terms of its containers/layers: Tenant, Workspace, Item, and Data.  More is needed, but it's miles ahead of anything else, and Graph is the magic, so it will only continue to improve.  Amazing, but understand the security model.  Embrace Entra and OAuth; use keys only as a last resort.
  • Snowflake is our friend.  Parquet is fantastic, and Snowflake, including Iceberg, plays well with MS Fabric.  New versions of Delta Parquet are on the way (which will make Fabric even stronger, supporting both existing and the latest formats).
  • Mirroring and shortcuts - don't ETL unless you need to.  Shortcut first, then mirror, then ETL.
  • Use workspaces to build out simple medallion architectures.
  • AI Search/Vector Search and SQL are crazy powerful.
  • New map functionality has arrived, and more is arriving, on Fabric.  Org Apps for Maps is going to be helpful in the map space.  PMTiles are native... (if you know, you know)
  • Dataverse is great with Fabric and shortcuts, as I learned from Scott Sewell at an earlier conference.  OneLake coupled with Dataverse is massively underutilised by most orgs.
  • Power BI also features new Mapping and reporting capabilities related to geospatial data.
  • Other storage: Cosmos DB has its place, and suddenly, with shortcuts, its biggest issue - cost - can be massively reduced with the right design decisions.  Postgres is becoming a first-class citizen, which is excellent on multiple levels.  The CDC story is already fantastic.
  • RTI on Fabric is going to revolutionise OpenTelemetry and AI, networking through the OSI model, application testing, digital twins, live monitoring, ...  I already knew this, but it keeps getting better.  Event Hubs and notebooks are my new best friends.  IoT is the future; we all knew this, but with Fabric it will be much easier to implement safely and get early value.
  • Direct Lake is a game changer for Power BI - not new, but it just keeps getting better and better thanks to MS Graph.
  • Managed Private Endpoints have improved and should be part of every company's governance.
  • Purview... It's excellent and solves/simplifies DLP, governance and permissions.  I'm out of my depth on Fabric Purview and governance, even though I know more than most about DLP and governance.  Hire one of the key Microsoft folks here.
  • Warehouse lineage of data is so helpful.  
  • We need to understand Fabric Digital Twins, as it is likely to be a competitor or a solution we offer and integrate. 
  • Parquet is brilliant and is fundamentally why AI is so successful.
  • Powerful stuff in RDF for modelling domains - this is going to be a business in itself.  I'm clueless here, but I won't be in a few weeks.
Now the arr..
  • Pricing and capacity are not transparent.  Watch out for the unexpected monster bill!  That said, monitoring and controls are in place, but having my tenant switched off doesn't sit well with me if workloads aren't correctly set out.  Resource governance at the workspace level would help; for now you either fix the situation or design around it, but that is more expensive.
  • Workspace-level resource reservation does not exist yet; however, it can be approximated using multiple Fabric tenants.  Cost control will improve significantly when workspace resource management arrives.
  • Licensing needs proper thought for an enterprise, including ours.  Reserved Fabric capacity is 40% cheaper, but it cannot be suspended, so use reserved capacity just as you would for most Azure services.  Good design results in much lower workload costs.  Once again, those who genuinely understand will know my pain with workload costs.
  • Vendors and partners are too far behind (probably due to the pace of innovation).
Microsoft Fabric is brilliant; it is all under one simple managed autoscaling umbrella.  It integrates and plays nicely with other solutions, has excellent access to Microsoft storage, and is compatible with most of the others.  Many companies will move onto Fabric or increase their usage in the short term, as it is clearly the leader in multiple Gartner segments, all under one hood.  AI will continue to help drive its adoption by enterprises.

Wednesday, 15 November 2023

Ignite 2023 - Microsoft Fabric - Introduction

GA: Prepare your data for AI innovation with Microsoft Fabric—now generally available | Microsoft Fabric Blog

Microsoft Fabric is a unified platform that brings all your analytics under a single service: everything is brought in and available for analysis in one place.

OneLake - one per Fabric instance.  It stores all data within the SaaS data lake (which scales itself), automatically indexes data, and abides by AIP rules/labels.  Intelligent data foundations.

All data is held in the Delta Parquet format (same format for any source).  Data is ready to use.  One copy of data.

Parquet is an open, commonly used columnar file format for storing and querying large datasets.

Delta Lake is a transaction layer that sits on top of Parquet files; if you know Iceberg, it does much the same thing, enabling features such as time travel (querying old table versions), ACID transactions, ...

Iceberg can use Parquet, Avro, or ORC data files, and basically adds Delta Lake-style functionality on top of those storage files.
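The core idea behind both Delta Lake and Iceberg can be sketched in plain Python: data files are immutable, and an append-only log records which files make up each table version - which is what gives you atomic commits and time travel.  This is a toy illustration of the concept only, not the real Delta or Iceberg protocol, and the file names are hypothetical:

```python
import json

# Toy transaction log: each entry is one committed table version,
# listing the immutable data files that make up that version.
log = []

def commit(files):
    """Append a new version atomically (one log entry = one commit)."""
    version = len(log)
    log.append(json.dumps({"version": version, "files": files}))
    return version

def snapshot(version=None):
    """'Time travel': read the table's file list as of any past version."""
    if version is None:
        version = len(log) - 1  # default to the latest committed version
    return json.loads(log[version])["files"]

commit(["part-000.parquet"])                      # version 0
commit(["part-000.parquet", "part-001.parquet"])  # version 1: an append

print(snapshot())   # latest: ['part-000.parquet', 'part-001.parquet']
print(snapshot(0))  # as of version 0: ['part-000.parquet']
```

Readers only ever see fully committed versions of the log, which is how a table over plain files can still offer consistent, ACID-style reads.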

SaaS single service: no need to bring pieces together, and data doesn't need to be moved in order to slice it.  Data stays at the original source but can still be worked with; this all falls under the OneLake concept, and you can query it using multiple approaches.  Create a shortcut to files/folders/Databricks and it becomes part of OneLake while the underlying data resides in the original, now-linked location (this only works for Parquet and specific file types).

Mirroring in MS Fabric gives the same benefits as shortcuts but can connect to databases, including Snowflake, Dataverse, AWS S3 buckets and Cosmos DB.  A mirror is always up to date in real time, and the data is stored in Delta Parquet format, so it is ready to use.  Between these two approaches you can use nearly any source - there are lots of connectors: Dataverse, Cosmos, Snowflake, SQL Server, blobs on S3, ...  Then you can write queries across all the data.

Copilot in Microsoft Fabric will help bring together and analyse all the data.

Copilot for Power BI is impressive for building reports, but I need to play with it more.

Power BI is essentially becoming Microsoft Fabric.  The report-generation piece is still called Power BI, but it falls under the MS Fabric product.  Power BI Pro licensing converts to MS Fabric, and you cannot stay on Power BI Premium.

MS Fabric has a new way to access data.  It is impressive: fast, real-time, stores data once, and carries ACLs/permissions with much of the data.  The ETL capabilities are amazing and well set up for development.

MS Fabric also supports Real-Time Intelligence (RTI) and integrates with Azure SQL.

Last updated: Feb 2025