Showing posts with label Mirroring. Show all posts

Sunday, 11 January 2026

Working with Snowflake and MS Fabric

Overview: Snowflake covers a small area of what Fabric does, but it covers that area unbelievably well. Large enterprises often use the two together even though there is some overlap; Snowflake is great at what it does!

Five ways to use Snowflake data in Fabric: 

1. ETL - use Data Factory or an ETL tool to copy data from Snowflake to Fabric's OneLake (point-in-time copy). This should be your last option.

2. Direct query (no copy) - Fabric compute (Power BI, Notebooks, Dataflows, Pipelines) runs queries directly against Snowflake’s SQL endpoint. Best when you want zero‑copy and Snowflake stays the system of record.

3. Mirroring (copy + sync) - Fabric mirrors a Snowflake database using CDC into OneLake so Fabric can work locally with governed, accelerated data while staying synced with Snowflake.  Good for small and commonly accessed data. 

4. Shortcut to Snowflake‑hosted Iceberg (no data copy) - Fabric creates a Shortcut (virtual pointer) to Iceberg tables stored with Snowflake, so Fabric tools read them without moving data.

5. Snowflake writes Iceberg to OneLake - Like option 3, but Snowflake handles the outbound side: Snowflake materializes Iceberg tables into a OneLake location; Fabric then reads them natively (open‑format interop).
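
The five options above differ mainly in where the data ends up and who moves it. A minimal sketch that restates the list as a small lookup table follows; the class and field names are made up for illustration and are not part of any Snowflake or Fabric API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IntegrationOption:
    number: int
    name: str
    data_location: str          # where Fabric reads the data from
    moved_by: Optional[str]     # who copies/writes the data, None if nothing is moved


# Values restate the list above; nothing here is measured or vendor-supplied.
OPTIONS = [
    IntegrationOption(1, "ETL", "OneLake", "Data Factory or an ETL tool"),
    IntegrationOption(2, "Direct query", "Snowflake", None),
    IntegrationOption(3, "Mirroring", "OneLake", "Fabric (CDC sync)"),
    IntegrationOption(4, "Shortcut to Snowflake-hosted Iceberg", "Snowflake", None),
    IntegrationOption(5, "Snowflake writes Iceberg to OneLake", "OneLake", "Snowflake"),
]


def options_in(location: str) -> list:
    """Return the options whose data lives in the given location."""
    return [o for o in OPTIONS if o.data_location == location]


def no_movement_options() -> list:
    """Return the options where no data is moved at all."""
    return [o for o in OPTIONS if o.moved_by is None]


for option in no_movement_options():
    print(option.number, option.name)
```

Filtering like this makes the trade-off visible: options 2 and 4 leave everything in Snowflake, while 1, 3, and 5 land data in OneLake by different routes.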

Reference:
Greg Beaumont's Architecture blog - Fantastic stuff! 

Wednesday, 15 November 2023

Ignite 2023 - Microsoft Fabric - Introduction

GA: Prepare your data for AI innovation with Microsoft Fabric—now generally available | Microsoft Fabric Blog

Microsoft Fabric is a unified platform: everything is brought in and available for analysis under a single service.

OneLake - per Fabric instance.  Stores all data within the SaaS data lake (scales itself), automatically indexes data, and abides by AIP rules/labels.  Intelligent data foundations.

All data is held in the Delta Parquet format (same format for any source).  Data is ready to use.  One copy of data.

Parquet is an open, commonly used columnar file format for storing and querying large datasets.

Delta Lake is a transaction layer that sits on top of Parquet files. If you know Iceberg, it does the same kind of thing: ACID transactions, time travel (querying the data as it was at an earlier version), ...

Iceberg can use Parquet, Avro, or ORC data files, and basically adds Delta Lake-type functionality on top of the storage files.
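
To make the "transaction layer on top of files" idea concrete, here is a toy sketch in plain Python. It is not the real Delta Lake or Iceberg protocol; the class, method, and file names are hypothetical. It only shows the core trick: data files are immutable, every commit records which files were added or removed, and replaying the log up to a given version tells a reader which files make up the table at that point.

```python
class ToyTableLog:
    """Toy transaction log: commit i produces table version i."""

    def __init__(self):
        self.commits = []

    def commit(self, add=(), remove=()):
        """Record one change as a single log entry and return its version."""
        self.commits.append({"add": list(add), "remove": list(remove)})
        return len(self.commits) - 1

    def files_at(self, version=None):
        """Replay the log up to `version` (default: latest) to list live files."""
        if version is None:
            version = len(self.commits) - 1
        live = set()
        for entry in self.commits[: version + 1]:
            live -= set(entry["remove"])
            live |= set(entry["add"])
        return sorted(live)


log = ToyTableLog()
v0 = log.commit(add=["part-000.parquet"])
v1 = log.commit(add=["part-001.parquet"])
# A rewrite: one file replaces another in a single commit.
v2 = log.commit(add=["part-002.parquet"], remove=["part-000.parquet"])

print(log.files_at())    # ['part-001.parquet', 'part-002.parquet']
print(log.files_at(v0))  # ['part-000.parquet']
```

Because a reader only ever sees whole commits, it gets either the table before the rewrite or after it, never a half-applied mix. Reading an older version is just replaying fewer commits.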

SaaS single service, no need to bring pieces together; data doesn't need to be moved from its source to be sliced. Data stays at the original source but can be worked with; this all falls under the OneLake concept. Can query using multiple approaches. Create a shortcut to files/folders/Databricks, and it becomes part of OneLake while the underlying data resides in the original location that is now linked (only works on Parquet and specific file types).
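
Once a shortcut exists, it is addressed like any other folder in OneLake. As a rough sketch, assuming the `abfss://` URI layout OneLake documents for ADLS-compatible tools, a path can be built as below. The helper name and the workspace, lakehouse, and table names are hypothetical.

```python
def onelake_path(workspace: str, lakehouse: str, relative_path: str) -> str:
    """Build an abfss URI for a folder or table inside a Fabric lakehouse.

    Assumes the layout abfss://<workspace>@onelake.dfs.fabric.microsoft.com/
    <item>.<itemtype>/<path>; names with spaces or special characters may need
    the workspace and item GUIDs instead.
    """
    return (
        f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/"
        f"{lakehouse}.Lakehouse/{relative_path.lstrip('/')}"
    )


# A native table and a shortcut named "snowflake_orders" look the same to the caller.
print(onelake_path("Sales", "Gold", "Tables/orders"))
print(onelake_path("Sales", "Gold", "Tables/snowflake_orders"))
```

The point is that the caller cannot tell from the path whether the data physically sits in OneLake or behind a shortcut.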

Mirroring in MS Fabric - get the same benefits as shortcuts, but for databases such as Snowflake and Cosmos DB. Mirrored data is kept in sync in near real time and is stored in Delta Parquet format, so it is ready to use. With these two approaches you can use nearly any source; there are lots of connectors, so you could use Dataverse, Cosmos DB, Snowflake, SQL Server, blobs on S3, ... Then you can write queries across all the data.
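
Mirroring keeps its copy current by replaying changes captured at the source (change data capture, CDC). The sketch below is a toy illustration of that idea only, not how Fabric implements it; the event shape (`op`, `key`, `row`) and the function name are assumptions made for the example.

```python
def apply_changes(replica: dict, changes: list) -> dict:
    """Apply an ordered feed of change events to a local copy keyed by primary key."""
    for change in changes:
        op, key = change["op"], change["key"]
        if op in ("insert", "update"):
            replica[key] = change["row"]   # upsert the latest row image
        elif op == "delete":
            replica.pop(key, None)         # ignore deletes for rows we never saw
        else:
            raise ValueError(f"unknown operation: {op}")
    return replica


replica = {1: {"name": "Ada"}}
feed = [
    {"op": "insert", "key": 2, "row": {"name": "Grace"}},
    {"op": "update", "key": 1, "row": {"name": "Ada L."}},
    {"op": "delete", "key": 2},
]
apply_changes(replica, feed)
print(replica)  # {1: {'name': 'Ada L.'}}
```

Only the changes travel, not the whole table, which is why a mirrored copy can stay close to the source without repeated full reloads.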

Copilot in Microsoft Fabric will help bring in and analyse all the data.

Copilot for Power BI is impressive for building reports, but I need to play with it more.

Power BI is basically becoming Microsoft Fabric. The report generation piece is still called Power BI, but it falls under the MS Fabric product. Licensing for Power BI Pro is converted to MS Fabric, and you cannot stay on Power BI Premium.

MS Fabric has a new way to access data. It is impressive in that it is fast, real-time, stores data once, carries ACLs/permissions with a lot of the data.  The ETL capabilities are amazing and configured for development.

MS Fabric also supports Real-Time Intelligence (RTI) and integrates with Azure SQL.

Last updated: Feb 2025