
Sunday, 11 January 2026

Working with Snowflake and MS Fabric

Overview: Snowflake covers a small area of what Fabric does, but Snowflake covers its area unbelievably well.  For large enterprises, use these together even though there is some overlap; Snowflake is great at what it does!

Five ways to use Snowflake data in Fabric: 

1. ETL - use Data Factory or an ETL tool to copy data from Snowflake into Fabric's OneLake (a point-in-time copy).  This should be your last option.

2. Direct query (no copy) - Fabric compute (Power BI, Notebooks, Dataflows, Pipelines) runs queries directly against Snowflake’s SQL endpoint. Best when you want zero‑copy and Snowflake stays the system of record (see the notebook sketch after this list).

3. Mirroring (copy + sync) - Fabric mirrors a Snowflake database using CDC into OneLake so Fabric can work locally with governed, accelerated data while staying synced with Snowflake.  Good for small, commonly accessed datasets.

4. Shortcut to Snowflake‑hosted Iceberg (no data copy) - Fabric creates a Shortcut (virtual pointer) to Iceberg tables stored with Snowflake, so Fabric tools read them without moving data.

5. Snowflake writes Iceberg to OneLake - like option 3, but Snowflake handles the outbound copy: Snowflake materializes Iceberg tables into a OneLake location; Fabric then reads them natively (open‑format interop).
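
To make option 2 concrete, here is a minimal sketch of querying Snowflake directly from a Fabric PySpark notebook. It assumes the Snowflake Spark connector is available in the runtime, and every connection value (account URL, warehouse, database, table) is a hypothetical placeholder.

```python
# Minimal sketch: Fabric notebook querying Snowflake directly (option 2).
# Assumes the Snowflake Spark connector is available in the Spark runtime;
# all connection values below are placeholders.  `spark` is the session
# that Fabric notebooks provide by default.
sf_options = {
    "sfURL": "myaccount.snowflakecomputing.com",   # hypothetical account URL
    "sfUser": "fabric_reader",
    "sfPassword": "********",                      # use a key vault / secret in practice
    "sfDatabase": "SALES_DB",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "REPORTING_WH",
}

# The query is pushed down to Snowflake; only the result set comes back to
# Fabric, so nothing is copied into OneLake.
orders_df = (
    spark.read.format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("query", "SELECT order_id, amount, order_date FROM orders WHERE order_date >= '2025-01-01'")
    .load()
)

orders_df.show(10)
```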

Reference:
Greg Beaumont's Architecture blog - Fantastic stuff! 

Thursday, 9 October 2025

Medallion Architecture in Fabric High Level Organisation Design Pattern

Microsoft Fabric is excellent!  We still need to follow the good practices we have been using for years, such as making data accessible and secure.   Possibly the most used architecture for Big Data is the Medallion Architecture pattern, where data is normally ingested in a fairly raw format into the bronze layer, then transformed into more meaningful and usable information in the silver layer. Lastly, the gold layer exposes data relationally to reporting tools via semantic models.

Overview: This document outlines my attempt to organise enterprise data into MS Fabric using a Medallion Architecture based on Fabric Workspaces.  Shortcuts are better than imported data, but the choice depends on factors such as what the data source is, what data we need, how up-to-date the data needs to be, and the performance requirements of the systems involved.

The reports and semantic models can get data from other workspaces at any of the medallion layers.  This architecture lends itself well to using the new Direct Lake Query mode.
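
To make the layer flow concrete, here is a minimal PySpark sketch of the bronze → silver → gold hops inside a single Fabric Lakehouse. The paths, columns and aggregation are hypothetical; a real design would add validation, incremental loads and merges.

```python
# Minimal medallion sketch in a Fabric notebook (hypothetical paths/columns).
from pyspark.sql import functions as F

# Bronze: land the raw extract as-is.
raw_df = spark.read.option("header", True).csv("Files/landing/orders/2025-10-01.csv")
raw_df.write.mode("append").format("delta").save("Tables/bronze_orders")

# Silver: clean, type and deduplicate.
silver_df = (
    spark.read.format("delta").load("Tables/bronze_orders")
    .withColumn("order_date", F.to_date("order_date"))
    .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
    .dropDuplicates(["order_id"])
)
silver_df.write.mode("overwrite").format("delta").save("Tables/silver_orders")

# Gold: shape for reporting (e.g. daily revenue feeding the semantic model).
gold_df = silver_df.groupBy("order_date").agg(F.sum("amount").alias("daily_revenue"))
gold_df.write.mode("overwrite").format("delta").save("Tables/gold_daily_revenue")
```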

Summary of a Design used by a Large Enterprise:

Medallion architecture using Fabric Workspaces.

Wednesday, 30 April 2025

MS Fabric OneLake Shortcuts

 "Shortcuts in Microsoft OneLake allow you to unify your data across domains, clouds, and accounts by creating a single virtual data lake for your entire enterprise.MS Learn

It allows open storage-format data to be stored in the source system, metadata to be added to OneLake, and the data to be queried. The load is primarily applied to the source system (e.g., Dataverse/Dynamics).

Clarification: A shortcut is automatically added to MS Fabric for each Dataverse.  Dataverse creates Parquet files (est 5-10% extra data storage, counts against Dataverse storage).  Via the shortcut, report writers or data engineers can access Dataverse data as if it were within MS Fabric's OneLake.

Understand: Dataverse creates Parquet files that MS Fabric can look at to generate dataset data.
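
Once the shortcut appears in a Lakehouse, Dataverse data reads like any other table from a Fabric notebook. A minimal sketch, assuming a lakehouse named dataverse_lakehouse containing a shortcut table account (both names are hypothetical):

```python
# Minimal sketch: reading a Dataverse shortcut table from a Fabric notebook.
# The lakehouse and table names are hypothetical placeholders.
accounts_df = spark.read.table("dataverse_lakehouse.account")

# Or with Spark SQL - the shortcut behaves like a normal Lakehouse table.
top_accounts = spark.sql("""
    SELECT name, revenue
    FROM dataverse_lakehouse.account
    ORDER BY revenue DESC
    LIMIT 10
""")
top_accounts.show()
```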

"Shortcuts are objects in OneLake that point to other storage locations.MS Learn

External shortcuts (data is held at the source system) support any open storage format, including the following (a REST sketch for creating one follows this list): 

  • Apache Iceberg Tables via Snowflake,
  • Parquet files on Snowflake,
  • Microsoft Dataverse
  • Azure Data Lake Storage (ADLS), 
  • Google Cloud Storage, 
  • Databricks, 
  • Amazon S3 (including Iceberg tables),
  • Apache Spark (Iceberg)
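
For completeness, here is the REST sketch mentioned above for creating an external shortcut programmatically. The request shape follows the Fabric OneLake Shortcuts API as I understand it; the workspace, item and connection IDs, storage location and shortcut name are all placeholders.

```python
# Minimal sketch: create an external (ADLS Gen2) shortcut via the Fabric REST API.
# Endpoint and body shape per the OneLake Shortcuts API as I understand it;
# every ID and name below is a placeholder.
import requests

workspace_id = "<workspace-guid>"
lakehouse_id = "<lakehouse-item-guid>"
token = "<aad-access-token>"  # e.g. obtained via azure-identity in practice

body = {
    "path": "Tables",                 # where the shortcut appears in the Lakehouse
    "name": "external_sales",         # hypothetical shortcut name
    "target": {
        "adlsGen2": {
            "location": "https://mystorage.dfs.core.windows.net",
            "subpath": "/container/sales",
            "connectionId": "<connection-guid>",
        }
    },
}

resp = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}/items/{lakehouse_id}/shortcuts",
    headers={"Authorization": f"Bearer {token}"},
    json=body,
)
resp.raise_for_status()
print(resp.json())
```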

Internal shortcuts supported:
  • SQL Databases: Connect to SQL databases within the Fabric environment.
  • Lakehouses: Reference data within different lakehouses.
  • Warehouses: Reference data stored in data warehouses.
  • Kusto Query Language (KQL) Databases: Connect to data stored in KQL databases.
  • Mirrored Azure Databricks Catalogues: Access data from mirrored Databricks catalogues.
I think these are also Internal shortcuts:
  • PostgreSQL
  • MySQL
  • MongoDB

Example High-Level Architecture

External shortcuts with Snowflake and Dataverse.

Sunday, 9 February 2025

Delta Parquet Storage Understanding

Delta Lake leverages Parquet by building upon it to add powerful features, such as ACID transactions, versioning, and time travel.

Parquet Files

Store the actual data in a columnar format.  Efficient for reading specific columns and compressing data.

Delta Parquet

Delta adds four key advantages to Parquet file storage:

Component             Functionality Added
_delta_log            JSON and checkpoint files that track all changes (add, remove, update).
ACID Transactions     Ensure that you write to the log before modifying Parquet files.
Schema Enforcement    Validates data before writing to Parquet.
Time Travel           Uses the log to reconstruct previous versions of the data.
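
A minimal PySpark sketch of what the log buys you - an ACID update followed by a time-travel read - assuming a Spark session with Delta Lake configured; the path and data are made up.

```python
# Minimal sketch of Delta on top of Parquet: ACID writes plus time travel.
# Assumes a Spark session with Delta Lake configured; path/data are made up.
from delta.tables import DeltaTable

path = "Tables/demo_orders"

# Version 0: initial write (Parquet data files plus a _delta_log entry).
spark.createDataFrame([(1, 100.0), (2, 250.0)], ["order_id", "amount"]) \
    .write.format("delta").mode("overwrite").save(path)

# Version 1: an ACID update, recorded in the log before data files change.
DeltaTable.forPath(spark, path).update(
    condition="order_id = 2",
    set={"amount": "300.0"},
)

# Time travel: read the table as it was at version 0.
original_df = spark.read.format("delta").option("versionAsOf", 0).load(path)
original_df.show()
```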

Process of using Delta Lake storage


Usage of Delta Parquet

  • Databricks created/invented Delta Parquet
  • Apache Spark
  • MS Fabric
  • Snowflake (via connector)
  • Presto (connector)

Report directly from Delta Parquet

  • Power BI, 
  • Tableau, and 
  • Jupyter Notebooks (man, do I like a good notebook!)
All can access Delta Parquet data indirectly via Spark or Databricks connectors.

Snowflake Notes:

Delta Parquet/Delta Lake data in MS Fabric (or Azure Synapse) can be used by Snowflake. There are various options, but Snowflake doesn't understand the Delta part, so it needs a manifest to translate it.  You need to create external storage in Snowflake (similar to a shortcut in MS Fabric) and then create an external table in Snowflake over it.  Delta UniForm then allows Snowflake to read the data as Apache Iceberg, Snowflake's native format for storing Parquet with time-travel capabilities.
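
As a rough illustration, a sketch of enabling UniForm on a Delta table from a Spark notebook so that Iceberg metadata is written alongside the Delta log; the table name is hypothetical and the exact property names should be verified against the Delta/Fabric documentation for your runtime.

```python
# Minimal sketch: enable Delta UniForm so Iceberg readers (e.g. Snowflake)
# can consume the table.  Table name is hypothetical; verify the property
# names against your Delta runtime's documentation.
spark.sql("""
    CREATE TABLE IF NOT EXISTS demo_orders_uniform (
        order_id INT,
        amount   DECIMAL(18,2)
    )
    USING DELTA
    TBLPROPERTIES (
        'delta.enableIcebergCompatV2' = 'true',
        'delta.universalFormat.enabledFormats' = 'iceberg'
    )
""")
```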

Tip: Apache Iceberg on Snowflake is similar to Delta Parquet on Databricks and MS Fabric.

Note: A Power BI semantic model can utilise Snowflake Parquet files, but updates are only aligned with the Parquet files, and there is no ACID or time travel ability.