
Wednesday, 30 April 2025

MS Fabric OneLake Shortcuts

 "Shortcuts in Microsoft OneLake allow you to unify your data across domains, clouds, and accounts by creating a single virtual data lake for your entire enterprise.MS Learn

Shortcuts allow data in an open storage format to stay in the source system; only metadata is added to OneLake, and the data can then be queried from Fabric. The query load is predominantly performed against the source system, e.g., Dataverse/Dynamics.

Clarification: A shortcut is automatically added to MS Fabric for each Dataverse environment.  Dataverse creates Parquet files (estimated 5-10% extra data storage, which counts against Dataverse storage).  Via the shortcut, report writers or data engineers can access the Dataverse data as though it were inside MS Fabric's OneLake.

Understand: Dataverse creates Parquet files that MS Fabric can read to build datasets.
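As a minimal sketch of what this looks like from the Fabric side: once the Dataverse shortcut is surfaced in a lakehouse, the tables read like any other Delta table from a Fabric notebook. The table and lakehouse names ("account", "dataverse_lakehouse") are assumptions for illustration.

```python
# Minimal sketch: reading Dataverse data exposed through a OneLake shortcut
# from a Microsoft Fabric notebook. In Fabric notebooks the `spark` session
# is predefined and the default lakehouse is already mounted.

# Shortcut tables appear under Tables/ and behave like Delta tables.
accounts = spark.read.format("delta").load("Tables/account")  # assumed table name

# Or query via Spark SQL against the lakehouse catalog (assumed lakehouse name).
top_accounts = spark.sql("""
    SELECT name, revenue
    FROM dataverse_lakehouse.account
    ORDER BY revenue DESC
    LIMIT 10
""")
top_accounts.show()
```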

"Shortcuts are objects in OneLake that point to other storage locations.MS Learn

External shortcuts (the data is held in the source system) support any open storage format, including the following (see the REST API sketch after these lists):

  • Apache Iceberg tables via Snowflake
  • Parquet files on Snowflake
  • Microsoft Dataverse
  • Azure Data Lake Storage (ADLS)
  • Google Cloud Storage
  • Databricks
  • Amazon S3 (including Iceberg tables)
  • Apache Spark (Iceberg)
Internal shortcuts supported:
  • SQL Databases: Connect to SQL databases within the Fabric environment.
  • Lakehouses: Reference data within different lakehouses.
  • Warehouses: Reference data stored in data warehouses.
  • Kusto Query Language (KQL) Databases: Connect to data stored in KQL databases.
  • Mirrored Azure Databricks Catalogs: Access data from mirrored Databricks catalogs.
I think these are also Internal shortcuts:
  • PostgreSQL
  • MySQL
  • MongoDB
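Shortcuts can be created in the portal or programmatically. Below is a minimal sketch of creating an ADLS Gen2 external shortcut with the OneLake Shortcuts REST API; the workspace/lakehouse GUIDs, connection id, storage account, and shortcut name are placeholders, so check the current API reference before relying on the exact payload shape.

```python
# Minimal sketch: create an ADLS Gen2 shortcut via the Fabric OneLake Shortcuts REST API.
import requests

FABRIC_API = "https://api.fabric.microsoft.com/v1"
workspace_id = "<workspace-guid>"      # placeholder
lakehouse_id = "<lakehouse-guid>"      # placeholder
token = "<aad-access-token>"           # e.g. obtained via azure-identity

payload = {
    "path": "Files",                                  # where the shortcut appears in the lakehouse
    "name": "sales_raw",                              # assumed shortcut name
    "target": {
        "adlsGen2": {
            "location": "https://mystorageacct.dfs.core.windows.net",  # assumed account
            "subpath": "/landing/sales",                               # assumed container/path
            "connectionId": "<connection-guid>",
        }
    },
}

resp = requests.post(
    f"{FABRIC_API}/workspaces/{workspace_id}/items/{lakehouse_id}/shortcuts",
    json=payload,
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()
print(resp.json())
```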

Example High-Level Architecture

External shortcuts with Snowflake and Dataverse.


Sunday, 9 February 2025

Delta Parquet Storage Understanding

Delta Lake leverages Parquet by building upon it to add powerful features, such as ACID transactions, versioning, and time travel.

Parquet Files

Parquet files store the actual data in a columnar format, which is efficient for reading specific columns and compressing data.

Delta Parquet

Delta adds four key advantages to Parquet file storage:

  • _delta_log: JSON and checkpoint files that track all changes (add, remove, update).
  • ACID Transactions: Ensure the log is written before Parquet files are modified.
  • Schema Enforcement: Validates data before writing to Parquet.
  • Time Travel: Uses the log to reconstruct previous versions of the data.
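A minimal sketch of those four behaviours using the open-source deltalake (delta-rs) package; the table path and column names are assumptions for illustration.

```python
# Minimal sketch of Delta-on-Parquet behaviour: log, ACID writes, schema
# enforcement, and time travel.
import pandas as pd
from deltalake import DeltaTable, write_deltalake

path = "/tmp/orders_delta"  # assumed local path

# Version 0: the initial write creates Parquet files plus the _delta_log folder.
write_deltalake(path, pd.DataFrame({"id": [1, 2], "amount": [10.0, 20.0]}))

# Version 1: an append is recorded as a new transaction in the log.
write_deltalake(path, pd.DataFrame({"id": [3], "amount": [30.0]}), mode="append")

# Schema enforcement: a mismatched schema is rejected rather than silently written.
# write_deltalake(path, pd.DataFrame({"wrong_col": ["x"]}), mode="append")  # raises an error

# Time travel: earlier versions are reconstructed from the log.
print(DeltaTable(path, version=0).to_pandas())   # only ids 1 and 2
print(DeltaTable(path).to_pandas())              # current version: ids 1-3
```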

Process of using Delta Lake storage


Usage of Delta Parquet

  • Databricks (created/invented Delta Lake, i.e. Delta Parquet)
  • Apache Spark
  • MS Fabric
  • Snowflake (via connector)
  • Presto (connector)

Reporting from Delta Parquet

  • Power BI, 
  • Tableau, and 
  • Jupyter Notebooks (man, do I like a good notebook!).
All of these can access Delta Parquet data indirectly via Spark or Databricks connectors (a minimal PySpark sketch follows).
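For example, a Jupyter notebook can read a Delta table through PySpark with the delta-spark package and then hand the result to a reporting tool; the table path is an assumption carried over from the sketch above.

```python
# Minimal sketch: reading a Delta table from a notebook with PySpark and
# delta-spark, e.g. to feed a report or an export to Power BI/Tableau.
from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession

builder = (
    SparkSession.builder.appName("delta-reporting")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

orders = spark.read.format("delta").load("/tmp/orders_delta")  # assumed path
orders.groupBy("id").sum("amount").show()                      # simple aggregate for a report
```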

Snowflake Notes:

Delta Parquet/Delta Lake data in MS Fabric (or Azure Synapse) can be used by Snowflake. There are various options, but Snowflake does not understand the Delta transaction log natively, so it needs a manifest (or generated metadata) to interpret the Delta part.  You create external storage (a stage) in Snowflake, similar to a shortcut in MS Fabric, and then create an external table over it.  Delta UniForm additionally allows Snowflake to read the data as Apache Iceberg, its native open format for storing Parquet with time travel capabilities.  A minimal sketch of the external-table route follows.
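The sketch below uses the Snowflake Python connector to run the stage and external-table DDL. Stage name, storage paths, and credentials are placeholders, and Delta-format external tables have been a preview feature, so verify the syntax against the current Snowflake documentation.

```python
# Minimal sketch: exposing a Delta table (e.g. written from Fabric/OneLake)
# to Snowflake as an external table.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>",   # placeholders
    warehouse="<wh>", database="<db>", schema="<schema>",
)
cur = conn.cursor()

# External stage pointing at the storage holding the Delta/Parquet files (assumed path).
cur.execute("""
    CREATE OR REPLACE STAGE delta_stage
      URL = 'azure://mystorageacct.blob.core.windows.net/lake/orders/'
      CREDENTIALS = (AZURE_SAS_TOKEN = '<sas-token>')
""")

# External table that reads the Parquet files using the Delta transaction log.
cur.execute("""
    CREATE OR REPLACE EXTERNAL TABLE orders_delta
      LOCATION = @delta_stage
      FILE_FORMAT = (TYPE = PARQUET)
      TABLE_FORMAT = DELTA
      AUTO_REFRESH = FALSE
""")
cur.execute("ALTER EXTERNAL TABLE orders_delta REFRESH")  # manual metadata refresh
print(cur.execute("SELECT COUNT(*) FROM orders_delta").fetchone())
```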

Tip: Apache Iceberg on Snowflake is similar to Delta Parquet on Databricks and MS Fabric.

Note: A Power BI semantic model can utilise Snowflake Parquet files, but updates are only aligned with the Parquet files, and there is no ACID or time travel ability.