Wednesday 22 September 2021

Azure Synapse Intro

I have not done any serious big data work in many years and I'm using this post to understand the Azure options and update my skills.   

Azure Synapse Analytics - built for limitless scale across structured and unstructured data, supporting petabyte-scale analytics.  Use it to ingest and prepare data for BI.  It is a PaaS service where the SQL compute can be dedicated (provisioned) or serverless.  An Azure Data Lake Storage Gen2 account is assigned to each Synapse workspace.  GA Dec 2020.


It used to be Data Lake (massive, semi-structured data) and Data Warehouse as separate services; Azure Synapse Analytics can be referred to as a Data Lakehouse.  Basically all the separate pieces are under one umbrella, i.e. ADF, SSIS, data warehouse, ...
  • Serverless SQL pools are pay per use/pay per query (billed on the data each query processes).  Can use SSMS to manage data.  Good for small or new data warehousing.  Standard T-SQL queries work, so it is simple for smaller projects or PoCs.
  • Go from 3NF in relational data to fact and dimension tables, putting the data into a star schema for MPP (massively parallel processing).
  • Data Warehouse Units (DWU) - similar idea to DTU, a bundled measure of CPU, memory and IO.  The lowest is DW100c and it can go huge; simple to scale up and down.  More DWUs mean more compute nodes/worker processes working on the data.
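
To make the 3NF to star schema step concrete for myself, here is a minimal sketch in plain Python rather than anything Synapse-specific. The tables, column names and values are all made up for illustration, and the helpers (build_dimension, build_fact, sales_by_city) are hypothetical names of my own. It assigns surrogate keys to two dimensions, builds a fact table that holds only keys and measures, then aggregates through a dimension.

```python
# Hypothetical normalized (3NF-style) source tables; all names and values are illustrative.
customers = {
    1: {"name": "Contoso", "city": "London"},
    2: {"name": "Fabrikam", "city": "Leeds"},
}
products = {
    10: {"name": "Widget", "category": "Hardware"},
    11: {"name": "Gadget", "category": "Hardware"},
}
order_lines = [
    {"order_id": 100, "customer_id": 1, "product_id": 10, "qty": 2, "unit_price": 5.0},
    {"order_id": 100, "customer_id": 1, "product_id": 11, "qty": 1, "unit_price": 20.0},
    {"order_id": 101, "customer_id": 2, "product_id": 10, "qty": 4, "unit_price": 5.0},
]


def build_dimension(source):
    """Give each source row a surrogate key.

    Returns the dimension rows and a lookup from business key to surrogate key.
    """
    rows, key_map = [], {}
    for surrogate_key, business_key in enumerate(sorted(source), start=1):
        key_map[business_key] = surrogate_key
        rows.append({"key": surrogate_key, "business_key": business_key, **source[business_key]})
    return rows, key_map


def build_fact(lines, customer_keys, product_keys):
    """Build fact rows that hold only dimension keys plus numeric measures."""
    return [
        {
            "customer_key": customer_keys[line["customer_id"]],
            "product_key": product_keys[line["product_id"]],
            "qty": line["qty"],
            "amount": line["qty"] * line["unit_price"],
        }
        for line in lines
    ]


def sales_by_city(fact, dim_customer):
    """Aggregate the fact table through the customer dimension (a typical star join)."""
    city_of = {row["key"]: row["city"] for row in dim_customer}
    totals = {}
    for row in fact:
        city = city_of[row["customer_key"]]
        totals[city] = totals.get(city, 0.0) + row["amount"]
    return totals


dim_customer, customer_keys = build_dimension(customers)
dim_product, product_keys = build_dimension(products)
fact_sales = build_fact(order_lines, customer_keys, product_keys)
print(sales_by_city(fact_sales, dim_customer))
```

In a real dedicated pool the same shape would be expressed as tables and T-SQL; the point here is only that the fact table stays narrow (keys and measures) while descriptive attributes live in the dimensions.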

Azure Synapse Studio - SaaS development experience providing code-free and code-first (C#, Scala, Python, SQL, Java) options; notebooks are used for working with data.



