Showing posts with label data lake. Show all posts

Wednesday 22 September 2021

Azure Synapse Intro

I have not done any serious big data work in many years and I'm using this post to understand the Azure options and update my skills.   

Azure Synapse Analytics - built for limitless scale across structured and unstructured big data, and supports petabyte-scale analytics.  Ingest and prepare data for BI.  It is a PaaS service where compute can be dedicated or serverless.  A Data Lake Storage Gen2 account is assigned to each Synapse instance.  GA Dec 2020.


It used to be Data Lake (massive, semi-structured data) and Data Warehouse as separate services; Azure Synapse Analytics can be referred to as a Data Lakehouse.  Basically all the separate pieces are under one umbrella, i.e. ADF, SSIS, Data Warehouse, ...
  • Serverless SQL pools are pay per use/pay per query.  Can use SSMS to manage data.  Good for small or new data warehousing.  T-SQL works well, so it is simple for smaller projects or PoCs.
  • Go from 3NF relational data to fact and dimension tables, putting the data into a star schema for MPP (massively parallel processing).
  • Data Warehouse Units (DWU) - Similar to DTUs: a bundled measure of CPU, memory and IO.  The lowest level is DW100c and it can scale to very large sizes; it is simple to scale up and down.  More DWUs means more VMs/worker processes working on the data.
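
To make the 3NF-to-star-schema point concrete, here is a minimal sketch of one fact table joined to two dimension tables. It uses Python's built-in sqlite3 purely as a local stand-in (it is not Synapse and not MPP), and every table name, column name and row of data is made up for illustration.

```python
import sqlite3

# In-memory SQLite database used only as a local stand-in for a warehouse.
conn = sqlite3.connect(":memory:")

# Star schema: one central fact table referencing two dimension tables.
conn.executescript("""
CREATE TABLE dim_customer (
    customer_key  INTEGER PRIMARY KEY,
    customer_name TEXT,
    region        TEXT
);
CREATE TABLE dim_date (
    date_key       INTEGER PRIMARY KEY,
    calendar_year  INTEGER,
    calendar_month INTEGER
);
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    date_key     INTEGER REFERENCES dim_date(date_key),
    quantity     INTEGER,
    amount       REAL
);
""")

# Hypothetical demo rows.
conn.executemany(
    "INSERT INTO dim_customer VALUES (?, ?, ?)",
    [(1, "Contoso", "EMEA"), (2, "Fabrikam", "APAC")],
)
conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?)",
    [(20210901, 2021, 9), (20210922, 2021, 9)],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [(1, 20210901, 2, 200.0), (1, 20210922, 1, 100.0), (2, 20210922, 5, 250.0)],
)


def sales_by_region(connection):
    """Aggregate the fact table by an attribute held on a dimension table."""
    rows = connection.execute("""
        SELECT c.region, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_customer AS c ON c.customer_key = f.customer_key
        GROUP BY c.region
        ORDER BY c.region
    """).fetchall()
    return dict(rows)


totals = sales_by_region(conn)
print(totals)
```

The typical query shape is the same in a real warehouse: filter and group on dimension attributes, aggregate the measures in the fact table.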

Azure Synapse Studio - SaaS development experience providing both code-free and code-first (C#, Scala, Python, SQL, Java) options; notebooks are used for working with data.




Sunday 6 October 2019

Common Azure Services

Azure Key Vault - Secure config storage and retrieval
There are SDKs for working with Azure Key Vault, such as the "Azure Key Vault secret client library for .NET (SDK v4)".  It is extremely easy to get secrets from the secure vault using C#.
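
The post mentions the .NET library; as a rough equivalent, here is a minimal sketch using the Python packages azure-identity and azure-keyvault-secrets instead. The vault URL and secret name in the comments are placeholders, and the sketch assumes the signed-in identity already has permission to read secrets.

```python
def build_secret_client(vault_url):
    """Create a Key Vault SecretClient for the given vault URL.

    Requires the azure-identity and azure-keyvault-secrets packages.
    They are imported here so that read_secret can be used (and tested)
    without them installed.
    """
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    return SecretClient(vault_url=vault_url, credential=DefaultAzureCredential())


def read_secret(client, name):
    """Return the string value of the named secret."""
    return client.get_secret(name).value


# Example usage (placeholder vault URL and secret name, not real resources):
#   client = build_secret_client("https://my-demo-vault.vault.azure.net")
#   connection_string = read_secret(client, "DemoConnectionString")
```

DefaultAzureCredential tries several sign-in methods in turn (environment variables, managed identity, developer tool logins), so the same code can run locally and in Azure.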

Azure Storage
Microsoft Azure Storage Explorer is a great tool for reviewing your Azure Storage and in the case below I used it to add some Azure table storage for a demo customer list.
There is also a web edition of Storage Explorer that is in preview as of 18 Nov 2020.

App Service - Host Web sites or WebAPI

Azure Artifacts - Create and share NuGet and npm packages with Azure Artifacts for more reliable and scalable builds

Azure Data Factory (ADF) - Basically a fully managed PaaS ETL service (SSIS for Azure).  Many connectors to ingest data.  Can send data to Azure Synapse Analytics.

Azure Big Data


Azure Synapse Analytics - a managed PaaS solution that brings together ADF, Data Lake (both Storage and Analytics) and Azure SQL Data Warehouse under a single managed solution.  Easier than the individual pieces and scales as you need with almost unlimited capability.  Azure Synapse simplifies analytics and is sold as serverless or dedicated.  The easiest way to draw data out of Azure Synapse is Power BI.  It is easy to bring data into Azure Synapse from Cosmos DB and SQL databases: Azure Synapse Link can automatically push the data into Synapse with no effect on the performance of the source and no need for ADF, and the data arrives in near real time.

Azure Purview - discovers and analyses all your data, integrates with AIP.



Azure App Configuration - Feature toggles (aka feature flags) are extremely useful in code.  This service is great for turning on experimental features, operational features, environment/release features, and security features.  See "Feature Toggles (aka Feature Flags)" (martinfowler.com).  Use App Configuration for feature flags, whereas Key Vault is for secrets.
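
To show what a feature toggle looks like in code, here is a minimal sketch in plain Python. It deliberately does not call the App Configuration SDK: the FeatureFlags class, the flag names and the checkout function are all hypothetical, and in practice the flag values would be loaded from the service instead of a hard-coded dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureFlags:
    """Tiny in-memory flag store (hypothetical stand-in for a flag service)."""

    flags: dict = field(default_factory=dict)

    def is_enabled(self, name, default=False):
        # Unknown flags fall back to the default, so new code paths stay off.
        return bool(self.flags.get(name, default))


def checkout(feature_flags):
    """Pick a code path at runtime based on a release toggle."""
    if feature_flags.is_enabled("new-checkout"):
        return "new checkout flow"
    return "legacy checkout flow"


flags = FeatureFlags({"new-checkout": True, "beta-reports": False})
print(checkout(flags))
```

Defaulting unknown flags to off means a missing or misspelled flag keeps the existing behaviour.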



Azure Resource Explorer - Documentation on Azure APIs and the ability to call the APIs.

Azure Policy - Custom Azure Policy definitions can be created that apply rules to your subscription.  There are also a lot of pre-canned (built-in) policies.  You can enforce naming conventions, tagging standards, deployment of resources into specific regions, ...
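
As an illustration of the region rule, here is a sketch of a custom policy definition built as a Python dictionary and printed as JSON. It follows the general shape of a policy definition (parameters plus an if/then policyRule) and is modelled on the idea of an "allowed locations" policy. The display name and parameter name are illustrative choices, not copied from a real subscription.

```python
import json

# Deny any resource whose location is not in the allowedLocations parameter.
allowed_locations_policy = {
    "properties": {
        "displayName": "Allowed locations (demo)",  # illustrative name
        "mode": "Indexed",
        "parameters": {
            "allowedLocations": {
                "type": "Array",
                "metadata": {
                    "displayName": "Allowed locations",
                    "strongType": "location",
                },
            }
        },
        "policyRule": {
            "if": {
                "not": {
                    "field": "location",
                    "in": "[parameters('allowedLocations')]",
                }
            },
            "then": {"effect": "deny"},
        },
    }
}

policy_json = json.dumps(allowed_locations_policy, indent=2)
print(policy_json)
```

The allowed regions are supplied as parameter values when the policy is assigned, so one definition can be reused with different region lists.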