Azure Data Engineer Preparation Course Outline (Duration: 20 hours)
Introduction to the Course
• Introduction to Cloud Computing
• Azure and the Azure Free Account
• Quick tour of the Azure Portal
• Quick note - Security Defaults
• Batch vs Streaming Data
• OLTP vs OLAP
• Data Lake vs Data Warehouse
• Data Engineering Workflow
Design and Implementation of Data Storage/Data Lake Gen2
• Different Services for Azure Storage
• Azure Storage Platform
• Provision Azure Storage Account
• Explore Azure Storage
• What is Azure Blob Storage
• Data Replication in Storage
• Introduction to Azure Data Lake Gen2
• Lifecycle Management
• Manual Failover
• Blob Access Tiers
• Different file formats
• Lab - Uploading data to Azure Data Lake Gen2
Design and Implementation of Azure SQL and T-SQL
• Azure SQL Introduction
• The internals of a database engine
• Explaining Databases and Elastic Pools
• Lab - Setting up a new Azure SQL database
• Lab - Setting up SQL Server Management Studio
• Lab - T-SQL
©Gaurav Gangwar
Design and Implementation of Synapse Analytics
• Explanation of Data Warehouse
• Welcome to Azure Synapse Analytics
• Let's open up some data
• External Tables - Parquet file
• Explain OPENROWSET
• Pausing the SQL Pool
• Creating a SQL pool
• SQL Pool - External Tables - Parquet
• SQL Pool - External Tables - CSV
• Loading data into the Dedicated SQL Pool
• Loading data into a table - COPY Command - CSV
• Building a Fact Table
• Building a dimension table
• Transfer data to our SQL Pool
• Using Power BI for Star Schema
• Understanding Azure Synapse Architecture
• Explore Table Distribution
Design and Implementation of Azure Data Factory
• Extract, Transform and Load
• Introduction to Azure Data Factory
• What is Azure Data Factory
• Create a Data Factory Instance
• Azure Data Factory Studio Overview
• Pipelines and Activities
• Introduction to Linked Services & Datasets
• Create Pipeline: Copy Data Activity
• Debug & Trigger Pipeline
• Introduction to Triggers
• Integration Runtime: Introduction
• Azure Integration Runtime
• Self-Hosted Integration Runtime
• SSIS Integration Runtime
• Pipeline Parameters and Variables: Introduction
• Explore System Variables
• Mapping Data Flow
• Azure Data Factory and Git
• Quick Note on other important aspects
• Note on partitions in the copy process
• Lab - Quick look at the web activity
• Lab - Get Metadata Activity
• Lab - ForEach Activity
• Lab - Stored Procedures
• Lab - Using the Lookup Activity
Design and Implementation of Azure Event Hubs
• Batch and Real-Time Processing
• What are Azure Event Hubs
• Lab - Creating an Event Hubs instance
• Lab - .NET - Sending events
• Lab - .NET - Receiving events
Design and Implementation of Databricks Notebooks
• Big Data History
• Spark History & Overview
• What is Databricks
• Databricks Architecture
• Databricks Workspace setup
• Explore Databricks workspace
• Notebook Fundamentals
• Databricks File System
• dbutils Overview
• Install Library
• Magic Commands
• Spark Architecture and Modes
• Spark benefits
• Spark RDD, Lazy Evaluation, DAG
• Spark deploy modes - Cluster and Client
• Spark SQL, DataFrame, Operations in Spark
• Transformations, Actions, Column, Row
• Functions - map, filter, where, withColumn
• Window Functions
• Running a DBSQL Query
• Deep Dive into the Lakehouse
• Introduction to Delta Lake
• Introduction to Lakehouse Architecture
• What is the Medallion Architecture
• Deep Dive into Delta Lake using PySpark DataFrames
• Deep Dive into Delta Lake using Spark SQL
• Introduction to Databricks Auto Loader and cloudFiles
• Creating a Delta table
• Running operations on a Delta table
• Explaining features of Delta Lake
• Introduction to External and Managed Tables
• Lab - Running an automated job
• Lab - Azure Data Factory - Running a notebook
Design and Implementation of Data Security
• Azure Data Lake Gen2 Security - Account Keys
• Lab - Using the Azure Storage Explorer
• Azure Data Lake Gen2 Security - Shared Access Signature
• Using Azure Active Directory
• Lab - Granting access via Azure AD
• Lab - Using Access Control Lists
• Lab - Azure Databricks - Secret Scope - Key Vault
• Lab - Azure Databricks - Secret Scope - Implementation
• Other ways of connecting to Azure storage
• About Managed Identities
• Azure Data Factory - Managed Identity
• Azure Storage Accounts - Network and Firewall
• Azure Storage Accounts - Virtual Network Service Endpoint
• Azure Data Factory - Encryption
Design and Implementation of Monitoring and Optimization
• Best practices for structuring files in your data lake
• Azure Data Lake Gen2 - Access tiers
• Azure Data Lake Gen2 - Look at Access tiers
• Azure Data Lake Gen2 lifecycle policies
• Overview of Azure Monitor
• Lab - Azure Data Factory - Alert Rules
• Lab - Azure Data Factory - Persisting pipeline runs
• Azure Data Factory - Note on incremental data copy
• Azure Databricks - Monitoring
• Azure Databricks - Sending logs to Azure Monitor
• Azure Databricks - Pools
Other Topics Covered
• End to End Data Engineering Project
• Resume Review
• Mock Interview