0% found this document useful (0 votes)

8 views7 pages

Azure Data Engineer Technical Interview Questions

The document provides a comprehensive overview of Azure Data Engineering concepts, including Azure Data Factory, Synapse SQL, and data security measures. It covers key components such as pipelines, linked services, datasets, and various data transformation techniques. Additionally, it discusses the integration of Azure services with machine learning, data lakes, and security protocols, along with practical examples and applications.

Uploaded by

Vipul Khobragade

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

8 views7 pages

Azure Data Engineer Technical Interview Questions

Uploaded by

Vipul Khobragade

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

Azure Data Engineer Technical Interview Questions

What is the Azure Data Factory service?

Azure Data factory is the service that helps you to do the migrations and orchestrate the work. For
example, if you want to move the data from on-premise to cloud, maybe it is incremental or it will be lift
and shift work there you can leverage the Azure Data factory to do the work. It support around hundred of
the different data sources from which we can pull the data

What is a pipeline in the Azure data factory?

A pipeline in an Azure data factory is a single workflow or set of activities connected in a specific order to
serve the business purpose. Every pipeline has one unique name within the data factory account and
based on the requirement you can schedule the pipeline to execute. We can also track the history of the
pipeline execution and check for any error that occurred while execution.

What is the linked service in the Azure data factory?

Linked service is one of the components in the Azure data factory which is used to make a connection
hence to connect to any of the data sources you have to first create the linked service based upon the
type of data source. The linked service could have different parameters for example in the case of the
SQL Server linked service you probably have to give the server name, username, and password but for
connecting to the Azure blob storage you have to give the storage location details.

What is the dataset in the Azure Data factory?

Dataset needs to read-write data to any data source using the ADF. Dataset is the representation of the
type of data holds by the data source.

What are the parameters in the ADF?

There are parameters at the multiple levels in the ADF. Parameters are basically used to make the ADF
components dynamic rather than static. For example, using a parameter same dataset can be used to
read multiple tables. Different parameters are as follows:
> Linked Service Parameters
>Dataset Parameters
> Pipeline Parameters
> Global Parameters

Why you will use the data flow from the Azure data factory?

Data flow is used for a no-code transformation. For example when you are doing any ETL operation that
you wanted to do a couple of transformations and put some logic on your input data. You may have not
found it comfortable to type the query or when you are using the files as input then in that case you
cannot write the query at all. Hence data flow will come as a Savior in this situation. Using the data flow
you can just do drag and drop and write almost all your business logic without writing any code. Behind
the scene, data flow get converted into the spark code and it will run on the cluster.

What is Synapse SQL?

Synapse SQL is the ability to do T-SQL based analytics in the Synapse workspace. Synapse SQL has
two consumption models: dedicated and serverless. For the dedicated model, use dedicated SQL pools.
A workspace can have any number of these pools. To use the serverless model, use the serverless SQL
pools. Every workspace has one of these [Link] Synapse Studio, you can work with SQL pools by
running SQL scripts

How you can use Apache Spark through Azure Synapse Analytics?

In Azure analytics, you can run the spark code either using the notebook or you can create a job that will
run the spark code. For running the Spark code you need a Spark pool which is nothing just a cluster of
the nodes having a spark installed on it.

How we can copy the data in the synapse SQL table from the Azure blob storage file?

COPY INTO [dbo].[Trip] FROM '[Link]

WITH (

FIELDTERMINATOR='|',

ROWTERMINATOR='0x0A'

) OPTION (LABEL = 'COPY: [Link]');

What are the two different types of the cluster provided by the Databricks?

Two different types of the cluster provided by the Databricks are the interactive cluster and the job cluster.
To run the interactive notebook you will be going to use an interactive cluster and to run the job, we will
use the job cluster.

What is Azure Synapse Runtime?

Apache Spark pools in Azure Synapse use runtimes to tie together essential component versions, Azure
Synapse optimizations, packages, and connectors with a specific Apache Spark version. These runtimes
will be upgraded periodically to include new improvements, features, and [Link] runtimes have
the following advantages:
● Faster session startup times
● Tested compatibility with specific Apache Spark versions
● Access to popular, compatible connectors and open-source packages

Can you run machine learning algorithm using Azure Synapse Analytics?

Yes, it is possible to run the machine learning algorithm using Azure synapse Analytics. In the Azure
synapse analytics, we have an Apache Spark and there we can write the machine learning code and
which can be executed on the Spark cluster.

What are the two different types of execution modes provided by the Databricks?

You can run the Spark code in two modes that are interactive mode or job scheduled mode. In the
interactive mode, you can run the code line by line and see the output. In the job mode, it will run all code
together, and then you will see the output.

How you can connect the Azure data factory with the Azure Databricks?
To connect to Azure databricks we have to create a linked service that will point to the Azure databricks
account. Next in the pipeline, you will be going to use the notebook activity there you will provide the
linked service created for Databricks. You will also be going to provide the notebook path available in the
Azure Databricks workspace. That’s how you can use the Databricks from the Data factory.

Is it possible to connect MongoDB DB from the Azure data factory?

Yes, it is possible to connect MongoDB from the Azure data factory. You have to provide the proper
connection information about the MongoDB server. In case if this MongoDB server is residing outside the
Azure workspace then probably you have to create a self-hosted integration runtime, and through which
you can connect to the Mongo DB server.

How you can move your changes from one environment to another environment for the
Azure data factory?

We can migrate the code from one environment to another environment for the Azure data factory using
the ARM template. ARM template is the JSON representation of the pipeline that we have created.
Explain the data masking feature of Azure?

Data masking helps in preventing unauthorized access to delicate data by enabling customers to assign
how much of the delicate data to reveal with minimal impact on the application layer. Dynamic data
masking limits acute data exposure by masking it to non-privileged users. It is a policy-based security
feature that hides the delicate data in the result set of a query over designated database fields. In
contrast, the data in the database will not be changed.

A few data masking policies are:

● SQL users excluded from masking - A set of SQL users or Azure Active Directory
identities that get unmasked data in the SQL query results. Users with administrator
privileges are permanently banned from masking and seeing the original data without any
mask.
● Masking rules - A set of rules defining the designated fields to be masked and the masking
function used. The selected fields can be determined using a database schema, table, and
column names.
● Masking functions - A set of methods that control data exposure for different scenarios.

Describe various windowing functions of Azure Stream Analytics?

A window in Azure Stream Analytics is a block of instant events that enables users to perform various
operations on the event data. To analyze and partition a window in Azure Stream Analytics, There exist
four windowing functions:

● Hopping Window: In these windows, the data segments can overlap. So, to define a
hopping window, we need to specify two parameters:
○ Hop (duration of the overlap)
○ Window size (length of data segment)
● Tumbling Window: In this, the data stream is segmented into distinct time segments of
fixed length in the tumbling window function.
● Session Window: This function groups events based on arrival time, so there is no fixed
window size. Its purpose is to eliminate quiet periods in the data stream.
● Sliding Window: This windowing function does not necessarily produce aggregation after a
fixed time interval, unlike the tumbling and hopping window functions. Aggregation occurs
every time an existing event falls out of the time window, or a new event occurs.

What are the different security options available in the Azure SQL database?

Security plays a vital role in databases. Some of the security options available in the Azure SQL database
are:

● Azure SQL Firewall Rules: Azure provides two-level security. There are server-level
firewall rules which are stored in the SQL Master database. Server-level firewall rules
determine the access to the Azure database server. Users can also create database-level
firewall rules that govern the individual databases’ keys.
● Azure SQL TDE (Transparent Data Encryption): TDE is the technology used to encrypt
stored data. TDE is also available for Azure Synapse Analytics and Azure SQL Managed
Instances. With TDE, the encryption and decryption of databases, backups, and transaction
log files, happens in real-time.
● Always Encrypted: It is a feature designed to protect sensitive data stored in the Azure
SQL database, such as credit card numbers. This feature encrypts data within the client
applications using Always Encrypted-enabled driver. Encryption keys are not shared with
SQL Database, which means database admins do not have access to sensitive data.
● Database Auditing: Azure provides comprehensive auditing capabilities along with the
SQL Database. It is also possible to declare the audit policy at the individual database level,
allowing users to choose based on the requirements.

How data security is implemented in Azure Data Lake Storage(ADLS) Gen2?

Data security is one of the primary concerns for most organizations for moving data to cloud storage.
Azure data lake storage gen2 provides a multi-layered and robust security model. This model has 6 data
security layers:

● Authentication: The first layer includes user account security. ADLS Gen2 provides three
authentication modes, Azure Active Directory (AAD), Shared Access Token (SAS), and
Shared Key.
● Access Control: The next layer for restricting access to individual containers or files. This
can be managed using Roles and Access Control Lists (ACLs)
● Network Isolation: This layer enables administrators to manage access by disabling or
allowing access to only particular Virtual Private Networks (VPNs) or IP Addresses.
● Data Protection: This is achieved by encrypting in-transit data using HTTPS(Hypertext
Transfer Protocol Secure). Options to encrypt stored data are also available.
● Advanced Threat Protection: If enabled, ADLS Gen2 will monitor any unauthorized
attempts to access or exploit the storage account.
● Auditing: This is the sixth and final layer of security. ADLS Gen2 provides comprehensive
auditing features in which all account management activities are logged. These logs can be
later reviewed to ensure the highest level of security.

What do you mean by data modeling?

Data Modeling is creating a visual representation of an entire information system or parts to express
linkages between data points and structures. The purpose is to show the many types of data used and
stored in the system, the relationships between them, how the data can be classified and arranged, and
its formats and features. Data can be modeled according to the needs and requirements at various
degrees of abstraction. The process begins with stakeholders and end-users providing information about
business requirements. These business rules are then converted into data structures to create a concrete
database design.
There are two design schemas available in data modeling:

● Star Schema
● Snowflake Schema
What are the 2 levels of security in Azure data lake storage Gen2?
The two levels of security available in Azure data lake storage Gen2 are also adequate for Azure data
lake Gen1. Although this is not new, it is worth calling it two levels of security because it’s a fundamental
piece for getting started with the Azure data lake. The two levels of security are defined as:

● Role-Based Access Control (RBAC): RBAC includes built-in Azure roles such as reader,
owner, contributor, or custom. Typically, RBAC is assigned due to two reasons. One is to
permit the use of built-in data explorer tools that require reader permissions. Another is to
specify who can manage the service (i.e., update properties and settings for the storage
account).
● Control Lists (ACLs): ACLs specify exactly which data objects a user may write, read, and
execute (execution is required for browsing the directory structure). ACLs are POSIX
(Portable Operating System Interface) - compliant, thus familiar to those with a Linux or Unix
background.

Explain a few important concepts of the Azure data factory?

● Pipeline: It acts as a carrier in various processes occurring. An individual process is

considered an activity.
● Activities: It represents the processing steps of a pipeline. A pipeline can have one or many
activities. It can be a process like moving the dataset from one source to another or
querying a data set.
● Datasets: It is the source of data or, we can say it is a data structure that holds our data.
● Linked services: It stores information that is very important when connecting to an external
source.

Explain the process of creating ETL(Extract, Transform, Load)?

● Faster session startup times.

● Tested compatibility with specific Apache Spark versions.
● Access to popular, compatible connectors and open-source packages.

What is SerDe in the hive?

Serializer/Deserializer is popularly known as SerDe. For IO(Input/Output), Hive employs the SerDe
protocol. Serialization and deserialization are handled by the interface, which also interprets serialization
results as separate fields for processing.
The Deserializer turns a record into a Hive-compatible Java object. The Serializer now turns this Java
object into an HDFS (Hadoop Distributed File System) -compatible format. The storage role is then taken
over by HDFS. Anyone can create their own SerDe for their own data format.

Mention some common applications of Blob storage?

● Laboring images or documents straight to a browser.

● Saving files for shared access.
● Streaming audio and video.
● Collecting data for backup and recovery disaster restoration, and archiving.
● Saving data for analysis by an on-premises or Azure-hosted.

What are the main characteristics of Hadoop?

● It is an open-source structure that is ready for freeware.

● Hadoop is cooperative with the various types of hardware and simple to access distinct
hardware within a particular node.
● It encourages faster-distributed data processing.
● It saves the data in the group, which is unconventional of the rest of the operations.
● Hadoop supports building replicas for every block with separate nodes.

Azure Data Factory Interview Q&A Guide
No ratings yet
Azure Data Factory Interview Q&A Guide
5 pages
Azure Data Engineer Interview Questions
100% (1)
Azure Data Engineer Interview Questions
35 pages
Limitless Analytics with Azure Synapse
No ratings yet
Limitless Analytics with Azure Synapse
23 pages
Azure Data Engineer Interviews Questions
No ratings yet
Azure Data Engineer Interviews Questions
7 pages
ADF Interview Insights for 5-Year Pros
No ratings yet
ADF Interview Insights for 5-Year Pros
27 pages
Practice Test 5
No ratings yet
Practice Test 5
192 pages
Azure Data Factory Overview and Concepts
No ratings yet
Azure Data Factory Overview and Concepts
7 pages
ADF Interview Questions & Answers Guide
No ratings yet
ADF Interview Questions & Answers Guide
29 pages
Azure Data Engineer Interview Questions You Must Prepare
No ratings yet
Azure Data Engineer Interview Questions You Must Prepare
36 pages
DP-203 Azure Data Engineer Cheat Sheet
No ratings yet
DP-203 Azure Data Engineer Cheat Sheet
42 pages
Azure Data Engineer Certification Guide
No ratings yet
Azure Data Engineer Certification Guide
4 pages
Capgemini Data Engineer Interview Questions
No ratings yet
Capgemini Data Engineer Interview Questions
27 pages
Azure Data Factory & Synapse Activities Guide
No ratings yet
Azure Data Factory & Synapse Activities Guide
7 pages
Azure Data Factory & Synapse Activities Guide
No ratings yet
Azure Data Factory & Synapse Activities Guide
8 pages
Adf Cheatsheet
No ratings yet
Adf Cheatsheet
13 pages
Azure Data Engineer Interview Insights
No ratings yet
Azure Data Engineer Interview Insights
20 pages
ADF Basics Workshop Overview
No ratings yet
ADF Basics Workshop Overview
26 pages
Understanding Azure Data Factory Features
No ratings yet
Understanding Azure Data Factory Features
5 pages
Best Practices for Azure Data Factory
No ratings yet
Best Practices for Azure Data Factory
11 pages
Azure Data Factory Overview Guide
100% (2)
Azure Data Factory Overview Guide
26 pages
Azure Data Factory Interview Basics
No ratings yet
Azure Data Factory Interview Basics
18 pages
Overview of Azure Data Factory Features
No ratings yet
Overview of Azure Data Factory Features
10 pages
Data Engineering on Azure DP-203 Course
No ratings yet
Data Engineering on Azure DP-203 Course
8 pages
Azure Data Factory Overview and Key Features
No ratings yet
Azure Data Factory Overview and Key Features
15 pages
Azure Data Factory Overview and Features
No ratings yet
Azure Data Factory Overview and Features
21 pages
Azure Data Engineering Course Overview
No ratings yet
Azure Data Engineering Course Overview
35 pages
Azure Data Factory Interview Guide
No ratings yet
Azure Data Factory Interview Guide
12 pages
Azure Data Factory Overview and Guide
No ratings yet
Azure Data Factory Overview and Guide
11 pages
ADF Foundations Interview QA-1
No ratings yet
ADF Foundations Interview QA-1
3 pages
Azure Data Factory Overview and Basics
No ratings yet
Azure Data Factory Overview and Basics
54 pages
Understanding Azure Data Factory Components
No ratings yet
Understanding Azure Data Factory Components
16 pages
Azure Data Factory Overview Guide
No ratings yet
Azure Data Factory Overview Guide
12 pages
Overview of Azure Data Factory V2
100% (2)
Overview of Azure Data Factory V2
14 pages
Azure Data Factory Interview Questions 2025
No ratings yet
Azure Data Factory Interview Questions 2025
14 pages
MS Azure+Azure Data Engineering-Syllabus
No ratings yet
MS Azure+Azure Data Engineering-Syllabus
14 pages
Azure Data Engineering Overview
No ratings yet
Azure Data Engineering Overview
6 pages
Azure Data Lake & Power BI Overview
No ratings yet
Azure Data Lake & Power BI Overview
9 pages
Azure Data Engineering Course
No ratings yet
Azure Data Engineering Course
18 pages
DP 203T00A ENU PowerPoint - 01
No ratings yet
DP 203T00A ENU PowerPoint - 01
20 pages
Azure Data Factory: Data Integration Guide
No ratings yet
Azure Data Factory: Data Integration Guide
5 pages
Azure Data Factory Overview and Tutorial
100% (4)
Azure Data Factory Overview and Tutorial
16 pages
Opp. Satyam Theatre, Durga Bhavani Plaza, Ameerpet, Hyd-16: An ISO 9001: 2015 Certified Company
No ratings yet
Opp. Satyam Theatre, Durga Bhavani Plaza, Ameerpet, Hyd-16: An ISO 9001: 2015 Certified Company
11 pages
Pyspark: Exceptall vs Union Operations
No ratings yet
Pyspark: Exceptall vs Union Operations
8 pages
Azure Data Factory Overview and Features
No ratings yet
Azure Data Factory Overview and Features
26 pages
Azure Data Factory Overview and Components
No ratings yet
Azure Data Factory Overview and Components
43 pages
Azure Data Engineer Course Outline
No ratings yet
Azure Data Engineer Course Outline
4 pages
Azure Data Factory Overview and Features
No ratings yet
Azure Data Factory Overview and Features
22 pages
Understanding Azure Data Factory Features
No ratings yet
Understanding Azure Data Factory Features
12 pages
Top Azure Data Factory Interview Q&A
0% (1)
Top Azure Data Factory Interview Q&A
14 pages
Azure Data Factory Overview and Components
No ratings yet
Azure Data Factory Overview and Components
4 pages
Azure Data Factory Cheat Sheet
100% (1)
Azure Data Factory Cheat Sheet
14 pages
Azure Data Engineering Course Overview
No ratings yet
Azure Data Engineering Course Overview
4 pages
Azure Database Solutions Overview
No ratings yet
Azure Database Solutions Overview
3 pages
Azure Data Factory: Flattening JSON Guide
No ratings yet
Azure Data Factory: Flattening JSON Guide
11 pages
ADF Control Flow Activities Overview
No ratings yet
ADF Control Flow Activities Overview
88 pages
Exadata Cloud Service: Key Concepts and Actions
No ratings yet
Exadata Cloud Service: Key Concepts and Actions
34 pages
Oracle Database Security Overview
No ratings yet
Oracle Database Security Overview
34 pages
AZ-500 Assessment Guide for Instructors
No ratings yet
AZ-500 Assessment Guide for Instructors
15 pages
Oracle 12c Database Security Guide
No ratings yet
Oracle 12c Database Security Guide
19 pages
Oracle Transparent Data Encryption
No ratings yet
Oracle Transparent Data Encryption
9 pages
Always On SQL Server Troubleshooting Tips
No ratings yet
Always On SQL Server Troubleshooting Tips
16 pages
Always On SQL Server Interview Guide
No ratings yet
Always On SQL Server Interview Guide
8 pages
Oracle Database Migration Questionnaire
No ratings yet
Oracle Database Migration Questionnaire
2 pages
Final Report (170750107055-Android Internship)
No ratings yet
Final Report (170750107055-Android Internship)
74 pages
Azure Practice Exercises for Admins
No ratings yet
Azure Practice Exercises for Admins
24 pages
Oracle Database 11g DBA Training Overview
No ratings yet
Oracle Database 11g DBA Training Overview
45 pages
Migrate SQL Server to Azure SQL Managed Instance
No ratings yet
Migrate SQL Server to Azure SQL Managed Instance
17 pages
SQL Server Database Encryption Options
No ratings yet
SQL Server Database Encryption Options
6 pages
SQL Server Encryption & Key Management Guide
No ratings yet
SQL Server Encryption & Key Management Guide
32 pages
Synchronized Secondaries in Always On
No ratings yet
Synchronized Secondaries in Always On
18 pages
Database Security & Integrity Q&A Guide
No ratings yet
Database Security & Integrity Q&A Guide
23 pages
Enable TDE and Upgrade Oracle 12c to 19c
No ratings yet
Enable TDE and Upgrade Oracle 12c to 19c
41 pages
Understanding Transparent Data Encryption
No ratings yet
Understanding Transparent Data Encryption
18 pages
SQL Server Mission Critical Performance TDM White Paper
No ratings yet
SQL Server Mission Critical Performance TDM White Paper
23 pages
D97103GC10 Ag
No ratings yet
D97103GC10 Ag
394 pages
SQL Database Administration Course 20764C
No ratings yet
SQL Database Administration Course 20764C
6 pages
DP-203 Exam: Data Storage Design Guide
No ratings yet
DP-203 Exam: Data Storage Design Guide
31 pages
Oracle TDE: Implementation Guide
No ratings yet
Oracle TDE: Implementation Guide
25 pages
TDE Conversion for Oracle 11g/12c
No ratings yet
TDE Conversion for Oracle 11g/12c
18 pages
Transparent Data Encryption (TDE) in Oracle Database: Complete Guide
No ratings yet
Transparent Data Encryption (TDE) in Oracle Database: Complete Guide
19 pages
Oracle Cloud Infrastructure SSH Key Management
No ratings yet
Oracle Cloud Infrastructure SSH Key Management
21 pages
SQL Server Model Database Overview
No ratings yet
SQL Server Model Database Overview
10 pages
Oracle Autonomous Database Exam Guide
No ratings yet
Oracle Autonomous Database Exam Guide
21 pages
Azure Key Vault Admin Interview Guide
No ratings yet
Azure Key Vault Admin Interview Guide
139 pages
Azure Security Technologies Practice Test
No ratings yet
Azure Security Technologies Practice Test
75 pages