0% found this document useful (0 votes)

4 views12 pages

Dataengineer IFAQ

The document is a categorized reference of interview questions for data engineering roles, covering topics such as SQL, Azure Data Factory, Databricks, PySpark, and AWS. It includes sections on basic to advanced SQL concepts, performance tips, and specific Azure Data Factory functionalities. The content is sourced from LinkedIn posts and is designed to aid in interview preparation for aspiring data engineers.

Uploaded by

rajeshganta.de7799

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

4 views12 pages

Dataengineer IFAQ

Uploaded by

rajeshganta.de7799

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

Data Engineering

Interview Questions — Categorized Reference

SQL · Azure Data Factory · Databricks · PySpark · ADLS · Synapse · Python · AWS

Source: LinkedIn posts from data engineering community. Compiled and organized by category for interview preparation.

Table of Contents

1. SQL — Basic Concepts ·······················································

2. SQL — Intermediate ····························································
3. SQL — Advanced ······························································
4. SQL — Performance Tips ·······················································
5. Azure Data Factory — Basic to Intermediate ···································
6. Azure Data Factory — Advanced & Scenario ······································
7. Azure Databricks ··························································
8. PySpark / Spark ······························································
9. ADLS Gen2 & Azure Storage ····················································
10. Azure Synapse Analytics ····················································
11. Python for Data Engineering ················································
12. Azure DE Key Topics — Study Roadmap ······································
13. AWS Cloud (Bonus) ····························································

1. SQL — Basic
Concepts for 0–3 Years Experience

1. What is SQL, and why is it important in data analytics?

2. Explain the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN.
3. What is the difference between WHERE and HAVING clauses?
4. How do you use GROUP BY and HAVING in a query?
5. Write a query to find duplicate records in a table.
6. How do you retrieve unique values from a table using SQL?
7. Explain the use of aggregate functions like COUNT(), SUM(), AVG(), MIN(), and MAX().
8. What is the purpose of the DISTINCT keyword in SQL?

2. SQL — Intermediate
Window Functions, CTEs, Indexes

1. Write a query to find the second-highest salary from an employee table.

2. What are subqueries and how do you use them?
3. What is a Common Table Expression (CTE)? Give an example of when to use it.
4. Explain window functions like ROW_NUMBER(), RANK(), and DENSE_RANK().
5. How do you combine results of two queries using UNION and UNION ALL?
6. What are indexes in SQL, and how do they improve query performance?
7. Write a query to calculate the total sales for each month using GROUP BY.
8. What is the difference between RANK, DENSE_RANK, and ROW_NUMBER?
9. How do you use PARTITION BY and ORDER BY in window functions?
10. Find the nth Order/Salary from a table.
11. Find the number of output records in each join from given Table 1 and Table 2.
12. YOY (Year-over-Year) and MOM (Month-over-Month) growth related questions.
13. Find Employee–Manager Hierarchy using a self join, or employees earning more than their managers.
14. Row-level scanning using CTE or recursive CTE (e.g., finding missing numbers/items).
15. Number of matches played by every team or source-to-destination flights using CROSS JOIN.
16. Use window functions to calculate moving averages or detect outliers.
17. Implement logic to handle hierarchical data (finding all descendants in a tree structure).
18. Identify and remove duplicate records from a table.
19. SQL query to find department-wise top 3 salaries.
20. Running sum, year-on-year growth, and top N people by salary using window functions.
9. Write a SQL query to find the third highest salary from the table emp. (Columns: id, salary)
10. Write a SQL query to find the department-wise third highest salary from the table emp. (Columns: dept_id,
salary)
11. Write a SQL query to find numbers which consecutively occur 3 times from the table emp. (Columns: id,
numbers)
12. Write a SQL query to find the days when temperature was higher than its previous date’s temperature.
(Columns: Days, Temp)
13. Write a query to find gaps in a dataset and fill that gap from table emp. (Columns: date, id; year is 2022)
14. Find the Most Recent Purchase Date for Each Customer. (Columns: customer_id, purchase_date)
15. Get the Number of Days Between the First and Last Purchase for Each Customer. (Columns: customer_id,
purchase_date)
16. What is a correlated subquery, and how does it differ from a non-correlated subquery?
17. Find the highest and lowest salary department-wise using GROUP BY with MAX() and MIN().

3. SQL — Advanced
Optimization, Views, Stored Procedures

1. How do you optimize a slow-running SQL query?

2. What are views in SQL, and when would you use them?
3. What is the difference between a stored procedure and a function in SQL?
4. Explain the difference between TRUNCATE, DELETE, and DROP commands.
5. What are windowing functions, and how are they used in analytics?
6. How do you handle NULL values in SQL, and what functions help (e.g., COALESCE, ISNULL)?
7. How would you optimize a complex SQL query with multiple joins and subqueries?
8. How do you handle NULL values in SQL while performing aggregations?
9. What is the purpose of the GROUP BY and HAVING clauses?
10. Explain how indexes work and when to use them.
11. Explain the difference between INNER, LEFT, ANTI, and SEMI JOINS with examples.
18. Write a SQL query to find all employees who earn more than their manager. (Columns: EmployeeID,
ManagerID, Salary)
19. What is the difference between a correlated and non-correlated subquery? Which is more performant?
20. SQL questions on window functions — rolling sum and lag/lead based. How do window functions differ from
aggregate functions?
21. Explain Slowly Changing Dimension (SCD) Type 1 in a data warehouse. How does it differ from SCD Type
2?
22. Given two datasets, explain how the number of records will vary for each type of join (Inner, Left, Right, Full
Outer Join).
23. Discuss key concepts in data modeling, including normalization and denormalization.
24. How can you mask sensitive data in Azure SQL Database? What are the different masking techniques
available?

4. SQL — Performance Tips

7 Best Practices for Query Optimization

1. Avoid using SELECT *; always specify only the columns you need.
2. Minimize the use of wildcard characters in LIKE clauses.
3. Opt for appropriate JOINs rather than relying on subqueries.
4. Reduce multiple OR conditions; consider alternatives like IN or UNION where applicable.
5. Use TOP or LIMIT to retrieve only the required rows instead of fetching the entire dataset.
6. Prefer COUNT(1) over COUNT(*) for large tables to optimize performance.
7. Ensure your database indexes are properly optimized for frequently queried columns.

5. Azure Data Factory — Basic to Intermediate

Components, Linked Services, Activities

1. What are the main components of Azure Data Factory?

2. What are Linked Services in ADF, and why are they important?
3. What types of triggers are available in ADF for pipeline execution?
4. What is the Copy Activity in ADF, and how can you optimize its performance?
5. How does ADF handle running multiple pipeline tasks simultaneously?
6. What is the Lookup activity in ADF, and how is it used?
7. How do you monitor and troubleshoot ADF pipelines?
8. What are activities in ADF? Name some important ones you have used.
9. What is the tumbling window trigger in ADF, and when should you use it?
10. Can you explain the different ways to execute pipelines in ADF?
11. What are the main data sources supported by ADF?
12. What is the Get Metadata activity in ADF?
13. What is the difference between Mapping Data Flow and Wrangling Data Flow in ADF?
14. How do you work with dynamic content and parameters in ADF pipelines?
15. How do you manage schema changes in ADF?
16. What is the purpose of a Self-hosted Integration Runtime, and when would you use it?
17. How do you integrate ADF with Databricks or Synapse Analytics?
18. What are ARM templates, and how are they used in ADF?
19. How do you handle incremental data loading in ADF?
20. How would you implement SCD Type 2 in ADF?
21. What is the limit on the number of pipelines and entities in ADF?
22. Describe the steps for deploying ADF pipelines using CI/CD.
23. What new features does ADF v2 have compared to ADF v1?
24. How can you improve data movement speed in ADF?
25. How does nested looping work in ADF?
26. Azure Data Factory — end-to-end flow of data ingestion.
27. What are Linked Services, Integration Runtime, and Datasets in ADF?
28. To copy 10 tables from the same source DB, how many datasets need to be created?

6. Azure Data Factory — Advanced & Scenario-Based

Orchestration, Error Handling, SCD, Real-time

1. How do you schedule when pipelines run in ADF?

2. How can you set up a stored procedure to run automatically in an ADF pipeline?
3. What are the different types of Integration Runtimes in ADF, and when do you use each?
4. What are Data Flows in Azure Data Factory?
5. How do you connect a Databricks notebook to ADF?
6. How would you handle errors and retries in ADF pipelines?
7. How would you take data from a CSV file in Azure Blob Storage and load it into Azure SQL Database?
8. How can you use parameters and variables in ADF pipelines to make them flexible and reusable?
9. How would you make a pipeline run only after another pipeline finishes successfully?
10. How would you set up a solution that chooses source and destination based on input parameters?
11. How do you transform data in ADF using data flow transformations?
12. How would you implement slowly changing dimensions (SCD) in ADF?
13. How do you manage access and security in Azure Data Lake Storage (ADLS)?
14. How do you connect ADF with CI/CD for deployment?
15. What is data lineage, and how would you show it in an Azure data pipeline?
16. What would you do if your data pipeline sometimes fails because of network problems?
17. How would you improve the performance of data partitioning in ADLS?
18. How would you set up incremental data loading from an on-premises SQL Server to Azure SQL?
19. How would you create a data pipeline that processes and analyzes streaming data in real-time?
20. How would you copy data from several CSV files in ADLS Gen2 to Azure SQL Database?
21. How would you manage incremental data loading in ADF from a SQL database?
22. How do you design an effective ADF pipeline — what metrics and considerations matter?
23. We have ADF and Databricks both — why use Databricks if ADF can also do transformations?
24. Share an example of when data pipeline monitoring missed an issue. How did you find and fix it?
25. What are the error handling mechanisms in ADF pipelines?
26. What steps would you take to set up logging and monitoring in ADF?
27. How do you set up dependencies and run pipelines in parallel in ADF?
28. What are best practices to make ADF pipelines perform better?
29. How can we load multiple (50+) tables at a time using ADF?
25. Explain the differences between a Scheduled Trigger and a Tumbling Window Trigger in ADF. When would
you use each?
26. What are the Control Flow activities in ADF? Explain how they differ from Data Flow activities and their
typical use cases.
27. What are the different types of Integration Runtimes (IR) in ADF? Discuss use cases and limitations of
each.
28. Describe the architecture and key components of Azure Data Factory. How do these components interact
to orchestrate data workflows?
29. What is Azure Data Factory (ADF), and how does it enable ETL and ELT processes in a cloud
environment?
30. 20 must-know ADF topics: Integration Runtime, Self-Hosted IR, On-Prem to Cloud Migration, Schedule
Triggers, Event-Based Triggers, Incremental Pipeline, Batch Pipeline, Calling One Pipeline from Another,
Dataset Parametrization, Copy Activity, Performance Optimization of Pipeline, Data Flows, CI/CD of ADF,
Pipeline Monitoring, Rerun Failed Pipeline, Get Metadata Activity, Handling If/For Loop, Handle Nested
Loop, Dynamic Content Manipulation, Pipeline Parameters.

7. Azure Databricks
Clusters, Notebooks, Delta Lake, PySpark

1. What are the types of tables in Databricks?

2. What are the types of clusters in Databricks — interactive and job clusters?
3. How to connect Azure Data Lake to Databricks?
4. What are RBAC, Azure Key Vault, and secret scopes in Databricks?
5. What is Medallion Architecture (Bronze, Silver, Gold)?
6. What are the different ways to run a notebook from another notebook in Databricks?
7. If you encounter an Out Of Memory (OOM) exception when joining two large datasets, how do you resolve
it?
8. Given these transformations — Read CSV, Repartition, Filter, Select, GroupBy, Collect() — how many jobs,
stages, and tasks are created?
9. Have you worked on Spark optimization? Explain an issue you faced and how you resolved it.
10. How do you use Kafka in your project? Describe the architecture and daily pipeline runs.
11. What are the benefits of using Delta tables?
12. What file formats have you worked with, and why is Parquet preferable to others?
13. If you have 100 GB of data, how would you choose the cluster configuration?
14. What is the Catalyst optimizer, and explain its phases?
15. Explain memory management in Spark.
16. How to flatten a JSON file using PySpark in Databricks?
17. How to read CSV files using PySpark?
18. What is a Watermark column and Watermark table?
31. What is the Databricks File System (DBFS), and how is it used?
32. How do you schedule jobs in Azure Databricks?
33. Explain the role of Apache Spark in Azure Databricks.
34. How do you configure a Spark cluster in Azure Databricks?
35. What are the advantages of using PySpark in Azure Databricks for data processing?
36. Describe the concept of notebooks in Azure Databricks.
37. How do Azure Databricks workspaces enhance collaboration?
38. Explain the significance of Delta Lake in Azure Databricks.
39. Explain the architecture of Delta Lake in Databricks and its benefits.
40. How would you implement multi-hop architecture (Bronze → Silver → Gold) in Databricks?
41. Describe the process of implementing auto-scaling in Databricks clusters.
42. How do you optimize Databricks Delta table performance?
43. Explain the concept of Databricks Unity Catalog and how it enhances data governance.
44. How do you implement SCD Type 1, 2, and 3 in PySpark on Databricks?

8. PySpark / Spark
Optimization, Delta Lake, Schema Evolution

1. Steps to tune your long-running Spark jobs.

2. What is the purpose of broadcast variables in Spark, and when would you use them?
3. What is data skewness, and how do you handle data skew in Spark jobs?
4. How do you achieve schema evolution in Spark?
5. What is Spark's lazy evaluation, and how does it optimize execution?
6. How do you handle large text files with no tabular structure in Spark?
7. What is Z-ordering in Databricks, and when would you use it?
8. What is the difference between Parquet and Delta files?
9. What are the benefits of using Delta tables over regular Parquet?
10. What is the Catalyst optimizer and explain its phases?
11. Explain memory management in Spark (executors, driver, shuffle).
12. How do you build ETL jobs using PySpark?
13. How do you handle partitioning, caching, and optimization in Spark?
14. What are Data Lake optimizations using Delta Lake?
15. How do you work with window functions, joins, and aggregations in Spark and SQL?
45. Describe the PySpark architecture.
46. What are RDDs in PySpark?
47. How does PySpark differ from Apache Hadoop?
48. What are DataFrames in PySpark?
49. How do you initialize a SparkSession?
50. What is the significance of the SparkContext?
51. Describe the types of transformations in PySpark (narrow vs wide).
52. What are actions in PySpark, and how do they differ from transformations?
53. How can you filter rows in a DataFrame?
54. How do you aggregate data in PySpark?
55. What are UDFs (User Defined Functions), and how are they used?
56. How do you repartition a DataFrame, and why would you do it?
57. Describe how to cache a DataFrame. Why is it useful?
58. How do you save a DataFrame to a file?
59. What are accumulators, and how are they used?
60. What is the significance of the Tungsten execution engine?
61. Explain how checkpointing works in PySpark.
62. What is data lakehouse architecture?
63. How do you find the second highest salary in a DataFrame using PySpark?
64. How do you use window functions in PySpark to calculate running totals or moving averages?
65. How do you remove duplicate rows from a DataFrame in PySpark?
66. How do you join two large datasets in PySpark without running out of memory?
67. How do you aggregate data by multiple columns in PySpark?
68. How do you perform ETL operations to clean and transform raw data into a structured format in PySpark?
69. How do you partition a large DataFrame by a specific column and write the partitions to different locations?
70. How do you handle and process JSON data in PySpark?
71. Which do you prefer — DataFrame or RDD? Justify your answer.
72. How do you define the structure (schema) of a DataFrame in PySpark?
73. How do you read data from a Parquet file in PySpark?
74. How do you add a new column to a DataFrame in PySpark?
75. Describe the process of implementing a custom aggregation function in PySpark.
76. How do you handle and optimize joins in PySpark when dealing with large datasets?
77. Explain the Catalyst optimizer in Spark SQL and how it improves query performance.
78. What are the best practices for managing memory in PySpark?
79. How can you monitor the performance of a PySpark application?

9. ADLS Gen2 & Azure Storage

Data Lake, Blob Storage, Medallion Architecture

1. What is the difference between Azure Blob Storage and Azure Data Lake Gen2?
2. What is ADLS Gen2 — storage for structured, semi-structured, and unstructured data?
3. What is Medallion Architecture — Bronze, Silver, Gold layers?
4. How do you design folder structures in ADLS for an incremental load pipeline?
5. How do you manage access and security in ADLS (RBAC, ACLs, managed identities)?
6. How would you improve the performance of data partitioning in ADLS?
7. What are Azure Storage Tiers, and how do you manage costs?
8. What file formats are used in ADLS (Parquet, Delta, CSV, JSON, Avro)?
9. What is the difference between Gen1 and Gen2 of ADLS?

10. Azure Synapse Analytics

SQL Pools, Star Schema, T-SQL

1. What is Synapse Analytics?

2. What is a SQL Pool, and what are its types (Dedicated vs Serverless)?
3. What is the difference between a Dedicated SQL Pool and a Serverless SQL Pool?
4. How do you connect Power BI to Azure Synapse Analytics?
5. How do you write efficient T-SQL queries in Synapse Analytics?
6. What are recursive queries, window functions, and CTEs in Synapse SQL?
7. How do you use partitioning and indexing in Synapse SQL pools?
8. What is a Star schema vs Snowflake schema for data warehousing?
9. How do you handle slowly changing dimensions (SCD) in Synapse?
10. How do you load data into Synapse using COPY INTO?
11. What is the MERGE statement in Synapse, and when do you use it?
11. Python for Data Engineering
Functions, Data Quality, Error Handling

1. What are decorators in Python and how can they be used in data engineering tasks?
2. How do you write data quality checks and explain the use-case?
3. Given an array of integers, return the indices of two numbers that add up to a specific target.
4. Write a function to rotate an array of n elements to the right by k steps.
5. How do you handle exceptions in Python using try and except?
6. Write a function to count the frequency of each character in a string.

12. Azure Data Engineering — Key Topics

Study Roadmap for Azure DE Certification / Interviews

1. Azure Data Services Overview: ADF (ETL), Databricks (analytics/ML), Synapse (data warehouse), ADLS
(data lake), Event Hub + Stream Analytics (real-time), SQL DB / Cosmos DB / Synapse SQL Pools.
2. Data Ingestion & ETL Pipelines: ADF pipeline design, Source-to-Sink connections, Incremental loading
(watermark/delta), Integration with on-prem using Self-hosted IR.
3. Data Transformation & Processing: Databricks PySpark ETL jobs, Partitioning/caching/optimization in
Spark, Delta Lake optimizations, Window functions/joins/aggregations.
4. Orchestration & Automation: ADF Triggers (tumbling window, event-based, scheduled), Parameterization
and reusable pipelines, Azure Logic Apps and Functions for orchestration.
5. Performance Optimization & Monitoring: Tuning ADF pipelines (parallel copy, partitioning), Optimizing
Spark jobs (reduce shuffle, broadcast joins), Azure Monitor alerts and logs, Managing costs with Azure
Storage Tiers.
6. Security & Governance: RBAC and managed identities, Data encryption at rest and in transit, Azure Key
Vault for credentials, Data governance with Azure Purview.
7. SQL Skills & Querying Data: Efficient T-SQL queries in Azure SQL and Synapse Analytics, Recursive
queries, CTEs, window functions, Partitioning and indexing.
8. Data Modeling & Architecture: Star and Snowflake schema, Designing data lakes (Bronze/Silver/Gold),
Handling slowly changing dimensions (SCD).
9. Hands-on Exercises: Azure Data Factory pipeline, Azure Databricks PySpark notebook, Azure SQL /
Synapse SQL pool.

13. AWS Cloud (Bonus)

Glue, DataSync, Incremental Loading

1. How do you implement incremental data loading in AWS Glue or AWS Data Pipeline?
2. Explain the difference between AWS Glue ETL jobs and AWS DataSync.
3. How do you handle error logging and notifications in AWS Glue workflows or pipelines?
4. What are the best practices for parameterizing AWS Glue scripts and workflows?
Preparation Tips
How to use this document effectively

1. Practice SQL on platforms like LeetCode, HackerRank, and StrataScratch — focus on window functions
and CTEs.
2. For ADF, create a free Azure account and build at least one end-to-end incremental pipeline using Lookup
+ Copy + Stored Procedure activities.
3. For Databricks, practice PySpark notebooks with Bronze → Silver → Gold medallion architecture using
Delta tables.
4. For Synapse, practice T-SQL including MERGE, COPY INTO, and Views for Power BI.
5. Be ready to explain the WHY: Why watermark? Why Databricks over ADF for heavy transforms? Why Delta
over Parquet?
6. Prepare scenario-based answers for questions like 'How would you handle 100 GB of data?' or 'How do
you handle OOM in Spark?'
7. For SQL interviews: always think about edge cases — NULLs, duplicates, missing rows, and data skew.

Good luck with your interviews! Build the project, practice the code, and you will be ready. 🚀

14. Dimensional Modeling

SCD, Star Schema, Junk Dimensions, Hierarchy

80. Describe the process of designing a Slowly Changing Dimension (SCD) Type 2.
81. How would you handle a many-to-many relationship in dimensional modeling?
82. Explain the concept of a junk dimension and when you would use it.
83. Describe the pros and cons of using a snowflake schema versus a star schema.
84. How would you design a dimension to handle hierarchical data?
85. Explain SCD Type 1 vs SCD Type 2 vs SCD Type 3 — when would you choose each?
86. How do you implement SCD Type 2 in PySpark using union and is_current flag?
87. What is a bridge table, and when is it used in dimensional modeling?
88. What is a degenerate dimension? Give a real-world example.
89. How does dimensional modeling differ between OLAP and OLTP systems?

15. Performance Optimization — Distributed Systems

Skew, Joins, Predicate Pushdown, Incremental Loading

90. Describe techniques for optimizing distributed joins in a big data environment.
91. How would you diagnose and resolve data skew issues in a distributed processing system?
92. Explain the concept of predicate pushdown and how it improves query performance.
93. Describe the process of optimizing a Slowly Changing Dimension (SCD) Type 2 for both write and read
performance.
94. How would you implement and optimize incremental data processing in a large-scale data pipeline?
95. What strategies help with optimizing PySpark jobs for performance (e.g., caching, partitioning, broadcast
joins)?
96. When should you use coalesce() vs repartition() in PySpark?
97. What are the common causes of a PySpark job running slowly, and how do you fix them?
98. Explain the concept of Z-ordering in Delta Lake and how it improves query performance.

16. Project & Behavioral Questions

Infosys L2, Scenario-Based, End-to-End Azure Project

General Project Questions (Infosys L2 Style)

99. Tell me about your project. What is the business problem it solves?
100. What is your source system and how many types of source systems do you work with?
101. How many jobs do you have in your project?
102. How many ETL pipelines do you have in your project?
103. How do you schedule your pipeline? What scheduler do you use?
104. Have you faced any difficulty in your project? How did you overcome it?
105. If you have one query with CTE and one with subquery, which do you prefer and what are the
differences?
106. Which do you prefer — DataFrame or RDD? Why?
107. Find the second highest salary of each department with the employee name. (Columns: id, name, dept,
salary)

Scenario-Based: End-to-End Azure Data Engineering Project

108. Can you briefly explain the steps involved in an Azure Data Engineer project from data ingestion to
visualization?
109. How do you handle data ingestion in Azure? (ADF for batch, Event Hubs for streaming)
110. What are the key storage options available in Azure for a data engineering project? (ADLS, Blob,
Synapse, SQL DB)
111. Can you describe the data processing step in your project? (Azure Databricks, Synapse Analytics, ADF
Data Flows)
112. How do you handle data warehousing in Azure? (Synapse Analytics, fact and dimension tables)
113. How do you ensure data governance and security in your project? (AAD, Azure Key Vault, Azure Monitor)
114. Can you give an example of a challenge you faced in one of your Azure projects and how you overcame
it?
115. Which IR should be used for copying data from an on-premises database to Azure?
116. Describe an Azure pipeline design you consider best practice — what metrics and design considerations
matter?
17. System Design & Load Balancing (Bonus)
Common algorithms and patterns for distributed systems

117. What is load balancing, and why is it important in distributed systems?

118. Round Robin: Assigns requests to servers in sequential order. What are its advantages and limitations?
119. Least Connections: Directs traffic to the server with the fewest active connections. When is it better than
Round Robin?
120. Weighted Round Robin: Assigns weights based on server capacity and distributes requests
proportionally. How does it work?
121. Weighted Least Connections: Combines Least Connections and Weighted Round Robin — directs to the
server with the lowest active-connections-to-weight ratio.
122. IP Hash: Determines the server based on source/destination IP, ensuring session persistence for a user.
When is this critical?
123. Least Response Time: Routes requests to the server with the lowest response time and fewest active
connections.
124. Random: Routes incoming requests to a randomly selected server from the available pool.
125. How do load balancers differ between Layer 4 (transport) and Layer 7 (application)?
126. What is sticky session (session affinity), and when is it needed?

Data Engineering Interview Questions Guide
No ratings yet
Data Engineering Interview Questions Guide
11 pages
Azure Synapse with Cosmos DB Queries
100% (1)
Azure Synapse with Cosmos DB Queries
57 pages
Essential SQL Interview Questions List
No ratings yet
Essential SQL Interview Questions List
5 pages
SQL Questions VB
No ratings yet
SQL Questions VB
5 pages
SQL Questions VB
No ratings yet
SQL Questions VB
5 pages
Top 50 SQL Interview Questions & Answers
No ratings yet
Top 50 SQL Interview Questions & Answers
16 pages
Top 50 SQL Interview Questions for Data Engineers
No ratings yet
Top 50 SQL Interview Questions for Data Engineers
8 pages
Expert SQL Interview Questions & Answers
No ratings yet
Expert SQL Interview Questions & Answers
10 pages
Spark and Hadoop Architecture Insights
No ratings yet
Spark and Hadoop Architecture Insights
5 pages
Power BI Interview Questions ?SQL
No ratings yet
Power BI Interview Questions ?SQL
22 pages
SQL Interview Questions & Answers Guide
No ratings yet
SQL Interview Questions & Answers Guide
18 pages
Essential SQL Interview Questions
No ratings yet
Essential SQL Interview Questions
2 pages
SQL Complete Interview Questions
No ratings yet
SQL Complete Interview Questions
5 pages
Data Engineering and SQL Best Practices
No ratings yet
Data Engineering and SQL Best Practices
31 pages
100+ Azure Data Engineer Questions You Need To Prepare
No ratings yet
100+ Azure Data Engineer Questions You Need To Prepare
7 pages
Essential SQL Interview Questions Guide
No ratings yet
Essential SQL Interview Questions Guide
3 pages
Data Science Interview Prep Guide
60% (5)
Data Science Interview Prep Guide
63 pages
Top 50 SQL Interview Questions
No ratings yet
Top 50 SQL Interview Questions
7 pages
SQL Roadmap For Data Analysts
No ratings yet
SQL Roadmap For Data Analysts
4 pages
SQL Azure Interview Questions Guide
No ratings yet
SQL Azure Interview Questions Guide
8 pages
SQL Query and DBMS Concepts Guide
No ratings yet
SQL Query and DBMS Concepts Guide
5 pages
100+ SQL Interview Questions and Answers (2025)
No ratings yet
100+ SQL Interview Questions and Answers (2025)
52 pages
SQL Interview Questions
No ratings yet
SQL Interview Questions
3 pages
Barclays Data Analyst Interview Prep
No ratings yet
Barclays Data Analyst Interview Prep
5 pages
SQL Interview Prep for Data Analysts
No ratings yet
SQL Interview Prep for Data Analysts
34 pages
Essential SQL Concepts and Techniques
No ratings yet
Essential SQL Concepts and Techniques
9 pages
SQL Interview Questions for Data Analysts
No ratings yet
SQL Interview Questions for Data Analysts
29 pages
Top SQL Server Interview Questions
No ratings yet
Top SQL Server Interview Questions
8 pages
GenC Next SQL Practice Questions
No ratings yet
GenC Next SQL Practice Questions
6 pages
PySpark Basics and Interview Questions
No ratings yet
PySpark Basics and Interview Questions
6 pages
Interview Questions SQL - All Question Combined
No ratings yet
Interview Questions SQL - All Question Combined
27 pages
Essential SQL Interview Questions Guide
No ratings yet
Essential SQL Interview Questions Guide
3 pages
SQL, ADF, Databricks Interview Q&A
No ratings yet
SQL, ADF, Databricks Interview Q&A
7 pages
50 SQL Interview Questions
No ratings yet
50 SQL Interview Questions
5 pages
SQL Server Interview Questions Set 1
No ratings yet
SQL Server Interview Questions Set 1
3 pages
SQL and Spark Data Handling Techniques
No ratings yet
SQL and Spark Data Handling Techniques
3 pages
SQL Database Concepts and Optimization Techniques
No ratings yet
SQL Database Concepts and Optimization Techniques
2 pages
Top 100 SQL Interview Questions & Answers
No ratings yet
Top 100 SQL Interview Questions & Answers
10 pages
SQL Interview Questions and Answers Guide
No ratings yet
SQL Interview Questions and Answers Guide
8 pages
SQL Interview Questions for 10 Years Experience
No ratings yet
SQL Interview Questions for 10 Years Experience
6 pages
? 475+ SQL Interview Questions For All Jobs - Complete
No ratings yet
? 475+ SQL Interview Questions For All Jobs - Complete
48 pages
Azure Data Engineering Interview Questions
No ratings yet
Azure Data Engineering Interview Questions
12 pages
SQL Interview Questions & Answers Guide
No ratings yet
SQL Interview Questions & Answers Guide
2 pages
SQL Interview Q A Short Clear Job Ready Notes 1768564643
No ratings yet
SQL Interview Q A Short Clear Job Ready Notes 1768564643
8 pages
SQL Interview Questions for Data Consultants
No ratings yet
SQL Interview Questions for Data Consultants
3 pages
Database Interview Questions Guide
No ratings yet
Database Interview Questions Guide
86 pages
Essential DSA and SQL Topics Guide
No ratings yet
Essential DSA and SQL Topics Guide
4 pages
SQL Interview Questions and Concepts
No ratings yet
SQL Interview Questions and Concepts
3 pages
Software Engineer - INTERVIEW PREPARATION SHEET
No ratings yet
Software Engineer - INTERVIEW PREPARATION SHEET
39 pages
SQL Interview Guide: Key Concepts
No ratings yet
SQL Interview Guide: Key Concepts
9 pages
SQL DBA Interview Questions 3
No ratings yet
SQL DBA Interview Questions 3
9 pages
Advanced SQL Interview Questions for FAANG
No ratings yet
Advanced SQL Interview Questions for FAANG
3 pages
SQL Server Interview Questions Developers PDF
No ratings yet
SQL Server Interview Questions Developers PDF
142 pages
50 SQL Interview Questions & Answers
No ratings yet
50 SQL Interview Questions & Answers
8 pages
SQL Interview Questions and Answers
No ratings yet
SQL Interview Questions and Answers
8 pages
SQL Interview Question
No ratings yet
SQL Interview Question
63 pages
Essential Database Concepts Explained
No ratings yet
Essential Database Concepts Explained
20 pages
Databricks Unity Catalog Overview
No ratings yet
Databricks Unity Catalog Overview
8 pages
Spark Interview Questions: Driver & Data Skew
No ratings yet
Spark Interview Questions: Driver & Data Skew
34 pages
SQL & PySpark Interview Questions
No ratings yet
SQL & PySpark Interview Questions
57 pages
Big Data Concepts: 5V's & System Types
No ratings yet
Big Data Concepts: 5V's & System Types
27 pages
Spark Concepts and Interview Questions Guide
No ratings yet
Spark Concepts and Interview Questions Guide
21 pages
Optimizing Databricks Cluster for 1 TB Data
No ratings yet
Optimizing Databricks Cluster for 1 TB Data
4 pages
SQL Mastery: 110 Practical Questions
No ratings yet
SQL Mastery: 110 Practical Questions
59 pages
Python Programming Basics and Concepts
No ratings yet
Python Programming Basics and Concepts
21 pages
Database Design and ERD Examples
No ratings yet
Database Design and ERD Examples
7 pages
NGINX Plus: Load Balancing & Security Solutions
No ratings yet
NGINX Plus: Load Balancing & Security Solutions
2 pages
SQL Datatypes and Table Management Guide
No ratings yet
SQL Datatypes and Table Management Guide
16 pages
OJT Experience at Bauan Health Office
No ratings yet
OJT Experience at Bauan Health Office
8 pages
Oracle Database Operations in Visual Basic
No ratings yet
Oracle Database Operations in Visual Basic
13 pages
Excel for Financial Analysis Guide
No ratings yet
Excel for Financial Analysis Guide
16 pages
Informatica Interview Q&A Guide
No ratings yet
Informatica Interview Q&A Guide
4 pages
Access 2019 Query Criteria Guide
No ratings yet
Access 2019 Query Criteria Guide
15 pages
SSIS for Data Warehouse Creation
No ratings yet
SSIS for Data Warehouse Creation
62 pages
Business Intelligence Benefits and Challenges
No ratings yet
Business Intelligence Benefits and Challenges
8 pages
Importing CSV from BVS to V2PS Guide
No ratings yet
Importing CSV from BVS to V2PS Guide
3 pages
Epicor ERP10 Advanced Dashboards Management
100% (1)
Epicor ERP10 Advanced Dashboards Management
64 pages
MTK EMMC Layout Configuration Guide
No ratings yet
MTK EMMC Layout Configuration Guide
29 pages
Essential Backup Workflow Guide
No ratings yet
Essential Backup Workflow Guide
15 pages
Question Bank Introduction To Ms Office Packages
No ratings yet
Question Bank Introduction To Ms Office Packages
4 pages
Understanding Relational Data Models
No ratings yet
Understanding Relational Data Models
10 pages
Avnet Database Interview Insights
No ratings yet
Avnet Database Interview Insights
42 pages
Introduction to SQL Basics
No ratings yet
Introduction to SQL Basics
53 pages
OutSystems Database Entities Overview
No ratings yet
OutSystems Database Entities Overview
15 pages
Understanding Distributed Databases
No ratings yet
Understanding Distributed Databases
4 pages
Foundations of Data Science Guide
No ratings yet
Foundations of Data Science Guide
17 pages
SQL Server Backup Procedure Overview
No ratings yet
SQL Server Backup Procedure Overview
14 pages
Grokking System Design Interview Guide
0% (1)
Grokking System Design Interview Guide
25 pages
Distributed Database Design Principles
No ratings yet
Distributed Database Design Principles
61 pages
Trend Following with Moving Averages Guide
No ratings yet
Trend Following with Moving Averages Guide
3 pages
Data Structures Lab Manual for C/Java
No ratings yet
Data Structures Lab Manual for C/Java
75 pages
Object-Oriented Database Concepts Explained
No ratings yet
Object-Oriented Database Concepts Explained
34 pages
Database Performance Tuning Essentials
No ratings yet
Database Performance Tuning Essentials
50 pages
Understanding GDSS: Features & Benefits
No ratings yet
Understanding GDSS: Features & Benefits
16 pages
Using Microsoft File Server Migration Toolkit
No ratings yet
Using Microsoft File Server Migration Toolkit
2 pages

Dataengineer IFAQ

Uploaded by

Dataengineer IFAQ

Uploaded by

Data Engineering

Interview Questions — Categorized Reference

1. SQL — Basic Concepts ·······················································

1. What is SQL, and why is it important in data analytics?

1. Write a query to find the second-highest salary from an employee table.

1. How do you optimize a slow-running SQL query?

4. SQL — Performance Tips

5. Azure Data Factory — Basic to Intermediate

1. What are the main components of Azure Data Factory?

6. Azure Data Factory — Advanced & Scenario-Based

1. How do you schedule when pipelines run in ADF?

1. What are the types of tables in Databricks?

1. Steps to tune your long-running Spark jobs.

9. ADLS Gen2 & Azure Storage

10. Azure Synapse Analytics

1. What is Synapse Analytics?

12. Azure Data Engineering — Key Topics

13. AWS Cloud (Bonus)

14. Dimensional Modeling

15. Performance Optimization — Distributed Systems

16. Project & Behavioral Questions

General Project Questions (Infosys L2 Style)

Scenario-Based: End-to-End Azure Data Engineering Project

117. What is load balancing, and why is it important in distributed systems?

You might also like