
ADF Trigger Types: Schedule vs Tumbling

The document contains a series of questions and outputs related to data processing and management using Azure Data Factory (ADF) and Azure Databricks (ADB). It covers topics such as data volume handling, pipeline creation, data cleaning, and performance optimization techniques. Additionally, it discusses various data structures like star and snowflake schemas, as well as technical challenges faced during projects.

Uploaded by Deepak Murthy
Copyright © All Rights Reserved

Q1: Table 1 and Table 2, and the expected output:


[12:38 PM] Table 1:
Category  Min Vol  Max Vol
1         100      109
2         110      119
3         120      129
4         130      139
5         140      149
6         150      159
7         160      169
8         170      179
9         180      189
10        190      200
[12:42 PM] Table 2:
Asset ID  Asset Name  Volume
1         Breant      188
2         Hazira      168
3         Gannet      199
4         Nelson      178
5         E11         163
6         Prelude     181
Output:
Asset Name  Category  Volume
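A minimal sketch of one way to answer Q1, assuming the goal is to attach each asset's category by checking which Min Vol to Max Vol band (both bounds inclusive) its volume falls into, i.e. a range join. It runs the SQL through Python's built-in sqlite3 so it can be executed locally; the table and column names (categories, assets, min_vol, max_vol, ...) are illustrative, and the same BETWEEN join works in Synapse SQL or Spark SQL.

```python
import sqlite3

# Table 1: category bands (inclusive bounds), copied from the question.
CATEGORIES = [
    (1, 100, 109), (2, 110, 119), (3, 120, 129), (4, 130, 139), (5, 140, 149),
    (6, 150, 159), (7, 160, 169), (8, 170, 179), (9, 180, 189), (10, 190, 200),
]

# Table 2: assets and their volumes, copied from the question.
ASSETS = [
    (1, "Breant", 188), (2, "Hazira", 168), (3, "Gannet", 199),
    (4, "Nelson", 178), (5, "E11", 163), (6, "Prelude", 181),
]


def categorize_assets():
    """Return (asset_name, category, volume) rows using a BETWEEN range join."""
    conn = sqlite3.connect(":memory:")
    # Illustrative table/column names; the question does not name them.
    conn.execute("CREATE TABLE categories (category INTEGER, min_vol INTEGER, max_vol INTEGER)")
    conn.execute("CREATE TABLE assets (asset_id INTEGER, asset_name TEXT, volume INTEGER)")
    conn.executemany("INSERT INTO categories VALUES (?, ?, ?)", CATEGORIES)
    conn.executemany("INSERT INTO assets VALUES (?, ?, ?)", ASSETS)
    rows = conn.execute(
        """
        SELECT a.asset_name, c.category, a.volume
        FROM assets AS a
        JOIN categories AS c
          ON a.volume BETWEEN c.min_vol AND c.max_vol  -- both bounds inclusive
        ORDER BY a.asset_id
        """
    ).fetchall()
    conn.close()
    return rows
```

Because the bands do not overlap, each asset matches exactly one category: Breant 9, Hazira 7, Gannet 10, Nelson 8, E11 7 and Prelude 9.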
Q2: How can we remove special characters from columns dynamically using PySpark, for 100 columns?
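The key idea for Q2 is to loop over the column list instead of naming 100 columns by hand. In PySpark this is commonly a list comprehension over df.columns that applies pyspark.sql.functions.regexp_replace to each column inside one select, e.g. df.select([regexp_replace(col(c), pattern, "").alias(c) for c in df.columns]), usually restricted to string columns (for example via df.dtypes). The sketch below shows the same "discover the columns, apply one rule to all" pattern in plain Python over a list of dict rows so it runs without a Spark cluster; what counts as a special character (here: anything other than letters, digits and spaces) is an assumption to adjust.

```python
import re
from typing import Any, Dict, Iterable, List, Optional

# Assumption: "special" means anything that is not a letter, digit or space.
SPECIAL_CHARS = re.compile(r"[^A-Za-z0-9 ]")


def clean_columns(
    rows: List[Dict[str, Any]], columns: Optional[Iterable[str]] = None
) -> List[Dict[str, Any]]:
    """Strip special characters from string values in every column (or only `columns`).

    Column names are discovered from each row, so 100 columns need no extra code.
    """
    # Materialize once so a generator of column names is not consumed by the first row.
    target_cols = list(columns) if columns is not None else None
    cleaned = []
    for row in rows:
        targets = target_cols if target_cols is not None else list(row.keys())
        new_row = dict(row)  # copy, so the input rows are not modified
        for col in targets:
            value = new_row.get(col)
            # Only strings are cleaned; numbers, None, etc. pass through unchanged.
            if isinstance(value, str):
                new_row[col] = SPECIAL_CHARS.sub("", value)
        cleaned.append(new_row)
    return cleaned
```

Passing columns=[...] limits the cleanup to a subset, which mirrors filtering df.columns before the select in the PySpark version.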
Q3: Given a table of nodes (N) and their parents (P), classify each node as Root, Inner or Leaf.
N  P
1  2
3  2
6  8
9  8
2  5
8  5
5  null
Output:
1  Leaf
2  Inner
3  Leaf
5  Root
6  Leaf
8  Inner
9  Leaf
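A minimal sketch for Q3, assuming N is a node and P is its parent (null for the top node). The rule that reproduces the expected output is: a node with no parent is Root, a node that appears as some other node's parent is Inner, and every remaining node is Leaf. It is written as a SQL CASE expression and run through Python's sqlite3 so it can be executed locally; the table name tree is illustrative.

```python
import sqlite3

# (N, P) pairs from the question; None stands for the null parent of the root.
EDGES = [(1, 2), (3, 2), (6, 8), (9, 8), (2, 5), (8, 5), (5, None)]


def classify_nodes():
    """Return (node, node_type) rows ordered by node."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tree (N INTEGER, P INTEGER)")
    conn.executemany("INSERT INTO tree VALUES (?, ?)", EDGES)
    rows = conn.execute(
        """
        SELECT N,
               CASE
                 WHEN P IS NULL THEN 'Root'
                 -- NULL parents are excluded; this matters if the test is ever flipped to NOT IN.
                 WHEN N IN (SELECT P FROM tree WHERE P IS NOT NULL) THEN 'Inner'
                 ELSE 'Leaf'
               END AS node_type
        FROM tree
        ORDER BY N
        """
    ).fetchall()
    conn.close()
    return rows
```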
What are the trigger types in ADF?
How can we run a pipeline in one data factory from another ADF?
What are the cluster types, and how do they differ?
Workflow?
What is the difference between a sort-merge join and a broadcast join?
How do you load multiple tables in ADF?
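For the sort-merge vs. broadcast join question, the difference is mostly in how matching rows are found and what has to move. In a broadcast (hash) join Spark ships a copy of the small table to every executor and probes a hash table built from it, so the large side is not shuffled; Spark picks it automatically when one side is below spark.sql.autoBroadcastJoinThreshold (10 MB by default). In a sort-merge join both sides are shuffled by the join key, sorted and then merged, which scales to two large tables. The plain-Python sketch below models only the two matching algorithms on in-memory lists (no executors or shuffles); it is an illustration, not Spark's implementation.

```python
from typing import Any, Dict, List, Tuple

Row = Tuple[Any, Any]          # (join_key, payload)
Joined = Tuple[Any, Any, Any]  # (join_key, left_payload, right_payload)


def broadcast_hash_join(large: List[Row], small: List[Row]) -> List[Joined]:
    """Inner join: build a hash table on the small side, then stream the large side past it."""
    lookup: Dict[Any, List[Any]] = {}
    for key, value in small:
        lookup.setdefault(key, []).append(value)
    out: List[Joined] = []
    for key, value in large:
        for match in lookup.get(key, []):
            out.append((key, value, match))
    return out


def sort_merge_join(left: List[Row], right: List[Row]) -> List[Joined]:
    """Inner join: sort both sides on the key, then advance two cursors in step."""
    left = sorted(left, key=lambda r: r[0])
    right = sorted(right, key=lambda r: r[0])
    out: List[Joined] = []
    i = j = 0
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j][0]
        if lkey < rkey:
            i += 1
        elif lkey > rkey:
            j += 1
        else:
            # Find the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lkey:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][0] == lkey:
                for k in range(j, j_end):
                    out.append((lkey, left[i][1], right[k][1]))
                i += 1
            j = j_end
    return out
```

Both functions return the same set of rows; the hash join needs the small side to fit in memory, while the merge join needs both sides sorted on the key.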

6:15
Go through this too.
6:15
Tell me about your project.
What is an IR (integration runtime), and what are the IR types?
What complexities have you faced in the project?
What different types of sources have you connected to in the project?
Have you created parameterized linked services? How do you parameterize linked services?
What is the most complex problem you have faced while creating a pipeline, and how did you resolve it?
How do you find out that a load failed, or that the pipeline loaded only partially?
Assume there are 100 tables. When you run the pipeline, 10 tables are partially loaded and then it gets stuck. How do you rerun the activity, what cost problems does that cause, and how does a common data model help work around this?
Assume that of the remaining tables, 40 are incremental loads and 50 are full loads, and the run fails. How do you fix this?
How can you enable audit logging for pipelines?
What is a common data model, and how is it used for common purposes?
In Synapse, which structure do you follow: star schema or snowflake schema?
Why do you load the data into Synapse?
How do you check the source and sink data?
How can you find duplicates? When do you check for them: before doing the transformation, or before moving the data to the sink?
Avro vs. Parquet: what are the pros and cons of each?
Where do you place the configuration file: in the source or in the sink?
In Synapse, how do you map the data?
How much data do you get in Synapse? What is the purpose of loading from the source to Synapse?
What is a DWU (Data Warehouse Unit)? How many DWUs does your Synapse instance have?
Have you faced any performance issues in Databricks?
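For the duplicates question, one common check is to group by the columns that should be unique and keep the groups with COUNT(*) > 1. Running it on the landed/staging data before transformation catches duplicates coming from the source, and running it again before the sink load can catch any introduced by joins. In PySpark the same check is typically df.groupBy(...).count().filter("count > 1"), with dropDuplicates to remove them. The sketch uses sqlite3 so it runs locally; the staging table and its (id, name) key are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def find_duplicates(rows: List[Tuple[int, str]]) -> List[Tuple[int, str, int]]:
    """Return (id, name, count) for every (id, name) combination that appears more than once."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical staging table; in practice this is the landed source data.
    conn.execute("CREATE TABLE staging (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO staging VALUES (?, ?)", rows)
    dupes = conn.execute(
        """
        SELECT id, name, COUNT(*) AS cnt
        FROM staging
        GROUP BY id, name
        HAVING COUNT(*) > 1
        ORDER BY id
        """
    ).fetchall()
    conn.close()
    return dupes
```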

Deepak V
6:16 PM
okay

SUNITHA R.M
12:18 PM
1) What are some issues you can face with Azure Databricks? How did you resolve them?
2) Did you implement Delta Lake on top of the data lake? Explain the process.
3) Difference between coalesce and repartition.
4) What was the volume of data that you handled in your previous project?
5) Explain the steps to mount the data lake into the Databricks environment.
6) What was the size of the compute cluster you created or are using?
7) Gave two data frames and asked what the output would be if we join these two, and how it would work in the case of a broadcast join.
8) Data frame manipulation questions.
9) Explain Spark architecture in detail.
10) Explain the coalesce operation in Apache Spark.
11) How would you read a CSV file into a data frame? Can you write the syntax for that?
12) What if two people are working on the same Delta Lake object and they run update, insert and delete? What will happen?
13) Explain the latest pipeline you created. Which activities did you use in ADF?
14) What are star and snowflake schemas?
15) Can we implement a star schema in Azure? If yes, where can we create the star structure?
16) Data frame manipulation questions.
17) What are the ways to connect ADLS and ADB?
18) What access needs to be given to connect ADLS and ADB?
19) What performance optimization techniques would you adopt while loading a high volume of data for a particular table into ADLS?
20) How do you read a folder inside a folder using an ADF pipeline?
21) Which new activity has been added recently to ADF?
22) How do you use a ForEach loop inside a ForEach loop?
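For the coalesce vs. repartition questions (3 and 10), a toy model of partitions as Python lists can make the difference concrete. In Spark, DataFrame.coalesce(n) can only reduce the number of partitions and does so by merging existing ones without a full shuffle, which is cheap but can leave partitions uneven; DataFrame.repartition(n) performs a full shuffle, so it can increase or decrease the count and spreads rows evenly. The functions below are a simplified illustration (the adjacent-merge grouping and round-robin placement are assumptions of the model), not Spark's actual code.

```python
from typing import List, TypeVar

T = TypeVar("T")


def coalesce(partitions: List[List[T]], n: int) -> List[List[T]]:
    """Toy coalesce: concatenate neighbouring partitions into n groups; no single row is re-routed.

    Like DataFrame.coalesce, it cannot increase the partition count.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if n >= len(partitions):
        return [list(p) for p in partitions]
    merged: List[List[T]] = [[] for _ in range(n)]
    for idx, part in enumerate(partitions):
        # Contiguous, order-preserving mapping of input partition idx to an output group.
        merged[idx * n // len(partitions)].extend(part)
    return merged


def repartition(partitions: List[List[T]], n: int) -> List[List[T]]:
    """Toy repartition: a full shuffle that deals every row round-robin into n partitions."""
    if n <= 0:
        raise ValueError("n must be positive")
    shuffled: List[List[T]] = [[] for _ in range(n)]
    position = 0
    for part in partitions:
        for row in part:
            shuffled[position % n].append(row)
            position += 1
    return shuffled
```

With the skewed input [[1, 2, 3, 4, 5, 6], [7], [8], [9]], coalesce(parts, 2) gives partitions of sizes 7 and 2, while repartition(parts, 2) gives sizes 5 and 4. This is why coalesce is often used to cheaply reduce the number of output files, and repartition to fix skew or increase parallelism.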

Deepak V
2:27 PM
tq

SUNITHA R.M
2:31 PM
How many years of experience? Which technologies have you used in your ETL experience?
How many years of ADF and ADB experience?
End-to-end (E2E) explanation of the most recent project.
--------------------------------------------------------------------------------
ADF: 15 mins
There are 2 different regions, Europe and America. Oracle is the database used, with a different database for each region.
How do you create the pipeline, and how many pipelines would you need?
The target is ADLS Gen2.
How many datasets for the source? How many linked services for the source?
How many datasets for the target? How many linked services for the target?
--------------------------------------------------------------------------------
A job is scheduled every 5 mins.
One day the job takes a long time to run because of a technical issue.
The other scheduled trigger runs are waiting in the queue. How do you handle this situation?
--------------------------------------------------------------------------------
Difference between the schedule trigger and the tumbling window trigger.
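One way to see the difference: a schedule trigger fires pipeline runs at wall-clock times and does not track them afterwards, whereas a tumbling window trigger divides time into fixed-size, non-overlapping, contiguous windows from its start time, keeps state for each window, and passes the window boundaries to the pipeline (through @trigger().outputs.windowStartTime and @trigger().outputs.windowEndTime). Because each window is tracked, it supports backfilling past windows, retries, dependencies and a maxConcurrency limit, which is relevant to the 5-minute job above: with maxConcurrency set to 1, later windows would wait instead of overlapping the long run. The sketch below only computes the window boundaries a 5-minute tumbling window would produce; it does not call ADF.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Window = Tuple[datetime, datetime]


def tumbling_windows(start: datetime, end: datetime, size: timedelta) -> List[Window]:
    """Return contiguous, non-overlapping [window_start, window_end) pairs between start and end.

    Only complete windows are returned; a partial window at the end is left for a later run.
    """
    if size <= timedelta(0):
        raise ValueError("window size must be positive")
    windows: List[Window] = []
    window_start = start
    while window_start + size <= end:
        windows.append((window_start, window_start + size))
        window_start += size
    return windows
```

Each pair corresponds to one pipeline run, so a window whose run fails can be retried on its own without affecting its neighbours.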
--------------------------------------------------------------------------------
How do you make connections to web sources (HTTP connections)? How do you pass credentials to linked services?
--------------------------------------------------------------------------------
ADB: 15-20 mins
How do you clean the data below using Databricks, handling blank and null values?
name, email, phoneNo, country
xyz, blank, 1234, null
null, abs@[Link], blank, blank
abc, null, null, india
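A minimal sketch of the cleaning logic, assuming "blank" means an empty or whitespace-only string and "null" means a missing value: first normalize blanks to null so there is only one kind of missing value, then fill nulls per column. In a Databricks notebook the same two steps are typically written with pyspark.sql.functions.when and trim for the normalization and DataFrame.na.fill with a per-column dict for the fill; the plain-Python version below runs without Spark, and the "unknown" defaults are hypothetical.

```python
from typing import Any, Dict, List, Optional


def clean_records(
    rows: List[Dict[str, Optional[Any]]], defaults: Dict[str, Any]
) -> List[Dict[str, Any]]:
    """Normalize blanks to None, then fill None with a per-column default where one is given."""
    cleaned = []
    for row in rows:
        new_row = {}
        for col, value in row.items():
            # Step 1: a blank (empty or whitespace-only) string is treated as null.
            if isinstance(value, str) and value.strip() == "":
                value = None
            # Step 2: fill nulls only for columns that have a configured default.
            if value is None and col in defaults:
                value = defaults[col]
            new_row[col] = value
        cleaned.append(new_row)
    return cleaned


# Hypothetical defaults; in practice these depend on business rules.
DEFAULTS = {"name": "unknown", "email": "unknown", "phoneNo": "unknown", "country": "unknown"}
```

Columns without a default keep None after step 1, which leaves room to drop or flag those rows instead of filling them.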
--------------------------------------------------------------------------------
How do notebooks talk to each other? How do you pass one notebook's output to another?
--------------------------------------------------------------------------------
How do you mount your ADLS Gen2 on Databricks?
What is a secret scope in Databricks? How do you get a scope from Key Vault?
What are mount points?
What are application_id, service_id, scope and key? How are they generated, and how does mounting work?
--------------------------------------------------------------------------------
What clusters have you used, and what precautions should be taken while creating a cluster?
Difference between job and interactive clusters.
Best practices for cluster creation.
--------------------------------------------------------------------------------
SQL: 15 mins
What do you know in SQL, and what have you worked on so far: creating tables, aggregations, built-in functions, GROUP BY, HAVING, ORDER BY?
--------------------------------------------------------------------------------
SQL scenario:
location   distance (km)
bangalore  100
mumbai     150
delhi      200
How do you find the least distance taken to travel to Bangalore? Find it by writing a SQL query.
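A minimal sketch, assuming the table holds one distance per location and the question asks for the row with the smallest distance. Comparing against a MIN subquery, rather than ORDER BY distance_km LIMIT 1, also returns ties. The table name routes and column name distance_km are hypothetical; the query is run through sqlite3 so it can be tested locally.

```python
import sqlite3
from typing import List, Tuple

# Rows from the scenario: (location, distance in km).
ROUTES = [("bangalore", 100), ("mumbai", 150), ("delhi", 200)]


def least_distance(routes: List[Tuple[str, int]] = ROUTES) -> List[Tuple[str, int]]:
    """Return every (location, distance_km) row whose distance equals the minimum."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE routes (location TEXT, distance_km INTEGER)")
    conn.executemany("INSERT INTO routes VALUES (?, ?)", routes)
    rows = conn.execute(
        """
        SELECT location, distance_km
        FROM routes
        WHERE distance_km = (SELECT MIN(distance_km) FROM routes)  -- keeps ties
        ORDER BY location
        """
    ).fetchall()
    conn.close()
    return rows
```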
--------------------------------------------------------------------------------
