0% found this document useful (0 votes)
21 views8 pages

ETL Data Validation and Migration Process

The document outlines the ETL process, including data merging, validation, and migration activities performed by ETL developers and testers. It details various validation methods such as data quality checks, orphan data validation, and transformation level validation, alongside the tools used like Informatica and SSIS. Additionally, it describes handling slowly changing dimensions (SCD) and the importance of maintaining referential integrity between fact and dimension tables.

Uploaded by

duraibalaji1005
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as XLSX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
21 views8 pages

ETL Data Validation and Migration Process

The document outlines the ETL process, including data merging, validation, and migration activities performed by ETL developers and testers. It details various validation methods such as data quality checks, orphan data validation, and transformation level validation, alongside the tools used like Informatica and SSIS. Additionally, it describes handling slowly changing dimensions (SCD) and the importance of maintaining referential integrity between fact and dimension tables.

Uploaded by

duraibalaji1005
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as XLSX, PDF, TXT or read online on Scribd

OLTP Sources Source Stage Area ETL

( Data Merging) Data migration activity done by


Data validations has ETL developers
to be done by ETL In source Stage
FlatFiles testers data merging /
CSV Cleansing will be
Fixed Length Files completed. Data
Data Merging , All cleansing means
File data , avoiding
Database data has unwanted data ETL TOOLS ( informatica / ssis/
to be segregated from the source Talend / ODI)
ORACLE into a one oracle
ORACLE and keep only the
database, Data valid data
merging
perspective we
MySql migrated data into
Source stage Layer
Required Validations Required Validatations Between Source
Data Structure validati
if source as files then, all rows from the files coming to the
Row count validatio
source stage or not. Here We will take the number of rows
Data Quality validations Such as ( duplicate data valid
in the file ( using wc -l unix command) and number of rows
data validations )
in the source stage table ( using select count(*) Sql ) and
Entire data reconcilla
compare the counts.
Transformation Based Va

For Entire data validations, We will have source file (A) and
it loaded into the table, from the table we will export the Data Structure validations, Data Quality validations w
same data from the table and create as file ( B) after we will use those test cases from one jira ato anothe
compare both files (A) and (B) by using of DIFF or CMP
command in unix and use the windiff utility to compare the Transformation level validation will be differ fro
files, if it is a windows environment

For Entire Data Reconci


We can use minus operators to com
Here file to table data loading happening by using of ssis
tool ( any of ETL tools by developer)
Full outer join is also can be used for en
since full outer join will be faster tha

ORPHAN DATA VALIDATION


if the fact table contains the data and respective table is not available in the dimenstion table then, thaty is called t
the referential integrity. To ensure the integrity between those tables we have to perform the orphan data validatio

TRANSFORMATION LEVEL VALIDATION


We will understand the transformation logic from the mapping sheet,
and we will apply the transfomraion logic in the source sql query and we will do the minus operation
in the source sql query with the target table select query
We will understand the transformation logic from the mapping sheet,
and we will apply the transfomraion logic in the source sql query and we will do the minus operation
in the source sql query with the target table select query

FILE VALIDATION
we will use the external table to view the file in the table format in oracle database, for this we will keep the delimi
external table in the source stage layer
external tables will show the data from the file , so we can select the data from the external table and we will do th
sql query.
ETL Target Stage Target
ation activity done by Data Validations Done
TL developers by ETL testers Production
data
Transformation level validation
Transformed validaion done in the done by
data will be target stage itself. If ETL tester
migrated from testinf team given sign off sanity test
S ( informatica / ssis/ source stage then, data will move from
to target stage ORACLE target stage to target
ODI)
environemt
ORACLE

atations Between Source stage and Target stage Req Validations targetstage to target
Just a sanity check like Rowcount/
Data Structure validations and sample data validations for each
Row count validations and every migrated table from target
h as ( duplicate data validations . Null data Validations, Orphan stage to target environments
data validations )
Entire data reconcillation
Transformation Based Validations

ata Quality validations will be same for all projects we can re


from one jira ato another

alidation will be differ from each and every requirement

For Entire Data Reconcillation


e minus operators to compare 2 resultsets.

s also can be used for entire data reconcillation


uter join will be faster than minus operation

ble then, thaty is called the orphan data. Primary key - foreign key relationship is called
the orphan data validation in target stage environment

operation
operation

s we will keep the delimited file in the data base directory and we will create the

al table and we will do the normal minus operations and others validations through the
ID Trade_id Branch_code From_custoemr to_customer Amount transaction_date Version
1 1000 100 CUST001 CUST002 10000 6/15/2021 1
2 1001 100 CUST001 CUST002 20000 6/16/2021 1

3 1000 100 CUST001 CUST002 15000 6/15/2021 1

ID Trade_id Branch_code From_custoemr to_customer Amount transaction_date Version


1 1000 100 CUST001 CUST002 10000 6/15/2021 0
2 1001 100 CUST001 CUST002 20000 6/16/2021 1
3 1000 100 CUST001 CUST002 15000 6/15/2021 1

For the same tradeID latest record will be selected from the same table based on Max(start_date) column value
In that selected Row version will be updated as 0 and End date will be updated with sysdate
New record version will be as 1 and New record start date will be sysdate and new record end date will be Null
All older records start date and end date will be not null only
All New records will have start date as Not null and End date as null value
Start_date End_date
19-Jun-21 Null
19-Jun-21 Null

19-Jun-21 Null

Start_date End_date
19-Jun-21 20-Jun-21
19-Jun-21 Null
20-Jun-21 Null

Max(start_date) column value

w record end date will be Null


SCD-0 from To Aomunt total_amount date
gobi raja 500 500 1-Jul-21

scd-1 from To Aomunt total_amount date


gobi raja 300 1000 1-Jul-21

scd-2 from To Aomunt total_amount date version startDate End_date


gobi raja 500 500 1-Jul-21 0 1-Jul-21 4-Jul-21
gobi raja 200 700 5-Jul-21 0 5-Jul-21 6-Jul-21
gobi raja 300 1000 7-Jul-21 1 7-Jul-21 Null

scd-3 From to amount previous amount total amount previous total amount
gobi raja 500 null 500 null
gobi raja 200 500 700 500
gobi raja 300 200 1000 700

scd-4
from To Aomunt total_amount date version startDate End_date
gobi raja 500 500 1-Jul-21 0 1-Jul-21 4-Jul-21
gobi raja 200 700 5-Jul-21 0 5-Jul-21 6-Jul-21
gobi raja 300 1000 7-Jul-21 1 7-Jul-21 null

gobi raja 300 1000 7-Jul-21 1 7-Jul-21 null

scd-6
from To Aomunt total_amount previous amount version startDate End_date
gobi raja 500 500 null 0 1-Jul-21 4-Jul-21
gobi raja 200 700 500 0 5-Jul-21 6-Jul-21
gobi raja 300 1000 200 1 7-Jul-21 null
gobi raja 200 700 5-Jul-21 0 5-Jul-21 6-Jul-21
gobi raja 300 1000 7-Jul-21 1 7-Jul-21 null

You might also like