Informatica Super Store Data Analysis

Informatica Powercenter solution

Uploaded by

kuldeep

Informatica Hands-On Challenge: Super_Store Analysis

• Introduction: You are provided with a sample dataset from a retail store,
Super_Store. This dataset contains information about orders, customers, products,
and sales. Your task involves cleaning the data, analyzing sales, customer orders,
customer geography, and order processing time using Informatica PowerCenter.

Data Preparation:

• Oracle SQL Setup:

• Log in to Oracle SQL Developer in Admin connection using the credentials:

• Username: system

• Password: Admin

Create a table named Super_Store with the following structure:

Row_ID INT

Order_Date DATE

Ship_Date DATE

Ship_Mode VARCHAR(50)

Customer_ID VARCHAR(50)

Customer_Name VARCHAR(50)

Segment VARCHAR(50)

Country VARCHAR(50)

City VARCHAR(50)

State VARCHAR(50)

Postal_Code VARCHAR(50)

Region VARCHAR(50)

Product_ID VARCHAR(250)

Category VARCHAR(250)

Sub_Category VARCHAR(250)

Product_Name VARCHAR(250)

Sales INT
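
The column list above maps directly onto a CREATE TABLE statement. The following is a minimal Python sketch that only assembles the statement text from that list; the helper name build_create_table is hypothetical, and running the resulting statement in SQL Developer is left to the reader.

```python
# Column names and types exactly as listed in the structure above.
COLUMNS = [
    ("Row_ID", "INT"),
    ("Order_Date", "DATE"),
    ("Ship_Date", "DATE"),
    ("Ship_Mode", "VARCHAR(50)"),
    ("Customer_ID", "VARCHAR(50)"),
    ("Customer_Name", "VARCHAR(50)"),
    ("Segment", "VARCHAR(50)"),
    ("Country", "VARCHAR(50)"),
    ("City", "VARCHAR(50)"),
    ("State", "VARCHAR(50)"),
    ("Postal_Code", "VARCHAR(50)"),
    ("Region", "VARCHAR(50)"),
    ("Product_ID", "VARCHAR(250)"),
    ("Category", "VARCHAR(250)"),
    ("Sub_Category", "VARCHAR(250)"),
    ("Product_Name", "VARCHAR(250)"),
    ("Sales", "INT"),
]


def build_create_table(table_name, columns):
    """Return CREATE TABLE text for a list of (column name, type) pairs."""
    body = ",\n".join(f"    {name} {sql_type}" for name, sql_type in columns)
    return f"CREATE TABLE {table_name} (\n{body}\n)"


ddl = build_create_table("Super_Store", COLUMNS)
print(ddl)
```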

• NOTE: While loading data into the table, set the date format of Order_Date and
Ship_Date to DD/MM/YYYY.

Load superstore_data.csv into the Super_Store table.

The data set is located in "~\Desktop\Project\miniproject-informaticasuper_store
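
To check that the dates in the CSV really follow DD/MM/YYYY before loading, a small Python sketch like the one below may help. It is not part of the Oracle or PowerCenter setup; the two sample rows are hypothetical stand-ins for superstore_data.csv, and the helper name parse_rows is an assumption.

```python
import csv
import io
from datetime import datetime

DATE_FORMAT = "%d/%m/%Y"  # DD/MM/YYYY, as required by the note above

# Hypothetical two-row sample standing in for superstore_data.csv.
SAMPLE_CSV = """Row_ID,Order_Date,Ship_Date
1,08/11/2017,11/11/2017
2,12/06/2017,16/06/2017
"""


def parse_rows(text):
    """Parse CSV text and convert both date columns to date objects."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        for column in ("Order_Date", "Ship_Date"):
            # strptime raises ValueError if a value is not DD/MM/YYYY.
            record[column] = datetime.strptime(record[column], DATE_FORMAT).date()
        rows.append(record)
    return rows


rows = parse_rows(SAMPLE_CSV)
```

A value such as 13/25/2017 would raise a ValueError here, which flags rows that need fixing before the load.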

Informatica Repository Setup:

Connect to the Informatica repository manager using the following credentials:

• Username: Administrator

• Password: Administrator

Create a folder named Super_Store in the repository manager.

How to Import Source Table in Source Analyzer

Follow these steps to import a source table in Informatica Source Analyzer:

Step 1) Go to the “Sources” option in Source Analyzer

1. Click the “Sources” tab in the main menu.
2. Select the “Import from Database” option. The ODBC connection box opens.

Step 2) Create the ODBC connection

1. Click the button next to ODBC Data Source (...).
2. On the next page, select the User DSN tab and click the Add button.
3. Select Oracle Wire Protocol.
4. On the next page, select the General tab and enter the database
details. Then click Connect.
• Data Source Name: oracle
• Host: localhost
• Port: 1521
• SID: xe

Create Connections for Workflow Manager

To Create a Relational Connection

Step 1: In Workflow Manager

• Click the Connection menu.
• Select the Relational option.

Step 2: In the pop-up window

• Select Oracle as the type.
• Click the New button.

Step 3: In the connection object definition window

• Enter the connection name: oracle
• Enter the username: system
• Enter the password: Admin
• Enter the connection string: xe
• Leave the other settings at their defaults and click OK.

Note: For additional credentials, such as those for Designer, check the Readme file.

Note: Follow the naming conventions given in the problem statement.

Data Cleaning:

• Mapping Name: Map_Cleaned_Data

• Workflow Name: Workflow_Cleaned_Data

• Session Name: Session_Cleaned_Data

• Target Table: Super_Store_Cleaned_Data

Operations:

• Remove duplicates from the dataset to ensure data integrity.

• Filter records where Country is 'United States' to focus on domestic orders.

• Extract the numeric part of Customer_ID to standardize customer identification
(e.g., from CH-1234, extract 1234).

• Concatenate the extracted Customer_ID and Customer_Name with '-' to create a
unique identifier for each customer (e.g., 1234-Charlies, in the form
Extracted_ID-Customer_Name), and store it in the Customer_Id_Name column.

• Drop the Customer_ID and Customer_Name columns.

• After cleaning, load the data into the Super_Store_Cleaned_Data target table (for
columns, check the sample output).

• Sample output: the remaining columns plus the additional CUSTOMER_ID_NAME
column.

CUSTOMER_ID_NAME

21925-Zushuss Donatelli

16585-Ken Black

21520-Tracy Blumstein
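
The cleaning operations can be cross-checked outside PowerCenter with a short Python sketch. This is not the mapping itself, only the same logic in plain Python under two assumptions: a duplicate means a row identical in every column, and the numeric part is the first run of digits in Customer_ID. The function name clean_rows and the non-US sample row are hypothetical.

```python
import re


def clean_rows(rows):
    """Apply the cleaning operations to a list of dict rows."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:  # assumption: duplicates are fully identical rows
            continue
        seen.add(key)
        if row["Country"] != "United States":
            continue
        # Assumption: the numeric part is the first run of digits.
        match = re.search(r"\d+", row["Customer_ID"])
        numeric_id = match.group() if match else ""
        # Drop the two source columns and add the combined one.
        out = {
            name: value
            for name, value in row.items()
            if name not in ("Customer_ID", "Customer_Name")
        }
        out["Customer_Id_Name"] = f"{numeric_id}-{row['Customer_Name']}"
        cleaned.append(out)
    return cleaned


sample = [
    {"Customer_ID": "CH-1234", "Customer_Name": "Charlies", "Country": "United States"},
    {"Customer_ID": "CH-1234", "Customer_Name": "Charlies", "Country": "United States"},
    {"Customer_ID": "AB-5678", "Customer_Name": "Example Name", "Country": "Canada"},
]
result = clean_rows(sample)
```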

NOTE: The data in the Super_Store_Cleaned_Data table is used for all of the tasks below.

Analysis Tasks:

Task 1: Sales Summary

• Mapping Name: Map_Sales_Summary

• Workflow Name: Workflow_Sales_Summary

• Session Name: Session_Sales_Summary

• Target Table: Sales_Summary

Problem Statement: Summarize total sales and average sales for each customer. Identify
customers with significant contribution to overall sales.

Operations:

• Filter the records to Region 'East' and State 'New York' to focus on a specific
customer base.

• Convert the sales amount from USD to INR (conversion rate = 84) and store it in
the AMOUNT column.

• Calculate interest at 10% for AMOUNT values greater than 5000 and store it in a
new column, INTREST. Add AMOUNT and INTREST and store the sum in the SALES
column.

• Calculate the total sales and average sales for each customer. Keep only customers
with total sales greater than 5000 and average sales greater than 500 to focus on
significant contributors.

• Drop the unnecessary columns (check the sample output).

• Load the data into the Sales_Summary target table (for columns, check the sample
output).

• After completing the mapping, create the session and workflow in Workflow Manager.

Sample Output:

CUSTOMER_ID_NAME         TOTAL_SALES    AVG_SALES

10060-Adam Bellavance    409458         81892

17470-Mark Packer        154535         25756

14815-Harold Pawlan      121136         60568
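
The arithmetic in Task 1 can be sanity-checked with the Python sketch below. It is an illustration rather than the PowerCenter mapping, and it assumes that the 5000 threshold for interest applies to the converted AMOUNT in INR and that no rounding is applied. The function name sales_summary and the sample rows are hypothetical.

```python
from collections import defaultdict

USD_TO_INR = 84        # conversion rate from the task
INTEREST_PERCENT = 10  # interest rate from the task


def sales_summary(rows):
    """Return total and average SALES per customer for East / New York."""
    per_customer = defaultdict(list)
    for row in rows:
        if row["Region"] != "East" or row["State"] != "New York":
            continue
        amount = row["Sales"] * USD_TO_INR
        # Assumption: the 5000 threshold applies to the INR amount.
        interest = amount * INTEREST_PERCENT / 100 if amount > 5000 else 0
        per_customer[row["Customer_Id_Name"]].append(amount + interest)

    summary = []
    for customer, sales in per_customer.items():
        total = sum(sales)
        average = total / len(sales)
        if total > 5000 and average > 500:
            summary.append(
                {"CUSTOMER_ID_NAME": customer, "TOTAL_SALES": total, "AVG_SALES": average}
            )
    return summary


# Hypothetical rows, not taken from the data set.
sample_rows = [
    {"Customer_Id_Name": "1234-Charlies", "Region": "East", "State": "New York", "Sales": 100},
    {"Customer_Id_Name": "1234-Charlies", "Region": "East", "State": "New York", "Sales": 10},
    {"Customer_Id_Name": "5678-Example", "Region": "East", "State": "New York", "Sales": 5},
    {"Customer_Id_Name": "9012-Other", "Region": "West", "State": "California", "Sales": 1000},
]
summary = sales_summary(sample_rows)
```

In this sample, the first customer has amounts 8400 and 840; only the first exceeds 5000, so SALES values are 9240 and 840, giving a total of 10080 and an average of 5040.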

Task 2: Customer Order Analysis

• Mapping Name: Map_Order_Analysis

• Workflow Name: Workflow_Order_Analysis

• Session Name: Session_Order_Analysis

• Target Table: Order_Analysis

Problem Statement: Analyze customer orders to determine the most frequent buyers
and their order patterns.

Operations:

• Filter records with Category 'Furniture' and City 'New York City' to analyze local
customer behavior.

• Create a new column, orders_count, holding the count of orders for each customer
to determine their order frequency.

• Categorize customers by their number of orders: fewer than 10 is 'Low', 10 to 20
is 'Medium', and more than 20 is 'High'.

• Sort the results by order count in descending order to identify the most frequent
buyers, and keep only the top 8 records.

• Generate a unique number for each row in the column Sno. Values should start
from 11.

• Drop the unnecessary columns (check the sample output).

• Load the data into the Order_Analysis target table (for columns, check the sample
output).

Sample output:

SNO    CUSTOMER_ID_NAME           ORDERS_COUNT    ORDERS_CATEGORY

11     18355-Nat Gilpin           3               Low

12     12805-Cynthia Voltz        2               Low

13     16435-Katrina Willman      2               Low

14     17470-Mark Packer          2               Low

15     10225-Alan Schoenberger    1               Low
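
The counting, categorizing, ranking, and numbering steps of Task 2 can be sketched in Python as follows. This is a reference for the logic only, not the PowerCenter mapping. It assumes each row counts as one order (the table has no order ID column) and breaks ties in order count by CUSTOMER_ID_NAME, which the task does not specify. The names order_analysis and categorize are hypothetical.

```python
from collections import Counter


def categorize(orders_count):
    """Map an order count to Low / Medium / High as defined in the task."""
    if orders_count < 10:
        return "Low"
    if orders_count <= 20:
        return "Medium"
    return "High"


def order_analysis(rows, top_n=8, first_sno=11):
    """Count orders per customer, rank them, and number the top rows."""
    counts = Counter(
        row["Customer_Id_Name"]
        for row in rows
        if row["Category"] == "Furniture" and row["City"] == "New York City"
    )
    # Descending by count; ties broken by name (an assumption).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:top_n]
    return [
        {
            "SNO": first_sno + index,
            "CUSTOMER_ID_NAME": name,
            "ORDERS_COUNT": count,
            "ORDERS_CATEGORY": categorize(count),
        }
        for index, (name, count) in enumerate(ranked)
    ]


def make_rows(name, n, category="Furniture", city="New York City"):
    """Build n identical illustrative rows for one customer."""
    return [{"Customer_Id_Name": name, "Category": category, "City": city}] * n


rows = (
    make_rows("18355-Nat Gilpin", 3)
    + make_rows("12805-Cynthia Voltz", 2)
    + make_rows("10225-Alan Schoenberger", 1)
    + make_rows("17470-Mark Packer", 4, category="Technology")  # filtered out
)
result = order_analysis(rows)
```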

Task 3: Customer Geography Analysis

• Mapping Name: Map_Geography_Analysis

• Workflow Name: Workflow_Geography_Analysis

• Session Name: Session_Geography_Analysis

• Target Table: Geography_Analysis

Problem Statement: Analyze customer distribution across different regions to identify
potential market segments.

Operations:

• Filter records for customers in Segment 'Consumer'.

• Combine the columns customer_id_name and region using '_' and place the result
in a new column, customer_region.

• Store the output in the respective tables based on the region.

• Drop the unnecessary columns (check the sample output).

• Load the data into the EAST_CUSTOMER_BASE, WEST_CUSTOMER_BASE, and
SOUTH_CUSTOMER_BASE target tables (for columns, check the sample output).

Sample output: EAST_CUSTOMER_BASE

CUSTOMER_REGION                 PINCODE    STATE          CATEGORY

19960-Ryan Crowe_East           43229      Ohio           Office Supplies

20725-Steven Cartwright_East    19805      Delaware       Office Supplies

Sample output: WEST_CUSTOMER_BASE

CUSTOMER_REGION                 PINCODE    STATE          CATEGORY

16885-Lena Creighton_West       95661      California     Office Supplies

12130-Chad Sievert_West         90004      California     Office Supplies

11710-Brosina Hoffman_West      90032      California     Furniture

Sample output: SOUTH_CUSTOMER_BASE

CUSTOMER_REGION                 PINCODE    STATE          CATEGORY

16270-Karen Daniels_South       22153      Virginia       Office Supplies

18385-Natalie Fritzler_South    39212      Mississippi    Furniture

19780-Rose OBrian_South         38109      Tennessee      Furniture
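
The region-based split in Task 3 amounts to routing each filtered row to one of three outputs. The Python sketch below illustrates that logic only; it is not the PowerCenter mapping. It assumes PINCODE comes from Postal_Code and that rows from any other region are simply skipped, since the task names only three target tables. The function name route_by_region is hypothetical.

```python
def route_by_region(rows):
    """Split Consumer rows into per-region lists keyed by target table."""
    targets = {
        "EAST_CUSTOMER_BASE": [],
        "WEST_CUSTOMER_BASE": [],
        "SOUTH_CUSTOMER_BASE": [],
    }
    for row in rows:
        if row["Segment"] != "Consumer":
            continue
        table = f"{row['Region'].upper()}_CUSTOMER_BASE"
        if table not in targets:
            continue  # assumption: other regions have no target table
        targets[table].append(
            {
                "CUSTOMER_REGION": f"{row['Customer_Id_Name']}_{row['Region']}",
                "PINCODE": row["Postal_Code"],  # assumption: PINCODE = Postal_Code
                "STATE": row["State"],
                "CATEGORY": row["Category"],
            }
        )
    return targets


rows = [
    {"Customer_Id_Name": "19960-Ryan Crowe", "Segment": "Consumer", "Region": "East",
     "Postal_Code": "43229", "State": "Ohio", "Category": "Office Supplies"},
    {"Customer_Id_Name": "12130-Chad Sievert", "Segment": "Consumer", "Region": "West",
     "Postal_Code": "90004", "State": "California", "Category": "Office Supplies"},
    # Hypothetical non-Consumer row, dropped by the filter.
    {"Customer_Id_Name": "1234-Charlies", "Segment": "Corporate", "Region": "South",
     "Postal_Code": "00000", "State": "Example", "Category": "Furniture"},
]
targets = route_by_region(rows)
```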

Task 4: Order Processing Time Analysis

• Mapping Name: Map_Order_Processing

• Workflow Name: Workflow_Order_Processing

• Session Name: Session_Order_Processing

• Target Table: Order_Processing

Problem Statement: Evaluate order processing efficiency by analyzing the time taken
between order placement and shipment.

Operations:

• Count the repeat orders for each product sub-category and store the count in the
ORDERS_COUNT column.

• Categorize the repeat orders (e.g., fewer than 10 orders is Low Sales, 10 to 30 is
Average Sales, more than 30 orders is Best Sales).

• Count the number of product sub-categories falling within each sales category to
analyze product sales (the PRODUCT_SUB_CATEGORY_COUNT column in the sample
output). Load the data into the REPEAT_ORDERS table.

• Calculate the processing days for each order as the difference between the order
date and the ship date, and store it in a new column, Processing_days.

• Categorize the processing days (e.g., less than 1 day is One-Day Delivery, 1 to 2
days is Two-Day Delivery, 3 or more days is Standard Delivery).

• Count the number of orders in each processing-days category to analyze the
distribution of processing days. Load the data into the Order_Processing table.

• Drop the unnecessary columns (check the sample output).

• Load the data into the Order_Processing and REPEAT_ORDER target tables (for
columns, check the sample output).

Sample Output:

CATEGORISE_PROCESSING_DAYS    ORDERS_COUNT

One-Day Delivery              17

Standard Delivery             765

Two-Day Delivery              208

SALES_CATEGORY    PRODUCT_SUB_CATEGORY_COUNT

Average Sales     4

Best Sales        4

Low Sales         9
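
Both outputs of Task 4 come from the same two-step pattern: derive a category per row or per group, then count per category. The Python sketch below shows that logic under stated assumptions; it is not the PowerCenter mapping. It treats each row as one order and, following the PRODUCT_SUB_CATEGORY_COUNT column in the sample output, counts sub-categories per sales category. The function names and the sample rows (including the sub-category values) are hypothetical.

```python
from collections import Counter
from datetime import date


def delivery_category(order_date, ship_date):
    """Categorize the gap in whole days between order and shipment."""
    days = (ship_date - order_date).days
    if days < 1:
        return "One-Day Delivery"
    if days <= 2:
        return "Two-Day Delivery"
    return "Standard Delivery"


def sales_category(orders_count):
    """Categorize a sub-category by its number of orders."""
    if orders_count < 10:
        return "Low Sales"
    if orders_count <= 30:
        return "Average Sales"
    return "Best Sales"


def processing_summary(rows):
    """Count orders per processing-days category."""
    return Counter(delivery_category(r["Order_Date"], r["Ship_Date"]) for r in rows)


def repeat_orders_summary(rows):
    """Count sub-categories per sales category."""
    orders_per_sub_category = Counter(r["Sub_Category"] for r in rows)
    return Counter(sales_category(n) for n in orders_per_sub_category.values())


# Hypothetical rows, not taken from the data set.
rows = [
    {"Order_Date": date(2017, 11, 8), "Ship_Date": date(2017, 11, 8), "Sub_Category": "Chairs"},
    {"Order_Date": date(2017, 11, 8), "Ship_Date": date(2017, 11, 10), "Sub_Category": "Chairs"},
    {"Order_Date": date(2017, 11, 8), "Ship_Date": date(2017, 11, 13), "Sub_Category": "Paper"},
]
processing = processing_summary(rows)
repeat_orders = repeat_orders_summary(rows)
```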
