OLAP Operations in Data Mining Explained

The document discusses OLAP operations and data warehousing. It describes five OLAP operations performed on a data cube: roll-up, drill-down, slice, dice, and pivot. It also explains three common data warehouse schemas - star schema, snowflake schema, and fact/galaxy schema. Finally, it summarizes the ETL process used to integrate data from source systems into a data warehouse, including extract, transform, and load steps.

Uploaded by

Nithyan Nithya

Data Mining

OLAP Operations

• OLAP stands for Online Analytical Processing.

• Applications of OLAP include:

o Finance and accounting.

o Sales and marketing.

o Production.

Five operations can be performed on a data cube:

o Roll-up

o Drill-down

o Slice

o Dice

o Pivot

[Figure: operations on a cube: roll-up, drill-down, slice, dice, pivot]

1) Roll-up: -

o Roll-up performs aggregation on a data cube, either by climbing up a concept hierarchy for a dimension or by removing one or more dimensions from the cube (dimension reduction).

o When the hierarchy is climbed, the dimension is summarized at a coarser level; when a dimension is removed, the number of dimensions is reduced.

Prepared By: K C Silpa

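o The data moves from more detailed to less detailed.

o Consider a cube of 3 dimensions.

o In the example, the cities Tumkur and Mysore are rolled up into the country “India”. Pune and Bombay are grouped in the same way under their own country value (labelled U.S.A in the original example, although both cities are in India). The location dimension is therefore summarized at the country level, with fewer and coarser values than at the city level.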
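The roll-up described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fact rows, the measure `units_sold` and the names `SALES`, `CITY_TO_COUNTRY` and `roll_up_location` are hypothetical and do not come from the notes.

```python
from collections import defaultdict

# Hypothetical fact rows: (city, item, quarter, units_sold)
SALES = [
    ("Tumkur", "pen", "Q1", 10),
    ("Mysore", "pen", "Q1", 20),
    ("Tumkur", "book", "Q2", 5),
    ("Mysore", "book", "Q2", 15),
]

# Assumed concept hierarchy for the location dimension: city -> country
CITY_TO_COUNTRY = {"Tumkur": "India", "Mysore": "India"}


def roll_up_location(rows):
    """Aggregate units from the city level up to the country level."""
    totals = defaultdict(int)
    for city, item, quarter, units in rows:
        totals[(CITY_TO_COUNTRY[city], item, quarter)] += units
    return dict(totals)


rolled = roll_up_location(SALES)
print(rolled)  # four city-level rows become two country-level cells
```

The cube keeps its three dimensions here; only the location dimension becomes coarser.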

2) Drill-down: -
o It is the reverse operation of roll-up.

o Drill-down is performed either by stepping down a concept hierarchy for a dimension or by adding dimensions to the cube.

o It moves from a higher-level summary to a lower-level summary.

o The data moves from less detailed to more detailed. For example, consider adding a finer level of detail to the time dimension.
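As a rough sketch of drill-down on the time dimension, the Python below starts from a quarterly summary and steps down to months. The quarter-to-month hierarchy, the figures and the function names are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical fact rows kept at the finest level: (month, units_sold)
MONTHLY = [("Jan", 4), ("Feb", 6), ("Mar", 10), ("Apr", 7), ("May", 3), ("Jun", 5)]

# Assumed concept hierarchy for the time dimension: month -> quarter
MONTH_TO_QUARTER = {
    "Jan": "Q1", "Feb": "Q1", "Mar": "Q1",
    "Apr": "Q2", "May": "Q2", "Jun": "Q2",
}


def by_quarter(rows):
    """Higher-level summary: total units per quarter."""
    totals = defaultdict(int)
    for month, units in rows:
        totals[MONTH_TO_QUARTER[month]] += units
    return dict(totals)


def drill_down(rows, quarter):
    """Lower-level view: the monthly figures behind one quarter."""
    return {month: units for month, units in rows
            if MONTH_TO_QUARTER[month] == quarter}


summary = by_quarter(MONTHLY)
detail = drill_down(MONTHLY, "Q1")
print(summary)  # {'Q1': 20, 'Q2': 15}
print(detail)   # {'Jan': 4, 'Feb': 6, 'Mar': 10}
```

Drill-down is only possible because the detailed rows are still available; a summary alone cannot be expanded.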


3) Slice: -

o It performs a selection on one dimension of the given cube and provides a new sub-cube.

o Only one dimension is restricted, typically by fixing it to a single value.
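A slice can be sketched as a filter on a single dimension. The rows and the name `slice_cube` below are hypothetical; the example fixes the time dimension to one quarter and returns the remaining two-dimensional sub-cube.

```python
# Hypothetical fact rows: (city, item, quarter, units_sold)
CUBE = [
    ("Tumkur", "pen", "Q1", 10),
    ("Mysore", "pen", "Q1", 20),
    ("Tumkur", "book", "Q2", 5),
    ("Mysore", "book", "Q2", 15),
]


def slice_cube(rows, quarter):
    """Fix the time dimension to one value and drop it from the result."""
    return [(city, item, units)
            for city, item, q, units in rows
            if q == quarter]


q1_slice = slice_cube(CUBE, "Q1")
print(q1_slice)  # [('Tumkur', 'pen', 10), ('Mysore', 'pen', 20)]
```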

4) Dice: -

o This operation performs a selection on two or more dimensions of a given cube and provides a new sub-cube.
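In the same spirit, a dice can be sketched as a filter on two dimensions at once. The data and the name `dice` are assumptions for illustration; here both the location and the time dimension are restricted.

```python
# Hypothetical fact rows: (city, item, quarter, units_sold)
CUBE = [
    ("Tumkur", "pen", "Q1", 10),
    ("Mysore", "pen", "Q1", 20),
    ("Pune", "pen", "Q1", 8),
    ("Tumkur", "book", "Q2", 5),
    ("Mysore", "book", "Q2", 15),
]


def dice(rows, cities, quarters):
    """Keep rows whose city AND quarter fall inside the chosen sets."""
    return [row for row in rows
            if row[0] in cities and row[2] in quarters]


sub_cube = dice(CUBE, cities={"Tumkur", "Mysore"}, quarters={"Q1"})
print(sub_cube)  # [('Tumkur', 'pen', 'Q1', 10), ('Mysore', 'pen', 'Q1', 20)]
```

Unlike the slice, the result keeps all three dimensions; each restricted dimension simply has fewer values.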


5) Pivot: -

o It is also called rotation.

o It rotates the axes of the cube to give a different orientation (view) of the same data; the data values are carried along unchanged.
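A pivot on a two-dimensional view can be sketched as swapping rows and columns. The table and the name `pivot` are hypothetical; the point is that every value survives and only the orientation changes.

```python
# Hypothetical 2-D view of a cube: rows are cities, columns are quarters
BY_CITY = {
    "Tumkur": {"Q1": 10, "Q2": 5},
    "Mysore": {"Q1": 20, "Q2": 15},
}


def pivot(table):
    """Rotate the view so that the column keys become the row keys."""
    rotated = {}
    for row_key, columns in table.items():
        for col_key, value in columns.items():
            rotated.setdefault(col_key, {})[row_key] = value
    return rotated


BY_QUARTER = pivot(BY_CITY)
print(BY_QUARTER)
# {'Q1': {'Tumkur': 10, 'Mysore': 20}, 'Q2': {'Tumkur': 5, 'Mysore': 15}}
```

Pivoting twice returns the original view, which is a quick check that no values were lost.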

Data Warehouse Schema

A schema is the overall structure or design of database objects such as tables, views and indexes.

A data warehouse is designed using one of these three schemas:

o Star Schema

o Snowflake Schema

o Fact Constellation / Galaxy Schema


1. Star Schema:

o It is the simplest schema and very easy to understand.

o It has only one fact table, which stores foreign keys that refer to a number of dimension tables.

o In this schema the dimension tables are not normalized, so fewer tables are used and fewer joins are needed.

o It is suitable for query processing.

Ex:

Consider four dimension tables (book, college, employee and student) and one fact table, university.

o In this star schema, “University” is the fact table, which refers to all the other tables.

o This schema is well suited to query processing because queries stay simple.

o The drawback is higher data redundancy, because the dimension tables are denormalized.
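The university example can be sketched with Python's built-in `sqlite3` module. The table names follow the notes, but every column name, the measure `fee_paid` and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Four dimension tables (columns are hypothetical)
CREATE TABLE book     (book_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE college  (college_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE student  (student_id INTEGER PRIMARY KEY, name TEXT);

-- One fact table: a foreign key per dimension plus an assumed measure
CREATE TABLE university (
    book_id     INTEGER REFERENCES book(book_id),
    college_id  INTEGER REFERENCES college(college_id),
    employee_id INTEGER REFERENCES employee(employee_id),
    student_id  INTEGER REFERENCES student(student_id),
    fee_paid    INTEGER
);

INSERT INTO book     VALUES (1, 'Data Mining');
INSERT INTO college  VALUES (1, 'College A'), (2, 'College B');
INSERT INTO employee VALUES (1, 'Librarian');
INSERT INTO student  VALUES (1, 'S1'), (2, 'S2'), (3, 'S3');
INSERT INTO university VALUES
    (1, 1, 1, 1, 100), (1, 1, 1, 2, 150), (1, 2, 1, 3, 200);
""")

# A typical star-schema query: the fact table joined to one dimension
fees_per_college = conn.execute("""
    SELECT c.name, SUM(u.fee_paid)
    FROM university AS u
    JOIN college AS c ON c.college_id = u.college_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(fees_per_college)  # [('College A', 250), ('College B', 200)]
```

Each dimension is reached with a single join from the fact table, which is why star-schema queries stay simple.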


2. Snowflake schema:

o It is similar to the star schema: it also has only one fact table, which refers to a number of dimension tables.

o The difference is that in this schema the dimension tables are normalized, so a dimension can be split into multiple levels of tables.

o Because the tables are normalized, more tables are used and more joins are needed to get a result.

o The advantage is less redundancy, so the dimension tables are easier to update and maintain.

Consider the same four dimension tables, but now take the book table to a next level of detail.
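A minimal sketch of that idea with `sqlite3` follows. The notes do not say which table the book dimension is split into, so the `category` table, all column names, the measure `copies_issued` and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed second level of the book dimension
CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT);

-- Normalized book dimension: it points to category instead of repeating its name
CREATE TABLE book (
    book_id     INTEGER PRIMARY KEY,
    title       TEXT,
    category_id INTEGER REFERENCES category(category_id)
);

-- Fact table, reduced to one dimension and one assumed measure for brevity
CREATE TABLE university (
    book_id       INTEGER REFERENCES book(book_id),
    copies_issued INTEGER
);

INSERT INTO category VALUES (1, 'Databases'), (2, 'Networks');
INSERT INTO book VALUES
    (1, 'Data Mining', 1), (2, 'Data Warehousing', 1), (3, 'Routing', 2);
INSERT INTO university VALUES (1, 3), (2, 4), (3, 5);
""")

# Reaching the category level now takes two joins instead of one
issued_per_category = conn.execute("""
    SELECT cat.name, SUM(u.copies_issued)
    FROM university AS u
    JOIN book AS b       ON b.book_id = u.book_id
    JOIN category AS cat ON cat.category_id = b.category_id
    GROUP BY cat.name
    ORDER BY cat.name
""").fetchall()
print(issued_per_category)  # [('Databases', 7), ('Networks', 5)]
```

The category name is stored once, so renaming a category is a single-row update; the price is the extra join.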

3. Fact Constellation / Galaxy schema:

o This schema uses multiple fact tables that share common dimension tables.

o It is a complex schema because multiple fact tables have to be maintained.

o The dimension tables can also be very large. Suppose we have two fact tables, sales and publisher, that share one dimension table, book.

[Figure: the fact table (Bid, transaction) and the publisher fact table (pid, bid, lid) each hold a foreign key to the primary key Bid of the book dimension table (Bid, title, price).]
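The two-fact-table arrangement can be sketched with `sqlite3` as below. The column names `bid`, `title`, `price`, `pid` and `lid` are taken from the figure labels; the meaning of `lid` is not given in the notes, and the `quantity` measure and all sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Shared dimension table
CREATE TABLE book (bid INTEGER PRIMARY KEY, title TEXT, price INTEGER);

-- Fact table 1: sales (the quantity measure is hypothetical)
CREATE TABLE sales (bid INTEGER REFERENCES book(bid), quantity INTEGER);

-- Fact table 2: publisher
CREATE TABLE publisher (pid INTEGER, bid INTEGER REFERENCES book(bid), lid INTEGER);

INSERT INTO book VALUES (1, 'Data Mining', 300), (2, 'Data Warehousing', 250);
INSERT INTO sales VALUES (1, 2), (1, 1), (2, 4);
INSERT INTO publisher VALUES (10, 1, 100), (11, 2, 100), (10, 2, 101);
""")

# Both fact tables are analysed through the same book dimension
revenue = conn.execute("""
    SELECT b.title, SUM(s.quantity * b.price)
    FROM sales AS s JOIN book AS b ON b.bid = s.bid
    GROUP BY b.title ORDER BY b.title
""").fetchall()

publishers = conn.execute("""
    SELECT b.title, COUNT(DISTINCT p.pid)
    FROM publisher AS p JOIN book AS b ON b.bid = p.bid
    GROUP BY b.title ORDER BY b.title
""").fetchall()

print(revenue)     # [('Data Mining', 900), ('Data Warehousing', 1000)]
print(publishers)  # [('Data Mining', 1), ('Data Warehousing', 2)]
```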

ETL process: [Extract, Transform, Load]

o The process of moving data from traditional (transactional) databases to a data warehouse is called the ETL process.

o Transactional databases are not designed to answer complex analytical questions, so the data is moved to a data warehouse using ETL.

o ETL provides a method of moving data from various sources into a data warehouse.

[Figure: data is extracted from different sources (Source 1, Source 2, Source 3) into a staging area, transformed there, and then loaded into the data warehouse.]

1. Extract: [data extraction]

In this step, data is extracted from multiple sources.

2. Data transformation:

The second step of the ETL process is transformation.

After extraction the data is raw and not yet useful, so some transformations are needed, such as:

• Filtering: loading only selected data into the data warehouse.

• Cleaning: correcting or removing inaccurate data so that the data is accurate.

• Joining: combining multiple columns into one column.

• Splitting: dividing a single column into multiple columns.

• Merging: merging the data from multiple sources.

• Sorting: arranging the data in a chosen order.

Here, the staging area gives an opportunity to validate the extracted data before it is moved to the data warehouse.
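The transformations listed above can be sketched together in one small Python function. The record layout (`full_name`, `city`, `country`, `price`), the two source lists and the name `transform` are all hypothetical.

```python
# Hypothetical raw rows from two sources (prices arrive as text, one is missing)
SOURCE_1 = [
    {"full_name": "  asha rao ", "city": "Tumkur", "country": "India", "price": "250"},
]
SOURCE_2 = [
    {"full_name": "Ravi Kumar", "city": "Pune", "country": "India", "price": None},
    {"full_name": "meena iyer", "city": "Mysore", "country": "India", "price": "120.5"},
]


def transform(rows):
    """Apply filtering, cleaning, splitting, joining and sorting to raw rows."""
    staged = []
    for row in rows:
        # Filtering: skip rows that have no price
        if row["price"] is None:
            continue
        # Cleaning: trim stray spaces; Splitting: one column into two
        first, last = row["full_name"].strip().split(" ", 1)
        staged.append({
            "first_name": first.title(),
            "last_name": last.title(),
            # Joining: two columns combined into one
            "location": f"{row['city']}, {row['country']}",
            "price": float(row["price"]),
        })
    # Sorting: ascending by price
    return sorted(staged, key=lambda r: r["price"])


# Merging: rows from both sources are combined before transformation
staged_rows = transform(SOURCE_1 + SOURCE_2)
for r in staged_rows:
    print(r)
```

The returned list plays the role of the staging area: it can be inspected and validated before anything is loaded.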

3. Data loading:

The third and final step of the ETL process is loading.

In this step, the transformed data is loaded into the data warehouse on a schedule (daily, weekly, monthly or yearly).
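The three steps can be strung together in a short end-to-end sketch. It uses only the standard `csv`, `io` and `sqlite3` modules; the CSV text, the `book` table and the function names `extract`, `transform` and `load` are assumptions, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data; the second row has no price
SOURCE_CSV = "title,price\nData Mining,300\nData Warehousing,\nRouting,250\n"


def extract(text):
    """Extract: read raw rows from a source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without a price and convert types."""
    return [(r["title"].strip(), int(r["price"])) for r in rows if r["price"]]


def load(conn, rows):
    """Load: write the transformed rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS book (title TEXT, price INTEGER)")
    conn.executemany("INSERT INTO book VALUES (?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(SOURCE_CSV)))

loaded = warehouse.execute(
    "SELECT title, price FROM book ORDER BY title"
).fetchall()
print(loaded)  # [('Data Mining', 300), ('Routing', 250)]
```

In practice a scheduler would run such a job at the chosen interval; that part is outside this sketch.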

