Data Mining
OLAP Operations
OLAP stands for Online Analytical Processing.
Applications of OLAP are
o Finance and accounting.
o Sales and marketing.
o Production.
There are 5 operations performed on data cube.
o Roll-up
o Drill down
o Slice
o Dice
o pivot
Operations
on cube
Roll-up Drill- Slice Dice Pivot
down
1) Roll-up: -
o When roll-up operation is performed on data cube one or more dimensions
from that cube are removed.
o Roll up operation performs aggregation on data cube by climbing up the hierarchy to this
Prepared By: K C Silpa Page 1
Data Mining
dimension are reduced
o Here more detailed data to less detailed data.
o Consider a cube of 3 dimensions.
o Here we observed that Tumkur and Mysore both cities are assumed to climbing up
“India” country and Pune and Bombay cities are into U.S.A, Hence number of
dimensions are reduced.
2) Drill-down: -
o It is reverse operation of roll-up.
o When we perform drill-down on data cube dimensions are added to the cube.
o It means lower level summary to higher level summery.
o Here less detailed data to more detailed data. Consider that we are going add dimensions
to time dimensions.
Prepared By: K C Silpa Page 2
Data Mining
3) Slice: -
It performs selection on one dimension from given cube and provides a new sub-
cube.
o We select only one particular dimension.
4) Dice: -
This operation selects two or more dimensions from a given cube and provides a new sub
cube.
Prepared By: K C Silpa Page 3
Data Mining
5. Pivot: -
o It is also called as rotation.
o It is a technique of changing from one-dimension orientation to another with value also.
Data ware house Schema
A schema is an overall structure or design of objects like tables views index etc.
The data warehouse is designed using either one of these three schemas. They are
o Star Schema
o Snow flake Schema
o Fact / Galaxy Schema
Prepared By: K C Silpa Page 4
Data Mining
1. Star Schema:
o It is the simplest schema and very easy to understand.
o It has only one fact table which stores foreign keys and it refers number ofdimension
tables.
o In this schema all dimension tables are "not normalized”. It means a less number of
tables are used for that less number of joins used in this.
o It is suitable for query processing.
Ex:
Consider 4-dimensional table like book, college, employee and student and one fact table that is
university.
o In the above star schema “University” is a table which referred to all other tables.
o This schema is most suitable for query processing because we can use simple query.
o Problem is more data redundancy because tables are denormalized.
Prepared By: K C Silpa Page 5
Data Mining
2. Snowflake schema:
o It is also same as the star schema, which is also having only one “fact” table which is
referred to number of dimension tables.
o But the difference is, in this schema all “dimension tables are normalized” and these
tables can have multiple levels.
o If the tables are normalized, more no of tables are used and more joins are used in order
to get the result.
o Advantage is, less redundancy because of normalized hence dimension tables are easy to
update and maintain.
Consider the same 4-dimension tables but again we are going to tale next level of book table.
3. Fact /Galaxy schema:
o In this we can use multiple fact tables that share common dimension tables
Prepared By: K C Silpa Page 6
Data Mining
o It is complex schema due to multiple fact table’s maintenance.
o Dimension tables are also very large in this suppose we have 2 fact tables sales and
publisher but one-dimension table book.
Primary key Foreign key
Foreign key
pid
Bid bid
Bid
title lid
transaction
price
Fact table Dimension table Publisher fact table
ETL process: [Extract Transformation/Transform Loading]
o The process of moving data from traditional databases to data warehouse is called ETL
process.
o Transactional databases cannot answer complex questions then we can use ETL.
o ETL provides a method of moving data from various sources to data warehouse.
Source 1
Different sources
extract
Transform Load data
Source 2
staging in to data
area warehouse
Source 3
1. Extract: [data extraction]
In this step, data is extracted from the multiple sources.
2. Data transformation:
Prepared By: K C Silpa Page 7
Data Mining
The second step of ETL process is transformation.
After extracting data, it is a raw and it’s not useful for that reason we need to do some
transformations like,
Filtering: loading any particular things in to the data warehouse.
Cleaning: the data should be accurate.
Joining: combining multiple columns to one column.
Splitting: dividing single column to multiple columns.
Merging: merge the data from multiple sources
Sorting: arrange the data in any one of the orders.
Here, staging area gives an opportunity to validate the extracted data before it is moved to data
warehouse.
3. Data loading:
The third step/ final step o ETL process is loading.
In this step, the transformed data is finally loaded into data warehouse
daily/weekly/monthly/yearly.
Prepared By: K C Silpa Page 8