0% found this document useful (0 votes)

7 views10 pages

Data Warehousing Fundamentals Explained

The document provides an overview of data warehousing, including its definition, components, and the ETL process. It highlights the differences between operational databases and data warehouses, as well as various data warehouse models such as Enterprise Data Warehouse, Data Mart, and Virtual Data Warehouse. Additionally, it discusses the architecture of data lakes and the concept of a Data Ocean as a comprehensive storage solution for organizational data.

Uploaded by

kapsetanmay112

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

7 views10 pages

Data Warehousing Fundamentals Explained

Uploaded by

kapsetanmay112

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

Unit 1 : Basics of Data Warehousing

Data mining
Data mining is the process of discovering hidden patterns, trends, and valuable insights from
massive datasets using advanced techniques like statistics, machine learning, and AI, to support
better decision-making, predict outcomes, and solve complex business problems

Introduction to Data Warehouse

A Data Warehouse is a centralized repository used to store large volumes of historical data
collected from multiple sources. It is designed to support decision-making, analysis, and reporting,
not daily transactions.

In simple words:
Database = A database is an organized collection of current data used to store,delete,
update, and retrieve information for daily operations.

Data Warehouse = A data warehouse is a centralized repository that stores large amounts
of historical data collected from multiple sources for analysis and
decision-making.

Need for Data Warehousing

Handling Large Data Volumes: Traditional databases store limited data (MBs to GBs), while data
warehouses are built to handle huge datasets (up to TBs), making it easier to store and analyze
long-term historical data.

Enhanced Analytics: Databases handle transactions; data warehouses are optimized for complex
analysis and historical insights.

Centralized Data Storage: A data warehouse combines data from multiple sources, giving a
single, unified view for better decision-making.

Trend Analysis: By storing historical data, a data warehouse allows businesses to analyze trends
over time, enabling them to make strategic decisions based on past performance and predict
future outcomes.

Business Intelligence Support: Data warehouses work with BI tools to give quick access to
insights, helping in data-driven decisions and improving efficiency.

Components of Data Warehouse

a) Data Sources: These are the various operational systems, databases and external data feeds that
provide raw data to be stored in the warehouse.

b) ETL (Extract, Transform, Load) Process: The ETL process is responsible for extracting data from
different sources, transforming it into a suitable format and loading it into the data warehouse.
c) Data Warehouse Database: This is the central repository where cleaned and transformed data is
stored. It is typically organized in a multidimensional format for efficient querying and reporting.

d) Metadata: Metadata describes the structure, source and usage of data within the warehouse, making
it easier for users and systems to understand and work with the data.

e) Data Marts: These are smaller, more focused data repositories derived from the data warehouse,
designed to meet the needs of specific business departments or functions.

f) OLAP (Online Analytical Processing) Tools: OLAP tools allow users to analyze data in multiple
dimensions, providing deeper insights and supporting complex analytical queries.

g) End-User Access Tools: These are reporting and analysis tools, such as dashboards or Business
Intelligence (BI) tools, that enable business users to query the data warehouse and generate reports.

Differences between Operational Database Systems and Data Warehouses

Sr.
no.
Operational Database Data Warehouse
1) Operational database data is volatile (changes data warehouse data is non-volatile (stable)
often)
2) Relational databases are made for on-line Data Warehouse planned for on-line Analytical
value-based Preparing (OLTP) Processing (OLAP)
3) Operational database systems are generally While data warehouses are generally subject-
application-oriented. oriented.
4) Stores current data Stores historical data
5) Data comes from one application Data comes from multiple sources
6) Focus on speed & accuracy Focus on analysis & trends
7) Smaller data size Very large data size
8) Operational databases support insert, update, data warehouses mainly support read and
delete analysis.
Multi-Tiered Architecture of Data Warehouse(Three-Tier Architecture)

A multi-tiered architecture of a data warehouse means the system is divided into different layers
(tiers). Each tier has a specific role, which makes data storage, processing, and analysis efficient and
easy to manage.

 Bottom Tier (Data Sources and Data Storage)

 Middle Tier (OLAP Engine)
 Top Tier (Front-End Tools)

Bottom Tier Middle Tier Top Tier

Bottom Tier
Bottom Tier is the foundation of the data warehouse.
It is responsible for collecting, processing and storing data from multiple sources.
It plays a critical role in preparing data for analysis.

Key Components:

 Data Sources: These are the places where data is created.

Examples include operational databases, Excel files, spreadsheets, websites, apps, CRM/ERP systems,
and web logs.
This data is raw and used as input for the data warehouse.

 Data Storage: After cleaning and processing, the data is stored safely in special databases like RDBMS
or multidimensional databases.
This stored data is used for easy searching, analysis, and reporting.
ETL Process(Extract,Transform,Load)
 Extract: Gathers raw data from different, often incompatible sources.
 Transform: Converts data into a consistent format, applying business rules, cleansing
errors, handling missing values and resolving duplicates.
 Load: Loads the transformed data into the warehouse, organizing it for fast access and
analysis.
Metadata = “Data about Data”
Monitor and Integrator : It is like a security guard + manager for data flow.

Middle Tier(OLAP Engine)

The Middle Tier hosts the OLAP server, which processes complex analytical queries.
It acts as a bridge between the data storage layer (bottom tier) and the user interface (top tier),
ensuring data is quickly retrieved, aggregated and ready for reporting and analysis.
It is designed for high-speed analytical processing.

OLAP server models come in three different categories, including:

ROLAP (Relational OLAP):This model uses a relational database to store and manage warehouse
data. It is ideal for handling large data volumes as it operates directly on relational databases.

MOLAP (Multidimensional OLAP): This model stores data in a multidimensional cube. The storage
and retrieval processes are highly efficient, making MOLAP suitable for complex analytical queries
that require aggregation.

HOLAP (Hybrid OLAP): It is combination of relational and multidimensional online analytical

processing paradigms. HOLAP is the ideal option for a seamless functional flow across the database
systems when the repository houses both the relational database management system and the
multidimensional database management system.

Top Tier(Front-end tools)

The Top Tier in the Three-Tier Data Warehouse Architecture comprises the front-end client layer,
which is essential for interacting with the data stored and processed in the lower tiers.
This layer includes a variety of business intelligence (BI) tools and techniques designed to facilitate
easy access and manipulation of data for reporting, analysis, and decision-making.
Data Warehouse Models: Enterprise Warehouse, Data Mart, And virtual
Warehouse ---> (A Data Warehouse model defines how data is collected, stored, organized, and
accessed for decision-making.)

Enterprise Data Warehouse (EDW)

An Enterprise Data Warehouse is a central, large warehouse that stores all data of the entire
organization.
An Enterprise Data Warehouse (EDW) refers to a comprehensive data repository that integrates
data drawn from different areas of an organisation.
It holds all information for all business units required giving a consolidated and unified view of the
organization.

Advantages

✅ Single source of truth

✅ Better decision-making
✅ Consistent data across departments

Disadvantages

❌ High cost
❌ Long time to build
❌ Needs skilled professionals

Data Mart

A Data Mart is a smaller, subject-oriented data warehouse designed to serve the needs of a specific
department or business function.

It contains only the data that department needs, not everything from the whole organization.

Data Warehouse = big store

Data Mart = one section of that store
Advantages

✔ Faster response time

✔ Lower cost than full warehouse
✔ Easy for users to understand
✔ Quick implementation

Disadvantages

✖ Data duplication possible

✖ Data inconsistency (independent type)
✖ Limited analysis scope

Virtual Data Warehouse

A Virtual Data Warehouse is a logical (virtual) system that allows users to see and analyze data from
different databases together, without actually storing the data in one place.

Advantages (Explainable)

✅ Very low cost (no storage)

✅ Fast to implement

✅ Real-time or near real-time data

✅ No data duplication

Disadvantages (Exam Focus)

❌ Slow for complex queries

❌ Heavy load on source databases

❌ Not suitable for historical analysis

❌ Limited performance optimization

Extraction, Transformation and Loading (ETL)

The ETL process, which stands for Extract, Transform and Load.
It is a critical methodology used to prepare data for storage, analysis and reporting in a data
warehouse.
It involves three distinct stages that help to streamline raw data from multiple sources into a
clean, structured and usable form.

Extract

The Extract phase is the first step in the ETL process, where raw data is collected from various data
sources.
Data is collected from tables, files, and text, not just one type of source, Data can come from tables
(databases), formatted files (JSON/XML), or unorganized data like emails and text files.
The main goal of extraction is to gather data without altering its format, enabling it to be further
processed in the next stage.

Types of data sources can include:

 Structured: SQL databases, ERPs, CRMs
 Semi-structured: JSON, XML
 Unstructured: Emails, web pages, flat files

Transformation

Data extracted in the previous phase is often raw and inconsistent.

During transformation, the data is cleaned, aggregated and formatted according to business rules.
This is a crucial step because it ensures that the data meets the quality standards required for
accurate analysis.

Common transformations include:

 Data Filtering: Removing irrelevant or incorrect data.
 Data Sorting: Organizing data into a required order for easier analysis.
 Data Aggregating: Summarizing data to provide meaningful insights (e.g., averaging sales
data).
Loading

Once data has been cleaned and transformed, it is ready for the final step: Loading.
This phase involves transferring the transformed data into a data warehouse, data lake or
another target system for storage.

Depending on the use case, there are two types of loading methods:
 Full Load: loads all data at once (usually first time)
 Incremental Load: Only new or updated data is loaded, making this method more
efficient for ongoing data updates.

Metadata Repository
A Metadata Repository is a central storage system that keeps information about data, not the
actual business data itself.
It tells us what the data means, where it comes from, how it is stored, and how it is used.
If data is a book, metadata is the title, author, index, and instructions about that book.
It helps users understand the data, supports ETL processes, ensures data consistency, and improves
data warehouse management.

Data pond

A Data Pond is a temporary storage area used to collect and store raw, unprocessed data from
multiple sources before it is cleaned, transformed, and loaded into a Data Warehouse or Data Mart.
Data Pond also known as Data Puddles

Data lake

A Data Lake is a centralized repository that stores large volumes of raw data in its original format
(structured, semi-structured, and unstructured) for long-term storage and future analysis.
Unlike a data warehouse-which stores processed and modeled data-data lakes allow organizations
to store everything first and analyze later.
This flexibility makes data lakes ideal for big data, advanced analytics, machine learning, and real-
time processing scenarios.

Architecture of data lake

1. Data Sources

Data is generated from different sources such as structured data (relational databases, ERP, CRM), semi-
structured data (JSON, XML, CSV), and unstructured data (images, videos, audio files, social media, and
sensor data).

2. Data Ingestion Layer

This layer is responsible for extracting and loading data into the data lake. Data can be ingested using
batch/scheduled ingestion or real-time streaming, and it is loaded without any transformation.

3. Raw / Landing Zone

In this layer, data is stored in its original raw form. No schema or cleaning is applied at this stage. It
supports the schema-on-read approach and allows data to be reprocessed if needed.

4. Data Processing and Transformation Layer

This layer performs batch processing and real-time processing to clean, filter, and transform raw data into
a usable format.

5. Processed Data Layer

After transformation, clean and structured data is stored here. This data can be directly used for analytics
or moved to a data warehouse for reporting.

6. Analytical Sandboxes

This layer is used by analysts and data scientists for data discovery, exploratory analysis, and predictive
modeling, supporting advanced analytics and machine learning.

7. Data Consumption Layer

Processed data is used for BI analytics, reporting, querying, real-time alerting, and search operations by
business users and applications.

8. Security, Governance, and Monitoring

This layer ensures data security, access control, data governance policies, monitoring, and compliance
across all stages of the data lake.

Data ocean
Data Ocean is the largest form of data storage that combines many data lakes and warehouses
across an organization.

DMPM Unit 1
No ratings yet
DMPM Unit 1
43 pages
Introduction to Data Warehousing Concepts
No ratings yet
Introduction to Data Warehousing Concepts
35 pages
Data Mining Unit-3
No ratings yet
Data Mining Unit-3
12 pages
Understanding Data Warehousing Essentials
No ratings yet
Understanding Data Warehousing Essentials
17 pages
Data Warehouse and OLAP Overview
No ratings yet
Data Warehouse and OLAP Overview
33 pages
Data Warehousing Overview and Components
No ratings yet
Data Warehousing Overview and Components
27 pages
Data Warehousing Concepts Explained
No ratings yet
Data Warehousing Concepts Explained
32 pages
Data Warehouse Architecture Overview
No ratings yet
Data Warehouse Architecture Overview
60 pages
Data Warehouse
No ratings yet
Data Warehouse
8 pages
Data Warehousing: Concepts & Architecture
No ratings yet
Data Warehousing: Concepts & Architecture
9 pages
Data Warehousing and OLAP Overview
No ratings yet
Data Warehousing and OLAP Overview
52 pages
Document 18
No ratings yet
Document 18
45 pages
Data Warehousing Concepts Explained
No ratings yet
Data Warehousing Concepts Explained
64 pages
Understanding Data Warehouse Basics
No ratings yet
Understanding Data Warehouse Basics
18 pages
Understanding Data Warehousing Concepts
No ratings yet
Understanding Data Warehousing Concepts
40 pages
Understanding Data Warehousing Concepts
No ratings yet
Understanding Data Warehousing Concepts
66 pages
Understanding Data Warehousing Concepts
No ratings yet
Understanding Data Warehousing Concepts
21 pages
Module 1: Data Warehousing, Business Analysis and On-Line Analytical Processing (OLAP)
No ratings yet
Module 1: Data Warehousing, Business Analysis and On-Line Analytical Processing (OLAP)
84 pages
Understanding Data Warehousing Essentials
No ratings yet
Understanding Data Warehousing Essentials
40 pages
DWM Complete Notes All4Units
No ratings yet
DWM Complete Notes All4Units
27 pages
Data Warehousing: Key Concepts and Benefits
No ratings yet
Data Warehousing: Key Concepts and Benefits
21 pages
Data Warehousing and OLAP Explained
No ratings yet
Data Warehousing and OLAP Explained
50 pages
Data Warehousing Concepts and Architecture
No ratings yet
Data Warehousing Concepts and Architecture
6 pages
Data Warehousing and OLAP Basics
No ratings yet
Data Warehousing and OLAP Basics
40 pages
Introduction to Data Warehousing Concepts
No ratings yet
Introduction to Data Warehousing Concepts
19 pages
BI and ETL Fundamentals Explained
No ratings yet
BI and ETL Fundamentals Explained
46 pages
Data Warehousing & OLAP Fundamentals
No ratings yet
Data Warehousing & OLAP Fundamentals
40 pages
Data Warehouse and Mining Course Overview
No ratings yet
Data Warehouse and Mining Course Overview
99 pages
Data Warehousing Overview and Architecture
No ratings yet
Data Warehousing Overview and Architecture
46 pages
Introduction to Data Warehousing Concepts
No ratings yet
Introduction to Data Warehousing Concepts
34 pages
Data Warehouse Overview and Components
No ratings yet
Data Warehouse Overview and Components
59 pages
Data Warehouse Overview and Architecture
No ratings yet
Data Warehouse Overview and Architecture
34 pages
Data Warehousing Concepts and Architecture
No ratings yet
Data Warehousing Concepts and Architecture
85 pages
Data Warehousing Concepts Explained
No ratings yet
Data Warehousing Concepts Explained
25 pages
Data Warehousing and Modeling Overview
No ratings yet
Data Warehousing and Modeling Overview
13 pages
Data Warehouse and OLAP Overview
No ratings yet
Data Warehouse and OLAP Overview
37 pages
Data Warehousing Fundamentals Explained
No ratings yet
Data Warehousing Fundamentals Explained
13 pages
Understanding Data Warehouses and Types
No ratings yet
Understanding Data Warehouses and Types
26 pages
Unit 1 Notes
No ratings yet
Unit 1 Notes
38 pages
OLAP vs. OLTP in Data Warehousing
No ratings yet
OLAP vs. OLTP in Data Warehousing
12 pages
Data Warehouse Overview and Architecture
No ratings yet
Data Warehouse Overview and Architecture
31 pages
Data Warehousing in Business Intelligence
No ratings yet
Data Warehousing in Business Intelligence
31 pages
Data Warehouse Concepts and Modeling Guide
No ratings yet
Data Warehouse Concepts and Modeling Guide
26 pages
Unit I
No ratings yet
Unit I
13 pages
Data Warehousing Concepts and Models
No ratings yet
Data Warehousing Concepts and Models
31 pages
Data Warehousing Overview and Concepts
No ratings yet
Data Warehousing Overview and Concepts
15 pages
Data Warehousing Fundamentals Explained
No ratings yet
Data Warehousing Fundamentals Explained
14 pages
DataStage to Snowflake Migration Guide
No ratings yet
DataStage to Snowflake Migration Guide
75 pages
Unit 2 - Data Science
No ratings yet
Unit 2 - Data Science
25 pages
Data Warehouse and Operational Systems Overview
No ratings yet
Data Warehouse and Operational Systems Overview
33 pages
Introduction to Data Warehousing
No ratings yet
Introduction to Data Warehousing
58 pages
Data Warehousing Fundamentals
No ratings yet
Data Warehousing Fundamentals
34 pages
Data Warehousing Is The Process of Collecting
No ratings yet
Data Warehousing Is The Process of Collecting
13 pages
Data Warehousing and OLAP Overview
No ratings yet
Data Warehousing and OLAP Overview
16 pages
Data Warehouse and OLAP Technology Overview
No ratings yet
Data Warehouse and OLAP Technology Overview
74 pages
Understanding Data Warehousing Essentials
No ratings yet
Understanding Data Warehousing Essentials
4 pages
Database vs. Data Warehouse Explained
No ratings yet
Database vs. Data Warehouse Explained
12 pages
159DFU0190.4 Practitioner Manual EcoStar Info Auto
No ratings yet
159DFU0190.4 Practitioner Manual EcoStar Info Auto
32 pages
TOEIC Past Papers 2021: Part 5
No ratings yet
TOEIC Past Papers 2021: Part 5
16 pages
2012 ISOIEC GuidelinesForCybersecurity Preview
No ratings yet
2012 ISOIEC GuidelinesForCybersecurity Preview
8 pages
OBD II Vehicle Diagnosis System Development
No ratings yet
OBD II Vehicle Diagnosis System Development
9 pages
TD3 RL Control for SI-MIMO DC-DC Converters
No ratings yet
TD3 RL Control for SI-MIMO DC-DC Converters
14 pages
ONGC Vendor Offer Technical Clarifications
No ratings yet
ONGC Vendor Offer Technical Clarifications
4 pages
CS409 Mid Term by Math Zone For Vu
No ratings yet
CS409 Mid Term by Math Zone For Vu
46 pages
Liquidation Report for Cash Advances
No ratings yet
Liquidation Report for Cash Advances
24 pages
Healthcare Appointment Tracker App
No ratings yet
Healthcare Appointment Tracker App
13 pages
Formlabs Form 2 Tech Specifications
No ratings yet
Formlabs Form 2 Tech Specifications
6 pages
Overview of Built-In Self-Test (BIST)
No ratings yet
Overview of Built-In Self-Test (BIST)
53 pages
Software Development Project Plan
No ratings yet
Software Development Project Plan
10 pages
Dremio vs Other SQL Engines Overview
No ratings yet
Dremio vs Other SQL Engines Overview
57 pages
Essential Skills for Finance Resumes
67% (3)
Essential Skills for Finance Resumes
8 pages
Understanding Machine Learning Hyperparameters
No ratings yet
Understanding Machine Learning Hyperparameters
13 pages
Dynamic Indexing for High-Dimensional Databases
No ratings yet
Dynamic Indexing for High-Dimensional Databases
1 page
Smart Garden Lights Catalogue
No ratings yet
Smart Garden Lights Catalogue
9 pages
Mastering Excel Formula Criteria
No ratings yet
Mastering Excel Formula Criteria
39 pages
Deface SCRPT
No ratings yet
Deface SCRPT
7 pages
OOP Course Outline - Fall 2024
No ratings yet
OOP Course Outline - Fall 2024
9 pages
QFD in Software Requirements Management
No ratings yet
QFD in Software Requirements Management
46 pages
Understanding STEM and STEAM Education
No ratings yet
Understanding STEM and STEAM Education
26 pages
Overview of Search Engine Functionality
No ratings yet
Overview of Search Engine Functionality
23 pages
UCB1 Algorithm: Optimizing Uncertainty
No ratings yet
UCB1 Algorithm: Optimizing Uncertainty
4 pages
Project Communication Plan Overview
No ratings yet
Project Communication Plan Overview
5 pages
Mobile App for Campus Navigation
No ratings yet
Mobile App for Campus Navigation
13 pages
Cybercrime Prevention Act Overview
No ratings yet
Cybercrime Prevention Act Overview
19 pages
AP DSC 2025 Hall Ticket Guidelines
No ratings yet
AP DSC 2025 Hall Ticket Guidelines
2 pages
Multimedia Applications and Concepts
No ratings yet
Multimedia Applications and Concepts
2 pages
IT Project Management Essentials Guide
No ratings yet
IT Project Management Essentials Guide
4 pages

Data Warehousing Fundamentals Explained

Uploaded by

Data Warehousing Fundamentals Explained

Uploaded by

Unit 1 : Basics of Data Warehousing

Introduction to Data Warehouse

Need for Data Warehousing

Components of Data Warehouse

Differences between Operational Database Systems and Data Warehouses

 Bottom Tier (Data Sources and Data Storage)

Bottom Tier Middle Tier Top Tier

 Data Sources: These are the places where data is created.

Middle Tier(OLAP Engine)

OLAP server models come in three different categories, including:

HOLAP (Hybrid OLAP): It is combination of relational and multidimensional online analytical

Top Tier(Front-end tools)

Enterprise Data Warehouse (EDW)

✅ Single source of truth

Data Warehouse = big store

✔ Faster response time

✖ Data duplication possible

Virtual Data Warehouse

✅ Very low cost (no storage)

✅ Real-time or near real-time data

Disadvantages (Exam Focus)

❌ Slow for complex queries

❌ Heavy load on source databases

❌ Not suitable for historical analysis

❌ Limited performance optimization

Types of data sources can include:

Data extracted in the previous phase is often raw and inconsistent.

Common transformations include:

Architecture of data lake

2. Data Ingestion Layer

3. Raw / Landing Zone

4. Data Processing and Transformation Layer

5. Processed Data Layer

7. Data Consumption Layer

8. Security, Governance, and Monitoring

You might also like