0% found this document useful (0 votes)
16 views4 pages

UNIT 1 DMW

Data warehousing is the process of collecting and managing data from multiple sources in a central repository to support decision-making through efficient querying and analysis. It includes components such as data sources, ETL processes, and OLAP tools, and comes in various types like Enterprise Data Warehouses and Cloud Data Warehouses. Real-world applications can be seen in sectors like e-commerce and banking, where data warehousing aids in analyzing trends and improving operational efficiency.

Uploaded by

xodavo9027
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
16 views4 pages

UNIT 1 DMW

Data warehousing is the process of collecting and managing data from multiple sources in a central repository to support decision-making through efficient querying and analysis. It includes components such as data sources, ETL processes, and OLAP tools, and comes in various types like Enterprise Data Warehouses and Cloud Data Warehouses. Real-world applications can be seen in sectors like e-commerce and banking, where data warehousing aids in analyzing trends and improving operational efficiency.

Uploaded by

xodavo9027
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd

UNIT 1: Data Warehousing & Data Mining

Data Warehousing
Data warehousing is the process of collecting, integrating, storing, and managing data from
multiple sources in a central repository. It enables organizations to organize large volumes of
historical data for efficient querying, analysis, and reporting.
The main goal of data warehousing is to support decision-making by providing clean,
consistent, and timely access to data. It ensures fast data retrieval even when working with
massive datasets.

Need for Data Warehousing


 Handling Large Data Volumes: Traditional databases store limited data
(MBs to GBs), while data warehouses are built to handle huge datasets
(up to TBs), making it easier to store and analyze long-term historical
data.
 Enhanced Analytics: Databases handle transactions; data warehouses
are optimized for complex analysis and historical insights.
 Centralized Data Storage: A data warehouse combines data from
multiple sources, giving a single, unified view for better decision-making.
 Trend Analysis: By storing historical data, a data warehouse allows
businesses to analyze trends over time, enabling them to make strategic
decisions based on past performance and predict future outcomes.
 Business Intelligence Support: Data warehouses work with BI tools to
give quick access to insights, helping in data-driven decisions and
improving efficiency.
Components of Data Warehouse
The main components of a data warehouse include:
 Data Sources: These are the various operational systems, databases, and
external data feeds that provide raw data to be stored in the warehouse.
 ETL (Extract, Transform, Load) Process: The ETL process is responsible
for extracting data from different sources, transforming it into a suitable
format, and loading it into the data warehouse.
 Data Warehouse Database: This is the central repository where cleaned
and transformed data is stored. It is typically organized in a
multidimensional format for efficient querying and reporting.
 Metadata: Metadata describes the structure, source, and usage of data
within the warehouse, making it easier for users and systems to
understand and work with the data.
 Data Marts: These are smaller, more focused data repositories derived
from the data warehouse, designed to meet the needs of specific
business departments or functions.
 OLAP (Online Analytical Processing) Tools: OLAP tools allow users to
analyze data in multiple dimensions, providing deeper insights and
supporting complex analytical queries.
 End-User Access Tools: These are reporting and analysis tools, such as
dashboards or Business Intelligence (BI) tools, that enable business users
to query the data warehouse and generate reports.
Characteristics of Data Warehousing
Data warehousing plays a key role in modern data management by helping
organizations store, integrate, and analyze data effectively. Its main features
include:
 Centralized Data Storage: Combines data from various sources into one
place for a complete view.
 Query & Analysis: Supports fast and flexible data analysis for better
decision-making.
 Data Transformation: Cleans and formats data for consistency and
quality.
 Data Mining: Finds hidden patterns to discover insights and predict
trends.
 Data Security: Protects data with encryption, access control, and
backups.
Types of Data Warehouses
The different types of Data Warehouses are:
1. Enterprise Data Warehouse (EDW): A centralized warehouse that stores
data from across the organization for analysis and reporting.
2. Operational Data Store (ODS): Stores real-time operational data used for
day-to-day operations, not for deep analytics.
3. Data Mart: A subset of a data warehouse, focusing on a specific business
area or department.
4. Cloud Data Warehouse: A data warehouse hosted in the cloud, offering
scalability and flexibility.
5. Big Data Warehouse: Designed to store vast amounts of unstructured
and structured data for big data analysis.
6. Virtual Data Warehouse: Provides access to data from multiple sources
without physically storing it.
7. Hybrid Data Warehouse: Combines on-premises and cloud-based
storage to offer flexibility.
8. Real-time Data Warehouse: Designed to handle real-time data
streaming and analysis for immediate insights.
Real world Example of Data warehousing
Data Warehousing can be applied anywhere where we have a huge amount of
data and we want to see statistical results that help in decision making.
1. E-commerce: Flipkart
 Data Gathering: Orders, returns, payments, user clicks, delivery updates.
 Schema: Combines source data into a structured star schema for
analysis.
 Cleansing: Standardizes customer names, locations, and product
categories.
 Updates: Near real-time or scheduled loads for fresh insights.
 Summarization: Bestsellers by category, regional demand trends,
logistics performance.
2. Banking: HDFC Bank
 Data Gathering: ATM transactions, online banking, credit card usage,
loan records.
 Schema: Integrates data from core banking, CRM, and fraud detection
systems.
 Cleansing: Fixes inconsistencies in account info, transaction logs, and
addresses.
 Updates: Transaction data is batched and uploaded nightly.
 Summarization: Daily cash flow reports, high-risk account flags, and
customer profitability analysis.

You might also like