0% found this document useful (0 votes)

5 views16 pages

DataWarehousing Notes

This document provides a comprehensive overview of Data Warehousing concepts, including definitions, architecture, dimensional modeling, schema design, and various optimization techniques. It covers key topics such as ETL vs ELT, indexing, partitioning, and common SQL patterns used in data warehouses. Additionally, it discusses modern cloud data warehousing platforms and the advantages of using surrogate keys and materialized views.

Uploaded by

rp105263

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

5 views16 pages

DataWarehousing Notes

Uploaded by

rp105263

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

Data Warehousing

Quick Reference Notes

Concepts · Schemas · SQL · Indexing · Partitioning · Dimensional Modelling · Optimization

Index
# Topic What's Covered
1 What is a Data Warehouse? Definition, OLTP vs OLAP, use cases
2 DW Architecture Layers, Kimball vs Inmon, Lambda/Kappa
3 Dimensional Modelling Facts, dimensions, grain, conformed dims
4 Schema Design Star schema, snowflake schema, galaxy schema
5 Slowly Changing Dimensions SCD Type 1, 2, 3, 4 — definitions and SQL
6 ETL vs ELT Differences, when to use each, tools
7 Surrogate Keys & Natural Keys Why surrogate keys, sequences, identity columns
8 Indexing in DW Clustered, non-clustered, columnstore indexes
9 Partitioning Range, list, hash partitioning with examples
10 Aggregations & Materialized Pre-aggregation, indexed views, refresh strategies
Views
11 Window Functions ROW_NUMBER, RANK, LAG, LEAD, running totals
12 Common DW SQL Patterns Incremental load, deduplication, pivot, unpivot
13 Data Vault Modelling Hubs, links, satellites — concepts and example
14 Modern Cloud DW Platforms Snowflake, BigQuery, Synapse, Redshift — comparison
15 Performance Optimization Statistics, query hints, distribution, caching
1. What is a Data Warehouse?
A Data Warehouse (DW) is a central repository of integrated, historical data from multiple sources — optimised
for analytical queries (OLAP), not transactional processing (OLTP).

Feature OLTP (Operational DB) OLAP (Data Warehouse)

Purpose Day-to-day transactions Historical analysis & reporting
Data Current, real-time Historical, aggregated
Queries Simple, short (INSERT/UPDATE) Complex, long-running
(SELECT/JOIN/GROUP)
Schema Highly normalised (3NF) Denormalised (star/snowflake)
Users Thousands of concurrent users Dozens of analysts
Examples MySQL, PostgreSQL, SQL Server Snowflake, Synapse, BigQuery,
Redshift

💡 Data Warehouses are read-heavy. Denormalisation, columnstore indexes, and pre-aggregation are key
optimisations — unlike OLTP where write speed and normalisation matter.
2. DW Architecture
Most modern DW architectures follow a layered approach from raw ingestion to clean, business-ready data.

Medallion / Layered Architecture

Layer Also called What it contains
Bronze / Raw Landing / Staging Raw data as-is from source systems — no
transformation
Silver / Clean Cleansed / ODS Deduplicated, standardised, validated data
Gold / Curated DW / Data Mart Business-ready facts and dimensions,
aggregations

Kimball vs Inmon
Approach Philosophy Start with
Kimball (Bottom-up) Build data marts first; integrate later via Dimensional model (star
conformed dimensions schema)
Inmon (Top-down) Build enterprise DW first in 3NF; derive data Normalised enterprise DW
marts from it

💡 Most modern cloud DWs follow a Kimball-style dimensional model combined with a Medallion
architecture — Bronze/Silver/Gold layers feeding star-schema Gold tables.
3. Dimensional Modelling
Dimensional modelling organises data into Facts (measurable events) and Dimensions (context about those
events). It makes queries fast and intuitive.

Concept What it is Example

Fact Table Stores measurable, numeric fact_sales (sale_id, amount, qty,
events date_key, customer_key)
Dimension Table Descriptive context for fact rows dim_customer (customer_key, name,
city, country)
Grain The lowest level of detail one fact 'One row per product per customer per
row represents day'
Measure A numeric value in the fact table sales_amount, quantity_sold, discount
Surrogate Key System-generated integer PK for customer_key = 1001 (not the source
dimension rows CRM ID)
Conformed Dim Dimension shared across multiple dim_date used in both fact_sales and
fact tables / data marts fact_returns
Degenerate Dim Dimension attribute stored in fact order_number, invoice_id
(no separate dim table)
Junk Dimension Combines low-cardinality flags dim_flags (is_online, is_returned,
into one dimension is_discounted)

💡 Always define the grain before designing the fact table. Every measure and foreign key must be at the
same grain — mixing grains causes incorrect aggregations.
4. Schema Design
Star Schema
One central fact table surrounded by denormalised dimension tables. Simple joins, fast queries. Most common
DW pattern.
fact_sales
sale_id INT (PK)
date_key INT (FK → dim_date)
customer_key INT (FK → dim_customer)
product_key INT (FK → dim_product)
store_key INT (FK → dim_store)
amount DECIMAL
quantity INT

Snowflake Schema
Dimension tables are normalised into sub-dimensions. Saves storage but requires more joins. Less common in
modern cloud DWs where storage is cheap.
dim_product → dim_subcategory → dim_category
-- Instead of one flat dim_product with category columns

Galaxy / Fact Constellation Schema

Multiple fact tables sharing conformed dimension tables. Used when a DW covers more than one business
process.
fact_sales ─────┐
├── dim_date, dim_customer, dim_product
fact_returns ─────┘

💡 Prefer Star Schema for most DW projects — simpler SQL, better query optimiser hints, and easier for BI
tools to auto-generate joins.
5. Slowly Changing Dimensions (SCD)
SCDs handle how dimension data changes over time. The SCD type determines whether history is preserved.

Type Strategy History kept? Use when…

SCD Type 1 Overwrite old value with new value No History does not matter (fix typos,
correct data)
SCD Type 2 Add a new row for each change with Yes (full) Full history needed (customer
start/end date + is_current flag address, product price)
SCD Type 3 Add a new column for previous Partial Only previous value needed, not
value full history
SCD Type 4 Separate history table; current Yes Frequent changes; keep current
values in main dimension table fast

Example — SCD Type 2

-- dim_customer with SCD Type 2
customer_key INT -- surrogate PK
customer_id INT -- natural/source key
name VARCHAR(100)
city VARCHAR(100)
start_date DATE -- when this version became active
end_date DATE -- NULL means current record
is_current BIT -- 1 = active row

-- When customer moves city:

-- 1. UPDATE old row: end_date = today, is_current = 0
-- 2. INSERT new row: new city, start_date = today, end_date = NULL, is_current
= 1

💡 Always join fact tables to dimensions using the surrogate key (customer_key), not the natural key
(customer_id) — this ensures the fact row links to the correct historical version.
6. ETL vs ELT
ETL (Extract → Transform → Load) ELT (Extract → Load → Transform)
Transform where In a separate ETL server / tool Inside the target DW (SQL / Spark)
When to use Legacy on-prem DW, sensitive data Cloud DW with powerful compute
masking before load (Snowflake, BigQuery, Synapse)
Tools SSIS, Informatica, Talend, ADF Data dbt, Spark, Databricks SQL, BigQuery
Flows SQL
Advantage Data is clean before entering DW Simpler pipeline; DW compute does the
heavy lifting
Disadvantage ETL server becomes a bottleneck Raw/messy data lands in DW first

💡 Modern cloud DWs favour ELT — load raw data fast, then transform using SQL or dbt inside the
warehouse where compute scales elastically.
7. Surrogate Keys & Natural Keys
Key Type What it is Example
Natural Key Business identifier from the source system customer_id = 'CUST-7823' or
NationalID
Surrogate Key System-generated integer PK — customer_key = 10045
meaningless outside DW (IDENTITY / SEQUENCE)

Why use Surrogate Keys?

• Natural keys can change (customer number recycled, email changes)
• Integers join faster than strings or composite keys
• Support SCD Type 2 — multiple rows for one natural key need different surrogate keys
• Insulate DW from source system changes

Example — Generate Surrogate Keys

-- SQL Server / Synapse
customer_key INT IDENTITY(1,1) PRIMARY KEY

-- PostgreSQL
customer_key SERIAL PRIMARY KEY

-- Spark / Databricks
from [Link] import monotonically_increasing_id
df = [Link]('customer_key', monotonically_increasing_id())
8. Indexing in Data Warehouses
Index Type Best for Notes
Clustered Index Range scans on date/key columns Physically sorts table rows — only
one per table
Non-clustered Index Point lookups on foreign keys or Separate structure; multiple allowed
filters per table
Columnstore Index Analytical queries — Best for DW — compresses columns,
aggregations, scans batch reads
Bitmap Index Low-cardinality columns (gender, Used in Oracle DW; not in SQL
region) Server/Snowflake
Covering Index Queries that only need specific INCLUDE extra columns to avoid
columns table lookups

Example — Columnstore Index (SQL Server / Synapse)

-- Clustered columnstore on fact table (best for DW)
CREATE CLUSTERED COLUMNSTORE INDEX cci_fact_sales
ON fact_sales;

-- Non-clustered columnstore on existing heap/rowstore

CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_sales_date
ON fact_sales (date_key, amount, quantity);

💡 Always use Clustered Columnstore Indexes on fact tables in SQL Server / Synapse. They compress data 5–
10x and speed up aggregation queries dramatically.
9. Partitioning
Partitioning splits a large table into smaller physical segments. Queries that filter on the partition column skip
irrelevant partitions (partition elimination).

Type How it works Best for

Range Rows assigned to partition based on Date columns — one partition per
value range month/year
List Rows assigned based on explicit value list Region, country, category — fixed set of
values
Hash Rows assigned by hash(column) mod N Even distribution when no natural
partitions range/list key
Round-Robin Rows assigned in rotation regardless of No filtering benefit — used for even
value distribution only

Example — Range Partitioning by Year (SQL Server)

-- Step 1: Partition function
CREATE PARTITION FUNCTION pf_sales_year (DATE)
AS RANGE RIGHT FOR VALUES ('2022-01-01','2023-01-01','2024-01-01');

-- Step 2: Partition scheme

CREATE PARTITION SCHEME ps_sales_year
AS PARTITION pf_sales_year ALL TO ([PRIMARY]);

-- Step 3: Create partitioned table

CREATE TABLE fact_sales (
sale_id INT, sale_dt DATE, amount DECIMAL
) ON ps_sales_year(sale_dt);

-- Partition elimination in action:

SELECT SUM(amount) FROM fact_sales WHERE sale_dt >= '2024-01-01'; -- reads only
2024 partition
10. Aggregations & Materialized Views
Pre-aggregating data avoids re-computing expensive GROUP BY operations on every query. Materialized Views
persist the result of a query as a physical table.

Approach Description Platform

Aggregate Table Manually created summary table populated All platforms
by ETL job
Materialized View DB auto-maintains aggregated result; Snowflake, BigQuery, Oracle
queries rewrite to use it
Indexed View SQL Server version of materialized view — SQL Server, Synapse
WITH SCHEMABINDING
Rollup Cube Pre-compute all GROUP BY combinations All SQL platforms
with ROLLUP / CUBE
Result Cache Warehouse caches query result; identical Snowflake, BigQuery
query returns instantly

Example — Materialized View (Snowflake)

CREATE MATERIALIZED VIEW mv_monthly_sales AS
SELECT
DATE_TRUNC('month', sale_dt) AS month,
product_key,
SUM(amount) AS total_sales,
COUNT(*) AS order_count
FROM fact_sales
GROUP BY 1, 2;

-- Queries on fact_sales with similar GROUP BY auto-use this MV

💡 Materialized Views in Snowflake and BigQuery are automatically refreshed and transparently used by the
query optimiser — no changes needed in your SQL queries.
11. Window Functions
Window functions perform calculations across a set of rows related to the current row — without collapsing
rows like GROUP BY does.

Function Purpose Example

ROW_NUMBER() Unique sequential number ROW_NUMBER() OVER (PARTITION BY dept
per partition ORDER BY salary DESC)
RANK() Rank with gaps for ties RANK() OVER (ORDER BY sales DESC) →
1,2,2,4
DENSE_RANK() Rank without gaps for ties DENSE_RANK() OVER (ORDER BY sales DESC)
→ 1,2,2,3
LAG(col,n) Value from n rows before LAG(amount,1) OVER (ORDER BY sale_dt)
current row
LEAD(col,n) Value from n rows after LEAD(amount,1) OVER (ORDER BY sale_dt)
current row
SUM() OVER() Running / cumulative total SUM(amount) OVER (PARTITION BY dept
ORDER BY sale_dt)
AVG() OVER() Moving average AVG(amount) OVER (ORDER BY sale_dt ROWS
BETWEEN 6 PRECEDING AND CURRENT ROW)
NTILE(n) Divide rows into n equal NTILE(4) OVER (ORDER BY salary) → quartile
buckets 1–4
FIRST_VALUE() First value in window FIRST_VALUE(amount) OVER (PARTITION BY
customer_key ORDER BY sale_dt)

Example — Deduplicate with ROW_NUMBER

-- Keep only the latest record per customer
WITH ranked AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY updated_at DESC) AS rn
FROM staging_customers
)
SELECT * FROM ranked WHERE rn = 1;
12. Common DW SQL Patterns
Incremental Load — Watermark
-- 1. Get last loaded timestamp from control table
SELECT MAX(last_load_dt) FROM [Link] WHERE table_name = 'fact_sales';

-- 2. Load only new / changed rows

INSERT INTO fact_sales
SELECT * FROM [Link]
WHERE updated_at > '2024-06-01 00:00:00'; -- replace with watermark value

-- 3. Update watermark
UPDATE [Link] SET last_load_dt = GETDATE() WHERE table_name =
'fact_sales';

UPSERT / MERGE
MERGE INTO dim_customer AS target
USING staging_customer AS src
ON target.customer_id = src.customer_id AND target.is_current = 1
WHEN MATCHED AND [Link] <> [Link] THEN
UPDATE SET target.end_date = GETDATE(), target.is_current = 0 -- SCD2 expire
WHEN NOT MATCHED THEN
INSERT (customer_id, name, city, start_date, end_date, is_current)
VALUES (src.customer_id, [Link], [Link], GETDATE(), NULL, 1);

PIVOT
-- Turn row values into columns
SELECT * FROM (
SELECT region, product, amount FROM fact_sales
) src
PIVOT (
SUM(amount) FOR product IN ([Laptop],[Phone],[Tablet])
) AS pvt;
13. Data Vault Modelling
Data Vault is a DW modelling approach designed for agility, auditability, and handling multiple source systems. It
separates structure (Hubs, Links) from context (Satellites).

Component What it stores Example

Hub Unique business keys from source hub_customer (customer_hk,
systems customer_id, load_dt, record_src)
Link Relationships between Hubs link_order_customer (order_hk,
(associations) customer_hk, load_dt, record_src)
Satellite Descriptive attributes and history for sat_customer (customer_hk, name, city,
Hubs or Links load_dt, end_dt, record_src)

Data Vault vs Dimensional Model

Data Vault Dimensional Model (Kimball)
History All history by design in satellites Only if SCD Type 2 implemented
Flexibility Easy to add new sources / attributes Schema changes require more work
Query simplicity Complex — many joins Simple — star schema
Best for Enterprise DW, regulatory compliance Data marts, BI reporting

💡 Data Vault is great for raw vault layer (Silver) in a Medallion architecture. Build Business Vault or
Information Mart (Gold) on top for BI queries — simpler star schemas derived from Data Vault.
14. Modern Cloud DW Platforms
Platform Best for Key Feature
Snowflake Multi-cloud, data sharing, semi- Separate storage & compute; Time
structured data Travel; Zero-copy cloning
Google BigQuery Serverless analytics, massive scale, ML Serverless (no cluster mgmt); BQML;
inside SQL streaming inserts
Azure Synapse Microsoft ecosystem, Spark + SQL in Dedicated SQL Pools + Serverless
one platform SQL + Spark Pools
Amazon Redshift AWS ecosystem, tight S3 integration RA3 nodes with managed storage;
Redshift Spectrum for S3
Databricks SQL Lakehouse — SQL on Delta Lake with Delta Lake + Photon; Unity Catalog;
Photon engine open format
Clickhouse Real-time analytics, high-throughput Columnar, extremely fast
inserts aggregations on billions of rows

Quick Comparison — Key Differences

Feature Snowflake BigQuery Synapse Redshift
Pricing model Per second compute Per TB scanned DWU Per node-hour
+ storage / slot (dedicated) / per (RA3)
reservations TB (serverless)
Scaling Auto Fully serverless Manual scale Elastic resize
suspend/resume per — no config DWUs or (minutes)
query serverless
Semi-structured VARIANT type — JSON / ARRAY JSON via SUPER type for
native JSON/XML natively in SQL OPENJSON JSON
Time Travel Up to 90 days 7 days (table Not built-in (use Not built-in
snapshots) Delta)
15. Performance Optimization
Technique What it does Example / Note
Update Statistics Keeps query optimiser row count UPDATE STATISTICS fact_sales; or
estimates accurate ANALYZE TABLE
Partition Elimination Skip reading irrelevant partitions Filter on the partition column —
WHERE sale_dt >= '2024-01-01'
Columnstore Index Compress & batch-read columns CREATE CLUSTERED COLUMNSTORE
for aggregation queries INDEX ON fact_sales
Clustered Index Physically sort table for fast range Cluster on date_key for time-series
scans reads
Result Set Caching Return identical query instantly Snowflake & BigQuery cache query
from cache results automatically
Materialized / Indexed View Pre-compute expensive GROUP CREATE MATERIALIZED VIEW
BY mv_monthly_sales AS SELECT…
Avoid SELECT * Read only needed columns — SELECT sale_id, amount FROM
critical for columnar stores fact_sales (not *)
Avoid correlated subquery Rewrite as JOIN or CTE for set- Use CTE + JOIN instead of subquery
based execution in SELECT list
Query Hints (SQL Server) Override optimiser choices when WITH (NOLOCK), OPTION (HASH
plan is wrong JOIN), MAXDOP
Data Distribution Choose distribution key to avoid Synapse: DISTRIBUTE BY
data shuffling HASH(customer_key) on large fact
tables

Example — Common Optimisations

-- 1. Update stats after large loads
UPDATE STATISTICS fact_sales;

-- 2. Query only needed columns (columnar store benefit)

SELECT date_key, SUM(amount) FROM fact_sales
WHERE sale_dt >= '2024-01-01'
GROUP BY date_key;

-- 3. Synapse: Distributed table on fact join key

CREATE TABLE fact_sales
WITH (DISTRIBUTION = HASH(customer_key),
CLUSTERED COLUMNSTORE INDEX)
AS SELECT * FROM staging_sales;

-- 4. Avoid row-by-row RBAR (Row By Agonising Row)

-- Bad: CURSOR / WHILE loop updating one row at a time
-- Good: Single UPDATE or MERGE statement on the whole set

💡 In Synapse Analytics and Redshift, choosing the right distribution key is the single biggest performance
decision — distribute large fact tables on the column most commonly used in JOINs.

— End of Data Warehousing Notes —

DWH Bosscoder
No ratings yet
DWH Bosscoder
22 pages
Data Warehousing Essentials Guide
No ratings yet
Data Warehousing Essentials Guide
14 pages
Dimensional Modeling and ETL Overview
No ratings yet
Dimensional Modeling and ETL Overview
38 pages
Data Warehouse and Its Components Guide
No ratings yet
Data Warehouse and Its Components Guide
31 pages
Data Warehousing and ETL Fundamentals
No ratings yet
Data Warehousing and ETL Fundamentals
31 pages
Lecture 9 - Data Warehouse
No ratings yet
Lecture 9 - Data Warehouse
57 pages
Data Warehouse Architecture Overview
No ratings yet
Data Warehouse Architecture Overview
5 pages
Understanding Data Warehousing Basics
No ratings yet
Understanding Data Warehousing Basics
65 pages
Data Warehouse Concepts and Architecture
100% (1)
Data Warehouse Concepts and Architecture
32 pages
Text To PDF Aon
No ratings yet
Text To PDF Aon
18 pages
Data Warehousing & Business Intelligence
No ratings yet
Data Warehousing & Business Intelligence
32 pages
StudentID in Fact and Dimension Tables
No ratings yet
StudentID in Fact and Dimension Tables
18 pages
In 4440 - Lesson 01
No ratings yet
In 4440 - Lesson 01
36 pages
Data Warehousing and Analytics Overview
No ratings yet
Data Warehousing and Analytics Overview
63 pages
Data Warehousing Overview and Concepts
No ratings yet
Data Warehousing Overview and Concepts
5 pages
BOE095 DataWarehousing DataMining Notes
No ratings yet
BOE095 DataWarehousing DataMining Notes
16 pages
Data Warehouse: Architecture & Trends
No ratings yet
Data Warehouse: Architecture & Trends
6 pages
Data Warehouse Design Fundamentals
No ratings yet
Data Warehouse Design Fundamentals
87 pages
DW-BI - Type of Schemas
No ratings yet
DW-BI - Type of Schemas
50 pages
Understanding Data Warehouse Dimensions
No ratings yet
Understanding Data Warehouse Dimensions
85 pages
DWM (Full)
No ratings yet
DWM (Full)
50 pages
Data Warehousing & ETL Exam Notes
No ratings yet
Data Warehousing & ETL Exam Notes
3 pages
Dimensional Modeling in Data Warehousing
No ratings yet
Dimensional Modeling in Data Warehousing
75 pages
Data Warehouse for Decision Support
No ratings yet
Data Warehouse for Decision Support
49 pages
Data Warehousing vs Operational Databases
No ratings yet
Data Warehousing vs Operational Databases
35 pages
Data Warehouse and ETL Concepts Explained
No ratings yet
Data Warehouse and ETL Concepts Explained
5 pages
DW Unit1 Notes
No ratings yet
DW Unit1 Notes
21 pages
ETL and Data Warehouse Schema Overview
No ratings yet
ETL and Data Warehouse Schema Overview
14 pages
Data Warehouse Fundamentals and Architecture
No ratings yet
Data Warehouse Fundamentals and Architecture
21 pages
Advanced Data Warehousing Course Overview
No ratings yet
Advanced Data Warehousing Course Overview
4 pages
Understanding Degenerate Dimensions
No ratings yet
Understanding Degenerate Dimensions
34 pages
Data Warehousing Concepts Explained
No ratings yet
Data Warehousing Concepts Explained
13 pages
Understanding Data Warehousing Concepts
No ratings yet
Understanding Data Warehousing Concepts
6 pages
Data Warehousing Concepts Explained
No ratings yet
Data Warehousing Concepts Explained
102 pages
Bisnis Intellegent - Week 4
No ratings yet
Bisnis Intellegent - Week 4
33 pages
Greenplum Data Warehouse Overview
No ratings yet
Greenplum Data Warehouse Overview
573 pages
Data Warehousing Concepts
No ratings yet
Data Warehousing Concepts
25 pages
Data Warehouse Architecture Overview
No ratings yet
Data Warehouse Architecture Overview
19 pages
Data Warehouse Essentials for Business Insights
No ratings yet
Data Warehouse Essentials for Business Insights
9 pages
DW DM Exam Answers
No ratings yet
DW DM Exam Answers
35 pages
Data Modeling For Data Engineering - 1
No ratings yet
Data Modeling For Data Engineering - 1
10 pages
Understanding Multidimensional Modeling
No ratings yet
Understanding Multidimensional Modeling
29 pages
Data Warehousing Exam Notes Guide
No ratings yet
Data Warehousing Exam Notes Guide
28 pages
DWDM Exam Style Answers and Architecture
No ratings yet
DWDM Exam Style Answers and Architecture
39 pages
Data Warehouse Concepts and Architecture
No ratings yet
Data Warehouse Concepts and Architecture
132 pages
Understanding Data Warehousing Concepts
No ratings yet
Understanding Data Warehousing Concepts
11 pages
Data Warehouse Overview and Architecture
No ratings yet
Data Warehouse Overview and Architecture
77 pages
Understanding Data Warehouses and Lakes
No ratings yet
Understanding Data Warehouses and Lakes
29 pages
Data Warehousing Fundamentals Explained
No ratings yet
Data Warehousing Fundamentals Explained
26 pages
Data Warehouse Concepts and Terminology
No ratings yet
Data Warehouse Concepts and Terminology
39 pages
Data Warehouse Overview and Architecture
No ratings yet
Data Warehouse Overview and Architecture
55 pages
Data Warehouse Design and Architecture
No ratings yet
Data Warehouse Design and Architecture
81 pages
Data Warehousing Concepts Explained
No ratings yet
Data Warehousing Concepts Explained
14 pages
Understanding Data Warehousing Concepts
No ratings yet
Understanding Data Warehousing Concepts
8 pages
Data Warehouse Concepts Explained
No ratings yet
Data Warehouse Concepts Explained
7 pages
Understanding Data Warehousing Concepts
No ratings yet
Understanding Data Warehousing Concepts
90 pages
Inmon Data Warehouse Model
No ratings yet
Inmon Data Warehouse Model
4 pages
Key Components of Data Warehousing
No ratings yet
Key Components of Data Warehousing
11 pages
Understanding Data Warehousing Concepts
100% (1)
Understanding Data Warehousing Concepts
44 pages
Windows 7 Activation Keys List
No ratings yet
Windows 7 Activation Keys List
6 pages
Essential Keyboard Shortcuts for Windows & Mac
No ratings yet
Essential Keyboard Shortcuts for Windows & Mac
17 pages
MR-J4-A-RJ Positioning Function Overview
No ratings yet
MR-J4-A-RJ Positioning Function Overview
16 pages
k-Rotation Symmetric Boolean Functions
No ratings yet
k-Rotation Symmetric Boolean Functions
30 pages
Performance Work Statement Overview
No ratings yet
Performance Work Statement Overview
10 pages
Coway Sales Monitoring File Access Guide
No ratings yet
Coway Sales Monitoring File Access Guide
6 pages
LLM Security and Privacy Challenges
No ratings yet
LLM Security and Privacy Challenges
15 pages
Akash Software DevOps Expertise
No ratings yet
Akash Software DevOps Expertise
3 pages
PLAXIS General Information Manual
No ratings yet
PLAXIS General Information Manual
23 pages
Track Your Android Phone by IMEI
No ratings yet
Track Your Android Phone by IMEI
9 pages
I.T. Devices Quiz Questions
No ratings yet
I.T. Devices Quiz Questions
2 pages
Basic Word Processing Skills Lesson
No ratings yet
Basic Word Processing Skills Lesson
10 pages
Types of Mathematical Functions Explained
No ratings yet
Types of Mathematical Functions Explained
24 pages
Linux Shell Variables and Control Statements
No ratings yet
Linux Shell Variables and Control Statements
23 pages
Evolution of Information Technology
No ratings yet
Evolution of Information Technology
8 pages
Understanding Rewriting Systems in Language
No ratings yet
Understanding Rewriting Systems in Language
7 pages
Understanding Karnaugh Maps for Logic Simplification
No ratings yet
Understanding Karnaugh Maps for Logic Simplification
3 pages
Understanding Cyber Warfare Dynamics
No ratings yet
Understanding Cyber Warfare Dynamics
15 pages
Network Discovery Lab Guide
No ratings yet
Network Discovery Lab Guide
7 pages
Best 10 Sites To Buy Telegram Accounts in 2025
No ratings yet
Best 10 Sites To Buy Telegram Accounts in 2025
7 pages
Problem-Solving and Programming Basics
No ratings yet
Problem-Solving and Programming Basics
31 pages
CapCut Video Editing Tutorial - Beginner To Advanced CapCut Skill
No ratings yet
CapCut Video Editing Tutorial - Beginner To Advanced CapCut Skill
8 pages
Summit Switch Models Installation Guide
No ratings yet
Summit Switch Models Installation Guide
85 pages
Install Active Directory on Windows Server
No ratings yet
Install Active Directory on Windows Server
19 pages
Overview of FTP and HTTP Protocols
No ratings yet
Overview of FTP and HTTP Protocols
10 pages
Extract SMS from Encrypted iTunes Backups
No ratings yet
Extract SMS from Encrypted iTunes Backups
15 pages
Database and BI Developer Resume
No ratings yet
Database and BI Developer Resume
2 pages
FBS Mapping Code for SAP FM
No ratings yet
FBS Mapping Code for SAP FM
2 pages
Lightweight Key Agreement for IoT Fog Security
No ratings yet
Lightweight Key Agreement for IoT Fog Security
15 pages
Cloud Accounting Models Explained
No ratings yet
Cloud Accounting Models Explained
15 pages

DataWarehousing Notes

Uploaded by

DataWarehousing Notes

Uploaded by

Data Warehousing

Quick Reference Notes

Feature OLTP (Operational DB) OLAP (Data Warehouse)

Medallion / Layered Architecture

Concept What it is Example

Galaxy / Fact Constellation Schema

Type Strategy History kept? Use when…

Example — SCD Type 2

-- When customer moves city:

Why use Surrogate Keys?

Example — Generate Surrogate Keys

Example — Columnstore Index (SQL Server / Synapse)

-- Non-clustered columnstore on existing heap/rowstore

Type How it works Best for

Example — Range Partitioning by Year (SQL Server)

-- Step 2: Partition scheme

-- Step 3: Create partitioned table

-- Partition elimination in action:

Approach Description Platform

Example — Materialized View (Snowflake)

-- Queries on fact_sales with similar GROUP BY auto-use this MV

Function Purpose Example

Example — Deduplicate with ROW_NUMBER

-- 2. Load only new / changed rows

Component What it stores Example

Data Vault vs Dimensional Model

Quick Comparison — Key Differences

Example — Common Optimisations

-- 2. Query only needed columns (columnar store benefit)

-- 3. Synapse: Distributed table on fact join key

-- 4. Avoid row-by-row RBAR (Row By Agonising Row)

— End of Data Warehousing Notes —

You might also like