1.
What is the role of Data Warehousing in Decision Support Systems
(DSS)?
● Answer:
○ Consolidates historical data from multiple sources.
○ Makes data readily available for complex analysis.
○ Supports OLAP for decision-making (both strategic and tactical).
2. What are the primary components of a Data Warehouse architecture?
● Answer:
1. Data sources: Operational systems and external data.
2. ETL (Extract, Transform, Load): The process for moving data into the
warehouse.
3. Data Warehouse: Central storage for processed data.
4. Data Marts: Subsets of data for specific departments.
3. What is data warehouse modeling, and what are the common models
used?
● Answer:
○ Data warehouse modeling: The design of schemas for efficient data
organization.
○ Common models:
1. Star Schema: Central fact table linked to dimension tables.
2. Snowflake Schema: A normalized version of the star schema.
3. Galaxy Schema: Combination of multiple star schemas.
4. What is granularity in the context of data warehousing, and why is it
important?
● Answer:
○ Granularity: Level of detail in the data stored (fine vs. coarse).
○ High granularity provides detailed data; low granularity provides aggregated data.
○ Affects both performance and analytical depth.
5. What are the factors that affect granularity in a data warehouse?
● Answer:
1. Business requirements: Determines the level of detail needed.
2. Storage costs: More granular data requires more storage.
3. Query performance: Fine granularity may slow down queries.
6. What are the key steps in building a data warehouse?
● Answer:
1. Data integration from multiple sources.
2. Data cleaning and transformation for consistency.
3. Schema design for optimized queries.
4. Storage system selection (cloud or on-premise).
5. Addressing challenges: data volume, consistency, and performance.
7. What is the difference between OLAP and OLTP systems?
● Answer:
○ OLTP (Online Transaction Processing): Handles transactional operations for
real-time consistency (e.g., ATM systems).
○ OLAP (Online Analytical Processing): Designed for complex analysis of large
datasets (e.g., Netflix recommendations).
8. What is the ETL process, and why is it important in data warehousing?
● Answer:
○ Extract: Gather data from various sources.
○ Transform: Clean, aggregate, and format data.
○ Load: Store the processed data in the data warehouse.
○ Ensures data quality, accessibility, and readiness for analysis.
9. What are the three types of OLAP servers, and what are the key
differences between them?
● Answer:
1. ROLAP (Relational OLAP): Uses relational databases, suitable for large
volumes.
2. MOLAP (Multidimensional OLAP): Uses multidimensional cubes for fast query
processing.
3. HOLAP (Hybrid OLAP): Combines ROLAP and MOLAP for flexibility.
10. What are the challenges and potential solutions for each tier in a
multi-tier data warehouse architecture?
● Answer:
○ Bottom Tier (Data Sources and Storage):
■ Challenges: Diverse data sources and scalable storage.
■ Solutions: Use robust databases and scalable cloud storage.
○ Middle Tier (OLAP Servers):
■ Challenges: Slow query processing for large datasets.
■ Solutions: Optimize OLAP models like MOLAP and HOLAP.
○ Top Tier (BI Tools):
■ Challenges: Providing fast, user-friendly access to data.
■ Solutions: Integrate advanced BI tools like IBM Cognos and Microsoft BI.
11. What is a Star Schema, and how is it used in data warehousing?
● Answer:
○ A Star Schema has a central fact table linked to dimension tables.
○ It is used for fast querying and efficient data analysis.
12. What is a Snowflake Schema, and how does it differ from a Star
Schema?
● Answer:
○ Snowflake Schema: A normalized version of the star schema, with dimension
tables broken into multiple related sub-tables.
○ Difference: Snowflake reduces redundancy, but is more complex than the Star
Schema.
13. What is a Galaxy Schema, and when is it used in data warehousing?
● Answer:
○ A Galaxy Schema (Fact Constellation Schema) uses multiple fact tables that
share common dimension tables.
○ Used for large, complex data warehouses with multiple business processes (e.g.,
sales and inventory).
14. What are the advantages and disadvantages of Star, Snowflake, and
Galaxy schemas?
● Answer:
○ Star Schema:
■ Advantages: Simple, fast querying.
■ Disadvantages: Data redundancy.
○ Snowflake Schema:
■ Advantages: Reduces redundancy, handles complex hierarchies.
■ Disadvantages: Slower performance, more complex queries.
○ Galaxy Schema:
■ Advantages: Suitable for complex multi-business process analysis.
■ Disadvantages: Complex to design and maintain.