
Data Warehouse Normalization Explained

Name : Muhammad Younus Semester: 8 th Roll#: 16BS03 Subject: Data Warehouse Normalization is the process of organizing data in a database to minimize redundancy. It divides larger tables into smaller tables and links them using relationships. Normalization reduces data redundancy through three normal forms - first normal form requires single-valued attributes, second normal form removes partial dependencies, and third normal form removes transitive dependencies.

Uploaded by younus hassani
Copyright
© All Rights Reserved

Name: Muhammad Younus

Semester: 8th

Roll#: 16BS03
Subject: Data Warehouse
Normalization

• Normalization is the process of organizing data in a database.
• Normalization is used to minimize redundancy in a relation or set of relations.
• Normalization divides larger tables into smaller tables and links them using relationships.
Why do we use normalization?
We use normalization to reduce and eliminate data redundancy. This is an
important consideration for application developers, because it is very
difficult to store objects correctly in a relational database that keeps the
same information in several places: every copy has to be updated together.
[Figure: Unorganized table]
[Figure: The same data in normalized form]
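To make the redundancy problem concrete, here is a minimal Python sketch. The enrolment records, names and helper functions (`rename_in_one_row`, `names_for`) are hypothetical illustrations, not taken from the slides. A careless update that touches only one copy of a repeated value leaves the table inconsistent, while the normalized version stores the name once.

```python
# Hypothetical enrolment data: the student's name is repeated on every row.
unnormalized = [
    {"stu_id": 1, "stu_name": "Asad", "course": "DBMS"},
    {"stu_id": 1, "stu_name": "Asad", "course": "Networks"},
    {"stu_id": 2, "stu_name": "Bilal", "course": "DBMS"},
]


def rename_in_one_row(rows, stu_id, new_name):
    """Simulate a careless update that changes only the first matching row."""
    for row in rows:
        if row["stu_id"] == stu_id:
            row["stu_name"] = new_name
            break  # the remaining copies keep the old name


def names_for(rows, stu_id):
    """Collect every name stored for one student (should be exactly one)."""
    return {row["stu_name"] for row in rows if row["stu_id"] == stu_id}


rename_in_one_row(unnormalized, 1, "Asad Khan")
print(names_for(unnormalized, 1))  # two different names for one student

# Normalized version: each name is stored once, keyed by stu_id.
students = {1: "Asad", 2: "Bilal"}
enrolments = [(1, "DBMS"), (1, "Networks"), (2, "DBMS")]
students[1] = "Asad Khan"  # a single update, no stale copies
normalized_names = {students[s] for s, _ in enrolments if s == 1}
print(normalized_names)  # {'Asad Khan'}
```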
First Normal Form (1NF)

For a table to be in first normal form, it should follow these four rules:
I. Each column should hold single (atomic) values.
II. Values stored in a column should be of the same domain.
III. All columns in a table should have unique names.
IV. The order in which data is stored does not matter.
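Rules I to III can be checked mechanically. The sketch below works under stated assumptions: a table is a list of column names plus a list of row tuples, "atomic" means the value is not a Python list, tuple, set or dict, and "same domain" is approximated as "same Python type". The function name `check_1nf` and the sample data are hypothetical.

```python
from typing import Any, List, Sequence


def check_1nf(columns: Sequence[str], rows: Sequence[Sequence[Any]]) -> List[str]:
    """Return 1NF rule violations; an empty list means rules I-III pass."""
    problems = []
    # Rule III: all column names must be unique.
    if len(set(columns)) != len(columns):
        problems.append("duplicate column names")
    for i, name in enumerate(columns):
        values = [row[i] for row in rows]
        # Rule I: every value must be atomic (no lists, tuples, sets or dicts).
        if any(isinstance(v, (list, tuple, set, dict)) for v in values):
            problems.append(f"{name}: non-atomic value")
        # Rule II (approximation): all values in a column share one Python type.
        if len({type(v) for v in values}) > 1:
            problems.append(f"{name}: mixed domains")
    return problems


print(check_1nf(["roll_no", "name", "subject"],
                [(101, "Asad", ["OS", "CN"]), (102, "Bilal", "Java")]))
# ['subject: non-atomic value', 'subject: mixed domains']
```

Rule IV (row order does not matter) is a property of the relational model rather than of the stored values, so the sketch does not try to check it.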
[Figure: Unorganized relation]
We rearrange the relation (table) as shown below to convert it to First Normal Form.
[Figure: Relation in 1NF]

Each attribute must contain only a single value from its predefined domain.
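The rearrangement can be sketched as splitting a multi-valued column into one row per value. This is a minimal illustration assuming rows are Python dicts and a multi-valued cell is a list or tuple; the function `to_1nf`, the column names and the data are hypothetical, since the original figure is not recoverable.

```python
def to_1nf(rows, multi_valued):
    """Split a multi-valued column so that each output row holds one value."""
    flat = []
    for row in rows:
        values = row[multi_valued]
        if not isinstance(values, (list, tuple)):
            values = [values]  # already atomic: keep the row as it is
        for value in values:
            new_row = dict(row)  # copy the other attributes unchanged
            new_row[multi_valued] = value
            flat.append(new_row)
    return flat


unorganized = [
    {"roll_no": 101, "name": "Asad", "subject": ["OS", "CN"]},
    {"roll_no": 102, "name": "Bilal", "subject": "Java"},
]
for row in to_1nf(unorganized, "subject"):
    print(row)
```

After the split, roll_no alone no longer identifies a row; the key of the new relation becomes (roll_no, subject).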
Second Normal Form (2NF)
For a table to be in second normal form:
I. It should be in first normal form.
II. It should have no partial dependency.
Example: Relation not in 2NF

In the Student_Project relation shown here, the prime attributes, which together
form the composite key, are Stu_ID and Proj_ID. By the 2NF rule, the non-key
attributes Stu_Name and Proj_Name must depend on both key attributes together,
not on either one individually. Here, however, Stu_Name can be identified by
Stu_ID alone, and Proj_Name by Proj_ID alone. This is called partial
dependency, which Second Normal Form does not allow.
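Partial dependencies can be searched for by testing every proper subset of the composite key. The sketch below is illustrative only: `fd_holds`, `partial_dependencies` and the sample rows are hypothetical. Note that sample data can only refute a functional dependency; whether Stu_Name really depends on Stu_ID is a fact about the domain, so a check like this is a sanity test, not a proof.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """True if the functional dependency lhs -> rhs holds in these rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # same lhs value mapped to two different rhs values
    return True


def partial_dependencies(rows, key, non_key):
    """Find non-key attributes determined by a proper subset of the key."""
    found = []
    for attr in non_key:
        for size in range(1, len(key)):
            for subset in combinations(key, size):
                if fd_holds(rows, subset, [attr]):
                    found.append((subset, attr))
    return found


# Hypothetical Student_Project rows.
rows = [
    {"Stu_ID": 1, "Proj_ID": "P1", "Stu_Name": "Asad", "Proj_Name": "Library"},
    {"Stu_ID": 1, "Proj_ID": "P2", "Stu_Name": "Asad", "Proj_Name": "Payroll"},
    {"Stu_ID": 2, "Proj_ID": "P1", "Stu_Name": "Bilal", "Proj_Name": "Library"},
]
result = partial_dependencies(rows, ["Stu_ID", "Proj_ID"], ["Stu_Name", "Proj_Name"])
print(result)  # [(('Stu_ID',), 'Stu_Name'), (('Proj_ID',), 'Proj_Name')]
```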
[Figure: Relation in 2NF]
We broke the relation into two, as depicted in the figure above, so no partial
dependency remains.
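The figure with the exact 2NF split is not recoverable from this copy, so the sketch below does not reproduce it. It assumes a student may work on several projects; under that assumption the (Stu_ID, Proj_ID) pairing also has to be kept, as its own relation. The data is projected into relations without partial dependencies, and a natural join checks that the original rows can be rebuilt (a lossless decomposition). All function names and data are hypothetical.

```python
def project(rows, attrs):
    """Relational projection: keep only `attrs` and drop duplicate rows."""
    seen, out = set(), []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            out.append(dict(zip(attrs, values)))
    return out


def natural_join(left, right):
    """Join two relations on all attribute names they share."""
    if not left or not right:
        return []
    common = set(left[0]) & set(right[0])
    return [{**lrow, **rrow} for lrow in left for rrow in right
            if all(lrow[a] == rrow[a] for a in common)]


def as_set(rows):
    """Compare relations as sets of rows, ignoring row and column order."""
    return {frozenset(row.items()) for row in rows}


# Hypothetical Student_Project rows.
rows = [
    {"Stu_ID": 1, "Proj_ID": "P1", "Stu_Name": "Asad", "Proj_Name": "Library"},
    {"Stu_ID": 1, "Proj_ID": "P2", "Stu_Name": "Asad", "Proj_Name": "Payroll"},
    {"Stu_ID": 2, "Proj_ID": "P1", "Stu_Name": "Bilal", "Proj_Name": "Library"},
]

students = project(rows, ["Stu_ID", "Stu_Name"])     # Stu_ID -> Stu_Name
projects = project(rows, ["Proj_ID", "Proj_Name"])   # Proj_ID -> Proj_Name
assignments = project(rows, ["Stu_ID", "Proj_ID"])   # who works on what

rejoined = natural_join(natural_join(assignments, students), projects)
print(as_set(rejoined) == as_set(rows))  # True: nothing was lost
```

Each student name and project name is now stored once, so renaming a student touches a single row.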
Third Normal Form (3NF)
For a table to be in third normal form:
I. It should be in second normal form.
II. It should have no transitive dependency.
Example: Relation not in 3NF

In the Student_detail relation above, Stu_ID is the key and the only prime
attribute. City can be identified by Stu_ID, and also by Zip on its own. Zip is
not a superkey, and City is not a prime attribute. Because Stu_ID → Zip → City,
a transitive dependency exists.
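The three conditions in this argument (Stu_ID → Zip, Zip → City, Zip not a superkey) can be tested on sample rows. This is a minimal sketch: the Student_detail rows are hypothetical, the Stu_Name column is an assumption, and the Zip values are made up. As with any data-based check, it can only refute a dependency, not prove one.

```python
def fd_holds(rows, lhs, rhs):
    """True if the functional dependency lhs -> rhs holds in these rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


# Hypothetical Student_detail rows; the Zip values are made up.
rows = [
    {"Stu_ID": 1, "Stu_Name": "Asad", "Zip": "11000", "City": "Karachi"},
    {"Stu_ID": 2, "Stu_Name": "Bilal", "Zip": "11000", "City": "Karachi"},
    {"Stu_ID": 3, "Stu_Name": "Sana", "Zip": "22000", "City": "Lahore"},
]
attributes = list(rows[0])

stu_to_zip = fd_holds(rows, ["Stu_ID"], ["Zip"])       # Stu_ID -> Zip
zip_to_city = fd_holds(rows, ["Zip"], ["City"])        # Zip -> City
zip_is_superkey = fd_holds(rows, ["Zip"], attributes)  # Zip -> every attribute?

# City is not part of the key (Stu_ID), so it is not a prime attribute.
transitive = stu_to_zip and zip_to_city and not zip_is_superkey
print(transitive)  # True
```

The repeated ("11000", "Karachi") pair is exactly the redundancy 3NF removes: if that Zip's city changed, every student row with it would need updating.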
To bring this relation into third normal form, we break it into two relations
as follows:
[Figure: Relation in 3NF]
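Since the resulting relations are only shown in the lost figure, the sketch below assumes the usual split: Student_detail(Stu_ID, Stu_Name, Zip) and a hypothetical ZipCodes(Zip, City). It projects the hypothetical rows into the two relations and checks that a natural join on Zip rebuilds the original data. Helper names and data are illustrative only.

```python
def project(rows, attrs):
    """Relational projection: keep only `attrs` and drop duplicate rows."""
    seen, out = set(), []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            out.append(dict(zip(attrs, values)))
    return out


def natural_join(left, right):
    """Join two relations on all attribute names they share."""
    if not left or not right:
        return []
    common = set(left[0]) & set(right[0])
    return [{**lrow, **rrow} for lrow in left for rrow in right
            if all(lrow[a] == rrow[a] for a in common)]


def as_set(rows):
    """Compare relations as sets of rows, ignoring row and column order."""
    return {frozenset(row.items()) for row in rows}


# Hypothetical Student_detail rows; the Zip values are made up.
rows = [
    {"Stu_ID": 1, "Stu_Name": "Asad", "Zip": "11000", "City": "Karachi"},
    {"Stu_ID": 2, "Stu_Name": "Bilal", "Zip": "11000", "City": "Karachi"},
    {"Stu_ID": 3, "Stu_Name": "Sana", "Zip": "22000", "City": "Lahore"},
]

student_detail = project(rows, ["Stu_ID", "Stu_Name", "Zip"])  # key: Stu_ID
zip_codes = project(rows, ["Zip", "City"])                      # key: Zip

print(len(zip_codes))  # 2: each Zip -> City fact is now stored once
print(as_set(natural_join(student_detail, zip_codes)) == as_set(rows))  # True
```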

