MODULE 4: NORMALISATION
● Inference Rule
● DBMS Normalisation
● 1NF, 2NF, 3NF, BCNF, 4NF, 5NF
Functional Dependency
● Relational Decomposition
● Multivalued Dependency
● Join Dependency
● Inclusion Dependence
● Canonical Cover
INTRODUCTION
Normalization is a systematic approach to organizing data in a database to minimize
redundancy and ensure data integrity. It divides large tables into smaller ones and
defines relationships among them. This module covers key concepts like functional
dependency, normalization rules, and dependency types.
INFERENCE RULES
Inference rules in databases are also known as Armstrong’s Axioms in Functional
Dependency. These rules govern the functional dependencies in a relational
database. From inference rules a new functional dependency can be derived. These
rules were introduced by William W. Armstrong. They ensure that we can simplify
and validate the dependencies to reduce redundancy.
Key Rules:
Reflexive Rule: According to this rule, if B is a subset of A Then A logically
determines B. Formally, B ⊆ A then A → B. Example: Let us take an example
1
of the Address (A) of a house, which contains so many parameters like House no,
Street no, City etc. These all are the subsets of A. Thus, address (A) → House
no. (B).
Augmentation Rule: It is also known as Partial dependency. According to this
rule, If A logically determines B, then adding any extra attribute doesn't change
the basic functional dependency. Example: A → B, then adding any extra
attribute let say C will give AC → BC and doesn't make any change.
Transitive rule: Transitive rule states that if A determines B and B determines
C, then it can be said that A indirectly determines B. Example: If A → B and B
→ C then A → C.
NORMALIZATION
• Normalization in Database Management Systems is the process of organizing data
in a database. It involves structuring data in a way that minimizes data redundancy
and emphasizes data integrity
• Data Redundancy is the unnecessary repetition of data.
• Data Integrity is the trustworthiness of data throughout its life cycle.
• The main goal of normalization is to avoid anomalies from additions, deletions
and modifications by allowing the aforementioned processes to be performed in one
table and then propagated through out the database through relationships.
Normalization allows the generation of a suitable model for database design.
• Database design refers to the organization of data according to a database model
i.e. the set of procedures and tasks involved in implementing a database.
• In the process of developing a database, the sequence of steps involves data
modeling, normalization and database design lastly although these steps often
overlap.
• Data modeling is the first step where you define how data is connected and what
it looks like. This stage involves identifying the key data entities and their
relationships. It’s about constructing a conceptual model, usually represented
through entity-relationship diagrams (ERDs) or UML diagrams. The goal here is to
2
map out the general data structures needed to support the requirements of the
business or application.
Normalization is applied to the data structures to reduce data redundancy and
improve data integrity. This process ensures that the database follows rules that
reduce anomalies and enhance the stability and efficiency of the database.
Normalization typically reshapes the data model by adjusting how data is stored in
tables and how the tables relate to each other.
• Finally, the detailed database design takes place, which includes the physical
implementation of the database. This step involves converting the conceptual and
logical design (from the data modeling and normalization steps) into a physical
database structure. It includes defining the actual database schema, creating tables,
indexes, constraints, and establishing security settings. This step also considers the
specific DBMS (Database Management System) features and optimizations that
could be utilized.
Objectives of Normalization
• Reduce Redundancy: Eliminate duplicate data, which not only reduces storage
needs but also minimizes the risk of inconsistencies cropping up in the database.
• Eliminate Anomalies: Make the database structure robust against anomalies in
transaction processing. This includes update, insert, and delete anomalies.
• Increase Data Integrity: Ensure the accuracy and consistency of data within the
database.
• Optimize Queries: Better organized data can be accessed quicker and more
efficiently, improving the performance of database queries.
Anomalies refer to problems that arise in a database due to poor database
design particularly in terms of how data is stored and managed. These anomalies can
lead to inconsistent, redundant, or inaccurate data, and generally manifest in three
primary types: insertion, update, and deletion anomalies. Here's a general
breakdown:
Insertion Anomalies: These occur when you are unable to insert data into the
database because other, perhaps unrelated data is not present or because the
3
schema restricts it. This can lead to situations where complete and necessary data
cannot be stored without the presence of other optional or currently unknown
data.
Update Anomalies: These occur when changes to data in one part of the database
do not propagate correctly or completely to other parts of the database that should
reflect the same data. This leads to inconsistency and can make the database
reflect outdated or conflicting information.
Deletion Anomalies: These occur when the deletion of data intended to remove
one piece of information unintentionally results in the removal of valuable data
that could be used in other contexts. This can happen when multiple pieces of
information are unnecessarily combined into a single record.
Addressing these anomalies through proper database normalization is crucial for
maintaining the integrity, reliability, and efficiency of data management within an
organization. The process of normalization adjusts the schema to ensure that each
type of data is stored logically and independently from unrelated data, which helps
in minimizing these problems and supporting the robustness of database operations.
Normalization tackles these issues by structuring data into logical relationships that
minimize redundancy. Normalization involves several stages called "normal forms,"
each imposing stricter conditions and designed to check for a particular type of
redundancy and data dependency. The most commonly used normal forms are:
First Normal Form (1NF): Ensures that the table has a primary key and that all
columns contain atomic (indivisible) values, meaning no repeating groups or
arrays hence each record within a table is unique.
Second Normal Form (2NF): Achieved when a table is in 1NF and all non-key
attributes are fully dependent on the primary key alone, not just a part of it. This
eliminates partial dependency.
Third Normal Form (3NF): A table is in 3NF if it is in 2NF and all its columns
are not only fully dependent on the primary key but also non-transitively
dependent (i.e., no in-direct dependency on the primary key).
4
Boyce-Codd Normal Form (BCNF): Sometimes considered an extension of
3NF, BCNF is stronger than 3NF. A table is in BCNF if, for every one of its
dependencies (A → B), A is a superkey—a subset of a key of the table.
Further forms like 4NF and 5NF address more complex situations involving
multi-valued dependencies and join dependencies, but they are less commonly
applied in practical scenarios.
Example:
You manage a database for a university that tracks students, courses, and their
enrollment details. The goal is to normalize the data step by step to eliminate
redundancy and anomalies.
Consider this simple table that is not normalized:
1NF (First Normal Form)
To achieve 1NF, ensure each column has atomic values (no multivalued attributes).
Split the repeating groups into separate rows.
StudentID Name Course Instructor
1 Alice Math John
1 Alice Science Smith
2 Bob Science Smith
2 Bob English Carter
2NF (Second Normal Form)
For a table to be in 2NF, a table has to be in 1NF and there are no partial
dependencies in it. A partial dependency occurs when a non-prime attribute
depends on part of a composite key.
5
In the table above, Name depends only on StudentID, not the combination of
StudentID and Course.
Decompose into two tables:
Students Table:
StudentID Name
1 Alice
2 Bob
Enrollment Table:
StudentID Course Instructor
1 Math John
1 Science Smith
2 Science Smith
2 English Carter
3NF (Third Normal Form)
To achieve 3NF, remove transitive dependencies. A transitive dependency exists
when a non-prime attribute depends on another non-prime attribute.
Here, Instructor depends on Course, not directly on StudentID.
6
Decompose into three tables:
Students Table:
StudentID Name
1 Alice
2 Bob
Courses Table:
Course Instructor
Math John
Science Smith
English Carter
Enrollment Table:
StudentID Course
1 Math
1 Science
2 Science
2 English
BCNF (Boyce-Codd Normal Form)
To achieve BCNF, ensure every determinant is a candidate key.
The current design already satisfies BCNF because all functional
dependencies are based on candidate keys.
7
Fourth Normal Form (4NF)
A table is in 4NF if it is in BCNF and has no multi-valued dependencies
(MVDs).
MVD occurs when one attribute depends on another independently of other
attributes.
If a table has multiple independent many-to-many relationships, it should be
split into separate tables.
Example (Before 4NF - MVD present):
Student Course Hobby
Alex Math Guitar
Alex Math Painting
Solution (After 4NF - Separate tables):
Student_Course Table:
Student Course
Alex Math
Student_Hobby Table:
Student Hobby
Alex Guitar
Alex Painting
Fifth Normal Form (5NF) (Project-Join Normal Form)
A table is in 5NF if it is in 4NF and cannot be further decomposed
without losing information.
It deals with join dependencies, where a table contains indirect
relationships between attributes.
If a table contains multiple independent relationships, it is split into three or
more tables to avoid redundancy.
Example (Before 5NF - Join Dependency present):
Employee Project Vendor
Alice Alpha IBM
Alice Beta HP
8
Solution (After 5NF - Further split):
Employee_Project Table:
Employee Project
Alice Alpha
Alice Beta
Project_Vendor Table:
Project Vendor
Alpha IBM
Beta HP
Benefits of Normalization
Eliminates Redundancy: Data is stored only once, reducing duplication.
Prevents Anomalies: Avoids insertion, deletion, and update anomalies.
Improves Data Integrity: Relationships between data are clear and
consistent.
FUNCTIONAL DEPENDENCY
Functional dependency is a relationship between two sets of attributes in a database,
where the value of one attribute (or set of attributes) is determined by the value of
another attribute (or set of attributes). It means that for each value of the determining
attribute, there is precisely one possible value of the determined attribute(s)
For example, in a student database, if you know the StudentID, you know the
StudentName expressed as StudentID -> StudentName
9
RELATIONAL DECOMPOSITION
Relational Decomposition refers to the process of breaking down a complex relation
(table) into smaller, simpler relations while preserving the essential information and
ensuring no data anomalies. This process is crucial in database normalization,
aiming to eliminate redundancy and dependency issues.
Goals of Relational Decomposition
Preserve Data Dependency: Ensure that functional dependencies in the original
relation are maintained in the decomposed relations.
Lossless Join: Guarantee that no data is lost during decomposition. The original
table can be reconstructed using the decomposed tables.
Dependency Preservation: Maintain all functional dependencies to avoid the
need for costly joins during queries.
Types of Relational Decomposition
Lossless-Join Decomposition:
Ensures that the original relation can be perfectly reconstructed by joining the
decomposed relations. A decomposition is lossless if at least one attribute is common
and is a candidate key in at least one of the decomposed relations.
Example:
Original Relation:
StudentID Name Course
1 Alice Math
2 Bob Science
10
Decomposed Relations:
Students:
StudentID Name
1 Alice
2 Bob
Enrollments:
StudentID Course
1 Math
2 Science
The join of these two tables recreates the original relation.
Dependency-Preserving Decomposition:
Ensures that all functional dependencies in the original relation are preserved in the
decomposed relations.
Functional dependencies should be enforceable without recombining the relations.
Steps in Relational Decomposition
Identify Functional Dependencies (FDs): Analyze the table to determine all
functional dependencies.
Apply Decomposition Rules: Split the table based on normalization
requirements (1NF, 2NF, 3NF, BCNF).
Test for Lossless Join: Verify that the decomposed relations can reconstruct the
original table.
Check Dependency Preservation: Ensure that all functional dependencies are
maintained across the decomposed relations.
11
Advantages of Relational Decomposition
Reduces Redundancy: Eliminates duplicate data, saving storage and
preventing inconsistencies.
Improves Query Performance: Smaller tables are faster to query.
Eliminates Anomalies: Prevents insertion, update, and deletion anomalies.
Challenges in Relational Decomposition
Complexity: Decomposition and normalization require careful analysis and
can increase design complexity.
Performance Trade-offs: Excessive decomposition can lead to frequent
joins, affecting performance.
Dependency Management: Ensuring dependency preservation across
multiple tables can be challenging.
MULTIVALUED DEPENDENCY (MVD)
A MVD is a type of dependency in a relational database where one attribute in a
table is independently associated with multiple values of another attribute, while
preserving other attributes in the table. It typically occurs when a table contains
attributes that do not depend on each other but share a common primary key.
Definition
For a relation R(A,B,C) a multivalued dependency A↠B exists if:
For a fixed value of A, the set of values for B is independent of the set of
values for C.
Properties of MVD
Symmetry: If A↠B, then A↠C also holds, where C is all other attributes except A and B.
Non-triviality: An MVD A↠B is non-trivial if B is not a subset of A and A∪B≠R.
Transitivity: If A↠B and B↠C, then A↠C.
12
Example of MVD
Consider a table of Students:
StudentID Subject Hobby
1 Math Painting
1 Science Painting
1 Math Reading
1 Science Reading
Here, StudentID ↠ Subject because each student can independently choose
multiple subjects.
Similarly, StudentID ↠ Hobby because each student can have multiple
hobbies independent of their chosen subjects.
Use of MVD
Multivalued dependencies are crucial in database normalization, especially in
achieving Fourth Normal Form (4NF), which eliminates redundancy caused by
MVDs.
JOIN DEPENDENCY (JD)
A Join Dependency (JD) occurs when a table can be reconstructed by joining
multiple tables without any loss of information. It is a generalization of
Multivalued Dependency (MVD) and is used to identify cases where a relation is
better decomposed into smaller relations.
Definition
For a relation R and its decomposition into sub-relations R1,R2,…,Rn , a Join
Dependency exists if:
13
The original table R can be perfectly reconstructed by performing a natural
join on R1,R2,…,Rn.
Example of JD
Consider a relation R(Teacher,Course,Department):
Teacher Course Department
Alice Math Science
Bob Physics Science
Alice Chemistry Science
This can be decomposed into:
1. R1(Teacher,Course):
Teacher Course
Alice Math
Bob Physics
Alice Chemistry
2. R2(Teacher,Department):
Teacher Department
Alice Science
Bob Science
Rejoining R1 and R2 will perfectly reconstruct the original relation R, satisfying
the Join Dependency.
14
Use of JD
Join Dependency is essential in achieving Fifth Normal Form (5NF), ensuring
that a table is decomposed to eliminate redundancy without losing any information.
INCLUSION DEPENDENCE
Inclusion Dependence is a type of constraint in a relational database that expresses
a relationship between two sets of attributes across two tables, ensuring that values
in one set are included in the values of another set. It is used to maintain referential
integrity between relations.
Key Features:
1. Referential Integrity: Inclusion dependence enforces that foreign key
values in one table match primary key values in another.
2. Notation: If R[X]⊆S[Y], it implies that every value of attribute X in relation
R must exist in attribute Y of relation S.
3. Real-world Example: In a school database:
o Students(StudentID, Name, ClassID)
o Classes(ClassID, ClassName, TeacherID)
Inclusion dependence ensures that every ClassID in Students exists in
Classes.
Example:
Consider the following tables:
Table 1: Employees
Employee_ID Name Department_ID
101 Alice 1
102 Bob 2
15
Table 2: Departments
Department_ID Department_Name
1 HR
2 IT
Inclusion dependence ensures that all Department_ID values in Employees exist in
Departments.
Significance:
Prevents inconsistencies records.
Enforces the logical relationships between tables in a database.
CANONICAL COVER
A Canonical Cover is a simplified representation of a set of functional
dependencies in a database. It removes redundancy while preserving the original
meaning and constraints.
Steps to Compute Canonical Cover:
1. Decompose Functional Dependencies:
Break composite attributes into atomic ones (e.g., A→BC becomes A→B
and A→C).
2. Eliminate Extraneous Attributes:
Remove unnecessary attributes from functional dependencies.
3. Remove Redundant Dependencies:
If a functional dependency can be inferred from others, eliminate it.
Example:
Given the functional dependencies:
1. A→BC
2. B→C
16
3. A→B
Canonical Cover:
1. Decompose A→BC into A→B and A→C.
o Result: A→B, A→C, B→C.
2. Remove redundancy:
o A→B and B→C imply A→C.
o Remove A→C.
Canonical Cover: A→B, B→C.
17