Unit – C Normalization: Normalization concept, First normal form(1NF), Second Normal
form(2NF),Third Normal form(3NF), BCNF, Fourth normal form(4NF), Normalization using Multi-
valued Dependencies and Normalization using Functional Dependencies.
Normalization - Database normalization is the process of structuring a relational database in
accordance with a series of so-called normal forms in order to reduce data redundancy and
improve data integrity.
Normalization is an important process in database design that helps improve the database's
e iciency, consistency, and accuracy. It makes it easier to manage and maintain the data and
ensures that the database is adaptable to changing business needs.
Database normalization is the process of organizing the attributes of the database to reduce or
eliminate data redundancy (having the same data but at di erent places).
Data redundancy unnecessarily increases the size of the database as the same data is repeated
in many places. Inconsistency problems also arise during insert, delete, and update
operations.
In the relational model, there exist standard methods to quantify how e icient a databases is.
These methods are called normal forms and there are algorithms to covert a given database
into normal forms.
Normalization generally involves splitting a table into multiple ones which must be linked each
time a query is made requiring data from the split tables.
Why do we need Normalization?
The primary objective for normalizing the relations is to eliminate the below anomalies. Failure to
reduce anomalies results in data redundancy, which may threaten data integrity and cause
additional issues as the database increases. Normalization consists of a set of procedures that
assist you in developing an e ective database structure.
Insertion Anomalies: Insertion anomalies occur when it is not possible to insert data into a
database because the required fields are missing or because the data is incomplete. For
example, if a database requires that every record has a primary key, but no value is provided for
a particular record, it cannot be inserted into the database.
Deletion anomalies: Deletion anomalies occur when deleting a record from a database and can
result in the unintentional loss of data. For example, if a database contains information about
customers and orders, deleting a customer record may also delete all the orders associated
with that customer.
Updation anomalies: Updation anomalies occur when modifying data in a database and can
result in inconsistencies or errors. For example, if a database contains information about
employees and their salaries, updating an employee’s salary in one record but not in all related
records could lead to incorrect calculations and reporting.
Before Normalization: The table is prone to redundancy and anomalies (insertion, update, and
deletion).
After Normalization: The data is divided into logical tables to ensure consistency, avoid redundancy
and remove anomalies making the database e icient and reliable.
Features of Database Normalization
Elimination of Data Redundancy: One of the main features of normalization is to eliminate the
data redundancy that can occur in a database. Data redundancy refers to the repetition of data
in di erent parts of the database. Normalization helps in reducing or eliminating this
redundancy, which can improve the e iciency and consistency of the database.
Ensuring Data Consistency: Normalization helps in ensuring that the data in the database is
consistent and accurate. By eliminating redundancy, normalization helps in preventing
inconsistencies and contradictions that can arise due to di erent versions of the same data.
Simplification of Data Management: Normalization simplifies the process of managing data in
a database. By breaking down a complex data structure into simpler tables, normalization
makes it easier to manage the data, update it, and retrieve it.
Improved Database Design: Normalization helps in improving the overall design of the
database. By organizing the data in a structured and systematic way, normalization makes it
easier to design and maintain the database. It also makes the database more flexible and
adaptable to changing business needs.
Avoiding Update Anomalies: Normalization helps in avoiding update anomalies, which can
occur when updating a single record in a table a ects multiple records in other tables.
Normalization ensures that each table contains only one type of data and that the relationships
between the tables are clearly defined, which helps in avoiding such anomalies.
Standardization: Normalization helps in standardizing the data in the database. By organizing
the data into tables and defining relationships between them, normalization helps in ensuring
that the data is stored in a consistent and uniform manner.
Advantages of Normalization
Normalization eliminates data redundancy and ensures that each piece of data is stored in
only one place, reducing the risk of data inconsistency and making it easier to maintain data
accuracy.
By breaking down data into smaller, more specific tables, normalization helps ensure that
each table stores only relevant data, which improves the overall data integrity of the
database.
Normalization simplifies the process of updating data, as it only needs to be changed in one
place rather than in multiple places throughout the database.
Normalization enables users to query the database using a variety of di erent criteria, as
the data is organized into smaller, more specific tables that can be joined together as
needed.
Normalization can help ensure that data is consistent across di erent applications that use
the same database, making it easier to integrate di erent applications and ensuring that all
users have access to accurate and consistent data.
Disadvantages of Normalization
Normalization can result in increased performance overhead due to the need for
additional join operations and the potential for slower query execution times.
Normalization can result in the loss of data context, as data may be split across multiple
tables and require additional joins to retrieve.
Proper implementation of normalization requires expert knowledge of database design and
the normalization process.
Normalization can increase the complexity of a database design, especially if the data
model is not well understood or if the normalization process is not carried out correctly.
A functional dependency (FD) in a relational database management system (DBMS) is a
constraint between two sets of attributes (columns) in a table. It states that the value of one set
of attributes, called the determinant (left-hand side), uniquely determines the value of another
set of attributes, called the dependent (right-hand side).
Functional dependencies are crucial for database normalization, a process that minimizes
data redundancy and improves data integrity and consistency.
Notation
An FD is typically denoted using an arrow:
X -> Y
X is the Determinant (left-hand side).
Y is the Dependent (right-hand side).
Meaning: If two rows in the table have the same values for all attributes in X, then they must
also have the same values for all attributes in Y
Example: In an Employee table with attributes {EmployeeID, Name, Department}, if EmployeeID
is unique, then:
EmployeeID -> Name, Department
This means knowing the EmployeeID uniquely determines both the Name and Department.
Types of Functional Dependencies
Functional dependencies are classified into several types based on the relationship between
the determinant and the dependent set of attributes, which are particularly important in the
context of normalization:
Trivial Functional Dependency: An FD X -> Y is trivial if Y is a subset of X (Y \subseteq X).
This is always true and provides no new information about the data relationships.
o Example: {EmployeeID, Name\} -> EmployeeID
Non-Trivial Functional Dependency: An FD X -> Y is non-trivial if Y is not a subset of X (Y
\not\subseteq X). This shows an important relationship between attributes.
o Example: $EmployeeID \rightarrow Name$
Full Functional Dependency (used for 2NF): An FD $X \rightarrow Y$ holds if $Y$ is
functionally dependent on $X$ and is not functionally dependent on any proper subset of
$X$. This is only relevant when $X$ is a composite key.
o Example (Composite Key $\{OrderNo, ProductNo\}$): $\{OrderNo, ProductNo\}
\rightarrow Quantity$ (Quantity requires the full key).
Partial Functional Dependency (violates 2NF): An FD $X \rightarrow Y$ holds if a non-key
attribute $Y$ is functionally dependent on only a part of a composite candidate key $X$.
o Example (Composite Key $\{OrderNo, ProductNo\}$): $OrderNo \rightarrow
OrderDate$ (OrderDate only depends on part of the key).
Transitive Functional Dependency (violates 3NF): An FD $X \rightarrow Z$ holds when $X
\rightarrow Y$ and $Y \rightarrow Z$ and $Y$ is not a superkey. The dependency is indirect,
through the intermediate attribute set $Y$.
o Example: $EmployeeID \rightarrow DepartmentID$ and $DepartmentID \rightarrow
DepartmentName$. Therefore, $EmployeeID \rightarrow DepartmentName$ is a
transitive dependency.
Normal Forms in DBMS
First Normal Form (1NF)- A table is said to be in First Normal Form (1NF) if and only if:
All the attributes (columns) in the relation contain atomic (indivisible) values only.
There should be no repeating groups or arrays.
The First Normal Form (1NF) is the most basic rule for organizing data in a relational database. It
ensures that your data is structured correctly, making it easy to search, sort, and manage.
Core Principle: Atomic Values
The main rule of 1NF is that every column in every row of a table must contain atomic (indivisible)
values.
Atomic means that a column cannot hold multiple values, and it cannot hold a structure or
group of values that can be broken down further.
Think of it like a single cell in a spreadsheet: it should hold only one piece of information.
Violations of 1NF (What to Avoid)
A table violates 1NF in two main ways:
1. Repeating Groups
This is when a single row has multiple columns that are meant to hold the same kind of information.
StudentID StudentName Course1 Course2 Course3
101 Alice Math Physics History
102 Bob Chemistry Art
This table violates 1NF because Course1, Course2, and Course3 are essentially the same attribute
(Course) repeated across the columns. If Alice takes a fourth course, you'd have to add a new
column, which is bad database design.
2. Multi-Valued Attributes
This is when a single cell contains more than one value, often separated by commas or some other
delimiter.
StudentID StudentName Courses
101 Alice Math, Physics, History
102 Bob Chemistry, Art
This table violates 1NF because the Courses cell for Alice contains three separate courses. If you
wanted to find all students taking only Physics, it would be di icult to search this column. The value
is not atomic.
Achieving 1NF
To bring a table into 1NF, you must eliminate repeating groups and multi-valued attributes by
creating separate rows for each distinct atomic value.
The 1NF Solution Table - Instead of one row with multiple courses, you create a new row for every
course a student takes, duplicating the student's information as needed.
StudentID StudentName Course
101 Alice Math
101 Alice Physics
101 Alice History
102 Bob Chemistry
102 Bob Art
Why this works:
1. Atomicity: Every cell contains a single, indivisible value (one StudentID, one StudentName,
one Course).
2. Scalability: If a student takes a new course, you simply add a new row. You don't have to
change the structure (columns) of the table.
3. Searchability: It's now very easy to query the table, for example, to find all courses taken by
student '101' or all students taking 'Physics'.
In simple terms, 1NF is about ensuring that each piece of data gets its own dedicated, single slot in
the table.
Second Normal Form (2NF)
The Second Normal Form (2NF) is the next level of database normalization. It builds directly upon
1NF, adding a specific rule about how non-key columns relate to the primary key.
Prerequisite: You Must Already Be in 1NF
A table must satisfy 1NF before it can be evaluated for 2NF.
The Core Rule of 2NF
The rule for 2NF is:
Every non-key attribute must be fully functionally dependent on the entire Primary Key.
Let's break down this concept:
1. Primary Key vs. Composite Key
Primary Key: A column (or a set of columns) that uniquely identifies every row.
Composite Key: A Primary Key composed of two or more columns. 2NF is only relevant
when you have a composite key.
2. Functional Dependency
Definition: An attribute B is functionally dependent on an attribute A (written as $A
\rightarrow B$) if knowing the value of A uniquely determines the value of B.
o Example: $\text{StudentID} \rightarrow \text{StudentName}$ (If you know the ID, you
know the Name).
3. Partial Dependency (The 2NF Violation)
A partial dependency occurs when a non-key attribute is determined by only a part of the
composite Primary Key, not the entire key. This is the violation that 2NF eliminates.
Violation Example
Imagine a table used to track which instructors teach which courses, and what the tuition fee is.
CourseID InstructorID CourseName InstructorName TuitionFee
CSC101 I001 Database Dr. Smith $500
CSC101 I002 Database Ms. Jones $500
MTH202 I001 Calculus Dr. Smith $450
In this table:
Composite Primary Key: $(\text{CourseID}, \text{InstructorID})$ - You need both to
uniquely identify a row (e.g., to distinguish between Dr. Smith and Ms. Jones teaching the
same course).
Non-Key Attributes: $\text{CourseName}$, $\text{InstructorName}$, $\text{TuitionFee}$.
The Partial Dependencies:
1. $\text{CourseID} \rightarrow \text{CourseName}$ and $\text{CourseID} \rightarrow
\text{TuitionFee}$
o The $\text{CourseName}$ (Database) and $\text{TuitionFee}$ ($500) are determined
only by the $\text{CourseID}$ (CSC101), which is only part of the key.
o The $\text{TuitionFee}$ is repeated every time the course is listed with a di erent
instructor, which is redundancy.
2. $\text{InstructorID} \rightarrow \text{InstructorName}$
o The $\text{InstructorName}$ (Dr. Smith) is determined only by the
$\text{InstructorID}$ (I001), which is only part of the key.
o The $\text{InstructorName}$ is repeated for every course the instructor teaches.
Achieving 2NF (The Solution)
To achieve 2NF, you must remove the partial dependencies by splitting the original table into new
tables, where all non-key attributes are dependent on the entire Primary Key of their new table.
1. Course Table (Removing $\text{CourseID}$ Dependencies)
CourseID (PK) CourseName TuitionFee
CSC101 Database $500
MTH202 Calculus $450
Constraint: All columns are fully dependent on the entire Primary Key ($\text{CourseID}$).
2. Instructor Table (Removing $\text{InstructorID}$ Dependencies)
InstructorID (PK) InstructorName
I001 Dr. Smith
I002 Ms. Jones
Constraint: All columns are fully dependent on the entire Primary Key
($\text{InstructorID}$).
3. Enrollment/Teaching Table (The Relationship)
CourseID (Part of PK/FK) InstructorID (Part of PK/FK)
CSC101 I001
CSC101 I002
MTH202 I001
Constraint: This table only holds the key attributes and forms the relationship. Any
information you add here must be dependent on both ($\text{CourseID},
\text{InstructorID}$).
Benefits of 2NF:
Less Redundancy: The $\text{TuitionFee}$ and $\text{InstructorName}$ are now stored
only once.
Easier Updates: If the $\text{TuitionFee}$ for CSC101 changes, you only update one row in
the $\text{Course}$ table, not multiple rows in the original table.
Reduced Anomalies: You can now add a new course without assigning an instructor, or
add a new instructor without having them teach a course yet.
Third Normal Form (3NF)
The Third Normal Form (3NF) is the most widely applied level of normalization in practical
database design. It ensures that all data in a table directly describes the entity identified by the
primary key and nothing else.
Prerequisite: You Must Already Be in 2NF
A table must satisfy 2NF before it can be evaluated for 3NF.
1NF Rule: All cells must have atomic values.
2NF Rule: All non-key attributes must be fully dependent on the entire Primary Key
(relevant only with a composite key).
The Core Rule of 3NF
The rule for 3NF eliminates a specific type of dependency that still exists after 2NF:
There must be no transitive dependencies among non-key attributes.
Let's clarify what this means:
Transitive Dependency (The 3NF Violation)
A transitive dependency occurs when a non-key attribute is dependent on another non-key
attribute, instead of being directly dependent on the Primary Key.
In simpler terms: A non-key column shouldn't determine the value of another non-key
column.
The flow of dependency looks like this:
$$\text{Primary Key} \rightarrow \text{Non-Key Attribute A} \rightarrow \text{Non-Key Attribute B}$$
Violation Example (The Employee Department Table)
Imagine a table used to track employee information, including which department they work in and
where that department is located.
EmployeeI EmployeeNa Department DepartmentNa DepartmentLocati
D (PK) me ID me on
101 Alice D01 Sales New York
102 Bob D02 IT Chicago
103 Charlie D01 Sales New York
This table is in 2NF because:
The Primary Key is $\text{EmployeeID}$.
All other columns ($\text{EmployeeName}$, $\text{DepartmentID}$,
$\text{DepartmentName}$, $\text{DepartmentLocation}$) are directly dependent on
$\text{EmployeeID}$.
The Transitive Dependency:
1. The $\text{EmployeeID}$ determines the $\text{DepartmentID}$. ($\text{EmployeeID}
\rightarrow \text{DepartmentID}$)
2. The $\text{DepartmentID}$ determines the $\text{DepartmentName}$ and
$\text{DepartmentLocation}$. ($\text{DepartmentID} \rightarrow \text{DepartmentName},
\text{DepartmentLocation}$)
The non-key attributes $\text{DepartmentName}$ and $\text{DepartmentLocation}$ are
determined by another non-key attribute ($\text{DepartmentID}$), not directly by the
$\text{EmployeeID}$.
The Redundancy Problem:
The $\text{DepartmentName}$ and $\text{DepartmentLocation}$ (e.g., Sales, New York) are
repeated for every employee in that department.
If the Sales department moves to London, you would have to update multiple rows, leading
to potential update anomalies (if you miss one employee's row, the data becomes
inconsistent).
Achieving 3NF (The Solution)
To achieve 3NF, you must remove the transitive dependencies by moving the transitively
dependent attributes into a new, separate table. The determinant non-key attribute
($\text{DepartmentID}$) becomes the Primary Key of the new table.
1. Employee Table (The Primary Entity)
EmployeeID (PK) EmployeeName DepartmentID (FK)
101 Alice D01
102 Bob D02
103 Charlie D01
Constraint: All non-key attributes ($\text{EmployeeName}$ and $\text{DepartmentID}$) are
dependent only on the $\text{EmployeeID}$.
2. Department Table (The Secondary Entity)
DepartmentID (PK) DepartmentName DepartmentLocation
D01 Sales New York
D02 IT Chicago
Constraint: All columns are fully dependent on the $\text{DepartmentID}$.
Benefits of 3NF:
Minimal Redundancy: Department details are stored only once.
Easy Updates: If the Sales department moves, you only update one row in the
$\text{Department}$ table.
Reduced Anomalies: You can add a new department to the $\text{Department}$ table even
if no employee has been assigned to it yet.
Boyce-Codd Normal Form (BCNF)
The Boyce-Codd Normal Form (BCNF) is a stricter version of 3NF. It addresses a specific,
uncommon, but critical anomaly that can still occur in tables that are already in 3NF, especially
when the table has multiple overlapping composite candidate keys.
Prerequisite: You Must Already Be in 3NF
A table must satisfy 3NF before it can be evaluated for BCNF.
3NF Rule: Eliminate transitive dependencies (No non-key column determines another non-
key column).
The Core Rule of BCNF
The BCNF rule simplifies the 3NF rule by applying the concept of functional dependency to all
determinants (attributes that determine others):
For every non-trivial functional dependency ($X \rightarrow Y$), $X$ must be a superkey.
Let's break down this concept:
1. Functional Dependency ($X \rightarrow Y$): Knowing the value of $X$ uniquely
determines the value of $Y$.
2. Trivial Dependency: When $Y$ is a subset of $X$ (e.g., $(\text{StudentID}, \text{Course})
\rightarrow \text{Course}$). BCNF only cares about non-trivial dependencies.
3. Superkey: Any attribute or set of attributes that uniquely identifies a row (this includes the
Primary Key and any other Candidate Keys).
In simpler terms: The only columns that should be able to determine the value of other
columns in the table are the ones that uniquely identify the entire row (the superkeys).
Violation Example
Consider a table tracking students and their advisors for specific subjects.
StudentID Subject Advisor
S100 Physics Dr. Smith
S100 Math Dr. Jones
S200 Math Dr. Jones
S300 Physics Dr. Smith
Facts and Dependencies:
1. A student can have multiple subjects, but a student has only one advisor for a given
subject.
o Candidate Key 1 (Primary Key): $(\text{StudentID}, \text{Subject})$
2. An advisor is only assigned to one specific subject (i.e., a Physics advisor only advises
on Physics).
o Dependency: $\text{Advisor} \rightarrow \text{Subject}$ (Non-trivial functional
dependency)
The Violation:
The table is in 3NF. There are no transitive dependencies ($\text{PK} \rightarrow \text{Non-
Key A} \rightarrow \text{Non-Key B}$).
The table violates BCNF because of the dependency: $\text{Advisor} \rightarrow
\text{Subject}$
o The determinant is $\text{Advisor}$.
o Is $\text{Advisor}$ a Superkey? No, because $\text{Advisor}$ (Dr. Smith) does not
uniquely identify the row (S100, S300 both have Dr. Smith as an advisor).
o Because $\text{Advisor}$ is not a Superkey but is determining another attribute
($\text{Subject}$), the table is not in BCNF.
The Problem:
Redundancy: The relationship between $\text{Advisor}$ and $\text{Subject}$ (e.g., Dr.
Smith, Physics) is repeated for every student advised by Dr. Smith.
Update Anomaly: If Dr. Smith switches subjects from Physics to Chemistry, you must
update multiple rows, risking inconsistency.
Achieving BCNF
To achieve BCNF, you must separate the attribute that is acting as a non-superkey determinant
($\text{Advisor}$) and the attribute it determines ($\text{Subject}$) into a new table.
1. Advisor-Subject Table (The New Entity)
Advisor (PK) Subject
Dr. Smith Physics
Dr. Jones Math
Constraint: Here, $\text{Advisor}$ is the Superkey (Primary Key), and it fully determines the
$\text{Subject}$. BCNF achieved for this table.
2. Student-Advisor Table (The Relationship)
StudentID (Part of PK/FK) Advisor (Part of PK/FK)
S100 Dr. Smith
StudentID (Part of PK/FK) Advisor (Part of PK/FK)
S100 Dr. Jones
S200 Dr. Jones
S300 Dr. Smith
Constraint: The Primary Key is $(\text{StudentID}, \text{Advisor})$. BCNF achieved for this
table. (No other columns, so the rule is vacuously true).
Fourth Normal Form (4NF) Explained
The Fourth Normal Form (4NF) is an advanced level of normalization that goes beyond the
concepts of functional and transitive dependencies addressed by 3NF and BCNF. It specifically
addresses a type of redundancy called Multi-valued Dependency (MVD).
Prerequisite: You Must Already Be in BCNF
A table must satisfy BCNF before it can be evaluated for 4NF.
BCNF Rule: Every determinant (an attribute that determines others) must be a Superkey.
The Core Rule of 4NF
The rule for 4NF is:
A table is in 4NF if and only if it is in BCNF and contains no non-trivial Multi-valued
Dependencies (MVDs) other than a candidate key.
What is a Multi-valued Dependency (MVD)?
An MVD exists when one attribute ($A$) determines multiple values for another attribute ($B$)
independently of other attributes ($C$) in the table. It is denoted as $A \twoheadrightarrow B$.
Key Insight: MVDs typically occur in tables with at least three columns where two of the
non-key attributes are independent of each other, but both are dependent on the key.
The problem is that this independence forces a Cartesian product of the independent
values to be listed for every row, leading to massive redundancy.
Violation Example (The Student Activity Table)
Imagine a table to track students, the courses they take, and their hobbies. Assume the following
real-world rules:
1. A student can take multiple courses.
2. A student can have multiple hobbies.
3. The courses a student takes are completely independent of their hobbies.
StudentID Course Hobby
S100 Math Cricket
S100 Math Chess
S100 Physics Cricket
S100 Physics Chess
S200 History Reading
The Dependencies:
Primary Key (Candidate Key): $(\text{StudentID}, \text{Course}, \text{Hobby})$ — All three
attributes are needed to uniquely identify a row.
Multi-valued Dependencies (MVDs):
o $\text{StudentID} \twoheadrightarrow \text{Course}$ (A student has multiple
courses).
o $\text{StudentID} \twoheadrightarrow \text{Hobby}$ (A student has multiple
hobbies).
The Redundancy Problem:
The table must list every combination of a student's courses with their hobbies to represent all the
facts.
Student S100 takes two courses ($\text{Math}$, $\text{Physics}$) and has two hobbies
($\text{Cricket}$, $\text{Chess}$).
The table must have $2 \times 2 = 4$ rows, even though $\text{Math}$ isn't related to
$\text{Cricket}$ any more than it is to $\text{Chess}$. The $\text{Course}$ and
$\text{Hobby}$ data are cross-multiplied unnecessarily.
If S100 takes a 3rd course, say Chemistry, you must add two new rows (one for
Chemistry/Cricket and one for Chemistry/Chess), leading to an Insertion Anomaly.
The table is in BCNF (since the only determinant is the $\text{StudentID}$ which is part of the
composite key, and $\text{StudentID}$ alone is not a superkey, but the dependencies are MVDs, not
Functional Dependencies). However, the table is not in 4NF because it contains two non-trivial
MVDs ($\text{StudentID} \twoheadrightarrow \text{Course}$ and $\text{StudentID}
\twoheadrightarrow \text{Hobby}$) determined by the same attribute ($\text{StudentID}$).
Achieving 4NF (The Solution)
To achieve 4NF, you must separate the independent multi-valued attributes into their own
tables.
1. Student-Course Table
StudentID (PK/FK) Course (PK)
S100 Math
S100 Physics
S200 History
2. Student-Hobby Table
StudentID (PK/FK) Hobby (PK)
S100 Cricket
S100 Chess
S200 Reading
Benefits of 4NF:
Eliminates Redundancy: Course and Hobby information are now stored only once per
relationship, eliminating the cross-product explosion.
No Anomalies: If Student S100 adds $\text{Chemistry}$ as a course, you add only one row
to the $\text{Student-Course}$ table, not multiple rows to the original table.
4NF is the goal for eliminating redundancy caused by multiple, independent one-to-many
relationships in a single table.
Fifth Normal Form (5NF) / Project-Join Normal Form (PJNF)
The Fifth Normal Form (5NF), also known as Project-Join Normal Form (PJNF), is the highest level
of normalization concerned with eliminating redundancy. It deals with a more complex dependency
called Join Dependency.
Prerequisite: You Must Already Be in 4NF
A table must satisfy 4NF before it can be evaluated for 5NF.
4NF Rule: The table must be in BCNF and contain no non-trivial Multi-valued
Dependencies (MVDs).
The Core Rule of 5NF
The rule for 5NF is:
A table is in 5NF if and only if it is in 4NF, and every non-trivial Join Dependency in that table is
implied by the candidate keys.
In simpler terms, a table in 4NF is in 5NF if it cannot be losslessly decomposed into any number
of smaller tables without the decomposition being based on the table's candidate keys.
What is a Join Dependency (JD)?
A Join Dependency is a constraint on a table ($R$) stating that it can be losslessly reconstructed by
joining three or more smaller tables ($R_1, R_2, \dots, R_n$), where each smaller table is a
projection (subset of columns) of the original table.
$$R = R_1 \bowtie R_2 \bowtie \dots \bowtie R_n$$
Trivial JD: A JD is trivial if one of the decomposed tables ($R_i$) contains all of the attributes
of the original table ($R$). This is not a concern for 5NF.
Non-Trivial JD: A non-trivial JD exists if you need to join three or more projections to get
back the original table, and this decomposition is not a result of a superkey. MVD is a
special case of JD where the decomposition is into only two projections ($n=2$).
Violation Example
5NF violations are rare and typically occur when there is a compound constraint that restricts valid
data combinations, which cannot be expressed using functional or multi-valued dependencies.
Consider a table called Supplier-Part-Project with the schema (Supplier, Part, Project), where the
primary key is the combination of all three: $(\text{Supplier}, \text{Part}, \text{Project})$.
Supplier Part Project
S1 P1 J1
S1 P2 J2
S2 P1 J1
Assume a complex rule:
If supplier $S$ supplies part $P$, AND supplier $S$ supplies a part to project $J$, AND part $P$ is
used in project $J$, THEN supplier $S$ must supply part $P$ to project $J$.
This constraint implies a non-trivial Join Dependency $JD(\{\text{Supplier}, \text{Part}\},
\{\text{Supplier}, \text{Project}\}, \{\text{Part}, \text{Project}\})$.
The table is in 4NF because the only key is the entire table, and there are no non-trivial
MVDs (you can't decompose it into just two tables without a lossy join or creating spurious
tuples).
However, it violates 5NF because the table can be losslessly decomposed into three
relations, and this dependency is not implied by the candidate key.
Solution: Decompose to 5NF
To satisfy 5NF, we must decompose the original table into the three smaller tables:
1. $\text{R}_1(\text{Supplier}, \text{Part})$
2. $\text{R}_2(\text{Supplier}, \text{Project})$
3. $\text{R}_3(\text{Part}, \text{Project})$
This decomposition eliminates the redundancy and ensures that the complex constraint is
implicitly maintained, preventing the insertion of invalid combinations.
In Practice
5NF is highly theoretical and rarely applied in commercial database design, as tables that violate
4NF but not 5NF are infrequent. Normalization usually stops at BCNF or 4NF.
Normal
Forms Description of Normal Forms
First Normal A relation is in first normal form if every attribute in that relation is single-valued
Form (1NF) attribute.
Second A relation that is in First Normal Form and every non-primary-key attribute is
Normal Form fully functionally dependent on the primary key, then the relation is in Second
(2NF) Normal Form (2NF).
A relation is in the third normal form, if there is no transitive dependency for
non-prime attributes as well as it is in the second normal form. A relation is in
3NF if at least one of the following conditions holds in every non-trivial function
dependency X –> Y.
X is a super key.
Third Normal
Form (3NF) Y is a prime attribute (each element of Y is part of some candidate key).
Normal
Forms Description of Normal Forms
For BCNF the relation should satisfy the below conditions
The relation should be in the 3rd Normal Form.
Boyce-Codd
Normal Form X should be a super-key for every functional dependency (FD) X−>Y in a
(BCNF) given relation.
A relation R is in 4NF if and only if the following conditions are satisfied:
Fourth
It should be in the Boyce-Codd Normal Form (BCNF).
Normal Form
(4NF) The table should not have any Multi-valued Dependency.
A relation R is in 5NF if and only if it satisfies the following conditions:
R should be already in 4NF.
Fifth Normal
Form (5NF) It cannot be further non loss decomposed (join dependency).