0% found this document useful (0 votes)
9 views9 pages

Understanding Database Normalization Techniques

Normalization is the process of organizing data in a database to reduce redundancy and eliminate anomalies such as insertion, updation, and deletion anomalies. There are four main normal forms: 1NF, 2NF, 3NF, and BCNF, each with specific rules to ensure data integrity and reduce redundancy. The document also discusses the Fourth Normal Form (4NF) which addresses multi-valued dependencies, providing examples for better understanding.

Uploaded by

preethishree900
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOC, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
9 views9 pages

Understanding Database Normalization Techniques

Normalization is the process of organizing data in a database to reduce redundancy and eliminate anomalies such as insertion, updation, and deletion anomalies. There are four main normal forms: 1NF, 2NF, 3NF, and BCNF, each with specific rules to ensure data integrity and reduce redundancy. The document also discusses the Fourth Normal Form (4NF) which addresses multi-valued dependencies, providing examples for better understanding.

Uploaded by

preethishree900
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOC, PDF, TXT or read online on Scribd

Module 4

Normalization is the process of organizing the data and the attributes of a database. It is
performed to reduce the data redundancy in a database and to ensure that data is stored logically.
Data redundancy in DBMS means having the same data but at multiple places. It is necessary to
remove data redundancy because it causes anomalies in a database which makes it very hard for
a database administrator to maintain it.

Why Do We Need Normalization?

As we have discussed above, normalization is used to reduce data redundancy. It provides a


method to remove the following anomalies from the database and bring it to a more consistent
state:

A database anomaly is a flaw in the database that occurs because of poor planning and
redundancy.

1. Insertion anomalies: This occurs when we are not able to insert data into a database
because some attributes may be missing at the time of insertion.
2. Updation anomalies: This occurs when the same data items are repeated with the same
values and are not linked to each other.
3. Deletion anomalies: This occurs when deleting one part of the data deletes the other
necessary information from the database.

In the next section, we’ll see the types of normal forms in the database.

Normal Forms

There are four types of normal forms that are usually used in relational databases as you can see
in the following figure:
1. 1NF: A relation is in 1NF if all its attributes have an atomic value.
2. 2NF: A relation is in 2NF if it is in 1NF and all non-key attributes are fully functional
dependent on the candidate key in DBMS.
3. 3NF: A relation is in 3NF if it is in 2NF and there is no transitive dependency.
4. BCNF: A relation is in BCNF if it is in 3NF and for every Functional Dependency, LHS
is the super key.

To understand the above-mentioned normal forms, we first need to have an understanding of the
functional dependencies.

Functional dependency is a relationship that exists between two sets of attributes of a relational
table where one set of attributes can determine the value of the other set of attributes. It is
denoted by X -> Y, where X is called a determinant and Y is called dependent.

First Normal Form (1NF)

A relation is in 1NF if every attribute is a single-valued attribute or it does not contain any multi-
valued or composite attribute, i.e., every attribute is an atomic attribute. If there is a composite or
multi-valued attribute, it violates the 1NF. To solve this, we can create a new row for each of the
values of the multi-valued attribute to convert the table into the 1NF.

Let’s take an example of a relational table <EmployeeDetail> that contains the details of the
employees of the company.

<EmployeeDetail>

Employee Code Employee Name Employee Phone Number

101 John 98765623,998234123

101 John 89023467

102 Ryan 76213908

103 Stephanie 98132452

Here, the Employee Phone Number is a multi-valued attribute. So, this relation is not in 1NF.

To convert this table into 1NF, we make new rows with each Employee Phone Number as a new
row as shown below:

<EmployeeDetail>
Employee Code Employee Name Employee Phone Number

101 John 998234123

101 John 98765623

101 John 89023467

102 Ryan 76213908

103 Stephanie 98132452

Second Normal Form (2NF)

The normalization of 1NF relations to 2NF involves the elimination of partial dependencies.
A partial dependency in DBMS exists when any non-prime attributes, i.e., an attribute not a part
of the candidate key, is not fully functionally dependent on one of the candidate keys.

For a relational table to be in second normal form, it must satisfy the following rules:

1. The table must be in first normal form.


2. It must not contain any partial dependency, i.e., all non-prime attributes are fully
functionally dependent on the primary key.

If a partial dependency exists, we can divide the table to remove the partially dependent
attributes and move them to some other table where they fit in well.

Let us take an example of the following <EmployeeProjectDetail> table to understand what is


partial dependency and how to normalize the table to the second normal form:

<EmployeeProjectDetail>

Employee Code Project ID Employee Name Project Name

101 P03 John Project103

101 P01 John Project101

102 P04 Ryan Project104

103 P02 Stephanie Project102

In the above table, the prime attributes of the table are Employee Code and Project ID. We have
partial dependencies in this table because Employee Name can be determined by Employee Code
and Project Name can be determined by Project ID. Thus, the above relational table violates the
rule of 2NF.
The prime attributes in DBMS are those which are part of one or more candidate keys.

To remove partial dependencies from this table and normalize it into second normal form, we
can decompose the <EmployeeProjectDetail> table into the following three tables:

<EmployeeDetail>

Employee Code Employee Name

101 John

101 John

102 Ryan

103 Stephanie

<EmployeeProject>

Employee Code Project ID

101 P03

101 P01

102 P04

103 P02

<ProjectDetail>

Project ID Project Name

P03 Project103

P01 Project101

P04 Project104

P02 Project102

Thus, we’ve converted the <EmployeeProjectDetail> table into 2NF by decomposing it into
<EmployeeDetail>, <ProjectDetail> and <EmployeeProject> tables. As you can see, the above
tables satisfy the following two rules of 2NF as they are in 1NF and every non-prime attribute is
fully dependent on the primary key.
The relations in 2NF are clearly less redundant than relations in 1NF. However, the decomposed
relations may still suffer from one or more anomalies due to the transitive dependency. We will
remove the transitive dependencies in the Third Normal Form.

Third Normal Form (3NF)

The normalization of 2NF relations to 3NF involves the elimination of transitive dependencies in
DBMS.

A functional dependency X -> Z is said to be transitive if the following three functional


dependencies hold:

 X -> Y
 Y does not -> X
 Y -> Z

For a relational table to be in third normal form, it must satisfy the following rules:

1. The table must be in the second normal form.


2. No non-prime attribute is transitively dependent on the primary key.
3. For each functional dependency X -> Z at least one of the following conditions hold:

 X is a super key of the table.


 Z is a prime attribute of the table.

If a transitive dependency exists, we can divide the table to remove the transitively dependent
attributes and place them to a new table along with a copy of the determinant.

Let us take an example of the following <EmployeeDetail> table to understand what is transitive
dependency and how to normalize the table to the third normal form:

<EmployeeDetail>

Employee Code Employee Name Employee Zipcode Employee City

101 John 110033 Model Town

101 John 110044 Badarpur

102 Ryan 110028 Naraina

103 Stephanie 110064 Hari Nagar


The above table is not in 3NF because it has Employee Code -> Employee City transitive
dependency because:

 Employee Code -> Employee Zipcode


 Employee Zipcode -> Employee City

Also, Employee Zipcode is not a super key and Employee City is not a prime attribute.

To remove transitive dependency from this table and normalize it into the third normal form, we
can decompose the <EmployeeDetail> table into the following two tables:

<EmployeeDetail>

Employee Code Employee Name Employee Zipcode

101 John 110033

101 John 110044

102 Ryan 110028

103 Stephanie 110064

<EmployeeLocation>

Employee Zipcode Employee City

110033 Model Town

110044 Badarpur

110028 Naraina

110064 Hari Nagar

Thus, we’ve converted the <EmployeeDetail> table into 3NF by decomposing it into
<EmployeeDetail> and <EmployeeLocation> tables as they are in 2NF and they don’t have any
transitive dependency.

The 2NF and 3NF impose some extra conditions on dependencies on candidate keys and remove
redundancy caused by that. However, there may still exist some dependencies that cause
redundancy in the database. These redundancies are removed by a more strict normal form
known as BCNF.

Boyce-Codd Normal Form (BCNF)


Boyce-Codd Normal Form(BCNF) is an advanced version of 3NF as it contains additional
constraints compared to 3NF.

For a relational table to be in Boyce-Codd normal form, it must satisfy the following rules:

1. The table must be in the third normal form.


2. For every non-trivial functional dependency X -> Y, X is the superkey of the table. That
means X cannot be a non-prime attribute if Y is a prime attribute.

A superkey is a set of one or more attributes that can uniquely identify a row in a database table.

Let us take an example of the following <EmployeeProjectLead> table to understand how to


normalize the table to the BCNF:

<EmployeeProjectLead>

Employee Code Project ID Project Leader

101 P03 Grey

101 P01 Christian

102 P04 Hudson

103 P02 Petro

The above table satisfies all the normal forms till 3NF, but it violates the rules of BCNF because
the candidate key of the above table is {Employee Code, Project ID}. For the non-trivial
functional dependency, Project Leader -> Project ID, Project ID is a prime attribute but Project
Leader is a non-prime attribute. This is not allowed in BCNF.

To convert the given table into BCNF, we decompose it into three tables:

<EmployeeProject>

Employee Code Project ID

101 P03

101 P01

102 P04

103 P02
<ProjectLead>

Project Leader Project ID

Grey P03

Christian P01

Hudson P04

Petro P02

Thus, we’ve converted the <EmployeeProjectLead> table into BCNF by decomposing it into
<EmployeeProject> and <ProjectLead> tables.

Fourth Normal Form (4NF)

Fourth Normal Form comes into picture when Multi-valued Dependency occur in any relation.

Rules for 4th Normal Form


For a table to satisfy the Fourth Normal Form, it should satisfy the following two conditions:
1. It should be in the Boyce-Codd Normal Form.
2. And, the table should not have any Multi-valued Dependency.

Multi-valued Dependency
A table is said to have multi-valued dependency, if the following conditions are true,
1. For a dependency A → B, if for a single value of A, multiple value of B exists, then the
table may have multi-valued dependency.
2. Also, a table should have at-least 3 columns for it to have a multi-valued dependency.
3. And, for a relation R(A,B,C), if there is a multi-valued dependency between, A and B,
then B and C should be independent of each other.
If all these conditions are true for any relation(table), it is said to have multi-valued dependency.

Time for an Example


Below we have a college enrolment table with columns s_id, course and hobby.
s_id course hobby
1 Science Cricket
1 Maths Hockey
2 C# Cricket
2 Php Hockey
As you can see in the table above, student with s_id 1 has opted for two
courses, Science and Maths, and has two hobbies, Cricket and Hockey.
The two records for student with s_id 1, will give rise to two more records, as shown below,
because for one student, two hobbies exists, hence along with both the courses, these hobbies
should be specified.

s_id course hobby


1 Science Cricket
1 Maths Hockey
1 Science Hockey
1 Maths Cricket

And, in the table above, there is no relationship between the columns course and hobby. They are
independent of each other.
So there is multi-value dependency, which leads to un-necessary repetition of data and other
anomalies as well.

How to satisfy 4th Normal Form?


To make the above relation satify the 4th normal form, we can decompose the table into 2 tables.

CourseOpted Table
s_id course
1 Science
1 Maths
2 C#
2 Php

And, Hobbies Table,


s_id hobby
1 Cricket
1 Hockey
2 Cricket
2 Hockey
Now this relation satisfies the fourth normal form.
A table can also have functional dependency along with multi-valued dependency. In that case,
the functionally dependent columns are moved in a separate table and the multi-valued
dependent columns are moved to separate tables.

You might also like