NORMALIZATION IN DBMS
Normalization is a process of organizing the data in database to avoid data redundancy,
insertion anomaly, update anomaly & deletion anomaly. Let’s discuss about anomalies first then
we will discuss normal forms with examples.
Anomalies in DBMS
There are three types of anomalies that occur when the database is not normalized. These
are – Insertion, update and deletion anomaly. Let’s take an example to understand this.
emp_i
d emp_name emp_address emp_dept
101 Rick Delhi D001
101 Rick Delhi D002
123 Maggie Agra D890
166 Glenn Chennai D900
166 Glenn Chennai D004
Example: Suppose a manufacturing company stores the employee details in a table named
employee that has four attributes: emp_id for storing employee’s id, emp_name for storing
employee’s name, emp_address for storing employee’s address and emp_dept for storing the
department details in which the employee works. At some point of time the table looks like this:
The above table is not normalized. We will see the problems that we face when a table is not
normalized.
Update anomaly: In the above table we have two rows for employee Rick as he belongs to two
departments of the company. If we want to update the address of Rick then we have to update the
same in two rows or the data will become inconsistent. If somehow, the correct address gets
updated in one department but not in other then as per the database, Rick would be having two
different addresses, which is not correct and would lead to inconsistent data.
Insert anomaly: Suppose a new employee joins the company, who is under training and
currently not assigned to any department then we would not be able to insert the data into the
table if emp_dept field doesn’t allow nulls.
Delete anomaly: Suppose, if at a point of time the company closes the department D890 then
deleting the rows that are having emp_dept as D890 would also delete the information of
employee Maggie since she is assigned only to this department.
To overcome these anomalies we need to normalize the data. In the next section we will discuss
about normalization.
Normal Forms:
Here are the most commonly used normal forms:
First normal form(1NF)
Second normal form(2NF)
Third normal form(3NF)
Boyce & Codd normal form (BCNF)
1. First Normal Form (1NF)
First Normal Form (1NF) is a simple form of Normalization.
It simplifies each attribute in a relation.
In 1NF, there should not be any repeating group of data.
Each set of column must have a unique value.
It contains atomic values because the table cannot hold multiple values.
Example: Employee Table
ECode Employee_Name Department_Name
1 ABC Sales, Production
2 PQR Human Resource
3 XYZ Quality Assurance,
Marketing
Employee Table using 1NF
ECode Employee_Name Department_Name
1 ABC Sales
1 ABC Production
2 PQR Human Resource
3 XYZ Quality Assurance
3 XYZ Marketing
2. Second Normal Form (2NF)
In 2NF, the table is required in 1NF.
The main rule of 2NF is, 'No non-prime attribute is dependent on the proper subset of
any candidate key of the table.'
An attribute which is not part of candidate key is known as non-prime attribute.
Example : Teacher Table using 1NF
teacher_id subject teacher_age
1 ABC 38
1 ABC 38
2 PQR 38
3 XYZ 40
3 XYZ 40
Candidate Key: teacher_id, subject
Non prime attribute: teacher_age
The above table is in 1NF. Each attribute has atomic values. However, it is not in 2NF because
non prime attribute teacher_Age is dependent on teacher_id alone, which is a proper subset of
candidate key. This violates the rule for 2NF as the rule says 'No non-prime attribute is
dependent on the proper subset of any candidate key of the table'.
2NF (Second Normal Form) : Teacher_details Table
teacher_id teacher_Age
1 38
2 38
3 40
Employee2 Table
teacher_id subject
1 ABC
1 ABC
2 PQR
3 XYZ
3 XYZ
Now, the above tables comply with the Second Normal Form (2NF).
3. Third Normal Form (3NF)
Third Normal Form (3NF) is used to minimize the transitive redundancy.
In 3NF, the table is required in 2NF.
While using the 2NF table, there should not be any transitive partial dependency.
3NF reduces the duplication of data and also achieves the data integrity.
Example : <Employee> Table
EId Ename DOB City State Zip
001 ABC 10/05/1990 Pune Maharashtra 411038
002 XYZ 11/05/1988 Mumbai Maharashtra 400007
In the above <Employee> table, EId is a primary key but City, State depends upon Zip
code.
The dependency between Zip and other fields is called Transitive Dependency.
Therefore we apply 3NF. So, we need to move the city and state to the new
<Employee_Table2> table, with Zip as a Primary key.
<Employee_Table1> Table
EId Ename DOB Zip
001 ABC 10/05/1990 411038
002 XYZ 11/05/1988 400007
<Employee_Table2> Table
City State Zip
Pune Maharashtra 411038
Mumbai Maharashtra 400007
The advantage of removing transitive dependency is, it reduces the amount of data
dependencies and achieves the data integrity.
In the above example, using with the 3NF, there is no redundancy of data while inserting
the new records.
The City, State and Zip code will be stored in the separate table. And therefore the
updation becomes more easier because of no data redundancy.