0% found this document useful (0 votes)
11 views16 pages

Understanding Normalization in DBMS

Normalization in DBMS is a method to organize data in tables to reduce redundancy, structure data into smaller tables, and establish proper relationships between them. It addresses issues like insertion, update, and deletion anomalies by breaking down tables and ensuring data integrity. The document outlines various normal forms, including First, Second, Third, Boyce-Codd, Fourth, and Fifth Normal Forms, each with specific rules to enhance database design.

Uploaded by

aparna.cse
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
11 views16 pages

Understanding Normalization in DBMS

Normalization in DBMS is a method to organize data in tables to reduce redundancy, structure data into smaller tables, and establish proper relationships between them. It addresses issues like insertion, update, and deletion anomalies by breaking down tables and ensuring data integrity. The document outlines various normal forms, including First, Second, Third, Boyce-Codd, Fourth, and Fifth Normal Forms, each with specific rules to enhance database design.

Uploaded by

aparna.cse
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd

DBMS BCS501

UNIT -3

Normalization in DBMS

Normalization in DBMS is a technique using which you can organize the data
in the database tables so that:

 There is less repetition of data,

 A large set of data is structured into a bunch of smaller tables,

 and the tables have a proper relationship between them.

DBMS Normalization is a systematic approach to decompose (break down)


tables to eliminate data redundancy(repetition) and undesirable
characteristics like Insertion anomaly in DBMS, Update anomaly in DBMS,
and Delete anomaly in DBMS.

It is a multi-step process that puts data into tabular form, removes duplicate
data, and set up the relationship between tables.

Why we need Normalization in DBMS?

Normalization is required for,

 Eliminating redundant(useless) data, therefore handling data integrity,


because if data is repeated it increases the chances of inconsistent
data.

 Normalization helps in keeping data consistent by storing the data in


one table and referencing it everywhere else.

 Storage optimization although that is not an issue these days because


Database storage is cheap.

 Breaking down large tables into smaller tables with relationships, so it


makes the database structure more scalable and adaptable.

 Ensuring data dependencies make sense i.e. data is logically stored.

Problems without Normalization in DBMS

If a table is not properly normalized and has data redundancy(repetition)


then it will not only eat up extra memory space but will also make it difficult
for you to handle and update the data in the database, without losing data.
Insertion, Updation, and Deletion Anomalies are very frequent if the
database is not normalized.

To understand these anomalies let us take an example of a Student table.

rollno name branch hod office_tel

401 Akon CSE Mr. X 53337

402 Bkon CSE Mr. X 53337

403 Ckon CSE Mr. X 53337

404 Dkon CSE Mr. X 53337

In the table above, we have data for four Computer Sci. students.

As we can see, data for the fields branch, hod(Head of Department),


and office_tel are repeated for the students who are in the same
branch in the college, this is Data Redundancy.

1. Insertion Anomaly in DBMS

 Suppose for a new admission, until and unless a student opts for a
branch, data of the student cannot be inserted, or else we will have to
set the branch information as NULL.

 Also, if we have to insert data for 100 students of the same branch,
then the branch information will be repeated for all those 100 students.

 These scenarios are nothing but Insertion anomalies.

 If you have to repeat the same data in every row of data, it's better
to keep the data separately and reference that data in each row.

 So in the above table, we can keep the branch information separately,


and just use the branch_id in the student table, where branch_id can
be used to get the branch information.

2. Updation Anomaly in DBMS


 What if Mr. X leaves the college? or Mr. X is no longer the HOD of the
computer science department? In that case, all the student records will
have to be updated, and if by mistake we miss any record, it will lead
to data inconsistency.

 This is an Updation anomaly because you need to update all the


records in your table just because one piece of information got
changed.

3. Deletion Anomaly in DBMS

 In our Student table, two different pieces of information are kept


together, the Student information and the Branch information.

 So if only a single student is enrolled in a branch, and that student


leaves the college, or for some reason, the entry for the student is
deleted, we will lose the branch information too.

 So never in DBMS, we should keep two different entities together,


which in the above example is Student and branch,

The solution for all the three anomalies described above is to keep
the student information and the branch information in two different tables.
And use the branch_id in the student table to reference the branch.

Primary Key and Non-key attributes

As you can see in the table above, the student_id column is


a primary key because using the student_id value we can uniquely
identify each row of data, hence the remaining columns then
become the non-key attributes.

Types of DBMS Normal forms

Normalization rules are divided into the following normal forms:

1. First Normal Form

2. Second Normal Form

3. Third Normal Form

4. BCNF

5. Fourth Normal Form

6. Fifth Normal Form


1. First Normal Form (1NF)

For a table to be in the First Normal Form, it should follow the following 4
rules:

1. It should only have single(atomic) valued attributes/columns.

2. Values stored in a column should be of the same domain.

3. All the columns in a table should have unique names.

4. And the order in which data is stored should not matter.

example.

If we have an Employee table in which we store the employee


information along with the employee skillset, the table will look like this:

emp_id emp_name emp_mobile emp_skills

1 John Tick 9999957773 Python, JavaScript

2 Darth Trader 8888853337 HTML, CSS, JavaScript

3 Rony Shark 7777720008 Java, Linux, C++

The above table has 4 columns:

 All the columns have different names.

 All the columns hold values of the same type like emp_name has all
the names, emp_mobile has all the contact numbers, etc.

 The order in which we save data doesn't matter

 But the emp_skills column holds multiple comma-separated values,


while as per the First Normal form, each column should have a single
value.

Hence the above table fails to pass the First Normal form.

So how do you fix the above table? There are two ways to do this:

1. Remove the emp_skills column from the Employee table and keep it in
some other table.
2. Or add multiple rows for the employee and each row is linked with one
skill.

1. Create Separate tables for Employee and Employee Skills

So the Employee table will look like this,

emp_id emp_name emp_mobile

1 John Tick 9999957773

2 Darth Trader 8888853337

3 Rony Shark 7777720008

And the new Employee_Skill table:

emp_id emp_skill

1 Python

1 JavaScript

2 HTML

2 CSS

2 JavaScript

3 Java

3 Linux

3 C++

2. Add Multiple rows for Multiple skills


You can also simply add multiple rows to add multiple skills. This will lead to
repetition of the data, but that can be handled as you further Normalize your
data using the Second Normal form and the Third Normal form.

emp_id emp_name emp_mobile emp_skill

1 John Tick 9999957773 Python

1 John Tick 9999957773 JavaScript

2 Darth Trader 8888853337 HTML

2 Darth Trader 8888853337 CSS

2 Darth Trader 8888853337 JavaScript

3 Rony Shark 7777720008 Java

3 Rony Shark 7777720008 Linux

3 Rony Shark 7777720008 C++

2. Second Normal Form (2NF)

For a table to be in the Second Normal Form,

1. It should be in the First Normal form.

2. And, it should not have Partial Dependency.

What is Partial Dependency?

When a table has a primary key that is made up of two or more columns,
then all the columns(not included in the primary key) in that table should
depend on the entire primary key and not on a part of it. If any column(which
is not in the primary key) depends on a part of the primary key then we say
we have Partial dependency in the table.
Confused? Let's take an example.

If we have two tables Students and Subjects, to store student information


and information related to subjects.

Student table:

student_id student_name branch

1 Akon CSE

2 Bkon Mechanical

Subject Table:

subject_id subject_name

1 C Language

2 DSA

3 Operating System

And we have another table Score to store the marks scored by students in
any subject like this,

student_id subject_id marks teacher_name

1 1 70 Miss. C

1 2 82 Mr. D

2 1 65 Mr. Op

Now in the above table, the primary key is student_id + subject_id, because
both these information are required to select any row of data.
But in the Score table, we have a column teacher_name, which depends on
the subject information or just the subject_id, so we should not keep that
information in the Score table.

The column teacher_name should be in the Subjects table. And then the
entire system will be Normalized as per the Second Normal Form.

Updated Subject table:

subject_id subject_name teacher_name

1 C Language Miss. C

2 DSA Mr. D

3 Operating System Mr. Op

Updated Score table:

student_id subject_id marks

1 1 70

1 2 82

2 1 65

3. Third Normal Form (3NF)

A table is said to be in the Third Normal Form when,

1. It satisfies the First Normal Form and the Second Normal form.

2. And, it doesn't have Transitive Dependency.

What is Transitive Dependency?

In a table we have some column that acts as the primary key and other
columns depends on this column. But what if a column that is not the
primary key depends on another column that is also not a primary key or
part of it? Then we have Transitive dependency in our table.

Let's take an example. We had the Score table in the Second Normal Form
above. If we have to store some extra information in it, like,

1. exam_type

2. total_marks

To store the type of exam and the total marks in the exam so that we can
later calculate the percentage of marks scored by each student.

The Score table will look like this,

student_id subject_id marks exam_type total_marks

1 1 70 Theory 100

1 2 82 Theory 100

2 1 42 Practical 50

 In the table above, the column exam_type depends on


both student_id and subject_id, because,

o a student can be in the CSE branch or the Mechanical branch,

o and based on that they may have different exam types for
different subjects.

o The CSE students may have both Practical and Theory for
Compiler Design,

o whereas Mechanical branch students may only have Theory


exams for Compiler Design.

 But the column total_marks just depends on the exam_type column.


And the exam_type column is not a part of the primary key. Because
the primary key is student_id + subject_id, hence we have a Transitive
dependency here.

How to Transitive Dependency?


You can create a separate table for ExamType and use it in the Score table.

New ExamType table,

exam_type_id exam_type total_marks duration

1 Practical 50 45

2 Theory 100 180

3 Workshop 150 300

We have created a new table ExamType and we have added more related
information in it like duration(duration of exam in mins.), and now we can
use the exam_type_id in the Score table.

4. Boyce-Codd Normal Form (BCNF)

 Boyce and Codd Normal Form is a higher version of the Third Normal
Form.

 This form deals with a certain type of anomaly that is not handled by
3NF.

 A 3NF table that does not have multiple overlapping candidate keys is
said to be in BCNF.

 For a table to be in BCNF, the following conditions must be satisfied:

o R must be in the 3rd Normal Form

o and, for each functional dependency ( X → Y ), X should be a


Super Key.

5. Fourth Normal Form (4NF)

A table is said to be in the Fourth Normal Form when,

1. It is in the Boyce-Codd Normal Form.

2. And, it doesn't have Multi-Valued Dependency.

You can also watch our YouTube video to learn about Fourth Normal Form
- DBMS
5. Fifth Normal Form (5NF)

 The fifth normal form is also called the PJNF - Project-Join Normal Form

 It is the most advanced level of Database Normalization.

 Using Fifth Normal Form you can fix Join dependency and reduce data
redundancy.

 It also helps in fixing Update anomalies in DBMS design.

Boyce-Codd Normal Form (BCNF)

Boyce-Codd Normal Form (BCNF) is an advanced version of 3NF used to


reduce redundancy in databases. It ensures that for every functional
dependency, the left side must be a superkey. This helps create cleaner and
more consistent database designs, especially when there are multiple
candidate keys.

Rules for BCNF

 Rule 1: The table should be in the 3rd Normal Form.

 Rule 2: X should be a super-key for every functional dependency (FD)


X−>Y in a given relation.

Note: To test whether a relation is in BCNF, we identify all the determinants


and make sure that they are candidate keys.

Key Notes:

1. To verify BCNF, identify all determinants (left side of FDs) and check
whether each is a candidate key.

2. If a relation is in BCNF, it is automatically in 3NF, 2NF, and 1NF as well.

The normal forms become stricter as we move from 1NF → 2NF → 3NF →
BCNF:

 1NF: Each field must hold atomic (indivisible) values.

 2NF: No partial dependency on a primary key.

 3NF: No transitive dependency on a primary key.

 BCNF: Every determinant must be a candidate key.


This progression ensures better structure and removes redundancy at each
level.

Why Do We Need BCNF?

 2NF and 3NF may allow anomalies if a functional dependency exists


where the determinant is not a superkey.

 BCNF handles edge cases where 3NF fails to remove all redundancy,
especially in tables with multiple candidate keys.

 Prevents update, insert, and delete anomalies by ensuring every


determinant is a superkey.

 Makes database design more robust and easier to maintain over time.

 Improves data consistency and clarity by removing hidden or indirect


dependencies.

We are going to discuss some basic examples which let you understand the
properties of BCNF. We will discuss multiple examples here.

Example 1

Consider a relation R with attributes (student, teacher, subject).

FD: { (student, Teacher) -> subject, (student, subject) -> Teacher, (Teacher) -
> subject}

 Candidate keys are (student, teacher) and (student, subject).

 The above relation is in 3NF (since there is no transitive dependency).


A relation R is in BCNF if for every non-trivial FD X->Y, X must be a key.

 The above relation is not in BCNF, because in the FD (teacher-


>subject), teacher is not a key. This relation suffers with anomalies −

 For example, if we delete the student Tahira , we will also lose the
information that [Link] teaches C. This issue occurs because the
teacher is a determinant but not a candidate key.

R is divided into two relations R1(Teacher, Subject) and R2(Student, Teacher).

How to Satisfy BCNF?


For satisfying this table in BCNF, we have to decompose it into further tables.
Here is the full procedure through which we transform this table into BCNF.
Let us first divide this main table into two
tables Stu_Branch and Stu_Course Table.

Stu_Branch Table

Stu_I
D Stu_Branch

101 Computer Science & Engineering

Electronics & Communication


102
Engineering

Candidate Key for this table: Stu_ID.

Stu_Course Table

Branch_Num Stu_Course_
Stu_Course ber No

DBMS B_001 201

Computer Networks B_001 202

VLSI Technology B_003 401

Mobile
B_003 402
Communication

Candidate Key for this table: Stu_Course.

Stu_Enroll Table
Stu_I Stu_Course_
D No

101 201

101 202

102 401

102 402

Candidate Key for this table: {Stu_ID, Stu_Course_No}.

After decomposing into further tables, now it is in BCNF, as it is passing the


condition of Super Key, that in functional dependency X−>Y, X is a Super
Key.

Example 3

Find the highest normal form of a relation R(A, B, C, D, E) with FD set as:

{ BC->D, AC->BE, B->E }

Explanation:

 Step-1: As we can see, (AC)+ ={A, C, B, E, D} but none of its subsets


can determine all attributes of the relation, So AC will be the candidate
key. A or C can’t be derived from any other attribute of the relation, so
there will be only 1 candidate key {AC}.

 Step-2: Prime attributes are those attributes that are part of candidate
key {A, C} in this example and others will be non-prime {B, D, E} in
this example.

 Step-3: The relation R is in 1st normal form as a relational DBMS does


not allow multi-valued or composite attributes.

The relation is in 2nd normal form because BC->D is in 2nd normal form (BC
is not a proper subset of candidate key AC) and AC->BE is in 2nd normal
form (AC is candidate key) and B->E is in 2nd normal form (B is not a proper
subset of candidate key AC).
The relation is not in 3rd normal form because in BC->D (neither BC is a
super key nor D is a prime attribute) and in B->E (neither B is a super key
nor E is a prime attribute) but to satisfy 3rd normal for, either LHS of an FD
should be super key or RHS should be a prime attribute. So the highest
normal form of relation will be the 2nd Normal form.

Note: A prime attribute cannot be transitively dependent on a key in BCNF


relation.

Consider these functional dependencies of some relation R

AB ->C
C ->B
AB ->B

From the given functional dependencies, the candidate keys of relation R are
AB and AC. On close observation, we see that B depends transitively on AB
through C, making it a transitive dependency.

 The first and third dependencies are in BCNF as their left sides are
candidate keys.

 The second dependency is not in BCNF, but it's in 3NF since the right
side is a prime attribute.

So, the highest normal form of relation R is 3NF.

Example 3

For example consider relation R(A, B, C)

A -> BC,
B -> A

A and B both are super keys so the above relation is in BCNF.

Note: BCNF decomposition may always not be possible with dependency


preserving, however, it always satisfies the lossless join condition. For
example, relation R (V, W, X, Y, Z), with functional dependencies:

V, W -> X
Y, Z -> X
W -> Y

It would not satisfy dependency preserving BCNF decomposition.


Note: Redundancies are sometimes still present in a BCNF relation as it is
not always possible to eliminate them completely.

There are also some higher-order normal forms, like the 4th Normal Form and
the 5th Normal Form.

You might also like