0% found this document useful (0 votes)
39 views13 pages

Database Normalization Fundamentals

Uploaded by

mlistener36
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
39 views13 pages

Database Normalization Fundamentals

Uploaded by

mlistener36
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd

Fundamentals of Database Systems Lecture Note

Chapter 4
Logical Database Design

4.1 Normalization
A relational database is merely a collection of data, organized in a
particular manner. One of the best ways to determine what information
should be stored in a database is to clarify what questions will be
asked of it and what data would be included in the answers.
Database normalization is a series of steps followed to obtain a
database design that allows for consistent storage and efficient access
of data in a relational database. These steps reduce data redundancy
and the risk of data becoming inconsistent.
Normalization is the process of identifying the logical associations
between data items and designing a database that will represent such
associations but without suffering the anomalies which are;
1. Insertion Anomalies
2. Deletion Anomalies
3. Modification/ Updating Anomalies
Normalization may reduce system performance since data will be cross
referenced from many tables. Thus, denormalization is sometimes
used to improve performance, at the cost of reduced consistency
guarantees.
All the normalization rules will eventually remove the anomalies that
may exist during data manipulation after the implementation.
The type of problems that could occur in insufficiently normalized table
is called anomalies which includes;
(1) Insertion anomalies
An "insertion anomaly" is a failure to place information about a new
database entry into all the places in the database where information
about that new entry needs to be stored. In a properly normalized
database, information about a new entry needs to be inserted into

1
Fundamentals of Database Systems Lecture Note
only one place in the database; in an inadequately normalized
database, information about a new entry may need to be inserted
into more than one place and, human fallibility being what it is,
some of the needed additional insertions may be missed. In other
words, due to human fallibility, additional insertions may be missed.
(2) Deletion anomalies
A "deletion anomaly" is a failure to remove information about an
existing database entry when it is time to remove that entry. In a
properly normalized database, information about an old, to-be-
gotten-rid-of entry needs to be deleted from only one place in the
database; in an inadequately normalized database, information
about that old entry may need to be deleted from more than one
place, and, human fallibility being what it is, some of the needed
additional deletions may be missed.
(3) Modification/Updating anomalies
A modification of a database involves changing some value of the
attribute of a table. In a properly normalized database table, what
ever information is modified by the user, the change will be effected
and used accordingly.

N.B: The purpose of normalization is to reduce the chances for


anomalies to occur in a database.

Example for anomalies problems (problems related with anomalies)


EmpI FName LName SkillI Skill SkillType Schoo SchoolAdd
Skil
D D l l
Leve
l
12 Abebe Mekuria 2 SQL Database AAU Sidist_Kilo 5
16 Lemma Alemu 5 C++ Programming Unity Gerji 6
28 Chane Kebede 2 SQL Database AAU Sidist_Kilo 10
25 Abera Taye 6 VB6 Programming Helico Piazza 8
65 Almaz Belay 2 SQL Database Helico Piazza 9
24 Dereje Tamiru 8 Oracle Database Unity Gerji 5
51 Selam Belay 4 Prolog Programming Jimma Jimma City 8
94 Alem Kebede 3 Cisco Networking AAU Sidist_Kilo 7

2
Fundamentals of Database Systems Lecture Note
18 Girma Dereje 1 IP Programming Jimma Jimma City 4
13 Yared Gizaw 7 Java Programming AAU Sidist_Kilo 6

Insertion Anomalies:
What if we have a new employee with a skill called Pascal? We can
not decide weather Pascal is allowed as a value for skill and we have
no clue about the type of skill that Pascal should be categorized as.
In addition, we are required to record/insert values for the primary key
column (i.e. we can’t insert value for skill column without inserting
values for EMPID column of employee).
Deletion Anomalies:
If employee with EMPID 16 is deleted then ever information about skill
C++ and the type of skill is deleted from the database. Then we will
not have any information about C++ and its skill type.
Modification Anomalies:
What if the address for Helico is changed from Piazza to Mexico? We need to look for
every occurrence of Helico and change the value of SchoolAdd from Piazza to Mexico,
which is prone to error.

Note: Database-Management System (DBMS) can work only with the


information that we put explicitly into its tables for a given database
and into its rules for working with those tables, where such rules are
appropriate and possible.

3
Fundamentals of Database Systems Lecture Note
Functional Dependency (FD)
Before moving to the definition and application of normalization, it is
important to have an understanding of "functional dependency."
Functional Dependencies (FDs) are derived from the real-world
constraints on the attributes.
Example
Dinner Type of
Wine
Meat Red
Fish White
Cheese Rose

Since the type of Wine served depends on the type of Dinner, we say
Wine is functionally dependent on Dinner. And this can be expressed
as:
Dinner  Wine
Dinner Type of Type of
Wine Fork
Meat Red Meat fork
Fish White Fish fork
Cheese Rose Cheese fork

Since both Wine type and Fork type are determined by the Dinner,
we say Wine is functionally dependent on Dinner and Fork is
functionally dependent on Dinner. And this is expressed as follows:
Dinner  Wine and Dinner  Fork

Partial Dependency
If we have composite primary key and if an attribute which is not a
member of all the primary key (i.e. is dependent on some part of the
primary key then that attribute is partially functionally dependent on
the primary key.

Let {A, B} is the Primary Key and C is non key attribute.

Then if it should be {A, B}C but BC or AC . Then C is partially


functionally dependent on {A, B}.

4
Fundamentals of Database Systems Lecture Note

Full Dependency
If an attribute which is not a member of the primary key is not
dependent on some part of the primary key but the whole key (if we
have composite primary key) then that attribute is fully functionally
dependent on the primary key.
Let {A, B} is the Primary Key and C is non key attribute.

Then if {A, B}C and BC and AC does not hold ( if B cannot
determine C and A cannot determine C). Then C Fully functionally
dependent on {A, B}.

Transitive Dependency
In mathematics and logic, a transitive relationship is a relationship of
the following form: "If A implies B, and if also B implies C, then A
implies C."
Example:
If Mr. X is a Human, and if every Human is an Animal, then Mr. X must
be an Animal.

Generalized way of describing transitive dependency is that:


If A functionally governs B, AND
If B functionally governs C
THEN A functionally governs C.
Provided that neither C nor B determines A i.e. (B / A and C / A)

In the normal notation:


{(AB) AND (BC)} ==> AC provided that B / A and C / A

5
Fundamentals of Database Systems Lecture Note

4.2 Process of normalization (1NF, 2NF, 3NF)


We have various levels or steps in normalization called Normal Forms. The level of
complexity, strength of the rule and decomposition increases as we move from one lower
level normal form to the higher level normal form.

A table in a relational database is said to be in a certain normal form if


it satisfies certain constraints. Normal form below represents a
stronger condition than the previous one.

Normalization towards a logical design consists of the following steps:


Unnormalized Form: Identify all data elements.
First Normal Form: Find the key with which you can find all data.
Second Normal Form: Remove part-key dependencies. Make all data
dependent on the whole key.
Third Normal Form: Remove non-key dependencies. Make all data
dependent on nothing but the key.

Note: For most practical purposes, databases are considered


normalized if they adhere to third normal form.

First Normal Form (1NF)


Requires that all column values in a table are atomic (e.g., a number
is an atomic value, while a list or a set is not).
We have tow ways of achieving this:
1. Putting each repeating group into a separate table and
connecting them with a primary key-foreign key relationship.
2. Moving this repeating groups to a new row by repeating the
common attributes. If so then find the key with which you can
find all data.
Definition: a table (relation) is in 1NF
If
o There are no duplicated rows in the table. Unique identifier.

6
Fundamentals of Database Systems Lecture Note
o Each cell is single-valued (i.e., there are no repeating
groups).
o Entries in a column (attribute, field) are of the same kind.

7
Fundamentals of Database Systems Lecture Note
Example for First Normal Form (1NF): Consider the following
unnormalized database table.
Stud_i Stud_nam Subject_na teache mar Exam_typ Total_mar
d e me r k e k
101 Akon OS,CNN Mr X 85 Practical A
102 Bkon Java Mr Y 60 Written C
103 Ckon C, C++ Mr Z 70 Interview B

First normal form (1NF)


Remove all repeating groups. Distribute the multi-valued attributes
into different rows and identify a unique identifier for the relation so
that is can be said is a relation in relational database.
Stud_i Stud_nam Subj_i Subject_nam teache mark Exam_typ Total_mar
d e d e r e k
101 Akon 1 OS Mr X 85 Practical A
101 Akon 2 CNN Mr X 85 Practical A
102 Bkon 3 Java Mr Y 60 Written C
103 Ckon 4 C Mr Z 70 Interview B
103 Ckon 5 C++ Mr Z 70 Interview B

Remember you can add additional fields/attributes in normalizing


database tables. As you can see in the above Subj_id is added to
organize the database table effectively and efficiently.
The (Stud_id, Subj_id) are a Composite primary key for the above
table.

8
Fundamentals of Database Systems Lecture Note

Second Normal form (2NF)


No partial dependency of a non-key attribute on part of the primary
key. This will result in a set of relations with a level of Second Normal
Form (2NF).

Note: Any table that is in 1NF and has a single-attribute (i.e., a non-
composite) primary key is automatically in 2NF.
Definition: a table (relation) is in 2NF
If
o It is in 1NF and
o If all non-key attributes are dependent on the entire
primary key i.e. if the primary key is composite key.
 No partial dependency.

Example 1:
 In the given table, non-prime attribute Teacher_name is
dependent on Subject_id which is a proper subset of a candidate
key. That's why it violates the rule for 2NF.
 To convert the given table into 2NF, we decompose it into two
tables:
Stud_i Stud_na Subj_i Teacher_na mar Exam_typ Total_mar
d me d me k e k
101 Akon 1 Mr X 85 Practical A
101 Akon 2 Mr X 85 Practical A
102 Bkon 3 Mr Y 60 Written C
103 Ckon 4 Mr Z 70 Interview B
103 Ckon 5 Mr Z 70 Interview B

Subject_Table
Subj_id Subj_nam Teacher_nam
e e
1 OS Mr X
2 CNN Mr X
3 Java Mr Y
4 C Mr Z
5 C++ Mr Z

9
Fundamentals of Database Systems Lecture Note

Or we can split/decompose the Techer table like this.


Teacher_i Teacher_nam
d e
001 Mr X
002 Mr Y
003 Mr Z

In the above Second Normal Form (2NF): Consider the following


database tables.
But in addition to this we have the following dependencies
FD1: {Subj_id} Subj_name
FD2: { Stud_id, Subj_id } teacher_name Partial dependency here.

Third Normal Form (3NF)


Eliminate columns dependent on another non-primary key - If
attributes do not contribute to a description of the key, remove them
to a separate table. In other words, avoid transitive dependency. This
level avoids update and delete anomalies.
Definition: a table (relation) is in 3NF
If
o It is in 2NF and
o There are no transitive dependencies between a primary
key and non-primary key attributes.

Example for Third Normal Form (3NF): Consider the following example:
Students of same batch (same year) live in one building or dormitory.

Stud_i Stud_nam Subj_i Teacher_i mar Exam_typ Total_mar


d e d d k e k
101 Akon 1 Mr X 85 Practical A
101 Akon 2 Mr X 85 Practical A
102 Bkon 3 Mr Y 60 Written C

10
Fundamentals of Database Systems Lecture Note
103 Ckon 4 Mr Z 70 Interview B
103 Ckon 5 Mr Z 70 Interview B

Subject_Table
Subj_id Subj_nam Teacher_nam
e e
1 OS Mr X
2 CNN Mr X
3 Java Mr Y
4 C Mr Z
5 C++ Mr Z

The above table is in the 2NF


The to avoid Transitive dependency here.
(Stud_id, Subj_id) is Composite primary key for the table
Non-key attribute Exam_type is dependent by composite key and non-key attribute
Total_mark also dependent by non-key attribute Exam_type.
Then (Stud_id, Subj_id)Exam_type and Exam_typeTotal_mark
Or
Non-key attribute Exam_type dependent on Stud_id and Exam_type dependent on
Subj_id.
Then (Stud_id, Subj_id)-Exam_type means that Subj_id  Exam_type and
Subj_id  Exam_type
To convert it to a 3NF we need to remove all transitive dependencies
of non-key attributes on another non-key attribute.

The non-primary key attributes, dependent on each other will be


moved to another table and linked with the main table using candidate
key- foreign key relationship.
Adding a primary key Score_id for Score_Table for relationship.

11
Fundamentals of Database Systems Lecture Note

Student_Table Score_Table
Stud_i Stud_na Score_i Stud_i Subj_i Teacher_ mar Exam_ty
d me d d d id k pe
101 Akon S1 101 1 001 85 Practical
102 Bkon S2 101 2 001 85 Practical
S3 102 3 002 60 Written
103 Ckon S4 103 4 003 70 Interview
S5 103 5 003 70 Interview
Exam_Table
Teacher_Table
Teacher_i Teacher_nam Teacher_i Total_mark
d e d
001 Mr X Practical A
002 Mr Y Written C
003 Mr Z Interview B
Subject_Table

Subj_id Subj_nam
e
1 OS
2 CNN
3 Java
4 C
5 C++

12
Fundamental Database System Lecture Note WLDU

Generally, there are other four additional levels of normalization, a table is


said to be normalized if it reaches 3NF. A database with all tables in the 3NF
is said to be Normalized Database.
Mnemonic for remembering the rationale for normalization up to 3NF could
be the following:
1. No repeating or redundancy: No repeating fields in the table.
2. The fields depend upon the key: The table should solely depend on
the key.
3. The whole key: No partial dependency. (No dependency on part of
primary key).
4. And nothing but the key: No inter-data dependency. There should not
be transitive dependency.
Advantages of Normalization
 Normalization helps to minimize data redundancy.
 Greater overall database organization.
 Data consistency within the database.
 Much more flexible database design.
Disadvantages of Normalization
 You cannot start building the database before knowing what the user needs.
 The performance degrades when normalizing the relations to higher normal forms, i.e.,
4NF, 5NF.
 It is very time-consuming and difficult to normalize relations of a higher degree.
 Careless decomposition may lead to a bad database design, leading to serious problems.

Page 13

You might also like