3.
1
Normalization and Database Design:
Functional Dependencies: Armstrong's Axioms, Definition,
Properties (Reflexivity, Augmentation, Transitivity), Types
(Trivial, Non-Trivial, Partial and Full Functional Dependency),
Closure of Functional Dependencies, Normal Forms (1NF,
2NF, 3NF, BCNF), Denormalization.
Informal Design Guidelines for Relation Schemas
• In designing database schema, 2 major pitfalls must ne avoided:
➢ Redundancy
➢ Incompleteness
• The four major informal guidelines to be followed are:
1. Impart clear Semantics to attributes in a relation.
2. Reduce redundant information in tuples.
3. Reduce NULL values in tuples.
4. Disallow the possibility of generating Spurious(false) tuples.
1. Impart Clear Semantics to Attributes in a
Relation.
• The Semantics of a relation refers to its meaning resulting from
the interpretation of attribute values in a tuple.
• Guideline 1
1. Design a relation schema so that it is easy to explain its
meaning.
2. Do not combine multiple entity types and relationship types
into a single table.
Example
EmpId Name Designation Phone
Tbl_employee
2. Redundant Information in Tuples & Update
Anomalies
• One goal of schema design is to minimize the
space used by base relations.
• Also the following anomalies must be avoided in
database design:
1. Insertion Anomalies
2. Deletion Anomalies
3. Update Anomalies
Problems Caused by Redundancy
Hourly_Emps
(ssn , name , lot , rating , hourly_wages, hours_worked)
• Redundant Storage :The rating value 8
corresponds to the hourly wage 10 , and this
association is repeated three times
• Update Anomalies :The hourly_wages in the
first tuple could be updated without making a
similar change in the second tuple.
• Insertion Anomalies : We cannot insert a tuple
for an employee unless we know the hourly
wage for the employee’s rating value.
• Deletion Anomalies : If we delete all tuples
with a given rating value we lose the
association between that rating value and its
hourly_wage value
Problems Caused by Redundancy
❖ Main Refinement Technique:
❖Decomposition
Decomposition
• Decomposition of a relation schema R consists
of replacing the relation schema by two ( or
more) relation schemas that each contain a
subset of the attributes of R and together
include all attributes in R.
We can decompose Hourly_Emps into two relations
4. Generation of spurious Tuples
• Spurious Tuples are those rows in a table, which
occur as a result of joining two tables in wrong
manner.
Guideline 4
Design relations so that they can be joined with
equality conditions on attributes that are
appropriately related( primary key, foreign key)
pairs in a way that guarantees that no spurious
tuples are generated.
NORMALIZATION
o Normalization is the process of organizing the data in
the database.
o Normalization is used to minimize the redundancy
from a relation or set of relations.
o It is also used to eliminate undesirable characteristics
like Insertion, Update, and Deletion Anomalies.
o Normalization divides the larger table into smaller
and links them using relationships.
o The normal form is used to reduce redundancy from
the database table.
• The main reason for normalizing the relations is
removing anomalies.
• Failure to eliminate anomalies leads to data
redundancy and can cause data integrity and other
problems as the database grows.
• Normalization consists of a series of guidelines that
helps to guide you in creating a good database
structure.
NEED FOR NORMALIZATION
• The Process of decomposing relations with anomalies to
produce smaller , well structured relations.
• Normalization is primarily a technique to validate and
improve a logical design, so that it satisfies certain
constraints that avoid unnecessary duplication of data.
15
Advantages of Normalization
❑ Normalization helps to minimize data redundancy.
❑ Greater overall database organization.
❑ Data consistency within the database.
❑ Much more flexible database design.
❑ Enforces the concept of relational integrity.
Disadvantages of Normalization
• You cannot start building the database before
knowing what the user needs.
• The performance degrades when normalizing the
relations to higher normal forms, i.e., 4NF, 5NF.
• It is very time-consuming and difficult to normalize
relations of a higher degree
Levels of Normalization
• Levels of normalization based on the amount of
redundancy in the database.
• Various levels of normalization are:
– First Normal Form (1NF)
– Second Normal Form (2NF)
– Third Normal Form (3NF)
– Boyce-Codd Normal Form (BCNF)
• Most databases should be 3NF or BCNF in order
to avoid the database anomalies.
Levels of Normalization
1NF
2NF
3NF
4NF
5NF
DKNF
Each higher level is a subset of the lower level
First Normal Form(1NF)
• First Normal Form disallows multi valued attributes ,
composite attributes and their combinations.
• It states that the domain of an attribute must include
only atomic ( simple Indivisible ) values and that the
value of any attribute in a tuple must be a single value
from the domain of that attribute.
20
1NF (Continued)
EXAMPLE :
A Department in a company could have any number of locations.
Consider the following relation DEPARTMENT_LOCATION
DNO DNAME DLOCATION
RE RESEARCH { CHENNAI, Kochi , PUNE}
AD ADMINISTRATION { CHENNAI}
DC DATA COLLECTION {CHENNAI , Delhi , Mumbai}
Is the above relation in 1NF ?
21
1NF (Continued)
No , The above relation DEPARTMENT_LOCATION is not in 1NF
Justification :
❑ The domain of DLOCATION contains set of values and hence is
non atomic.
22
1NF (Continued)
How to Normalize ?
There are three main techniques to achieve first normal form
for such a relation.
Solution 1
❑ Remove the attribute DLOCATION that violates 1NF and place it
in a separate relation along with the primary key DNO of
DEPARTMENT_LOCATION.
❑ The primary key of the new relation is a combination of { DNO ,
DLOCATION}. This decomposes the non 1NF relation in to two
1NF relations.
23
1NF (Continued) DNO DLOCATION
Solution 1 : RE Kochi
R1 R2 RE CHENNAI
RE PUNE
DNO DNAME
RE RESEARCH AD CHENNAI
AD ADMINISTRATION DC CHENNAI
DC DATA COLLECTION DC Delhi
DC Mumbai
24
1NF (Continued)
How to Normalize ?
Solution 2.
❑ Expand the key so that there will be a separate tuple in the
original relation for each location of a department.
❑ In this case , the primary key becomes a combination of {DNO ,
DLOCATION}.
❑ This solution has the disadvantage of introducing redundancy in
the relation.
25
1NF (Continued)
Solution 2 :
DNO DNAME DLOCATION
R RE RESEARCH CHENNAI
RE RESEARCH Kochi
RE RESEARCH PUNE
AD ADMINISTRATION CHENNAI
DC DATA COLLECTION CHENNAI
DC DATA COLLECTION Delhi
DC DATA COLLECTION Mumbai
26
1NF (Continued)
Solution 3 :
❑ If a maximum of values is known for the attribute , for
example, if it is known that at most three locations can exist
for a department – replace the DLOCATION attribute by
three atomic attributes : DLOCATION1 , DLOCATION2,
DLOCATION3.
❑ This solution has the disadvantage of introducing null
values if most departments have fewer than three locations
27
1NF (Continued)
Solution 3 :
R
DNO DNAME DLOCATION1 DLOCATION2 DLOCATION3
RE RESEARCH CHENNAI Kochi PUNE
AD ADMINISTRATION CHENNAI
DC DATA CHENNAI Delhi Mumbai
COLLECTION
28
Functional Dependencies
• Functional dependency FD is a set of constraints
between two attributes in a relation.
• If one set of attributes in a table determines another set
of attributes in the table, then the second set of
attributes is said to be functionally dependent on the
first set of attributes.
• Functional dependency says that if two tuples have
same values for attributes A1, A2,..., An, then those
two tuples must have to have same values for attributes
B1, B2, ..., Bn.
• Functional dependency is represented by an
arrow sign → that is, X→Y, where X functionally
determines Y.
• The left-hand side attributes determine the
values of attributes on the right-hand side.
Example 1
Armstrong’s Axioms
• Armstrong's axioms are a set of inference
rules used to derive new functional
dependencies from existing ones in relational
database design.
• They are fundamental for understanding and
manipulating data relationships within a
database.
• The three primary axioms are
– Reflexivity,
– Augmentation, and
– Transitivity
1. Reflexivity:
• If a set of attributes Y is a subset of another set X, then X
functionally determines Y.
• For example, if X = {A, B} and Y = {A}, then X -> Y is
true because A is part of X.
• This is also known as the "trivial dependency".
• Reflexivity: If Y ⊆ X, then X → Y.
2. Augmentation:
• If X functionally determines Y, then adding the
same attributes to both sides of the dependency
still results in a valid dependency.
• For instance, if X -> Y is true, then adding Z
to both sides (XZ -> YZ) also results in a valid
dependency.
Augmentation: If X → Y , then XZ → YZ.
3. Transitivity:
• If X functionally determines Y, and Y
functionally determines Z, then X functionally
determines Z.
• For example,
– if X -> Y and Y -> Z, then X -> Z.
Transitivity: If X → Y and Y → Z, then X → Z
Types of functional dependencies
1. Trivial Functional Dependency
2. Non-Trivial Functional Dependency
3. Partial Functional Dependency and
4. Full Functional Dependency
Trivial Functional Dependency
• A trivial FD exists when a set of attributes is
functionally dependent on a subset of itself.
• For example, if {A, B} -> {A}, it's trivial because
A is part of {A, B}.
• Symbolically, if X -> Y and Y is a subset of X,
then it's trivial.
Example
Rollno Studentname Age
42 Serah 17
43 Rohit 18
44 Meera 18
• {Rollno,Studentname} → Studentname
• Rollno → Rollno
Non-Trivial Functional Dependency
•A non-trivial FD exists when a set of attributes is functionally
dependent on another set of attributes that is not a subset of
itself.
•For example, if {A, B} -> {C}, and {C} is not a subset of {A,
B}, it's non-trivial.
•If Y is not a subset of X, then X -> Y is non-trivial.
Example
Rollno Studentname Age
42 Serah 17
43 Rohit 18
44 Meera 18
• Rollno→ Studentname is non trivial, as
studentname is not part of rollno.
• {Rollno , Studentname}→ Age is also non
trivial
Partial Functional Dependency
• A dependency from X →Y is said to be partial
functional dependency if removal of some
attribute from X the dependency still holds.
• The value of Y attribute can be determines by a
part of X attributes) .
• If X and Y form a composite key and X→Z, then
it is partial.
StudentID CourseID CourseName
101 CSE101 DS
102 CSE102 OS
103 CSE103 DS
• {StudentID, CourseID} is the composite
key.
• If courseID→CourseName , is a partial
dependency since only part of the key
determines a non key attribute.
Full Functional Dependency
• A dependency from X →Y is said to be a full
functional dependency if removal of some
value from X means the dependency will no
more hold.
• If X and Y together
StudentID CourseID Grade
101 CSE101 A
102 CSE102 B+
103 CSE103 A-
• {StudentID, CourseID} → grade is full
dependency, as both parts are needed to
determine the grade.
Second Normal Form (2NF)
• For a table to be in 2NF, there are two requirements
– The database is in first normal form.
– All non key attributes in the table must be functionally
dependent on the entire primary key.
– Further, if the key has more than one attribute, then no
non-key attributes should be functionally dependent
upon a part of the key attributes.(Partial Dependency)
Second Normal Form
• Each non prime attribute must be fully functionally
dependent on the key of the relation.
• That is , No Partial Functional dependencies on the key
must exist.
• Prime Attribute: An attribute is said to be prime if it is a
candidate key, primary key or part of candidate key or
primary key.
47
Third Normal Form
• For a relation to be in Third Normal Form,
– it must be in Second Normal form and the
following must satisfy.
– No non-prime attribute is transitively dependent
on prime key attribute.
❖In other words, if functional dependency exist
between two non-key attributes , then the
relation is not in 3NF.
• Now to change the table to the third normal form, you need to divide the
table as shown below:
Boyce-Codd Normal Form(BCNF)
A relation schema R is in BCNF if:
• It is in 3NF.
• For every functional dependency X->Y, X should be a
superkey of the table.
• You have a functional dependency X → Y. In
the particular functional dependency, X has to
be the part of the super key of the provided
table.
Comparison of Normal Forms
Normal Form Test Remedy
1 NF Relation should not have any Form new relations for each
multivalued attributes. multivalued attribute.
2 NF For relations where primary Decompose and set up a
key contains more than one new relation for each partial
attribute, no non key key with its dependent
attribute should be attributes.
functionally dependent on a
part of the primary key.
Normal Form Test Remedy
3NF Relation should not have a Decompose and set up a
non key attribute relation that includes non-
functionally determined by key attributess that
another non-key attribute. functionally determines
ie there should not be a other non key attributes.
transitive dependency with
the primary key.
BCNF For every functional Decompose the relation
dependendency X->A in the such that the FD on a non
relation, X must be a super key is preserved in
superkey. some other relation.
Closure of functional dependencies
•The Closure of Functional Dependency means the complete
set of all possible attributes that can be functionally derived
from given functional dependency using the inference rules
known as Armstrong’s Rules.
•If “F” is a functional dependency then closure of functional
dependency can be denoted using “{F}+”.
•There are three steps to calculate closure of functional
dependency. These are:
• Step-1 :
– Add the attributes which are present on Left Hand Side in
the original functional dependency.
• Step-2 :
– Now, add the attributes present on the Right Hand Side of
the functional dependency.
• Step-3 :
– With the help of attributes present on Right Hand Side,
check the other attributes that can be derived from the other
given functional dependencies.
• Repeat this process until all the possible attributes
which can be derived are added in the closure.
Example
• Consider the table student_details having
(Roll_No,Name,Marks,Location) as the
attributes and having two functional
dependencies.
• Now, We will calculate the closure of all the
attributes present in the relation using the
three steps mentioned below.
Denormalization
• Denormalization in DBMS is a technique used
to optimize database performance by adding
redundant data to a normalized database.
• This reduces the number of joins required to
retrieve data, leading to faster query
execution times.
• However, it also increases data redundancy
and can potentially lead to data
inconsistencies if not managed carefully.