0% found this document useful (0 votes)
47 views144 pages

Relational Database Design and Normalization

The document outlines the principles of relational database design, focusing on normalization and schema refinement. It details the phases of database design, informal guidelines for good design, and common anomalies such as insertion, deletion, and modification anomalies. The importance of maintaining semantics, reducing redundancy, and avoiding null values and spurious tuples is emphasized throughout the lecture.

Uploaded by

vedant Badiger
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
47 views144 pages

Relational Database Design and Normalization

The document outlines the principles of relational database design, focusing on normalization and schema refinement. It details the phases of database design, informal guidelines for good design, and common anomalies such as insertion, deletion, and modification anomalies. The importance of maintaining semantics, reducing redundancy, and avoiding null values and spurious tuples is emphasized throughout the lecture.

Uploaded by

vedant Badiger
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd

Database Management

Systems

Lecture by:
Prathima M. G ,B.E., M.E ,
Assisstant professor
Dept. of Computer Science & Engineering
Bangalore Institute of Technology
Bangalore – 560 004
prathimamg@[Link]
Relational Database Design

Schema Refinement
RELATIONAL DATABASE
DESIGN

NORMALIZATION

BOTTOM - UP
APPROACH
(THROUGH
SYNTHESIS)
Phases of database
design:
 Step-01: Requirement Gathering
 Step-02: Conceptual database design

( ER_modeling)
 Step-03: Logical database design
(ER- Relational Mapping)
 Step-04: Schema refinement
(Normalization)
 Step-05: Implementation
(using SQL)
Overview
 Informal guidelines for database design
 The concept of Functional Dependencies (FDs)
 Trivial and Nontrivial Dependencies
 Closure of set of Functional Dependencies
 Minimal Set of FD
 Finding the Candidate key
 Normal Forms (1NF, 2NF,3 NF, and BCNF)
 Examples on Normalization
Relational Database Design
Schema refinement :
 The process of evaluating relational schemas
for design quality or
 Measuring the appropriateness/ Goodness of
relational schema other than the intuition of
designer
Approaches to database
design
 Analysis:
 Top- Down approach
 Identify ENTITIES and associate ATTRIBUTES.

 Synthesis:
 Bottom-Up approach
 Consider Individual attributes of a TABLE and
associate appropriately with a TABLE.
Normalization
 The formal process that can be followed
to achieve a good database design
 Also used to check that an existing
design is of good quality
 The different stages of normalization are
known as “normal forms”
 To understand this, we need to
understand the concept of functional
dependency
Informal Guidelines for Good
Database Design
Four Informal Measures
 SEMANTICS of the relation attributes must be
maintained.

 Reducing the REDUNDANT VALUES in tuples

 Reducing NULL VALUES in tuples

 Disallowing
the possibility of GENERATING
SPURIOUS TUPLES.
Drawbacks of an unnormalized
relation
 Consider a WASE DB.
 WASE needs to keep track of details regarding
 its STUDENT (like USN, Name, DOB, Gen, Addr,..)
 the COURSES/SUBJECTS offered like (Cno, Cname,
Sem)
 and also keep track of details regarding Student
being enrolled for many Courses and a Course
having many students enrolled for it along with the
Marks_Range obtained by each student in each
course he/she is enrolled
 and then award Grade based on Marks_range.
ER Diagram for WASE DB

Sname DOB Addr

USN
M_Range
WASE Student_Course Details

Grade
Cno Cname Sem
ER_Relational Mapping
WASE Student_Course Details Primary Key: USN,CNO

USN Sname DOB Addr CNO Cname Sem M-range Grade

S1 Ram 1-jan-92 blr cs11 DBMS 4 70-79 B


S1 Ram 1-jan-92 blr cs12 OS 4 80-89 A
S2 Shyam 1-jan-92 blr cs11 DBMS 4 90-100 A+
S3 Ram 1-jan-93 blr cs11 DBMS 4 80-89 A
S1 Ram 1-jan-92 blr cs13 DS 3 70-79 B
S4 John 1-feb-92 chi cs13 DS 3 80-89 A
S2 Shyam 1-jan-92 blr cs12 OS 4 60-69 C
S5 Jane 1-jan-94 chi cs12 OS 4 80-89 A
S5 Jane 1-jan-94 chi cs13 DS 3 70-79 B
Guidelines-01:
Semantics of the attributes of relation must be
maintained:
Whenever we group attributes to form a Relation
 Design a relation schema so that it is easy to explain its
meaning.
 Do not combine attributes from multiple entity types
into a single relation.
 If we combine attributes from multiple entities and
relationships, semantic ambiguity will result and the
relation cannot be easily explained.
STUDENT_COURSE
USN Sname DOB Addr CNO Cname Sem M-range Grade

Although there is nothing wrong logically in the


STUDENT_COURSE relation, the mixing of attributes
from distinct real-world entities will not characterize
either of the Entities.
EMP_DEPT
Emp_No Ename DOB Sal DOJ Addr DeptNo Dname Dlocs
Guidelines-02
Reducing not Eliminating Redundancy in tuples of
a relation:
 One prime objective of any database design is to minimize storage
space used by base relations
 Another serious problem is of UPDATE ANOMALIES
 Insertion Anomalies
 Deletion Anomalies
 Modification Anomalies
Insertion Anomalies
INSERT INTO STUDENT_COURSE (USN,SNAME, DOB, Addr) VALUES
(‘S6’, ’JULIE’, ‘1-jan-94’, ‘CHI’);
WASE Student_Course Details Primary Key: USN,CNO
USN Sname DOB Addr CNO Cname Sem M-range Grade
S1 Ram 1-jan-92 blr cs11 DBMS 4 70-79 B
S1 Ram 1-jan-92 blr cs12 OS 4 80-89 A
S2 Shyam 1-jan-92 blr cs11 DBMS 4 90-100 A+
S3 Ram 1-jan-93 blr cs11 DBMS 4 80-89 A
S1 Ram 1-jan-92 blr cs13 DS 3 70-79 B

S4 John 1-feb-92 chi cs13 DS 3 80-89 A

S2 Shyam 1-jan-92 blr cs12 OS 4 60-69 C

S5 Jane 1-jan-94 chi cs12 OS 4 80-89 A

S5 Jane 1-jan-94 chi cs13 DS 3 70-79 B

S6 Julie 1-jan-94 chi ????

???? CS14 CN 3
Insertion Anomalies
Identifying Attribute: USN,CNO

USN Sname DOB Addr CNO Cname Sem M-range Grade


Insertion Anomalies:
Experienced when we attempt to store a value for one
field but cannot do so because the value of another
field is unknown.
Eg: To add a STUDENT to the database, we MUST
specify the course to which he has enrolled.
To add a COURSE to the database, we MUST specify
the student who has enrolled for the course..
Deletion Anomalies
 Experienced when the value of an attribute or field of a
relation is unexpectedly removed when value for another
an attribute/field is deleted.
Assume that a particular Student is no more. we need to
delete the student details.
 E.g., If we delete a Student S6 from the Table, then the
corresponding Cno, Cname, Sem, .. values of that row is
also deleted.
 This results in the loss of information. Here Course CS14
is removed from the database.
Deletion Anomalies
DELETE FROM STUDENT_COURSE WHERE USN=‘S6’;
WASE Student_Course Details Primary Key: USN,CNO
USN Sname DOB Addr CNO Cname Sem M-range Grade

S1 Ram 1-jan-92 blr cs11 DBMS 4 70-79 B


S1 Ram 1-jan-92 blr cs12 OS 4 80-89 A
S2 Shyam 1-jan-92 blr cs11 DBMS 4 90-100 A+
S3 Ram 1-jan-93 blr cs11 DBMS 4 80-89 A
S1 Ram 1-jan-92 blr cs13 DS 3 70-79 B
S4 John 1-feb-92 chi cs13 DS 3 80-89 A
S2 Shyam 1-jan-92 blr cs12 OS 4 60-69 C
S5 Jane 1-jan-94 chi cs12 OS 4 80-89 A
S5 Jane 1-jan-94 chi cs13 DS 3 70-79 B
S6 Julie 1-jan-94 chi CS14 CN 3 80-89 A
S6 Julie 1-jan-94 chi CS11 DBMS 4 80-89 A
Modification Anomalies
 In STUDENT_COURSE relation, if we want o change the value of
Sname of a STUDENT from Ram to Ramkumar, then we must
update all the tuples of that attribute wherever that attribute Sname
exists.

 If we fail to update in some tuples, then the same USN will show
different values for Sname, making the database inconsistent.
Modification anamoly
WASE Student_Course Details Primary Key: USN,CNO
USN Sname DOB Addr CNO Cname Sem M-range Grade

S1 RamKumar 1-jan-92 blr cs11 DBMS 4 70-79 B


S1 RamKumar 1-jan-92 blr cs12 OS 4 80-89 A
S2 Shyam 1-jan-92 blr cs11 DBMS 4 90-100 A+
S3 Ram 1-jan-93 blr cs11 DBMS 4 80-89 A
S1 RamKumar 1-jan-92 blr cs13 DS 3 70-79 B
S4 John 1-feb-92 chi cs13 DS 3 80-89 A
S2 Shyam 1-jan-92 blr cs12 OS 4 60-69 C
S5 Jane 1-jan-94 chi cs12 OS 4 80-89 A
S5 Jane 1-jan-94 chi cs13 DS 3 70-79 B

S1 Ram 1-jan-92 blr cs14 PT 4 70-79 B


Guideline-2: A well
structured table
Well-structured table - contains minimal
redundancy and allows users to insert, modify,
and delete the rows without any errors or
inconsistencies or any possible anomalies like
• Insertion Anomalies
• Deletion Anomalies
• Modification Anomalies
Reducing null values in
tuples
Empno Ename DOB Sal Comm Designation DNO
E1 A 1-jan-80 10000 10000 salesman D1
E2 B 1-jan-80 100000 Manager D1
E3 C 1-jan-80 100000 Developer D1
. . . . . D2
E90 z 1-jan-81 20000 12000 salesman D2
. . . . 12000 salesman D3
E100 zzz 1-jan-79 100000 manager D3

If there are only three salesman of all 100 employees in the


Company, then having a column/attribute called Commission, which
will have 97 null values will waste space at the storage level.

Another problem with null value column is HOW to account for them
when AGGREGATE operations like COUNT, SUM,AVERAGE are
applied.
Null Values
Interpretation of Null values
 The attribute does not apply to this tuple
 The attribute value for his tuple is unknown
 The value is known but absent or not recorded.

Guideline-03:
 As far as possible , avoid placing attributes in a base relation
whose values may frequently be null. If nulls are unavoidable ,
make sure that they apply in exceptional cases only and do not
apply to majority of tuples in the relation.
Generation of Spurious
Tuples
 Consider 2 relations
Project Department
PNO Pname loc DNo DeptNO Dname loc
P1 PMS Blore D1 D1 Sales Blore
P2 BMS Blore D2 D2 R&D Mysore
P3 UMS Mglore D3 D3 Mktng Blore
P4 LMS Mysore D5 D4 Dev Mglore
P5 HMS Mglore D4 D5 Testing Mglore

P6 ERP Hubli D1

Project EQUI_JOIN(LOC) Department


Spurious Tuples
SELECT * FROM PROJECT P, DEPARTMENT D WHERE [Link]=[Link];

Project_Department Spurious Tuples


Pno Pname DNo Deptno Dname loc
P1 PMS D1 D1 Sales Blore
P1 PMS D1 D3 Mktng Blore
P2 BMS D2 D1 Sales Blore
P2 BMS D2 D3 Mktng Blore
P3 UMS D3 D4 Dev Mglore
P3 UMS D3 D5 Testing Mglore
P5 HMS D4 D4 Dev Mglore
P5 HMS D4 D5 Testing Mglore
P4 LMS D5 D2 R&D Mysore
Spurious Tuples
GuideLine-04
 Design Relation schemas so that they can be joined with
equity conditions on attributes that are either primary key
or Foreign key fields in a way that guarantees that no
SPURIOUS tuples are generated.
Normalisation
Database design may have some amount of
 Inconsistences
 Uncertainties
 Redundancies
Normalization is the Refinement process so
as to eliminate these drawbacks.
 It is Defined as a step-by-step process of
decomposing a complex relation into a
simple and stable data structure so as to
eliminate these drawbacks.
Normalization is based on the concept of Functional
Dependency.
Functional Dependency
(FD)
 A FD is defined as the constraint that exists between
the attribute sets of a relation.

 A FD, denoted by X  Y, between two sets of attributes


X and Y that are subsets of relation R specifies a
constraint on the possible tuples.

 The constraint is that, for any two tuples or records t1


and t2 of R that have t1[X]=t2[X], then they must also
have t1[Y]=t2[Y].

 This means that the values of Y component of a tuple in


R depend on or is determined by the values of X
NOTE:
 If a constraint on relation R states that there cannot be
more than one tuple with a given X value in R – then it is
a candidate key of R.
 The FD is a property of the SEMANTICS of the
ATTRIBUTES.
 The database designer will use their understanding of the
semantics of attributes of R to specify FD that hold on all
relation states.
 Consider a relation R with (X, Y) attributes.
 Attribute Y is functionally dependent on attribute X, iff. each
value of X determines EXACTLY ONE value of Y.
Functional
dependency(contd..)
 X  Y : We say here “x determines y” or “y is
functionally dependent on x”
 The left-hand side of the FD is some times called as
the Determinant and the right-hand side is called
Dependent.
 XY does not imply that YX
 If the value of an attribute “Marks” is known then the
value of an attribute “Grade” is determined since
Marks Grade
Types of functional
dependencies:
 – Full dependency
 – Partial dependency
 – Transitive dependency
[Link] dependencies
 An attribute B of a relation R is fully
functionally dependent on attribute A of R if it
is functionally dependent on A & not
functionally dependent on any proper subset of
A.
 Report( USN,CNO,Sname,DOB,Addr,Cname,
Sem,Marks_Range, Grade)
 USN,CNO Marks_Range

This implies that for a given pair of USN,CNO values


occurring in the relation Report there is exactly one
value of Marks. ie Marks are dependent on USN,CNO
as a composite pair, but not on either individually
[Link] dependencies
An attribute B of a relation R is partially dependent on
attribute A of R if it is functionally dependent on any
proper subset of A.

 Report(USN,CNO,Sname,DOB,Addr,Cname,
Sem,Marks_Range, Grade)
 CNOCname, Sem
 USN Sname,DOB,Addr

The attributes Sname,DOB,Addr are said to be


partially dependent on the key (USN,CNO) since
they are dependent only on USN and not on CNO
[Link]
dependencies
An attribute B of a relation R is transitively dependent on
attribute A of R if it is functionally dependent on an
attribute C Which in turn is functionally dependent on A
or any proper subset of A.

 Report(USN,CNO,Sname,DOB,Addr,Cname,
Sem,Marks_Range, Grade)
 USN,CNO Marks_Range
 Marks_Range Grade
AB and BC => AC
The attribute Room# is said to be transitively
dependent on the key C# since it is dependent on
LName which in turn is dependent on C#.
Inference Rules(IR) or Armstrong's Axioms
 IR-1: Reflexivity: If X ‫ ﬤ‬Y AND XX, XY
 IR-2: Augmentation: If X  Y, then XZ  YZ.
 IR-3: Transitivity: If X  Y and Y  Z,
then X  Z.
 IR-4: Decomposition: If X  YZ, then X  Y
and X  Z.
 IR-5: Union: If X  Y and X  Z,
then X  YZ.
 IR-6: Pseudo transitivity: XY, WYZ then
WXZ
 Note: IR1 through IR3 are called as Armstrong
Inference rules. They are Sound and Complete.
IR-1: Reflexivity: If X ‫ ﬤ‬Y AND XX,
XY

Proof: Let r is some relation state of R and


there exists 2 tuples t1 and t2
S.T. t1[x]=t2[x] then we must have
t1[y]=t2[y]
Because X ‫ ﬤ‬Y hence x->y must hold in R.
Ex:X={ssn,ename}
Y={ename}
{ssn,ename}->ename
 IR-2: Augmentation: If X  Y, then XZ 
YZ
Let x->y in a relation instance r of R then
1)t1[x]=t2[x]
2)t1[y]=t2[y]
3)t1[xz]=t2[xz]
4)From 1 and 3 t1[z]=t2[z]
5)from 2 and 4 t1[yz]=t2[yz]
Ex:{ssn}->{ename}
{ssn,add}->{ename,add}
{ssn,phn,add}->{ename,phn,add}
Therefore xz->yz
IR-3: Transitivity: If X  Y and Y
 Z,
then X  Z.
Let r is an relation instance of R with 2
tuples t1 and t2
S.t.1)t1[x]=t2[x]
2)t1[y]=t2[y]
3)t1[z]=t2[z]
From 1 and 3
x->z
Ex:SSN->DNO
DNO->DNAME
SSN->DNAME
 IR-4: Decomposition: If X  YZ, then X  Y
and X  Z.

1.x->yz(Given)
[Link]->y{IR1 & knowing that yz⊇y}
3.x->y{IR3 ON 1 &2}
Eg: ssn->ename,add
ssn->ename,ssn->add
 IR-5: Union: If X  Y and X  Z,
then X  YZ.

Proof of IR5 (using IR1 through IR3).


1. X →Y (given).
2. X → Z (given).
3. X → XY (using IR2 on 1 by augmenting with X;
notice that XX = X).
4. XY → YZ (using IR2 on 2 by augmenting with Y).
5. X → YZ (using IR3 on 3 and 4)
Ex: ssn->ename
ssn->add
ssn->ename,add
 IR-6: Pseudo transitivity: XY, WYZ then
WXZ

1.x->y(given)
[Link]->z(given)
[Link]->wy(IR2 on 1 with w)
[Link]->z(IR3 on 3 &2)
Ex:ssn->dno ename,dno->dname
then ename,ssn->dname
Normal Forms Based on
Primary Keys
 The normalization process, as first proposed by Codd
(1972a), takes a relation schema through a series of
tests to certify whether it satisfies a certain normal
form.
 Normalization of data can be considered a process
of analyzing the given relation schemas based on
their FDs and primary keys to achieve the desirable
properties of
1) minimizing redundancy and
2) minimizing the insertion, deletion, and update
anomalies.
Definition. The normal form of a relation refers to
the highest normal form condition that it meets, and
hence indicates the degree to which it has been
normalized.
 Normal forms, when considered in isolation from
other factors, do not guarantee a good database
design.
 It is generally not sufficient to check separately that
each relation schema in the database is, say, in BCNF
or 3NF.
 Rather, the process of normalization

through decomposition must also confirm the


existence of additional properties that the relational
schemas, taken together, should possess. These would
include two properties:
1)The nonadditive join or lossless join property,
which guarantees that the spurious tuple generation
problem does not occur with respect to the relation
schemas created after decomposition

 The dependency preservation property, which


ensures that each functional dependency is
represented in some individual relation resulting after
decomposition.
 Denormalization:
 The process of storing the join of higher normal form relations
as a base relation—which is in a lower normal form .

Definitions of Keys and Attributes Participating in


Keys
 A superkey of a relation schema R = {A1, A2, ....,
An} is a set of attributes S subset-of R with the
property that no two tuples t1 and t2 in any legal
relation state r of R will have t1[S] = t2[S]

 A key K is a superkey with the additional property


that removal of any attribute from K will cause K not
to be a superkey any more.
 If a relation schema has more than one key, each is
called a candidate key.
 One of the candidate keys is arbitrarily designated to be
the primary key, and the others are called secondary
keys.
 A Prime attribute must be a member of some
candidate key
 A Nonprime attribute is not a prime attribute—that
is, it is not a member of any candidate key.
First Normal Form: 1NF
 A relation schema is in 1NF :
if and only if all the attributes of the relation R are

atomic in nature.
 Atomic: the smallest level to which data may be
broken down and remain meaningful.
1NF implies:
 Composite attributes are represented only by their
component attributes
 Attributes cannot have multiple values

In relational database design it is not practically


possible to have a table which is not in 1NF.
Online Retail Application Tables –
1NF Normalized
Observation on Un Normalized Retail Application Table
CustomerDetails ItemDetails PurchaseDetails
1001 John 1500012351 STN001 Pen 10 A 5 50
1002 Tom 1200354611 BAK003 Bread 10 A 1 10
1003 Maria 2134724532 GRO001 Potato 20 B 1 20

Above observation violates 1NF definition


To bring it to 1NF we need to make the columns atomic
QtyPurch
CustomerId CustomerName Accountno ItemId ItemName UnitPrice Class ased NetAmt

1001 John 1500012351 STN001 Pen 10 A 5 50

1002 Tom 1200354611 BAK003 Bread 10 A 1 10

1003 Maria 2134724532 GRO001 Potato 20 B 1 20


Second Normal Form: 2NF
 A Relation is said to be in Second Normal Form if and only if :
 It is in the First normal form, and
 There is FULL DEPENDENCY retained or No partial dependency exists between
non-key attributes and key attributes.
 An attribute of a relation R that belongs to the candidate key of R is
said to be a key attribute and that which doesn’t is a non-key
attribute.
 To make a table 2NF compliant, we have to remove all the partial
dependencies
Second Normal Form :
Example
Functional Dependencies of Retail Application Table

RetailApplicationTable( CustomerId, ItemId, CustomerName,


AccountNo,ItemName, UnitPrice, Class,QtyPurchased,
NetAmount)

(i) CustomerId  CustName, AccountNo

(ii) ItemId  ItemName, UnitPrice, Class

(iii) CustId,ItemId QtyPurchased, NetPrice

(iv) UnitPriceClass
Second Normal Form :
(Cont..)
Key and Non Key Attributes of Retail Application Table

{CustomerId, ItemId} is Candidate key

Key Attributes: CustomerId,ItemId

Non Key Attribures: CustomerName,


AccountNo,
ItemName,
UnitPrice,
Class,
QtyPurchased,
NetAmount
Second Normal Form :
(Cont..)

Fully Functionally dependent on Key Attribute


CustomerId, ItemId QtyPurchased, NetPrice

Partial Dependency with respect to Key Attribute

CustomerId CustomerName,AccountNo

ItemId ItemName,UnitPrice,Class
Second Normal Form :
(Cont..)
After removing the Partial dependencies on Key Attributes we get
the below tables which aree in 2NF:

Customer
CustomerId CustomerName Accountno
1001 John 1500012351
1002 Tom 1200354611
1003 Maria 2134724532

Item
ItemId ItemName UnitPrice Class
STN001 Pen 10 A
BAK003 Bread 10 A
GRO001 Potato 20 B
Second Normal Form :
(Cont..)
ItemPurchase
CustomerId ItemId QtyPurchased NetAmt
1001 STN001 5 50
1002 BAK003 1 10
1003 GRO001 1 20
Third Normal Form: 3 NF
A relation R is said to be in the Third Normal Form (3NF) if and only if
 It is in 2NF and
 No transitive dependency exists between non-key attributes and key
attributes through another non key attribute.

A B C

It should
be key It should be
Attribute It should be non key
non key attribute
attribute

To make a table 3NF compliant, we have to remove all such Transitive


Dependencies
Third Normal Form :
Example
Let us consider Item table obtained after bringing Retail Application
table in to 2NF:

ItemI UnitPrice
d
UnitPrice Class

ItemId UnitPri Class


ce

The above dependencies violates 3NF definition


Third Normal Form :
(Cont..)
After removing the transitive dependencies from the Item table we
get the following two table

Item
ItemId ItemName UnitPrice
STN001 Pen 10
BAK003 Bread 10
GRO001 Potato 20

ItemClass
UnitPrice Class
10 A
20 B
ename ssn bdate add dno dname dmgrss
n
 Trivial Functional Dependency

If a functional dependency (FD) X → Y holds, where Y is a subset


of X, then it is called a trivial FD. Trivial FDs always hold.

Eg:{Emp_id, Emp_name} -> Emp_id is a trivial functional


dependency as Emp_id is a subset of {Emp_id,Emp_name}.

 Non-Trivial Functional Dependency


If an FD X → Y holds, where Y is not a subset of X, then it is called
a non-trivial FD.

Eg:{Emp_id, Emp_name} -> Emp_age is a non-trivial functional


dependency as Emp_age is not a subset of {Emp_id,Emp_name}.
Boyce-Codd normal
form:BCNF
 Boyce-Codd Normal Form (BCNF) is an extension of Third Normal
Form on strict terms. BCNF states that −
 For any non-trivial functional dependency, X → A, X must be a
super-key.
 In the above image, Stu_ID is the super-key in the relation
Student_Detail and Zip is the super-key in the relation ZipCodes.
So,
 Stu_ID → Stu_Name, Zip
and
Zip → City
 Which confirms that both the relations are in BCNF.
Boyce-Codd normal
form:BCNF
A relation R is in BCNF if, for every non-trivial functional
dependency AB in it, it is true that A is a superkey of R
In other words, every determinant is a candidate key
 BCNF is a stronger form of 3NF
 3NF states that every non-prime attribute must be non-
transitively dependent on every key
BCNF states that every attribute (prime or non-prime) must
be non-transitively dependent on every key
BCNF
 From the FD: F = {FD1:AB  CD, FD2:C  B} we know
that B is Dependent from FD2.
 Vni={A}
 Voi={ D}
A D
 Candidate Key is

A+ ={A}
AB+ ={A,B,D,C} B C
AC+ ={A,C,B,D}
 Logically R1{A,C,D} is a better choice over R1{A,B,D}
as the join operation is will not generate SPURIOUS
tuples.
Example - 2
 Consider R (City, Street, Zipcode) or R (C, S, Z) and F = {CS  Z,
Z  C}.
The candidate keys for R are CS and ZS (using dependency
graph).
 The relation R is in 3NF (since each attribute is prime) but not
in BCNF, because in Z  C, Z is not a superkey and also it is
not a trivial FD. In R, we cannot store the city to which a
zipcode belongs unless we know a street address with the
zipcode. This introduces insertion anomaly.
 To convert this into BCNF, decompose R into:
R1 = {Z, C} and R2 = {S, Z}
 If we have R2={C,S} as the other table, then
 We can’t have a Foreign key reference to link both the tables.
 Determinants of R2 (i.e.,C,S)does not determine any other attribute
BCNF
 Also, from the FD: F = {FD1:CS  Z, FD2:Z  C}
we know that C is Dependent from FD2.
 Vni={S}
 Voi={ }
 Candidate Key is S
Z
S ={S}
+

SZ+ ={S,Z,C}
C
SC+ ={C,S,Z}
 Logically R1{S,Z} is a better choice over R1{C,S} as
the join operation is will not generate SPURIOUS
tuples.
Example - 3
 Consider the relation GradeList (S, N, C, G}
FD-1: {Name, Course} GPA NC  G
FD-2: {StudentNo, Course}  GPA SC  G
FD-3: Name  StudentNo NS
FD-4: StudentNo  Name SN
 Candidate keys are:
{Name, Course} N G
{StudentNo, Course}

C S
 The relation is in 3NF.
 But redundancy of data.
The association between Name and the corresponding
StudentNo is repeated.
- insertion anomaly.
 There exists deletion anomaly too.
(if a student fails in all subjects, looses the student
information!).
 The relation Gradelist is not in BCNF, because
of FD-3 and FD-4 which are nontrivial and their
determinants (left-hand side) are not super
keys of GradeList.
Closure of a set of FD (F
+
)
 Given some FDs, new FDs can often be
inferred.
 The set of all FDs that are implied by a given
set F of FDs is called the closure of F and is
denoted by F +.
 Example:

Let F = {AB  C, C  B} be a set of FDs


satisfied by R (A, B, C). Then,
F + = {A  A, AB  A, B  B, AB  AC, AB
 BC, AB  ABC, etc.}
Example
Supposing we are given a relation R {A, B, C, D, E, F}
with a set of FDs as shown below:
F={A  BC,B  E,D  EF}. Show that the FD
AD  F holds for R and is a member of the closure.

(1) A  BC {Given}
(2) A  C {Decomposition of (1)}
(3) AD  CD {Augmentation of (2) by adding D}
(4) D  EF {Given}
(5) CD CEF {Augmentation}
(6) CD->C,CD->E,CD->F {DECOMPOSITION}
FROM 3 & 5 APLLY TRANSITIVITY
AD->CD,CD->F
(7) AD  F
Attribute closure, X+
 To compute F + , start with FDs in F; repeatedly apply IR-1 to
IR-3 until no new FD can be derived
 Armstrong's Axioms do not produce any incorrect FDs that are
added to F +. However, finding F + is too expensive; the
complexity grows exponentially
 The solution is to find the attribute closure of X, denoted as X
+
.
 Definition: For each such set of attributes X,
we determine the set X+ of attributes
that are functionally determined by X based on F;
X+ is called the closure of X under F.
Algorithm to find X+
Algorithm Attribute_Closure()
{
X + = X;
Repeat {
for each FD XY in F do X + = X +  Y
for each FD Y  Z in F do
if Y  X + then X+ =X +  Z
// i. e. if Y is in X +, the add Z to X +
until no change;
// until no more attributes are added to X +
}
}
Example: Let us consider
SSN ename pnumber pname plocation hours

 F={ssn->ename,
pnumber->{pname,plocation},
{ssn,pnumber}->hours}
Closure sets w.r.t F
{ssn}+={ssn,ename}
{pnumber}+={pnumber,pname,plocation}
{ssn,pnumber}
+={ssn,ename,pnumber,pname,plocation,hours}
Example
Consider R (A, B, C) and a set of FDs
F = {AB  C, C  B}
Using the Algorithm, we calculate the following
closure sets with respect to F:
A+ = {A},
B+ = {B},
C+ = {C, B} because of FD-2
{AB}+ = {ABC} because of FD-1 add attribute C
{AC}+ = {ACB} because of AC  AB (IR-2) add attribute
B
{BC}+ = {BC} nothing can be added
{ABC}+ = {ABC} nothing can be added
2.3 Equivalence of Sets
of FDs
 Two sets of FDs F and G are equivalent if:
 Every FD in F can be inferred from G, and
 Every FD in G can be inferred from F
 Hence, F and G are equivalent if F+ =G+
 Definition (Covers):
 F covers G if every FD in G can be inferred from F
 (i.e., if G+ subset-of F+)
 F and G are equivalent if F covers G and G covers
F
 There is an algorithm for checking equivalence of
sets of FDs
Method:

 Step1: Find the attribute closure for each FD in F


considering LHS with the FD’s in G.
 Step2: Find the attribute closure for each FD in G
considering LHS with the FD’s in F.
 Step3: Check whether the FD’s in F & G appear in
the attribute closure. If yes ,then they are equal.
Example1:R(A,B,C)
 F={A->B,B->C,C->A}
 G={C->B,B->A,A->C}

F COVERS G G COVERS F
A->B C->B
A+=ACB C+=ABC
B->C B->A
B+=ABC B+=ABC
C->A A->C
C+=ABC A+=ABC
F=G,THEN THEY ARE EQUIVALENT
Example2:R(A,B,C,D,E)
F={A->B,AB->C,D->AC,D->E}
G={A->BC,D->AE}
 G COVERS F F COVERS G
 A->BC A->B
 A+=ABC A+=ABC
 D->AE AB+=ABC
 D+=DACEB D+=DAEBC
 F=G
 THEN THEY ARE EQUIVALENT
Example3:
X={A->B,B->C}
Y={A->B,B->C,A->C}
Check if X covers y y x
Y covers x x y
 then X=Y

First take Y
Y=A->B Y=A->C A+=ABC
A+=ABC
Y=B->C
B+=BC
Example4:
 X={AB->CD,B->C,C->D}
 Y={AB->C,AB->D,C->D}

X COVERS Y Y COVERS X
AB->C AB+=ABCD
AB+=ABCD C+=CD
B+=BCD
C+=CD
Example5:
 F={A->B,A->C}
 G={A->B,B->C}
 F COVERS G G COVERS F
 A->B A->B
 A+=ABC A+=ABC
 B->C
 B+=B
Not equivalent
Minimal Cover (F I )
 A set of FDs F is minimal and can be
represented as a set of FDs G if it satisfies the
following conditions:
a) Every FD in G has a single attribute on its right-
hand side, i.e. X  A, where A is a single attribute.
b) No FD can be removed from G and still have a set of
FDs that is equivalent to F.
c) We can not replace any FD: X  A in F with a
dependency Y  A, where Y  A and still have a
set of dependencies that is equivalent to F.
Algorithm to find the Minimal cover
Algorithm MinimalCover(F)
{
Step-1: G = F.
Step-2: Transform G into a set of FD's with right hand side
containing only one attribute (Canonical cover).
Step-3: Eliminate a redundant attribute from left-side.
For each dependency A1, A2, ..., Ak  B in current set of G,
and each attribute Ai in its left-side,
if G - {A1, A2, ..., Ak  B }  {A1, A2, ..., Ai -1, Ai+1,..., Ak  B}
is equivalent to G.
then delete Ai from the left side of A1, A2, ..., Ak  B.
Step-4: Eliminate a redundant dependency
For each dependency X  Y in the current set of dependencies G
if G - {X  Y} is equivalent to G then delete X  Y from G.
}
Example 1: Let the given set of FDs be
E: {B → A, D → A, AB → D}. We have to
find the minimal cover of E.
 All above dependencies are in canonical form (that is, they
have only one attribute on the right-hand side), so we have
completed step 1 of Algorithm
and can proceed to step 2.
 In step 2 we need to determine if AB → D has any redundant
(extraneous) attribute on the left-hand side; that is, can it be
replaced by
B → D or A → D?
 Since B → A, by augmenting with B on both sides (IR2), we
have BB → AB, or B → AB (i).
However, AB → D as given (ii).
 Hence by the transitive rule (IR3), we get from (i) and (ii), B
→ D. Thus
AB → D may be replaced by B → D.

 We now have a set equivalent to original E, say E′: {B → A, D


→ A, B → D}.
 No further reduction is possible in step 2 since all FDs have a
single attribute on the left-hand side.
 In step 3 we look for a redundant FD in E′. By using the
transitive rule on B → D and D → A,
 we derive B → A. Hence B → A is redundant in E′ and can be
eliminated.
 Therefore, the minimal cover of E is
F: {B → D, D → A}.
Example
 Given F = {AB  C, A  D, BD  C, D  BG, AE  F}
Make all FDs with single attributes in the right-side (only
FD-4 need to be split).
Step-1: G = F
Step 2: G = {AB  C, A  D, BD  C, D  B, D  G,AE  F}
Step 3: The only FDs to be considered are
AB  C, BD  C.
For AB  C, we want to find whether A  C or B  C holds or not.
Because A  D, D  B implies A  B (transitivity).
Because A  B implies A  AB.
Because A  AB, AB  C implies A  C (transitivity).
Thus, G - {AB  C}  {A  C}  G.
Similarly, BD  C can be replaced by D  C, because D  B
implies D  BD, and
D  BD, BD  C implies D  C.

However, AE  F cannot be replaced by any other FD.


The result of Step-3 is given below:
G = {A  C, A  D, D  C, D  B, D  G,
AE  F}
Step-4: A  C can be removed from G, because A  D,
D  C implies A  C.

The minimal cover of original F is:


G = {A  D, D  C, D  B, D  G, AE  F}
Find the Non-redundant
cover of FI
R(A,B,C,D,E,H)
F={ABC, BE,ABHBD,DAEH,DHBC}

Find the canonical cover.


FC ={AB, AC , BE , ABHB , ABHD,
DA, DE, DH, DHB, DHC }
Potential FD’s for removing Redundant Fd’s are
AB, AC , BE
ABHB DHC DE,
DHB,
Find the Non-redundant
cover of FI
Consider the FD’s
AB
ABHB
……LHS is redundant /Augmentation of AB
DHB
…..LHS is composite, hence remove
Consider the FD’s
 AC
 DHC

…..LHS is composite, hence remove


Consider the FD’s
 BE ……B is dependent and also a determinant, but D is a
determinant of more attributes, hence remove
 DE
FI={AB , AC, DA, DE, DH, ABHD}
Candidate Keys
1: Draw the dependency graph of F. Each vertex corresponds
to an attribute.
Edges can be defined as follows:
AB becomes A B
A  BC becomes A B
C
AB  C becomes A
B C
2: Identify Vertices Vni that have No_Incoming
edges.
3: Identify Vertices Voi that have Only_Incoming
edges.
4: A candidate key is a set of attributes that
 contains all attributes in Vni.
 contains no attributes in Voi.
 has no subset that is already a candidate key.
Example - 1
 Consider R (A, B, C, G, H, I), and
F = {A  BC, CG  HI, B  H}.
B H
A C
I
G

Vni = {A, G} – ‘no incoming’


Voi = {H, I} – ‘only incoming’
(AG)+ = {A, B, C, G, H, I},
AG is the only candidate key
F= {A  D, AC, D  C, D  B, D  G, AE  H}

A D B

E C G
H

Vni = {B,C,G,H}
Voi = {A, E}
Candidate keys = AE
Example - 2
Consider R (A, B, C, D, E, H), and
F = {A  B, AB  E, BH  C, C  D, D  A}

A B C D

E H
Vni = {H}
Voi = {E}
Candidate keys = AH, BH, CH, and DH
BS={ABCD} So to find ck, w.r.t BS

AH+={AH} BH+={BH}
={AHB} ={BHC}
={AHBE} ={BHCD}
={AHBEC} ={BHCDA}
={AHBECD} ={BHCDAE}
CK1={AH} CK2={BH}

CH+={CH} DH+={DH}
={CHD}{CHDA}{CHDAB} ={DHA}
={CHADBE} ={DHAB}{DHABE}{DHABEC}
CK3={CH} CK4={DH}
Consider R (A, B, C, D, E, H),
and

1)F={A->C,C->D,D->B,E->F}
2)F={CH->G,A->BC,B->CFH,E->A,F->EG}
Find the candidate Keys
R={booktitle,authorname,booktype,list
price,affilitation,publication}

F={book-title->booktype,publication,
author name->affiliation
book type->list price}
Find the key and normalize.
1)Find vni={Btitle,Authorname}
2)Voi={publication,listprice,affilitation}
3){Btitle}+={Btitle,Btype,pub,listprice}#R
4){Authorname}+={Aname,Affilitation}#R
5){Btitle,Authorname}
+={btitle,btype,pub,listprice,authname,Aff}=R
 {Btitle,Aname} is the key

Btitle anam btype list aff pub


e
Functional Dependencies

Input: A relation R and a set of functional


dependencies F on the attributes of R.
1. Set K := R.
2. For each attribute A in K
{compute (K − A)+ with respect to F;
if (K − A)+ contains all the attributes in R, then set
K := K − {A} };
Example1:R(ssn,ename,pnu
mber,pname,ploc,hours)
 F={ssn->ename,
pnumber->{pname,plocation},
{ssn,pnumber}->hours}
Step1:K=R
Step2:for each attribute A in k compute (K − A)+
(ssn,pnumber,ename,pname,ploc)+=(ssn,ename,pnumber,pname,
ploc,hours)=R
Therefore we can remove hours from K
(ssn, pnumber,ename,,pname)+=(ssn,ename,pnumber,pname,
ploc,hours)=R
Therefore we can remove ploc from K
(ssn, pnumber, ename,)+=
(ssn,ename,pnumber,pname,plocation,hours)=R
Therefore we can remove pname from K
(ssn,pnumber)
+=(ssn,ename,pnumber,pname,plocation,hours)=R
Therefore we can remove ename from K.
Relational Design Algorithms and
Further Dependencies
Relational Databases Design Algorithms:
 Normal forms are not sufficient criteria for a good design
 Example: Any relation with two attributes is always in BCNF-
so we can create 2-attribute relations arbitrarily and get BCNF
 Additional conditions are needed to ensure a good design
 Lossless join property
 Dependency preserving property
1.2 The Dependency Preservation Property
 The database designers define a set F of functional
dependencies that should hold on the attributes of R
 D should preserve the dependencies; informally, the
collection of all dependencies that hold on the individual
relations Ri should be equivalent to F
 Formally:
 Define the projection of F on Ri, denoted by F(Ri), to be the set
of FDs X  Y in F+ such that (X  Y)  Ri
(cont.)

 A decomposition D = {R1, R2, …, Rm} is dependency


preserving if (F(R1)  F(R2)  …  F(Rm))+ = F+
 This property makes it possible to ensure that the FDs in F hold
simply by ensuring that the dependencies on each relation Ri
hold individually
 There is an algorithm to decompose R into a dependency
preserving decomposition D = {R1, R2, …, Rm} with respect
to F such that each Ri is in 3NF
 F+={F1  F2  …  Fn} +
 R1 R2 Rn
 RF R1(F1)

 R2(F2)
(cont.)
 Find a minimal set of FDs G equivalent to F
 For each X of an FD X  A in G

Create a relation schema Ri in D with the attributes {X 


A1  A2  …  Ak} where the Aj’s are all the attributes
appearing in an FD in G with X as left hand side
 If any attributes in R are not placed in any R , create another
i
relation in D for these attributes
 It can be proven that all dependencies in F are preserved and
that all relation schemas in D are in 3NF Called the relational
synthesis algorithm
(cont.)

 Problems:
 Must find a minimal cover G for F
 No efficient algorithm for finding a minimal cover
 Several minimal covers can exist for F; the result of the
algorithm can be different depending on which is chosen
Example1: F+=(F1UF2)+

[Link] R(A,B,C,D) with


F={A->B,A->C,C->D}
R1(A,B,C) R2(C,D)
F1={A->B,A->C} F2={C->D}
Solution: F+=(F1UF2)+

{A->B,A->C,C->D} ={A->B,A->C,C->D}
Therefore it is dependency preserved.
1.3 The Lossless(Non-Additive)
Join Property

 Informally, this property ensures that no spurious tuples appear


when the relations in the decomposition are JOINed
 Formally:
 A decomposition D = {R1, R2, …, Rm} of R has the lossless
join property with respect to a set F of FDs if, for every
relation instance r® whose tuples satisfy all the FDs in F, we
have:
(R1(r(R))  R2(r(R))  …  Rm(r(R))) = r(R)
 This condition ensures that whenever a relation instance r(R)
satisfies F, no spurious tuples are generated by joining the
decomposed relations r(Ri)
(cont.)

 Since we actually store the decomposed relations as base relations,


this condition is necessary to generate meaningful results for
queries involving JOINs
 There is an algorithm for testing whether a decomposition D
satisfies the lossless join property with respect to a set F of FDs
Algorithm 15.3. Testing for Nonadditive Join
Property
 Input: A universal relation R, a decomposition D = {R1, R2,
… , Rm} of R, and a set F of functional dependencies.
 Note: Explanatory comments are given at the end of some of
the steps. They follow the format: (*comment*).
1. Create an initial matrix S with one row i for each relation
Ri in D, and one column j for each attribute Aj in R.

2. Set S(i, j): = bij for all matrix entries. (*Each bij is a
distinct symbol associated with indices (i, j)*)

3. For each row i representing relation schema Ri


{for each column j representing attribute Aj
{if (relation Ri includes attribute Aj) then set S(i, j): = aj;};};
(*Each aj is a distinct symbol associated with index (j)*)
4. Repeat the following loop until a complete loop execution
results in no changes to S
{for each functional dependency X → Y in F
{for all rows in S that have the same symbols in the columns
corresponding to attributes in X
{make the symbols in each column that correspond to an
attribute in Y be the same in all these rows as follows:
If any of the rows has an a symbol for the column, set the
other rows to that same a symbol in the column.
If no a symbol exists for the attribute in any of the rows, choose
one of the b symbols that appears in one of the rows for the
attribute and set the other rows to that same b symbol in the
column ;} ; } ;};
5. If a row is made up entirely of a symbols, then the
decomposition has the nonadditive join property; otherwise, it
does not
(cont.)
 There is an algorithm for decomposing R into BCNF relations such that
the decomposition has the lossless join property with respect to a set of
FDs F on R
 Set D  {R}
 While there is a relation schema Q in D that is not in BCNF do

Begin
Choose one Q in D that is not in BCNF;
Find a FD X Y in Q that violates BCNF;
Replace Q in D by two relation schemas(Q-Y) and (XY) end;
(cont.)
 This is based on two properties of lossless join
decomposition:
 The decomposition D = {R1, R2} of R has the lossless join
property w.r.t F if and only if either:
 The FD (R1R2)  (R1  R2) is in F+, or
 The FD (R1R2)  (R2  R1) is in F+
 If D = {R1, R2, …, Rm} of R has the lossless join property w.r.t
a set F, and D1 = {Q1,Q2, …, Qk} of Ri has the lossless join
property w.r.t. F(Ri) then D = {R1, R2, …, Ri-1, Q1, Q2, …, Qk.
Ri+1,… Rm} has the lossless join property w.r.t F
(cont.)

 Meaning: if D is already lossless in join,then any


decomposition Di of Ri that is also lossless in join will also
make D lossless in join.
 There is no algorithm for decomposition into BCNF relations
that is dependency preserving
 A modification of the synthesis algorithm guarantees both the
lossless join and dependency preserving properties but into
3NF relations (not BCNF)
 Fortunately, many 3NF relations are also in BCNF
Lossless join and dependency preserving
decomposition into 3NF relations:

 Find a minimal set of FDs G equivalent to F


 For each X of an FD X  Y in G

Create a relation schema Ri in D with the attributes {X  A1 


A2  …  Ak} where the Aj’s are all the attributes appearing
in an FD in G with X as left hand side
 If any attributes in R are not placed in any R , create another
i
relation in D for these attributes
 If none of the relations in D contain a key of R, create a
relation that contains a key of R and add it to D
1.4 Null Values and Dangling Tuples
 Null values may create problems if they appear as join attributes
 The distinction between the result of a regular join and an outer
join becomes important when specifying queries
 Some queries require regular join and other queries require outer
join
 Dangling tuples:
 Sometimes the attributes of a relation are “partitioned” into
several relations with the primary key repeated in each of the
relations
(cont.)

 Tuples whose primary key does not appear in all of the


relations are called “dangling tuples”
 If a regular join is taken on the relations on the primary key to
rebuild the tuples, the dangling tuples do not appear in the
result
Figure 15.2
Figure 15.3
2. Further Dependencies and
Normal Forms
 Functional dependencies(FDs) are used to
specify one very common type of constraint
 Other types of constraints cannot be specified
by FDs alone
 Additional dependencies include multivalued
dependencies(MVDs), join dependencies(JDs),
inclusion dependencies
 Some dependencies lead to normal forms
beyond 3NF and BCNF
2.1Multivalued Dependencies
and 4NF
 Informally, a set of attributes X
multidetermines a set of attributes Y if the
value of X determines a set of values for
Y(independently of any other attributes)
 A multi-valued dependency (MVD) is written
as X  Y
 Specifies a constraint on all relation
instances r(R)
 Formally:
 Let R be a relation schema; let X and Y be subsets
of the attributes in R; and let X = R  (XY) (the
remaining attributes)
(cont.)
 An MVD X  Y holds in R if whenever two tuples
t1 and t2 exist in a relation instance r(R) with
t1[X] = t2[X], then two tuples t3 and t4 must also
exist in r(R) such that the follwing holds:
t1: x y1 z1
 t3[X] = t4[X] = t1[X] = t2[X] t2: x y2 z2
 t3[Y] = t1[Y] and t4[Y] = t2[Y] t3: x y1 z2
t4: x y2 z1
 t3[Z] = t2[Z] and t4[Z] = t1[Z]  xy
 The MVD constraint implies that a value of X
determines a set of values of Y independently
from the values of Z
 Property of MVD: If X  Y holds, then X  Z
also holds

x  y/z
(cont.)
 An MVD X  Y is called a trivial MVD if either:
(a) Y  X or,
(b) (X  Y) = R
 A trivial MVD always holds according to the formal
MVD definition
Figure 15.4
(cont.)
 Given a set F of FDs and MVDs, we can infer
additional FDs and MVDs that hold whenever the
dependencies in F hold
 Sound and complete set of inference rules for
FDs and MVDs:
 (Reflexive for FDs) If Y  X, then X  Y
 (Augmentation for FDs) If X  Y, then XZ  YZ
(Notation: XZ stands for X  Z)
 (Transitive for FDs) If X  Y and Y  Z, then X  Z
(cont.)
 (Complementation for MVDs) If X  Y, then X 
Z (where Z = R  (X  Y))
 (Augmentation for MVDs) If X  Y and Z  W,
then WX  YZ
 (Transitive rule for MVDs) If X  Y and Y  Z,
then X  (Z  Y)
 (Relication rule FD to MVD) If X  Y, then X  Y
 (Coalescence rule for MVDs) If X  Y and there
exists W such that (a) Z  Y, (b) W  Z, and (c)
W  Y is empty, then X  Z
(cont.)
 Notes:
 By rule I7, every FD is also an MVD
 I1 to I8 can derive the closure F+ of a set of
dependencies F
 Fourth Normal Form(4NF):
 3NF and BCNF do not deal with multivalued
dependencies
 A relation schema with some non-trivial MVDs
may not be a good design (see Figures 15.4 and
15.5)
 4NF takes care of these problems, and implies
BCNF (every relation in 4NF is also in BCNF)
(cont.)
 Formal definition of 4NF:
 A relation schema R is in 4NF with respect to a set
of (functional and multivalued) dependencies F if
for every nontrivial multivalued dependency X 
Y in F+, X is a superkey of R
 Check Fig15.4(a) is not in 4NF; ENAME 
PNAME/DNAME But ENAME is not the superkey
 Notes:
 Since every FD is an MVD, 4NF implies BCNF
 If all dependencies in F are FDs, the definition for
4NF becomes the definition for BCNF
(cont.)
 There is an algorithm for decomposing R into 4NF
relations such that the decomposition has the
lossless join property with respect to a set F of
FDs and MVDs on R
 Algorithm 4.5:
 Set D  {R}
 While there is a relation schema Q in D that is not in
4NF do
begin
Choose one Q in D that is not in 4NF;
Find a nontrivial MVD X  Y in Q that violates 4NF;
Replace Q in D by two relation schemas (Q  Y) and (X
 Y)
end;
(cont.)
 Algorithm 15.5 is based on property LJ1:
 A decomposition D = {R1, R2} of R has the
lossless join property with respect to F if and
only if either:
 (a) (R1 R2)  (R1  R2) holds in F+, or
 (b) (R1 R2)  (R2  R1) holds in F+
Figure 15.5
2.2 Join Dependencies and
5NF
 A join dependency JD(R1, R2, …, Rn) is a
constraint on R
 Specifies that every legal instance r(R)
should have a lossless join decomposition
into R1, R2, …, Rn
 An MVD is a special case of a JD where n= 2
 A JD(R1, R2, …, Rn) is a trivial JD if some Ri =
R
 Fifth normal form (5NF):
 A relation schema R is in 5NF with respect to a set
F of FDs, MVDs, and JDs if for every nontrivial
JD(R1, R2, …, Rn), each Ri is a superkey of R
(cont.)
 5NF is also called PJNF (project-join normal form)
 Note: Unfortunately, the theory of MVD and
JD lossless join decomposition does not take
NULL values into account
2.3 Inclusion Dependencies
 Used to specify referential integrity and
class/subclass constraints between two
relations R and S
 An inclusion dependency R.X < S.Y
specifies that at any point in time, if r(R) and
s(S) are relation instances of R and S, then:
  (r(R))   (s(S))
X Y
 The attributes X of R and Y of S must be
compatible
(cont.)
 Sound and complete inference rules for
inclusion dependencies:
 (ID1) R.X < R.X
 (ID2) If R.X < S.Y, where X = {A1, A2, …, An}, Y =
{B1, B2, …, Bn}, and Ai corresponds to Bi, then Ai
< Bi for 1  i  n
 (ID3) If R.X < S.Y and S.Y < T.Z, then R.X < T.Z
 So far, there are no normal forms based on
inclusion dependencies
 Example: [Link] < [Link]
 WORKS_ON.SSN < [Link]
 WORKS_ON.PNUMBER < [Link]
Merits of Normalization
 Normalization is based on a mathematical

foundation.

 Removes the redundancy to a greater

extent.

 Removes the anomalies present in INSERTs,

UPDATEs and DELETEs.


Demerits of Normalization
 Data retrieval or SELECT operation

performance will be severely affected.

 Normalization might not always represent

real world scenarios.


Summary of Normal Forms
Input Operation Output
Un-normalized Create separate rows or columns
for every combination of multi Table in 1 NF
Table valued columns

Table in 1 NF Eliminate Partial dependencies Tables in 2NF

Tables in 2 NF
Eliminate partial dependency Tables in 3 NF
Summary
 While converting ERD into relational schema, each strong entity becomes a table.

 Each weak entity becomes a table.

 For each M:N relationship new table is created.

 Normalization is a refinement process. It helps in removing anomalies present in


INSERTs/UPDATEs/DELETEs.

 There are three normal forms that were defined being commonly used.

 1NF makes sure that all the attributes are atomic in nature.

 2NF removes the partial dependency.


End of Chapter

You might also like