0% found this document useful (0 votes)
10 views53 pages

Unit2 Database Design

The document outlines the process of database design, including understanding the organization, identifying entities and attributes, creating E-R diagrams, and normalizing tables to avoid redundancy and anomalies. It emphasizes the importance of functional dependencies and normalization levels, such as 1NF, 2NF, and 3NF, to ensure data integrity and efficient storage. Additionally, it discusses the significance of avoiding update, insert, and deletion anomalies through proper database structuring.

Uploaded by

Andi Mandi
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
10 views53 pages

Unit2 Database Design

The document outlines the process of database design, including understanding the organization, identifying entities and attributes, creating E-R diagrams, and normalizing tables to avoid redundancy and anomalies. It emphasizes the importance of functional dependencies and normalization levels, such as 1NF, 2NF, and 3NF, to ensure data integrity and efficient storage. Additionally, it discusses the significance of avoiding update, insert, and deletion anomalies through proper database structuring.

Uploaded by

Andi Mandi
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Database Design

Project Design
• Understand the organization/ Environment
• Convert the functional units into entities
• Identify the entities and Attributes and Relations amongst
entities very correctly
• Draw an E-R Diagram
• Reduce your E-R Diagram into set of tables i.e. database
• Put all the integrity constraints of the organization
• Bring your databases into normal forms(Normalization)
• Then create your back-end.
• And at last front end.
Properties of Good Database Design

Good Database design must possess following properties.

• Avoids Redundancy in data.


• Database design should be dependency preserving.
Example
• Consider the relation schema: (Lending Schema)

Branch_Nam Branch_city Assets Customer_n Loan_ Amount


e ame number
• Warje Pune 400000 Sachin L- 10 10,000

Varali Mumbai 500000 Ajay L-25 25,000

Katraj Pune 450000 Rahul L-67 35,000

Warje Pune 400000 Neeta L-87 20,000

CIDCO Aurangabad 600000 Ramesh L-34 10,000

Kothrud Pune 650000 Sachin L-65 15,000

Warje Pune 400000


Purpose of Normalization
• To avoid redundancy (less storage space needed, and
data is consistent).
• To avoid insert/update/delete anomalies.
• It is used to remove the duplicate data and database
anomalies from the relational table.
• Normalization helps to reduce redundancy and
complexity by examining new data types used in the table.
• It is helpful to divide the large database table into smaller
tables and link them using relationship.
• It avoids duplicate data or no repeating groups into a
table.
Data Redundancy and Update Anomalies
• If a database design is not perfect, it may contain
anomalies.
• Managing a database with anomalies is next to impossible.
• Update anomalies :
– If data items are scattered and are not linked to each
other properly, then it could lead to strange situations.
– For example, when we try to update one data item
having its copies scattered over several places, a few
instances get updated properly while a few others are
left with old values.
– Such instances leave the database in an inconsistent
state.
• Insert anomalies: We tried to insert data in a record that
does not exist at all.
• Deletion anomalies: We tried to delete a record, but parts
of it was left undeleted because of unawareness, the data is
also saved somewhere else.
• Data Redundancy: Data redundancy is a condition
created within a database or data storage technology in
which the same piece of data is held in two separate places.
• To overcome these anomalies, we need to normalize the
data.
Functional Dependencies
Functional dependency describes the relationship
between attributes in a relation. For example, if A and B are
attributes of relation R, and B is functionally dependent on A
( denoted A B), if each value of A is associated with
exactly one value of B. ( A and B may each consist of one
or more attributes.)

B is functionally
A B
dependent on A

Determinant Refers to the attribute or group of attributes on the


left-hand side of the arrow of a functional
dependency
Functional Dependencies
1. If one set of attributes in a table determines another
set of attributes in the table, then the second set of
attributes is said to be functionally dependent on the
first set of attributes.

Example 1
ISBN Title Price Table Scheme: {ISBN, Title, Price}
0-321-32132-1 Balloon $34.00 Functional Dependencies: {ISBN}  {Title}
0-55-123456-9 Main Street $22.95 {ISBN}  {Price}
0-123-45678-0 Ulysses $34.00

1-22-233700-0 Visual $25.00


Basic
Functional Dependencies (3)

Main characteristics of functional dependencies in normalization

• Have a one-to-one relationship between attribute(s) on the left- and


right- hand side of a dependency;

• hold for all time;

• are nontrivial.
Functional Dependencies (2)
Trival functional dependency means that the right-hand side is a
subset ( not necessarily a proper subset) of the left- hand side.

For example:
staffNo, sName  sName
staffNo, sName  staffNo

They do not provide any additional information about possible integrity constraints
on the values held by these attributes.

We are normally more interested in nontrivial dependencies because they represent


integrity constraints for the relation.
Normalization
• Normalization is a technique for
producing a set of relations with desirable
properties, given the data requirements of
an enterprise.
• The process of normalization is a formal
method that identifies relations based on
their primary or candidate keys and the
functional dependencies among their
attributes.
Normalization Definition
• It is a process which allows you to remove redundant
data within your database and also avoids different
types of anomalies(insert, deletion and modification).
• This involves restructuring the tables to successively
meeting higher forms of Normalization.
• A properly normalized database should have the following
characteristics
– Scalar values in each fields
– Absence of redundancy.
– Minimal use of null values.
– Minimal loss of information.
Levels of Normalization
• Levels of normalization based on the amount of
redundancy in the database.
• Various levels of normalization are:
– First Normal Form (1NF)
– Second Normal Form (2NF)

Redundancy

Number of Tables
– Third Normal Form (3NF)

Complexity
– Boyce-Codd Normal Form (BCNF)
– Fourth Normal Form (4NF)
– Fifth Normal Form (5NF)
– Domain Key Normal Form (DKNF)

Most databases should be 3NF or BCNF in order to avoid the


database anomalies.
Levels of Normalization
1NF
2NF
3NF
4NF
5NF
DKNF

Each higher level is a subset of the lower level


First Normal Form
• A relation is in first normal form if every
attribute in that relation is singled valued
attribute.
ID Name Courses

1 Smith c1, c2
2 John c3
3 Kedar c2, c3
1NF Tables: Repeating Attributes Removed
Project Code Employee No. Employee Name Department No. Department Name Hourly Rate

PC010 S10001 A Smith L004 IT 22.00


PC010 S10030 L Jones L023 Pensions 18.50
PC010 S21010 P Lewis L004 IT 21.00
PC045 S10010 B Jones L004 IT 21.75
PC045 S10001 A Smith L004 IT 18.00
PC045 S31002 T Gilbert L028 Database 25.50
PC045 S13210 W Richards L008 Salary 17.00
PC064 S31002 T Gilbert L028 Database 23.25
PC064 S21010 P Lewis L004 IT 17.50
PC064 S10034 B James L009 HR 16.50

Project Code Project Title Project Manager Project Budget

PC010 Pensions System M Phillips 24500


PC045 Salaries System H Martin 17400
PC064 HR System K Lewis 12250
Second Normal Form (2NF)
For a table to be in 2NF, there are two requirements
– The database is in first normal form
– All nonkey attributes in the table must be functionally
dependent on the entire primary key

No
te:Rememberthatwearedeain
l gw
ithnon
-key
at ib rt u tes
2NF Tables: Partial Key Dependencies Removed
Project Project Title Project Project
Code Manager Budget
Project Employee Hourly
Code No. Rate PC010 Pensions System M Phillips 24500
PC010 S10001 22.00 PC045 Salaries System H Martin 17400
PC010 S10030 18.50 PC064 HR System K Lewis 12250
PC010 S21010 21.00
PC045 S10010 21.75 Employee Employee Departme Department
No. Name nt Name
PC045 S10001 18.00 No.

PC045 S31002 25.50 S10001 A Smith L004 IT


PC045 S13210 17.00 S10030 L Jones L023 Pensions
PC064 S31002 23.25 S21010 P Lewis L004 IT
PC064 S21010 17.50 S10010 B Jones L004 IT
PC064 S10034 16.50 S31002 T Gilbert L028 Database
S13210 W Richards L008 Salary
S10034 B James L009 HR
Third Normal Form (3NF)
For a table to be in 3NF, there are two requirements
– The table should be second normal form
– No attribute is transitively dependent on the primary
key
3NF Tables: Transitive Dependencies Removed
Project Employee No. Hourly
Code Rate
Project Project Title Project Project Budget
Code Manager PC010 S10001 22.00
PC010 Pensions System M Phillips 24500 PC010 S10030 18.50
PC010 S21010 21.00
PC045 Salaries System H Martin 17400
PC045 S10010 21.75
PC064 HR System K Lewis 12250
PC045 S10001 18.00
Employee No. Employee Name Department No. * PC045 S31002 25.50
S10001 A Smith L004 PC064 S31002 23.25

S10030 L Jones L023 PC064 S21010 17.50


S21010 P Lewis L004 PC064 S10034 16.50
S10010 B Jones L004 Department Department
No. Name
S31002 T Gilbert L023
L004 IT
S13210 W Richards L008
L023 Pensions
S10034 B James L0009
L028 Database
L008 Salary
L009 HR
Functional Dependency Theory
1. Single valued FD
2. Multivalued FD
3. Trivial FD
4. Non-trivial FD
5. Transitive FD
Multivalued functional dependency

Multivalued dependency occurs when there are more


than one independent multivalued attributes in a table.

bike_model manuf_year color


M1001 2007 Black
M1001 2007 Red
M2012 2008 Black
M2012 2008 Red
M2222 2009 Black
M2222 2009 Red
Trivial Functional Dependency
1. Trivial − If a functional dependency (FD) X →
Y holds, where Y is a subset of X, then it is
called a trivial FD. Trivial FDs always hold.

2. Non-trivial − If an FD X → Y holds, where Y is


not a subset of X, then it is called a non-trivial
FD.
Closure of a Set of Functional Dependencies
• Given a set F set of functional dependencies, there are certain other
functional dependencies that are logically implied by F.
– E.g. If A → B and B → C, then we can infer that A → C
• The set of all functional dependencies logically implied by F is the
closure of F.
• We denote the closure of F by F+.
• We can find all of F+ by applying Armstrong’s Axioms:
– if βc α, then α → β (reflexivity)
– if α → β, then yα → yβ (augmentation)
– if α → β, and β → y, then α → y (transitivity)
• These rules are
– sound (generate only functional dependencies that actually hold)
and
– complete (generate all functional dependencies that hold).
Closure of Functional Dependencies (Cont.)

• We can further simplify manual computation of


F+ by using the following additional rules.
1. If α → β holds and α → yholds, then α → βy
holds (union/additivity)
2. If α → βyholds, then α → β holds and α → y
holds (decomposition/projectivity)
3. If α → β holds and yβ → δ holds, then α y→ δ
holds (pseudo transitivity)
The above rules can be inferred from Armstrong’s
axioms.
Example
• R = (A, B, C, G, H, I)
F={ A→ B
A → C
CG → H
CG → I
B → H}
• some members of F+
– A→ H
• by transitivity from A → B and B → H
– AG → I
• by augmenting A → C with G, to get AG → CG
and then transitivity with CG → I
– CG → HI
• from CG → H and CG → I : “union rule” can be inferred from
Canonical Cover
• Sets of functional dependencies may have redundant
dependencies that can be inferred from the others
– Eg: A → C is redundant in: {A → B, B → C, A → C}
– Parts of a functional dependency may be redundant
• E.g. on RHS: {A → B, B → C, A → CD} can be simplified to
{A → B, B → C, A → D}
• E.g. on LHS: {A → B, B → C, AC → D} can be simplified to
{A → B, B → C, A → D}
• Intuitively, a canonical cover of F is a “minimal” set of
functional dependencies equivalent to F, having no
redundant dependencies or redundant parts of
dependencies.
Example of Computing a Canonical Cover
• R = (A, B, C)
F = {A → BC
B→ C
A→ B
AB → C}
• Combine A → BC and A → B into A → BC
– Set is now {A → BC, B → C, AB → C}
• A is extraneous in AB → C
– Check if the result of deleting A from AB → C is implied by the other
dependencies
• Yes: in fact, B → C is already present!
– Set is now {A → BC, B → C}
• C is extraneous in A → BC
– Check if A → C is logically implied by A → B and the other dependencies
• Yes: using transitivity on A → B and B → C.
– Can use attribute closure of A in more complex cases
• The canonical cover is: A→ B
B→ C
Decomposition
• A relation R can be decomposed into a collection of
relation schema {R1, R2, ...,Rn} to eliminate some of
the anomalies contained in the original relation R.
• R = R1 U R2 U …..U Rn
each Ri is a subset of R ( for i = 1,2…,n)
e.g:
• For relation R(x, y, z) there can be 2 subsets:
R1(x,z) and R2(y,z)

• If we union R1 and R2, we get R


R = R1 U R2
Desirable properties of Decomposition
Decomposition must be :

• Loss-less join decomposition


• Dependency preserving
Example : Problem with Decomposition
R
Model Name Price Category
a11 100 Canon
s20 200 Nikon
a70 150 Canon

R1 R2
Model Name Category Price Category

a11 Canon 100 Canon

s20 Nikon 200 Nikon

a70 Canon 150 Canon


Example : Problem with Decomposition
Model Name Price Category
R1 U R2 a11 100 Canon
a11 150 Canon
s20 200 Nikon
a70 100 Canon
a70 150 Canon

Model Name Price Category


R
a11 100 Canon
s20 200 Nikon
a70 150 Canon
Lossy decomposition

• In previous example, additional tuples are obtained along


with original tuples
• Although there are more tuples, this leads to less
information
• Due to the loss of information, decomposition for
previous example is called lossy decomposition or
lossy-join decomposition
Loss-less join decomposition
R : relation
F : set of functional dependencies on R
R1,R2 : decomposition of R
Decomposition is lossles if :
– R1∩ R2 → R1,
– that is: all attributes common to both R1 and R2
functionally determine ALL the attributes in R1.
OR
– R1 ∩ R2 → R2,
– that is: all attributes common to both R1 and R2
functionally determine ALL the attributes in R2.
Loss-less join decomposition

• In other words, if R1 ∩ R2 forms a superkey of


either R1 or R2 , then decomposition of R is a
lossless decomposition.
• Example-1
R = (A, B, C) F = {A -> B}
R1 = (A, B), R2 = (A, C)
R1∩ R2 = A,
R1- R2 = B
check A -> B in F ? Yes. Therefore lossless
Example-2
R = (A, B, C) F = {A -> B}
R1 = (A, B), R2 = (B, C)

R1∩ R2 = B,
R1 - R2 = A , R2 - R1= C
check B -> A in F ? NO
check B -> C in F ? NO
So, this is lossy join.
Dependency Preservation Decomposition
• Definition: Each FD specified in F either appears directly in
one of the relations in the decomposition, or be inferred from
FDs that appear in some relation.
Test of Dependency Preservation
• If a decomposition is not dependency-preserving,
some dependency is lost in the decomposition.
• One way to verify that a dependency is not lost is to
take joins of two or more relations in the
decomposition to get a relation that contains all of the
attributes in the dependency under consideration and
then check that the dependency holds on the result of
the joins.
Dependency Preserving Example
• Consider relation R = (A, B, C, D ), with FD’s :
A ->B, B ->C, C ->D
- Decompose into two relations: R1= (A, B, C)
and R2=(C, D).
- R1 supports the FD’s A->B, B->C.
- R2 supports the FD C->D.
- All the original dependencies are preserved.
Non-Dependency Preserving Example
• Consider relation R=(A,B, C, D), with FD’s:
A ->B, B ->C, C->D
- Decompose into two relations: R1= (A, C, D) and
R2=(B, C)..
- ACD supports the FD C-> D and implied FD A ->C.
- BC supports the FD B->C.
- However, no relation supports A ->B.
- So the dependency is not preserved.
Boyce-Codd Normal Form (BCNF)
• A database design is said to be BCNF if both of the
following conditions are satisfied by it
– Table must be in 3NF
– For any non-trivial functional dependency X->A, X must be
the a super key.

• In other words: every determinant in a non-trivial


dependency is a (super) key.

• It is same as 3NF except in 3NF we only worry about non-key


Bs
• If there is only one candidate key then 3NF and BCNF are
the same.
3NF Vs BCNF
Basis for
Comparison 3NF BCNF

For any trivial


No non-prime attribute
must be transitively dependency in a
Concept relation R say X->Y, X
dependent on the
Candidate key. should be a super key
of relation R.

3NF can be obtained Dependencies may not


Dependency without sacrificing all be preserved in
dependencies. BCNF.

Lossless decomposition
Decomposition Lossless decomposition is hard to achieve in
can be achieved in 3NF. BCNF.
Example of BCNF

R(A, B, C, D, F)
F= { A-> BCDF , BF-> ACD, D-> F}

• we can say that A and BF are candidate keys of relation R,


because they alone can search the value for all attributes in the
relation R.
• But one functional dependency i.e. D -> F is violating the
definition of BCNF, according to which, if D -> F exist then D
should be the super key which is not the case here
Decompose relation

• Now the tables R1 nd R2 are in BCNF. Relation R1 has two


candidate keys A and B, the trivial functional dependency of R1 i.e.
A-> BCD and B -> ACD, hold for BCNF as A and B are the super
keys for relation.
•Relation R2 has D as its candidate key and the functional
dependency D -> F also holds for BCNF as D is a Super Key.
Fourth Normal Form (4NF)
• Fourth normal form eliminates independent many-to-one
relationships between columns.
• To be in Fourth Normal Form,
– a relation must first be in 3NF or Boyce-Codd Normal
Form.
– There are no non-trivial multi-valued dependencies
• Multi-Valued Dependency (MVD): MVD is the dependency
where one attribute value is potentially a 'multi-valued fact'
about another.
MVD Example
Pets_info(name, addr, phones, petsLiked)
• A person’s phones are independent of the pets they like.
name phones and
name petsLiked
• Thus, each of a person’s phones appears with each of the
pets they like in all combinations.

Name Addr Phones petsLiked


Smith A m1 p1
Smith A m2 p1
Smith A m1 p2
Smith A m2 p2
4NF Decomposition Example
Pets_info(name, addr, phones, petsLiked)

FD: nameaddr
MVD’s: namephones
namepetsLiked
• Key is {name, phones, petsLiked}.
• All dependencies violate 4NF.
4NF Decomposition Example (cont.)
• Decompose using nameaddr:
1. person(name, addr)
– In 4NF; only dependency is nameaddr.
2. pet1(name, phones, petsLiked)
– Not in 4NF. MVD’s namephones and
namepetsLiked apply.
– No FD’s, so all three attributes form the key.
3. Decompose pet1:
• pet2(name, phones)
• pet3(name, petsLiked)
Fifth Normal Form (5NF)
A database is said to be in 5NF, if and only if:
• It is in 4NF.
• If we can decompose table further to eliminate redundancy and
anomaly, and when we re-join the decomposed tables by means of
candidate keys, we should not be losing the original data or any
new record set should not arise. In simple words, joining two or
more decomposed table should not lose records nor create new
records.
OR
A relation R is in fifth normal form (5NF) – also called
projection-join normal form (PJ/NF) if and only if every join
dependency in R is a consequence of the candidate keys of R.
Subject Lecturer Semester
Maths Alex SEM-1
Maths Rose SEM-1
Physics Rose SEM-1
Physics Joseph SEM-2
Chemistry Kedar SEM-1

Table in 5NF
Subject Lecturer Semester Lecturer
Maths Alex SEM-1 Alex
Maths Rose SEM-1 Rose
Physics Rose SEM-1 Rose
Physics Joseph SEM-2 Joseph
Chemistry Kedar SEM-1 Kedar

Semester Subject
SEM-1 Maths
SEM-1 Maths
SEM-1 Physics
SEM-2 Physics
SEM-1 Chemistry

You might also like