+
Database
Normalization
Housekeeping
Assignment 1 marked
Assignment 1 & 2 solutions will be posted this morning
Quiz 1 pick up at end of class
Quiz 2 will mark before Fridays class (return on Fri)
Mid-term: Monday, February 18th
If cant make it, or need accommodations, email me NOW
Coverage: Database design: ER diagrams, conversion to
tables, normalization, relational algebra
Closed book
11/11/14
+
Fundamentals of Database
Management Systems
by
Mark L. Gillenson, Ph.D.
University of Memphis
John Wiley & Sons, Inc.
Chapter
7
Logical
Database
Design
7-4
Chapter Objectives
Describe the data normalization process
Perform the data normalization process.
Test tables for irregularities using the data
normalization process.
7-5
Logical Database Design
The process of deciding how to arrange the attributes
of the entities in the business environment into
database structures, such as the tables of a relational
database.
The goal is to create well structured tables that
properly reflect the companys business environment.
7-6
Logical Design of Relational
Database Systems
(1) The conversion of E-R diagrams into relational
tables.
(2) The data normalization technique.
(3) The use of the data normalization technique to test
the tables resulting from the E-R diagram conversions.
7-7
The Data Normalization Process
A methodology for organizing attributes into tables so
that redundancy among the nonkey attributes is
eliminated.
The output of the data normalization process is a
properly structured relational database.
7-8
The Data Normalization
Technique
Input:
all
the attributes that must be incorporated into
the database
list of all the defining associations between the
attributes (i.e., the functional dependencies).
a means of expressing that the value of one particular
attribute is associated with a single, specific value of
another attribute.
If we know that one of these attributes has a particular
value, then the other attribute must have some other
value.
General Hardware Company
Database
7-9
7-10
Functional
Dependence
Salesperson Number
Salesperson
Salesperson Name
Number is the determinant.
The
value of Salesperson Number determines
the value of Salesperson Name.
Salesperson
Name is functionally
dependent on Salesperson Number.
7-11
General Hardware Environment:
SALESPERSON and PRODUCT
What
are the
functional
dependencies?
+
General
Hardware Environment:
SALESPERSON and PRODUCT
7-12
+ Full Functional dependency:
If
A and B are attributes(columns)of a table, B is fully functionally dependent
on A if B is functionally dependent on A, but not on any proper subset of A.
SalesPerson#--SalesPersonName
Partial Functional Dependency:
If
A and B are attributes of a table, B is partially dependent on A if there is
some attribute that can be removed from A and yet the dependency still
holds.
SP#, SPName -------> Comm%
Comm% is functionally dependent on a subset of A (SP#, SPName),
namely SP#.
Transitive Functional Dependency:
A
, B and C are attributes of a table. If A is functionally dependent on B, and
B is functionally dependent on C, then C is Transitively dependent on A via B.
SP#---->SPName, Comm%, YearOfHire, Office#
Office#---Telephone#
SP# attribute functionally determines Telephone# via Office#
attribute.
11/11/14
+ Steps in the Data Normalization
Process
7-14
7-15
The Data Normalization Process
Once the attributes are arranged in third normal form,
the group of tables that they comprise is a wellstructured relational database with no data
redundancy.
Subsequently [Link] and [Link] introduced a stronger
definition of 3NF called Boyce-Codd Normal Form(BCNF).
With the exception of 1NF, all these normal forms are based on
Functional dependencies among the attributes of a table. Higher
normal forms that go beyond BCNF were introduced later such as
Fourth Normal Form (4NF) and Fifth Normal Form (5NF). However
these later normal forms deal with situations that are very rare.
A group of tables is said to be in a particular normal
form if every table in the group is in that normal form.
The data normalization process is progressive.
For example, if a group of tables is in second normal form, it
is also in first normal form.
7-16
General Hardware Company:
Unnormalized Data
Sales-person
Number
Product
Number
137
19440
24013
26722
186
16386
19440
21765
24013
204
21765
26722
361
16386
21765
26722
Sales-person
Name
Commission
Percentage
Baker
10
Adams
Dickens
Carlyle
Year Depart-ment
of
Number
Manager Product
Hire
Name
Name
1995
73 Scott
Hammer
Saw
Pliers
15 2001
59 Lopez
Wrench
Hammer
Drill
Saw
10 1998
73 Scott
Drill
Pliers
20 2001
73 Scott
Wrench
Drill
Pliers
Unit
Price Quantity
17.50
473
26.25
170
11.50
688
12.95
1745
17.50
2529
32.99
1962
26.25
3071
32.99
809
11.50
734
12.95
3729
32.99
3110
11.50
2738
SALESPERSON/PRODUCT Table
Records contain multivalued attributes.
General Hardware Company: First
Normal Form
Sales-person Product Sales-person Commission Year Depart-ment Manager Product Unit Quantity
Number
Number Name
Percentage
of
Number
Name
Name
Price
Hire
SALESPERSON/PRODUCT Table
The only thing that is required for a table to be in 1NF is
to contain only atomic values (intersection of each row
and column should contain one and only one value). This
is sometimes referred to as: Eliminate Repeating groups.
The attributes under consideration have been listed in
one table, and a primary key has been established.
The number of records has been increased so that every
attribute of every record has just one value.
The multivalued attributes have been eliminated.
7-17
7-18
General Hardware Company:
First Normal Form
Sales-person
Number
Product
Number
137
19440
137
24013
137
26722
186
16386
186
19440
186
21765
186
24013
204
21765
204
26722
361
16386
361
21765
361
26722
Sales-person
Name
Commission
Percentage
Baker
10
Baker
10
Baker
10
Adams
15
Adams
15
Adams
15
Adams
15
Dickens
10
Dickens
10
Carlyle
20
Carlyle
20
Carlyle
20
SALESPERSON/PRODUCT Table
Year Depart-ment
of
Number
Manager Product
Hire
Name
Name
1995
73 Scott
Hammer
1995
73 Scott
Saw
1995
73 Scott
Pliers
2001
59 Lopez
Wrench
2001
59 Lopez
Hammer
2001
59 Lopez
Drill
2001
59 Lopez
Saw
1998
73 Scott
Drill
1998
73 Scott
Pliers
2001
73 Scott
Wrench
2001
73 Scott
Drill
2001
73 Scott
Pliers
Unit
Price Quantity
17.50
473
26.25
170
11.50
688
12.95
1745
17.50
2529
32.99
1962
26.25
3071
32.99
809
11.50
734
12.95
3729
32.99
3110
11.50
2738
7-19
General Hardware Company:
First Normal Form
First
normal form is merely a starting
point in the normalization process.
First
normal form contains a great deal
of data redundancy.
Three
records involve salesperson 137, so
there are three places in which his name is
listed as Baker, his commission percentage
is listed as 10, and so on.
Two records involve product 19440 and this
products name is listed twice as Hammer
and its unit price is listed twice as 17.50.
7-20
Second Normal Form
A Table is said to be in 2NF if it is in 1NF and there are
no partial dependencies
No Partial Functional Dependencies
Every non primary key attribute of the table must be fully
functionally dependent on the entire primary key of that
table.
A non-key attribute cannot depend on only part of the key.
General Hardware Company:
Second Normal Form
Salesperson Salesperson Commission Year Department Manager
Number
Name
Percentage of
Number
Name
Hire
SALESPERSON Table
In SALESPERSON, Salesperson Number is the sole
primary key attribute. Every nonkey attribute of the
table is fully defined just by Salesperson Number.
Similar logic for PRODUCT and QUANTITY tables.
7-21
General Hardware Company:
Second Normal Form
Salesperson Salesperson Commission Year Department
Number
Name
Percentage of
Number
Hire
137 Baker
10 1995
73
186 Adams
15 2001
59
204 Dickens
10 1998
73
361 Carlyle
20 2001
73
SALESPERSON Table
Manager
Name
Scott
Lopez
Scott
Scott
7-22
7-23
Third Normal Form
A Table that is in 1NF and 2NF and in which no non
primary key attribute is transitively dependent on the
primary key.
Does not allow transitive dependencies in which one
nonkey attribute is functionally dependent on another.
Nonkey attributes are not allowed to define other
nonkey attributes.
"Each attribute must be a fact about the key, the whole
key, and nothing but the key."
General Hardware Company: Third
Normal Form
7-24
General Hardware Company: Third
Normal Form
7-25
7-26
General Hardware Company:
Third Normal Form
Important
points about the third normal
form structure are:
It is completely free of data redundancy.
All
foreign keys appear where needed to
logically tie together related tables.
It
is the same structure that would have
been derived from a properly drawn entityrelationship diagram of the same business
environment.
Recap + clarification
0NF:
Unstructured data, can have multi-valued attributes
1NF:
Atomic values (one per column of the record)
No duplicate rows (implies there is a key)
As we move forward need to consider functional
dependencies and determine candidate keys
Note: earlier slides simplified this slightly by saying choose a
primary key, but normalization is actually about candidate
keys
For the higher forms of normalization, we need to consider the
case where there is not a single option for the primary key
11/11/14
Definitions
Superkey: a combination of attributes that can be
used to uniquely identify a row in a database. The
trivial superkey is all attributes.
Candidate key: a minimal superkey all attributes are
necessary to uniquely identify the record
Primary key: one candidate key, arbitrarily chosen
Prime attribute: an attribute that occurs in some
candidate key
Non prime attribute: an attribute that does not occur
in any candidate key
11/11/14
2NF (revisited)
A Table is said to be in 2NF if it is in 1NF and there are no
partial dependencies
No Partial Functional Dependencies
Every non primary key attribute of the table must be fully
functionally dependent on the entire primary key of that
table.
A non-key attribute cannot depend on only part of the key.
No Partial Functional Dependences
Every non prime key attribute of the table must be fully
functionally dependent on the entire key of one of the
candidate keys in the table.
An non-prime attribute cannot depend on only part of one of
the candidate keys.
11/11/14
3NF revisited
A Table that is in 1NF and 2NF and in which no non
primary key attribute is transitively dependent on the
primary key.
Old:
Does not allow transitive dependencies in which one nonkey attribute is functionally dependent on another.
Nonkey attributes are not allowed to define other nonkey
attributes.
New:
Every non-prime attribute is non-transitively dependent on
every candidate key in the table.
The attributes that do not directly contribute to the
description of the candidate keys are removed from the
table. In other words, no transitive dependency is allowed.
11/11/14
"Each non-key attribute must
provide a fact about the key,
the whole key, and nothing but
the key.
The key: 1NF we have a table of related items, with
each row unique, with atomic values
The whole key: 2NF no partial dependences on the
candidate keys
Nothing but the key: 3NF no transitive dependencies
11/11/14
7-32
Boyce-codd Normal Form
(BCNF)
A Table is in BCNF if and only if every determinant (i.e.,
the attribute or a group of attributes on which some
other attribute is fully functionally dependent) is a
candidate key. BCNF is a stronger form of 3NF.
The
difference between 3NF and BCNF is that for a
Functional dependency A--->B, 3NF allows this
dependency in a table if attribute B is a primary key
attribute and attribute A is not a candidate key, where
as BCNF insists that for this dependency to remain in a
table, attribute A must be a candidate key.
Only
in rare cases does a 3NF table not meet the
requirements of BCNF. A 3NF table which does not
have multiple overlapping candidate keys is
guaranteed to be in BCNF. Depending on what its
functional dependencies are, a 3NF table with two or
more overlapping candidate keys may or may not be in
BCNF
Example
of
3NF
table
not
in
+
Today's Court
Court
StartTime EndTime
BCNF
Bookings
RateType
09:30
10:30
SAVER
11:00
12:00
SAVER
14:00
15:30
STANDARD
10:00
11:30
PREMIUM-B
11:30
13:30
PREMIUM-B
Each row in the table represents a court booking at a tennis club that has one
16:30
hard court (Court 1)2and one grass15:00
court (Court 2).
A booking is PREMIUM-A
defined by its
Court and the period for which the Court is reserved Additionally, each
booking has a Rate Type associated with it. There are four distinct rate types:
SAVER, for Court 1 bookings made by members
STANDARD, for Court 1 bookings made by non-members
PREMIUM-A, for Court 2 bookings made by members
PREMIUM-B, for Court 2 bookings made by non-members
[Link]
11/11/14
What is the problem?
Identify the functional dependencies
Identify the tables super keys
Determine the candidate keys
Why isnt the table in BCNF?
A Table is in BCNF if and only if every determinant is a
candidate key.
For all functional dependencies A -> B, is A a candidate
key?
11/11/14
Fourth Normal Form (4NF)
4NF is a stronger normal form than 3NF/BCNF as it
prevents Tables from containing nontrivial Multi-Valued
Dependencies (MVDs) and hence data redundancy.
The Normalization of BCNF Tables to 4NF involves the
removal of MVDs from the Table by placing the
attribute(s) in a new Table along with the copy of the
determinant(s).
[Link]
11/11/14
Pizza Delivery Example not in 4NF
Restaurant
Pizza Variety Delivery Area
A1 Pizza
Thick Crust
Springfield
A1 Pizza
Thick Crust
Shelbyville
A1 Pizza
Thick Crust
Capital City
A1 Pizza
Stuffed Crust Springfield
A1 Pizza
Stuffed Crust Shelbyville
A1 Pizza
Stuffed Crust Capital City
Elite Pizza
Thin Crust
Elite Pizza
Stuffed Crust Capital City
Capital City
Vincenzo's Pizza
Thick Crust
Springfield
Vincenzo's Pizza
Thick Crust
Shelbyville
Vincenzo's Pizza
Thin Crust
Springfield
Vincenzo's Pizza
Thin Crust
Shelbyville
11/11/14
Fifth Normal Form (5NF)
Also known as project-join normal form (PJ/NF)
Designed to reduce redundancy in relational databases
recording multi-valued facts by isolating semantically
related multiple relationships.
A table is said to be in the 5NF if and only if every join
dependency in it is implied by the candidate keys.
[Link]
11/11/14
Traveling Salesman Product Availability By Brand
Traveling Salesman
Brand Product Type
Jack Schneider
Acme Vacuum Cleaner
Jack Schneider
Acme Breadbox
Willy Loman
Robusto
Pruning Shears
Willy Loman
Robusto
Vacuum Cleaner
Willy Loman
Robusto
Breadbox
Willy Loman
Robusto
Umbrella Stand
Louis Ferguson
Robusto
Vacuum Cleaner
Louis Ferguson
Robusto
Telescope
Louis Ferguson
Acme Vacuum Cleaner
Louis Ferguson
Acme Lava Lamp
Louis Ferguson
Nimbus
Tie Rack
Not in 5NF if the Salesman must offer only products of Type
P made by Brand B if that product type and brand is in his
repetoire
11/11/14
+ Steps in the Data Normalization
Process
7-39
Other good resources:
[Link]
[Link]
Todays handout includes the explanations of each
normalization step for the example highlighted in the
poster
11/11/14
General Hardware Company:
Functional Dependencies
Salesperson Number
Salesperson Name
Salesperson Number
Commission Percentage
Salesperson Number
Year of Hire
Salesperson Number
Department Nu mber
Salesperson Number
Manager Name
Customer Number
Customer Name
Customer Number
Salesperson Number
Customer Number
HQ City
Customer Number, Employee Number
Employee Name
Customer Number, Employee Number
Title
Product Number
Product Name
Product Number
Unit Price
Department Number
Manager Name
Salesperson Number, Product Number
Quantity
Office Number
Telephone
Office Number
Salesperson Number
Office Number
Size
7-41
General Hardware Company: First
Normal Form
Salesperson Customer Employee Product Office Salesperson Commission
Number
Number Number
Number Number Name
Percentage
Year Department Manager Customer
Employee
Product
of
Number
Name
Name
HQ Name
Title Name
Hire
City
7-42
7-43
World Music Association:
Functional Dependencies
7-44
Lucky Rent-A-Car:
Functional Dependencies