CENG 3005
Database Management Systems
Week 9
• Normal Forms (continued)
1
Normalization
1NF (atomic, primary key):
{Order, Product, Customer, Address, Quantity,
UnitPrice}
2NF (no partial dependency on key)
{Product, UnitPrice}
{Order, Product, Quantity}
{Order, Customer, Address}
3NF (no transitive dependence on a key)
{Product, UnitPrice}
{Order, Product, Quantity}
{Order, Customer}
{Customer, Address}
Normalization of DB Tables
Normalization
– Process for evaluating and correcting table structures
• determines the optimal assignments of attributes to entities
– Normalization provides micro view of entities
• focuses on characteristics of specific entities
• may yield additional entities
– Works through a series of stages called normal forms
• 1NF 2NF 3NF 4NF (optional)
– Higher the normal form, slower the database response
• more joins are required to answer end-user queries
So if the response is slow, why do the database people
normalize?
1. Reduce uncontrolled data redundancies
• Help eliminate data anomalies
2. Produce controlled redundancies to link tables
Redundancy
Dependencies between attributes cause
redundancy
– Eg. All addresses in the same town have
the same zip code
SSN Name Town Zip
Redundancy
1234 Joe Stony Brook 11790
4321 Mary Stony Brook 11790
5454 Tom Stony Brook 11790
………………….
4
Example
ER Model
SSN Name Address Hobby
1111 Joe 123 Main {biking, hiking}
Relational Model
SSN Name Address Hobby
1111 Joe 123 Main biking
1111 Joe 123 Main hiking
…………….
Redundancy
Anomalies
Redundancy leads to anomalies:
– Update anomaly: If you need to change Address,
you must change in multiple places (columns/tables)
– Deletion anomaly: Suppose a person gives up all
hobbies. Do we:
• Set Hobby attribute to null? No, you can’t,
because Hobby is part of key
• Delete the entire row? No, since we lose other
information in the row
– Insertion anomaly: Hobby value must be supplied
for any inserted row since Hobby is part of key
Decomposition of Tables
Solution: use two relations to store
Person information
– Person1 (SSN, Name, Address)
– Hobbies (SSN, Hobby)
The decomposition is more general:
people with hobbies can now be
described
No update anomalies:
– Name and address stored once
– A hobby can be separately supplied
or deleted
What if you need to combine
tables?
Suppose we combine borrower and loan to get
bor_loan = (customer_id, loan_number,
amount )
Result might cause a possible repetition of
information (L-100 in example below)
A Combined Schema Without
Repetition
Consider combining loan_branch and loan
loan_amt_br = (loan_number, amount,
branch_name)
No repetition (as suggested by example below)
How to decide whether to
split into smaller tables?
Suppose we had started with bor_loan. How would we know to
split up (decompose) it into borrower and loan?
Write a rule “if there were a schema (loan_number, amount),
then loan_number would be a candidate key”
Denote as a functional dependency:
loan_number amount
In bor_loan, because loan_number is not a candidate key, the
amount of a loan may have to be repeated. This indicates the
need to decompose bor_loan.
Not all decompositions are good. Suppose we decompose
employee into
employee1 = (employee_id, employee_name)
employee2 = (employee_name, telephone_number, start_date)
The next slide shows how we lose information -- we cannot
reconstruct the original employee relation -- and so, this is a
lossy decomposition.
1
A Lossy Decomposition
1
First Normal Form
Domain is atomic if its elements are considered to be
indivisible units
– Examples of non-atomic domains:
• Set of names, composite attributes
• Identification numbers like CS101 that can be broken up
into parts
A relational schema R is in first normal form if the
domains of all attributes of R are atomic
Non-atomic values complicate storage and encourage
redundant (repeated) storage of data
– Example: Set of accounts stored with each customer, and set
of owners stored with each account
– We assume all relations are in first normal form
1
First Normal Form (Cont’d)
Atomicity is actually a property of how the
elements of the domain are used.
– Example: Strings would normally be considered indivisible
– Suppose that students are given roll numbers which are strings
of the form CS0012 or EE1127
– If the first two characters are extracted to find the department,
the domain of roll numbers is not atomic.
– This leads to encoding of information in application program
rather than in the database.
1
Goal — Devise a Theory for the
Following
Decide whether a particular relation R is in
“good” form.
If a relation R is not in “good” form, decompose
it into a set of relations {R1, R2, ..., Rn} such that
– each relation is in good form
– the decomposition is a lossless-join decomposition
Our theory is based on:
– functional dependencies
– multivalued dependencies
1
How to determine Primary Key
using Functional Dependencies
Watch the video!!!
[Link]
w
1
Normalization Forms
(simple simple simple)
“Data depends on the key
[1NF]
the whole key
[2NF]
and nothing but the key
[3NF]”
“If all the arrows in FDs are out of a candidate
key” [BCNF]
1
First normal form (1NF)
First normal form (1NF)
– All data values are atomic
– Each row is unique (has a primary key)
Step by step 1NF:
1. Eliminate repeating groups, eliminate all-null columns
2. Identify the Primary Key –Primary key must uniquely identify
attribute value –New key must be composed
3. Identify All Dependencies –Dependencies can be depicted
with help of a dependency diagram
1
Normalization of DB Tables
Normalization
– Process for evaluating and correcting table structures
• determines the optimal assignments of attributes to entities
– Normalization provides micro view of entities
• focuses on characteristics of specific entities
• may yield additional entities
– Works through a series of stages called normal forms
• 1NF 2NF 3NF 4NF (optional)
– Higher the normal form, slower the database response
• more joins are required to answer end-user queries
Why normalize?
– Reduce uncontrolled data redundancies
• Help eliminate data anomalies
– Produce controlled redundancies to link tables
1
1
Example: Need for Normalization
PRO_NUM is intended to be primary key but contain nulls
Table entries invite data inconsistencies
– e.g. “Elect. Engineer”, “[Link].”, “EE”
Table displays data redundancies that can cause data anomalies
– Update anomalies
• Modifying JOB_CLASS could require many alterations (all the rows for the same EMP_NUM)
– Insertion anomalies
• New employee must be assigned a project
– Deletion anomalies
• If employee quits and a row deleted, other vital data may get lost
1
1
Normalization: First Normal Form
First Normal Form (1NF)
– All the primary key attributes are defined
– There are no repeating groups
– All attributes are dependent on the primary key
Conversion to 1NF
– Objective
• Develop a proper primary key
– Steps
1. Eliminate repeating groups
– fill in the null cells with appropriate data value
2. Identify primary key
– identify attribute(s) that uniquely identifies each row
3. Identify all dependencies
– make sure all attributes are dependent on the primary key
2
2
Normalization: 1NF example
1. Eliminate repeating groups - Fill in the null cells to make each row define a single entity
2. Identify the primary key - Make sure all attributes are dependent on the primary key
2
2
Normalization: 1NF example
3. Identify all dependencies (in a Dependency Table)
– Desirable dependencies (arrows above)
• based on primary key (functional dependency)
– Less desirable dependencies (arrows below)
• Partial dependency
– based on part of composite primary key
• Transitive dependency
– one nonprime attribute depends on another nonprime attribute
• Subject to data redundancies and anomalies
2
2
Normalization: Second Normal
Form
Second Normal Form (2NF)
– It is in 1NF
– There are no partial dependencies
Conversion to 2NF
– Objective
• Eliminate partial dependencies
– Steps
1. Start with 1NF format
2. Write each key component (w/ partial dependency) on
separate line
3. Write original (composite) key on last line
4. Each component is new table
5. Write dependent attributes after each key
1NF (PROJ_NUM, EMP_NUM, PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR,
HOURS)
PROJECT (PROJ_NUM, PROJ_NAME)
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR) 2
Normalization: 2NF example
2
Normalization: Third Normal
Form
Third Normal Form (3NF)
– It is in 2NF
– There are no transitive dependencies
Conversion to 3NF
– Objective
• Eliminate transitive dependencies (TD)
– Steps
1. Start with 2NF format
2. Break off the TD pieces and create separate tables
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS)
JOB (JOB_CLASS, CHG_HOUR)
2
Normalization: 3NF example
2
2
2