0% found this document useful (0 votes)
8 views27 pages

Database Normalization Techniques Explained

The document discusses normalization in database management, detailing the different normal forms (1NF, 2NF, 3NF) and their significance in reducing data redundancy and anomalies. It explains the processes for achieving each normal form, including eliminating repeating groups, identifying primary keys, and addressing dependencies. The document emphasizes the importance of normalization for maintaining data integrity and optimizing database structure.

Uploaded by

clafedersin
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views27 pages

Database Normalization Techniques Explained

The document discusses normalization in database management, detailing the different normal forms (1NF, 2NF, 3NF) and their significance in reducing data redundancy and anomalies. It explains the processes for achieving each normal form, including eliminating repeating groups, identifying primary keys, and addressing dependencies. The document emphasizes the importance of normalization for maintaining data integrity and optimizing database structure.

Uploaded by

clafedersin
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd

CENG 3005

Database Management Systems


Week 9

• Normal Forms (continued)

1
Normalization
 1NF (atomic, primary key):
 {Order, Product, Customer, Address, Quantity,
UnitPrice}
 2NF (no partial dependency on key)
 {Product, UnitPrice}
 {Order, Product, Quantity}
 {Order, Customer, Address}

 3NF (no transitive dependence on a key)


 {Product, UnitPrice}
 {Order, Product, Quantity}
 {Order, Customer}
 {Customer, Address}
Normalization of DB Tables
 Normalization
– Process for evaluating and correcting table structures
• determines the optimal assignments of attributes to entities
– Normalization provides micro view of entities
• focuses on characteristics of specific entities
• may yield additional entities
– Works through a series of stages called normal forms
• 1NF  2NF  3NF  4NF (optional)
– Higher the normal form, slower the database response
• more joins are required to answer end-user queries

 So if the response is slow, why do the database people


normalize?
1. Reduce uncontrolled data redundancies
• Help eliminate data anomalies
2. Produce controlled redundancies to link tables
Redundancy

 Dependencies between attributes cause


redundancy
– Eg. All addresses in the same town have
the same zip code

SSN Name Town Zip


Redundancy
1234 Joe Stony Brook 11790
4321 Mary Stony Brook 11790
5454 Tom Stony Brook 11790
………………….

4
Example

ER Model
SSN Name Address Hobby
1111 Joe 123 Main {biking, hiking}

Relational Model
SSN Name Address Hobby
1111 Joe 123 Main biking
1111 Joe 123 Main hiking
…………….
Redundancy
Anomalies
Redundancy leads to anomalies:
– Update anomaly: If you need to change Address,
you must change in multiple places (columns/tables)
– Deletion anomaly: Suppose a person gives up all
hobbies. Do we:
• Set Hobby attribute to null? No, you can’t,
because Hobby is part of key
• Delete the entire row? No, since we lose other
information in the row
– Insertion anomaly: Hobby value must be supplied
for any inserted row since Hobby is part of key
Decomposition of Tables
 Solution: use two relations to store
Person information
– Person1 (SSN, Name, Address)
– Hobbies (SSN, Hobby)
 The decomposition is more general:
people with hobbies can now be
described
 No update anomalies:
– Name and address stored once
– A hobby can be separately supplied
or deleted
What if you need to combine
tables?
 Suppose we combine borrower and loan to get
bor_loan = (customer_id, loan_number,
amount )
 Result might cause a possible repetition of
information (L-100 in example below)
A Combined Schema Without
Repetition
 Consider combining loan_branch and loan
loan_amt_br = (loan_number, amount,
branch_name)
 No repetition (as suggested by example below)
How to decide whether to
split into smaller tables?
 Suppose we had started with bor_loan. How would we know to
split up (decompose) it into borrower and loan?
 Write a rule “if there were a schema (loan_number, amount),
then loan_number would be a candidate key”
 Denote as a functional dependency:
loan_number  amount
 In bor_loan, because loan_number is not a candidate key, the
amount of a loan may have to be repeated. This indicates the
need to decompose bor_loan.
 Not all decompositions are good. Suppose we decompose
employee into
employee1 = (employee_id, employee_name)
employee2 = (employee_name, telephone_number, start_date)
 The next slide shows how we lose information -- we cannot
reconstruct the original employee relation -- and so, this is a
lossy decomposition.
1
A Lossy Decomposition

1
First Normal Form
 Domain is atomic if its elements are considered to be
indivisible units
– Examples of non-atomic domains:
• Set of names, composite attributes
• Identification numbers like CS101 that can be broken up
into parts
 A relational schema R is in first normal form if the
domains of all attributes of R are atomic
 Non-atomic values complicate storage and encourage
redundant (repeated) storage of data
– Example: Set of accounts stored with each customer, and set
of owners stored with each account
– We assume all relations are in first normal form

1
First Normal Form (Cont’d)
 Atomicity is actually a property of how the
elements of the domain are used.
– Example: Strings would normally be considered indivisible
– Suppose that students are given roll numbers which are strings
of the form CS0012 or EE1127
– If the first two characters are extracted to find the department,
the domain of roll numbers is not atomic.
– This leads to encoding of information in application program
rather than in the database.

1
Goal — Devise a Theory for the
Following
 Decide whether a particular relation R is in
“good” form.
 If a relation R is not in “good” form, decompose
it into a set of relations {R1, R2, ..., Rn} such that
– each relation is in good form
– the decomposition is a lossless-join decomposition
 Our theory is based on:
– functional dependencies
– multivalued dependencies

1
How to determine Primary Key
using Functional Dependencies
 Watch the video!!!
[Link]
w

1
Normalization Forms
(simple simple simple)

“Data depends on the key


[1NF]
the whole key
[2NF]
and nothing but the key
[3NF]”

“If all the arrows in FDs are out of a candidate


key” [BCNF]
1
First normal form (1NF)
 First normal form (1NF)
– All data values are atomic
– Each row is unique (has a primary key)
 Step by step 1NF:
1. Eliminate repeating groups, eliminate all-null columns
2. Identify the Primary Key –Primary key must uniquely identify
attribute value –New key must be composed
3. Identify All Dependencies –Dependencies can be depicted
with help of a dependency diagram

1
Normalization of DB Tables
 Normalization
– Process for evaluating and correcting table structures
• determines the optimal assignments of attributes to entities
– Normalization provides micro view of entities
• focuses on characteristics of specific entities
• may yield additional entities
– Works through a series of stages called normal forms
• 1NF  2NF  3NF  4NF (optional)
– Higher the normal form, slower the database response
• more joins are required to answer end-user queries

 Why normalize?
– Reduce uncontrolled data redundancies
• Help eliminate data anomalies
– Produce controlled redundancies to link tables

1
1
Example: Need for Normalization
 PRO_NUM is intended to be primary key but contain nulls
 Table entries invite data inconsistencies
– e.g. “Elect. Engineer”, “[Link].”, “EE”
 Table displays data redundancies that can cause data anomalies
– Update anomalies
• Modifying JOB_CLASS could require many alterations (all the rows for the same EMP_NUM)
– Insertion anomalies
• New employee must be assigned a project
– Deletion anomalies
• If employee quits and a row deleted, other vital data may get lost

1
1
Normalization: First Normal Form
 First Normal Form (1NF)
– All the primary key attributes are defined
– There are no repeating groups
– All attributes are dependent on the primary key

 Conversion to 1NF
– Objective
• Develop a proper primary key
– Steps
1. Eliminate repeating groups
– fill in the null cells with appropriate data value
2. Identify primary key
– identify attribute(s) that uniquely identifies each row
3. Identify all dependencies
– make sure all attributes are dependent on the primary key

2
2
Normalization: 1NF example
1. Eliminate repeating groups - Fill in the null cells to make each row define a single entity
2. Identify the primary key - Make sure all attributes are dependent on the primary key

2
2
Normalization: 1NF example
3. Identify all dependencies (in a Dependency Table)
– Desirable dependencies (arrows above)
• based on primary key (functional dependency)
– Less desirable dependencies (arrows below)
• Partial dependency
– based on part of composite primary key
• Transitive dependency
– one nonprime attribute depends on another nonprime attribute
• Subject to data redundancies and anomalies

2
2
Normalization: Second Normal

Form
Second Normal Form (2NF)
– It is in 1NF
– There are no partial dependencies

 Conversion to 2NF
– Objective
• Eliminate partial dependencies
– Steps
1. Start with 1NF format
2. Write each key component (w/ partial dependency) on
separate line
3. Write original (composite) key on last line
4. Each component is new table
5. Write dependent attributes after each key

1NF (PROJ_NUM, EMP_NUM, PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR,


HOURS)

PROJECT (PROJ_NUM, PROJ_NAME)
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR) 2
Normalization: 2NF example

2
Normalization: Third Normal
Form
 Third Normal Form (3NF)
– It is in 2NF
– There are no transitive dependencies

 Conversion to 3NF
– Objective
• Eliminate transitive dependencies (TD)
– Steps
1. Start with 2NF format
2. Break off the TD pieces and create separate tables

EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)



EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS)
JOB (JOB_CLASS, CHG_HOUR)

2
Normalization: 3NF example

2
2
2

You might also like