Normalization
Prepared by Caselyn F. Silvestre
The Need for Normalization
Normalization
We discuss four normal forms: first, second, third, and
Boyce-Codd normal forms
1NF, 2NF, 3NF, and BCNF
Normalization is a process that “improves” a database
design by generating relations that are of higher normal
forms.
The objective of normalization:
“to create relations where every dependency is on the key,
the whole key, and nothing but the key”.
Normalization
There is a sequence to normal forms:
1NF is considered the weakest,
2NF is stronger than 1NF,
3NF is stronger than 2NF, and
BCNF is considered the strongest
\
Also,
any relation that is in BCNF, is in 3NF;
any relation in 3NF is in 2NF; and
any relation in 2NF is in 1NF.
The Normalization Process
• 1. Each table represents a single subject.
• Ex. A course table will contain only data that directly pertain to courses.
Similarity, a student table will contain only student data.
• 2. NO data item will be unnecessarily stored in more than one table.
This requirement is to ensure the that data are updated in only one
place.
• 3. All nonprime attributes are dependent on the primary key, the
entire primary key and nothing but the primary key to have uniquely
identify each value.
• 4. Each table is void, update or deletion anomalies, which ensure
integrity and consistency of the data.
Insert report layout here
Normal Forms
NORMAL FORM CHARACTERISTICS
First Normal form (1NF) Table format, no repeating groups, and PK identified
Second Normal form (2NF) 1NF and no partial dependencies
Third Normal form (3NF) 2NF and no transitive dependencies
Boyce-Codd Normal form (BCNF) Every determinant is a candidate key (special case of 3NF)
Fourth normal form (4NF) 3NF and no independent multivalued dependencies
Functional dependencies
EmpNum is the determinant attribute and EmpEmail is the dependent attribute
EmpNum EmpEmail
EmpNum EmpFname
EmpNum EmpLname
EmpEmail
EmpNum EmpFname
EmpLname
EmpNum EmpEmail EmpFname EmpLname
91.2914 8
Transitive dependency
DeptName is transitively dependent on EmpNum via DeptNum
EmpNum DeptName
EmpNum DeptNum
EmpNum EmpEmail DeptNum DeptNname
DeptNum DeptName
EmpNum EmpEmail DeptNum DeptNname
CONVERSION TO FIRST NORMAL FORM
CRITERIA 1: Eliminate the Repeating Groups by eliminating the NULLs
PROJ_NUM PROJ_NAME EMP_NUM EMP_NAME JOB_CLASS CHG_HO HOURS
15
Table in 1st Normal Form
Evergreen 103 June E. Arbough Elect. Engineer
UR
84.50 23.8
15 Evergreen 101 John G. News Database Designer 105.00 19.4
15 Evergreen 105 Alice K. Johnson Database Designer 105.00 35.7
15 Evergreen 106 William Smithfield Programmer 35.75 12.8
15 Evergreen 102 David H. Senior System Analyst 96.75 23.8
18 Amber Wave 114 Annelise Jones Applications Designer 48.10 24.6
18 Amber Wave 118 James J. Frommer General Support 18.36 45.3
18 Amber Wave 104 Anne K. Ramoras System Analyst 96.75 32.4
18 Amber Wave 112 Darlene M. Smithson DSS Analyst 45.95 44.0
22 Rolling Tide 105 Alice K. Johnson Database designer 105.00 35.7
22 Rolling Tide 104 Anne K. Ramoras System Analyst 96.75 23.8
22 Rolling Tide 113 Delbert K. Joenbrood Applications Designer 48.10 24.6
22 Rolling Tide 111 Geoff B. Wabash Clerical Support 26.87 22.0
22 Rolling Tide 106 William Smithfield Programmer 35.75 24.6
25 Starflight 107 Maria D. Alonzo Programmer 35.75 24.6
25 Starflight 115 Travis B. Bawangi System Anlayst 96.75 45.8
25 Starflight 101 John G. News Database Designer 105.00 56.3
25 Starflight 114 Annelise Jones Applications Designer 48.10 33.1
25 Starflight 108 Raplh B. Washington System Analyst 96.75 23.6
CONVERSION TO FIRST NORMAL FORM
CRITERIA 2: Identify the Primary Key
PROJ_NUM PROJ_NAME EMP_NUM EMP_NAME JOB_CLASS CHG_HO HOURS
15
Table in 1st Normal Form
Evergreen 103 June E. Arbough Elect. Engineer
UR
84.50 23.8
15 Evergreen 101 John G. News Database Designer 105.00 19.4
15 Evergreen 105 Alice K. Johnson Database Designer 105.00 35.7
15 Evergreen 106 William Smithfield Programmer 35.75 12.8
15 Evergreen 102 David H. Senior System Analyst 96.75 23.8
18 Amber Wave 114 Annelise Jones Applications Designer 48.10 24.6
18 Amber Wave 118 James J. Frommer General Support 18.36 45.3
18 Amber Wave 104 Anne K. Ramoras System Analyst 96.75 32.4
18 Amber Wave 112 Darlene M. Smithson DSS Analyst 45.95 44.0
22 Rolling Tide 105 Alice K. Johnson Database designer 105.00 35.7
22 Rolling Tide 104 Anne K. Ramoras System Analyst 96.75 23.8
22 Rolling Tide 113 Delbert K. Joenbrood Applications Designer 48.10 24.6
22 Rolling Tide 111 Geoff B. Wabash Clerical Support 26.87 22.0
22 Rolling Tide 106 William Smithfield Programmer 35.75 24.6
25 Starflight 107 Maria D. Alonzo Programmer 35.75 24.6
25 Starflight 115 Travis B. Bawangi System Anlayst 96.75 45.8
25 Starflight 101 John G. News Database Designer 105.00 56.3
25 Starflight 114 Annelise Jones Applications Designer 48.10 33.1
25 Starflight 108 Raplh B. Washington System Analyst 96.75 23.6
CONVERSION TO FIRST NORMAL FORM
CRITERIA 3: Identify All Dependencies
FIRST NORMAL FORM DEPENDENCY DIAGRAM
PROJ_NUM PROJ_NAME EMP_NUM EMP_NAME JOB_CLASS CHG_HOUR HOURS
Transitive dependency
Partial dependency Partial dependency
1NF (PROJ_NUM, EMP_NUM, PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOURS, HOURS)
PARTIAL DEPENDENCIES
(PROJ_NUM PROJ_NAME) PROJ_NUM is the determinant; PROJ_NAME is dependent attri..
(EMP_NUM EMP_NAME, JOB_CLASS, CHG_HOUR)
TRANSITIVE DEPENDENCIES
(JOB_CLASS CHG_HOUR)
SECOND NORMAL FORM CONVERSTION RESULT
Table name: PROJECT
PROJECT (PROJ_NUM, PROJ_NAME)
PROJ_NUM PROJ_NAME
EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)
Table name: EMPLOYEE TRANSITIVE DEPENDENCY (JOB_CLASS CHG_HOUR)
EMP_NUM EMP_NAME JOB_CLASS CHG_HOUR
Transitive dependency
Table name: ASSIGNMENT
PROJ_NUM EMP_NUM ASSING_HOURS ASSIGNMENT (PROJ_NUM, EMP_NUM, ASSIGN_HOURS)
CRITERIA 1: Make new table to eliminate Partial dependencies
CRITERIA 2: Reassign corresponding dependent attributes
THIRD NORMAL FORM CONVERSTION RESULT
PROJ_NUM PROJ_NAME EMP_NUM EMP_NAME JOB_CLASS
Table name: PROJECT Table name: EMPLOYEE
PROJECT (PROJ_NUM, PROJ_NAME) EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS)
PROJ_NUM EMP_NUM ASSIGN_HOURS JOB_CLASS CHG_HOUR
Table name: ASSIGNMENT Table name: JOB
ASSIGNMENT (PROJ_NUM, EMP_NUM, ASSIGN_HOURS) JOB (JOB_CLASS, CHG_HOUR)
CRITERIA 1: Make new table to eliminate Transitive dependencies
CRITERIA 2: Reassign corresponding dependent attributes
IMPROVING DATABASE DESIGN
1. Evaluate PK Assignments
Each time a new employee is entered into the
EMPLOYEE table, a JOB_CLASS value must be
entered. Unfortunately, if we will enter DB
Designer and Database Designer it triggers
violation.
Solution:
JOB_CODE JOB_CLASS, CHG_HOUR
2. Evaluate Naming Conventions
Therefore, CHG_HOUR will be changed to
JOB_CHG_HOUR to indicate association to JOB table.
JOB_CLASS JOB_DESCRIPTION in the JOB table
HOURS ASSIGN_HOURS in the ASSIGNMENT table
3. Refine Attribute Atomicity
Easily generate report by sorting
EX. EMP_NAME EMP_FNAME, EMP_LNAME, EMP_INITIAL
4. Identifying New Attributes
If the EMPLOYEE table were used in real-world environment,
other attributes would have be added.
EX. EMP_HIREDATE, EMP_MEDICAREPAYMENT, EMP_SALARYPAYMENT
5. Identifying New Relationship
According to the original report, the user needs to track
which employee is acting as the manager of each project.
This can only be implemented as a relationship between
EMPLOYEE and PROJECT.
Each PROJECT has only one MANAGER.
Thus ensure the EMP_NUM as the foreign key in PROJECT.
This action ensure that access PROJECT’s manager data
without producing unnecessary and undesirable duplication.
6. Refine Primary Key as Required for Data Granularity
Granularity – refers to the level of detail
represented by the values stored in a table’s row.
EX.
The ASSIGNMENT table, attribute ASSIGN_HOURS
can be stored at their lowest granularity such as
hourly total, daily total, weekly total, monthly total or
yearly total.
For what time frame? Hour, day, week,?
7. Maintain Historical Accuracy –
Writing the job charge per hour into the ASSIGNMENT table
is crucial to maintaining the historical accuracy of the table’s
data.
It would be appropriate to name the attribute
ASSIGN_CHG_HOUR rather than JOB_CHG_HOUR because the
value of job charge per hour will change over time.
8. Evaluate Using Derived Attributes –
You can derived attribute in the ASSIGNMENT table to
store the actual charge made to a project.
The derived attribute is ASSIGN_CHARGE, is the result of
multiplying ASSIGN_HOURS by ASSIGN_CHG_HOUR.
This creates a transitive dependency such that:
(ASSIGN_CHARGE + ASSIGN_HOURS) ASSIGN_CHG_HOUR
The Complete Database
PROJ_NUM PROJ_NAME EMP_NUM
PROJ_NUM PROJ_NAME EMP_NUM
15 Evergreen 105
18 Amber Wave 104
22 Rolling Tide 113
25 Starflight 101
Table name: PROJECT
JOB_CODE JOB_DESCRIPTION JOB_CHG_HOUR
JOB_CODE JOB_DESCRIPTION JOB_CHG_HOUR
500 Programmer 35.75
501 Systems Analyst 96.75
502 Database Designer 105.00
503 Electrical Engineer 84.50
504 Mechanical Engineer 67.90
505 Civil Engineer 55.78
506 Clerical Support 26.87
507 DSS Analyst 45.95
508 Applications Designer 48.10
509 Bio Technician 34.55
510 General Support 18.36
Table name: JOB
ASSIGN_NUM ASSIGN_DATE PROJ_NUM EMP_NUM ASSIGN_HOURS ASSIGN_CHG_HOUR ASSIGN_CHARGE
ASSIGN_NUM ASSIGN_DATE PROJ_NUM EMP_NUM ASSIGN_HOURS ASSIGN_CHG_HOUR ASSIGN_CHARGE
1001 04-Mar-12 15 103 2.6 84.50 219.70
1002 04-Mar-12 18 118 1.4 18.36 25.70
1003 05-Mar-12 15 101 3.6 105.00 378.00
1004 05-Mar-12 22 113 2.5 48.10 120.25
1005 05-Mar-12 15 103 1.9 84.50 160.55
1006 05-Mar-12 25 115 4.2 96.75 406.35
1007 05-Mar-12 22 105 5.2 105.00 546.00
1008 05-Mar-12 25 101 1.7 105.00 178.50
1009 05-Mar-12 15 105 2.0 105.00 210.00
1010 06-Mar-12 15 102 3.8 96.75 367.65
1011 06-Mar-12 22 104 2.6 96.75 251.55
1012 06-Mar-12 15 101 2.3 105.00 241.50
1013 06-Mar-12 25 114 1.8 48.10 86.58
Table name: ASSIGNMENT
EMP_NUM EMP_LNAME EMP_FNAME EMP_INITIAL EMP_HIREDATE JOB_CODE
EMP_NUM EMP_LNAME EMP_FNAME EMP_INITIAL EMP_HIREDATE JOB_CODE
101 News John G 08-Nov-00 502
102 Senior David H 12-Jul-89 501
103 Arbough June E 01-Dec-97 503
104 Ramoras Anne K 15-Nov-88 501
105 Johnson Alice K 01-Feb-94 502
106 Smithfield William L 22-Jun-05 500
107 Olenko Gerald A 18-Jul-99 500
108 Wabash Geoff B 11-Dec-96 501
109 Smithson Darlen M 04-Apr-89 501
Table name: EMPLOYEE
Surrogate Key Considerations
When primary key is considered to be unsuitable, designers use surrogate keys
Data entries in Table 5.3 are inappropriate because they duplicate existing records
-Yet there has been no violation of either entity integrity or referential integrity
***END***