Learn BI, Analytics and Big Data 
Module: Normalization
Normalization 
BIWHIZ 
Normalization is a set of rules to allocate data to 
tables in such a way as to minimize the 
redundancy and to minimize certain anomalies 
related to insert, delete and update of data. 
BIWHIZ 2
Normalization 
BIWHIZ 
Anomalies 
A major objective of data normalization is to avoid modification anomalies. 
These come in three flavors: 
• An insertion anomaly is a failure to place information into all the places in 
the database. In a properly normalized database, information about a new 
entry needs to be inserted into only one place in the database. 
• A deletion anomaly is a failure to remove information about an existing 
database entry from all places. In a properly normalized database, the 
entry needs to be deleted from only one. 
• A update anomaly is a failure to update information about an existing 
database entry from all places. 
All three kinds of anomalies are highly undesirable, since their occurrence 
constitutes corruption of the database. 
BIWHIZ 3
Normalization 
BIWHIZ 
First Normal Form 
• A Relations is said to be in First Normal Form (1NF) when it contains no 
multi-valued attribute, i.e. every attribute must contain at most one data 
value. 
• In other words a table is in first normal form if all the key attributes have 
been defined and it contains no repeating groups. 
BIWHIZ 4
Normalization 
First Normal Form 
Example: Let us assume an ORDER table as below: 
BIWHIZ 
ORDER 
order_id customer_id product1 product2 product3 
123 456 abc1 def1 ghi1 
456 789 abc2 
This structure creates the following problems: 
•Order 123 has no room for more than 3 products. 
•Order 456 has wasted space for product2 and product3. 
BIWHIZ 5
Normalization 
BIWHIZ 
First Normal Form 
In order to create a table that is in first normal form we must extract the 
repeating groups and place them in a separate table. 
We have removed 'product1', 'product2' and 'product3', so there are no repeating groups. 
6 
ORDER 
order_id customer_id 
123 456 
456 789 
ORDER_LINE 
order_id product 
123 abc1 
123 def1 
123 ghi1 
456 abc2
Normalization 
BIWHIZ 
Second Normal Form 
A relation is said to be in 2nd Normal Form if 
• The relation is in 1st Normal Form 
• Every non key attribute is fully functionally dependent on the whole of the 
primary key (i.e. there are no partial dependencies). 
1. Anomalies can occur when attributes are dependent on only part of a 
BIWHIZ 7 
multi-attribute (composite) key. 
2. A relation is in second normal form when all non-key attributes are 
dependent on the whole key. That is, no attribute is dependent on only 
a part of the key. 
3. Any relation having a key with a single attribute is in second normal 
form.
Normalization 
Second Normal Form 
Take the following table structure as an example: 
order(order_id, cust_id, cust_address, cust_contact, order_date, 
order_total) 
BIWHIZ 
Here we should realize that cust_address and cust_contact are functionally 
dependent on cust_id but not on order_id, therefore they are not dependent 
on the whole key. To make this table 2NF these attributes must be removed 
and placed somewhere else. 
order_cust(order_id, cust_id, order_date, order_total) 
customer(cust_id, cust_address, cust_contact) 
BIWHIZ 8
Normalization 
BIWHIZ 
Second Normal Form 
So what problems can occur if we do not make it into 2nd Normal Form? 
It can result into 
Update anomaly: If you want to update the address of a customer, need to 
update in all orders made by this customer, if update is missed at any place 
then data will be inconsistent. 
Insert anomaly: There is no way to add a customer information he he/she has 
not made any order. 
BIWHIZ 9
Normalization 
BIWHIZ 
Third Normal Form 
A relation is said to be in Third Normal Form if 
• The relation is in Second Normal Form 
• There is no transitive dependency (that is, all the non-key attributes 
depend only on the primary key) 
In other words, A relation is in 3NF when 'All non-key attributes are 
dependent on the key, the whole key and nothing but the key'. 
BIWHIZ 10
Normalization 
BIWHIZ 
Third Normal Form 
An example of a 2NF table that fails to meet the requirements of 3NF is: 
BIWHIZ 11 
Tournament Winners 
Tournament Year Winner 
Winner Date of 
Birth 
Indiana Invitational 1998 Al Fredrickson 21 July 1975 
Cleveland Open 1999 Bob Albertson 28 September 1968 
Des Moines 
1999 Al Fredrickson 21 July 1975 
Masters 
Indiana Invitational 1999 Chip Masterson 14 March 1977
Normalization 
BIWHIZ 
Third Normal Form 
• Because each row in the table needs to tell us who won a particular 
Tournament in a particular Year, the composite key {Tournament, Year} is a 
minimal set of attributes guaranteed to uniquely identify a row. That is, 
{Tournament, Year} is a candidate key for the table. 
• The breach of 3NF occurs because the non-prime attribute Winner Date of 
Birth is transitively dependent on the candidate key {Tournament, Year} 
via the non-prime attribute Winner. The fact that Winner Date of Birth is 
functionally dependent on Winner makes the table vulnerable to logical 
inconsistencies, as there is nothing to stop the same person from being 
shown with different dates of birth on different records. 
BIWHIZ 12
Normalization 
BIWHIZ 
Third Normal Form 
In order to express the same facts without violating 
3NF, it is necessary to split the table into two: 
Tournament Winners 
Tournament Year Winner 
Indiana Invitational 1998 Al Fredrickson 
Cleveland Open 1999 Bob Albertson 
Des Moines Masters 1999 Al Fredrickson 
Indiana Invitational 1999 Chip Masterson 
Winner Dates of Birth 
Winner Date of Birth 
Chip Masterson 14 March 1977 
Al Fredrickson 21 July 1975 
Bob Albertson 28 September 1968 
BIWHIZ 13
Normalization 
BIWHIZ 
So we may ask what is the difference between 2nd 
Normal Form and 3rd Normal Form in these two 
examples? 
In Order example, the customer address and customer 
contact is part of the Order information as these values 
might be needed for delivery purpose so they are 
dependent on part of primary keys. 
In Winner example, date of birth is not needed for the 
transaction(win or loss ) , this is just an extra 
information which is dependent only on Winner, 
nothing to do with a win or loss. 
BIWHIZ 14
Normalization 
Boyce-Code Normal Form (BCNF) 
A relation is said to be in BCNF if 
• The relation is in Third Normal Form 
• And every determinant is a candidate key. 
BIWHIZ 
BIWHIZ 15
Normalization 
BIWHIZ 
Boyce-Code Normal Form (BCNF) 
• Anomalies can occur in relations in 3NF if there is 
a composite key in which part of that key has a 
determinant which is not itself a candidate key. 
• This can be expressed as R(A,B,C), C A where: 
– The relation contains attributes A, B and C. 
– A and B form a candidate key. 
– C is the determinant for A (A is functionally dependent 
on C). 
– C is not part of any key. 
BIWHIZ 16
Normalization 
Boyce-Code Normal Form (BCNF) 
Take the following table structure as an 
example: 
• schedule(campus, course, class, time, 
building) 
• Take the following sample data: 
BIWHIZ 
campus course class time building 
East English 101 1 8:00-9:00 212 AYE 
East English 101 2 10:00-11:00 305 RFK 
West English 101 3 8:00-9:00 102 PPR 
BIWHIZ 17
Normalization 
BIWHIZ 
Boyce-Code Normal Form (BCNF) 
Note that no two buildings on any of the 
university campuses have the same name, thus 
BUILDING CAMPUS. As the determinant is not 
a candidate key this table is NOT in Boyce-Codd 
normal form. 
campus course class time building 
East English 101 1 8:00-9:00 212 AYE 
East English 101 2 10:00-11:00 305 RFK 
West English 101 3 8:00-9:00 102 PPR 
BIWHIZ 18
Normalization 
BIWHIZ 
Boyce-Code Normal Form (BCNF) 
• This table should be decomposed into the 
following relations: 
• R1(course, class, building, time) 
• R2(building, campus) 
BIWHIZ 19
Normalization 
BIWHIZ 
Boyce-Code Normal Form (BCNF) 
What if we do not apply BCNF form?? 
Every time when you want to update Campus 
and Building you have to check the other one 
and make a corresponding update on that as 
well. This can result in update anomaly. 
BIWHIZ 20
Normalization 
BIWHIZ 
Fourth Normal Form 
A table is in fourth normal form (4NF) if and only if it is in 
BCNF and contains no more than one multi-valued 
dependency. 
• Anomalies can occur in relations in BCNF if there is 
more than one multi-valued dependency. 
• If A-- >> B and A-- >> C but B and C are unrelated, ie 
A - - >>(B,C) is false, then we have more than one multi-valued 
dependency. 
BIWHIZ 21
Normalization 
Fourth Normal Form 
Take the following table structure as an example: 
info(employee#, skills, hobbies) 
Take the following sample data: 
BIWHIZ 
22 
employee# skills hobbies 
1 Programming Golf 
1 Programming Bowling 
1 Analysis Golf 
1 Analysis Bowling 
2 Analysis Golf 
2 Analysis Gardening 
2 Management Golf 
2 Management Gardening
Normalization 
BIWHIZ 
Fourth Normal Form 
Exercise: Try adding a new hobby?? 
This table is difficult to maintain since adding a new 
hobby requires multiple new rows corresponding to 
each skill. 
This problem is created by the pair of multi-valued 
dependencies EMPLOYEE# -- >>SKILLS and EMPLOYEE#-- 
>>HOBBIES. A much better alternative would be to 
decompose INFO into two relations: 
• skills(employee#, skill) 
• hobbies(employee#, hobby) 
BIWHIZ 23
Normalization 
BIWHIZ 
Fifth Normal Form 
A table is in fifth normal form (5NF) or Projection-Join 
Normal Form (PJNF) if it is in 4NF and it cannot have a 
lossless decomposition into any number of smaller tables. 
In another words , if we try to decompose it further then 
we will not be able to join them back and get the original 
table. 
(4th and 5th Normal Forms are applicable for Key-Only 
tables so there is no question of looking at 1st, 2nd, 3rd or 
BCNF normal forms, they are already there) 
BIWHIZ 24
Normalization 
BIWHIZ 
Fifth Normal Form 
Take the following table structure as an 
example: 
buying(buyer, vendor, item) 
This is used to track buyers, what they buy, and 
from whom they buy. 
Take the following sample data: 
BIWHIZ 25
Normalization 
BIWHIZ 
buyer vendor item 
Sally Liz Claiborne Bags 
Mary Liz Claiborne Bags 
Sally Jordach Jeans 
Mary Jordach Jeans 
Sally Jordach Sneakers 
BIWHIZ 26 
Fifth Normal Form 
The question is, what do you do if Claiborne starts to sell Jeans? How many rows 
must you create to record this fact? 
The problem is there are pairwise cyclical dependencies in the primary key. That is, 
in order to determine the item you must know the buyer and vendor, and to 
determine the vendor you must know the buyer and the item, and finally to know 
the buyer you must know the vendor and the item. 
The solution is to break this one table into three tables; Buyer-Vendor, Buyer-Item, 
and Vendor-Item.
Normalization 
BIWHIZ 
BIWHIZ 27 
Fifth Normal Form 
buyer vendor 
Sally Liz Claiborne 
Mary Liz Claiborne 
Sally Jordach 
Mary Jordach 
buyer item 
Sally Bags 
Mary Bags 
Sally Jeans 
Mary Jeans 
Sally Sneakers 
vendor item 
Liz Claiborne Bags 
Jordach Jeans 
Jordach Sneakers
Questions? 
BIWHIZ 
BIWHIZ 28