Normalization in DBMS
Normalization in DBMS is a technique using which you can organize the data in the database tables
so that:
◦ There is less repetition of data,
◦ A large set of data is structured into a bunch of smaller tables,
◦ and the tables have a proper relationship between them.
DBMS Normalization is a systematic approach to decompose (break down) tables to eliminate
data redundancy(repetition) and undesirable characteristics like Insertion anomaly in DBMS,
Update anomaly in DBMS, and Delete anomaly in DBMS.
It is a multi-step process that puts data into tabular form, removes duplicate data, and set up the
relationship between tables.
Why is it important to Normalise relations ?
To Reduce Data Redundancy and Inconsistency i.e. we only want to keep each piece of data once. If
we keep data more than once:-
• Repeated data makes the database larger.
• Multiple instances of the same values make maintaining the data more difficult.
Why we need Normalization in DBMS?
• Normalization is required for,
• Eliminating redundant(useless) data, therefore handling data integrity, because if data is
repeated it increases the chances of inconsistent data.
• Normalization helps in keeping data consistent by storing the data in one table and
referencing it everywhere else.
• Storage optimization although that is not an issue these days because Database storage is
cheap.
• Breaking down large tables into smaller tables with relationships, so it makes the database
structure more scalable and adaptable.
• Ensuring data dependencies make sense i.e. data is logically stored.
Aim of Normalisation
• The aim of normalisation is to create a set of relations where all attributes are Fully
Functionally Dependent on the whole primary key.
• This is where all non-key attributes are fully determined by the full primary key.
• This will ensure that
• each piece of data is only kept once.
• all attributes are kept in the correct relation.
• All attributes maintain a one to one relationship with the primary key.
• Where this is not the case – we need to do some work!!
Insert, Update and Delete Anomalies
These anomalies occur when the database has not been correctly normalised and a row is to be
inserted, updated or deleted.
The
CustomerOrder relation shown above is not normalised. Can you see that there are attributes
that should have been split into several different entities.
Let us have a look at some examples of what problems will occur :-
• To add a new customer to a database we do not want to also create an additional order.
• If we update the customer’s address we don’t want to have to update it in multiple rows for
each order the customer has placed.
• When we wish to delete a customer we do not want to lose all their order information.
The solution is to Normalise the relations.
Types of DBMS Normal forms
Normalization rules are divided into the following normal forms:
1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. BCNF
1. First Normal Form (1NF)
An entity specification or relation is in 1NF if it is:
• in the form of a rectangular table
• and there are no multivalued attributes
• and there are no repeated groups or attributes
• and all attributes are atomic
• and all values are not derived
• and all the known data has been entered and all data without a known value
contains a Null Value.
Changing Relations from 3NF to BCNF
An entity specification is in Boyce-Codd Normal Form (BCNF) if it is:
• in 3NF
• and does not contain any Boyce-Codd Dependencies.
A Boyce-Codd dependency is
• when an attribute that is part of the primary key is determined by a non-key attribute
• and this non-key attribute has not been defined as a candidate key.
Using the First Normal Form, data redundancy increases, as there will be many columns with same
data in multiple rows but each row as a whole will be unique.
2. Second Normal Form (2NF)
An entity specification or relation is in 2NF if it is:
• 1NF
• and there are no partial dependencies.
Partial dependencies are attributes that are dependent on only a part or one column of the
key.
The only way in which there can be part of a key is if there is a composite key!!!!!
What is Partial Dependency?
When a table has a primary key that is made up of two or more columns, then all the columns(not
included in the primary key) in that table should depend on the entire primary key and not on a part
of it. If any column(which is not in the primary key) depends on a part of the primary key then we
say we have Partial dependency in the table.
• We see here in Student_Project relation that the prime key attributes are Stu_ID and
Proj_ID.
• According to the rule, non-key attributes, i.e. Stu_Name and Proj_Name must be dependent
upon both and not on any of the prime key attribute individually.
• But we find that Stu_Name can be identified by Stu_ID and Proj_Name can be identified by
Proj_ID independently.
• This is called partial dependency, which is not allowed in Second Normal Form.
We broke the relation in two as depicted in the above picture. So there exists no partial dependency.
3. Third Normal Form (3NF)
An Entity Specification or Relation is in 3NF if it is:
• in 2NF
• and there are no transitive dependencies.
A Transitive Dependency is a dependency between two attributes.
What is Transitive Dependency?
Whenever some indirect relationship happens to cause functional dependency (FC), it is known as
Transitive Dependency. Thus, if A -> B and B -> C are true, then A -> C happens to be a transitive
dependency.
Thus, to achieve 3NF, one must eliminate the Transitive Dependency.
Example :
4. Boyce-Codd Normal Form (BCNF)
An entity specification is in Boyce-Codd Normal Form (BCNF) if it is:
• in 3NF
• and does not contain any Boyce-Codd Dependencies.
A Boyce-Codd dependency is
• when an attribute that is part of the primary key is determined by a non-key attribute
• and this non-key attribute has not been defined as a candidate key.
They are rare so most relations that are in 3NF are also in BCNF.
Before coversion:
After Conversion
Functional Dependency
• A functional dependency is an association between two attributes of the same relational
database table.
• One of the attributes is called the determinant and the other attribute is called the
determined.
• For each value of the determinant there is associated one and only one value of the
determined.
• If A is the determinant and B is the determined then we say that A functionally determines B
and graphically represent this as A -> B. The symbols A à B· can also be expressed as B is
functionally determined by A.
• Example : Below the person schema
person(national_id, name, address);
Here national_id functionally determines the person_name. So functional dependency is
national_id -> name;
Dependency Diagram
A dependency diagram, shown in Figure illustrates the various dependencies that might exist in a
non-normalized table. A non-normalized table is one that has data redundancy in it.
• Depicts all dependencies found within given table structure
• Helpful in getting bird’s-eye view of all relationships among table’s attributes
• Makes it less likely that you will overlook an important dependency
The following dependencies are identified in this table:
• ProjectNo and EmpNo, combined, are the PK.
• Partial Dependencies:
• ProjectNo —> ProjName
• EmpNo —> EmpName, DeptNo,
• ProjectNo, EmpNo —> HrsWork
• Transitive Dependency:
• DeptNo —> DeptName