HSVG245 – Programming for Geomatics Applications
Department of Surveying and Geomatics
Database Normalisation
Kumbirai Matingo
matingok@[Link]
Contents
Introduction to Database Normalisation ................................................................................... 3
Why do we need Normalization? ............................................................................................. 3
Prerequisites for Understanding Database Normalization ............................................. 5
Features of Database Normalization ...................................................................................... 5
Normal Forms in DBMS .................................................................................................................... 6
Advantages of Normalization .................................................................................................... 7
Disadvantages of Normalization .............................................................................................. 7
Normal Forms in DBMS .................................................................................................................... 7
1. First Normal Form (1NF): Eliminating Duplicate Records ........................................... 8
2. Second Normal Form (2NF): Eliminating Partial Dependency .................................. 9
What is Partial Dependency? ................................................................................................... 10
3. Third Normal Form (3NF): Eliminating Transitive Dependency .............................. 11
What is Transitive Dependency? ............................................................................................. 12
Introduction to Database Normalisation
Normalization is an important process in database design that helps improve the
database’s efficiency, consistency, and accuracy. It makes it easier to manage and
maintain the data and ensures that the database is adaptable to changing business
needs.
Database normalization is the process of organizing the attributes of the database
to reduce or eliminate data redundancy (having the same data but at different
places).
• Data redundancy unnecessarily increases the size of the database as the same
data is repeated in many places. Inconsistency problems also arise during
insert, delete, and update operations.
• In the relational model, there exist standard methods to quantify how efficient
a database is.
These methods are called normal forms and there are algorithms to covert a
given database into normal forms.
• Normalization generally involves splitting a table into multiple ones which
must be linked each time a query is made requiring data from the split tables.
Why do we need Normalization?
The primary objective for normalizing the relations is to eliminate the below
anomalies (also previously mentioned in the Relational Models lecture). Failure to
reduce anomalies results in data redundancy, which may threaten data integrity and
cause additional issues as the database increases.
Normalization consists of a set of procedures that assist you in developing an
effective database structure.
• Insertion Anomalies: Insertion anomalies occur when it is not possible to
insert data into a database because the required fields are missing or because
the data is incomplete. For example, if a database requires that every record
has a primary key, but no value is provided for a particular record, it cannot be
inserted into the database.
• Deletion anomalies: Deletion anomalies occur when deleting a record from a
database and can result in the unintentional loss of data. For example, if a
database contains information about customers and orders, deleting a
customer record may also delete all the orders associated with that customer.
• Updation anomalies: Updation anomalies occur when modifying data in a
database and can result in inconsistencies or errors. For example, if a database
contains information about employees and their salaries, updating an
employee’s salary in one record but not in all related records could lead to
incorrect calculations and reporting.
Before Normalization – the table is prone to redundancy and anomalies (insertion,
update and deletion).
Problems with the Employee_Department Relation
Insertion Anomaly
If a new department is created but no employee is assigned to it yet, we cannot store
its location because we need an employee record to insert.
Update Anomaly
If the location of the HR department changes, we must update it in multiple rows (for
both Nick Wise and Lily Case). If one row is missed, the data becomes inconsistent.
Data Redundancy
The department location is repeated for every employee in the same department.
After Normalization – the data is divided into logical tables to ensure consistency,
avoid redundancy and remove anomalies making the database efficient and reliable.
Prerequisites for Understanding Database Normalization
In database normalization, we mainly put only tightly related information together.
To find the closeness, we need to find which attributes are dependent on each other.
To understand dependencies, we need to learn the below concepts.
Keys are like unique identifiers in a table. For example, in a table of students, the
student ID is a key because it uniquely identifies each student. Without keys, it would
be hard to tell one record apart from another, especially if some information (like
names) is the same. Keys ensure that data is not duplicated and that every record can
be uniquely accessed.
Functional dependency helps define the relationships between data in a table. For
example, if you know a student’s ID, you can find their name, age, and class. This
relationship shows how one piece of data (like the student ID) determines other
pieces of data in the same table. Functional dependency helps us understand these
rules and connections, which are crucial for organizing data properly.
Once we figure out dependencies, we split tables to make sure that only closely
related data is together in a table.
When we split tables, we need to ensure that we do not lose information.
Additional reading on:
• Dependency Preserving Decomposition
• Lossless Decomposition in DBMS
Features of Database Normalization
Elimination of Data Redundancy
One of the main features of normalization is to eliminate the data redundancy that
can occur in a database. Data redundancy refers to the repetition of data in different
parts of the database. Normalization helps in reducing or eliminating this
redundancy, which can improve the efficiency and consistency of the database.
Ensuring Data Consistency
Normalization helps in ensuring that the data in the database is consistent and
accurate. By eliminating redundancy, normalization helps in preventing
inconsistencies and contradictions that can arise due to different versions of the
same data.
Simplification of Data Management
Normalization simplifies the process of managing data in a database. By breaking
down a complex data structure into simpler tables, normalization makes it easier to
manage the data, update it, and retrieve it.
Improved Database Design
Normalization helps in improving the overall design of the database. By organizing
the data in a structured and systematic way, normalization makes it easier to design
and maintain the database. It also makes the database more flexible and adaptable
to changing business needs.
Avoiding Update Anomalies
Normalization helps in avoiding update anomalies, which can occur when updating a
single record in a table affects multiple records in other tables. Normalization
ensures that each table contains only one type of data and that the relationships
between the tables are clearly defined, which helps in avoiding such anomalies.
Standardization
Normalization helps in standardizing the data in the database. By organizing the data
into tables and defining relationships between them, normalization helps in ensuring
that the data is stored in a consistent and uniform manner.
Normal Forms in DBMS
Normal Forms Description of Normal Forms
First Normal A relation is in first normal form if every attribute in that relation
Form (1NF) is single-valued attribute.
Second Normal A relation that is in First Normal Form and every non-primary-
Form (2NF) key attribute is fully functionally dependent on the primary key,
then the relation is in Second Normal Form (2NF).
Third Normal A relation is in the third normal form, if there is no transitive
Form (3NF) dependency for non-prime attributes as well as it is in the
second normal form. A relation is in 3NF if at least one of the
following conditions holds in every non-trivial function
dependency X –> Y.
• X is a super key.
• Y is a prime attribute (each element of Y is part of some
candidate key).
Other normal forms such as.
- Boyce-Codd Normal Form (BCNF)
- Fourth Normal Form (4NF) and
- Fifth Normal Form (5NF)
Also exist but for the purposes of this course, we will only focus on the first three.
Advantages of Normalization
• Normalization eliminates data redundancy and ensures that each piece of
data is stored in only one place, reducing the risk of data inconsistency and
making it easier to maintain data accuracy.
• By breaking down data into smaller, more specific tables, normalization helps
ensure that each table stores only relevant data, which improves the overall
data integrity of the database.
• Normalization simplifies the process of updating data, as it only needs to be
changed in one place rather than in multiple places throughout the database.
• Normalization enables users to query the database using a variety of different
criteria, as the data is organized into smaller, more specific tables that can be
joined together as needed.
• Normalization can help ensure that data is consistent across different
applications that use the same database, making it easier to integrate different
applications and ensuring that all users have access to accurate and consistent
data.
Disadvantages of Normalization
• Normalization can result in increased performance overhead due to the need
for additional join operations and the potential for slower query execution
times.
• Normalization can result in the loss of data context, as data may be split
across multiple tables and require additional joins to retrieve.
• Proper implementation of normalization requires expert knowledge of
database design and the normalization process.
• Normalization can increase the complexity of a database design, especially if
the data model is not well understood or if the normalization process is not
carried out correctly.
Normal Forms in DBMS
In the world of database management, Normal Forms are important for ensuring
that data is structured logically, reducing redundancy, and maintaining data integrity.
When working with databases, especially relational databases, it is critical to follow
normalization techniques that help to eliminate unnecessary duplication, improve
performance, and minimize the risk of anomalies.
In this section, we will explain normalization in DBMS, explain all the normal
forms, and explore the benefits of using them in real-world applications.
1. First Normal Form (1NF): Eliminating Duplicate Records
A table is in 1NF if it satisfies the following conditions:
• All columns contain atomic values (i.e., indivisible values).
• Each row is unique (i.e., no duplicate rows).
• Each column has a unique name.
• The order in which data is stored does not matter.
Consider the below COURSES Relation
In the above table, Courses has a multi-valued attribute, so it is not in 1NF. The
below table is in 1NF as there is no multi-valued attribute.
1NF is a key idea in relational database architecture. It guarantees that data is
organized to facilitate data processing, remove redundancy, and support data
integrity. 1NF establishes the foundation for more
complex normalization strategies that further improve the correctness and
efficiency of database systems by imposing atomic values and forbidding recurring
groupings inside rows.
2. Second Normal Form (2NF): Eliminating Partial Dependency
A relation is in 2NF if it satisfies the conditions of 1NF and additionally. No partial
dependency exists, meaning every non-prime attribute (non-key attribute) must
depend on the entire primary key, not just a part of it.
Consider a table storing information about students, courses and their fees.
• There are many courses having the same course fee. Here, COURSE_FEE
cannot alone decide the value of COURSE_NO or STUD_NO.
• COURSE_FEE together with STUD_NO cannot decide the value of COURSE_NO.
• COURSE_FEE together with COURSE_NO cannot decide the value of STUD_NO.
• The candidate key for this table is {STUD_NO, COURSE_NO} because the
combination of these two columns uniquely identifies each row in the table.
• COURSE_FEE is a non-prime attribute because it is not part of the candidate
key {STUD_NO, COURSE_NO}.
• But, COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE is dependent on
COURSE_NO, which is a proper subset of the candidate key.
• Therefore, non-prime attribute COURSE_FEE is dependent on a proper subset
of the candidate key, which is a partial dependency and so this relation is not
in 2NF.
In 2NF, we eliminate such dependencies by breaking the table into two separate
tables:
1. A table that links students and courses.
2. A table that stores course fees.
Now, each table is in 2NF:
• The Course Table ensures that COURSE_FEE depends only on COURSE_NO.
• The Student-Course Table ensures there are no partial dependencies because
it only relates students to courses.
Now, the COURSE_FEE is no longer repeated in every row, and each table is free
from partial dependencies. This makes the database more efficient and easier to
maintain.
Why is 2NF Important?
By ensuring that a database table adheres to Second Normal Form, we achieve
several key benefits:
1. Reduces Redundancy: In our example, we no longer store the same course fee
multiple times. Instead, we store it once in the Course Fee table and reference it in
the Student-Course table.
2. Minimizes Update Anomalies: With data being centralized in the right tables,
you’re less likely to run into problems when you update or delete information. For
example, if a course fee changes, you only need to update it in one place.
3. Improves Data Integrity: By eliminating partial dependencies, 2NF ensures that
the database structure is logical, which in turn ensures that data relationships are
consistent.
4. Enhances Query Efficiency: Queries will be more efficient, as tables are smaller
and more focused on specific data, making it faster to retrieve the necessary
information.
What is Partial Dependency?
A functional dependency denoted as X→Y where X and Y are an attribute set of a
relation, is a partial dependency, if some attribute A∈X can be removed and the
dependency still holds. For example, if you have a functional dependency X→Y,
where X is a composite candidate key (made of multiple columns), and we can
remove one column from X, but the dependency still works, then it’s a partial
dependency.
In a composite key (a key made of multiple attributes), a partial dependency
happens when one of the non-prime attributes depends only on a part of the
composite key. Here’s how to identify partial dependencies in your database:
• Look for functional dependencies where one attribute depends on a part of
the primary key, not the entire key.
• If an attribute (like COURSE_FEE in our example) depends on just a part of the
key (COURSE_NO), it’s a partial dependency.
• To remove partial dependencies, break the table into smaller tables that store
only relevant data together.
Second Normal Form (2NF) helps make databases more organized by removing
partial dependencies. It reduces duplicate data, prevents errors, and ensures data is
stored accurately. Following 2NF makes it easier to manage, update, and retrieve
information from your database. Whether we’re building a small application or a
large enterprise system, following 2NF principles will lead to better performance
and data consistency.
3. Third Normal Form (3NF): Eliminating Transitive Dependency
A relation is in 3NF if it satisfies 2NF and additionally, there are no transitive
dependencies. In simpler terms, non-prime attributes should not depend on
other non-prime attributes.
The Third Normal Form (3NF) builds on the rules of the First (1NF) and Second
(2NF) Normal Forms. Reaching 3NF ensures that the database is well-structured,
efficient, and free from data issues or inconsistencies.
Even though tables in 2NF have less duplication than 1NF, they can still face
problems like update errors. For example, if only one row is updated while another is
not, the data becomes inconsistent. This happens due to transitive dependencies. To
solve this, we move the table to 3NF, which removes such dependencies and makes
the database more reliable.
NB: If A->B and B->C are two FDs then A->C is called transitive dependency.
The normalization of 2NF relations to 3NF involves the removal of transitive
dependencies. If a transitive dependency exists, we remove the transitively
dependent attribute(s) from the relation by placing the attribute(s) in a new relation
along with a copy of the determinant. Consider the examples given below.
Consider the below relation,
In the relation CANDIDATE given above:
• Functional dependency Set: {CAND_NO -> CAND_NAME, CAND_NO -
>CAND_STATE, CAND_STATE -> CAND_COUNTRY, CAND_NO -> CAND_AGE}
• So, Candidate key here would be: {CAND_NO}
• For the relation given here in the table, CAND_NO -> CAND_STATE and
CAND_STATE -> CAND_COUNTRY are actually true. Thus, CAND_COUNTRY
depends transitively on CAND_NO. This transitive relation violates the rules of
being in the 3NF. So, if we want to convert it into the third normal form, then
we have to decompose the relation CANDIDATE (CAND_NO, CAND_NAME,
CAND_STATE, CAND_COUNTRY, CAND_AGE) as:
CANDIDATE (CAND_NO, CAND_NAME, CAND_STATE, CAND_AGE)
STATE_COUNTRY (STATE, COUNTRY).
What is Transitive Dependency?
A transitive dependency occurs when a non-key attribute depends on another non-
key attribute rather than directly on the primary key. For instance, consider a table
with the attributes (A, B, C) where A is the primary key and B and C are non-key
attributes. If B determines C then C is transitively dependent on the A through B. This
can lead to data anomalies and redundancy which 3NF aims to eliminate by the
ensuring that all non-key attributes depend only on the primary key.
A crucial stage in database normalization is Third Normal Form (3NF). It deals with
transitive dependencies and improves data integrity through effective information
organization. 3NF ensures that non-key properties only depend on the primary
key , removing redundancy and helping to create a well-organized and
normalized relational database model .