0% found this document useful (0 votes)
10 views24 pages

Database Normalisation Notes 2026

The document discusses database normalization, a methodology used to design relational database systems by eliminating data redundancy and ensuring data integrity. It outlines the process of normalization through various stages, including unnormalized form (UNF), first normal form (1NF), second normal form (2NF), and third normal form (3NF), while also introducing key concepts such as primary, secondary, foreign, simple, compound, and composite keys. Additionally, it explains the degrees of relationships between entities, including one-to-one, one-to-many, and many-to-many relationships.

Uploaded by

charlesedinoa
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
10 views24 pages

Database Normalisation Notes 2026

The document discusses database normalization, a methodology used to design relational database systems by eliminating data redundancy and ensuring data integrity. It outlines the process of normalization through various stages, including unnormalized form (UNF), first normal form (1NF), second normal form (2NF), and third normal form (3NF), while also introducing key concepts such as primary, secondary, foreign, simple, compound, and composite keys. Additionally, it explains the degrees of relationships between entities, including one-to-one, one-to-many, and many-to-many relationships.

Uploaded by

charlesedinoa
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd

DATABASE NORMALIZATION

Data Analysis and Normalisation

The basic concept of a relational database system is that data is only held once, ie there is no
duplication, and the data can be accessed by many users at the same time for many purposes.

A ready-made, tailored database is not normally purchased off the shelf. You can employ the
services of an external company to develop a tailored database for you. Not only is this
expensive but the external company still needs to go through the processes as described here.

You can purchase the software to perform such functions as creating the tables, adding and
amending data, creating reports, managing security, etc but the database itself still needs to be
designed.

It is worth noting at this stage that the database, and the software to manage it, is collectively
called a Database Management System. If the database conforms to the rules of a relational
database, as originally defined by Dr E F Codd, then this is called a Relational Database
Management System (RDBMS).

In this section we will examine briefly the process of data analysis and an introduction to a
methodology to carry out a design, based on that analysis, known asnormalisation.

When designing a database, it is important to ensure that it will support the organisation's needs
both now and in the future. Furthermore, it should result in a design that avoids the pitfalls as
described in the earlier sections.

A database designer examines in detail the data requirements of the system under development.
The process involves analysing existing data files, screen layouts and reports, interviewing key
staff, documenting the findings, etc. However, this is quite an extensive process and is outside
the scope of this Unit.

The deliverables from this data collection exercise allow the Database Designer to use a
data analysis technique to design the databaseData Analysis Techniques

Two widely used data analysis techniques are Entity Relationship Modelling
(ER) and Normalisation. During this unit we will be investigating the latter.

1
Normalisation is a data analysis methodology to design a database system. It allows the database
designer to understand the current data structures within an organisation. Furthermore, it allows
for any future changes and enhancements to the system.

One of the deliverables from the process is known as a logical data model. Subsequently, this
can be used to create a physical implementation into the desired database system.

It is important to note that the logical data model is independent of the database system used to
implement it. It includes none of the physical characteristics (eg indexes, files structures, etc) of
that database system. Therefore, should an organisation change its database system, the original
design can still be used to re-implement it.

Keys

2
Before examining the normalisation process in detail, you need to understand some more
terminology, ie keys. Keys were briefly discussed earlier. However, there is more than one type
of key.

During this discussion we will use the terms entity, entity occurrence and attribute. The
symbols used in the diagrams are fairly self explanatory. However, they will be explained in
more detail later.

Primary Key

A primary key comprises one or more attributes in an entity that uniquely identifies an entity
occurrence (ie no other entity occurrence can contain the same key). For example, your student
number is a primary key as this uniquely identifies you within the college student records
system. An employee number uniquely identifies a member of staff within a company. An IP
address uniquely addresses a PC on the Internet.

A primary key is mandatory. That is, each entity occurrence must have a value for its primary
key.

Secondary Key

An entity may have one or more choices for the primary key. Collectively these are known as
candidate keys. One is selected as the primary key. Those not selected are known as secondary
keys.

For example, an employee has an employee number, a National Insurance (NI) number and an
email address. If the employee number is chosen as the primary key then the NI number and
email address are secondary keys. However, it is important to note that if any employee does not
have a NI number or email address (ie the attribute is not mandatory) then it cannot be chosen as
a primary key.

Foreign Key

A foreign key is one or more attributes in one entity, which enables a link (or relationship) to
another entity. That is, a foreign key in one entity links to a primary key in another entity.
However, if the business rules permit, a foreign key may be optional.

3
For example, an employee works in a department. The department number column in the
employee entity is a foreign key, which links to the department entity.

Foreign keys will be explained in more detail when we explore the normalisation process later in
this section.

Simple Key

Any of the keys described before (ie primary, secondary or foreign) may comprise one or more
attributes. A simple key consists of a single attribute to uniquely identify an entity occurrence,
for example, a student number, which uniquely identifies a particular student. No two students
would have the same student number.

Compound Key

A compound key consists of more than one attribute to uniquely identify an entity occurrence.

Each attribute, which makes up the key, is also a simple key in its own right.

For example, we have an entity named enrolment, which holds the courses on which a student is
enrolled. In this scenario a student is allowed to enrol on more than one course. This has a
compound key of both student number and course number, which is required to uniquely identify
a student on a particular course.

4
Student number and course number combined is a compound primary key for the enrolment
entity.

Student number in the enrolment entity is a simple key in its own right, which is used as a
foreign key to link to the student entity.

Course number in the enrolment entity is a simple key in its own right, which is used as a foreign
key to link to the course entity.

Composite Key

A composite key consists of more than one attribute to uniquely identify an entity occurrence.

This differs from a compound key in that one or more of the attributes, which make up the key,
are not simple keys in their own right.

For example, you have a database holding your CD collection. One of the entities is called
tracks, which holds details of the tracks on a CD. This has a composite key of CD name, track
number.

CD name in the track entity is a simple key, linking to the CD entity, but track number is not a
simple key in its own right.

SAQ 6

For each of the entities you listed in SAQ 5, list possible primary keys. Then, suggest the
secondary keys, if any.

Entity Primary Key Secondary Key


Student student no SQA no
Course course no

5
Unit unit no
Result student no, unit no
Classroom classroom id
Lecturer lecturer no NI no
Department department no
Attendance student no, unit no, date, time
Unit Giving unit no, classroom id, date, time,
Expertise lecturer no, unit no

The Normalisation Process

Normalisation is a data analysis technique to design a database system. It allows the database
designer to understand the current data structures within an organisation. Furthermore, it aids any
future changes and enhancements to the system.

The end result of normalisation is a set of entities, which removes unnecessary redundancy (ie
duplication of data) and avoids the anomalies discussed earlier.

Normalisation follows a staged process that obeys a set of rules.

The steps of normalisation are:

 Select the data source and convert into an unnormalised table (UNF)

 Transform the unnormalised data into first normal form (1NF)

 Transform data in first normal form (1NF) into second normal form (2NF)

 Transform data in second normal form (2NF) into third normal form (3NF)

Occasionally, the data may still be subject to anomalies in third normal form. In this case, we
may have to perform further transformations.

 Transform third normal form to Boyce-Codd normal form (BCNF)


 Transform Boyce-Codd normal form to fourth normal form (4NF)
 Transform fourth normal form to fifth normal form (5NF)

6
For this unit you only need to be aware of the above three forms. However, the process to
transform them is not assessed and, therefore, not described.

Finally, from the normalised tables, a data model is produced.

Earlier you saw an example of a data model containing several entities. The normalisation
process helps us determine the entities required in the system being modeled. These entities can
then be represented in a data model. Later, following the normalisation process to determine the
entities, we will see how the data model itself is constructed.

First, the process of normalisation (to 3NF) is demonstrated by use of an example.

The first step is to select the data source (ie the report above) and convert into an unnormalised
table (UNF). The process is as follows:

 Create column headings for the table for each data item on the report (ignoring any
calculated fields). A calculated field is one that can be derived from other information on
the form. In this case total staff and average hourly rate.

 Enter sample data into table (this data is not simply the data on the report but a
representative sample. In this example it shows several employees working on several
projects. The same employee can work on different projects and at a different hourly
rate).

7
 Identify a key for table (and underline it).

 Remove duplicate data (in this example, for the chosen key of Project Code, the values
for Project Code, Project Title, Project Manager and Project Budget are duplicated if
there are two or more employees working on the same project).

Unnormalised Form (UNF)

First Normal Form (1NF)

The next step is to transform the table of unnormalised data into first normal form (1NF). The
rule is: remove any repeating attributes to a new table. The process is as follows:

 Identify repeating attributes.

8
 Remove these repeating attributes to a new table together with a copy of the key from the
UNF table.

 Assign a key to the new table (and underline it). The key from the original unnormalised
table always becomes part of the key of the new table. Acompound key is created. The
value for this key must be unique for each entity occurrence.

Second Normal Form (2NF)

The next step is to transform the data in first normal form (1NF) into second normal form (2NF).
The rule is: remove any non-key attributes that only depend on part of the table key to a
new table. Ignore tables with (a) a simple key or (b) with no non-key attributes (these go straight
to 2NF with no conversion). The process is as follows:

 Take each non-key attribute in turn and ask the question: is this attribute dependent
on one part of the key?

o If yes, remove attribute to new table with a copy of the part of the key it is
dependent upon. The key it is dependent upon becomes the key in the new table.
Underline the key in this new table.
o If no, check against other part of the key and repeat above process
o If still no, ie not dependent on either part of key, keep attribute in current table.

9
3rd Normal Form (3NF)

The next step is to transform the data in second normal form (2NF) into third normal form
(3NF). The rule is: remove to a new table any non-key attributes that are more dependent
on other non-key attributes than the table key. Ignore tables with zero or only one non-key
attribute (these go straight to 3NF with no conversion). The process is as follows:

 If a non-key attribute is more dependent on another non-key attribute than the table key:
o Move the dependent attribute, together with a copy of the non-key attribute upon
which it is dependent, to a new table.
o Make the non-key attribute, upon which it is dependent, the key in the new table.
Underline the key in this new table.
o Leave the non-key attribute, upon which it is dependent, in the original table and
mark it a foreign key (*).

10
The Normalisation Process

We've now been through the complete process. Having started off with an unnormalised table we
finished with four normalised tables in 3NF. You will notice that duplication has been removed
(apart from the keys needed to establish the links between those tables).

The process may look complicated. However, if you follow the rules completely, and do not
miss out any steps, then you should arrive at the correct solution. If you omit a rule there is a
high probability that you will end up with too few tables or incorrect keys.

A summary of the complete normalisation process is given on the following pages.

Step 1: UNF

Select the data source and convert into an unnormalised table (UNF)

Process:

 Create column headings (ignoring any calculated fields)

11
 Enter sample data into table

 Identify a key for table (and underline it)

Remove duplicate data

Step 2: 1NF

Transform a table of unnormalised data into first normal form (1NF)

Rule: Remove any repeating attributes to a new table

Process:

 Identify repeating attributes

 Remove repeating attributes to a new table together with a copy of the key from the UNF
table

 Assign a key to the new table (and underline it). The key from the unnormalised
table always becomes part of the key of the new table. Acompound key is created. The
value for this key must be unique for each entity occurrence.

Step 3: 2NF

Transform data in first normal form (1NF) into second normal form (2NF)

Rule: Remove any non-key attributes that only depend on part of the table key to a new
table

Ignore tables with a) a simple key or b) with no non-key attributes (these go straight to 2NF with
no conversion)

Process:

 Take each non-key attribute in turn and ask the question


- is this attribute dependent on one part of the key?

12
 If yes, remove attribute to new table with a copy of the part of the key it is dependent
upon. The key it is dependent upon becomes the key in the new table. Underline the key
in this new table.

 If no, check against other part of the key and repeat above process.

 If still no, ie not dependent on either part of key, keep attribute in current table.

Step 4: 3NF

Transform data in second normal form (2NF) into third normal form (3NF)

Rule: Remove to a new table any non-key attributes that are more dependent on other non-
key attributes than the table key

Ignore tables with zero or only one non-key attribute (these go straight to 3NF with no
conversion).

Process:

 If a non-key attribute is more dependent on another non-key attribute than the table key
 Move the dependent attribute, together with a copy of the non-key attribute upon which
it is dependent, to a new table

 Make the non-key attribute, upon which it is dependent, the key in the new table.
Underline the key in this new table.

 Leave the non-key attribute, upon which it is dependent, in the original table and mark it
a foreign key (*).

Degrees of Relationship (Cardinality)

The degree of relationship (also known as cardinality) is the number of occurrences in one
entity which are associated (or linked) to the number of occurrences in another.

There are three degrees of relationship, known as:

13
1. one-to-one (1:1)
2. one-to-many (1:M)
3. many-to-many (M:N)

The latter one is correct, it is M:N and not M:M.

One-to-one (1:1)

This is where one occurrence of an entity relates to only one occurrence in another entity.

A one-to-one relationship rarely exists in practice, but it can. However, you may consider
combining them into one entity.

For example, an employee is allocated a company car, which can only be driven by that
employee.

Therefore, there is a one-to-one relationship between employee and company car.

One-to-Many (1:M)

Is where one occurrence in an entity relates to many occurrences in another entity.

For example, taking the employee and department entities shown on the previous page, an
employee works in one department but a department has many employees.

Therefore, there is a one-to-many relationship between department and employee.

14
Many-to-Many (M:N)

This is where many occurrences in an entity relate to many occurrences in another entity.

The normalisation process discussed earlier would prevent any such relationships but the
definition is included here for completeness.

As with one-to-one relationships, many-to-many relationships rarely exist. Normally they occur
because an entity has been missed.

For example, an employee may work on several projects at the same time and a project has a
team of many employees.

Therefore, there is a many-to-many relationship between employee and project.

However, in the normalisation process this many-to-many is resolved by the entity Project Team.

Optional Relationships

A relationship may also be optional. Either end of the relationship can include zero occurrences
as an option. This is defined by the business rules of the system being implemented.

Taking the three examples above, the business rules may allow for the following.

 Not all employees are allocated a company car.


 A car is defined as a pool car and not allocated to a specific employee.
 A new department is created but, as yet, there are no employees working within it.
 A new project is defined but as yet the team has not been established.
 A new employee starts within the company but, as yet, is not assigned to a project.

Taking the first business rule, graphically this can be shown as:

15
The circle (O) represents optionality. This will be discussed further later in this section.

Recursive Relationships

In the examples above, a relationship has always been between occurrences in two different
entities. However, it is possible for the same entity to participate in the relationship. This is
termed a recursive relationship.

Let us take the example of an employee who is also a manager. But a manager is also an
employee, whose details will be held in the employee entity. To implement this a foreign key of
the employee's manager number would be held in each employee record.

Employee entity

Employee no

Employee surname

Employee forename

Employee DOB

Employee NI number

Manager no * (this is the employee no of the employee's manager)

Graphically, this can be shown as:

This is commonly know as a 'pig's ear'.

Another, more complicated example (don't worry, you will not be assessed on this) is showing
the relationship between a person and their parents. This can be represented graphically by:

16
Constructing the Data Model

The following describes an overview of the method to construct a data model. It is not
exhaustive but it includes sufficient detail to create a basic data model.

You should be aware that the following symbols are not the only ones that may be used when
creating a data model. Different methodologies have adopted different symbols. Therefore,
should you be involved in data analysis in the future, you need to use the methodology (and
symbols) assumed by the company for whom you are employed.

First, let us look at the symbols used within a data model. You have come across them already in
previous examples.

An entity is represented by a rectangle with the name of the entity inside. The convention is to
use a singular name.

Relationships are represented by a line joining the two entities. A crow's foot (see below)
represents the many sides of a relationship.

17
The Basic Data Model

On a piece of paper, draw and name each entity.

Taking our Project Management system, normalised earlier, as an example, we have the
following entities:

It does not matter for now in which order they are placed on the paper. However, later you may
wish to rearrange them if you have crossing relationship lines.

18
Examine each pair of entities in turn and determine whether there is a relationship between them.
This is easily seen by looking for foreign keys.

Remember, a foreign key in one entity enables a link (relationship) with the primary key in
another entity. Further, primary and foreign keys may consist of more than one attribute. Also,
some of the foreign keys may be part of a primary key.

The entities of the Project Management system are:

Employee (Employee No, Employee Name, Department No *)

Department (Department No, Department Name)

Project (Project Code, Project Title, Project Manager, Project Budget)

Project Team (Project Code, Employee No, Hourly Rate)

The foreign keys are:

Foreign Key Found In Links To


Department No Employee Department
Project Code Project Team Project
Employee No Project Team Employee

Degrees of Relationship

The degrees of the relationships (cardinality) are then added to the data model. Remember to
show any optional relationships.

19
Finally, you need to name the relationships and, optionally show the primary and foreign keys.

In relational database design, we not only want to create a structure that stores all of the data, but
we also want to do it in a way that minimize potential errors when we work with the data. The
default language for accessing data from a relational database is SQL. In particular, SQL can be
used to manipulate data in the following ways: insert new data, delete unwanted data,
and update existing data. Similarly, in an un-normalized design, there are 3 problems that can
occur when we work with the data:

20
INSERT ANOMALY: This refers to the situation when it is impossible to insert certain types of
data into the database.

DELETE ANOMALY: The deletion of data leads to unintended loss of additional data, data
that we had wished to preserve.

UPDATE ANOMALY: This refers to the situation where updating the value of a column leads
to database inconsistencies (i.e., different rows on the table have different values).

To address the 3 problems above, we go through the process of normalization. When we go


through the normalization process, we increase the number of tables in the database, while
decreasing the amount of data stored in each table. There are several different levels of database
normalization:

 1st Normal Form (1NF)


 2nd Normal Form (2NF)
 3rd Normal Form (3NF)
 Bryce-Codd Normal Form (BCNF)
 4th Normal Form (4NF)
 5th Normal Form (5NF)

1st Normal Form Definition

A database is in first normal form if it satisfies the following conditions:

 Contains only atomic values


 There are no repeating groups

An atomic value is a value that cannot be divided. For example, in the table shown below, the
values in the [Color] column in the first row can be divided into "red" and "green", hence
[TABLE_PRODUCT] is not in 1NF.

A repeating group means that a table contains two or more columns that are closely related. For
example, a table that records data on a book and its author(s) with the following columns: [Book
ID], [Author 1], [Author 2], [Author 3] is not in 1NF because [Author 1], [Author 2], and
[Author 3] are all repeating the same attribute.

1st Normal Form Example

How do we bring an unnormalized table into first normal form? Consider the following example:

21
This table is not in first normal form because the [Color] column can contain multiple values.
For example, the first row includes values "red" and "green."

To bring this table to first normal form, we split the table into two tables and now we have the
resulting tables:

Now first normal form is satisfied, as the columns on each table all hold just one value.

2nd Normal Form Definition

A database is in second normal form if it satisfies the following conditions:

 It is in first normal form


 All non-key attributes are fully functional dependent on the primary key

In a table, if attribute B is functionally dependent on A, but is not functionally dependent on a


proper subset of A, then B is considered fully functional dependent on A. Hence, in a 2NF table,
all non-key attributes cannot be dependent on a subset of the primary key. Note that if the
primary key is not a composite key, all non-key attributes are always fully functional dependent
on the primary key. A table that is in 1st normal form and contains only a single key as the
primary key is automatically in 2nd normal form.

22
2nd Normal Form Example

Consider the following example:

This table has a composite primary key [Customer ID, Store ID]. The non-key attribute is
[Purchase Location]. In this case, [Purchase Location] only depends on [Store ID], which is only
part of the primary key. Therefore, this table does not satisfy second normal form.

To bring this table to second normal form, we break the table into two tables, and now we have
the following:

What we have done is to remove the partial functional dependency that we initially had. Now, in
the table [TABLE_STORE], the column [Purchase Location] is fully dependent on the primary
key of that table, which is [Store ID].

3rd Normal Form Definition

A database is in third normal form if it satisfies the following conditions:

 It is in second normal form


 There is no transitive functional dependency

23
By transitive functional dependency, we mean we have the following relationships in the table:
A is functionally dependent on B, and B is functionally dependent on C. In this case, C is
transitively dependent on A via B.

3rd Normal Form Example

Consider the following example:

In the table able, [Book ID] determines [Genre ID], and [Genre ID] determines [Genre Type].
Therefore, [Book ID] determines [Genre Type] via [Genre ID] and we have transitive functional
dependency, and this structure does not satisfy third normal form.

To bring this table to third normal form, we split the table into two as follows:

Now all non-key attributes are fully functional dependent only on the primary key. In
[TABLE_BOOK], both [Genre ID] and [Price] are only dependent on [Book ID]. In
[TABLE_GENRE], [Genre Type] is only dependent on [Genre ID].

24

You might also like