Relational-Database Design
Anomalies in DBMS are caused by poor management of storing everything in
the flat database, lack of normalization, data redundancy, and improper use of
primary or foreign keys. These issues result in inconsistencies during insert,
update, or delete operations, leading to data integrity problems.
Anomalies in DBMS
1. Insertion Anomaly
An insertion anomaly occurs when it is not possible to insert data into a
database because some required information is missing or incomplete.
Example:
If a database requires every record to have a primary key, and no value is
provided for that key, the record cannot be inserted into the table.
2. Deletion Anomaly
A deletion anomaly occurs when deleting a record unintentionally results in the
loss of other important data.
Example:
If a database contains information about customers and their orders, deleting a
customer record may also delete all the orders associated with that customer,
leading to loss of valuable information.
3. Update Anomaly
An update anomaly occurs when modifying data in a database leads to
inconsistencies because the same data is stored in multiple places.
Example:
If a database contains employee salary information in multiple records, and the
salary is updated in one record but not in others, it may result in incorrect
calculations and reporting.
Note: These anomalies can be removed with the process of Normalization,
which generally splits the database which results in reducing the anomalies in
the database.
Example:
STUDENT Table:
STUD_N STUD_NA STUD_PHO STUD_STA STUD- STUD_AG
O ME NE TE COUNTRY E
Los
1 John 2139716271 California 20
Angeles
2 Robert 7139898291 Houston Texas 19
3 Jack 6027898291 Phoenix Arizona 18
Washingt
4 David 2069816543 Seattle 21
on
STUDENT_COURSE:
STUD_NO COURSE_NO COURSE_NAME
1 C1 DBMS
2 C2 Computer Networks
1 C2 Computer Networks
Note: Deleting the record would violate the foreign key constraint, which
ensures data consistency between the two tables.
Removal of Anomalies
Anomalies in DBMS are removed using normalization, which organizes data
into well-structured tables to reduce redundancy and maintain data integrity.
According to E. F. Codd, normalization:
Eliminates repeated data
Removes insertion, deletion, and update anomalies
Establishes proper relationships between tables
Note: After normalization, the database becomes more structured, reducing
the likelihood of insertion, update and deletion anomalies.
Normal Forms in DBMS
Normal forms are a set of progressive rules (or design checkpoints) for
relational schemas that reduce redundancy and prevent data anomalies. Each
normal form ( 1NF, 2NF, 3NF, BCNF, 4NF, 5NF ) is stricter than the previous one:
meeting a higher normal form implies the lower ones are satisfied. Think of
them as layers of cleanliness for your tables: the deeper you go, the fewer
redundancy and integrity problems you’ll have.
Benefits of using Normal Forms
Reduce duplicate data and wasted storage.
Prevent insert, update, and delete anomalies.
Improve data consistency and integrity.
Make the schema easier to maintain and evolve.
Hierarchy of Database Normal Forms
The above diagram shows the hierarchy of database normal forms. Each inner
circle represents a stricter level of normalization, starting from 1NF (basic
structure) to 5NF (most refined). As you move inward, data redundancy
reduces, and data integrity improves. Each level builds upon the previous one
to ensure a cleaner and more efficient database design.
First Normal Form (1NF)
First Normal Form (1NF) ensures that the structure of a database table is
organized in a way that makes it easier to manage and query.
A relation is in first normal form if every attribute in that relation is
single-valued attribute or it does not contain any composite or multi-
valued attribute.
It is the first and essential step in reduce redundancy, improve data
integrity and reducing anomalies in relational database design.
A relation (table) is said to be in First Normal Form (1NF) if:
All the attributes (columns) contain only atomic (indivisible) values.
Each column contains values of a single type.
Each record (row) is unique, meaning it can be identified by a primary
key.
There are no repeating groups or arrays in any row.
Rules for First Normal Form (1NF) in DBMS
To follow the First Normal Form (1NF) in a database, these simple rules must be
followed:
ADVERTISING
Every Column Should Have Single Values
Each column in a table must contain only one value in a cell. No cell should
hold multiple values. If a cell contains more than one value, the table does not
follow 1NF.
Example: A table with columns like [Writer 1], [Writer 2], and [Writer 3]
for the same book ID is not in 1NF because it repeats the same type of
information (writers). Instead, all writers should be listed in separate
rows.
All Values in a Column Should Be of the Same Type
Each column must store the same type of data. You cannot mix different types
of information in the same column.
Example: If a column is meant for dates of birth (DOB), you cannot use it
to store names. Each type of information should have its own column.
Every Column Must Have a Unique Name
Each column in the table must have a unique name. This avoids confusion
when retrieving, updating, or adding data.
Example: If two columns have the same name, the database system may
not know which one to use.
The Order of Data Doesn’t Matter
In 1NF, the order in which data is stored in a table doesn’t affect how the table
works. You can organize the rows in any way without breaking the rules.
Example:
Consider the below COURSES Relation :
In the above table, Courses has a multi-valued attribute, so it is not in 1NF. To
make the table in 1NF we have to remove the multivalued attributes from the
table as given below:
1NF
Now the table is in 1NF as there is no multi-valued attribute present in the
table.
Second Normal Form (2NF)
Second Normal Form (2NF) is based on the concept of fully functional
dependency. It is a way to organize a database table so that it
reduces redundancy and ensures data consistency. Fully Functional
Dependency means a non-key attribute depends on the entire primary key, not
just part of it.
For a table to be in 2NF, it must first meet the following requirements
1. Meet 1NF Requirements: The table must first satisfy First Normal Form
(1NF), meaning:
ADVERTISING
ADVERTISING
All columns contain single, indivisible values.
No repeating groups of columns.
2. Eliminate Partial Dependencies: A partial dependency occurs when a non-
prime attribute (not part of the candidate key) depends only on a part of a
composite primary key, rather than the entire key.
By ensuring these steps, a table in 2NF is more efficient and less prone to errors
during updates, inserts, and deletes.
What is Partial Dependency?
The FD (functional dependency) A->B happens to be a partial dependency if B
is functionally dependent on A, and also B can be determined by any other
proper subset of A.
In other words, if you have a composite key (a primary key made up of more
than one attribute), and an attribute depends on only a subset of that
composite key, rather than the entire key, that is considered a partial
dependency.
A partial dependency would occur whenever a non-prime attribute depends
functionally on a part of the given candidate key.
Example:
StaffBranch Relation
In the given relation StaffBranch, we have the functional dependency:
staffNo, sName → branchNo.
This means that the combination of staffNo and sName determines branchNo.
BranchNo is also functionally dependent on a subset of the composite key,
specifically staffNo. This means that branchNo can be determined by just
staffNo.
staffNo → branchNo
This is a partial dependency because branchNo depends on only a part of the
composite key (staffNo, sName), not the entire key.
Example of Second Normal Form (2NF)
Consider a table storing information about students, courses, and their fees:
There are many courses having the same course fee. Here, COURSE_FEE
cannot alone decide the value of COURSE_NO or STUD_NO.
COURSE_FEE together with STUD_NO cannot decide the value of
COURSE_NO.
COURSE_FEE together with COURSE_NO cannot decide the value of
STUD_NO.
The candidate key for this table is {STUD_NO, COURSE_NO} because the
combination of these two columns uniquely identifies each row in the
table.
COURSE_FEE is a non-prime attribute because it is not part of the
candidate key {STUD_NO, COURSE_NO}.
But, COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE is dependent on
COURSE_NO, which is a proper subset of the candidate key.
Therefore, Non-prime attribute COURSE_FEE is dependent on a proper
subset of the candidate key, which is a partial dependency and so this
relation is not in 2NF.
In 2NF, we eliminate such dependencies by breaking the table into two
separate tables:
1. A table that links students and courses.
2. A table that stores course fees.
Now, each table is in 2NF:
The Course Table ensures that COURSE_FEE depends only
on COURSE_NO.
The Student-Course Table ensures there are no partial dependencies
because it only relates students to courses.
Now, the COURSE_FEE is no longer repeated in every row, and each table is
free from partial dependencies. This makes the database more
efficient and easier to maintain.
Violation of 2nf
Limitations of Second Normal Form (2NF)
While Second Normal Form (2NF) addresses partial dependencies and helps
reduce redundancy, it has some limitations:
1. Doesn't Handle Transitive Dependencies: 2NF ensures that non-prime
attributes are fully dependent on the entire primary key, but it doesn't address
transitive dependencies. In a transitive dependency, an attribute depends on
another non-key attribute.
For example, if A → B and B → C, then A indirectly determines C. This can lead
to further redundancy and anomalies.
2. Doesn't Ensure Optimization: Although 2NF eliminates partial
dependencies, it may still leave some redundancy in the data, particularly
when dealing with larger and more complex datasets. It doesn’t guarantee the
most efficient or optimized structure for a database.
3. Complexity in Handling Multi-Attribute Keys: When dealing with composite
primary keys (keys made of multiple attributes), ensuring full dependency can
still lead to a complex design. A further step of normalization (Third Normal
Form or 3NF) is required to resolve transitive dependencies and achieve better
data organization.
4. Not Sufficient for Some Use Cases: While 2NF is useful for reducing
redundancy in some situations, in real-world applications where data integrity
and efficiency are crucial, additional normalization (like 3NF) might be needed
to address more complex dependencies and optimize data storage and
retrieval.
Third Normal Form (3NF)
The Third Normal Form (3NF) builds on the First (1NF) and Second (2NF)
Normal Forms. Achieving 3NF ensures that the database structure is free of
transitive dependencies, reducing the chances of data anomalies. Even though
tables in 2NF have reduced redundancy compared to 1NF, they may still
encounter issues like update anomalies.
A relation is in Third Normal Form (3NF) if it satisfies the following two
conditions:
1. It is in Second Normal Form (2NF): This means the table has no partial
dependencies (i.e., no non-prime attribute is dependent on a part of a
candidate key).
2. There is no transitive dependency for non-prime attributes: In simpler
terms, no non-key attribute should depend on another non-key
attribute. Instead, all non-key attributes should depend directly on the
primary key.
Understanding Transitive Dependency
To fully grasp 3NF, it’s essential to understand transitive dependency.
A transitive dependency occurs when one non-prime attribute depends on
another non-prime attribute rather than depending directly on the primary key.
This can create redundancy and inconsistencies in the database.
For example, if we have the following relationship between attributes:
A -> B (A determines B)
B -> C (B determines C)
This means that A indirectly determines C through B, creating a transitive
dependency. 3NF eliminates these transitive dependencies to ensure that non-
key attributes are directly dependent only on the primary key.
Conditions for a Table to be in 3NF
A table is in Third Normal Form (3NF) if, for every non-trivial functional
dependency X→Y, at least one of the following holds:
X is a superkey: This means that the attribute(s) on the left-hand side of
the functional dependency (X) must be a superkey (a key that uniquely
identifies a tuple in the table).
Y is a prime attribute: This means that every element of the attribute set
Y must be part of a candidate key (i.e., a prime attribute).
Example 1: Third Normal Form (3NF)
Consider the following relation for a Candidate table with the following
attributes and functional dependencies:
1. Functional dependency Set:
The set of functional dependencies is as follows:
CAND_NO → CAND_NAME
CAND_NO → CAND_STATE
CAND_STATE → CAND_COUNTRY
CAND_NO → CAND_AGE
2. Determining the Candidate Key:
The candidate key for this relation is {CAND_NO}, since CAND_NO uniquely
identifies all other attributes in the table.
3. Identifying Transitive Dependency:
The issue here arises from the transitive
dependency between CAND_NO and CAND_COUNTRY:
CAND_NO → CAND_STATE
CAND_STATE → CAND_COUNTRY
This means that CAND_COUNTRY is transitively dependent
on CAND_NO via CAND_STATE, which violates the Third Normal Form
(3NF) rule that states that no non-prime attribute (non-key attribute) should be
transitively dependent on the primary key.
Converting the Relation into 3NF
To remove the transitive dependency and ensure the relation is in 3NF, we
decompose the original CANDIDATE relation into two separate relations:
1. CANDIDATE: This will store information about the candidates, including
their CAND_NO, CAND_NAME, CAND_STATE, and CAND_AGE:\
text{CANDIDATE (CAND_NO, CAND_NAME, CAND_STATE, CAND_AGE)}
2. STATE_COUNTRY: This relation will store information about the states
and their respective countries:\text{STATE_COUNTRY (CAND_STATE,
CAND_COUNTRY)}
Final Decomposed Relations:
1. CANDIDATE (CAND_NO, CAND_NAME, CAND_STATE, CAND_AGE)
2. STATE_COUNTRY (CAND_STATE, CAND_COUNTRY)
Why This Decomposition Works:
The CANDIDATE relation now no longer has a transitive
dependency. CAND_STATE no longer
determines CAND_COUNTRY within this relation.
The STATE_COUNTRY relation handles the CAND_STATE →
CAND_COUNTRY dependency separately, ensuring that all data is now
organized in a way that satisfies 3NF.
Why is 3NF Important?
1. Eliminates Redundancy: 3NF helps to remove unnecessary duplication of
data by ensuring that non-prime attributes (attributes not part of any
candidate key) depend directly on the primary key, not on other non-prime
attributes.
2. Prevents Anomalies: A table in 3NF is free from common anomalies such as:
Insertion Anomaly: The inability to insert data without having to insert
unwanted or redundant data.
Update Anomaly: The need to update multiple rows of data when a
change occurs in one place.
Deletion Anomaly: The unintended loss of data when a record is
deleted.
3. Preserves Functional Dependencies: 3NF ensures that all functional
dependencies are preserved, meaning that the relationships between
attributes are maintained.
4. Lossless Decomposition: When decomposing a relation to achieve 3NF, the
decomposition should be lossless, meaning no information is lost in the
process of normalization.
Boyce-Codd Normal Form (BCNF)
Boyce-Codd Normal Form (BCNF) is an advanced version of 3NF used to reduce
redundancy in databases. It ensures that for every functional dependency, the
left side must be a superkey. This helps create cleaner and more consistent
database designs, especially when there are multiple candidate keys.
Rules for BCNF
Rule 1: The table should be in the 3rd Normal Form.
Rule 2: X should be a super-key for every functional dependency (FD)
X−>Y in a given relation.
Note: To test whether a relation is in BCNF, we identify all the determinants and
make sure that they are candidate keys.
Key Notes:
1. To verify BCNF, identify all determinants (left side of FDs) and check whether
each is a candidate key.
2. If a relation is in BCNF, it is automatically in 3NF, 2NF, and 1NF as well.
The normal forms become stricter as we move from 1NF → 2NF → 3NF →
BCNF:
1NF: Each field must hold atomic (indivisible) values.
2NF: No partial dependency on a primary key.
3NF: No transitive dependency on a primary key.
BCNF: Every determinant must be a candidate key.
This progression ensures better structure and removes redundancy at each
level.
Why Do We Need BCNF?
2NF and 3NF may allow anomalies if a functional dependency exists
where the determinant is not a superkey.
BCNF handles edge cases where 3NF fails to remove all redundancy,
especially in tables with multiple candidate keys.
Prevents update, insert, and delete anomalies by ensuring every
determinant is a superkey.
Makes database design more robust and easier to maintain over time.
Improves data consistency and clarity by removing hidden or indirect
dependencies.
We are going to discuss some basic examples which let you understand the
properties of BCNF. We will discuss multiple examples here.
Example 1
Consider a relation R with attributes (student, teacher, subject).
FD: { (student, Teacher) -> subject, (student, subject) -> Teacher, (Teacher) ->
subject}
Candidate keys are (student, teacher) and (student, subject).
The above relation is in 3NF (since there is no transitive dependency). A
relation R is in BCNF if for every non-trivial FD X->Y, X must be a key.
The above relation is not in BCNF, because in the FD (teacher->subject),
teacher is not a key. This relation suffers with anomalies −
For example, if we delete the student Tahira , we will also lose the
information that [Link] teaches C. This issue occurs because the
teacher is a determinant but not a candidate key.
R is divided into two relations R1(Teacher, Subject) and R2(Student, Teacher).