0% found this document useful (0 votes)
125 views39 pages

DBMS Schema Refinement & Normalization Guide

Uploaded by

sksasireddy
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
125 views39 pages

DBMS Schema Refinement & Normalization Guide

Uploaded by

sksasireddy
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd

DBMS (R23) UNIT-4

UNIT IV: Schema Refinement (Normalization)

1. Purpose of Normalization or schema refinement


2. concept offunctionaldependency
3. Lossless join and dependency preserving decomposition
4. normal forms based on functionaldependency - (1NF, 2NF and 3 NF),
5. concept of surrogate key
6. Boyce-Codd normal form(BCNF)
7. MVD
8. Fourth normal form(4NF)
9. Fifth Normal Form (5NF).

1
DBMS (R23) UNIT-4

CONCEPT-1:

WHAT IS SCHEMA REFINEMENT IN DBMS:

Schema refinement in a Database Management System (DBMS) refers to the process of


improving or optimizing the design of a database schema. The goal is to make the schema more
efficient, consistent, and easy to maintain while avoiding redundancy, ensuring data integrity, and
reducing the complexity of operations.

In DBMS, a schema defines the structure of the database, including tables, attributes, relationships,
and constraints. Schema refinement typically involves activities such as:

1. Normalization: The process of organizing data to reduce redundancy and improve data
integrity. This involves dividing large tables into smaller, more manageable tables and
defining relationships between them using keys (like primary and foreign keys).
Normalization helps eliminate anomalies like insertion, update, and deletion anomalies.
2. Eliminating Redundancy: Redundant data, which can lead to unnecessary storage use and
potential inconsistencies, is removed through normalization or restructuring of the schema.
3. Optimizing Queries: Refining the schema may involve reorganizing data or creating
indexes and views to make query execution more efficient.
4. Ensuring Data Integrity: Constraints such as primary keys, foreign keys, unique keys, and
check constraints are applied to ensure that data entered into the database is accurate,
consistent, and follows the business rules.
5. Improving Maintainability: Simplifying the schema, making it easier to add new features
or modify existing ones, helps ensure long-term maintainability. This can involve making
the schema modular and easy to understand.

Normalization in DBMS:

Normalization is the process of organizing data in a database to minimize redundancy and


dependency. The main goal of normalization is to ensure that the data is stored efficiently and to
reduce the chances of data anomalies like update, insertion, and deletion anomalies.

Purpose of Normalization:

1. Eliminate Data Redundancy: By organizing data into smaller tables, normalization


removes repeated data. This reduces storage space and avoids the unnecessary duplication of
information.
2. Ensure Data Integrity: Normalization helps maintain data consistency and ensures that
updates, deletions, and insertions are done correctly without causing inconsistencies in the
database. For example, if you store an address in multiple places, it can lead to
inconsistencies if one instance is updated and another isn't.
3. Simplify Data Maintenance: A normalized database is easier to maintain. For example, if
data needs to be changed or deleted, there’s only one place where it needs to be updated,
reducing the chances of errors.
4. Reduce Anomalies: Normalization reduces the chances of several types of anomalies, such
as:
o Update Anomaly: If the same data is repeated in multiple places, updating one
instance may not update the others.

2
DBMS (R23) UNIT-4

o Insertion Anomaly: Difficulties arise when adding new data due to the structure of
the database.
o Deletion Anomaly: Deleting one piece of data may accidentally remove other
important data.

5. Improve Query Performance: In some cases, normalized data can be more efficient for
certain types of queries, especially if it results in smaller and more focused tables.

Normal Forms in Normalization:

There are several levels or normal forms (NF) used in normalization, each one addressing a
specific kind of redundancy or anomaly. The most common normal forms are:

1. First Normal Form (1NF):


o Ensures that the table has no repeating groups and that each field contains only
atomic (indivisible) values.
o No column should have multiple values or sets of values (like a list or array in one
field).

2. Second Normal Form (2NF):


o Achieved when a table is in 1NF and all non-key attributes are fully dependent on
the primary key (i.e., no partial dependency).
o Partial dependency occurs when a non-key column depends only on a part of the
primary key (in the case of composite primary keys).

3. Third Normal Form (3NF):


o Achieved when a table is in 2NF and there is no transitive dependency (i.e., non-key
attributes should not depend on other non-key attributes).
o Ensures that every non-key attribute is directly dependent on the primary key.

4. Boyce-Codd Normal Form (BCNF):


o A stricter version of 3NF. It ensures that every determinant (attribute that determines
other attributes) is a candidate key.
o BCNF is applied when there are exceptions in 3NF where a non-candidate key
determines another attribute.

5. Fourth Normal Form (4NF):


o Achieved when a table is in BCNF and has no multi-valued dependencies. This
eliminates situations where a single column has multiple independent values.

6. Fifth Normal Form (5NF):


o Ensures that a table is in 4NF and does not have any join dependency or redundancy.

CONCEPT-2:

3
DBMS (R23) UNIT-4

Concept of Functional Dependency in DBMS

Functional Dependency (FD) is a relationship between two sets of attributes in a database. It


describes how the value of one attribute (or a set of attributes) determines the value of another
attribute (or set of attributes). In simpler terms, if you know the value of one attribute (or a
combination of attributes), you can determine the value of another attribute.

Formal Definition:

For two sets of attributes X and Y in a relation, X → Y (read as "X determines Y") is a functional
dependency if, for every pair of tuples (rows) in the relation, whenever two tuples have the same
value for attributes in X, they must have the same value for attributes in Y.

 X is called the determinant.


 Y is called the dependent.

Key Points:

 X → Y means that if two rows have the same value for attributes in X, they must also have
the same value for attributes in Y.
 A functional dependency does not necessarily imply uniqueness, but rather a relationship
between attributes.
 FDs are essential in normalization as they help identify how to break down relations to
eliminate redundancy and dependency.

Benefits of Functional Dependency in DBMS

4
DBMS (R23) UNIT-4

Functional dependency in a database management system offers several advantages for businesses
and organizations:
 Prevents Duplicate Data:
Functional dependency helps avoid storing the same data repeatedly in the database, reducing
redundancy and saving storage space.
 Improves Data Quality and Accuracy:
By organizing data efficiently and minimizing duplication, functional dependency ensures the
data is reliable, consistent, and of high quality.
 Reduces Errors:
Keeping data organized and concise lowers the chances of errors in records or datasets,
making it easier to manage and update information.
 Saves Time and Costs:
Properly organized data allows for quicker and easier access, improving productivity and
reducing the time and cost of managing information.
 Defines Rules and Behaviors:
Functional dependency allows setting rules and constraints that control how data is stored,
accessed, and maintained, ensuring better data management.
 Helps Identify Poor Database Design:
It highlights issues like scattered or missing data across tables, helping identify and fix design
flaws to maintain consistency and integrity.

Types of Functional Dependencies:

1. Trivial Functional Dependency: A functional dependency X → Y is trivial if Y is a subset


of X. For example, {StudentID, StudentName} → {StudentID} is trivial because
StudentID is part of the determinant.
2. Non-Trivial Functional Dependency: A functional dependency X → Y is non-trivial if Y
is not a subset of X. For example, StudentID → StudentName is non-trivial because
StudentName is not part of the determinant StudentID.
3. Transitive Functional Dependency: A functional dependency X → Z is called transitive if
there are intermediate dependencies like X → Y and Y → Z. For example, if StudentID →
StudentName and StudentName → StudentAddress, then StudentID →
StudentAddress is a transitive dependency.

Armstrong's Axioms of Functional Dependency Property


In order to think about FDs, William Armstrong created the Armstrong's Axioms property in 1974.
This characteristic provides the rules that apply only in the event that the following conditions are
met:

1. Augmentation:

This rule of Armstrong's axioms state that, If S -> T, then we can say SR -> TR.

2. Reflexivity:

This rule of Armstrong's axioms state that, If S -> T, then T is the subset of S.

5
DBMS (R23) UNIT-4

3. Transitivity:

This rule of Armstrong's axioms state that, If S -> T and T -> U, then we can say S -> U.

1. Trivial Functional Dependency


In Trivial Functional Dependency, a dependent is always a subset of the determinant. i.e. If X →
Y and Y is the subset of X, then it is called trivial functional dependency.
Symbolically: A→B is trivial functional dependency if B is a subset of A.
The following dependencies are also trivial: A→A & B→B
Example 1 :
 ABC -> AB
 ABC -> A
 ABC -> ABC
Example 2:
roll_no name age

42 abc 17

43 pqr 18

44 xyz 18

Here, {roll_no, name} → name is a trivial functional dependency, since the dependent name is a
subset of determinant set {roll_no, name}. Similarly, roll_no → roll_no is also an example of
trivial functional dependency.
2. Non-trivial Functional Dependency
In Non-trivial functional dependency, the dependent is strictly not a subset of the determinant.
i.e. If X → Y and Y is not a subset of X, then it is called Non-trivial functional dependency.
Example 1 :
 Id -> Name
 Name -> DOB
Example 2:
roll_no name age

42 abc 17

43 pqr 18

44 xyz 18

6
DBMS (R23) UNIT-4

Here, roll_no → name is a non-trivial functional dependency, since the dependent name is not a
subset of determinant roll_no. Similarly, {roll_no, name} → age is also a non-trivial functional
dependency, since age is not a subset of {roll_no, name}

3. Semi Non Trivial Functional Dependencies


A semi non-trivial functional dependency occurs when part of the dependent attribute (right-hand
side) is included in the determinant (left-hand side), but not all of it. This is a middle ground
between trivial and non-trivial functional dependencies. X -> Y is called semi non-trivial when X
intersect Y is not NULL.
Example:
Consider the following table:
Student_ID Course_ID Course_Name

101 CSE101 Computer Science

102 CSE102 Data Structures

103 CSE101 Computer Science

Functional Dependency:
{StudentID,CourseID}→CourseID
This is semi non-trivial because:
 Part of the dependent attribute (Course_ID) is already included in the determinant
({Student_ID, Course_ID}).
 However, the dependency is not completely trivial because {StudentID}→CourseID is not
implied directly.
4. Multivalued Functional Dependency
In Multivalued functional dependency, entities of the dependent set are not dependent on each
other. i.e. If a → {b, c} and there exists no functional dependency between b and c, then it is
called a multivalued functional dependency.
Example:
bike_model manuf_year color

tu1001 2007 Black

tu1001 2007 Red

tu2012 2008 Black

tu2012 2008 Red

7
DBMS (R23) UNIT-4

bike_model manuf_year color

tu2222 2009 Black

tu2222 2009 Red

In this table:
 X: bike_model
 Y: color
 Z: manuf_year
For each bike model (bike_model):
1. There is a group of colors (color) and a group of manufacturing years (manuf_year).
2. The colors do not depend on the manufacturing year, and the manufacturing year does not
depend on the colors. They are independent.
3. The sets of color and manuf_year are linked only to bike_model.
That’s what makes it a multivalued dependency.
In this case these two columns are said to be multivalued dependent on bike_model. These
dependencies can be represented like this:
Read more about Multivalued Dependency in DBMS .
5. Transitive Functional Dependency
In transitive functional dependency, dependent is indirectly dependent on determinant. i.e. If a →
b & b → c, then according to axiom of transitivity, a → c. This is a transitive functional
dependency.
Example:
enrol_no name dept building_no

42 abc CO 4

43 pqr EC 2

44 xyz IT 1

45 abc EC 2

Here, enrol_no → dept and dept → building_no. Hence, according to the axiom of transitivity,
enrol_no → building_no is a valid functional dependency. This is an indirect functional
dependency, hence called Transitive functional dependency.
6. Fully Functional Dependency
In full functional dependency an attribute or a set of attributes uniquely determines another
attribute or set of attributes. If a relation R has attributes X, Y, Z with the dependencies X->Y and
X->Z which states that those dependencies are fully functional.
Read more about Fully Functional Dependency.

8
DBMS (R23) UNIT-4

7. Partial Functional Dependency


In partial functional dependency a non key attribute depends on a part of the composite key,
rather than the whole key. If a relation R has attributes X, Y, Z where X and Y are the composite
key and Z is non key attribute. Then X->Z is a partial functional dependency in RBDMS.

CONCEPT-3:

Lossless Join and Dependency-Preserving Decomposition in DBMS

In relational database design, decomposition refers to breaking down a large relation (table) into
smaller relations (tables) to achieve better data organization, minimize redundancy, and improve
normalization. However, decomposing a relation needs to be done carefully to ensure that:

1. Lossless Join: No information is lost during the decomposition, and the original relation can
be perfectly reconstructed by joining the decomposed relations.
2. Dependency-Preserving: All functional dependencies (FDs) are preserved, meaning no
dependency is lost when relations are decomposed.

1. Lossless Join Decomposition

A decomposition of a relation into two or more sub-relations is lossless join if it is possible to


recombine (join) these sub-relations to get back the original relation without any loss of
information.

Lossless Join Condition:

A decomposition of a relation R into two sub-relations R1 and R2 is lossless if and only if:

(R1 ∩ R2) → R1 or (R1 ∩ R2) → R2

In simple terms:

 The common attributes between R1 and R2 (i.e., R1 ∩ R2) should be a superkey in at least
one of the sub-relations. This ensures that when you join R1 and R2, you can uniquely
identify tuples without ambiguity, preserving the integrity of the data.

Example of Lossless Join Decomposition:

Consider a relation R with the attributes StudentID, Course, and Instructor.

StudentID Course Instructor

101 DBMS Dr. Smith

101 OS Dr. Lee

102 DBMS Dr. Smith

9
DBMS (R23) UNIT-4

Let's decompose this relation into two sub-relations:

1. R1 (StudentID, Course)
2. R2 (Course, Instructor)

Now, the common attribute between R1 and R2 is Course.

 R1 ∩ R2 = {Course}, which is a superkey in R2 (because Course uniquely identifies


Instructor in R2).

Since the intersection attribute is a superkey in one of the relations, this decomposition is lossless.
By joining R1 and R2 on the Course attribute, we can perfectly reconstruct the original relation R
without any loss of data.

2. Dependency-Preserving Decomposition

A decomposition is dependency-preserving if the set of functional dependencies (FDs) that hold in


the original relation are also enforceable in the decomposed relations, without needing to join the
relations.

 If the decomposition is dependency-preserving, then you do not need to recombine the


sub-relations to enforce the original functional dependencies.

Dependency-Preserving Condition:

A decomposition is dependency-preserving if the functional dependencies that hold on the original


relation can be represented as functional dependencies on the decomposed relations without losing
any of them.

Example of Dependency-Preserving Decomposition:

Consider the relation R with attributes StudentID, Course, and Instructor. Suppose the functional
dependencies are:

1. StudentID → Course
2. Course → Instructor

Now, let's decompose R into two sub-relations:

1. R1 (StudentID, Course)
2. R2 (Course, Instructor)

For R1, the FD StudentID → Course holds, and for R2, the FD Course → Instructor holds.

Thus, all functional dependencies in the original relation R are preserved in the decomposed
relations R1 and R2.

10
DBMS (R23) UNIT-4

This decomposition is dependency-preserving because all the functional dependencies from the
original relation are satisfied by the decomposed relations.

Lossless Join and Dependency-Preserving Decomposition:

In an ideal scenario, when performing a decomposition, you want it to be both lossless join and
dependency-preserving. However, it's not always possible to achieve both at the same time.

Example Where Lossless Join and Dependency-Preserving Are Not Both Achieved:

Consider the relation R with the attributes StudentID, Course, Instructor, and InstructorPhone:

StudentID Course Instructor InstructorPhone

101 DBMS Dr. Smith 123-456-7890

102 DBMS Dr. Lee 234-567-8901

Functional dependencies:

1. StudentID → Course
2. Course → Instructor
3. Instructor → InstructorPhone

We decompose this relation into two sub-relations:

1. R1 (StudentID, Course)
2. R2 (Course, Instructor, InstructorPhone)

 This decomposition is lossless join, as Course is a superkey in R2.


 However, this decomposition is not dependency-preserving, because the dependency
Instructor → InstructorPhone cannot be directly enforced in R1, and we would need to
join R1 and R2 to enforce this dependency.

Thus, achieving both lossless join and dependency-preserving is not always possible, especially in
certain complex cases. In those cases, you may prioritize one property (e.g., lossless join) over the
other (e.g., dependency-preserving).

Key Differences Between Lossless Join and Dependency-Preserving Decomposition:


Property Lossless Join Dependency-Preserving

Ensures that all original functional


Ensures no loss of information during
Objective dependencies can be enforced without needing
decomposition.
to join decomposed relations.

11
DBMS (R23) UNIT-4

Property Lossless Join Dependency-Preserving

At least one of the common attributes


All functional dependencies should be
Condition in the decomposed relations must be a
enforceable in the decomposed relations.
superkey.

Important to ensure that data integrity (through


When Required to avoid losing information
functional dependencies) is maintained without
Important after decomposing a relation.
needing to join relations.

NOTE:

 Lossless Join Decomposition ensures that no data is lost during the decomposition, and you
can always recombine the decomposed relations to recover the original relation.
 Dependency-Preserving Decomposition ensures that the functional dependencies in the
original relation are still enforceable in the decomposed relations, so you don't have to join
them to enforce those dependencies.

CONCEPT-4:

Normal Forms Based on Functional Dependency in DBMS

In database management, normal forms are used to organize a database schema in such a way that it
minimizes redundancy, avoids undesirable characteristics like anomalies, and ensures data integrity.
Functional dependencies (FDs) play a central role in defining these normal forms. Let's go
through the most commonly used normal forms based on functional dependency: 1NF, 2NF,
3NF, and BCNF.

1. First Normal Form (1NF)

A relation is in 1NF if:

1. The domain of each attribute contains only atomic (indivisible) values.


2. Each column contains a unique value for each row (no repeating groups or arrays).

Example:

Consider a table that stores information about students and their courses:

StudentID StudentName Courses

101 John Doe DBMS, Networking

102 Alice Johnson DBMS, OS

12
DBMS (R23) UNIT-4

This table violates 1NF because the Courses column contains multiple values (i.e., multiple courses
for a student) in a single attribute. To bring this table to 1NF, we must split the values in the
Courses column into separate rows:

StudentID StudentName Course

101 John Doe DBMS

101 John Doe Networking

102 Alice Johnson DBMS

102 Alice Johnson OS

Now, the table is in 1NF, with atomic values in every column.

2. Second Normal Form (2NF)

A relation is in 2NF if:

1. It is in 1NF.
2. It does not have any partial dependency. That is, all non-prime attributes (attributes that
are not part of a candidate key) are fully functionally dependent on the entire primary key.

Partial dependency occurs when a non-prime attribute depends only on a part of a composite
primary key.

Example:

Consider the following table with a composite primary key consisting of StudentID and CourseID:

StudentID CourseID Instructor InstructorPhone

101 CS101 Dr. Smith 123-456-7890

101 CS102 Dr. Lee 123-456-7890

102 CS101 Dr. Smith 234-567-8901

In this case, the primary key is the combination of StudentID and CourseID. But the Instructor and
InstructorPhone depend only on CourseID, not on the entire primary key. This is a partial
dependency.

To convert this to 2NF, we split the relation into two tables:

Student-Course Table:

13
DBMS (R23) UNIT-4

StudentID CourseID

101 CS101

101 CS102

102 CS101

Course-Details Table:

CourseID Instructor InstructorPhone

CS101 Dr. Smith 123-456-7890

CS102 Dr. Lee 123-456-7890

Now, the table is in 2NF, as there are no partial dependencies.

3. Third Normal Form (3NF)

A relation is in 3NF if:

1. It is in 2NF.
2. It has no transitive dependency. That is, non-prime attributes should not depend on other
non-prime attributes.

Transitive dependency occurs when an attribute depends on another attribute, which in turn
depends on the primary key.

Example:

Consider the following table:

StudentID StudentName Instructor InstructorPhone

101 John Doe Dr. Smith 123-456-7890

102 Alice Johnson Dr. Lee 234-567-8901

In this case, the InstructorPhone depends on the Instructor, which depends on the StudentID. This is
a transitive dependency, because InstructorPhone is indirectly dependent on StudentID through
Instructor.

To eliminate the transitive dependency, we split the relation into two tables:

Student Table:
14
DBMS (R23) UNIT-4

StudentID StudentName Instructor

101 John Doe Dr. Smith

102 Alice Johnson Dr. Lee

Instructor Table:

Instructor InstructorPhone

Dr. Smith 123-456-7890

Dr. Lee 234-567-8901

Now, the table is in 3NF because there are no transitive dependencies.

4. Boyce-Codd Normal Form (BCNF)

A relation is in BCNF if:

1. It is in 3NF.
2. For every non-trivial functional dependency X→YX \to YX→Y, X must be a superkey
(i.e., X should uniquely identify a tuple in the relation).

In other words, a relation is in BCNF if every determinant is a candidate key.

Example:

Consider the following table:

StudentID CourseID Instructor InstructorPhone

101 CS101 Dr. Smith 123-456-7890

102 CS102 Dr. Lee 234-567-8901

Here, the Instructor is a determinant for InstructorPhone, but Instructor is not a superkey. This
violates BCNF, as the non-prime attribute InstructorPhone is dependent on Instructor, which is
not a candidate key.

To make the relation BCNF, we split it into two relations:

Student-Course Table:

15
DBMS (R23) UNIT-4

StudentID CourseID Instructor

101 CS101 Dr. Smith

102 CS102 Dr. Lee

Instructor Table:

Instructor InstructorPhone

Dr. Smith 123-456-7890

Dr. Lee 234-567-8901

Now, the relation is in BCNF, as all determinants (like Instructor) are candidate keys in their
respective tables.

Summary of Normal Forms Based on Functional Dependencies:

1. 1NF: Ensures that attributes have atomic values (no repeating groups or arrays).
o Example: Splitting columns with multiple values into separate rows.

2. 2NF: Eliminates partial dependencies (non-prime attributes fully depend on the primary
key).
o Example: Separating composite primary key-related data into different tables.

3. 3NF: Eliminates transitive dependencies (non-prime attributes do not depend on other non-
prime attributes).
o Example: Splitting tables where attributes indirectly depend on the primary key.

4. BCNF: Every determinant must be a superkey.


o Example: Ensuring that all functional dependencies involve superkeys.

Each step of normalization (1NF → 2NF → 3NF → BCNF) refines the database schema further by
eliminating redundancy and improving data integrity through functional dependencies.

CONCEPT-5:

Concept of surrogate key


A key is a column, or group of columns, in a database management system (DBMS) that uniquely
identifies every row in a table. Natural keys and surrogate keys are the two categories of keys.
 Natural Key: A column, or group of columns, that is generated from the table’s data is
known as a natural key. For instance, since it uniquely identifies every client in the table, the
customer ID column in a customer table serves as a natural key.
 Surrogate key: A column that is not generated from the data in the database is known as a
surrogate key. Rather, the DBMS generates a unique identifier for you. In database tables,
surrogate keys are frequently utilized as primary keys.
16
DBMS (R23) UNIT-4

Characteristics of a Surrogate Key:

1. System-generated: The key is generated by the database system or the application. It has no
inherent meaning in the context of the data, unlike natural keys that might reflect real-world
data.
2. Unique: It is always unique for each record in the table.
3. Non-nullable: Surrogate keys are usually defined as NOT NULL because they must
uniquely identify each row.
4. Immutable: Once assigned, a surrogate key does not change, even if the data in the record
changes.
5. Integer or UUID: Surrogate keys are commonly integers (auto-incremented) or UUIDs, but
can also be other types that guarantee uniqueness.

Advantages of Using Surrogate Keys:

1. Simplification: Surrogate keys simplify the relationships between tables, especially when
natural keys are complex or large.
2. No Business Meaning: Since surrogate keys have no business meaning, they help in
avoiding problems if business rules change. For example, a product code (natural key) might
change, but the surrogate key remains the same.
3. Improved Performance: Surrogate keys are often integers or short, fixed-length strings,
which can be more efficient for indexing and lookups compared to large, compound natural
keys.
4. Avoiding Redundancy: Surrogate keys help avoid redundancy when a composite natural
key is used (especially when one or more columns in a composite key can be null).

Disadvantages of Using Surrogate Keys:

1. No Meaning: Surrogate keys have no business meaning, so it can be harder to understand


relationships between tables at a glance.
2. Requires Mapping: To understand the data, one might need to perform a lookup to the
original source data to interpret the surrogate key.
3. Dependence on System: The application or system generating the surrogate key must
ensure its uniqueness and proper management.

When to Use a Surrogate Key:

1. When a Natural Key is Composite or Complex: When the natural key consists of multiple
attributes (composite key), it can become cumbersome to use in foreign key relationships. A
surrogate key simplifies the foreign key relationship.

Example: If the combination of FirstName, LastName, and DateOfBirth is used as a


natural key (for example, in a Person table), it can be simplified by using a single surrogate
key (e.g., PersonID).

2. When the Natural Key is Likely to Change: If there is a possibility that the natural key
might change (e.g., a ProductCode that may change in the future), a surrogate key can
provide a stable identifier.
17
DBMS (R23) UNIT-4

3. For Data Integrity and Performance: Surrogate keys, especially when they are integers or
UUIDs, help with indexing, improving query performance, and maintaining data integrity.

Consider an example: 1
Suppose we have two tables of two different schools having the same column registration_no,
name, and percentage, each table having its own natural primary key, that is registration_no.
Table of school A:
registration_no name percentage

210101 Harry 90

210102 Maxwell 65

210103 Lee 87

210104 Chris 76

Table of school B:

registration_no name percentage

CS107 Taylor 49

CS108 Simon 86

CS109 Sam 96

CS110 Andy 58

Now, suppose we want to merge the details of both the schools in a single table.
Resulting table will be:

surr_no registration_no name percentage

1 210101 Harry 90

2 210102 Maxwell 65

3 210103 Lee 87

4 210104 Chris 76

18
DBMS (R23) UNIT-4

surr_no registration_no name percentage

5 CS107 Taylor 49

6 CS108 Simon 86

7 CS109 Sam 96

8 CS110 Andy 58

As we can observe the above table and see that registration_no cannot be the primary key of the
table as it does not match with all the records of the table though it is holding all unique values of
the table . Now , in this case, we have to artificially create one primary key for this table. We can
do this by adding a column surr_no in the table that contains anonymous integers and has no
direct relation with other columns . This additional column of surr_no is the surrogate key of the
table.

EXAMPLE-2:

Consider a database of Employees in a company. The Employee table might include a natural key
such as an Employee Number (which is meaningful to the business), but for technical reasons, a
surrogate key (e.g., an auto-incrementing integer or a UUID) might be used instead.

Employee Table with Surrogate Key (EmployeeID):


EmployeeID EmployeeNumber Name Department

1 12345 John Doe HR

2 67890 Jane Smith IT

3 11223 Mark Johnson Finance

In this example:

 EmployeeID is the surrogate key, a unique system-generated identifier (an integer).


 EmployeeNumber is the natural key, which might be used in real-world operations (like
identifying an employee).

In this table:

 EmployeeID serves as the primary key.


 The EmployeeNumber could potentially be unique across the organization but is not
necessarily guaranteed to be used in the database schema as the primary key, especially
when there are concerns like changes in numbering or multiple records related to a single
natural key.

19
DBMS (R23) UNIT-4

CONCEPT-6:

Boyce-Codd Normal Form (BCNF)


Boyce-Codd Normal Form (BCNF) is a stricter version of Third Normal Form (3NF) that ensures
a more simplified and efficient database design. It enforces that every non-trivial functional
dependency must have a superkey on its left-hand side. This approach addresses potential issues
with candidate keys and ensures the database is free from redundancy.
BCNF eliminates redundancy more effectively than 3NF by strictly requiring that all functional
dependencies originate from super-keys.
BCNF is essential for good database schema design in higher-level systems where consistency
and efficiency are important, particularly when there are many candidate keys (as one often finds
with a delivery system).
Rules for BCNF
Rule 1: The table should be in the 3rd Normal Form.
Rule 2: X should be a super-key for every functional dependency (FD) X−>Y in a given relation.
Note: To test whether a relation is in BCNF, we identify all the determinants and make sure that
they are candidate keys.
To determine the highest normal form of a given relation R with functional dependencies, the first
step is to check whether the BCNF condition holds. If R is found to be in BCNF, it can be safely
deduced that the relation is also in 3NF, 2NF, and 1NF. The 1NF has the least restrictive
constraint – it only requires a relation R to have atomic values in each tuple. The 2NF has a
slightly more restrictive constraint.
The 3NF has a more restrictive constraint than the first two normal forms but is less restrictive
than the BCNF. In this manner, the restriction increases as we traverse down the hierarchy.
We are going to discuss some basic examples which let you understand the properties of BCNF.
We will discuss multiple examples here.
Example 1
Consider a relation R with attributes (student, teacher, subject).

FD: { (student, Teacher) -> subject, (student, subject) -> Teacher, (Teacher) -> subject}
 Candidate keys are (student, teacher) and (student, subject).
 The above relation is in 3NF (since there is no transitive dependency). A relation R is in
BCNF if for every non-trivial FD X->Y, X must be a key.
 The above relation is not in BCNF, because in the FD (teacher->subject), teacher is not a key.
This relation suffers with anomalies −
 For example, if we delete the student Tahira , we will also lose the information that [Link]
teaches C. This issue occurs because the teacher is a determinant but not a candidate key.

R is divided into two relations R1(Teacher, Subject) and R2(Student, Teacher).


For more, refer to BCNF in DBMS.
Example 2
Let us consider the student database, in which data of the student are mentioned.

20
DBMS (R23) UNIT-4

Stu_ID Stu_Branch Stu_Course Branch_NumberStu_Course_No

Computer Science &


101 DBMS B_001 201
Engineering

Computer Science & Computer


101 B_001 202
Engineering Networks

Electronics &
102 VLSI Technology B_003 401
Communication Engineering

Electronics & Mobile


102 B_003 402
Communication Engineering Communication

Functional Dependency of the above is as mentioned:


Stu_ID −> Stu_Branch
Stu_Course −> {Branch_Number, Stu_Course_No}
Candidate Keys of the above table are: {Stu_ID, Stu_Course}
Why this Table is Not in BCNF?
The table present above is not in BCNF, because as we can see that neither Stu_ID nor
Stu_Course is a Super Key. As the rules mentioned above clearly tell that for a table to be in
BCNF, it must follow the property that for functional dependency X−>Y, X must be in Super Key
and here this property fails, that’s why this table is not in BCNF.
How to Satisfy BCNF?
For satisfying this table in BCNF, we have to decompose it into further tables. Here is the full
procedure through which we transform this table into BCNF. Let us first divide this main table
into two tables Stu_Branch and Stu_Course Table.
Stu_Branch Table
Stu_ID Stu_Branch

101 Computer Science & Engineering

102 Electronics & Communication Engineering

Candidate Key for this table: Stu_ID.


Stu_Course Table

21
DBMS (R23) UNIT-4

Stu_Course Branch_Number Stu_Course_No

DBMS B_001 201

Computer Networks B_001 202

VLSI Technology B_003 401

Mobile Communication B_003 402

Candidate Key for this table: Stu_Course.


Stu_Enroll Table
Stu_ID Stu_Course_No

101 201

101 202

102 401

102 402

Candidate Key for this table: {Stu_ID, Stu_Course_No}.


After decomposing into further tables, now it is in BCNF, as it is passing the condition of Super
Key, that in functional dependency X−>Y, X is a Super Key.
Example 3
Find the highest normal form of a relation R(A, B, C, D, E) with FD set as:
{ BC->D, AC->BE, B->E }
Explanation:
 Step-1: As we can see, (AC)+ ={A, C, B, E, D} but none of its subsets can determine all
attributes of the relation, So AC will be the candidate key. A or C can’t be derived from any
other attribute of the relation, so there will be only 1 candidate key {AC}.
 Step-2: Prime attributes are those attributes that are part of candidate key {A, C} in this
example and others will be non-prime {B, D, E} in this example.
 Step-3: The relation R is in 1st normal form as a relational DBMS does not allow multi-
valued or composite attributes.

22
DBMS (R23) UNIT-4

The relation is in 2nd normal form because BC->D is in 2nd normal form (BC is not a proper
subset of candidate key AC) and AC->BE is in 2nd normal form (AC is candidate key) and B->E
is in 2nd normal form (B is not a proper subset of candidate key AC).

The relation is not in 3rd normal form because in BC->D (neither BC is a super key nor D is a
prime attribute) and in B->E (neither B is a super key nor E is a prime attribute) but to satisfy 3rd
normal for, either LHS of an FD should be super key or RHS should be a prime attribute. So the
highest normal form of relation will be the 2nd Normal form.
Note: A prime attribute cannot be transitively dependent on a key in BCNF relation.
Consider these functional dependencies of some relation R
AB ->C
C ->B
AB ->B
From the above functional dependency, we get that the candidate key of R is AB and AC. A
careful observation is required to conclude that the above dependency is a Transitive Dependency
as the prime attribute B transitively depends on the key AB through C. Now, the first and the third
FD are in BCNF as they both contain the candidate key (or simply KEY) on their left sides. The
second dependency, however, is not in BCNF but is definitely in 3NF due to the presence of the
prime attribute on the right side. So, the highest normal form of R is 3NF as all three FDs satisfy
the necessary conditions to be in 3NF.
Example 3
For example consider relation R(A, B, C)
A -> BC,
B -> A
A and B both are super keys so the above relation is in BCNF.
Note: BCNF decomposition may always not be possible with dependency preserving , however, it
always satisfies the lossless join condition. For example, relation R (V, W, X, Y, Z), with
functional dependencies:
V, W -> X
Y, Z -> X
W -> Y
It would not satisfy dependency preserving BCNF decomposition.
Note: Redundancies are sometimes still present in a BCNF relation as it is not always possible to
eliminate them completely.

CONCEPT-7:

Multivalued Functional Dependency (MVD) in DBMS

Multivalued Functional Dependency (MVD) is a type of dependency in a relational database


where one attribute (or a set of attributes) determines multiple independent values for another
attribute (or set of attributes). It is an extension of Functional Dependency (FD), and it plays an
important role in higher levels of normalization (specifically Fourth Normal Form (4NF)).

Formal Definition of Multivalued Functional Dependency (MVD):

For a relation R, a multivalued functional dependency is represented as:

X →→ Y

23
DBMS (R23) UNIT-4

Where:

 X is a set of attributes (the determinant).


 Y is a set of attributes (the dependent).
 X →→ Y means that for every value of X in the relation, there is a set of values for Y, and
these values are independent of the other attributes in the relation.

Key Points about MVD:

1. Independence: The values of Y do not depend on the other non-determined attributes in the
relation.
2. Symmetry: The relation holds independently for all combinations of tuples with the same
value of X.

When Does MVD Occur?

An MVD exists in a relation if, for any given value of X, the set of values for Y is independent of
the other attributes (attributes not in X). This implies that there can be multiple independent sets of
values for Y that correspond to a given value of X.

Example of Multivalued Functional Dependency:

Consider a relation R with the following attributes: StudentID, Course, and Instructor.

StudentID Course Instructor

101 Database Dr. Smith

101 Operating Sys Dr. Lee

101 Networking Dr. Johnson

102 Database Dr. Smith

102 Operating Sys Dr. Lee

In this case, StudentID →→ Course and StudentID →→ Instructor are multivalued functional
dependencies:

1. StudentID →→ Course means that for each StudentID, there are multiple independent
values for Course (e.g., Student 101 has courses "Database", "Operating Sys", and
"Networking"). The values for Course are independent of the Instructor.
2. StudentID →→ Instructor means that for each StudentID, there are multiple independent
values for Instructor (e.g., Student 101 has instructors "Dr. Smith", "Dr. Lee", and "Dr.
Johnson"). The values for Instructor are independent of the Course.

Thus, there are independent sets of values for both Course and Instructor for each StudentID, but
the two sets are independent of each other.

24
DBMS (R23) UNIT-4

Problem with Multivalued Dependencies:

If we do not handle MVDs properly, it can lead to redundant data and update anomalies. For
instance, if you have a student enrolled in multiple courses, you may end up repeating the student's
information in several rows, increasing the possibility of redundancy.

Normalization with Multivalued Functional Dependencies (4NF):

The presence of multivalued functional dependencies violates the Fourth Normal Form (4NF),
which requires that:

 A relation is in 4NF if it is in Boyce-Codd Normal Form (BCNF) and does not have any
multivalued dependencies unless the multivalued dependency is trivial (i.e., when the
determinant is a superset of the dependent attributes).

To convert a relation to 4NF, you must:

1. Identify and eliminate the MVDs: For every MVD X →→ Y, split the relation into
separate relations that eliminate the redundancy caused by the multivalued dependency.
2. Create separate tables: For each X →→ Y, create a new table with X and Y attributes,
while leaving other attributes unaffected.

Example of Decomposing a Relation to 4NF:

Let’s take the earlier relation as an example:

StudentID Course Instructor

101 Database Dr. Smith

101 Operating Sys Dr. Lee

101 Networking Dr. Johnson

102 Database Dr. Smith

102 Operating Sys Dr. Lee

This table has the MVD StudentID →→ Course and StudentID →→ Instructor, which violates
4NF.

Decompose the table into two relations to eliminate the MVDs:

1. Student-Course Relation (for StudentID →→ Course):

25
DBMS (R23) UNIT-4

StudentID Course

101 Database

101 Operating Sys

101 Networking

102 Database

102 Operating Sys

2. Student-Instructor Relation (for StudentID →→ Instructor):

StudentID Instructor

101 Dr. Smith

101 Dr. Lee

101 Dr. Johnson

102 Dr. Smith

102 Dr. Lee

After the decomposition, each relation is now in 4NF, and the redundancy caused by multivalued
dependencies is eliminated.

CONCEPT-8:

Fourth Normal Form (4NF) in DBMS

Fourth Normal Form (4NF) is a higher level of database normalization that deals specifically with
multi-valued dependencies (MVDs). It builds upon the concepts of Boyce-Codd Normal Form
(BCNF), addressing situations where BCNF doesn't fully eliminate redundancy caused by multi-
valued dependencies.

Definition of 4NF:

A relation is in Fourth Normal Form (4NF) if and only if it is in Boyce-Codd Normal Form
(BCNF) and has no multi-valued dependencies (MVDs).

 Multi-valued Dependency (MVD): A multi-valued dependency occurs when one attribute


determines a set of values for another attribute (or set of attributes), and this set of values is
independent of other attributes in the relation.

26
DBMS (R23) UNIT-4

MVD Formal Definition:

 A multi-valued dependency X →→ Y means that if two tuples agree on the values of X,


they must have all possible combinations of values from Y in the corresponding tuples,
independent of the values in the other attributes of the relation.

Key Concepts in 4NF:

1. Multi-valued Dependency (MVD): A multi-valued dependency occurs when one attribute


(or a set of attributes) in a table uniquely determines multiple values for another attribute.
The key idea is that these multiple values are independent of other attributes.
2. BCNF vs 4NF: BCNF handles functional dependencies but doesn't address redundancy
caused by multi-valued dependencies. 4NF resolves such issues by requiring that no multi-
valued dependencies exist unless the left side of the dependency is a superkey.
3. Multivalued Dependency vs Functional Dependency:
o Functional Dependency: A relationship between attributes where one attribute (or
set of attributes) uniquely determines another.
o Multi-valued Dependency: A relationship where one attribute (or set) determines a
set of other attributes, but the set of determined values does not depend on other
attributes in the relation.

When Do We Have an MVD?

An MVD occurs when you have:

 A set of attributes that can have multiple values, and


 The set of values for an attribute is independent of other attributes in the relation.

4NF Condition:

A relation is in 4NF if:

 The relation is in BCNF, and


 There are no multi-valued dependencies unless the left side of the dependency is a
superkey.

Example of Violation of 4NF:

Consider the following Student-Course Table:

StudentID Course Language

101 DBMS English

101 OS French

102 DBMS Spanish

102 OS French

27
DBMS (R23) UNIT-4

Functional Dependencies:

 StudentID →→ Course (A student can take multiple courses)


 StudentID →→ Language (A student can know multiple languages)

Here, StudentID →→ Course and StudentID →→ Language are multi-valued dependencies,


because:

 A student can enroll in multiple courses, and


 A student can know multiple languages.

However, the Course and Language attributes are independent of each other. For example, the
course "DBMS" is independent of the language "English."

Thus, this table violates 4NF because it has multi-valued dependencies where StudentID
determines both Course and Language independently.

Decomposition to Achieve 4NF:

To make the table conform to 4NF, we decompose it into two relations:

1. Student-Course Relation:

StudentID Course

101 DBMS

101 OS

102 DBMS

102 OS

2. Student-Language Relation:

StudentID Language

101 English

101 French

102 Spanish

102 French

Why is This Decomposition Valid?

 In the decomposed relations:

28
DBMS (R23) UNIT-4

o Student-Course no longer has multi-valued dependencies because each tuple now


refers to a single course.
o Student-Language no longer has multi-valued dependencies because each tuple
refers to a single language.

Both relations are now in BCNF and 4NF, because there are no multi-valued dependencies in either
of them.

Key Points About 4NF:

1. Avoids Redundancy: By eliminating multi-valued dependencies, 4NF helps to avoid


redundancy in the database, reducing the risk of inconsistent data and anomalies.
2. Complexity: Achieving 4NF can sometimes involve decomposing relations into multiple
smaller relations, which can make the design more complex.
3. Lossless Decomposition: When decomposing relations to achieve 4NF, the decomposition
must be lossless and dependency-preserving.

4NF vs 5NF:

 5NF (Fifth Normal Form) addresses join dependencies, which is a further step beyond
4NF.
 While 4NF deals with multi-valued dependencies, 5NF deals with more complex forms of
redundancy that can be eliminated by decomposing a table into smaller relations based on
join dependencies.

NOTE:

 4NF is concerned with removing multi-valued dependencies that can cause redundancy
and anomalies in a database.
 A relation is in 4NF if it is already in BCNF and if there are no multi-valued dependencies
unless the left side of the dependency is a superkey.
 Achieving 4NF can involve decomposing relations into smaller tables to ensure that each
non-prime attribute is fully functionally dependent on the superkey and that no multi-valued
dependencies exist.

CONCEPT-9:

Fifth Normal Form (5NF) in DBMS

Fifth Normal Form (5NF), also known as Project-Join Normal Form (PJNF), is a level of
database normalization that ensures there is no redundancy in the database due to join
dependencies. It builds on the previous normal forms, such as BCNF and 4NF, and addresses more
complex forms of redundancy that can arise when dealing with certain types of relationships
between attributes in a table.

29
DBMS (R23) UNIT-4

Definition of 5NF:

A relation is in Fifth Normal Form (5NF) if and only if it is in Fourth Normal Form (4NF) and
cannot be decomposed into smaller relations without losing information. In other words, a relation
is in 5NF if:

 It is in 4NF.
 It is free from join dependencies that cause redundancy.

Join Dependency: A join dependency exists when a relation can be decomposed into multiple
smaller relations, and the original relation can be reconstructed by joining those smaller relations.
5NF specifically handles situations where such decompositions could lead to redundancy.

Key Concepts of 5NF:

1. Join Dependency: A join dependency is a condition where a relation can be decomposed


into smaller relations, but when these smaller relations are joined back, the original relation
is restored. In 5NF, a relation must not have any non-trivial join dependencies that cannot
be represented by the join of its sub-relations.
2. Lossless Decomposition: Decompositions into smaller relations in 5NF are lossless,
meaning that no data is lost, and the original relation can be reconstructed by performing a
natural join on the decomposed relations.
3. Trivial Join Dependency: A trivial join dependency exists when the original relation is the
same as the join of its sub-relations. For example, decomposing a relation into its exact
components (i.e., a relation and itself) is trivially a join dependency.

Why 5NF is Needed:

5NF is needed to address situations where there are composite keys and complex relationships
between attributes that could cause redundancy due to the way data is split across multiple relations.
These redundancies might not be addressed by 4NF or BCNF.

In simpler terms, 5NF eliminates redundancy caused by join dependencies.

Example of 5NF:

Consider a table storing information about Projects, Employees, and the Skills required for the
projects.

ProjectID EmployeeID Skill

P1 E1 Java

P1 E1 SQL

P2 E2 Python

P2 E2 SQL

P3 E3 Java
30
DBMS (R23) UNIT-4

ProjectID EmployeeID Skill

P3 E3 Python

Functional Dependencies:

1. ProjectID, EmployeeID → Skill (A particular employee working on a project requires a


specific skill).
2. EmployeeID → Skill (An employee has a set of skills they are capable of).

Violation of 5NF:

In this table, there is redundancy due to the fact that an employee might be assigned to different
projects, and each employee might have multiple skills for a project. Specifically, the combination
of ProjectID, EmployeeID, and Skill may lead to an unnecessary repetition of information in the
table.

The redundancy in this case arises due to the join dependency between the three attributes
(ProjectID, EmployeeID, and Skill).

To eliminate this redundancy and move to 5NF, we would decompose the table into smaller
relations where each piece of information is stored only once:

1. Project-Employee relation:

ProjectID EmployeeID

P1 E1

P2 E2

P3 E3

2. Employee-Skill relation:

EmployeeID Skill

E1 Java

E1 SQL

E2 Python

E2 SQL

E3 Java

31
DBMS (R23) UNIT-4

EmployeeID Skill

E3 Python

3. Project-Skill relation:

ProjectID Skill

P1 Java

P1 SQL

P2 Python

P2 SQL

P3 Java

P3 Python

Why This Decomposition Works:

 Each relation is now free from any redundancy due to join dependencies.
 The original table can be reconstructed by performing natural joins on the three
decomposed relations.
 There is no loss of information, and the decomposition is lossless.
 Each sub-relation handles a specific aspect of the data, which means there is no redundancy
in terms of storing repeated combinations of ProjectID, EmployeeID, and Skill.

Key Points of 5NF:

1. 5NF Eliminates Redundancy: By addressing join dependencies, 5NF ensures that no


unnecessary redundancy exists in the database. This reduces the potential for data anomalies
and inconsistency.
2. Complexity: Achieving 5NF often requires decomposing a relation into many smaller
relations. While it reduces redundancy, it can lead to increased complexity in the database
design.
3. Lossless Join: The decomposition into 5NF is always lossless, meaning no data is lost
during the process of decomposition and reconstitution of the relations.
4. Data Integrity: Ensures the integrity of data by ensuring that the information in the
database is consistent and cannot be misrepresented by redundant or incomplete
information.

32
DBMS (R23) UNIT-4

To explain all the normal forms (1NF, 2NF, 3NF, BCNF, 4NF, 5NF) with a single example, let's
use a Student-Course-Professor Table that has some redundancies and issues at various stages of
normalization. We will progressively apply normalization rules to this table and explain each form.

Initial Table (UNF - Unnormalized Form)

We start with the following unnormalized table containing data about students, the courses they
take, and the professors teaching those courses:

StudentID StudentName Course Professor ProfessorPhone CourseLocation

101 John DBMS Dr. Smith 123-456-7890 Room A

101 John OS Dr. Lee 234-567-8901 Room B

102 Sarah DBMS Dr. Smith 123-456-7890 Room A

102 Sarah OS Dr. Lee 234-567-8901 Room B

103 Mike DBMS Dr. Smith 123-456-7890 Room A

This table is not normalized because:

 The ProfessorPhone is repeated for every student enrolled in the same course taught by the
same professor.
 StudentName depends only on StudentID, not on the combination of StudentID and
Course.

We will now normalize this table step by step through the various normal forms.

1. First Normal Form (1NF)

A relation is in 1NF if:

 All columns contain atomic values (no repeating groups).


 Each record is unique.

To convert to 1NF:

We need to ensure that there are no repeating groups and that each attribute contains only atomic
values.

Since the original table already contains atomic values and each row is unique, it is already in 1NF.

StudentID StudentName Course Professor ProfessorPhone CourseLocation

101 John DBMS Dr. Smith 123-456-7890 Room A

33
DBMS (R23) UNIT-4

StudentID StudentName Course Professor ProfessorPhone CourseLocation

101 John OS Dr. Lee 234-567-8901 Room B

102 Sarah DBMS Dr. Smith 123-456-7890 Room A

102 Sarah OS Dr. Lee 234-567-8901 Room B

103 Mike DBMS Dr. Smith 123-456-7890 Room A

2. Second Normal Form (2NF)

A relation is in 2NF if:

 It is in 1NF.
 Every non-prime attribute is fully functionally dependent on the primary key (no partial
dependency).

Partial Dependency:

In the above table, the primary key is a composite of StudentID and Course. However,
StudentName depends only on StudentID, and ProfessorPhone and CourseLocation depend
only on Professor (and indirectly on Course). These are partial dependencies.

To convert to 2NF:

We need to remove partial dependencies. We can do this by decomposing the table into two:

1. Student-Course Table (removes partial dependency for StudentName):

StudentID StudentName

101 John

102 Sarah

103 Mike

2. Course-Professor Table (removes partial dependency for ProfessorPhone and


CourseLocation):

Course Professor ProfessorPhone CourseLocation

DBMS Dr. Smith 123-456-7890 Room A

34
DBMS (R23) UNIT-4

Course Professor ProfessorPhone CourseLocation

OS Dr. Lee 234-567-8901 Room B

3. Student-Course-Professor Table:

StudentID Course

101 DBMS

101 OS

102 DBMS

102 OS

103 DBMS

Now, StudentName is fully dependent on StudentID, and the professor-related information is in a


separate table, which eliminates partial dependencies.

3. Third Normal Form (3NF)

A relation is in 3NF if:

 It is in 2NF.
 There is no transitive dependency (i.e., non-prime attributes must depend only on the
primary key).

Transitive Dependency:

In the Course-Professor Table, ProfessorPhone depends on Professor, which in turn is associated


with Course. This creates a transitive dependency: Course → Professor → ProfessorPhone.

To convert to 3NF:

We need to remove the transitive dependency by splitting the data further. We create a new table for
Professor information:

1. Professor Table:

Professor ProfessorPhone

Dr. Smith 123-456-7890

Dr. Lee 234-567-8901

35
DBMS (R23) UNIT-4

Professor ProfessorPhone

2. Student-Course Table (remains the same):

StudentID Course

101 DBMS

101 OS

102 DBMS

102 OS

103 DBMS

3. Course-Professor Table (removes ProfessorPhone):

Course Professor CourseLocation

DBMS Dr. Smith Room A

OS Dr. Lee Room B

Now, ProfessorPhone is fully dependent on Professor, and there are no transitive dependencies.

4. Boyce-Codd Normal Form (BCNF)

A relation is in BCNF if:

 It is in 3NF.
 For every functional dependency, the left-hand side is a superkey.

Violation of BCNF:

In the Course-Professor Table, the Course attribute determines Professor and CourseLocation,
but Course is not a superkey (the superkey would be a combination of Course and Professor).

To convert to BCNF:

We need to decompose the Course-Professor Table further to ensure every determinant is a


superkey:

1. Course Table:
36
DBMS (R23) UNIT-4

Course CourseLocation

DBMS Room A

OS Room B

2. Professor Table (remains the same):

Professor ProfessorPhone

Dr. Smith 123-456-7890

Dr. Lee 234-567-8901

3. Student-Course Table (remains the same):

StudentID Course

101 DBMS

101 OS

102 DBMS

102 OS

103 DBMS

Now, Course determines CourseLocation and is part of the primary key, ensuring BCNF.

5. Fourth Normal Form (4NF)

A relation is in 4NF if:

 It is in BCNF.
 There are no multi-valued dependencies.

Multi-valued Dependency:

In the Student-Course Table, a student can enroll in multiple courses, and each course has
multiple professors. This creates a potential multi-valued dependency, where a student can have
multiple independent sets of values for courses and professors.

To convert to 4NF:

We need to remove the multi-valued dependency by decomposing the relation:


37
DBMS (R23) UNIT-4

1. Student-Course Table:

StudentID Course

101 DBMS

101 OS

102 DBMS

102 OS

103 DBMS

2. Student-Professor Table (removes multi-valued dependency):

StudentID Professor

101 Dr. Smith

101 Dr. Lee

102 Dr. Smith

102 Dr. Lee

103 Dr. Smith

Now, we have separated the multi-valued dependencies into distinct tables, ensuring no
redundancy.

6. Fifth Normal Form (5NF)

A relation is in 5NF if:

 It is in 4NF.
 It cannot be decomposed further without losing information (i.e., no join dependencies).

Join Dependency:

In our example, there are no additional join dependencies because each table contains information
that cannot be further decomposed without losing data.

Therefore, the relation is already in 5NF.

38
DBMS (R23) UNIT-4

Final Tables After All Normalizations:

1. Student Table:

StudentID StudentName

101 John

102 Sarah

103 Mike

2. Course Table:

Course CourseLocation

DBMS Room A

OS Room B

3. Professor Table:

Professor ProfessorPhone

Dr. Smith 123-456-7890

Dr. Lee 234-567-8901

4. Student-Course Table:

StudentID Course

101 DBMS

101 OS

39

Common questions

Powered by AI

Surrogate keys significantly simplify foreign key relationships, especially when dealing with complex or composite natural keys, by providing a uniform, single-attribute key. They eliminate the difficulties associated with handling complex natural keys across tables, reducing storage costs and improving query performance due to simplified indexing .

Fourth Normal Form (4NF) extends Boyce-Codd Normal Form (BCNF) by specifically addressing redundancy caused by multi-valued dependencies. While BCNF eliminates redundancy related to composite keys by ensuring that every determinant is a superkey, 4NF handles scenarios with multi-valued dependencies, ensuring that a non-trivial multi-valued dependency does not exist unless it is trivial .

For a database schema to be in Boyce-Codd Normal Form (BCNF), it must first be in Third Normal Form (3NF), and every non-trivial functional dependency must have a superkey as its determinant. This ensures a more simplified and efficient database design by eliminating potential redundancy related to non-superkey candidates .

Natural keys are columns or groups of columns that are derived from the table's data and hold a business meaning, such as a Customer ID in a customer table. They are useful when the identifier has a real-world meaning and likely remains stable. Surrogate keys, on the other hand, are system-generated identifiers with no inherent business meaning, such as an auto-incremented integer. These are preferred in situations where natural keys are composite or complex, likely to change, or when improved performance and simplicity in relation management is necessary .

Surrogate keys generally improve performance because they are usually integers or short, fixed-length strings, which are efficient for indexing and lookups. These keys streamline joins and increase the speed of queries compared to larger, compound natural keys or when handling multi-column natural keys .

To resolve a multi-valued dependency and achieve Fourth Normal Form (4NF), decompose the original table into separate tables that isolate the independent multi-valued associations. For each multi-valued dependency X →→ Y, create a new relation with only attributes X and Y. This reduces redundancy and prevents data anomalies .

To ensure a relation is in Third Normal Form (3NF), decompose the relation so that it is in 2NF and any transitive dependencies are removed. This requires that all non-prime attributes are directly and solely dependent on the primary key, without intermediary dependencies. This normalization reduces redundancy and enhances data integrity and consistency .

A surrogate key is most beneficial in maintaining data integrity in scenarios where a natural key might change over time, such as a ProductCode. A surrogate key provides a stable identifier, ensuring integrity as it remains constant even if business rules change and natural keys are updated .

Surrogate keys can obscure relationships between tables because they have no business meaning, complicating intuitive understanding of data connections. This issue is mitigated by maintaining comprehensive documentation and relationships charts that map surrogate keys to their natural counterparts, or by creating views or joins with meaningful data for human validation .

A relation is in Fifth Normal Form (5NF) if it cannot be decomposed into smaller relations without losing data through lossless join dependencies. This form eliminates any join dependencies, ensuring that a relation is fully decomposed to eliminate redundancy. 5NF builds on 4NF by ensuring there are no non-trivial join dependencies left, meaning the relation cannot be further divided without loss of data integrity .

You might also like