DBMS Schema Refinement & Normalization Guide
DBMS Schema Refinement & Normalization Guide
1
DBMS (R23) UNIT-4
CONCEPT-1:
In DBMS, a schema defines the structure of the database, including tables, attributes, relationships,
and constraints. Schema refinement typically involves activities such as:
1. Normalization: The process of organizing data to reduce redundancy and improve data
integrity. This involves dividing large tables into smaller, more manageable tables and
defining relationships between them using keys (like primary and foreign keys).
Normalization helps eliminate anomalies like insertion, update, and deletion anomalies.
2. Eliminating Redundancy: Redundant data, which can lead to unnecessary storage use and
potential inconsistencies, is removed through normalization or restructuring of the schema.
3. Optimizing Queries: Refining the schema may involve reorganizing data or creating
indexes and views to make query execution more efficient.
4. Ensuring Data Integrity: Constraints such as primary keys, foreign keys, unique keys, and
check constraints are applied to ensure that data entered into the database is accurate,
consistent, and follows the business rules.
5. Improving Maintainability: Simplifying the schema, making it easier to add new features
or modify existing ones, helps ensure long-term maintainability. This can involve making
the schema modular and easy to understand.
Normalization in DBMS:
Purpose of Normalization:
2
DBMS (R23) UNIT-4
o Insertion Anomaly: Difficulties arise when adding new data due to the structure of
the database.
o Deletion Anomaly: Deleting one piece of data may accidentally remove other
important data.
5. Improve Query Performance: In some cases, normalized data can be more efficient for
certain types of queries, especially if it results in smaller and more focused tables.
There are several levels or normal forms (NF) used in normalization, each one addressing a
specific kind of redundancy or anomaly. The most common normal forms are:
CONCEPT-2:
3
DBMS (R23) UNIT-4
Formal Definition:
For two sets of attributes X and Y in a relation, X → Y (read as "X determines Y") is a functional
dependency if, for every pair of tuples (rows) in the relation, whenever two tuples have the same
value for attributes in X, they must have the same value for attributes in Y.
Key Points:
X → Y means that if two rows have the same value for attributes in X, they must also have
the same value for attributes in Y.
A functional dependency does not necessarily imply uniqueness, but rather a relationship
between attributes.
FDs are essential in normalization as they help identify how to break down relations to
eliminate redundancy and dependency.
4
DBMS (R23) UNIT-4
Functional dependency in a database management system offers several advantages for businesses
and organizations:
Prevents Duplicate Data:
Functional dependency helps avoid storing the same data repeatedly in the database, reducing
redundancy and saving storage space.
Improves Data Quality and Accuracy:
By organizing data efficiently and minimizing duplication, functional dependency ensures the
data is reliable, consistent, and of high quality.
Reduces Errors:
Keeping data organized and concise lowers the chances of errors in records or datasets,
making it easier to manage and update information.
Saves Time and Costs:
Properly organized data allows for quicker and easier access, improving productivity and
reducing the time and cost of managing information.
Defines Rules and Behaviors:
Functional dependency allows setting rules and constraints that control how data is stored,
accessed, and maintained, ensuring better data management.
Helps Identify Poor Database Design:
It highlights issues like scattered or missing data across tables, helping identify and fix design
flaws to maintain consistency and integrity.
1. Augmentation:
This rule of Armstrong's axioms state that, If S -> T, then we can say SR -> TR.
2. Reflexivity:
This rule of Armstrong's axioms state that, If S -> T, then T is the subset of S.
5
DBMS (R23) UNIT-4
3. Transitivity:
This rule of Armstrong's axioms state that, If S -> T and T -> U, then we can say S -> U.
42 abc 17
43 pqr 18
44 xyz 18
Here, {roll_no, name} → name is a trivial functional dependency, since the dependent name is a
subset of determinant set {roll_no, name}. Similarly, roll_no → roll_no is also an example of
trivial functional dependency.
2. Non-trivial Functional Dependency
In Non-trivial functional dependency, the dependent is strictly not a subset of the determinant.
i.e. If X → Y and Y is not a subset of X, then it is called Non-trivial functional dependency.
Example 1 :
Id -> Name
Name -> DOB
Example 2:
roll_no name age
42 abc 17
43 pqr 18
44 xyz 18
6
DBMS (R23) UNIT-4
Here, roll_no → name is a non-trivial functional dependency, since the dependent name is not a
subset of determinant roll_no. Similarly, {roll_no, name} → age is also a non-trivial functional
dependency, since age is not a subset of {roll_no, name}
Functional Dependency:
{StudentID,CourseID}→CourseID
This is semi non-trivial because:
Part of the dependent attribute (Course_ID) is already included in the determinant
({Student_ID, Course_ID}).
However, the dependency is not completely trivial because {StudentID}→CourseID is not
implied directly.
4. Multivalued Functional Dependency
In Multivalued functional dependency, entities of the dependent set are not dependent on each
other. i.e. If a → {b, c} and there exists no functional dependency between b and c, then it is
called a multivalued functional dependency.
Example:
bike_model manuf_year color
7
DBMS (R23) UNIT-4
In this table:
X: bike_model
Y: color
Z: manuf_year
For each bike model (bike_model):
1. There is a group of colors (color) and a group of manufacturing years (manuf_year).
2. The colors do not depend on the manufacturing year, and the manufacturing year does not
depend on the colors. They are independent.
3. The sets of color and manuf_year are linked only to bike_model.
That’s what makes it a multivalued dependency.
In this case these two columns are said to be multivalued dependent on bike_model. These
dependencies can be represented like this:
Read more about Multivalued Dependency in DBMS .
5. Transitive Functional Dependency
In transitive functional dependency, dependent is indirectly dependent on determinant. i.e. If a →
b & b → c, then according to axiom of transitivity, a → c. This is a transitive functional
dependency.
Example:
enrol_no name dept building_no
42 abc CO 4
43 pqr EC 2
44 xyz IT 1
45 abc EC 2
Here, enrol_no → dept and dept → building_no. Hence, according to the axiom of transitivity,
enrol_no → building_no is a valid functional dependency. This is an indirect functional
dependency, hence called Transitive functional dependency.
6. Fully Functional Dependency
In full functional dependency an attribute or a set of attributes uniquely determines another
attribute or set of attributes. If a relation R has attributes X, Y, Z with the dependencies X->Y and
X->Z which states that those dependencies are fully functional.
Read more about Fully Functional Dependency.
8
DBMS (R23) UNIT-4
CONCEPT-3:
In relational database design, decomposition refers to breaking down a large relation (table) into
smaller relations (tables) to achieve better data organization, minimize redundancy, and improve
normalization. However, decomposing a relation needs to be done carefully to ensure that:
1. Lossless Join: No information is lost during the decomposition, and the original relation can
be perfectly reconstructed by joining the decomposed relations.
2. Dependency-Preserving: All functional dependencies (FDs) are preserved, meaning no
dependency is lost when relations are decomposed.
A decomposition of a relation R into two sub-relations R1 and R2 is lossless if and only if:
In simple terms:
The common attributes between R1 and R2 (i.e., R1 ∩ R2) should be a superkey in at least
one of the sub-relations. This ensures that when you join R1 and R2, you can uniquely
identify tuples without ambiguity, preserving the integrity of the data.
9
DBMS (R23) UNIT-4
1. R1 (StudentID, Course)
2. R2 (Course, Instructor)
Since the intersection attribute is a superkey in one of the relations, this decomposition is lossless.
By joining R1 and R2 on the Course attribute, we can perfectly reconstruct the original relation R
without any loss of data.
2. Dependency-Preserving Decomposition
Dependency-Preserving Condition:
Consider the relation R with attributes StudentID, Course, and Instructor. Suppose the functional
dependencies are:
1. StudentID → Course
2. Course → Instructor
1. R1 (StudentID, Course)
2. R2 (Course, Instructor)
For R1, the FD StudentID → Course holds, and for R2, the FD Course → Instructor holds.
Thus, all functional dependencies in the original relation R are preserved in the decomposed
relations R1 and R2.
10
DBMS (R23) UNIT-4
This decomposition is dependency-preserving because all the functional dependencies from the
original relation are satisfied by the decomposed relations.
In an ideal scenario, when performing a decomposition, you want it to be both lossless join and
dependency-preserving. However, it's not always possible to achieve both at the same time.
Example Where Lossless Join and Dependency-Preserving Are Not Both Achieved:
Consider the relation R with the attributes StudentID, Course, Instructor, and InstructorPhone:
Functional dependencies:
1. StudentID → Course
2. Course → Instructor
3. Instructor → InstructorPhone
1. R1 (StudentID, Course)
2. R2 (Course, Instructor, InstructorPhone)
Thus, achieving both lossless join and dependency-preserving is not always possible, especially in
certain complex cases. In those cases, you may prioritize one property (e.g., lossless join) over the
other (e.g., dependency-preserving).
11
DBMS (R23) UNIT-4
NOTE:
Lossless Join Decomposition ensures that no data is lost during the decomposition, and you
can always recombine the decomposed relations to recover the original relation.
Dependency-Preserving Decomposition ensures that the functional dependencies in the
original relation are still enforceable in the decomposed relations, so you don't have to join
them to enforce those dependencies.
CONCEPT-4:
In database management, normal forms are used to organize a database schema in such a way that it
minimizes redundancy, avoids undesirable characteristics like anomalies, and ensures data integrity.
Functional dependencies (FDs) play a central role in defining these normal forms. Let's go
through the most commonly used normal forms based on functional dependency: 1NF, 2NF,
3NF, and BCNF.
Example:
Consider a table that stores information about students and their courses:
12
DBMS (R23) UNIT-4
This table violates 1NF because the Courses column contains multiple values (i.e., multiple courses
for a student) in a single attribute. To bring this table to 1NF, we must split the values in the
Courses column into separate rows:
1. It is in 1NF.
2. It does not have any partial dependency. That is, all non-prime attributes (attributes that
are not part of a candidate key) are fully functionally dependent on the entire primary key.
Partial dependency occurs when a non-prime attribute depends only on a part of a composite
primary key.
Example:
Consider the following table with a composite primary key consisting of StudentID and CourseID:
In this case, the primary key is the combination of StudentID and CourseID. But the Instructor and
InstructorPhone depend only on CourseID, not on the entire primary key. This is a partial
dependency.
Student-Course Table:
13
DBMS (R23) UNIT-4
StudentID CourseID
101 CS101
101 CS102
102 CS101
Course-Details Table:
1. It is in 2NF.
2. It has no transitive dependency. That is, non-prime attributes should not depend on other
non-prime attributes.
Transitive dependency occurs when an attribute depends on another attribute, which in turn
depends on the primary key.
Example:
In this case, the InstructorPhone depends on the Instructor, which depends on the StudentID. This is
a transitive dependency, because InstructorPhone is indirectly dependent on StudentID through
Instructor.
To eliminate the transitive dependency, we split the relation into two tables:
Student Table:
14
DBMS (R23) UNIT-4
Instructor Table:
Instructor InstructorPhone
1. It is in 3NF.
2. For every non-trivial functional dependency X→YX \to YX→Y, X must be a superkey
(i.e., X should uniquely identify a tuple in the relation).
Example:
Here, the Instructor is a determinant for InstructorPhone, but Instructor is not a superkey. This
violates BCNF, as the non-prime attribute InstructorPhone is dependent on Instructor, which is
not a candidate key.
Student-Course Table:
15
DBMS (R23) UNIT-4
Instructor Table:
Instructor InstructorPhone
Now, the relation is in BCNF, as all determinants (like Instructor) are candidate keys in their
respective tables.
1. 1NF: Ensures that attributes have atomic values (no repeating groups or arrays).
o Example: Splitting columns with multiple values into separate rows.
2. 2NF: Eliminates partial dependencies (non-prime attributes fully depend on the primary
key).
o Example: Separating composite primary key-related data into different tables.
3. 3NF: Eliminates transitive dependencies (non-prime attributes do not depend on other non-
prime attributes).
o Example: Splitting tables where attributes indirectly depend on the primary key.
Each step of normalization (1NF → 2NF → 3NF → BCNF) refines the database schema further by
eliminating redundancy and improving data integrity through functional dependencies.
CONCEPT-5:
1. System-generated: The key is generated by the database system or the application. It has no
inherent meaning in the context of the data, unlike natural keys that might reflect real-world
data.
2. Unique: It is always unique for each record in the table.
3. Non-nullable: Surrogate keys are usually defined as NOT NULL because they must
uniquely identify each row.
4. Immutable: Once assigned, a surrogate key does not change, even if the data in the record
changes.
5. Integer or UUID: Surrogate keys are commonly integers (auto-incremented) or UUIDs, but
can also be other types that guarantee uniqueness.
1. Simplification: Surrogate keys simplify the relationships between tables, especially when
natural keys are complex or large.
2. No Business Meaning: Since surrogate keys have no business meaning, they help in
avoiding problems if business rules change. For example, a product code (natural key) might
change, but the surrogate key remains the same.
3. Improved Performance: Surrogate keys are often integers or short, fixed-length strings,
which can be more efficient for indexing and lookups compared to large, compound natural
keys.
4. Avoiding Redundancy: Surrogate keys help avoid redundancy when a composite natural
key is used (especially when one or more columns in a composite key can be null).
1. When a Natural Key is Composite or Complex: When the natural key consists of multiple
attributes (composite key), it can become cumbersome to use in foreign key relationships. A
surrogate key simplifies the foreign key relationship.
2. When the Natural Key is Likely to Change: If there is a possibility that the natural key
might change (e.g., a ProductCode that may change in the future), a surrogate key can
provide a stable identifier.
17
DBMS (R23) UNIT-4
3. For Data Integrity and Performance: Surrogate keys, especially when they are integers or
UUIDs, help with indexing, improving query performance, and maintaining data integrity.
Consider an example: 1
Suppose we have two tables of two different schools having the same column registration_no,
name, and percentage, each table having its own natural primary key, that is registration_no.
Table of school A:
registration_no name percentage
210101 Harry 90
210102 Maxwell 65
210103 Lee 87
210104 Chris 76
Table of school B:
CS107 Taylor 49
CS108 Simon 86
CS109 Sam 96
CS110 Andy 58
Now, suppose we want to merge the details of both the schools in a single table.
Resulting table will be:
1 210101 Harry 90
2 210102 Maxwell 65
3 210103 Lee 87
4 210104 Chris 76
18
DBMS (R23) UNIT-4
5 CS107 Taylor 49
6 CS108 Simon 86
7 CS109 Sam 96
8 CS110 Andy 58
As we can observe the above table and see that registration_no cannot be the primary key of the
table as it does not match with all the records of the table though it is holding all unique values of
the table . Now , in this case, we have to artificially create one primary key for this table. We can
do this by adding a column surr_no in the table that contains anonymous integers and has no
direct relation with other columns . This additional column of surr_no is the surrogate key of the
table.
EXAMPLE-2:
Consider a database of Employees in a company. The Employee table might include a natural key
such as an Employee Number (which is meaningful to the business), but for technical reasons, a
surrogate key (e.g., an auto-incrementing integer or a UUID) might be used instead.
In this example:
In this table:
19
DBMS (R23) UNIT-4
CONCEPT-6:
FD: { (student, Teacher) -> subject, (student, subject) -> Teacher, (Teacher) -> subject}
Candidate keys are (student, teacher) and (student, subject).
The above relation is in 3NF (since there is no transitive dependency). A relation R is in
BCNF if for every non-trivial FD X->Y, X must be a key.
The above relation is not in BCNF, because in the FD (teacher->subject), teacher is not a key.
This relation suffers with anomalies −
For example, if we delete the student Tahira , we will also lose the information that [Link]
teaches C. This issue occurs because the teacher is a determinant but not a candidate key.
20
DBMS (R23) UNIT-4
Electronics &
102 VLSI Technology B_003 401
Communication Engineering
21
DBMS (R23) UNIT-4
101 201
101 202
102 401
102 402
22
DBMS (R23) UNIT-4
The relation is in 2nd normal form because BC->D is in 2nd normal form (BC is not a proper
subset of candidate key AC) and AC->BE is in 2nd normal form (AC is candidate key) and B->E
is in 2nd normal form (B is not a proper subset of candidate key AC).
The relation is not in 3rd normal form because in BC->D (neither BC is a super key nor D is a
prime attribute) and in B->E (neither B is a super key nor E is a prime attribute) but to satisfy 3rd
normal for, either LHS of an FD should be super key or RHS should be a prime attribute. So the
highest normal form of relation will be the 2nd Normal form.
Note: A prime attribute cannot be transitively dependent on a key in BCNF relation.
Consider these functional dependencies of some relation R
AB ->C
C ->B
AB ->B
From the above functional dependency, we get that the candidate key of R is AB and AC. A
careful observation is required to conclude that the above dependency is a Transitive Dependency
as the prime attribute B transitively depends on the key AB through C. Now, the first and the third
FD are in BCNF as they both contain the candidate key (or simply KEY) on their left sides. The
second dependency, however, is not in BCNF but is definitely in 3NF due to the presence of the
prime attribute on the right side. So, the highest normal form of R is 3NF as all three FDs satisfy
the necessary conditions to be in 3NF.
Example 3
For example consider relation R(A, B, C)
A -> BC,
B -> A
A and B both are super keys so the above relation is in BCNF.
Note: BCNF decomposition may always not be possible with dependency preserving , however, it
always satisfies the lossless join condition. For example, relation R (V, W, X, Y, Z), with
functional dependencies:
V, W -> X
Y, Z -> X
W -> Y
It would not satisfy dependency preserving BCNF decomposition.
Note: Redundancies are sometimes still present in a BCNF relation as it is not always possible to
eliminate them completely.
CONCEPT-7:
X →→ Y
23
DBMS (R23) UNIT-4
Where:
1. Independence: The values of Y do not depend on the other non-determined attributes in the
relation.
2. Symmetry: The relation holds independently for all combinations of tuples with the same
value of X.
An MVD exists in a relation if, for any given value of X, the set of values for Y is independent of
the other attributes (attributes not in X). This implies that there can be multiple independent sets of
values for Y that correspond to a given value of X.
Consider a relation R with the following attributes: StudentID, Course, and Instructor.
In this case, StudentID →→ Course and StudentID →→ Instructor are multivalued functional
dependencies:
1. StudentID →→ Course means that for each StudentID, there are multiple independent
values for Course (e.g., Student 101 has courses "Database", "Operating Sys", and
"Networking"). The values for Course are independent of the Instructor.
2. StudentID →→ Instructor means that for each StudentID, there are multiple independent
values for Instructor (e.g., Student 101 has instructors "Dr. Smith", "Dr. Lee", and "Dr.
Johnson"). The values for Instructor are independent of the Course.
Thus, there are independent sets of values for both Course and Instructor for each StudentID, but
the two sets are independent of each other.
24
DBMS (R23) UNIT-4
If we do not handle MVDs properly, it can lead to redundant data and update anomalies. For
instance, if you have a student enrolled in multiple courses, you may end up repeating the student's
information in several rows, increasing the possibility of redundancy.
The presence of multivalued functional dependencies violates the Fourth Normal Form (4NF),
which requires that:
A relation is in 4NF if it is in Boyce-Codd Normal Form (BCNF) and does not have any
multivalued dependencies unless the multivalued dependency is trivial (i.e., when the
determinant is a superset of the dependent attributes).
1. Identify and eliminate the MVDs: For every MVD X →→ Y, split the relation into
separate relations that eliminate the redundancy caused by the multivalued dependency.
2. Create separate tables: For each X →→ Y, create a new table with X and Y attributes,
while leaving other attributes unaffected.
This table has the MVD StudentID →→ Course and StudentID →→ Instructor, which violates
4NF.
25
DBMS (R23) UNIT-4
StudentID Course
101 Database
101 Networking
102 Database
StudentID Instructor
After the decomposition, each relation is now in 4NF, and the redundancy caused by multivalued
dependencies is eliminated.
CONCEPT-8:
Fourth Normal Form (4NF) is a higher level of database normalization that deals specifically with
multi-valued dependencies (MVDs). It builds upon the concepts of Boyce-Codd Normal Form
(BCNF), addressing situations where BCNF doesn't fully eliminate redundancy caused by multi-
valued dependencies.
Definition of 4NF:
A relation is in Fourth Normal Form (4NF) if and only if it is in Boyce-Codd Normal Form
(BCNF) and has no multi-valued dependencies (MVDs).
26
DBMS (R23) UNIT-4
4NF Condition:
101 OS French
102 OS French
27
DBMS (R23) UNIT-4
Functional Dependencies:
However, the Course and Language attributes are independent of each other. For example, the
course "DBMS" is independent of the language "English."
Thus, this table violates 4NF because it has multi-valued dependencies where StudentID
determines both Course and Language independently.
1. Student-Course Relation:
StudentID Course
101 DBMS
101 OS
102 DBMS
102 OS
2. Student-Language Relation:
StudentID Language
101 English
101 French
102 Spanish
102 French
28
DBMS (R23) UNIT-4
Both relations are now in BCNF and 4NF, because there are no multi-valued dependencies in either
of them.
4NF vs 5NF:
5NF (Fifth Normal Form) addresses join dependencies, which is a further step beyond
4NF.
While 4NF deals with multi-valued dependencies, 5NF deals with more complex forms of
redundancy that can be eliminated by decomposing a table into smaller relations based on
join dependencies.
NOTE:
4NF is concerned with removing multi-valued dependencies that can cause redundancy
and anomalies in a database.
A relation is in 4NF if it is already in BCNF and if there are no multi-valued dependencies
unless the left side of the dependency is a superkey.
Achieving 4NF can involve decomposing relations into smaller tables to ensure that each
non-prime attribute is fully functionally dependent on the superkey and that no multi-valued
dependencies exist.
CONCEPT-9:
Fifth Normal Form (5NF), also known as Project-Join Normal Form (PJNF), is a level of
database normalization that ensures there is no redundancy in the database due to join
dependencies. It builds on the previous normal forms, such as BCNF and 4NF, and addresses more
complex forms of redundancy that can arise when dealing with certain types of relationships
between attributes in a table.
29
DBMS (R23) UNIT-4
Definition of 5NF:
A relation is in Fifth Normal Form (5NF) if and only if it is in Fourth Normal Form (4NF) and
cannot be decomposed into smaller relations without losing information. In other words, a relation
is in 5NF if:
It is in 4NF.
It is free from join dependencies that cause redundancy.
Join Dependency: A join dependency exists when a relation can be decomposed into multiple
smaller relations, and the original relation can be reconstructed by joining those smaller relations.
5NF specifically handles situations where such decompositions could lead to redundancy.
5NF is needed to address situations where there are composite keys and complex relationships
between attributes that could cause redundancy due to the way data is split across multiple relations.
These redundancies might not be addressed by 4NF or BCNF.
Example of 5NF:
Consider a table storing information about Projects, Employees, and the Skills required for the
projects.
P1 E1 Java
P1 E1 SQL
P2 E2 Python
P2 E2 SQL
P3 E3 Java
30
DBMS (R23) UNIT-4
P3 E3 Python
Functional Dependencies:
Violation of 5NF:
In this table, there is redundancy due to the fact that an employee might be assigned to different
projects, and each employee might have multiple skills for a project. Specifically, the combination
of ProjectID, EmployeeID, and Skill may lead to an unnecessary repetition of information in the
table.
The redundancy in this case arises due to the join dependency between the three attributes
(ProjectID, EmployeeID, and Skill).
To eliminate this redundancy and move to 5NF, we would decompose the table into smaller
relations where each piece of information is stored only once:
1. Project-Employee relation:
ProjectID EmployeeID
P1 E1
P2 E2
P3 E3
2. Employee-Skill relation:
EmployeeID Skill
E1 Java
E1 SQL
E2 Python
E2 SQL
E3 Java
31
DBMS (R23) UNIT-4
EmployeeID Skill
E3 Python
3. Project-Skill relation:
ProjectID Skill
P1 Java
P1 SQL
P2 Python
P2 SQL
P3 Java
P3 Python
Each relation is now free from any redundancy due to join dependencies.
The original table can be reconstructed by performing natural joins on the three
decomposed relations.
There is no loss of information, and the decomposition is lossless.
Each sub-relation handles a specific aspect of the data, which means there is no redundancy
in terms of storing repeated combinations of ProjectID, EmployeeID, and Skill.
32
DBMS (R23) UNIT-4
To explain all the normal forms (1NF, 2NF, 3NF, BCNF, 4NF, 5NF) with a single example, let's
use a Student-Course-Professor Table that has some redundancies and issues at various stages of
normalization. We will progressively apply normalization rules to this table and explain each form.
We start with the following unnormalized table containing data about students, the courses they
take, and the professors teaching those courses:
The ProfessorPhone is repeated for every student enrolled in the same course taught by the
same professor.
StudentName depends only on StudentID, not on the combination of StudentID and
Course.
We will now normalize this table step by step through the various normal forms.
To convert to 1NF:
We need to ensure that there are no repeating groups and that each attribute contains only atomic
values.
Since the original table already contains atomic values and each row is unique, it is already in 1NF.
33
DBMS (R23) UNIT-4
It is in 1NF.
Every non-prime attribute is fully functionally dependent on the primary key (no partial
dependency).
Partial Dependency:
In the above table, the primary key is a composite of StudentID and Course. However,
StudentName depends only on StudentID, and ProfessorPhone and CourseLocation depend
only on Professor (and indirectly on Course). These are partial dependencies.
To convert to 2NF:
We need to remove partial dependencies. We can do this by decomposing the table into two:
StudentID StudentName
101 John
102 Sarah
103 Mike
34
DBMS (R23) UNIT-4
3. Student-Course-Professor Table:
StudentID Course
101 DBMS
101 OS
102 DBMS
102 OS
103 DBMS
It is in 2NF.
There is no transitive dependency (i.e., non-prime attributes must depend only on the
primary key).
Transitive Dependency:
To convert to 3NF:
We need to remove the transitive dependency by splitting the data further. We create a new table for
Professor information:
1. Professor Table:
Professor ProfessorPhone
35
DBMS (R23) UNIT-4
Professor ProfessorPhone
StudentID Course
101 DBMS
101 OS
102 DBMS
102 OS
103 DBMS
Now, ProfessorPhone is fully dependent on Professor, and there are no transitive dependencies.
It is in 3NF.
For every functional dependency, the left-hand side is a superkey.
Violation of BCNF:
In the Course-Professor Table, the Course attribute determines Professor and CourseLocation,
but Course is not a superkey (the superkey would be a combination of Course and Professor).
To convert to BCNF:
1. Course Table:
36
DBMS (R23) UNIT-4
Course CourseLocation
DBMS Room A
OS Room B
Professor ProfessorPhone
StudentID Course
101 DBMS
101 OS
102 DBMS
102 OS
103 DBMS
Now, Course determines CourseLocation and is part of the primary key, ensuring BCNF.
It is in BCNF.
There are no multi-valued dependencies.
Multi-valued Dependency:
In the Student-Course Table, a student can enroll in multiple courses, and each course has
multiple professors. This creates a potential multi-valued dependency, where a student can have
multiple independent sets of values for courses and professors.
To convert to 4NF:
1. Student-Course Table:
StudentID Course
101 DBMS
101 OS
102 DBMS
102 OS
103 DBMS
StudentID Professor
Now, we have separated the multi-valued dependencies into distinct tables, ensuring no
redundancy.
It is in 4NF.
It cannot be decomposed further without losing information (i.e., no join dependencies).
Join Dependency:
In our example, there are no additional join dependencies because each table contains information
that cannot be further decomposed without losing data.
38
DBMS (R23) UNIT-4
1. Student Table:
StudentID StudentName
101 John
102 Sarah
103 Mike
2. Course Table:
Course CourseLocation
DBMS Room A
OS Room B
3. Professor Table:
Professor ProfessorPhone
4. Student-Course Table:
StudentID Course
101 DBMS
101 OS
39
Surrogate keys significantly simplify foreign key relationships, especially when dealing with complex or composite natural keys, by providing a uniform, single-attribute key. They eliminate the difficulties associated with handling complex natural keys across tables, reducing storage costs and improving query performance due to simplified indexing .
Fourth Normal Form (4NF) extends Boyce-Codd Normal Form (BCNF) by specifically addressing redundancy caused by multi-valued dependencies. While BCNF eliminates redundancy related to composite keys by ensuring that every determinant is a superkey, 4NF handles scenarios with multi-valued dependencies, ensuring that a non-trivial multi-valued dependency does not exist unless it is trivial .
For a database schema to be in Boyce-Codd Normal Form (BCNF), it must first be in Third Normal Form (3NF), and every non-trivial functional dependency must have a superkey as its determinant. This ensures a more simplified and efficient database design by eliminating potential redundancy related to non-superkey candidates .
Natural keys are columns or groups of columns that are derived from the table's data and hold a business meaning, such as a Customer ID in a customer table. They are useful when the identifier has a real-world meaning and likely remains stable. Surrogate keys, on the other hand, are system-generated identifiers with no inherent business meaning, such as an auto-incremented integer. These are preferred in situations where natural keys are composite or complex, likely to change, or when improved performance and simplicity in relation management is necessary .
Surrogate keys generally improve performance because they are usually integers or short, fixed-length strings, which are efficient for indexing and lookups. These keys streamline joins and increase the speed of queries compared to larger, compound natural keys or when handling multi-column natural keys .
To resolve a multi-valued dependency and achieve Fourth Normal Form (4NF), decompose the original table into separate tables that isolate the independent multi-valued associations. For each multi-valued dependency X →→ Y, create a new relation with only attributes X and Y. This reduces redundancy and prevents data anomalies .
To ensure a relation is in Third Normal Form (3NF), decompose the relation so that it is in 2NF and any transitive dependencies are removed. This requires that all non-prime attributes are directly and solely dependent on the primary key, without intermediary dependencies. This normalization reduces redundancy and enhances data integrity and consistency .
A surrogate key is most beneficial in maintaining data integrity in scenarios where a natural key might change over time, such as a ProductCode. A surrogate key provides a stable identifier, ensuring integrity as it remains constant even if business rules change and natural keys are updated .
Surrogate keys can obscure relationships between tables because they have no business meaning, complicating intuitive understanding of data connections. This issue is mitigated by maintaining comprehensive documentation and relationships charts that map surrogate keys to their natural counterparts, or by creating views or joins with meaningful data for human validation .
A relation is in Fifth Normal Form (5NF) if it cannot be decomposed into smaller relations without losing data through lossless join dependencies. This form eliminates any join dependencies, ensuring that a relation is fully decomposed to eliminate redundancy. 5NF builds on 4NF by ensuring there are no non-trivial join dependencies left, meaning the relation cannot be further divided without loss of data integrity .