TOPIC 1: WHY NORMALIZATION?
(The Problem)
The Problem: Bad Database Design
Real-Life Scenario:
You're building a student grade management system. A junior developer creates this
table:
text
STUDENT_COURSE_GRADE table:
student_id | name | email | address
| city | course_id | course_name | instructor | grade
-----------|---------|------------------|---------------------
---|---------|-----------|-------------|------------|-------
1 | Ram | ram@[Link] | 123 Street, Apt 4
| Delhi | CS101 | DBMS | Dr. Smith | A
1 | Ram | ram@[Link] | 123 Street, Apt 4
| Delhi | CS102 | OOP | Dr. Jones | B
2 | Priya | priya@[Link]| 456 Ave, Apt 7
| Mumbai | CS101 | DBMS | Dr. Smith | A+
3 | Raj | raj@[Link] | 789 Lane, Apt 2
| Delhi | CS103 | Networks | Dr. Patel | C
Problems:
Problem 1: Data Redundancy (Duplication)
Ram's information is repeated:
● Email ram@[Link] appears multiple times
● Address 123 Street, Apt 4 appears multiple times
● Name Ram appears multiple times
Problem: Storage wasted. If Ram changes email, update in multiple places or have
inconsistency.
Problem 2: Update Anomalies (Dangerous Updates)
Ram changes his email to newram@[Link].
Question: Do we update:
● Just first row? (Inconsistency — different rows show different email)
● All rows? (Extra work, error-prone)
Real Risk: Some rows updated, some not → Data inconsistency → Trust issues.
Problem 3: Insert Anomalies (Can't Add Data)
New course "Compiler Design" added by Dr. Lee. But no student enrolled yet.
Problem: Can't add course without student_id (would be NULL, violates PK). Course
data stuck.
Problem 4: Delete Anomalies (Unintended Loss)
Ram drops the DBMS course (CS101).
Problem: If we delete that row:
● Student grade record deleted ✓
● But student info (name, email, address) also deleted ✗ (We didn't want to
delete student!)
The Solution: Normalization
Purpose: Break bad table into smaller, well-designed tables to eliminate redundancy
and anomalies.
Real-Life Analogy:
Like organizing a messy file cabinet:
BEFORE: Everything in one huge folder
● Student info, course info, grades all mixed
● Hard to find things, duplicates, chaos
AFTER: Separate folders
● Student folder: student info only
● Course folder: course info only
● Grade folder: which student got what grade
● Easy to find
● No duplication
● Update one place, done
TOPIC 2: FUNCTIONAL DEPENDENCIES (FD)
What is a Functional Dependency?
Simple Definition:
An attribute X functionally determines attribute Y (written X → Y) if:
Whenever X value is same, Y value is also same.
Real-Life Examples:
1. Email → Student_Name
If we know email: ram@[Link], we always know the name is Ram.
text
email | student_name
---------------------|-------------
ram@[Link] | Ram
priya@[Link] | Priya
raj@[Link] | Raj
Email uniquely determines name.
2. StudentID → Email, Address, City
If we know StudentID = 1, we know:
● Email =
● ram@[Link]
● Address = 123 Street
● City = Delhi
text
StudentID → Email
StudentID → Address
StudentID → City
(All three depend on StudentID)
3. CourseID → CourseName, Instructor
If we know CourseID = CS101, we know:
● CourseName = DBMS
● Instructor = Dr. Smith
What is NOT a Functional Dependency?
StudentID → Grade
Why NOT? Same StudentID can have different grades in different courses.
text
StudentID | CourseID | Grade
-----------|----------|-------
1 | CS101 | A
1 | CS102 | B ← Same StudentID, different
grades
StudentID alone doesn't determine Grade.
Needs (StudentID, CourseID) together to determine Grade.
Why FD Matters (Purpose)
FD tells us: "Which attributes can depend on which other attributes?"
This helps us design tables:
● Student info together (all depend on StudentID)
● Course info together (all depend on CourseID)
● Grades separate (depend on both StudentID + CourseID)
How to Find FD in Data
Example: Given data, find FDs:
text
StudentID | Name | Branch | GPA
-----------|-------|--------|------
1 | Ram | CSE | 3.8
2 | Priya | ECE | 3.9
3 | Raj | CSE | 3.5
FDs:
● StudentID → Name (each ID has one name)
● StudentID → Branch (each ID has one branch)
● StudentID → GPA (each ID has one GPA)
● NOT Branch → Name (CSE students have multiple names: Ram, Raj)
TOPIC 3: NORMAL FORMS (1NF, 2NF, 3NF, BCNF)
The Goal
Design tables so that:
✅ No redundancy
✅ No update anomalies
1.
✅ No insert anomalies
2.
✅ No delete anomalies
3.
4.
Higher normal form = Better design
1NF (First Normal Form) — No Repeating Groups
Rule: Each attribute must contain only single values (no arrays, no multi-values in
one cell).
Real-Life Example:
BAD (Not 1NF):
text
StudentID | Name | Courses
-----------|-------|------------------------------------------
1 | Ram | {CS101, CS102, CS103} ← Multiple courses
in one cell!
2 | Priya | {CS101, CS104}
Problem: Can't easily query "which students take CS101?" (courses embedded in
text).
GOOD (1NF):
text
StudentID | Name | CourseID
-----------|-------|----------
1 | Ram | CS101
1 | Ram | CS102
1 | Ram | CS103
2 | Priya | CS101
2 | Priya | CS104
Solution: Separate rows for each course-student pair.
How to achieve 1NF:
● Remove multi-valued attributes (like arrays)
● Create separate rows or separate table
2NF (Second Normal Form) — No Partial Dependencies
Prerequisites: Must be in 1NF first.
Rule: Non-key attributes must depend on entire primary key, not just part of it.
Real-Life Example:
BAD (1NF but not 2NF):
text
StudentID | CourseID | Instructor | Grade
-----------|----------|------------|-------
1 | CS101 | Dr. Smith | A
1 | CS102 | Dr. Jones | B
2 | CS101 | Dr. Smith | A+
Problem: Instructor depends on CourseID only, not on (StudentID, CourseID).
● If CourseID = CS101, Instructor is always Dr. Smith
● StudentID is irrelevant for determining Instructor
● Partial dependency: Instructor depends on part of the key (CourseID)
Redundancy: Dr. Smith repeated for every student in CS101.
GOOD (2NF):
Split into two tables:
Table 1: ENROLLMENT (StudentID, CourseID, Grade)
text
StudentID | CourseID | Grade
-----------|----------|-------
1 | CS101 | A
1 | CS102 | B
2 | CS101 | A+
Table 2: COURSE (CourseID, Instructor)
text
CourseID | Instructor
---------|----------
CS101 | Dr. Smith
CS102 | Dr. Jones
Now: Each attribute depends on entire primary key of its table.
● In ENROLLMENT: Grade depends on (StudentID, CourseID)
● In COURSE: Instructor depends on CourseID (which is entire key)
3NF (Third Normal Form) — No Transitive Dependencies
Prerequisites: Must be in 2NF first.
Rule: Non-key attributes must depend directly on primary key, not indirectly through
another attribute.
Real-Life Example:
BAD (2NF but not 3NF):
text
StudentID | Name | BranchID | BranchName | BranchLocation
-----------|-------|----------|------------|----------------
1 | Ram | 10 | CSE | Building A
2 | Priya | 20 | ECE | Building B
3 | Raj | 10 | CSE | Building A
Problem: Transitive Dependency
● StudentID → BranchID (direct)
● BranchID → BranchName (indirect through BranchID)
● StudentID → BranchName (transitive through BranchID)
Problem: BranchName, BranchLocation depend on BranchID, not directly on
StudentID.
Redundancy: "CSE, Building A" repeated for every CSE student.
Update Anomaly: If Building A changes to Building C, update multiple rows or
inconsistency.
GOOD (3NF):
Split into two tables:
Table 1: STUDENT (StudentID, Name, BranchID)
text
StudentID | Name | BranchID
-----------|-------|----------
1 | Ram | 10
2 | Priya | 20
3 | Raj | 10
Table 2: BRANCH (BranchID, BranchName, BranchLocation)
text
BranchID | BranchName | BranchLocation
---------|------------|----------------
10 | CSE | Building A
20 | ECE | Building B
Now: No transitive dependencies.
● In STUDENT: Name, BranchID depend directly on StudentID
● In BRANCH: BranchName, BranchLocation depend directly on BranchID
● Building location stored once, update in one place
BCNF (Boyce-Codd Normal Form) — Stricter 3NF
Rule: Every determinant must be a candidate key.
In Simple Terms: If X → Y, then X must be a candidate key (can identify unique
record).
Real-Life Example:
Scenario: University course assignment
text
Professor | Course | Time
-----------|-----------|----------
Dr. Smith | DBMS | 9 AM
Dr. Smith | OOP | 10 AM
Dr. Jones | DBMS | 2 PM
Dr. Patel | Networks | 11 AM
FDs:
● (Professor, Course) → Time
● Course → Professor (each course taught by one professor)
Problem: Course → Professor means Course is a determinant but not a candidate
key.
This violates BCNF (though it's in 3NF).
GOOD (BCNF):
Table 1: COURSE_PROFESSOR (CourseID, Professor)
text
CourseID | Professor
---------|----------
CS101 | Dr. Smith
CS102 | Dr. Smith
CS103 | Dr. Jones
CS104 | Dr. Patel
Table 2: CLASS_SCHEDULE (Professor, CourseID, Time)
text
Professor | CourseID | Time
-----------|----------|----------
Dr. Smith | CS101 | 9 AM
Dr. Smith | CS102 | 10 AM
Dr. Jones | CS103 | 2 PM
Dr. Patel | CS104 | 11 AM
Now: All determinants are candidate keys → BCNF satisfied.
Summary: Normal Forms Comparison
Normal Rule Focus Problem Eliminated
Form
1NF No repeating groups Atomic values only Multi-valued attributes
Non-key attrs depend Redundancy from partial
2NF No partial dependencies
on entire key key dependency
No transitive Non-key attrs depend Redundancy from indirect
3NF
dependencies directly on PK dependencies
Every determinant is
BCNF Stricter than 3NF Rare cases 3NF misses
candidate key
In Practice: Aim for 3NF (good balance) or BCNF (if requirements allow).
TOPIC 4: MULTIVALUED DEPENDENCIES (MVD) & 4NF
What is Multivalued Dependency?
Definition: Attribute X multivalued determines Y (X ↠ Y) if:
The set of Y values depends only on X value (not on other attributes).
Real-Life Example:
Scenario: University tracks professors' skills and languages
text
Professor | Skill | Language
-----------|--------------|----------
Dr. Smith | Teaching | English
Dr. Smith | Teaching | Hindi
Dr. Smith | Research | English
Dr. Smith | Research | Hindi
MVD: Professor ↠ Skill (Prof's skills independent of languages)
MVD: Professor ↠ Language (Prof's languages independent of skills)
Key point: Dr. Smith's skills {Teaching, Research} are independent of languages
{English, Hindi}.
So every skill pairs with every language (Cartesian product).
Problem with MVD:
Redundancy: All combinations stored explicitly:
● Teaching + English ✓
● Teaching + Hindi ✓
● Research + English ✓
● Research + Hindi ✓
If Dr. Smith learns French, add 2 new rows (one for each skill).
4NF (Fourth Normal Form)
Rule: Eliminate MVD by creating separate tables.
BAD (3NF but violates 4NF):
text
Professor | Skill | Language
-----------|--------------|----------
Dr. Smith | Teaching | English
Dr. Smith | Teaching | Hindi
Dr. Smith | Research | English
Dr. Smith | Research | Hindi
GOOD (4NF):
Table 1: PROFESSOR_SKILL (Professor, Skill)
text
Professor | Skill
-----------|----------
Dr. Smith | Teaching
Dr. Smith | Research
Table 2: PROFESSOR_LANGUAGE (Professor, Language)
text
Professor | Language
-----------|----------
Dr. Smith | English
Dr. Smith | Hindi
Benefits:
● No redundancy
● Easy to add new skill/language (add one row each)
● No unnecessary combinations
TOPIC 5: LOSSLESS vs LOSSY DECOMPOSITION
What is Decomposition?
Breaking one table into multiple tables.
Question: When we decompose, do we lose information?
Lossless-Join Decomposition (GOOD)
Definition: When we decompose and then join back, we get exactly the original table.
No information lost.
Real-Life Example:
Original Table:
text
StudentID | Name | CourseID | Grade
-----------|-------|----------|-------
1 | Ram | CS101 | A
1 | Ram | CS102 | B
2 | Priya | CS101 | A+
Decompose into:
Table 1: STUDENT (StudentID, Name)
text
StudentID | Name
-----------|-----
1 | Ram
2 | Priya
Table 2: ENROLLMENT (StudentID, CourseID, Grade)
text
StudentID | CourseID | Grade
-----------|----------|-------
1 | CS101 | A
1 | CS102 | B
2 | CS101 | A+
Join back (StudentID):
text
StudentID | Name | CourseID | Grade
-----------|-------|----------|-------
1 | Ram | CS101 | A
1 | Ram | CS102 | B
2 | Priya | CS101 | A+
Result: Exactly original table ✓ Lossless-join!
Lossy Decomposition (BAD)
Definition: When we decompose, information is lost. Joining back gives spurious
data.
Real-Life Example:
Original Table:
text
StudentID | Name | CourseID
-----------|-------|----------
1 | Ram | CS101
1 | Ram | CS102
2 | Priya | CS101
BAD Decomposition:
Table 1: STUDENT (StudentID, Name)
text
StudentID | Name
-----------|-----
1 | Ram
2 | Priya
Table 2: COURSE (CourseID) ← Missing StudentID!
text
CourseID
----------
CS101
CS102
Try to join: Can't! No common attribute between tables.
Even if we had COURSE(StudentID, CourseID), joining creates spurious data:
text
StudentID | Name | CourseID
-----------|-------|----------
1 | Ram | CS101 ← OK
1 | Ram | CS102 ← OK
2 | Priya | CS101 ← OK
2 | Priya | CS102 ← WRONG! Priya never took CS102
Lost information: Now Priya appears to take CS102, which she didn't.
How to Check Lossless-Join?
Using FDs: If decomposition respects FDs, it's likely lossless.
Real Test: Decompose R into R1 and R2. Check if:
R1 ⋈ R2 = R (joining back gives original)
TOPIC 6: DEPENDENCY PRESERVATION
Definition: A decomposition is dependency-preserving if all original FDs are
preserved in the decomposed tables.
Real-Life Example:
Original Table with FD: StudentID → Name, CourseID → Instructor
text
StudentID | Name | CourseID | Instructor
-----------|-------|----------|----------
1 | Ram | CS101 | Dr. Smith
Decomposition:
Table 1: STUDENT (StudentID, Name)
FD: StudentID → Name ✓ (preserved)
Table 2: COURSE (CourseID, Instructor)
FD: CourseID → Instructor ✓ (preserved)
All FDs preserved → Dependency-preserving decomposition ✓
NOW: ALL UNIT III PYQS WITH ANSWERS
2-MARK QUESTIONS (Section A)
Q 2022-23(e): "List all prime and non-prime attributes in
R(A,B,C,D,E) with FD set F = {AB→C, B→E, C→D}."
ANSWER:
Step 1: Find Primary Key
Check which attributes can determine all others:
● A alone? No (can't determine B, D, E)
● B alone? No (can only get E, not A, C)
● AB together?
● AB → C (given)
● AB → E (B → E, so AB → E)
● AB → D (C → D, and AB → C, so AB → D)
● AB → all ✓
Primary Key = {A, B}
Step 2: Prime vs Non-Prime
● Prime attributes: In primary key = A, B
● Non-prime attributes: Not in key = C, D, E
Answer:
● Prime attributes: A, B
● Non-prime attributes: C, D, E
Q 2024-25(e): "List all prime and non-prime attributes in
R(A,B,C,D,E) with FD set F = {AB→C, B→E, C→D}."
ANSWER:
(Same as above)
Q 2022-23(f): "Why do we normalize database?"
ANSWER:
1. Eliminate Redundancy: Remove duplicate data storage
2. Prevent Update Anomalies: Avoid inconsistency when updating
3. Prevent Insert Anomalies: Allow inserting data even if some attributes empty
4. Prevent Delete Anomalies: Delete one record without losing related data
5. Improve Data Integrity: Maintain consistency and accuracy
6. Reduce Storage: Store data once, use everywhere
10-MARK QUESTIONS (Section B & C)
Q 2022-23(2c): "Given FD set F = {AB→C, C→A, BC→D,
ACD→B, BE→C, EC→FA, CF→BD, D→E}, Find a minimum
cover."
ANSWER:
Minimum Cover (Canonical Cover): Smallest equivalent FD set with no redundant
dependencies.
Algorithm:
Step 1: Remove extraneous attributes on RHS (decompose)
Split all FDs to single attribute on RHS:
text
AB→C, C→A, BC→D, ACD→B, BE→C, EC→F, EC→A, CF→B, CF→D, D→E
Step 2: Remove redundant FDs
Check which FDs are redundant (can be inferred from others).
After checking dependencies:
● D→E is essential
● EC→A is redundant (can infer from others)
● EC→F is essential
● etc.
Minimum Cover (one possible answer):
text
AB→C, C→A, BC→D, D→E, EC→F, CF→B
(Exact answer depends on careful analysis of all dependencies)
Q 2023-24(5a): "Given FD set on R(V,W,X,Y,Z): {Z→V, W→Y,
XY→Z, V→WX}. Check if decompositions are lossless."
ANSWER:
Decomposition 1: R1(V,W,X), R2(V,Y,Z)
Check if R1 ⋈ R2 = R:
● Common attribute: V
● FD Z→V tells us Z determines V
● But Z not in R1 alone
● Likely lossy (need detailed check via joining)
Decomposition 2: R1(V,W,X), R2(X,Y,Z)
Check if R1 ⋈ R2 = R:
● Common attribute: X
● XY→Z (given)
● X from R1, Y from R2
● When joined on X, can reconstruct Z ✓
● Likely lossless
Q 2023-24(5b): "R(A,B,C,D,E,F,G,H,I,J) with FDs {AB→C,
A→DE, B→F, F→GH, D→IJ}. Find key and decompose into
2NF and 3NF."
ANSWER:
Step 1: Find Primary Key
Check which attributes determine all:
● A alone? No (B not determinable)
● B alone? No (A not determinable)
● AB together?
● AB→C (given)
● A→DE (given)
● B→F (given)
● F→GH (given)
● D→IJ (given)
● AB → all ✓
Key = {A, B}
Step 2: Decompose to 2NF (Remove Partial Dependencies)
Partial dependencies from key {A,B}:
● A→D, A→E (depend on A only)
● B→F (depends on B only)
Create separate tables for partial dependencies:
Table 1: R1(A, D, E, I, J)
● A→DE, D→IJ
Table 2: R2(B, F, G, H)
● B→F, F→GH
Table 3: R3(A, B, C)
● AB→C (full key dependency)
Step 3: Decompose to 3NF (Remove Transitive Dependencies)
From Table 1: A→D→IJ (transitive)
Split:
Table 1a: R1a(A, D, E)
● A→DE
Table 1b: R1b(D, I, J)
● D→IJ
From Table 2: B→F→GH (transitive)
Split:
Table 2a: R2a(B, F)
● B→F
Table 2b: R2b(F, G, H)
● F→GH
Final 3NF Tables:
text
R1a(A, D, E)
R1b(D, I, J)
R2a(B, F)
R2b(F, G, H)
R3(A, B, C)
Q 2023-24(5c): "Describe MVD with example. Discuss 4NF
and 5NF."
ANSWER:
Multivalued Dependency (MVD) Definition:
Attribute X multivalued determines Y (X ↠ Y) if the set of Y values for a given X value
is independent of other attributes.
Real Example:
University professor-skills-languages:
text
Professor | Skill | Language
-----------|-----------|----------
Dr. Smith | Teaching | English
Dr. Smith | Teaching | Hindi
Dr. Smith | Research | English
Dr. Smith | Research | Hindi
MVDs:
● Professor ↠ Skill
● Professor ↠ Language
(Skill and Language independent; every skill pairs with every language)
Problem: Redundancy and anomalies (similar to 2NF/3NF problems).
4NF (Fourth Normal Form)
Rule: Remove MVDs by placing independent multivalued attributes in separate
tables.
Example:
Table 1: PROFESSOR_SKILL
text
Professor | Skill
-----------|----------
Dr. Smith | Teaching
Dr. Smith | Research
Table 2: PROFESSOR_LANGUAGE
text
Professor | Language
-----------|----------
Dr. Smith | English
Dr. Smith | Hindi
Benefits:
● No redundancy (skills/languages stored once)
● Easy to add new skill (just add row to Table 1)
● No spurious combinations
5NF (Fifth Normal Form / Project-Join Normal Form)
Rule: Eliminate join dependencies (complex relationships between multiple
attributes).
Advanced: Rarely used in practice. Requires decomposition into multiple tables
where only certain joins are valid.
Example: 3-way relationships like (Project, Part, Supplier) where not all combinations
valid.
text
Supplier S can supply Part P for Project J only if all three
exist together.
5NF ensures: Only valid combinations of suppliers, parts, projects are stored.
Q 2024-25(5a): "Library database: Book(Title, Author,
Catalog_no, Publisher, Year, Price), Collection(Title, Author,
Catalog_no) with FDs. Find highest normal form."
Given FDs:
text
I. Title, Author → Catalog_no
II. Catalog_no → Title, Author, Publisher, Year
III. Publisher, Title, Year → Price
ANSWER:
Book table analysis:
Check 1NF: All attributes atomic ✓
Check 2NF:
● Key candidates: {Title, Author}, {Catalog_no}
● Primary key: Catalog_no (unique identifier)
● All non-key attributes {Title, Author, Publisher, Year, Price} depend on entire
key ✓
Check 3NF:
● Catalog_no → Title, Author ✓ (direct to key)
● Catalog_no → Publisher, Year ✓ (direct to key)
● Publisher, Title, Year → Price
● This depends on Publisher, Title, Year (not directly on Catalog_no)
● But these are determined by Catalog_no anyway
● Transitive? No transitive dependency exists for non-key attrs
Book is in 3NF
Collection table analysis:
Same analysis: Collection is in 3NF
Answer: Highest Normal Form = 3NF
Q 2024-25(5b): "R(A,B,C,D,E,F,G,H,I,J) with FDs. Find key
and define MVD with example."
ANSWER:
(Key finding: Same as Q 2023-24(5b) above)
MVD Definition with Example:
(Already covered above in TOPIC 6)
Q 2022-23(5a): "Decomposition R(V,W,X,Y,Z) into
R1(V,W,X), R2(V,Y,Z). Check lossless-join."
ANSWER:
(Already covered in TOPIC 5 above)
Method:
1. Common attribute: V
2. Check if join R1 ⋈ R2 equals original R
3. Analyze FDs to confirm lossless
Q 2024-25(2c): "Decomposition check - lossy or lossless?"
Given: R(P, Q, S, T, X, Y, Z, W) with FDs. Two decompositions:
Decomposition 1: R = [(P,Q,S,T), (P,T,X), (Q,Y), (Y,Z,W)]
Decomposition 2: R = [(P,Q,S), (T,X), (Q,Y), (Y,Z,W)]
ANSWER:
Check each decomposition by:
1. Listing attributes in each table
2. Checking if joining back recovers all original attributes
3. Verifying no spurious data
D1 Analysis:
● (P,Q,S,T) + (P,T,X) → can join on P,T → recovers X
● (Q,Y) + (Y,Z,W) → can join on Y → recovers Z,W
● All original attributes present
● Likely lossless-join (if FD relationships support)
D2 Analysis:
● (P,Q,S) missing T and X initially
● (T,X) has T but no P,Q (can't join directly)
● Missing connection between first group and T,X
● Likely lossy (information lost in joins)