0% found this document useful (0 votes)
7 views30 pages

PP 4 Solution

The document outlines the components of a database system, including hardware, software, data, users, and procedures. It explains data models and their types, as well as the concepts of super keys, candidate keys, and primary keys. Additionally, it discusses integrity constraints, concurrency issues, transaction logs, and recovery processes in database management.

Uploaded by

mikehunt51214
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
7 views30 pages

PP 4 Solution

The document outlines the components of a database system, including hardware, software, data, users, and procedures. It explains data models and their types, as well as the concepts of super keys, candidate keys, and primary keys. Additionally, it discusses integrity constraints, concurrency issues, transaction logs, and recovery processes in database management.

Uploaded by

mikehunt51214
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd

Here is a simple, clear, and detailed answer for all parts (a, b, c).

I will explain in easy words (as you prefer simple form).

a) Components of a Database System

A database system has five major components:

1. Hardware

 Physical devices where the database stores data.

 Examples: hard disks, SSDs, servers, RAM, network devices.

 Purpose: store data safely and allow fast access.

2. Software
 All programs used to manage the database.

 Mainly includes the DBMS (Database Management System) like MySQL, Oracle, SQL
Server.

 DBMS handles:

o Data storage

o Query processing

o Backups

o Security

o Concurrency control (handling multiple users)

3. Data

 The actual information stored in the database.

 Example: Student table, Customer records, Marks, Employees etc.

 Data is organized using tables, schemas, and relationships.

4. Users

People who interact with the DBMS:

a. Database Administrator (DBA)

 Controls entire database, manages security, backups, performance.

b. Application Programmers

 Write programs/apps that use the database (e.g., banking app developers).

c. End Users

 Actual users like students using LMS, customers using shopping app.

5. Procedures

 Instructions and rules for using the database.


 Example:

o Steps for backup

o User login procedures

o Security rules

o How to restore data

o How to run a specific stored procedure

b) What is a Data Model?

A data model is a framework that describes:

 How data is stored

 How data is connected

 How data can be accessed

It provides structure and rules for organizing data.

Types of Data Models

1. Hierarchical Data Model

 Data arranged in a tree structure (parent → child).

 Each child has one parent only.

 Example: File system folders.

2. Network Data Model

 Like hierarchical but more flexible.

 A child can have multiple parents.

 Uses graphs instead of trees.

3. Relational Data Model (MOST IMPORTANT)


 Data stored in tables (relations).

 Tables have rows (tuples) and columns (attributes).

 Based on mathematical relations.

 Example: MySQL, SQL Server, Oracle.

4. Object-Oriented Data Model

 Data stored as objects (similar to OOP).

 Supports classes, inheritance, methods.

5. Entity–Relationship (ER) Model

 Uses entities, attributes, relationships.

 Represented using ER diagrams.

 Example: Student–Course enrollment diagram.

c) Super Key, Candidate Key & Primary Key

Let’s take a sample table:

Student Table

RollNo CNIC Name Email Phone

12 35201-... Ali ali@[Link] 0300-123

Assume:

 RollNo is unique

 CNIC is unique

 Email is unique

 Phone is NOT unique

1. Super Key
A super key is any set of attributes that uniquely identifies a record.

Examples:

 {RollNo}

 {CNIC}

 {Email}

 {RollNo, Name}

 {RollNo, Phone}

 {RollNo, Email, Name}

👉 Even if it has extra unnecessary attributes, it’s still a superkey.

Easy Definition:
✔ A superkey = unique identifier + extra fields allowed.

2. Candidate Key

A candidate key is a minimal superkey.


Minimal means:

 It can uniquely identify a row

 It has no extra unnecessary attributes

From our table:

Candidate keys are:

 {RollNo}

 {CNIC}

 {Email}

Why?
Because each of them individually can identify a row and have no extra attribute.

3. Primary Key

A primary key is the candidate key that we choose to use as the main unique identifier.
From candidate keys {RollNo}, {CNIC}, {Email},
the designer might choose RollNo as the primary key.

✔ Primary key must be:

 Unique

 Minimal

 Not null

 Does not change

Short Summary (easy to remember)

Key Type Meaning Example

Super Key Any attribute(s) that uniquely identifies a row RollNo, CNIC, (RollNo + Name)

Candidate Key Minimal super key (no extra attributes) RollNo, CNIC, Email

Primary Key Chosen candidate key RollNo

If you want, I can also make MCQs, short questions, or exam notes for this topic.

Nice set of questions — I’ll explain each in clear points + concrete examples so it’s easy to
remember.

1) Entity integrity and referential integrity (with example)

Quick definitions (one-line)

 Entity integrity: every table row must have a unique, non-NULL primary key.

 Referential integrity: foreign key values in one table must either match a primary (or
candidate) key value in the referenced table or be NULL (if allowed).

Why they matter

 Entity integrity ensures every record is uniquely identifiable.

 Referential integrity ensures relationships between tables are consistent (no “dangling”
references).
Example (Student & Enrollment)

Two tables:

Student

StudentID (PK) Name Email

101 Ali ali@[Link]

102 Sara sara@[Link]

Enrollment

EnrollID (PK) StudentID (FK) Course

1 101 CS101

2 103 MATH201

 Entity integrity: [Link] and [Link] are primary keys → cannot


be NULL and must be unique. If someone tries INSERT INTO Student(StudentID, Name)
VALUES (NULL, 'Zeeshan') the DBMS rejects it because PK cannot be NULL.

 Referential integrity: [Link] is a foreign key referencing


[Link]. In the example above the row with StudentID = 103 in Enrollment
violates referential integrity if StudentID 103 does not exist in Student. The DBMS will
reject that insert (or you must define rules to handle it).

Referential actions (what DBMS can do when referenced row changes)

When you delete or update a referenced Student row, DBMS can be told to:

 CASCADE — automatically delete/update child rows (e.g., delete enrollments when


student deleted).

 SET NULL — set the child FK to NULL (if allowed).

 SET DEFAULT — set to a default value.

 RESTRICT / NO ACTION — prevent the delete/update if child rows exist.

Short checklist

 PK ≠ NULL, unique → entity integrity.

 FK either matches an existing PK or is NULL (if allowed) → referential integrity.

 Use cascade or restrict policies to control how deletes/updates propagate.


2) Concurrent transactions and common problems

What are concurrent transactions?

Multiple transactions executing at (roughly) the same time on the same database. Concurrency
is necessary for throughput and responsiveness, but it can create consistency problems if not
controlled.

ACID reminder

 Atomacity, Consistency, Isolation, Durability.


Concurrency control mainly enforces Isolation so transactions don’t interfere.

Common problems (with short examples)

1. Lost Update

o Two transactions T1 and T2 read the same row and then both update it; one
update is lost.

o Example: Account balance 100.

 T1 reads 100, adds 50 → will write 150.

 T2 reads 100, subtracts 20 → will write 80.

 If T2 commits last and overwrites, T1’s effect is lost.

2. Dirty Read

o T1 updates a value but hasn’t committed. T2 reads that uncommitted value. If T1


later rolls back, T2 has used invalid data.

o Example: T1 sets salary = 2000 but aborts; T2 read salary = 2000 and used it in a
calculation → wrong.

3. Non-repeatable Read

o T1 reads a row twice during its execution and gets different values because T2
modified and committed in between.

o Example: T1 reads student grade = 80, later reads again and now sees 85
(changed by T2).

4. Phantom Read
o T1 runs a query that retrieves a set of rows (e.g., all students with grade > 75). T2
inserts or deletes rows that would change the result set. When T1 runs the same
query again it sees different rows (new “phantoms”).

o Different from non-repeatable read because the set membership changes.

5. Write Skew (in some weaker isolation levels, e.g., snapshot isolation anomalies)

o Two transactions concurrently check conditions and write different rows, leading
to a constraint violation that wouldn’t happen if serialized.

How DBMS prevent/mitigate these problems

 Isolation levels (SQL standard): READ UNCOMMITTED, READ COMMITTED, REPEATABLE


READ, SERIALIZABLE

o Higher levels prevent more problems but reduce concurrency.

o E.g., READ COMMITTED prevents dirty reads; REPEATABLE READ prevents non-
repeatable reads; SERIALIZABLE prevents phantoms (gives serial execution
effect).

 Locking protocols

o Two-Phase Locking (2PL): acquire locks, then release after commit (ensures
serializability if strict). Locks: shared (S) for read, exclusive (X) for write.

 Optimistic concurrency control

o Transactions proceed without locks; check for conflicts at commit time (good for
low contention).

 Timestamp ordering

o Each transaction gets timestamp; DBMS orders reads/writes by timestamp.

 MVCC (Multi-Version Concurrency Control)

o Keeps multiple versions so readers don’t block writers (used by PostgreSQL,


Oracle, MySQL InnoDB). Different isolation semantics depending on
implementation (prevents many read issues, but can still allow some anomalies
unless serializable).

Practical advice
 Pick isolation level based on application needs: banking → SERIALIZABLE or strict
locking; reporting dashboards → lower isolation (READ COMMITTED / snapshot) to
improve concurrency.

 Use short transactions and avoid user interaction inside a transaction to reduce lock wait
times.

3) Role of the transaction log and the database change log in recovery

I’ll treat transaction log (also called WAL / redo/undo log) and database change log (e.g., binary
log, change data capture logs) as two related but distinct things — both help recovery and
replication.

Transaction Log (Write-Ahead Log, WAL)

What it is:

 A sequential log that records every change a transaction makes (often recording before
and/or after images or logical operations). Examples: SQL Server transaction log,
PostgreSQL WAL.

Key properties / rules:

 Write-Ahead: log records are written to stable storage before the actual data pages are
written to disk. This ensures changes can be redone or undone after a crash.

 Sequential append — fast to write.

Roles in recovery:

1. Crash recovery (atomicity & durability):

o After a crash, the DBMS scans the log to:

 Redo committed transactions whose data pages may not have been
flushed to disk (bring DB to the state of all committed transactions).

 Undo uncommitted transactions (roll back their partial effects).

o This guarantees atomicity (no partial transactions remain) and durability


(committed changes persist).

2. Point-in-time recovery (PITR):

o Restore the last full backup, then apply logged changes up to a chosen point in
time using the transaction log.
3. Crash → steps simplified:

o Read log from last checkpoint, identify committed vs uncommitted txns, redo
committed, undo uncommitted.

Example:

 You commit transaction T100. The DBMS writes T100’s log record to WAL and
acknowledges commit. Later the DBMS writes the actual data pages lazily. If a crash
happens before pages are written, recovery reads WAL and reapplies T100 (redo),
ensuring T100’s effects are present.

Database Change Log (binlog / CDC)

What it is:

 A log of changes meant often for replication, auditing, or change data capture.
Examples: MySQL binary log (binlog), Debezium CDC streams, Oracle redo +
supplemental logs used for replication.

Role in recovery & other uses:

 Replication: Master writes binlog; replicas read and apply changes to stay synchronized.

 Point-in-time recovery / audit: Can replay binlog after a backup to recreate DB state at a
specific time or to extract which rows changed when.

 Auditing / CDC pipelines: Downstream systems use change log to update caches, search
indexes, analytics.

Difference between transaction log and binlog (typical):

 Transaction log (WAL): primarily for crash recovery and internal consistency
(redo/undo). May be physical/low-level. Not always intended for external consumption.

 Change log / binlog: often logical (e.g., SQL statements or row images) and used for
replication and audit. Format and retention/purposes may differ from WAL.

Note: Some DBMS combine responsibilities; e.g., MySQL’s InnoDB has its redo log (WAL) and
MySQL also has a binary log (binlog) used for replication and PITR.

Recovery scenarios — how logs are used (stepwise)

1. Crash recovery (server crash):


o DBMS starts → reads transaction log (from last checkpoint).

o Redo all committed transactions not yet applied to data files.

o Undo any transactions that were active and uncommitted at crash time.

2. Media failure (disk lost) → full restore + recovery:

o Restore from full backup (and possibly incremental backups).

o Apply transaction logs / binlogs in order to roll forward to desired point-in-time.

3. Point-in-time recovery:

o Restore backup → apply logs up to the timestamp you want → stop.

Good practices for logs & recovery

 Frequent checkpoints to limit how far back recovery must scan.

 Separate log devices (fast disks) to improve durability and performance.

 Backup + log shipping (regularly copy logs offsite) for disaster recovery.

 Retention policy for binlogs and WALs must balance space vs ability to recover to older
points.

Short summary (one-paragraph)

 Entity integrity = PKs must be unique & not NULL; referential integrity = FKs must refer
to existing PKs or be NULL/handled by rules (cascade, restrict).

 Concurrent transactions increase throughput but can cause lost updates, dirty reads,
non-repeatable reads, phantoms, etc.; DBMS use isolation levels, locking, MVCC, or
optimistic control to prevent them.

 Transaction log (WAL) is the DB’s internal sequential log used primarily for crash
recovery, redo/undo and durability; database change log / binlog is often a logical log
used for replication, auditing, and point-in-time recovery. Both are essential to bring a
database back to a consistent state after failures.

If you want, I can:

 Draw a small timeline showing locks, reads, writes for a lost-update example.
 Create short exam-style questions from these topics.

 Show SQL examples to define FK with CASCADE/RESTRICT and how a DB rejects actions
that violate referential integrity.

Which would help you most next?

Below is a clear, step-by-step, detailed explanation of the problem, including:

✔ Entities
✔ Attributes
✔ Relationship types
✔ Cardinalities (min–max)

I will explain in simple language with examples.

Step 1: Identify the Entities

From the story, the main objects we store information about are:

1. Customer

A person who brings a car for an estimate.

Attributes

 CustomerID (PK)

 Name

 Phone

 Address

 Email

2. Car
A car owned by a customer.

Attributes

 CarID (PK)

 Make (e.g., Toyota)

 Model

 Year

 Color

 LicensePlate

 CustomerID (FK → Customer)

A customer can have multiple cars.

3. Mechanic

A mechanic who performs repairs.

Attributes

 MechanicID (PK)

 Name

 Phone

 SkillLevel

4. Estimate

Before job approval, the mechanic gives an estimate.

Attributes

 EstimateID (PK)

 EstimatedCost

 EstimatedTime

 Date
 MechanicID (FK)

 CarID (FK)

5. Job

Created when the customer accepts the estimate.

Attributes

 JobID (PK)

 JobDate (appointment date)

 ActualStartDate

 Status

 CarID (FK)

 CustomerID (FK)

 MechanicID (FK)

 EstimateID (FK)

6. Repair

A job is made up of several repairs (e.g., “Replace left fender”, “Paint door”).

Attributes

 RepairID (PK)

 RepairType (e.g., painting, body repair)

 Description

 LaborHours

 JobID (FK)

7. PartsUsed

Each repair may use several parts.


Attributes

 PartID (PK)

 PartName

 UnitPrice

 Quantity

 RepairID (FK)

Step 2: Identify the Relationships + min–max cardinalities

I will describe each relationship and also give the min–max notation like:
Customer — (1,N) Car

1. Customer — Car

Relationship: A customer owns a car.

 One customer can have many cars.

 Each car belongs to exactly one customer.

Cardinality (min–max):

 Customer → Car = (1, N)

 Car → Customer = (1, 1)

Min interpretation:

 A customer must own at least 1 car to appear in this system.

 A car must belong to exactly 1 customer.

2. Car — Estimate

Relationship: Car gets inspected for an estimate.

 A car may get multiple estimates over time.

 An estimate belongs to exactly one car.


Cardinality:

 Car → Estimate = (0, N)

 Estimate → Car = (1, 1)

Min 0 because not every car must have an estimate.

3. Mechanic — Estimate

Relationship: Mechanic prepares estimate.

 A mechanic can prepare many estimates.

 Each estimate is prepared by one mechanic.

Cardinality:

 Mechanic → Estimate = (0, N)

 Estimate → Mechanic = (1, 1)

4. Estimate — Job

Relationship: A job is created only when estimate is accepted.

 Each estimate can lead to at most 1 job.

 A job must come from exactly 1 estimate.

Cardinality:

 Estimate → Job = (0, 1)

 Job → Estimate = (1, 1)

5. Job — Mechanic

Relationship: Only ONE mechanic works on a job.

 A mechanic can handle many jobs.

 A job has exactly one mechanic.

Cardinality:
 Mechanic → Job = (0, N)

 Job → Mechanic = (1, 1)

6. Job — Repair

Relationship: A job includes several repairs.

 A job must have at least 1 repair, usually many.

 A repair belongs to one job only.

Cardinality:

 Job → Repair = (1, N)

 Repair → Job = (1, 1)

7. Repair — PartsUsed

Relationship: Repairs need parts.

 One repair may use zero or many parts.


(Example: painting job uses no part)

 A part record belongs to one repair.

Cardinality:

 Repair → PartsUsed = (0, N)

 PartsUsed → Repair = (1, 1)

Final Summary Table

Entity Key Attributes

Customer CustomerID, Name, Phone

Car CarID, Make, Model, Year, LicensePlate

Mechanic MechanicID, Name

Estimate EstimateID, EstimatedCost, EstimatedTime


Entity Key Attributes

Job JobID, JobDate, Status

Repair RepairID, RepairType, LaborHours

PartsUsed PartID, PartName, Quantity

Final Relationship Summary with Min–Max

Relationship Min–Max Meaning

Customer–Car Customer (1,N), Car (1,1) Customer owns many cars

Car–Estimate Car (0,N), Estimate (1,1) Car can have many estimates

Mechanic–Estimate Mechanic (0,N), Estimate (1,1) Mechanic prepares estimates

Estimate–Job Estimate (0,1), Job (1,1) Job created from estimate

Mechanic–Job Mechanic (0,N), Job (1,1) One mechanic per job

Job–Repair Job (1,N), Repair (1,1) Job contains many repairs

Repair–Parts Repair (0,N), Part (1,1) Repair may use many parts

If you want, I can also create:

✔ Full ER diagram (text format)


✔ ER diagram image (uploadable)
✔ Convert this to Relational Schema (tables with keys)
✔ Solve past-paper questions based on this problem

Just tell me!

Nice — let’s normalize the PROJECT relation to 3NF step by step. I’ll show:

 the definition of each normal form briefly,

 the functional dependencies (FDs) we infer from the problem,

 the anomalies (insertion / deletion / update) present at each stage, and


 the decomposition we perform with final 3NF relations, keys, and SQL DDL.

Given relation (un-normalized)

PROJECT(PRONO, PRONAME, BUDGET, STARTDATE, PROMGR,

EMPID, HOURS, EMPNAME, SALARY, EMPMGR, DEPTNO, DNAME, RATING)

Notes / assumptions (from the problem text)

 PRONO uniquely identifies a project.

 PROMGR = the project manager; we assume this is an EMPID (i.e. a manager is an


employee).

 EMPID uniquely identifies an employee; EMPMGR = that employee’s manager (also an


EMPID).

 Many-to-many between employees and projects; HOURS and RATING are per (EMPID,
PRONO).

Step 0 — Identify natural Functional Dependencies (FDs)

From the description we can infer these FDs:

1. PRONO → PRONAME, BUDGET, STARTDATE, PROMGR


(a project number determines project attributes and its manager)

2. EMPID → EMPNAME, SALARY, EMPMGR, DEPTNO


(an employee id determines employee attributes and department)

3. DEPTNO → DNAME
(department number determines department name)

4. PRONO, EMPID → HOURS, RATING


(for a given project and employee the hours and rating are determined)

Other implied facts:

 PROMGR is an EMPID (FK to employee).

 Candidate key for the original relation = {PRONO, EMPID} (because each tuple is an
assignment of an employee to a project).
Brief definitions of normal forms (for reference)

 1NF: All attributes atomic (no repeating groups).

 2NF: 1NF + no partial dependency of non-key attribute on part of a composite key. (Only
applies when key is composite.)

 3NF: 2NF + no transitive dependency of non-key attribute on the key (i.e., non-key
attributes depend only on keys, not on other non-key attributes), or every non-prime
attribute is non-transitively dependent on every candidate key. Equivalent statement: for
every FD X→A, either X is a superkey or A is prime or A is part of a relation that is keyed
by X (or the FD is trivial).

Stage A — Original relation: check 1NF

 Attributes look atomic, so 1NF holds.

Anomalies (present now):

 Insertion anomaly: You cannot insert a new project without also inserting at least one
employee tuple (since EMPID is part of the PK PRONO,EMPID) — or you must insert a
dummy employee.

 Deletion anomaly: If the last employee assigned to a project is removed, you lose
project info (name, budget, manager) because project attributes are stored in the same
tuple.

 Update anomaly: If project manager PROMGR changes, you must update all tuples for
that project (all employees on that project). If you forget, inconsistent PROMGR values
appear. Similarly, if department name DNAME changes, it must be updated in all tuples
for employees of that department.

So although 1NF holds, the relation is poorly designed due to redundancy.

Stage B — Move to 2NF (remove partial dependencies)

Why 2NF is violated: The primary key is composite {PRONO, EMPID}. Non-key attributes that
depend on only part of the key cause partial dependencies:

 PRONO → PRONAME, BUDGET, STARTDATE, PROMGR is a partial dependency because


these attributes depend on PRONO only, not on EMPID.
 EMPID → EMPNAME, SALARY, EMPMGR, DEPTNO is a partial dependency because these
depend on EMPID only.

To remove partial dependencies we decompose into relations where each non-key attribute
depends on the whole key of its relation:

Decompose into:

1. PROJECT(PRONO, PRONAME, BUDGET, STARTDATE, PROMGR)

o Key: PRONO

o FDs: PRONO → PRONAME, BUDGET, STARTDATE, PROMGR

2. EMPLOYEE(EMPID, EMPNAME, SALARY, EMPMGR, DEPTNO)

o Key: EMPID

o FDs: EMPID → EMPNAME, SALARY, EMPMGR, DEPTNO

3. WORKS_ON(EMPID, PRONO, HOURS, RATING)

o Key: composite (EMPID, PRONO)

o FDs: (EMPID, PRONO) → HOURS, RATING

4. DEPARTMENT(DEPTNO, DNAME) — we can extract DEPT now or wait to 3NF; including it


here helps.

After decomposition into these relations, 2NF is satisfied because there are no partial
dependencies in any of the three relations.

Anomalies removed/remaining after 2NF:

 Insertion: You can now insert a project into PROJECT without inserting EMPID —
insertion anomaly removed for projects. You can insert an employee into EMPLOYEE
without project data.

 Deletion: Deleting last WORKS_ON row for a project no longer deletes project info
(project row is separate).

 Update: Changing project data or employee data happens in one place now (PROJECT or
EMPLOYEE), removing many update anomalies.

But 2NF may still allow transitive dependencies (i.e., non-key → non-key). We must check for
3NF.
Stage C — Move to 3NF (remove transitive dependencies)

Look for transitive dependencies:

 In EMPLOYEE: EMPID → DEPTNO and DEPTNO → DNAME. So EMPID → DNAME


transitively via DEPTNO. This is a transitive dependency: a non-key attribute DNAME
depends on EMPID through DEPTNO. To remove it, create DEPARTMENT relation.

 Also ensure PROMGR in PROJECT is not causing a transitive FD (we treat PROMGR as an
EMPID foreign key; it is allowed in PROJECT as a key reference — PROMGR itself is not
giving other project attributes).

So decompose EMPLOYEE and PROJECT as needed:

Final 3NF relations:

1. PROJECT(PRONO, PRONAME, BUDGET, STARTDATE, PROMGR)

o Key: PRONO

o Notes: PROMGR is an EMPID (FK → [Link]).

2. EMPLOYEE(EMPID, EMPNAME, SALARY, EMPMGR, DEPTNO)

o Key: EMPID

o Notes: EMPMGR is an EMPID (self-reference: manager is an employee). DEPTNO


references DEPARTMENT.

3. DEPARTMENT(DEPTNO, DNAME)

o Key: DEPTNO

o FD: DEPTNO → DNAME

4. WORKS_ON(EMPID, PRONO, HOURS, RATING)

o Key: composite (EMPID, PRONO)

o FD: (EMPID, PRONO) → HOURS, RATING

Now 3NF holds:

 In each relation, every non-prime attribute depends on the key, the whole key, and
nothing but the key (no transitive non-key dependencies remain).

Stage D — Final FDs and keys (3NF)


Summarize FDs by relation:

 PROJECT

o Key: PRONO

o FDs: PRONO → PRONAME, BUDGET, STARTDATE, PROMGR

 EMPLOYEE

o Key: EMPID

o FDs: EMPID → EMPNAME, SALARY, EMPMGR, DEPTNO

o EMPMGR is an EMPID value (self FK) — no FD implying manager attributes are


stored here.

 DEPARTMENT

o Key: DEPTNO

o FDs: DEPTNO → DNAME

 WORKS_ON

o Key: (EMPID, PRONO)

o FDs: (EMPID, PRONO) → HOURS, RATING

Foreign keys / relationships:

 [Link] → [Link] (a project manager is an employee).

 [Link] → [Link].

 WORKS_ON.EMPID → [Link].

 WORKS_ON.PRONO → [Link].

 [Link] → [Link] (self-reference).

Stage E — Anomalies after 3NF (summary)

After normalization to 3NF:

 Insertion anomalies: largely resolved. You can insert a project, employee, or department
independently. You can insert a WORKS_ON only if the referenced employee and project
exist (FK), which is desirable.
 Deletion anomalies: deleting a WORKS_ON row does not remove the project or
employee info. Deleting a project row will remove project info but not employee info
(and will need cascade or restrict on WORKS_ON depending on design).

 Update anomalies: updating e.g., PRONAME occurs only in PROJECT, DNAME only in
DEPARTMENT, so update anomalies are removed.

Stage F — Example SQL DDL (3NF schema)

-- DEPARTMENT

CREATE TABLE DEPARTMENT (

DEPTNO INT PRIMARY KEY,

DNAME VARCHAR(100) NOT NULL

);

-- EMPLOYEE

CREATE TABLE EMPLOYEE (

EMPID INT PRIMARY KEY,

EMPNAME VARCHAR(100) NOT NULL,

SALARY DECIMAL(12,2),

EMPMGR INT, -- manager is an employee (nullable if top manager)

DEPTNO INT,

FOREIGN KEY (EMPMGR) REFERENCES EMPLOYEE(EMPID),

FOREIGN KEY (DEPTNO) REFERENCES DEPARTMENT(DEPTNO)

);

-- PROJECT

CREATE TABLE PROJECT (

PRONO INT PRIMARY KEY,


PRONAME VARCHAR(200) NOT NULL,

BUDGET DECIMAL(14,2),

STARTDATE DATE,

PROMGR INT, -- project manager (employee)

FOREIGN KEY (PROMGR) REFERENCES EMPLOYEE(EMPID)

);

-- WORKS_ON (relationship between employees and projects)

CREATE TABLE WORKS_ON (

EMPID INT,

PRONO INT,

HOURS DECIMAL(6,2),

RATING INT,

PRIMARY KEY (EMPID, PRONO),

FOREIGN KEY (EMPID) REFERENCES EMPLOYEE(EMPID),

FOREIGN KEY (PRONO) REFERENCES PROJECT(PRONO)

);

Notes:

 You may want ON DELETE RESTRICT or ON DELETE CASCADE depending on policy


(usually restrict: don’t delete employees/projects if work assignments exist).

 EMPMGR and PROMGR can be NOT NULL if business rules require managers always
present; otherwise allow NULL.

Stage G — Quick walk-through of the normalization justification

1. Why split PROJECT from WORKS_ON? Because project attributes depend only on
PRONO, not on EMPID. Storing them together repeats project data for each employee
assigned → redundancy and anomalies.
2. Why have EMPLOYEE separate? Employee attributes depend on EMPID only; storing
them with project assignments duplicates employee info across every project they work
on.

3. Why DEPARTMENT separate? To remove the transitive dependency EMPID → DEPTNO


→ DNAME. DNAME should not be stored redundantly for each employee tuple.

4. Why WORKS_ON as a relationship table? Many-to-many Employee ⇄ Project with


attributes (HOURS, RATING) that apply to the assignment, so
WORKS_ON(EMPID,PRONO,HOURS,RATING) is the correct design.

Final checklist for 3NF compliance

 Each relation has a primary key.

 All non-key attributes are fully functionally dependent on the key of their relation.

 No transitive dependencies remain (department name lives only in DEPARTMENT).

 Redundancy and anomalies are minimized.

If you want I can now:

 produce the ER diagram for these relations,

 convert the schema into 3NF step-by-step visual (showing which attributes moved
where), or

 give sample tuples to illustrate how data is stored before and after normalization.

Which would you like next?

Below are the SQL queries for each part, written in simple form with clear logic.

a) List all CUSTOMERS whose name starts with 'S' and income is between 30000 and 50000

SELECT *

FROM CUSTOMER
WHERE C_NAME LIKE 'S%'

AND INCOME BETWEEN 30000 AND 50000;

b) List all the purchases made by a given CUSTOMER

Assuming CUSTNO links CUSTOMER → PURCHASES.

(You will replace X with a real customer number.)

SELECT *

FROM PURCHASES

WHERE CUSTNO = X;

If the PURCHASES table does not have CUSTNO, then a join is needed:

SELECT P.*

FROM PURCHASES P

JOIN CUSTOMER C ON [Link] = [Link]

WHERE [Link] = X;

c) Display CUSTNO, C_NAME, PROD_NAME, PRICE, QTY

SELECT [Link], C.C_NAME, P.PROD_NAME, [Link], [Link]

FROM CUSTOMER C

JOIN PURCHASES P ON [Link] = [Link];

d) Find the name of the CUSTOMER with the lowest purchases

Interpretation → lowest total purchase amount = SUM(PRICE × QTY)

SELECT C.C_NAME

FROM CUSTOMER C

JOIN PURCHASES P ON [Link] = [Link]

GROUP BY [Link], C.C_NAME


ORDER BY SUM([Link] * [Link]) ASC

FETCH FIRST 1 ROW ONLY;

For MySQL:

SELECT C.C_NAME

FROM CUSTOMER C

JOIN PURCHASES P ON [Link] = [Link]

GROUP BY [Link], C.C_NAME

ORDER BY SUM([Link] * [Link])

LIMIT 1;

e) Find the name(s) of the unsold PRODUCT

Unsold product = product name not appearing in PURCHASES table.

SELECT DISTINCT PROD_NAME

FROM PURCHASES

WHERE PROD_NAME NOT IN (

SELECT PROD_NAME FROM PURCHASES

);

⚠ But this will always return empty.

Correct interpretation →
If you had a PRODUCT table, compare PRODUCT → PURCHASES.

But since only PURCHASES table is given, we assume some products exist but not purchased.

So better query (if PRODUCT table existed):

SELECT P.PROD_NAME

FROM PRODUCT P

WHERE P.PROD_NAME NOT IN (

SELECT PROD_NAME FROM PURCHASES


);

If only PURCHASES table exists → we cannot find unsold products because no product list
exists.

If you want, I can also create diagram, joins explanation, or practice questions using these
tables.

You might also like