PP 4 Solution
PP 4 Solution
1. Hardware
2. Software
All programs used to manage the database.
Mainly includes the DBMS (Database Management System) like MySQL, Oracle, SQL
Server.
DBMS handles:
o Data storage
o Query processing
o Backups
o Security
3. Data
4. Users
b. Application Programmers
Write programs/apps that use the database (e.g., banking app developers).
c. End Users
Actual users like students using LMS, customers using shopping app.
5. Procedures
o Security rules
Student Table
Assume:
RollNo is unique
CNIC is unique
Email is unique
1. Super Key
A super key is any set of attributes that uniquely identifies a record.
Examples:
{RollNo}
{CNIC}
{Email}
{RollNo, Name}
{RollNo, Phone}
Easy Definition:
✔ A superkey = unique identifier + extra fields allowed.
2. Candidate Key
{RollNo}
{CNIC}
{Email}
Why?
Because each of them individually can identify a row and have no extra attribute.
3. Primary Key
A primary key is the candidate key that we choose to use as the main unique identifier.
From candidate keys {RollNo}, {CNIC}, {Email},
the designer might choose RollNo as the primary key.
Unique
Minimal
Not null
Super Key Any attribute(s) that uniquely identifies a row RollNo, CNIC, (RollNo + Name)
Candidate Key Minimal super key (no extra attributes) RollNo, CNIC, Email
If you want, I can also make MCQs, short questions, or exam notes for this topic.
Nice set of questions — I’ll explain each in clear points + concrete examples so it’s easy to
remember.
Entity integrity: every table row must have a unique, non-NULL primary key.
Referential integrity: foreign key values in one table must either match a primary (or
candidate) key value in the referenced table or be NULL (if allowed).
Referential integrity ensures relationships between tables are consistent (no “dangling”
references).
Example (Student & Enrollment)
Two tables:
Student
Enrollment
1 101 CS101
2 103 MATH201
When you delete or update a referenced Student row, DBMS can be told to:
Short checklist
Multiple transactions executing at (roughly) the same time on the same database. Concurrency
is necessary for throughput and responsiveness, but it can create consistency problems if not
controlled.
ACID reminder
1. Lost Update
o Two transactions T1 and T2 read the same row and then both update it; one
update is lost.
2. Dirty Read
o Example: T1 sets salary = 2000 but aborts; T2 read salary = 2000 and used it in a
calculation → wrong.
3. Non-repeatable Read
o T1 reads a row twice during its execution and gets different values because T2
modified and committed in between.
o Example: T1 reads student grade = 80, later reads again and now sees 85
(changed by T2).
4. Phantom Read
o T1 runs a query that retrieves a set of rows (e.g., all students with grade > 75). T2
inserts or deletes rows that would change the result set. When T1 runs the same
query again it sees different rows (new “phantoms”).
5. Write Skew (in some weaker isolation levels, e.g., snapshot isolation anomalies)
o Two transactions concurrently check conditions and write different rows, leading
to a constraint violation that wouldn’t happen if serialized.
o E.g., READ COMMITTED prevents dirty reads; REPEATABLE READ prevents non-
repeatable reads; SERIALIZABLE prevents phantoms (gives serial execution
effect).
Locking protocols
o Two-Phase Locking (2PL): acquire locks, then release after commit (ensures
serializability if strict). Locks: shared (S) for read, exclusive (X) for write.
o Transactions proceed without locks; check for conflicts at commit time (good for
low contention).
Timestamp ordering
Practical advice
Pick isolation level based on application needs: banking → SERIALIZABLE or strict
locking; reporting dashboards → lower isolation (READ COMMITTED / snapshot) to
improve concurrency.
Use short transactions and avoid user interaction inside a transaction to reduce lock wait
times.
3) Role of the transaction log and the database change log in recovery
I’ll treat transaction log (also called WAL / redo/undo log) and database change log (e.g., binary
log, change data capture logs) as two related but distinct things — both help recovery and
replication.
What it is:
A sequential log that records every change a transaction makes (often recording before
and/or after images or logical operations). Examples: SQL Server transaction log,
PostgreSQL WAL.
Write-Ahead: log records are written to stable storage before the actual data pages are
written to disk. This ensures changes can be redone or undone after a crash.
Roles in recovery:
Redo committed transactions whose data pages may not have been
flushed to disk (bring DB to the state of all committed transactions).
o Restore the last full backup, then apply logged changes up to a chosen point in
time using the transaction log.
3. Crash → steps simplified:
o Read log from last checkpoint, identify committed vs uncommitted txns, redo
committed, undo uncommitted.
Example:
You commit transaction T100. The DBMS writes T100’s log record to WAL and
acknowledges commit. Later the DBMS writes the actual data pages lazily. If a crash
happens before pages are written, recovery reads WAL and reapplies T100 (redo),
ensuring T100’s effects are present.
What it is:
A log of changes meant often for replication, auditing, or change data capture.
Examples: MySQL binary log (binlog), Debezium CDC streams, Oracle redo +
supplemental logs used for replication.
Replication: Master writes binlog; replicas read and apply changes to stay synchronized.
Point-in-time recovery / audit: Can replay binlog after a backup to recreate DB state at a
specific time or to extract which rows changed when.
Auditing / CDC pipelines: Downstream systems use change log to update caches, search
indexes, analytics.
Transaction log (WAL): primarily for crash recovery and internal consistency
(redo/undo). May be physical/low-level. Not always intended for external consumption.
Change log / binlog: often logical (e.g., SQL statements or row images) and used for
replication and audit. Format and retention/purposes may differ from WAL.
Note: Some DBMS combine responsibilities; e.g., MySQL’s InnoDB has its redo log (WAL) and
MySQL also has a binary log (binlog) used for replication and PITR.
o Undo any transactions that were active and uncommitted at crash time.
3. Point-in-time recovery:
Backup + log shipping (regularly copy logs offsite) for disaster recovery.
Retention policy for binlogs and WALs must balance space vs ability to recover to older
points.
Entity integrity = PKs must be unique & not NULL; referential integrity = FKs must refer
to existing PKs or be NULL/handled by rules (cascade, restrict).
Concurrent transactions increase throughput but can cause lost updates, dirty reads,
non-repeatable reads, phantoms, etc.; DBMS use isolation levels, locking, MVCC, or
optimistic control to prevent them.
Transaction log (WAL) is the DB’s internal sequential log used primarily for crash
recovery, redo/undo and durability; database change log / binlog is often a logical log
used for replication, auditing, and point-in-time recovery. Both are essential to bring a
database back to a consistent state after failures.
Draw a small timeline showing locks, reads, writes for a lost-update example.
Create short exam-style questions from these topics.
Show SQL examples to define FK with CASCADE/RESTRICT and how a DB rejects actions
that violate referential integrity.
✔ Entities
✔ Attributes
✔ Relationship types
✔ Cardinalities (min–max)
From the story, the main objects we store information about are:
1. Customer
Attributes
CustomerID (PK)
Name
Phone
Address
2. Car
A car owned by a customer.
Attributes
CarID (PK)
Model
Year
Color
LicensePlate
3. Mechanic
Attributes
MechanicID (PK)
Name
Phone
SkillLevel
4. Estimate
Attributes
EstimateID (PK)
EstimatedCost
EstimatedTime
Date
MechanicID (FK)
CarID (FK)
5. Job
Attributes
JobID (PK)
ActualStartDate
Status
CarID (FK)
CustomerID (FK)
MechanicID (FK)
EstimateID (FK)
6. Repair
A job is made up of several repairs (e.g., “Replace left fender”, “Paint door”).
Attributes
RepairID (PK)
Description
LaborHours
JobID (FK)
7. PartsUsed
PartID (PK)
PartName
UnitPrice
Quantity
RepairID (FK)
I will describe each relationship and also give the min–max notation like:
Customer — (1,N) Car
1. Customer — Car
Cardinality (min–max):
Min interpretation:
2. Car — Estimate
3. Mechanic — Estimate
Cardinality:
4. Estimate — Job
Cardinality:
5. Job — Mechanic
Cardinality:
Mechanic → Job = (0, N)
6. Job — Repair
Cardinality:
7. Repair — PartsUsed
Cardinality:
Car–Estimate Car (0,N), Estimate (1,1) Car can have many estimates
Repair–Parts Repair (0,N), Part (1,1) Repair may use many parts
Nice — let’s normalize the PROJECT relation to 3NF step by step. I’ll show:
Many-to-many between employees and projects; HOURS and RATING are per (EMPID,
PRONO).
3. DEPTNO → DNAME
(department number determines department name)
Candidate key for the original relation = {PRONO, EMPID} (because each tuple is an
assignment of an employee to a project).
Brief definitions of normal forms (for reference)
2NF: 1NF + no partial dependency of non-key attribute on part of a composite key. (Only
applies when key is composite.)
3NF: 2NF + no transitive dependency of non-key attribute on the key (i.e., non-key
attributes depend only on keys, not on other non-key attributes), or every non-prime
attribute is non-transitively dependent on every candidate key. Equivalent statement: for
every FD X→A, either X is a superkey or A is prime or A is part of a relation that is keyed
by X (or the FD is trivial).
Insertion anomaly: You cannot insert a new project without also inserting at least one
employee tuple (since EMPID is part of the PK PRONO,EMPID) — or you must insert a
dummy employee.
Deletion anomaly: If the last employee assigned to a project is removed, you lose
project info (name, budget, manager) because project attributes are stored in the same
tuple.
Update anomaly: If project manager PROMGR changes, you must update all tuples for
that project (all employees on that project). If you forget, inconsistent PROMGR values
appear. Similarly, if department name DNAME changes, it must be updated in all tuples
for employees of that department.
Why 2NF is violated: The primary key is composite {PRONO, EMPID}. Non-key attributes that
depend on only part of the key cause partial dependencies:
To remove partial dependencies we decompose into relations where each non-key attribute
depends on the whole key of its relation:
Decompose into:
o Key: PRONO
o Key: EMPID
After decomposition into these relations, 2NF is satisfied because there are no partial
dependencies in any of the three relations.
Insertion: You can now insert a project into PROJECT without inserting EMPID —
insertion anomaly removed for projects. You can insert an employee into EMPLOYEE
without project data.
Deletion: Deleting last WORKS_ON row for a project no longer deletes project info
(project row is separate).
Update: Changing project data or employee data happens in one place now (PROJECT or
EMPLOYEE), removing many update anomalies.
But 2NF may still allow transitive dependencies (i.e., non-key → non-key). We must check for
3NF.
Stage C — Move to 3NF (remove transitive dependencies)
Also ensure PROMGR in PROJECT is not causing a transitive FD (we treat PROMGR as an
EMPID foreign key; it is allowed in PROJECT as a key reference — PROMGR itself is not
giving other project attributes).
o Key: PRONO
o Key: EMPID
3. DEPARTMENT(DEPTNO, DNAME)
o Key: DEPTNO
In each relation, every non-prime attribute depends on the key, the whole key, and
nothing but the key (no transitive non-key dependencies remain).
PROJECT
o Key: PRONO
EMPLOYEE
o Key: EMPID
DEPARTMENT
o Key: DEPTNO
WORKS_ON
[Link] → [Link].
WORKS_ON.EMPID → [Link].
WORKS_ON.PRONO → [Link].
Insertion anomalies: largely resolved. You can insert a project, employee, or department
independently. You can insert a WORKS_ON only if the referenced employee and project
exist (FK), which is desirable.
Deletion anomalies: deleting a WORKS_ON row does not remove the project or
employee info. Deleting a project row will remove project info but not employee info
(and will need cascade or restrict on WORKS_ON depending on design).
Update anomalies: updating e.g., PRONAME occurs only in PROJECT, DNAME only in
DEPARTMENT, so update anomalies are removed.
-- DEPARTMENT
);
-- EMPLOYEE
SALARY DECIMAL(12,2),
DEPTNO INT,
);
-- PROJECT
BUDGET DECIMAL(14,2),
STARTDATE DATE,
);
EMPID INT,
PRONO INT,
HOURS DECIMAL(6,2),
RATING INT,
);
Notes:
EMPMGR and PROMGR can be NOT NULL if business rules require managers always
present; otherwise allow NULL.
1. Why split PROJECT from WORKS_ON? Because project attributes depend only on
PRONO, not on EMPID. Storing them together repeats project data for each employee
assigned → redundancy and anomalies.
2. Why have EMPLOYEE separate? Employee attributes depend on EMPID only; storing
them with project assignments duplicates employee info across every project they work
on.
All non-key attributes are fully functionally dependent on the key of their relation.
convert the schema into 3NF step-by-step visual (showing which attributes moved
where), or
give sample tuples to illustrate how data is stored before and after normalization.
Below are the SQL queries for each part, written in simple form with clear logic.
a) List all CUSTOMERS whose name starts with 'S' and income is between 30000 and 50000
SELECT *
FROM CUSTOMER
WHERE C_NAME LIKE 'S%'
SELECT *
FROM PURCHASES
WHERE CUSTNO = X;
If the PURCHASES table does not have CUSTNO, then a join is needed:
SELECT P.*
FROM PURCHASES P
WHERE [Link] = X;
FROM CUSTOMER C
SELECT C.C_NAME
FROM CUSTOMER C
For MySQL:
SELECT C.C_NAME
FROM CUSTOMER C
LIMIT 1;
FROM PURCHASES
);
Correct interpretation →
If you had a PRODUCT table, compare PRODUCT → PURCHASES.
But since only PURCHASES table is given, we assume some products exist but not purchased.
SELECT P.PROD_NAME
FROM PRODUCT P
If only PURCHASES table exists → we cannot find unsold products because no product list
exists.
If you want, I can also create diagram, joins explanation, or practice questions using these
tables.