DBMS - Complete Notes
Unit 1: Introduction to DBMS + Data Models
1. What is DBMS? Why not File Systems?
De nition of DBMS
DBMS (Database Management System): A DBMS is a software system
designed to create, store, manage, and manipulate databases. It provides
an interface between end users and the data, ensuring data is organized,
secure, and accessible as required.
What is a File System?
File System: A method used by OS to store and manage les on a disk.
Data is kept in les, often in a speci c structure (e.g., CSV, TXT), but lacks
standardized query, security, or concurrency controls.
Drawbacks of File Systems
• Data Redundancy: Duplicate data in multiple les.
• Data Inconsistency: Updates in one le may not re ect in others.
• Lack of Data Isolation: Dif cult to retrieve related data from multiple
les.
• Integrity Problems: No automatic integrity constraints.
fi
fi
fi
fi
fi
fi
fi
fl
fi
• Atomicity Issues: Operations may not be fully completed (no
transaction support).
• Security Issues: Basic or no security, data can be accessed by
anyone with access to the le.
• Concurrent Access Anomalies: No control over simultaneous
accesses/edits by multiple users.
Why DBMS Over File System?
• Centralized data management
• Supports complex queries
• Enforces data integrity, security, and constraints
• Supports concurrent access and transaction management
• Provides backup, recovery, and crash protection
• Facilitates data abstraction and independence
2. Advantages of DBMS
Key Advantages
Data Abstraction: Hides storage details from users; provides conceptual
view.
Data Independence: Application can be changed without changing data
structure and vice versa.
Reduced Redundancy & Inconsistency: Centralized control avoids
duplicate and con icting data.
fl
fi
Data Integrity: Ensures accuracy and consistency via constraints (e.g.,
primary keys, foreign keys).
ACID Properties (Transaction Management):
• Atomicity: Each transaction is all or nothing.
• Consistency: Ensures database remains valid after transaction.
• Isolation: Concurrent transactions don't affect each other's
execution.
• Durability: Once transaction is committed, it survives failures.
Data Security: Access controls, authentication, and authorization
mechanisms.
Concurrent Access: Supports multiple users accessing/updating data
simultaneously.
Backup & Recovery: Automated processes for data protection and
disaster recovery.
Enforced Standards: Naming conventions, data formats, and rules.
Ef cient Query Processing: SQL support for complex data retrieval and
manipulation.
3. DBMS vs RDBMS vs NoSQL
DBMS
De nition: General database system, can be any data model (not just
relational).
Examples: DBMS like MS Access, FoxPro, dBase.
Features: No strict enforcement of relations, ACID properties may not be
fully supported, normalization not mandatory.
RDBMS (Relational DBMS)
De nition: A DBMS based on the relational model by E.F. Codd. Data is
stored in tables (relations) with rows and columns.
Examples: Oracle, MySQL, PostgreSQL, SQL Server, IBM DB2.
fi
fi
fi
Key Features:
• Enforces relationships via primary and foreign keys.
• Follows strict schema and data integrity rules.
• Supports ACID transactions.
• SQL support for queries.
NoSQL
De nition: "Not Only SQL" — Database systems designed for
unstructured or semi-structured data, high scalability, and exibility.
Examples: MongoDB (Document), Cassandra (Wide-column), Redis (Key-
Value), Neo4j (Graph).
Features:
• Flexible schemas; can handle structured, semi-structured, or
unstructured data.
• Designed for distributed, scalable environments (big data, web-
scale).
• No or weak ACID guarantees; may use eventual consistency (CAP
theorem).
• Types: Document, Key-Value, Wide-Column, Graph.
Comparison Table
Feature DBMS RDBMS NoSQL
fi
fl
Data File, Object, Table Key-Value, Document,
Structure Table (Relation) Graph, etc.
Not strictly
Schema Strict Flexible/None
enforced
Relations Enforced (PK, Usually not, except
No or limited
hips FK) Graph DBs
Transactio Varies (Eventual
Partial/None Full ACID
ns Consistency)
Horizontal (sharding,
Scalability Vertical Vertical
etc.)
Query Varies Varies (NoSQL, JSON,
SQL
Language (proprietary) etc.)
Use Simple OLTP, OLAP, Big Data, Real-time
Cases applications ERP, CRM web, IoT
4. 3-Schema Architecture: Internal, Conceptual, External
De nition
A way to separate the user view, logical structure, and physical storage in a
database system, enabling abstraction and data independence.
Layers
1. Internal Schema (Physical Level)
• What: Lowest level, describes how data is stored physically ( les,
indexes, structures, blocks).
• Who: Managed by DBAs and system engineers.
fi
fi
• Example: How a table is stored as B+ tree index or hash, le
locations.
2. Conceptual Schema (Logical Level)
• What: Describes logical structure of all data (entities, relationships,
constraints), independent of how data is stored.
• Who: Database designers.
• Example: Table de nitions, relationships, constraints (without details
of storage).
3. External Schema (View Level)
fi
fi
• What: Highest level, describes various user views, what each user/
application can access or see.
• Who: End users, application developers.
• Example: A view that only shows "Employee Name" and
"Department" to HR; other details hidden.
Bene ts
• Abstraction: Users don't need to know storage details.
• Security: Expose only needed data through views.
• Data Independence: Changes in storage or schema don't affect
user/application view.
5. Data Independence (Logical & Physical)
De nition
Ability to change schema at one level of the database system without
having to change the schema at the next higher level.
Types
1. Physical Data Independence
• De nition: Ability to change the physical storage structure ( les,
indices) without affecting the conceptual schema.
• Example: Moving data from SSD to HDD, adding indexes, or
changing le organization.
2. Logical Data Independence
fi
fi
fi
fi
fi
• De nition: Ability to change the conceptual schema (tables, elds,
relationships) without affecting the external schemas or user views.
• Example: Splitting a table into two without affecting user queries
using a view.
Importance
• Reduces system maintenance cost.
• Facilitates scalability and exibility.
• Ensures minimal application disruption on schema updates.
6. Types of Users in DBMS
1. Database Administrator (DBA)
Roles:
• Install and con gure DBMS.
• Manage users and permissions.
• Backup, recovery, performance tuning.
• Enforce security and integrity constraints.
2. Application Developers
Roles:
• Design and develop application programs using the DBMS.
• Write queries, stored procedures, and business logic.
3. End Users
Types:
fi
fi
fl
fi
• Naive/Parametric Users: Use pre-de ned applications (e.g., bank
clerk entering transactions).
• Sophisticated Users: Interact directly with the DB using query
languages (e.g., analysts).
• Casual Users: Occasionally access DB using queries or reports
(e.g., managers).
• Specialized Users: Use specialized DB applications (e.g., scientists
using advanced data analysis tools).
4. System Analysts and Designers
Roles:
• Design overall database schema, constraints, and ensure the system
meets business needs.
7. Data Models
A data model is a conceptual tool to describe data, relationships,
constraints, and semantics.
fi
a. Hierarchical Data Model
De nition: Data is organized in a tree-like structure with a single root and
multiple levels of parent-child (1:N) relationships.
Example: Organization chart, le system hierarchy.
Pros: Simple, fast navigation for parent-child data.
Cons: Rigid structure, dif cult for many-to-many relationships.
Use Cases: Early mainframe DBMS, e.g., IBM IMS.
b. Network Data Model
De nition: Data organized as a graph, supports many-to-many
relationships via "records" and "sets."
Example: Student enrolled in multiple courses; Courses have multiple
students.
Pros: Handles complex relationships.
Cons: Complex to design and maintain; requires navigational access.
Use Cases: IDS, Integrated Data Store (Charles Bachman).
c. Relational Data Model
fi
fi
fi
fi
De nition: Data represented in tables (relations) with rows (tuples) and
columns (attributes).
Key Features:
• Schema based on relations.
• Enforces data integrity via keys (primary, foreign).
• Uses SQL for queries.
Example: Table: Employee (EmpID, Name, DeptID)
Pros: Flexibility, simple to design, powerful querying, data independence.
Cons: May not be best for highly connected data or large-scale
unstructured data.
Use Cases: Business applications, banking, ERP, CRM.
d. Object-Oriented Data Model
De nition: Data modeled as objects, as in OOP; includes classes,
inheritance, polymorphism.
Key Features:
• Supports complex data types, encapsulation.
• Stores both data and methods.
Example: Class: Student { int rollNo; String name; List courses; }
Pros: Can model complex, real-world entities; reusable code.
fi
fi
Cons: Complex implementation, not as widely used as relational.
Use Cases: CAD/CAM, multimedia, scienti c databases.
Examples Table
Stru
Real-world
Model ctur Use Cases Tools/DBs
Example
e
Hierarchi Org Chart, Early
Tree IBM IMS
cal Filesystem banking, HR
Grap Courses- Telecom,
Network IDS, IDMS
h Students Link Logistics
Relation Table ERP, CRM, Oracle, MySQL,
Bank, Retail
al s Analytics SQL Server
Object- Obje CAD, Graphics,
db4o, ObjectDB
Oriented cts Multimedia Engineering
8. Use Cases: RDBMS vs NoSQL
When to Use RDBMS
• Structured data with xed schema (e.g., employee records,
transactions).
• Need for strong data consistency (ACID).
• Complex querying and reporting required (JOINs, aggregations).
• Examples: Banking systems, ERP, CRM.
When to Use NoSQL
fi
fi
• Handling big data, semi-structured/unstructured data (JSON, XML).
• High scalability, distributed architecture (web-scale applications).
• Flexible schema (frequent changes, evolving data).
• Real-time analytics, social networks, IoT.
• Examples:
◦ Document Stores (MongoDB): Product catalogs, content
management.
◦ Key-Value Stores (Redis): Caching, real-time leaderboards.
◦ Wide-column Stores (Cassandra): Sensor data, time-series.
◦ Graph DBs (Neo4j): Social networks, fraud detection.
Unit 2: Entity–Relationship (ER) Model & Enhanced ER
1. Entity, Attributes, Keys
Entity
De nition: A real-world object or concept that can be distinctly identi ed
and stored in a database.
Types:
1. Strong Entity
• Exists independently.
• Has a primary key of its own.
• Example: Student(StudentID, Name, DOB)
2. Weak Entity
• Cannot exist without being linked to a strong entity.
• Identi ed using a partial key + key of the related strong entity.
• Example: Dependent(DependentName) depends on Employee.
Attributes
De nition: Properties or characteristics of an entity.
Types:
1. Simple Attribute – Cannot be divided further.
◦ Example: FirstName
2. Composite Attribute – Can be divided into smaller subparts.
fi
fi
fi
fi
◦ Example: FullName → FirstName + LastName
3. Derived Attribute – Calculated from other attributes.
◦ Example: Age (from DOB)
4. Multivalued Attribute – Can have multiple values.
◦ Example: PhoneNumbers
5. Key Attribute – Used to uniquely identify an entity.
◦ Example: RollNumber
Keys
• Super Key – Any set of attributes that uniquely identi es a record.
◦ Example: {RollNo},
• Candidate Key – Minimal super key (no extra attributes).
◦ Example:
• Primary Key – Chosen candidate key for unique identi cation.
◦ Example: RollNo
• Alternate Key – Candidate keys not chosen as primary key.
◦ Example:
• Foreign Key – Attribute in one table that references primary key in
another table.
◦ Example: DeptID in Employee refers to Department.
2. Relationships & Cardinality
Relationship
fi
fi
De nition: Association among entities.
Degree of Relationship:
1. Unary (Recursive) – Entity related to itself.
◦ Example: Employee manages another Employee.
2. Binary – Relationship between two entities (most common).
◦ Example: Student–Enrolls–Course.
3. Ternary – Relationship between three entities.
◦ Example: Doctor–Prescribes–Medicine–to Patient.
Cardinality Ratios
• One-to-One (1:1): One instance of entity A is associated with one
instance of entity B.
◦ Example: Person Passport.
• One-to-Many (1:N): One instance of A can be related to many of B.
◦ Example: Department → Employees.
• Many-to-Many (M:N): Many instances of A can relate to many of B.
◦ Example: Students Courses.
3. ER Diagrams
fi
Notations
• Entity: Rectangle.
• Weak Entity: Double rectangle.
• Relationship: Diamond.
• Attributes: Oval.
• Key Attribute: Underlined.
• Derived Attribute: Dashed oval.
• Multivalued Attribute: Double oval.
• Weak Entity Key Attribute: Dashed underline.
Example:
STUDENT (RollNo, Name, DOB,
{PhoneNo}) ENROLLS (Grade) COURSE (CourseID, Title)
Student Enrolls Course
• RollNo is PK in Student, CourseID is PK in Course.
• Enrolls is M:N relationship with attribute Grade.
4. Generalization, Specialization, Aggregation
Generalization
De nition: Top-down approach — combining similar entities into a higher-
level generalized entity.
Example: Car and Truck generalized into Vehicle.
Specialization
De nition: Bottom-up approach — dividing an entity into sub-entities
based on distinct characteristics.
Example: Employee specialized into Manager, Engineer.
Aggregation
fi
fi
De nition: Treating a relationship as an entity to participate in another
relationship.
Example: Teacher–Teaches–Course is aggregated to relate to
Department.
5. Mapping ER to Relational Schema
Rules
1. Strong Entity → Table
• All simple attributes become columns.
• PK is same as entity's primary key.
STUDENT(RollNo [PK], Name, DOB)
2. Weak Entity → Table
• Includes partial key + PK of strong entity.
• Composite PK = PK of strong entity + partial key.
DEPENDENT(EmpID [PK, FK], DependentName [PK], Relation)
3. 1:1 Relationship
• Add PK of one entity as FK in other entity.
• Choose based on participation (total participation side gets FK).
4. 1:N Relationship
fi
• Add PK of "1" side as FK in "N" side table.
5. M:N Relationship
• Create separate table with PKs from both entities as composite PK.
ENROLLS(StudentID [PK, FK], CourseID [PK, FK], Grade)
6. Multivalued Attributes
• Create separate table with PK of entity + attribute value.
STUDENT_PHONE(StudentID [PK, FK], PhoneNo [PK])
6. Weak Entities
Characteristics
• Exist only with a strong entity.
• No primary key of their own.
• Identi ed via owner entity's key + partial key.
• Represented as double rectangle in ER diagram.
• Example: Dependent depends on Employee.
7. Recursive Relationships
De nition
An entity is related to itself via a relationship.
Example: Employee manages another Employee.
fi
fi
Implementation: Use FK referring to same table.
EMPLOYEE(EmpID [PK], Name, ManagerID [FK EmpID])
8. ISA Hierarchy (Specialization/Generalization)
Disjoint vs Overlapping
Disjoint: An entity belongs to only one subclass.
• Example: A student is either UG or PG, not both.
Overlapping: An entity can belong to multiple subclasses.
• Example: Employee can be both Manager and Engineer.
Total vs Partial Participation
Total Participation: Every entity in superclass must belong to a subclass.
(Double line in diagram)
Partial Participation: Some entities in superclass may not belong to any
subclass. (Single line)
Mapping ISA to Tables
1. One table per hierarchy: Merge all attributes into one table; use
nulls for unused elds.
2. One table per subclass: Store PK of superclass in subclass tables.
3. One table per class: Separate tables for superclass and subclasses.
fi
→
Quick ER Notation Summary Table
Unit 3: Relational Model & Integrity Constraints
1. Structure of Relational Databases
Relational Database
Based on E. F. Codd's relational model (1970).
Core Idea: Data is stored in tables (relations) with rows (tuples) and
columns (attributes).
Key Terms
1. Relation → A table in the database.
2. Tuple → A row in a table (record).
3. Attribute → A column in a table ( eld).
4. Domain → Set of possible values for an attribute.
5. Degree → Number of attributes in a relation.
6. Cardinality → Number of tuples in a relation.
7. Null → Missing or unknown value (not zero or empty string).
Example Table
STUDENT
RollN
Name Age Dept
o
101 Alice 20 CSE
102 Bob 21 ECE
103 Carol 20 CSE
2. Relational Algebra (RA)
De nition
Procedural query language → Speci es how to obtain results.
fi
fi
fi
Uses mathematical set operations + special relational operations.
Basic Operations
1. Selection (σ) Filters rows (tuples) based on a condition.
Syntax: σcondition(Relation)
Example: Get all CSE students:
σDept='CSE'(STUDENT)
2. Projection (π) Selects speci c columns (attributes).
Syntax: πattributes(Relation)
Example: Get only names of students:
πName(STUDENT)
3. Cartesian Product (×) Combines every tuple of relation A with every
tuple of relation B.
Syntax: R × S
Example:
STUDENT × COURSE
fi
→ Produces all possible student-course combinations.
4. Join Combines tuples from two relations based on a related attribute.
Types:
1. Theta Join (⋈condition): Join with any condition. Example:
STUDENT ⋈DeptID=DeptID DEPARTMENT
2. Equi Join: Special case of theta join with only = condition.
3. Natural Join (⋈): Equi join + remove duplicate attributes.
5. Division (÷) Finds tuples in one relation that are related to all tuples in
another relation.
Example: Students who have taken all courses in a given list.
6. Rename (ρ) Changes name of relation or attributes.
Syntax: ρnewName(Relation)
Example:
ρS(STUDENT)
Set Operations
(Only valid when relations are union-compatible → same degree & same
domain)
1. Union (∪) – All tuples in either or both relations. Example:
CSE_Students ∪ ECE_Students
2. Intersection (∩) – Tuples present in both relations. Example:
CSE_Students ∩ Robotics_Club
3. Difference (−) – Tuples in rst relation but not in second. Example:
CSE_Students − Robotics_Club
fi
Example RA Query
Find names of students in CSE department:
πName(σDept='CSE'(STUDENT))
3. Relational Calculus
De nition
Non-procedural query language → Speci es what to retrieve, not how.
Types
1. Tuple Relational Calculus (TRC) Uses tuple variables.
Syntax: {t | P(t)} Where t is a tuple variable, P(t) is a predicate.
Example: Names of CSE students:
{[Link] | STUDENT(t) ∧ [Link]='CSE'}
2. Domain Relational Calculus (DRC) Uses domain variables (attribute
values).
Syntax: {<x1, x2, ... , xn> | P(x1, x2, ..., xn)}.
fi
fi
Example: Names of CSE students:
{<n> | ∃r ∃a ∃d (STUDENT(r, n, a, d) ∧ d='CSE')}
4. Keys in Relational Model
• Super Key: Set of attributes that uniquely identi es a tuple.
◦ Example: {RollNo}, {RollNo, Name}.
• Candidate Key: Minimal super key.
◦ Example: {RollNo}.
• Primary Key: Chosen candidate key.
◦ Example: RollNo.
• Alternate Key: Other candidate keys not chosen as primary key.
◦ Example: {Email}.
5. Integrity Constraints
1. Domain Constraint
Values in attribute must come from the de ned domain.
Example: Age must be between 18 and 60.
2. Entity Integrity
Primary key cannot be null.
fi
fi
Ensures each tuple is uniquely identi able.
3. Referential Integrity
Foreign key value must either:
1. Match a primary key in the referenced table, OR
2. Be null (if allowed).
Example: [Link] → must exist in STUDENT table.
4. Key Constraint
No two tuples in a relation can have the same primary key value.
fi
Unit 4: Structured Query Language (SQL)
1. SQL Language Categories
a) DDL (Data De nition Language)
Purpose: De nes, modi es, and deletes the structure of database objects
(tables, schemas, etc.).
Commands:
• CREATE: Creates tables, views, indexes, etc.
• ALTER: Modi es existing database structures.
• DROP: Deletes objects from the database.
• TRUNCATE: Removes all records from a table, but keeps the table
structure.
Example:
CREATE TABLE Employee (
fi
fi
fi
fi
EmpID INT PRIMARY KEY,
Name VARCHAR(50),
Age INT
);
ALTER TABLE Employee ADD Salary DECIMAL(10,2);
DROP TABLE Employee;
TRUNCATE TABLE Employee;
b) DML (Data Manipulation Language)
Purpose: Manipulates data stored in tables.
Commands:
• INSERT: Adds new records.
• UPDATE: Modi es existing records.
• DELETE: Removes records.
Example:
INSERT INTO Employee (EmpID, Name, Age) VALUES (1,
'Alice', 25);
UPDATE Employee SET Age = 26 WHERE EmpID = 1;
DELETE FROM Employee WHERE EmpID = 1;
c) DCL (Data Control Language)
fi
Purpose: Controls access to data in the database.
Commands:
• GRANT: Gives user access privileges.
• REVOKE: Removes access privileges.
Example:
GRANT SELECT ON Employee TO user1;
REVOKE SELECT ON Employee FROM user1;
d) TCL (Transaction Control Language)
Purpose: Manages transactions in the database.
Commands:
• COMMIT: Saves all changes made in the transaction.
• ROLLBACK: Undoes changes since last commit.
• SAVEPOINT: Sets a point within a transaction for partial rollback.
Example:
BEGIN TRANSACTION;
UPDATE Employee SET Age = 27 WHERE EmpID = 1;
SAVEPOINT UpdateAge;
DELETE FROM Employee WHERE EmpID = 2;
ROLLBACK TO UpdateAge;
COMMIT;
2. Basic SQL Syntax
a) SELECT Statement
Retrieves data from tables.
Syntax:
SELECT column1, column2, ... FROM table_name
[WHERE condition]
[GROUP BY column]
[HAVING group_condition]
[ORDER BY column [ASC|DESC]];
b) WHERE Clause
Filters rows based on a condition.
Example:
SELECT * FROM Employee WHERE Age > 25;
c) GROUP BY Clause
Groups rows that have the same values in speci ed columns.
fi
Example:
SELECT DeptID, COUNT(*) FROM Employee GROUP BY DeptID;
d) HAVING Clause
Applies conditions on groups (used with GROUP BY).
Example:
SELECT DeptID, COUNT(*) FROM Employee GROUP BY DeptID
HAVING COUNT(*) > 5;
e) ORDER BY Clause
Sorts the result-set.
Example:
SELECT * FROM Employee ORDER BY Age DESC;
3. Joins
a) Inner Join
Returns only matching rows from both tables.
Syntax:
SELECT [Link], [Link]
FROM Employee E
INNER JOIN Department D ON [Link] = [Link];
b) Left Join (Left Outer Join)
Returns all rows from the left table and matching rows from right. Fills
NULLs where there is no match.
Syntax:
SELECT [Link], [Link]
FROM Employee E
LEFT JOIN Department D ON [Link] = [Link];
c) Right Join (Right Outer Join)
Returns all rows from the right table and matching rows from left.
Syntax:
SELECT [Link], [Link]
FROM Employee E
RIGHT JOIN Department D ON [Link] = [Link];
d) Full Join (Full Outer Join)
Returns rows when there is a match in either left or right table.
Syntax:
SELECT [Link], [Link]
FROM Employee E
FULL OUTER JOIN Department D ON [Link] = [Link];
e) Self Join
A table joined with itself.
Syntax:
SELECT [Link], [Link] AS Manager
FROM Employee E1
LEFT JOIN Employee E2 ON [Link] = [Link];
4. Subqueries
a) Scalar Subquery
Returns a single value.
Example:
SELECT Name, (SELECT MAX(Salary) FROM Employee) AS
MaxSalary FROM Employee;
b) Correlated Subquery
References column from outer query; evaluated once for each row.
Example:
SELECT Name, Salary
FROM Employee E1
WHERE Salary > (SELECT AVG(Salary) FROM Employee E2
WHERE [Link] = [Link]);
c) Nested Subquery
Subquery inside another subquery.
Example:
SELECT Name
FROM Employee
WHERE DeptID IN (
SELECT DeptID FROM Department WHERE Location =
'Hyderabad'
);
5. Set Operations
• UNION: Combines results from two queries, removing duplicates.
• UNION ALL: Combines all results (keeps duplicates).
• INTERSECT: Returns rows present in both queries.
• EXCEPT (MINUS): Rows in rst query but not in second.
Example:
SELECT Name FROM Employee WHERE DeptID = 1
UNION
SELECT Name FROM Employee WHERE DeptID = 2;
SELECT Name FROM Employee
INTERSECT
SELECT Name FROM RetiredEmployee;
SELECT Name FROM Employee
EXCEPT
SELECT Name FROM ResignedEmployee;
6. Views and Indexes
a) Views
De nition: Virtual tables based on SQL queries.
Syntax:
fi
fi
CREATE VIEW HighEarners AS
SELECT Name, Salary FROM Employee WHERE Salary >
100000;
Use: Simpli es queries, enforces security, restricts access.
b) Indexes
De nition: Data structures that speed up search and retrieval.
Syntax:
CREATE INDEX idx_name ON Employee(Name);
Use: Increases query performance; but may slow down writes (INSERT/
UPDATE/DELETE).
7. Aggregation Functions
• SUM(column): Total sum.
• COUNT(column or *): Number of rows.
• AVG(column): Average value.
• MAX(column): Maximum value.
• MIN(column): Minimum value.
Example:
SELECT COUNT(*) FROM Employee;
fi
fi
SELECT AVG(Salary) FROM Employee WHERE DeptID = 1;
SELECT MAX(Age), MIN(Age) FROM Employee;
8. Triggers and Stored Procedures (Basics)
a) Trigger
De nition: Set of SQL statements that automatically executes when an
event occurs.
Example:
CREATE TRIGGER trg_UpdateSalary
AFTER UPDATE ON Employee
FOR EACH ROW
BEGIN
INSERT INTO AuditTable(EmpID, OldSalary, NewSalary,
ChangeDate)
VALUES ([Link], [Link], [Link], NOW());
END;
b) Stored Procedure
De nition: Named collection of SQL statements; can accept parameters
and return values.
fi
fi
Example:
CREATE PROCEDURE GetHighEarners (IN minSalary
DECIMAL(10,2))
BEGIN
SELECT Name, Salary FROM Employee WHERE Salary >
minSalary;
END;
9. Constraints
Types and Syntax
• CHECK: Restricts values in a column.
CHECK (Age >= 18)
• NOT NULL: Ensures value cannot be NULL.
Name VARCHAR(50) NOT NULL
• UNIQUE: Ensures all values are unique.
Email VARCHAR(100) UNIQUE
• DEFAULT: Sets a default value if none is given.
Status VARCHAR(10) DEFAULT 'Active'
• PRIMARY KEY: Uniquely identi es each row.
EmpID INT PRIMARY KEY
fi
• FOREIGN KEY: Enforces referential integrity with another table.
DeptID INT,FOREIGN KEY (DeptID) REFERENCES
Department(DeptID)
10. Sample Exam Queries
1. List names and salaries of employees in the 'HR' department,
ordered by salary descending:
SELECT Name, Salary FROM Employee
WHERE DeptID = (SELECT DeptID FROM Department WHERE
DeptName = 'HR')
ORDER BY Salary DESC;
2. Find departments with more than 5 employees:
SELECT DeptID, COUNT(*) FROM Employee
GROUP BY DeptID
HAVING COUNT(*) > 5;
3. List employees who do not manage anyone:
SELECT Name FROM Employee
WHERE EmpID NOT IN (SELECT ManagerID FROM Employee
WHERE ManagerID IS NOT NULL);
4. Show department names and total salaries, only for departments
with average salary > 50,000:
SELECT [Link], SUM([Link])
FROM Department D JOIN Employee E ON [Link] =
[Link]
GROUP BY [Link]
HAVING AVG([Link]) > 50000;
Unit 5: Normalization & Functional Dependencies
1. Functional Dependencies (FDs)
De nition
A Functional Dependency (FD), denoted as X → Y, is a relationship
between two sets of attributes X and Y in a relation such that if two tuples
agree on X, they must agree on Y.
Example: In Student(RollNo, Name, Dept), RollNo → Name (if RollNo
same, Name must be same).
Types of FDs
• Trivial FD: Y ⊆ X (e.g., {A,B} → A)
• Non-Trivial FD: Y ⊈ X (e.g., RollNo → Name)
• Fully Functional Dependency: X → Y, and for any proper subset of
X, the FD does not hold.
• Partial Dependency: In a relation with composite key, an attribute
depends only on a part of the key.
• Transitive Dependency: X → Y and Y → Z implies X → Z.
2. Closure of Attributes
De nition
fi
fi
The closure of a set of attributes X with respect to a set of FDs F (denoted
as X⁺) is the set of attributes that are functionally determined by X under F.
Finding Closure:
1. Start with X⁺ = X.
2. For each FD Y → Z, if Y ⊆ X⁺, add Z to X⁺.
3. Repeat until no more attributes can be added.
Example:
Given FDs: A → B, B → C, C → D Find A⁺:
• Start: A⁺ =
• A→B add B:
• B→C add C:
• C→D add D:
So, A⁺ =
3. Minimal Cover
De nition
A minimal cover (canonical cover) for a set of FDs is an equivalent set of
FDs with:
• Single attribute on the right side of each FD.
• No extraneous attributes in left or right side.
fi
• No redundant FDs.
Steps to Find Minimal Cover:
1. Right Reduction: Ensure single attribute on RHS (decompose if
not).
2. Left Reduction: Remove extraneous attributes from LHS.
3. Remove Redundant FDs: If an FD can be derived from others,
remove it.
Example:
FDs: AB → C, A → C, C → D
1. Split AB → C to A → C, B → C (if possible)
2. Check if A → C is redundant (since AB → C present)
3. Remove redundant FDs
4. Normalization
Purpose
• Eliminate redundancy and anomalies (update, insert, delete).
• Organize attributes into well-structured relations.
1NF (First Normal Form)
Condition: All attributes have atomic (indivisible) values.
• No repeating groups/multivalued attributes.
Example: Not in 1NF:
RollN
Name Phone Numbers
o
1 Raj 912345, 998877
In 1NF:
RollN
Name PhoneNumber
o
1 Raj 912345
1 Raj 998877
2NF (Second Normal Form)
Condition:
• In 1NF, and
• No partial dependency (no non-prime attribute depends only on part
of candidate key).
Applies to relations with composite keys.
Example: Table: (RollNo, CourseCode, Grade) FDs: (RollNo, CourseCode)
→ Grade; RollNo → Name
Name depends only on RollNo (partial).
Fix: Separate into
1. Student(RollNo, Name)
2. Enroll(RollNo, CourseCode, Grade)
3NF (Third Normal Form)
Condition:
• In 2NF, and
• No transitive dependency (non-prime attribute does not depend on
another non-prime attribute).
Rule: For every FD X → Y, at least one of:
1. X is a superkey, or
2. Y is a prime attribute (part of a candidate key)
Example: Table: (RollNo, DeptID, DeptName) FDs: RollNo → DeptID,
DeptID → DeptName
DeptName depends transitively on RollNo via DeptID.
Fix:
1. Student(RollNo, DeptID)
2. Department(DeptID, DeptName)
BCNF (Boyce-Codd Normal Form)
Condition: For every non-trivial FD X → Y, X is a superkey.
Stricter than 3NF.
Example: Table: (Student, Course, Instructor) FDs: (Student, Course) →
Instructor; Instructor → Course
Instructor is not a superkey; violates BCNF.
Fix: Decompose into
1. Instructor(Instructor, Course)
2. Enrollment(Student, Instructor)
4NF (Fourth Normal Form, Multi-Valued Dependencies)
Condition:
• In BCNF, and
• No non-trivial multi-valued dependencies.
Multi-valued Dependency (MVD): X →→ Y means for each X, there can
be multiple independent Y values.
Example: Table: (Student, Language, Sport)
Student can have multiple languages and multiple sports, independently.
Fix:
1. StudentLanguage(Student, Language)
2. StudentSport(Student, Sport)
5NF (Fifth Normal Form, Join Dependency)
Condition:
• In 4NF, and
• Every join dependency in the relation is implied by candidate keys.
Occurs in rare, complex situations with cyclic dependencies.
Example: Supplier-Parts-Projects
5. Decomposition
Lossless Join Decomposition
De nition: Decomposition is lossless if no information is lost after
decomposition; original relation can be reconstructed by joining.
Test: At least one decomposed relation contains a candidate key of the
original relation.
Dependency Preservation
De nition: All functional dependencies can be enforced without joining the
decomposed relations.
Goal: Minimize joins for checking constraints.
Summary Table
Norma
What It Removes Key Condition
l Form
Repeating groups,
1NF Atomic values in every cell
multivalued attrs
Partial 1NF + No partial dependency (non-
2NF
dependencies prime depends on full key)
Transitive
3NF 2NF + No transitive dependency
dependencies
Certain anomalies
BCNF For every X → Y, X is superkey
3NF can't x
fi
fi
fi
Multi-valued
4NF BCNF + No non-trivial MVDs
dependencies
4NF + Lossless join for every join
5NF Join dependencies
dependency
Unit 6: Transaction Management & Concurrency
Control
1. ACID Properties
A. Atomicity
De nition: A transaction is an indivisible unit; it either completes in full or
has no effect at all.
Ensured by: Transaction logs and rollbacks.
B. Consistency
De nition: A transaction brings the database from one valid state to
another, maintaining integrity constraints.
Ensured by: Constraints, triggers, rules.
C. Isolation
De nition: Concurrent execution of transactions leaves the DB in the
same state as if transactions were executed one after another.
Ensured by: Concurrency control protocols (locks, timestamps).
D. Durability
De nition: Once a transaction commits, its changes survive system
failures.
Ensured by: Commit logs, stable storage.
fi
fi
fi
fi
2. Transactions: Begin, Commit, Rollback
Transaction: A logical unit of work; consists of one or more operations
(read, write).
• Begin: Transaction starts (BEGIN TRANSACTION;)
• Commit: Save all changes made by the transaction (COMMIT;)
• Rollback: Undo all changes since the last commit (ROLLBACK;)
• Savepoint: Intermediate marker for partial rollbacks.
Example:
BEGIN TRANSACTION;
UPDATE Account SET Balance = Balance - 500 WHERE AccID
= 1;
UPDATE Account SET Balance = Balance + 500 WHERE AccID
= 2;
COMMIT;
3. Serializability
A. What is Serializability?
De nition: The gold standard for correctness in concurrent schedules. A
schedule is serializable if its result is equivalent to some serial (one-after-
another) schedule.
fi
B. Types of Serializability
i. Con ict Serializability
De nition: A schedule is con ict-serializable if it can be transformed into a
serial schedule by swapping non-con icting adjacent operations.
Con icting operations:
fi
fl
fl
fl
fl
• At least one is a write.
• Both operate on the same data item.
Testing: Build a precedence (serialization) graph.
ii. View Serializability
De nition: A schedule is view-serializable if it is view-equivalent to a serial
schedule.
View equivalence:
• Initial reads are from the same source.
• Final writes are the same.
• Each read gets the same value as in serial schedule.
Note: All con ict-serializable schedules are view-serializable, but not all
view-serializable schedules are con ict-serializable.
C. Schedules
• Serial Schedule: Transactions execute one after another, with no
overlap.
• Concurrent/Interleaved Schedule: Operations of multiple
transactions are mixed.
• Serializable Schedule: Interleaved, but result equivalent to some
serial schedule.
• Non-Serializable Schedule: May cause inconsistency.
D. Precedence (Serialization) Graph
fi
fl
fl
• Nodes: Transactions (T1, T2, ...)
• Edges: Draw an edge from Ti to Tj if an operation in Ti precedes and
con icts with an operation in Tj (on same data).
Rule: If the graph has no cycles, the schedule is con ict-serializable.
Example:
Schedule:
T1: R(A) W(A)
T2: R(A) W(A)
Precedence graph: T1 → T2 (because T2's R(A) after T1's W(A)), check for
cycles.
4. Concurrency Control Protocols
fl
fl
A. Two-Phase Locking (2PL)
Basic 2PL:
1. Growing phase: Transaction can acquire locks but not release.
2. Shrinking phase: Can release locks but not acquire new ones.
Ensures con ict-serializability.
Types:
Strict 2PL:
• All exclusive (write) locks held until transaction commits/aborts.
• Guarantees serializability + recoverability (prevents cascading
rollbacks).
Rigorous 2PL:
• All locks (shared + exclusive) held till commit/abort.
Drawback: May cause deadlocks.
B. Timestamp Ordering Protocol
Each transaction gets a unique timestamp at start.
Rules:
• Each data item has Read_Timestamp(X) and Write_Timestamp(X).
fl
• Read rule: If TS(T) < Write_TS(X): Abort and restart T.
• Write rule: If TS(T) < Read_TS(X) or Write_TS(X): Abort and restart
T.
Advantage: Deadlock-free. Drawback: Starvation possible.
C. Deadlock Handling
Detection
• Wait-for graph: Nodes are transactions; edge Ti→Tj if Ti waits for a
lock held by Tj.
• Cycle = Deadlock.
• Resolution: Abort one or more transactions to break the cycle.
Prevention
Wait-Die:
• Older transactions may wait; younger abort and restart.
Wound-Wait:
• Older transactions preempt (force abort) younger ones; younger must
wait.
Timeouts:
• Transactions waiting too long are aborted.
5. Cascading Rollback & Recoverability
A. Cascading Rollback
De nition: If a transaction aborts and other transactions have read its
uncommitted changes, they must also roll back. This is called cascading
rollback.
Prevention: Use Strict 2PL to ensure transactions read only committed
data.
B. Recoverability
• Recoverable schedule: If T2 reads data written by T1, then T2
commits only after T1 commits.
• Cascadeless schedule: Transactions only read data written by
committed transactions.
Summary Table
Concept Key Point / De nition
Atomicity All or nothing
Consistency Preserves database rules
Isolation Transactions do not interfere
Durability Changes survive crashes
Serial
One transaction at a time
Schedule
Serializable Equivalent to serial schedule
Two phases: growing (acquire locks), shrinking
2PL
(release)
All write locks released at commit/abort (prevents
Strict 2PL
cascading rollback)
fi
fi
Rigorous
All locks (read/write) held till commit/abort
2PL
Timestamp
Transactions ordered by timestamps; avoids deadlock
Order
Wait-Die Older waits, younger aborts
Wound-Wait Older preempts, younger waits
Cascading
One abort leads to many; avoid with strict protocols
Roll
Precedence Nodes = transactions; edges = dependencies; no cycle
Graph → con ict-serializable
Unit 7: File Organization & Indexing
fl
1. File Organization
A. Heap (Unordered) File Organization
Description: Records are placed wherever space is available; no ordering.
Operations:
• Insert: Fast (just add at the end or in a free slot).
• Search: Linear scan (slow for large les).
• Delete: Find and remove; may create holes.
Use Cases: Small tables, temporary data.
B. Sequential (Ordered) File Organization
Description: Records are stored in order (usually by search key).
Operations:
• Search: Ef cient for range queries; can use binary search.
• Insert/Delete: May require shifting records.
Use Cases: Applications with mostly sequential access (e.g., payroll
processing).
C. Hashing
Description: Hash function computes bucket for each record.
Operations:
fi
fi
• Insert/Find/Delete: O(1) on average if hashing is good.
• Search: Fast for exact key, poor for range queries.
• Over ow: Handled with over ow chaining or rehashing.
Use Cases: When direct record lookup is common.
D. Clustered File Organization
Description: Records of different tables that are frequently accessed
together are stored physically together (e.g., cluster Employee and
Department if joined often).
Bene t: Improves join performance.
2. Indexing
A. What is an Index?
De nition: Data structure that improves the speed of data retrieval on a
database table.
Analogy: Book index points to pages with terms.
B. Primary vs Secondary Index
Primary Index:
• Built on the ordering key of a sequential le.
fi
fl
fi
fl
fi
• One per table.
Secondary Index:
• Built on non-ordering or non-primary attributes.
• Can be multiple per table.
C. Dense vs Sparse Index
Dense Index:
• Every search key value appears in the index.
• Index has entry for every record.
Sparse Index:
• Entries for only some records (usually one entry per data block).
• Fewer entries, but search involves extra steps.
Table Example:
Dense Index (every record):
Key Record Ptr
1 ...
2 ...
Sparse Index (every block):
Ke
Record Ptr
y
1 ...
100 ...
200 ...
D. Multilevel Index
Problem: Single-level index too large for memory.
Solution: Index the index!
• First Level: Index of data blocks.
• Second Level: Index of rst level, etc.
Reduces number of disk accesses to O(logₙ N), where n is the fanout.
E. B+ Tree Indexing
fi
Structure
• Balanced multi-level index structure.
• Internal Nodes: Store keys, guide search.
• Leaf Nodes: Store (key, record pointer) pairs, form a linked list for
fast range scans.
• All data found at leaves (unlike B-trees).
Properties
• Each node (except root) has [ceil(m/2), m] children (where m is
order).
• Height is logₘ N (very shallow for large N).
• All leaves at same depth.
Search
1. Start at root.
2. At each node, pick correct child (binary search among keys).
3. Follow down to leaf.
4. At leaf, scan for record.
Insert Steps (Example for m=4):
1. Find leaf for key.
2. Insert key in order.
3. If node over ows (> m-1 keys), split node:
◦ Middle key promoted to parent.
◦ Split can propagate up to root (causing new root).
4. Leaf nodes stay linked.
fl
Delete Steps:
1. Remove key from leaf.
2. If under ow (< ceil(m/2) keys), borrow from sibling or merge.
3. Update parent as needed.
4. May cause root to shrink.
Example: Order m = 4 B+ Tree Insert Steps: Insert: 10, 20, 5, 6, 12 (Draw
each step. Show split when adding 12.)
F. Hash Indexing
Structure: Hash table points to data buckets.
• Fast exact match lookup (O(1) if no collision).
• Not good for range queries.
3. Buffer Management, Record Formats
A. Buffer Management
• Buffers: Memory pages where disk blocks are loaded.
• Goal: Minimize disk I/O by caching frequently/recently used pages.
• Replacement Policies: LRU (Least Recently Used), FIFO, etc.
B. Record Formats
Fixed Length:
• All elds have xed size; easy but can waste space.
Variable Length:
fi
fl
fi
• Pointers/lengths indicate eld boundaries; supports long strings/
blobs.
4. Access Methods
• Sequential Access: Scan records in order.
• Direct Access: Jump directly to a record (e.g., via index/hash).
• Index Access: Use index to retrieve relevant records fast.
Summary Table
Concept Key Point/De nition
Heap File Unordered, fast insert, slow search
Sequential File Ordered, fast range, slow insert/delete
Hashing O(1) direct access, no range support
Clustered Store related records together
Primary Index On ordering key, one per table
Secondary Index On any non-ordering eld, can be multiple
Dense Index Entry for every record
Sparse Index Entry per data block
Multilevel Index Index of indexes, supports large datasets
Balanced, ef cient, all data at leaves, great for
B+ Tree
range
Hash Indexing Direct lookup, not for ranges
Buffer
Reduces disk I/O using memory
Management
Record Formats Fixed or variable length
fi
fi
fi
fi
Access Methods Sequential, direct, index-based
Unit 8: Query Processing & Optimization
1. Query Execution Pipeline
A. Phases of SQL Query Processing
1. Parsing
◦ SQL query is checked for syntax and semantics.
◦ Output: Parse tree or query tree (internal representation).
2. Translation
◦ SQL is converted to an internal relational algebra or logical
plan.
◦ Type checks, rewrites (e.g., subquery attening).
3. Optimization
◦ Finds the most ef cient way to execute the query.
◦ Output: Optimal query execution plan.
4. Code Generation
fi
fl
◦ Query plan is translated into low-level operations (e.g., iterator,
function calls).
5. Execution
◦ The plan is run on the database system using access methods
and storage engines.
2. Parsing, Optimization, Execution
A. Parsing
• Checks for syntax errors (typos, keywords).
• Validates table/column names, types, constraints.
• Generates parse tree.
B. Optimization
• Cost-based optimization: Estimates different plans and chooses the
least expensive.
• Heuristic-based optimization: Applies rules of thumb (push
selection down, combine projections, etc.).
• Considers available indexes, statistics (table size, index height, etc.).
• Evaluates join order, method, and access path.
C. Execution
• Executes the nal plan.
• Uses buffer management, index lookups, joins, aggregations, etc.
3. Evaluation of Expressions (Cost Models)
A. What is Cost?
fi
Cost = Number of disk I/Os + CPU time (I/O dominates).
Factors: Table size, indexes, join methods, selectivity of predicates.
B. Access Path Cost
Full Table Scan:
• Reads all blocks of table.
• Cost: B (number of blocks).
Index Scan:
• Uses index, then fetches data blocks.
• Cost: Index height (logₙ N) + number of qualifying records.
C. Join Method Cost Comparison
Nested Loop Join:
• For each tuple in outer table, scan inner table.
• Cost: M + (M × N), where M = blocks of outer, N = blocks of inner.
Index Nested Loop Join:
• Use index on inner table.
• Cost: M + (outer tuples × index lookup cost).
Sort-Merge Join:
• Sort both tables, merge in one pass.
• Cost: Sorting cost (M log M + N log N) + merging cost.
Hash Join:
• Build hash table on smaller input; scan larger and match.
• Cost: Scan both tables once, plus hashing overhead.
D. Example:
Suppose Table A has 1,000 blocks, Table B has 100 blocks.
Block Nested Loop Join: Outer = 1,000, Inner = 100 → 1,000 + (1,000 ×
100) = 101,000 block accesses (very expensive).
Index Nested Loop (if index on B): Outer = 1,000 If index height = 3, and
1 match per A: 1,000 × 3 = 3,000 + 1,000 = 4,000 block accesses (much
faster).
4. Query Plan Trees
A. What is a Query Plan Tree?
• Tree of relational algebra operators (select, project, join, etc.).
• Leaves: Input tables.
• Internal nodes: Operations (join, select, aggregate, etc.).
• Shows order in which operations are performed.
B. Example
Query:
SELECT Name FROM Student WHERE Dept = 'CSE' AND Marks >
80;
Plan Tree:
πName
|
σDept='CSE' AND Marks>80
|
Student
5. Join Order Optimization
A. Why Optimize Join Order?
Joins are expensive; the order drastically affects performance.
For n tables, there are (n-1)! possible join orders.
Example: For 4 tables: 3! = 6 different orders.
B. Strategies
Left-Deep vs Right-Deep vs Bushy Trees:
• Left-Deep: Most common in practice; easier to pipeline.
• Right-Deep: Can be better for parallel processing.
• Bushy: More complex but can be optimal.
Dynamic Programming:
• Build optimal plans bottom-up.
• Consider all possible subsets and join orders.
C. Heuristics
1. Push selections down: Apply WHERE clauses as early as possible.
2. Choose smaller table as outer: In nested loop joins.
3. Use indexes: When available for join conditions.
4. Consider cardinality: Join tables that produce smaller intermediate
results rst.
fi
Unit 9: Distributed & NoSQL Databases
1. CAP Theorem
De nition
CAP stands for Consistency, Availability, Partition Tolerance.
Theorem (Eric Brewer, 2000): In any distributed data store, you can
guarantee only 2 out of 3 at any moment:
• Consistency (C): Every read receives the most recent write or an
error.
fi
• Availability (A): Every request receives a (non-error) response, but
not guaranteed to be the latest.
• Partition Tolerance (P): System continues to operate despite
network partitions (failures splitting the network).
Implication
Cannot guarantee all three together in real distributed systems.
Design choices:
• CA (no P): Not fault-tolerant—rare for real-world.
• CP: Consistent, Partition-tolerant (but may become unavailable).
• AP: Available, Partition-tolerant (may serve stale data).
Example Table:
DB System CAP Type
MongoDB CP (by default), tunable to AP
Cassandra AP
DynamoDB AP
RDBMS (single node) CA
2. BASE vs ACID
ACID (Traditional RDBMS)
• Atomicity: Transactions are all-or-nothing.
• Consistency: Transactions bring DB from one valid state to another.
• Isolation: Concurrent transactions don't interfere.
• Durability: Once committed, always saved.
BASE (NoSQL/Distributed DBs)
• Basically Available: System guarantees availability.
• Soft state: State may change over time, even without input (due to
eventual consistency).
• Eventual Consistency: System will become consistent over time if
no new updates.
Summary:
• ACID: Strong consistency, safe, but less scalable.
• BASE: High scalability, availability, but eventually consistent.
3. Sharding & Replication
A. Sharding
De nition: Splitting data across multiple servers (shards) to distribute load
and storage.
Example: User data partitioned by user_id: users 1–1000 on shard1,
1001–2000 on shard2, etc.
Purpose: Scale horizontally (add more servers as data grows).
B. Replication
De nition: Storing copies of data on multiple servers.
fi
fi
Purpose:
• Fault tolerance (if one node fails, data isn't lost).
• High availability (queries can go to any replica).
Types:
• Master-slave (primary-secondary): One node handles writes,
others sync from it.
• Multi-master: All nodes can handle writes.
4. NoSQL Database Types
A. Document Stores (e.g., MongoDB)
Structure: JSON-like documents (dynamic schema).
Use-case: Semi-structured data, catalogs, user pro les, content
management.
Example Document:
{ "user": "john", "age": 30, "emails":
["j@[Link]"] }
B. Key-Value Stores (e.g., Redis, DynamoDB)
Structure: Simple (key, value) pairs; values can be strings, blobs, or
serialized objects.
fi
Use-case: Caching, session storage, fast lookup.
Example:
user:123 {...profile data...}
C. Column Stores (e.g., Cassandra, HBase)
Structure: Data stored column-wise, not row-wise; allows ef cient reads of
speci c columns.
Use-case: Time series data, analytics, big data.
Example:
Columns: UserID | Name | Email | Purchases
D. Graph Databases (e.g., Neo4j)
Structure: Nodes (entities), Edges (relationships), Properties (attributes).
Use-case: Social networks, fraud detection, recommendation engines.
Example:
• Nodes: User, Movie
• Edge: User "LIKES" Movie
5. RDBMS vs NoSQL: Key Differences
fi
→
fi
Feature RDBMS NoSQL
Data Key-value, Document,
Relational (tables)
Model Column, Graph
Schema Fixed, pre-de ned Dynamic or schema-less
Scalabilit Horizontal (more servers/
Vertical (bigger server)
y shards)
Consiste
Strong (ACID) Eventual, tunable (BASE)
ncy
Joins Supported Limited or not supported
Use- Transactional apps Big Data, web-scale, social,
case (banking, ERP) IoT
6. Use Cases & When to Use What
When to use RDBMS
• Strict data integrity is needed (banking, nance).
• Data is highly structured, relational.
• Complex queries/joins are common.
When to use NoSQL
• Need for massive scalability (millions of users, web-scale).
• Data is unstructured/semi-structured or evolving.
• Real-time analytics, exible schemas.
Examples:
• MongoDB: Content management, product catalogs.
fi
fl
fi
• Redis: Session store, caching, leaderboard.
• Cassandra: Logging, IoT, time series.
• Neo4j: Social network, fraud graph.
7. Interview-Style Use Case Q&A
Q: Would you use MongoDB or MySQL for an e-commerce product
catalog? A: MongoDB—supports exible, dynamic product attributes; easy
to store different categories with different elds.
Q: Why use Cassandra for IoT sensor data? A: Columnar store; high
write throughput; can handle billions of time-stamped entries across
clusters.
Q: What database for storing millions of relationships (friends/
follows) in a social network? A: Graph DB (Neo4j)—optimized for fast
traversal of connected data.
Q: When is Redis ideal? A: When ultra-fast, in-memory access is needed
(caching, ephemeral data, real-time counters).
Summary Table
Concept Key Points / Examples
Only 2 of Consistency, Availability, Partition
CAP Theorem
tolerance
ACID: Strict, safe; BASE: Scalable, eventual
ACID vs BASE
consistency
fl
fi
Sharding Horizontal partitioning for scalability
Replication Copies for fault-tolerance/availability
Document Store JSON docs, exible; MongoDB
Key-Value Store Simple map; Redis, DynamoDB
Column Store Columnar data; Cassandra, HBase
Graph DB Nodes, edges; Neo4j
RDBMS vs Structured vs exible, strong vs eventual
NoSQL consistency
200+ Most Frequently Asked DBMS Interview
Questions
Basic DBMS Concepts (1-30)
1. What is a Database Management System (DBMS)?
A DBMS is software that enables users to de ne, create, maintain,
and control access to databases. It provides a systematic and
fl
fl
fi
organized way of storing, managing, and retrieving data, ensuring
data integrity, security, and consistency.
2. Explain the difference between DBMS and RDBMS.
A DBMS manages databases in general, which may be hierarchical,
network, or other models. An RDBMS (Relational DBMS) speci cally
manages relational databases, where data is stored in tables
(relations) and supports relational integrity, SQL, and ACID
properties.
3. What are the advantages of DBMS over le systems?
• Reduced data redundancy and inconsistency
• Data sharing and centralized control
• Improved data security and integrity
• Support for backup and recovery
• Concurrent access and transaction management
• Data abstraction and independence
4. What is a database schema?
A schema is the logical structure of a database, de ning tables, elds,
relationships, views, indexes, and constraints. It acts as a blueprint
for how data is organized.
5. De ne data independence and its types.
Data independence is the ability to change the schema at one level
without affecting the next higher level.
fi
fi
fi
fi
fi
• Physical data independence: Change in physical storage does not
affect logical schema.
• Logical data independence: Change in logical schema does not affect
external views.
6. What is the 3-schema architecture in DBMS?
It is a framework with three levels:
• Internal (physical): How data is stored
• Conceptual (logical): Logical structure of data
• External (view): User-speci c views
7. Explain the concept of data abstraction.
Data abstraction hides the complexity of data storage and presents
users with a simpli ed view. It allows users to interact with data
without knowing storage details.
8. What are the different types of database users?
• DBA: Manages the database
• Application developers: Build applications
• End users: Use applications (naive, sophisticated, casual,
specialized)
• System analysts/designers: Design database structure
9. What is metadata in DBMS?
Metadata is data about data, such as table de nitions, data types,
constraints, and indexes. It helps the DBMS manage and interpret the
actual data.
10. Differentiate between logical and physical data independence.
fi
fi
fi
• Logical: Ability to change logical schema without affecting external
views
• Physical: Ability to change physical storage without affecting logical
schema
11. What is a data model? Name different types.
A data model is a conceptual tool for describing data, relationships,
and constraints. Types: hierarchical, network, relational, object-
oriented, entity-relationship.
12. Explain hierarchical, network, and relational data models.
• Hierarchical: Tree-like structure, parent-child relationships
• Network: Graph structure, many-to-many relationships
• Relational: Tables with rows and columns, based on relations
13. What is database normalization and why is it important?
Normalization is the process of organizing data to reduce redundancy
and improve integrity. It prevents anomalies and ensures ef cient
data management.
14. What are the different normal forms?
• 1NF: Atomic values
• 2NF: No partial dependency
• 3NF: No transitive dependency
• BCNF: Every determinant is a candidate key
• 4NF: No multi-valued dependency
• 5NF: No join dependency
fi
15. What is denormalization? When would you use it?
Denormalization is the process of combining tables to reduce joins
and improve read performance, often used in OLAP or reporting
systems.
16. What is a primary key? Can it be null?
A primary key uniquely identi es each record in a table and cannot be
null.
17. Difference between primary key and unique key.
Both enforce uniqueness, but a table can have only one primary key
(not null), while it can have multiple unique keys (can be null).
18. What is a foreign key and referential integrity?
A foreign key is an attribute in one table that references the primary
key of another table, enforcing referential integrity (valid
relationships).
19. What are candidate keys and alternate keys?
Candidate keys are minimal sets of attributes that uniquely identify
records. Alternate keys are candidate keys not chosen as the primary
key.
20. What is a composite key?
A composite key consists of two or more attributes that together
uniquely identify a record.
fi
21. What is a surrogate key?
A surrogate key is an arti cial, system-generated key (like an auto-
incremented number) used as a unique identi er.
22. Explain super key with an example.
A super key is any set of attributes that uniquely identi es a record.
Example: {RollNo, Name} is a super key if RollNo is unique.
23. What is the difference between DELETE, DROP, and TRUNCATE?
• DELETE: Removes rows, can use WHERE, can be rolled back
• DROP: Removes entire table/schema, cannot be rolled back
• TRUNCATE: Removes all rows, cannot use WHERE, cannot be
rolled back
24. What is a view in DBMS?
A view is a virtual table based on a query, presenting data from one
or more tables.
25. What are materialized views?
Materialized views store the result of a query physically, allowing
faster access but requiring refresh when data changes.
26. What is an index? Types of indexes.
An index is a data structure that speeds up data retrieval. Types:
primary, secondary, clustered, non-clustered, unique, composite,
bitmap, B-tree, hash.
27. Difference between clustered and non-clustered indexes.
• Clustered: Alters the physical order of data, one per table
fi
fi
fi
• Non-clustered: Separate structure, does not affect physical order,
multiple per table
28. What is database partitioning?
Partitioning divides a table into smaller, manageable pieces
(partitions) for performance, maintenance, or scalability.
29. What is sharding in databases?
Sharding is horizontal partitioning where data is distributed across
multiple servers or nodes to improve scalability.
30. Explain database replication.
Replication is the process of copying and maintaining database
objects in multiple locations for fault tolerance and high availability.
SQL and Query Processing (31-70)
31. What is SQL? List different types of SQL commands.
SQL (Structured Query Language) is used to manage and manipulate
relational databases. Types:
• DDL (Data De nition Language): CREATE, ALTER, DROP
• DML (Data Manipulation Language): SELECT, INSERT, UPDATE,
DELETE
• DCL (Data Control Language): GRANT, REVOKE
• TCL (Transaction Control Language): COMMIT, ROLLBACK,
SAVEPOINT
32. Difference between DDL, DML, DCL, and TCL.
fi
• DDL: De nes/modi es structure (tables, schemas)
• DML: Manipulates data (insert, update, delete)
• DCL: Controls access (permissions)
• TCL: Manages transactions (commit, rollback)
33. What is the difference between WHERE and HAVING clauses?
• WHERE: Filters rows before grouping/aggregation
• HAVING: Filters groups after aggregation (used with GROUP BY)
34. Explain different types of JOINs with examples.
• INNER JOIN: Returns matching rows
• LEFT JOIN: All rows from left, matching from right
• RIGHT JOIN: All rows from right, matching from left
• FULL JOIN: All rows from both, matching where possible
• CROSS JOIN: Cartesian product
• SELF JOIN: Table joined with itself
35. What is the difference between INNER JOIN and OUTER JOIN?
• INNER JOIN: Only matching rows
• OUTER JOIN: Includes unmatched rows from one or both tables
(LEFT, RIGHT, FULL)
36. What is a CROSS JOIN?
A CROSS JOIN returns the Cartesian product of two tables (all
possible combinations).
37. What is a SELF JOIN?
A SELF JOIN joins a table with itself, useful for hierarchical or
recursive relationships.
fi
fi
38. Explain UNION vs UNION ALL.
• UNION: Combines results, removes duplicates
• UNION ALL: Combines all results, keeps duplicates
39. What are aggregate functions? Give examples.
Functions that operate on sets of values: SUM(), COUNT(), AVG(),
MAX(), MIN()
40. What is GROUP BY clause and when to use it?
GROUP BY groups rows sharing a value in speci ed columns, used
with aggregate functions.
41. Explain ORDER BY clause.
ORDER BY sorts the result set by one or more columns, ascending
or descending.
42. What are subqueries? Types of subqueries.
A subquery is a query within another query. Types:
• Scalar (returns single value)
• Row (returns single row)
• Table (returns multiple rows/columns)
• Correlated (references outer query)
43. What is a correlated subquery?
A subquery that references columns from the outer query and is
evaluated once per row.
44. Difference between EXISTS and IN operators.
• IN: Checks if a value exists in a list or subquery result
fi
• EXISTS: Checks if subquery returns any rows (more ef cient for
correlated subqueries)
45. What is the difference between COUNT(*) and COUNT(column)?
• COUNT(*): Counts all rows
• COUNT(column): Counts non-null values in the column
46. What is DISTINCT keyword in SQL?
DISTINCT removes duplicate rows from the result set.
47. Explain CASE statement in SQL.
CASE is used for conditional logic in queries, similar to if-else.
48. What are window functions in SQL?
Functions that perform calculations across a set of rows related to the
current row, e.g., ROW_NUMBER(), RANK(), SUM() OVER(...)
49. What is the difference between RANK() and DENSE_RANK()?
• RANK(): Leaves gaps in ranking if there are ties
• DENSE_RANK(): No gaps, consecutive ranks
50. What is ROW_NUMBER() function?
Assigns a unique sequential integer to rows within a partition.
51. Explain PIVOT and UNPIVOT operations.
• PIVOT: Rotates rows into columns
• UNPIVOT: Rotates columns into rows
52. What are Common Table Expressions (CTEs)?
CTEs are temporary result sets de ned within a query using WITH,
useful for recursion and modular queries.
fi
fi
53. What is the difference between CHAR and VARCHAR?
• CHAR: Fixed-length, padded with spaces
• VARCHAR: Variable-length, saves space
54. What are NULL values and how to handle them?
NULL represents missing or unknown data. Handle with IS NULL,
COALESCE, or default values.
55. What is COALESCE function?
Returns the rst non-null value in a list of expressions.
56. Explain NULLIF function.
Returns NULL if two expressions are equal; otherwise returns the rst
expression.
57. What are stored procedures?
Stored procedures are precompiled collections of SQL statements
stored in the database, can accept parameters and return results.
58. What are functions in SQL? Types of functions.
Functions are routines that return a value. Types:
• Scalar functions (return single value)
• Table-valued functions (return table)
59. Difference between stored procedures and functions.
• Procedures: Can perform actions, return multiple values, use DML/
DDL
• Functions: Return a value, can be used in SELECT, limited side
effects
fi
fi
60. What are triggers? Types of triggers.
Triggers are automatic actions executed in response to events
(INSERT, UPDATE, DELETE). Types:
• BEFORE/AFTER triggers
• Row-level or statement-level
61. What is a cursor in SQL?
A cursor is a pointer to a result set, allowing row-by-row processing.
62. Explain query execution plan.
A query execution plan shows how the database will execute a query,
including join order, indexes used, and access paths.
63. What is query optimization?
The process of choosing the most ef cient way to execute a query,
using statistics, indexes, and cost estimation.
64. What are hints in SQL?
Hints are directives to the query optimizer to in uence execution
plans (e.g., force index usage).
65. What is the cost-based optimizer?
An optimizer that estimates the cost of different execution plans and
chooses the lowest-cost plan.
66. What is indexing strategy?
A plan for creating and maintaining indexes to optimize query
performance, considering query patterns and data distribution.
67. How to optimize a slow-running query?
fi
fl
• Analyze execution plan
• Add or tune indexes
• Rewrite query for ef ciency
• Reduce data scanned ( lter early)
• Avoid unnecessary columns/joins
68. What is SQL injection and how to prevent it?
SQL injection is a security vulnerability where attackers inject
malicious SQL. Prevent by using prepared statements, parameterized
queries, and input validation.
69. What are prepared statements?
Precompiled SQL statements with placeholders for parameters,
improving performance and security.
70. Explain transaction isolation levels.
Isolation levels de ne how transactions interact:
• Read Uncommitted
• Read Committed
• Repeatable Read
• Serializable
Higher levels provide more consistency but less concurrency.
ER Model and Database Design (71-100)
fi
fi
fi
71. What is an Entity-Relationship (ER) model?
A conceptual data model that represents entities, their attributes, and
relationships among them, used for database design.
72. What is an entity, attribute, and relationship?
• Entity: Real-world object (e.g., Student)
• Attribute: Property of an entity (e.g., Name)
• Relationship: Association between entities (e.g., Student enrolls in
Course)
73. What are strong and weak entities?
• Strong entity: Exists independently, has its own primary key
• Weak entity: Depends on a strong entity, identi ed by partial key plus
strong entity's key
74. What is a weak relationship?
A relationship that connects a weak entity to its owner (strong entity),
often with total participation.
75. Explain different types of attributes.
• Simple (atomic)
• Composite (can be divided)
• Derived (calculated)
• Multivalued (multiple values)
• Key attribute (uniquely identi es)
76. What is cardinality in ER model?
Cardinality speci es the number of instances of one entity related to
another (one-to-one, one-to-many, many-to-many).
fi
fi
fi
77. What are the different cardinality ratios?
• One-to-one (1:1)
• One-to-many (1:N)
• Many-to-one (N:1)
• Many-to-many (M:N)
78. What is participation constraint?
Speci es whether all or only some entity instances participate in a
relationship (total or partial participation).
79. What is total and partial participation?
• Total: Every entity instance must participate
• Partial: Some may not participate
80. What is specialization and generalization?
• Specialization: Dividing an entity into sub-entities
• Generalization: Combining similar entities into a super-entity
81. What is aggregation in ER model?
Treating a relationship as a higher-level entity to participate in other
relationships.
82. How to convert ER diagram to relational schema?
• Entities → tables
• Attributes → columns
• Relationships → foreign keys or separate tables (for M:N)
83. What are the rules for converting entities to tables?
• Strong entity: Table with primary key
fi
• Weak entity: Table with foreign key to owner and partial key
• Multivalued attribute: Separate table
84. How to handle many-to-many relationships?
Create a junction table with foreign keys referencing both entities.
85. What is ISA hierarchy?
Represents inheritance (superclass-subclass) relationships in ER
diagrams.
86. What are disjoint and overlapping constraints?
• Disjoint: Entity belongs to only one subclass
• Overlapping: Entity can belong to multiple subclasses
87. How to represent multivalued attributes in relational model?
Create a separate table with a foreign key to the main entity and the
multivalued attribute.
88. What is relationship degree?
Number of entities involved in a relationship (unary, binary, ternary,
etc.).
89. Explain ternary relationship with example.
A relationship involving three entities, e.g., Doctor prescribes
Medicine to Patient.
90. What is recursive relationship?
An entity is related to itself, e.g., Employee manages another
Employee.
91. How to identify entities and relationships in a system?
Analyze requirements, look for nouns (entities) and verbs
(relationships).
92. What is conceptual, logical, and physical database design?
• Conceptual: High-level, ER diagrams
• Logical: Mapping to relational schema
• Physical: Implementation details (indexes, storage)
93. What are business rules in database design?
Constraints and policies that de ne or restrict data and relationships,
derived from organizational requirements.
94. How to validate an ER diagram?
Check for completeness, correctness, normalization, and adherence
to business rules.
95. What is forward and reverse engineering in database design?
• Forward: From model to database
• Reverse: From database to model
96. What are database design patterns?
Reusable solutions to common design problems, e.g., star schema,
snow ake schema, inheritance mapping.
97. How to handle temporal data in database design?
Use timestamp columns, history tables, or temporal extensions to
track changes over time.
98. What is star schema and snow ake schema?
fl
fi
fl
• Star: Central fact table with denormalized dimension tables
• Snow ake: Fact table with normalized (multi-level) dimension tables
99. What is fact table and dimension table?
• Fact table: Stores measurable events (sales, transactions)
• Dimension table: Stores descriptive attributes (date, product,
customer)
[Link] OLTP vs OLAP.
• OLTP (Online Transaction Processing): Handles day-to-day
transactions, normalized, fast writes
• OLAP (Online Analytical Processing): Supports analytical queries,
denormalized, fast reads
Normalization and Functional Dependencies (101-130)
[Link] is functional dependency?
A relationship where one set of attributes uniquely determines
another set. Denoted as X → Y.
[Link] are trivial and non-trivial functional dependencies?
• Trivial: Y is a subset of X (e.g., {A, B} → A)
• Non-trivial: Y is not a subset of X
[Link] is closure of a set of attributes?
The set of all attributes functionally determined by a given set under a
set of FDs.
fl
[Link] is a candidate key? How to nd it?
A minimal set of attributes that uniquely identi es tuples. Find by
computing closures and checking minimality.
[Link] is the minimal cover of functional dependencies?
A minimal set of FDs equivalent to the original, with single attributes
on the right and no redundancy.
[Link] is First Normal Form (1NF)?
All attributes have atomic (indivisible) values; no repeating groups.
[Link] is Second Normal Form (2NF)?
1NF + no partial dependency (non-prime attribute depends on full
candidate key).
[Link] is Third Normal Form (3NF)?
2NF + no transitive dependency (non-prime attribute depends only on
candidate key).
[Link] is Boyce-Codd Normal Form (BCNF)?
For every non-trivial FD X → Y, X is a superkey.
[Link] is Fourth Normal Form (4NF)?
BCNF + no non-trivial multi-valued dependencies.
111. What is Fifth Normal Form (5NF)?
4NF + every join dependency is implied by candidate keys.
[Link] is partial dependency?
A non-prime attribute depends on part of a composite candidate key.
fi
fi
[Link] is transitive dependency?
A non-prime attribute depends on another non-prime attribute, which
depends on a candidate key.
[Link] is multivalued dependency?
An attribute is independent of others, given a key (X →→ Y).
[Link] is join dependency?
A relation can be reconstructed by joining multiple projections.
[Link] is lossless decomposition?
Decomposition where the original relation can be reconstructed
without loss of information.
[Link] is dependency preserving decomposition?
All functional dependencies can be enforced without joining
decomposed tables.
[Link] is a decomposition considered good?
If it is lossless and dependency preserving.
[Link] are the advantages and disadvantages of normalization?
• Advantages: Reduces redundancy, improves integrity
• Disadvantages: More joins, possible performance impact
[Link] should you denormalize a database?
When read performance is critical and redundancy is acceptable,
e.g., in reporting or OLAP systems.
[Link] is the difference between 3NF and BCNF?
BCNF is stricter; every determinant must be a superkey, while 3NF
allows non-superkey determinants if the dependent is a prime
attribute.
[Link] a relation be in BCNF but not in 4NF?
Yes, if it has multi-valued dependencies but no BCNF violations.
[Link] is Armstrong's axioms?
A set of inference rules (re exivity, augmentation, transitivity) for
deriving all functional dependencies.
[Link] to check if a decomposition is lossless?
Use the chase algorithm or check if a common attribute is a key in at
least one decomposed relation.
[Link] is the canonical cover algorithm?
An algorithm to nd the minimal cover (canonical cover) for a set of
FDs.
[Link] are update anomalies?
Problems like inconsistent data after updates due to redundancy.
[Link] are insertion anomalies?
Inability to insert data due to missing other required data.
[Link] are deletion anomalies?
Unintended loss of data when deleting other data.
fi
fl
[Link] normalization helps in reducing data redundancy?
By organizing data into related tables and eliminating duplicate data.
[Link] is the process of normalization?
Stepwise decomposition of tables into higher normal forms based on
functional dependencies.
Transaction Management and Concurrency (131-160)
[Link] is a transaction in DBMS?
A transaction is a sequence of operations performed as a single
logical unit of work, which must be atomic, consistent, isolated, and
durable.
[Link] are ACID properties?
• Atomicity: All or nothing
• Consistency: Preserves database rules
• Isolation: Transactions do not interfere
• Durability: Changes survive failures
[Link] atomicity with an example.
Transferring money between accounts: both debit and credit must
succeed or both fail.
[Link] is consistency in transactions?
Ensures that a transaction brings the database from one valid state to
another, maintaining integrity constraints.
[Link] is isolation in DBMS?
Ensures that concurrent transactions do not affect each other's
execution.
[Link] is durability in transactions?
Once a transaction commits, its changes are permanent, even after a
crash.
[Link] are the different transaction states?
Active, partially committed, committed, failed, aborted.
[Link] is a transaction log?
A record of all changes made during transactions, used for recovery.
[Link] is commit and rollback?
• Commit: Save all changes
• Rollback: Undo changes since last commit
[Link] are savepoints in transactions?
Markers within a transaction to which you can roll back partially.
[Link] is concurrency control?
Techniques to ensure correct results when multiple transactions
execute simultaneously.
[Link] are the problems of concurrent transactions?
Lost update, dirty read, unrepeatable read, phantom read.
[Link] is a lost update problem?
Two transactions overwrite each other's changes, resulting in lost
data.
[Link] is dirty read problem?
A transaction reads uncommitted changes from another transaction.
[Link] is unrepeatable read problem?
A transaction reads the same data twice and gets different results
due to another transaction's update.
[Link] is phantom read problem?
A transaction re-executes a query and nds new rows inserted by
another transaction.
[Link] are the different isolation levels?
Read Uncommitted, Read Committed, Repeatable Read,
Serializable.
[Link] Read Uncommitted isolation level.
Transactions can read uncommitted changes from others (allows dirty
reads).
[Link] Read Committed isolation level.
Transactions can only read committed data (prevents dirty reads).
[Link] Repeatable Read isolation level.
Ensures that if a transaction reads a row twice, it gets the same value
(prevents unrepeatable reads).
fi
[Link] Serializable isolation level.
Highest isolation; transactions execute as if serially, preventing all
anomalies.
[Link] is serializability?
A schedule is serializable if its outcome is equivalent to some serial
execution.
[Link] is con ict serializability?
A schedule can be transformed into a serial schedule by swapping
non-con icting operations.
[Link] is view serializability?
A schedule is view-serializable if it is view-equivalent to a serial
schedule (same reads/writes).
[Link] is a precedence graph?
A directed graph showing dependencies between transactions; cycles
indicate non-serializability.
[Link] is two-phase locking (2PL)?
A protocol where transactions acquire all locks before releasing any,
ensuring serializability.
[Link] is strict two-phase locking?
All exclusive locks are held until the transaction commits or aborts,
preventing cascading rollbacks.
[Link] is rigorous two-phase locking?
All locks (shared and exclusive) are held until commit/abort.
fl
fl
[Link] is timestamp ordering protocol?
Each transaction gets a timestamp; operations are ordered based on
timestamps to ensure serializability.
[Link] is optimistic concurrency control?
Assumes con icts are rare; transactions execute without locking,
validated before commit.
Locking and Deadlocks (161-180)
[Link] are locks in DBMS?
Mechanisms to control concurrent access to data, ensuring
consistency.
[Link] are shared and exclusive locks?
• Shared (S) lock: Allows read, multiple can hold
• Exclusive (X) lock: Allows write, only one can hold
[Link] is lock granularity?
The size of data item locked (row, page, table); ner granularity
allows more concurrency.
[Link] is deadlock in DBMS?
A situation where two or more transactions wait inde nitely for each
other to release locks.
[Link] are the conditions for deadlock?
• Mutual exclusion
fl
fi
fi
• Hold and wait
• No preemption
• Circular wait
[Link] to detect deadlock?
Build a wait-for graph; cycles indicate deadlock.
[Link] is wait-for graph?
A directed graph representing which transactions are waiting for locks
held by others.
[Link] are deadlock prevention techniques?
• Wait-die and wound-wait schemes
• Resource ordering
• Timeout
[Link] is wait-die scheme?
Older transactions wait for younger; younger abort if waiting for older.
[Link] is wound-wait scheme?
Older transactions preempt (abort) younger; younger wait for older.
[Link] is deadlock avoidance?
System checks for potential deadlocks before granting locks, e.g.,
using wait-for graphs or resource allocation graphs.
[Link] is banker's algorithm in DBMS context?
An algorithm to avoid deadlock by ensuring safe resource allocation.
[Link] is deadlock recovery?
Detecting deadlock and resolving it by aborting one or more
transactions.
[Link] is victim selection in deadlock recovery?
Choosing which transaction(s) to abort to break the deadlock, based
on cost, priority, or minimal rollback.
[Link] is lock escalation?
Upgrading from ner-grained locks (row) to coarser-grained locks
(table) to reduce overhead.
[Link] is intent locking?
Locks indicating a transaction's intention to acquire ner-grained
locks, used in multi-granularity locking.
[Link] is multiple granularity locking?
Locking at different levels (table, page, row) to balance concurrency
and overhead.
[Link] is phantom locking?
Locks to prevent phantom reads, typically by locking ranges or
predicates.
[Link] is predicate locking?
Locks based on query predicates (conditions), not just speci c rows.
[Link] are the performance implications of locking?
Locks can reduce concurrency and throughput, causing contention
and blocking, but are necessary for consistency.
fi
fi
fi
Database Recovery and Backup (181-200)
[Link] is database recovery?
Restoring the database to a consistent state after a failure.
[Link] are the different types of failures in DBMS?
• Transaction failure
• System crash
• Disk failure
[Link] is a transaction failure?
A transaction cannot complete due to logical or system errors.
[Link] is system crash?
Failure of hardware or software causing loss of volatile memory.
[Link] is disk failure?
Physical damage or corruption of storage media.
[Link] is write-ahead logging (WAL)?
Log changes before applying them to the database, ensuring
recoverability.
[Link] is a checkpoint in database recovery?
A point where all changes are written to disk, reducing recovery time.
[Link] is REDO operation?
Reapplying committed changes from the log to the database after a
crash.
[Link] is UNDO operation?
Reverting uncommitted changes using the log.
[Link] is the ARIES recovery algorithm?
A recovery method using WAL, checkpoints, and three phases:
analysis, redo, undo.
[Link] is immediate update recovery?
Changes are written to the database before commit, requiring both
undo and redo during recovery.
[Link] is deferred update recovery?
Changes are written only after commit, requiring only redo during
recovery.
[Link] is shadow paging?
Maintains two copies (current and shadow) of pages; switch pointer
on commit.
[Link] is database backup?
Copying database data to a separate storage for recovery purposes.
[Link] are different types of backup?
• Full backup
• Incremental backup
• Differential backup
[Link] is full backup?
A complete copy of the entire database.
[Link] is incremental backup?
Backs up only changes since the last backup.
[Link] is differential backup?
Backs up changes since the last full backup.
[Link] is hot backup vs cold backup?
• Hot backup: Taken while the database is running
• Cold backup: Taken when the database is of ine
[Link] is point-in-time recovery?
Restoring the database to a speci c moment using backups and logs.
Advanced Topics and NoSQL (201-220)
[Link] is distributed database?
A database distributed across multiple physical locations, connected
via a network.
[Link] is CAP theorem?
States that a distributed system can guarantee only two of
Consistency, Availability, and Partition Tolerance at any time.
[Link] is eventual consistency?
A consistency model where updates propagate to all nodes
eventually, so all replicas converge over time.
[Link] is BASE vs ACID?
• ACID: Strong consistency, transactional guarantees
fi
fl
• BASE: Basically Available, Soft state, Eventual consistency (used in
NoSQL)
[Link] are NoSQL databases?
Non-relational databases designed for scalability, exibility, and high
performance, often schema-less.
[Link] are the types of NoSQL databases?
• Document stores (MongoDB)
• Key-value stores (Redis)
• Column stores (Cassandra)
• Graph databases (Neo4j)
[Link] is MongoDB and when to use it?
A document-oriented NoSQL database, ideal for exible, semi-
structured data and rapid development.
[Link] is Cassandra and its use cases?
A distributed, column-family NoSQL database, suitable for high write
throughput, big data, and time-series applications.
[Link] is Redis and its applications?
An in-memory key-value store, used for caching, session
management, real-time analytics.
[Link] is Neo4j and graph databases?
Neo4j is a graph database optimized for storing and querying highly
connected data, such as social networks.
fl
fl
[Link] is data warehouse?
A centralized repository for integrated, historical data used for
analysis and reporting.
[Link] is ETL process?
Extract, Transform, Load: process of moving data from sources to a
data warehouse.
[Link] is big data and its characteristics?
Large, complex datasets characterized by Volume, Velocity, Variety,
Veracity, and Value.
[Link] is Hadoop ecosystem?
A suite of open-source tools for distributed storage and processing of
big data (HDFS, MapReduce, Hive, Pig, etc.).
[Link] is Apache Spark?
A fast, in-memory data processing engine for big data analytics.
[Link] is data lake vs data warehouse?
• Data lake: Stores raw, unstructured data
• Data warehouse: Stores structured, processed data
[Link] is database as a Service (DBaaS)?
Cloud-based database management, where the provider handles
maintenance, scaling, and backups.
[Link] is cloud database?
A database that runs on cloud infrastructure, offering scalability and
managed services.
[Link] is database migration?
Moving data from one database to another, often involving schema
and data transformation.
[Link] are modern database trends?
Cloud-native databases, multi-model databases, NewSQL, serverless
databases, AI/ML integration.
Performance and Optimization (221-240)
[Link] factors affect database performance?
• Hardware resources
• Query design
• Indexing
• Schema design
• Concurrency
• Network latency
[Link] to identify performance bottlenecks?
Monitor slow queries, resource usage, locking, and wait events using
pro ling tools.
[Link] is database tuning?
Adjusting con guration, queries, indexes, and schema to improve
performance.
[Link] are database performance metrics?
• Query response time
fi
fi
• Throughput
• CPU/memory/disk usage
• Cache hit ratio
• Lock contention
[Link] to optimize database schema for performance?
Normalize for integrity, denormalize for reads, use appropriate data
types, partition large tables.
[Link] is index tuning?
Creating, dropping, or modifying indexes to optimize query
performance.
[Link] should you rebuild indexes?
When indexes become fragmented due to frequent updates/deletes,
causing slow queries.
[Link] is statistics in query optimization?
Metadata about data distribution, used by the optimizer to choose
ef cient plans.
[Link] is cost-based vs rule-based optimization?
• Cost-based: Uses statistics to estimate costs
• Rule-based: Uses prede ned rules
[Link] to analyze query execution plans?
Use EXPLAIN or similar tools to see join order, index usage, and
estimated costs.
fi
fi
[Link] is database caching?
Storing frequently accessed data in memory to reduce disk I/O and
improve speed.
[Link] is connection pooling?
Reusing database connections to reduce overhead of establishing
new connections.
[Link] is database load balancing?
Distributing queries across multiple servers to improve performance
and availability.
[Link] is read replica?
A copy of the database used for read-only queries, of oading work
from the primary.
[Link] to handle large tables?
Partitioning, indexing, archiving old data, and optimizing queries.
[Link] is table partitioning strategies?
• Range partitioning
• List partitioning
• Hash partitioning
• Composite partitioning
[Link] is database archiving?
Moving old or infrequently accessed data to separate storage to
improve performance.
fl
[Link] is data compression in databases?
Reducing storage space and I/O by compressing data, often at the
cost of CPU.
[Link] are best practices for database design?
• Normalize appropriately
• Use proper data types
• Index wisely
• Enforce constraints
• Document schema
[Link] to monitor database performance?
Use monitoring tools to track metrics, set alerts, and analyze logs for
proactive tuning.
ALL THE BEST