0% found this document useful (0 votes)
10 views18 pages

DBMS SQL Interview Questions

The document provides a comprehensive guide on DBMS and SQL interview questions, covering topics from database fundamentals to advanced SQL concepts. It includes essential concepts such as normalization, SQL commands, joins, indexes, and transaction properties, along with detailed explanations and examples. This resource is designed for candidates ranging from freshers to senior engineers preparing for technical interviews.

Uploaded by

pruthviraj.g
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
10 views18 pages

DBMS SQL Interview Questions

The document provides a comprehensive guide on DBMS and SQL interview questions, covering topics from database fundamentals to advanced SQL concepts. It includes essential concepts such as normalization, SQL commands, joins, indexes, and transaction properties, along with detailed explanations and examples. This resource is designed for candidates ranging from freshers to senior engineers preparing for technical interviews.

Uploaded by

pruthviraj.g
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

DBMS & SQL

Interview Questions
50 Must-Know Questions with Diagrams & Deep Explanations
From Basics to Advanced • Freshers to Senior Engineers

Topics Covered:
Database Fundamentals • ER Model • Normalization • SQL Queries • Joins • Indexes • Transactions & ACID
• Concurrency Control • Query Optimization • Stored Procedures • NoSQL vs SQL • Storage Engines
■ Section 1: Database Fundamentals
Q1. What is a Database Management System (DBMS)? How is it different from a file
system?
A DBMS is software that manages, stores, retrieves, and manipulates structured data in a database.
Unlike a file system, a DBMS provides:
■ Data Abstraction – hide physical storage details from users
■ Data Independence – change schema without affecting applications
■ Concurrent Access – multiple users can read/write safely
■ ACID Transactions – guaranteed correctness even on failure
■ Security & Access Control – fine-grained permissions
■ Query Language (SQL) – declarative data retrieval
■ Popular DBMS: MySQL, PostgreSQL, Oracle, SQL Server, SQLite, MongoDB (NoSQL).

Q2. What are the three levels of data abstraction in a DBMS?


The ANSI-SPARC architecture defines three abstraction levels:
■ Physical Level (Internal) – how data is stored on disk (files, indexes, blocks)
■ Logical Level (Conceptual) – what data is stored (tables, relationships, constraints)
■ View Level (External) – what specific users see (customised views per user role)
■ Data independence: Logical independence = change conceptual schema without changing views. Physical
independence = change physical storage without changing conceptual schema.

Q3. What is a Relational Database? State Codd's 12 rules.


A relational database stores data in tables (relations) with rows (tuples) and columns (attributes), based
on the relational model proposed by E.F. Codd.
■ Rule 0: Foundation – Must use relational capabilities to manage the database
■ Rule 1: Information – All data stored as values in table cells
■ Rule 2: Guaranteed Access – Every value accessible via table+key+column
■ Rule 6: View Updating – Theoretically updatable views must be updatable
■ Rule 9: Logical Data Independence – Changes at logical level don't affect apps
■ Rule 12: Non-Subversion – No low-level bypass of relational rules
Q4. What is an ER diagram? Explain its components.
An Entity-Relationship (ER) diagram is a conceptual model that visually represents the structure of a
database — entities, their attributes, and relationships between them.
■ Entity – Real-world object (STUDENT, COURSE). Represented as rectangle
■ Attribute – Property of an entity (Name, ID). Represented as ellipse
■ Primary Key attribute – Underlined attribute that uniquely identifies entity
■ Relationship – Association between entities (ENROLLS). Represented as diamond
■ Cardinality – 1:1, 1:N, M:N (shown on relationship lines)
■ Weak Entity – Depends on another entity for identification (dashed rectangle)

Q5. What are the different types of keys in a database?


Keys are attributes or combinations that uniquely identify rows or establish relationships:
■ Super Key – Any set of attributes that uniquely identifies a tuple
■ Candidate Key – Minimal super key (no redundant attributes)
■ Primary Key – Chosen candidate key; uniquely identifies every row
■ Foreign Key – Attribute in one table referencing PK of another (enforces referential integrity)
■ Composite Key – Primary key made of two or more attributes
■ Surrogate Key – System-generated artificial key (auto-increment ID)
■ Alternate Key – Candidate keys not chosen as primary key
■ A table can have multiple candidate keys but only ONE primary key.

■ Section 2: Normalization
Q6. What is Normalization? Why is it important?
Normalization is the process of organizing a database to reduce data redundancy and improve data
integrity, by decomposing tables into smaller, well-structured ones.
■ Eliminates insertion, update, and deletion anomalies
■ Reduces data redundancy → saves storage
■ Improves data consistency across the database
■ Makes queries and updates more efficient
■ Denormalization is sometimes done intentionally for read-heavy workloads (e.g., data warehouses) to improve
query performance at the cost of some redundancy.

Q7. Explain 1NF, 2NF, and 3NF with examples.


Normal forms are progressive levels of normalization:
■ 1NF (First Normal Form): Eliminate repeating groups; each column must have atomic (indivisible)
values; each row must be unique
■ 2NF (Second Normal Form): Must be in 1NF + no partial dependencies (every non-key attribute must
depend on the WHOLE primary key, not just part of it)
■ 3NF (Third Normal Form): Must be in 2NF + no transitive dependencies (non-key attributes must NOT
depend on other non-key attributes)
■ Example: OrderID→CustomerID→CustomerName is transitive; move CustomerName to a separate
Customer table

Q8. What is BCNF (Boyce-Codd Normal Form)?


BCNF is a stronger version of 3NF. For every functional dependency X → Y, X must be a superkey. It
eliminates anomalies that 3NF may not fully address.
■ Every relation in BCNF is also in 3NF, but not vice versa
■ BCNF violation: non-superkey determines an attribute
■ Decompose the violating table, preserving functional dependencies if possible
■ Trade-off: BCNF decomposition may not preserve all dependencies
■ Most real-world databases are normalized to 3NF or BCNF. 4NF and 5NF deal with multi-valued and join
dependencies.

Q9. What are Functional Dependencies?


A functional dependency X → Y means the value of X uniquely determines the value of Y in a relation. It
is the foundation of normalization.
■ Trivial FD: Y ⊆ X (e.g., {A,B} → A) — always holds
■ Non-trivial FD: Y ⊄ X — carries new information
■ Full FD: Y depends on all of X, not a proper subset
■ Partial FD: Y depends on only part of a composite key
■ Transitive FD: X → Z via X → Y → Z where Y is not a key
Armstrong's Axioms: 1. Reflexivity: Y ⊆ X ■ X → Y 2. Augmentation: X → Y ■ XZ → YZ
3. Transitivity: X→Y, Y→Z ■ X → Z

■ Section 3: SQL Fundamentals

Q10. What are the different categories of SQL commands?


SQL commands are grouped into five categories:
■ DDL (Data Definition Language): CREATE, ALTER, DROP, TRUNCATE, RENAME
■ DML (Data Manipulation Language): SELECT, INSERT, UPDATE, DELETE
■ DCL (Data Control Language): GRANT, REVOKE
■ TCL (Transaction Control Language): COMMIT, ROLLBACK, SAVEPOINT
■ DQL (Data Query Language): SELECT (sometimes separated from DML)
■ DDL commands are auto-committed (cannot be rolled back in most DBMS). DML changes can be rolled back
within a transaction.

Q11. Explain the difference between WHERE and HAVING.


Both filter rows, but they operate at different stages of query execution:
■ WHERE filters individual rows BEFORE grouping (cannot use aggregate functions)
■ HAVING filters groups AFTER GROUP BY (can use aggregate functions)
■ WHERE is evaluated at step 2 in SQL execution order; HAVING at step 4
SELECT dept, AVG(salary) AS avg_sal FROM employees WHERE status = 'active' -- filters
rows first GROUP BY dept HAVING AVG(salary) > 50000; -- filters groups after
Q12. What is the difference between DELETE, TRUNCATE, and DROP?
All three remove data but in different ways and with different consequences:
■ DELETE: DML; removes specific rows (WHERE clause); logs each row; can be rolled back; triggers fire
■ TRUNCATE: DDL; removes ALL rows; minimal logging; cannot be rolled back; faster than DELETE;
resets auto-increment
■ DROP: DDL; removes entire table structure AND data; cannot be rolled back; all
indexes/triggers/constraints removed
■ TRUNCATE cannot be used with WHERE clause. DROP removes the table itself — you cannot SELECT from it
after.

Q13. What are aggregate functions? Name the common ones.


Aggregate functions perform calculations on a set of rows and return a single result. They are typically
used with GROUP BY.
■ COUNT(*) – number of rows; COUNT(col) – non-NULL values in column
■ SUM(col) – total of numeric column values
■ AVG(col) – arithmetic mean of numeric column
■ MAX(col) / MIN(col) – largest/smallest value in column
■ GROUP_CONCAT / STRING_AGG – concatenate values in a group
■ STDDEV(), VARIANCE() – statistical functions
SELECT dept, COUNT(*) AS emp_count, AVG(salary) AS avg_salary, MAX(salary) AS
top_salary FROM employees GROUP BY dept ORDER BY avg_salary DESC;

Q14. What is a subquery? Explain correlated vs non-correlated subqueries.


A subquery is a query nested inside another query. It can appear in SELECT, FROM, WHERE, or
HAVING clauses.
■ Non-correlated subquery: Executes once, independently of the outer query
■ Correlated subquery: References columns from the outer query; executes once per outer row (slower)
■ EXISTS operator: Often used instead of correlated subquery for performance
■ Scalar subquery: Returns exactly one value; used in SELECT or WHERE
-- Non-correlated SELECT * FROM emp WHERE salary > (SELECT AVG(salary) FROM emp); --
Correlated (for each emp, check dept avg) SELECT * FROM emp e1 WHERE salary > (SELECT
AVG(salary) FROM emp e2 WHERE [Link] = [Link]);

■ Section 4: SQL Joins


Q15. Explain all types of SQL JOINs with examples.
SQL JOINs combine rows from two or more tables based on a related column:
■ INNER JOIN: Returns rows that have matching values in BOTH tables
■ LEFT (OUTER) JOIN: Returns ALL rows from left table + matching from right (NULL for no match)
■ RIGHT (OUTER) JOIN: Returns ALL rows from right table + matching from left
■ FULL OUTER JOIN: Returns ALL rows from both tables (NULL where no match)
■ CROSS JOIN: Returns the Cartesian product (every combination of rows)
■ SELF JOIN: Table joined with itself (useful for hierarchical data like org charts)
-- Find employees and their managers (Self Join) SELECT [Link] AS employee, [Link] AS
manager FROM employees e LEFT JOIN employees m ON e.manager_id = m.emp_id;

Q16. What is the difference between UNION and UNION ALL?


Both combine results of two SELECT statements, but differ in how they handle duplicates:
■ UNION: Removes duplicate rows from the combined result set (slower, needs sorting)
■ UNION ALL: Keeps all rows including duplicates (faster, no deduplication)
■ Both require same number of columns and compatible data types
■ Use UNION ALL when you know data is distinct or duplicates are acceptable
■ UNION does an implicit DISTINCT on the combined results. For large datasets, UNION ALL is significantly
faster.

Q17. What is a NATURAL JOIN? When should you avoid it?


A NATURAL JOIN automatically joins tables on all columns with the same name. It can be dangerous
because:
■ Implicit column matching makes queries fragile — adding a column can break the join
■ Hard to read/understand without knowing the schema
■ Behaviour changes if column names change
■ Prefer explicit JOIN ... ON condition for clarity and safety
-- Avoid this: SELECT * FROM orders NATURAL JOIN customers; -- Prefer explicit: SELECT
* FROM orders o JOIN customers c ON o.customer_id = c.customer_id;
■ Section 5: Indexes & Query Optimization

Q18. What is an index? How does it improve query performance?


An index is a data structure (typically a B-Tree or Hash) that allows the database engine to find rows
quickly without scanning the entire table.
■ Without index: Full table scan O(n) for every query
■ With B-Tree index: O(log n) lookup time
■ Indexes speed up SELECT, WHERE, JOIN, and ORDER BY operations
■ Indexes slow down INSERT, UPDATE, DELETE (must update the index too)
■ A table with too many indexes has slower writes and larger storage footprint
■ Rule of thumb: Index columns used frequently in WHERE, JOIN ON, and ORDER BY. Avoid indexing columns
with low cardinality (e.g., a boolean column).

Q19. What is the difference between a clustered and non-clustered index?


The key difference lies in whether the actual table data is sorted/stored by the index:
■ Clustered Index: Data rows are physically stored in sorted order of the index key. Only ONE per table.
In InnoDB, the primary key is always a clustered index.
■ Non-Clustered Index: A separate structure with index key + pointer (row ID) to actual data. Multiple
allowed per table.
■ Clustered is faster for range queries; non-clustered requires an extra pointer lookup (key lookup)
-- Create a non-clustered index CREATE INDEX idx_lastname ON employees(last_name); --
Composite index (order matters!) CREATE INDEX idx_dept_sal ON employees(dept_id,
salary);

Q20. What is a covering index?


A covering index is one that contains ALL columns needed by a query, so the database can answer the
query entirely from the index without reading the actual table (avoids 'key lookup').
■ Eliminates need to access the base table — significantly faster
■ Include frequently-queried non-key columns in index definition
■ Use EXPLAIN/EXPLAIN ANALYZE to check if 'Index Only Scan' is used
-- Index covers both filter and select columns CREATE INDEX idx_cover ON
orders(customer_id, status, order_date); -- This query is answered from index alone:
SELECT order_date FROM orders WHERE customer_id = 5 AND status = 'shipped';

Q21. What is the EXPLAIN plan? How do you optimize a slow query?
EXPLAIN (or EXPLAIN ANALYZE) shows the query execution plan — how the database will retrieve
data. Use it to identify bottlenecks.
■ Look for: Full Table Scan (type=ALL in MySQL) → add an index
■ Check: rows column — large numbers suggest poor selectivity
■ Avoid: SELECT * — fetch only required columns
■ Avoid: Functions on indexed columns in WHERE (prevents index use)
■ Use: proper JOINs instead of subqueries when possible
■ Consider: query hints, partitioning, or denormalization for complex cases
EXPLAIN SELECT * FROM orders WHERE customer_id = 100 AND status = 'open'; -- Look for
'Using index', 'ref', avoid 'ALL'

Q22. What are window functions? How are they different from GROUP BY?
Window functions perform calculations across a set of related rows without collapsing them into a single
row (unlike GROUP BY).
■ ROW_NUMBER() – unique sequential number per partition
■ RANK() / DENSE_RANK() – rank with/without gaps for ties
■ LAG(col, n) / LEAD(col, n) – access previous/next row values
■ SUM(col) OVER (...) – running total without GROUP BY
■ PARTITION BY – divides rows into groups for window calculation
-- Running total and rank by dept SELECT name, dept, salary, RANK() OVER (PARTITION BY
dept ORDER BY salary DESC) AS rnk, SUM(salary) OVER (PARTITION BY dept) AS dept_total
FROM employees;

■ Section 6: Transactions & ACID Properties


Q23. What is a database transaction? What is ACID?
A transaction is a logical unit of work consisting of one or more SQL operations that must be executed
atomically. ACID guarantees correctness of transactions:
■ Atomicity – All operations succeed or all are rolled back
■ Consistency – DB transitions from one valid state to another
■ Isolation – Concurrent transactions don't interfere with each other
■ Durability – Committed data persists even after system crash
BEGIN TRANSACTION; UPDATE accounts SET balance = balance - 500 WHERE id = 1; UPDATE
accounts SET balance = balance + 500 WHERE id = 2; COMMIT; -- or ROLLBACK on error

Q24. Explain the four transaction isolation levels and the problems they prevent.
SQL defines four isolation levels (from least to most strict) that control how concurrent transactions
interact:
■ READ UNCOMMITTED: Lowest isolation; allows dirty reads, non-repeatable reads, phantoms
■ READ COMMITTED: Prevents dirty reads; allows non-repeatable reads and phantoms (default in many
DBMS)
■ REPEATABLE READ: Prevents dirty + non-repeatable reads; allows phantom reads (default in MySQL
InnoDB)
■ SERIALIZABLE: Highest isolation; prevents all anomalies; transactions behave as if serial; slowest
■ Higher isolation = fewer anomalies but lower concurrency and throughput. Choose based on your application's
correctness requirements.

Q25. What are the concurrency problems (anomalies) in databases?


When multiple transactions run concurrently without proper isolation, these anomalies can occur:
■ Dirty Read: T2 reads uncommitted data from T1; if T1 rolls back, T2 has invalid data
■ Non-Repeatable Read: T1 reads the same row twice and gets different values (T2 updated it in
between)
■ Phantom Read: T1 re-executes a query and gets new rows that T2 inserted
■ Lost Update: Two transactions read-then-write the same data; one update overwrites the other
■ Dirty Write: T2 overwrites uncommitted data written by T1

Q26. What is MVCC (Multi-Version Concurrency Control)?


MVCC maintains multiple versions of data to allow readers and writers to work concurrently without
blocking each other.
■ Each write creates a new version of the row with a transaction ID timestamp
■ Readers see a consistent snapshot of the database (not blocked by writers)
■ Writers create new versions (not overwriting old ones)
■ Garbage collection removes old versions when no transaction needs them
■ Used by: PostgreSQL, InnoDB (MySQL), Oracle, CockroachDB
■ MVCC is why SELECT queries in InnoDB never block on INSERT/UPDATE/DELETE — they read from a
snapshot.

Q27. What are SAVEPOINT and ROLLBACK TO SAVEPOINT?


SAVEPOINTs mark intermediate points within a transaction, allowing partial rollbacks without aborting
the entire transaction.
■ SAVEPOINT sp1 – creates a named checkpoint
■ ROLLBACK TO sp1 – undoes all operations after sp1 (keeps earlier changes)
■ RELEASE SAVEPOINT sp1 – removes the savepoint (commits work done after it)
■ Useful for complex transactions where certain operations may fail optionally
BEGIN; INSERT INTO orders VALUES (1, 'pending'); SAVEPOINT after_insert; UPDATE
inventory SET qty = qty - 10 WHERE id = 5; -- if update fails: ROLLBACK TO
after_insert; -- order saved, inventory unchanged COMMIT;
■ Section 7: Advanced SQL & Stored Procedures
Q28. What are stored procedures, functions, and triggers?
These are server-side SQL objects that encapsulate reusable logic:
■ Stored Procedure: Named block of SQL+procedural code; called explicitly; may return multiple result
sets; can have IN/OUT/INOUT parameters
■ Function (UDF): Returns a single value; can be called in SQL expressions; no side effects (in standard
SQL)
■ Trigger: Auto-executes on INSERT/UPDATE/DELETE events on a table; used for auditing, enforcing
business rules, cascading changes
-- Stored Procedure CREATE PROCEDURE GetEmpByDept(IN dept_id INT) BEGIN SELECT * FROM
employees WHERE department_id = dept_id; END; CALL GetEmpByDept(10);

Q29. What is a View? What are materialized views?


A view is a virtual table based on a SELECT query. It doesn't store data itself (unless materialized).
■ Regular View: Executes the underlying query on every access; always up-to-date; no storage overhead
■ Materialized View: Stores the result of the query on disk; must be refreshed periodically; faster for
complex aggregations
■ Benefits: Simplify complex queries, enforce security (hide sensitive columns), data abstraction
■ Updatable Views: Some simple views allow INSERT/UPDATE/DELETE
CREATE VIEW active_employees AS SELECT id, name, dept, salary FROM employees WHERE
status = 'active'; -- Use like a table: SELECT * FROM active_employees WHERE dept =
'Sales';

Q30. What are CTEs (Common Table Expressions)? How are they different from
subqueries?
A CTE (WITH clause) is a named temporary result set that exists within the scope of a single SQL
statement.
■ CTEs improve readability and allow recursive queries
■ CTEs can be referenced multiple times in the same query (unlike subqueries)
■ Recursive CTE: References itself; used for hierarchical/tree data (org charts, BOM)
■ Non-recursive CTEs are often optimized similarly to subqueries by the planner
-- Recursive CTE: find all employees in hierarchy WITH RECURSIVE emp_tree AS ( SELECT
id, name, manager_id FROM employees WHERE id = 1 -- anchor UNION ALL SELECT [Link],
[Link], e.manager_id FROM employees e JOIN emp_tree t ON e.manager_id = [Link] --
recursive ) SELECT * FROM emp_tree;

Q31. What is partitioning in databases?


Partitioning divides a large table into smaller, more manageable pieces (partitions) while appearing as a
single table to queries.
■ Range Partitioning: Rows in date/value ranges (e.g., by year)
■ List Partitioning: Rows by discrete values (e.g., by country)
■ Hash Partitioning: Rows distributed by hash function (even distribution)
■ Composite: Combination of the above
■ Benefits: Pruning (scan only relevant partitions), faster DELETE of old data, parallel scans
CREATE TABLE sales ( id INT, amount DECIMAL, sale_date DATE ) PARTITION BY RANGE
(YEAR(sale_date)) ( PARTITION p2022 VALUES LESS THAN (2023), PARTITION p2023 VALUES
LESS THAN (2024), PARTITION p2024 VALUES LESS THAN (2025) );

Q32. What are RANK(), DENSE_RANK(), and ROW_NUMBER()? How do they differ?
All three assign numbers to rows within a result set, but handle ties differently:
■ ROW_NUMBER(): Always unique sequential number (1,2,3,4,5) — no ties
■ RANK(): Same rank for ties, then skips numbers (1,2,2,4,5) — gap after tie
■ DENSE_RANK(): Same rank for ties, no gap (1,2,2,3,4) — no skipping
SELECT name, salary, ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num, RANK() OVER
(ORDER BY salary DESC) AS rnk, DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rnk
FROM employees;

■ Section 8: Relational Algebra & Database Design

Q33. What is Relational Algebra? Why is it important?


Relational algebra is a formal procedural query language that provides the theoretical foundation for
SQL. It defines operations on relations (tables) that return new relations.
■ Select (σ): Filter rows by a condition
■ Project (π): Select specific columns
■ Union (∪), Intersection (∩), Difference (−): Set operations
■ Cartesian Product (×): All combinations of rows
■ Join (■): Combine related tuples from two relations
■ Rename (ρ): Rename a relation or its attributes
■ Every SQL query can be expressed as a relational algebra expression. Query optimisers convert SQL →
relational algebra → optimised execution plan.

Q34. What is a schema? Explain the three schema architecture.


A schema is the logical design/structure of a database. The ANSI three-schema architecture separates
user views from physical storage:
■ External Schema: Multiple user views — each user/role sees their relevant subset
■ Conceptual Schema: Complete logical structure — all tables, constraints, relationships
■ Internal Schema: Physical storage details — file organisation, indexing, access paths
■ Separation ensures data independence: changing how data is physically stored doesn't break applications.

■■ Section 9: Storage Engines, NoSQL & Performance

Q35. What is the difference between InnoDB and MyISAM in MySQL?


InnoDB and MyISAM are two major MySQL storage engines with very different capabilities:
■ InnoDB: Full ACID transactions, row-level locking, foreign keys, MVCC, crash recovery via WAL
■ MyISAM: No transactions, table-level locking, faster for read-heavy workloads, simpler
■ InnoDB is the default since MySQL 5.5 and is recommended for almost all use cases
■ MyISAM: useful for read-heavy, non-critical reporting tables with full-text search needs
■ Never use MyISAM for tables that require data integrity, concurrent writes, or crash safety.

Q36. What is the difference between SQL (relational) and NoSQL databases?
SQL and NoSQL represent different paradigms for data storage:
■ SQL: Structured tables, fixed schema, ACID transactions, powerful JOINs (MySQL, PostgreSQL,
Oracle)
■ NoSQL: Flexible schema, horizontal scaling, eventual consistency, different data models
■ Document Store: JSON-like documents (MongoDB) — good for unstructured/variable data
■ Key-Value Store: Simple key-value pairs (Redis) — caching, sessions
■ Column Family: Wide columns (Cassandra, HBase) — time-series, high write throughput
■ Graph DB: Nodes and edges (Neo4j) — social networks, recommendation engines
■ CAP Theorem: Consistency, Availability, Partition Tolerance — a distributed system can guarantee at most 2 of
3.

Q37. What is database sharding? How is it different from replication?


Sharding and replication are both database scaling strategies but serve different purposes:
■ Sharding (Horizontal Partitioning): Data split across multiple database servers; each shard holds a
subset of rows; improves write scalability
■ Replication: Exact copies of data on multiple servers; improves read scalability and fault tolerance
■ Sharding key choice is critical — poor choice leads to hotspots
■ Common strategies: Range sharding, Hash sharding, Directory-based sharding
■ Most large-scale systems use BOTH: sharding for scale + replication for high availability per shard.

Q38. What is query caching? What are its benefits and drawbacks?
Query caching stores the result of a query so identical future queries can be served from memory.
■ Benefits: Eliminates repeated computation for identical queries; reduces DB load
■ Drawbacks: Cache invalidation is hard — any write to the underlying table must invalidate the cache
■ MySQL's built-in query cache was deprecated in 8.0 due to overhead
■ Better approach: Application-level caching with Redis/Memcached
■ Cache is most effective for read-heavy workloads with rarely-changing data and frequently-repeated queries.

Q39. What is a deadlock in a database? How does the DBMS detect and resolve it?
A database deadlock occurs when two or more transactions each hold locks and wait for each other to
release them — creating a circular wait.
■ Example: T1 holds lock on row A, wants row B. T2 holds lock on row B, wants row A.
■ Detection: DBMS builds a wait-for graph; if a cycle is found → deadlock exists
■ Resolution: DBMS picks a victim transaction (usually the cheapest to abort) and rolls it back
■ Prevention: Always acquire locks in the same order; use short transactions; use SELECT ... FOR
UPDATE carefully
■ InnoDB automatically detects deadlocks and rolls back the transaction with the smallest undo log — making it
the cheaper victim.

Q40. What are database constraints? Explain all types.


Constraints enforce data integrity rules at the schema level:
■ NOT NULL: Column cannot have NULL values
■ UNIQUE: All values in the column must be distinct (NULLs are allowed and considered distinct in most
DBMS)
■ PRIMARY KEY: NOT NULL + UNIQUE; identifies each row
■ FOREIGN KEY: References PK of another table; enforces referential integrity; can cascade
DELETE/UPDATE
■ CHECK: Custom condition that each row must satisfy
■ DEFAULT: Provides a default value if none is supplied on INSERT
CREATE TABLE employees ( id INT PRIMARY KEY AUTO_INCREMENT, email VARCHAR(100) UNIQUE
NOT NULL, salary DECIMAL CHECK (salary > 0), dept_id INT REFERENCES departments(id) ON
DELETE SET NULL, hire_date DATE DEFAULT (CURRENT_DATE) );

Q41. What is an execution plan? What does 'Full Table Scan' mean?
An execution plan (query plan) is the strategy the query optimizer chooses to execute a SQL query.
Understanding it is key to performance tuning.
■ Full Table Scan: Every row in the table is read — O(n), very slow on large tables
■ Index Scan: The index B-tree is traversed — O(log n) + row fetches
■ Index Only Scan: All data served from index without touching the table
■ Nested Loop Join: For each row in outer table, search inner table — good for small datasets
■ Hash Join: Build hash table from one side, probe with other — good for large tables
■ Merge Join: Sort-merge of two sorted inputs — good for sorted or indexed data
-- MySQL EXPLAIN ANALYZE SELECT [Link], d.dept_name FROM employees e JOIN departments
d ON e.dept_id = [Link] WHERE [Link] > 60000;

■ Section 10: Tricky SQL & Scenario-Based Questions


Q42. How do you find the Nth highest salary in SQL?
A classic interview question with multiple approaches:
-- Using LIMIT/OFFSET (MySQL) SELECT DISTINCT salary FROM employees ORDER BY salary
DESC LIMIT 1 OFFSET N-1; -- Using Dense Rank (works in all DBMS) SELECT salary FROM (
SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk FROM employees ) ranked
WHERE rnk = N;

Q43. How do you find duplicate records in a table?


Use GROUP BY and HAVING COUNT(*) > 1 to identify duplicates:
-- Find duplicate emails SELECT email, COUNT(*) AS cnt FROM employees GROUP BY email
HAVING COUNT(*) > 1; -- Delete duplicates, keep one row DELETE FROM employees WHERE id
NOT IN ( SELECT MIN(id) FROM employees GROUP BY email );

Q44. What is the difference between IN, EXISTS, and ANY/ALL?


These operators test conditions against a set of values, but differ in implementation and performance:
■ IN: Checks if a value is in a static list or subquery result set — subquery evaluated once
■ EXISTS: Returns TRUE if subquery returns at least one row — stops as soon as match found (faster for
correlated checks)
■ ANY: TRUE if any comparison in the subquery is true (=ANY is equivalent to IN)
■ ALL: TRUE if ALL comparisons in the subquery are true
-- EXISTS vs IN (EXISTS often faster for large subqueries) SELECT * FROM employees e
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.emp_id = [Link]); -- ALL: employees earning
more than all managers SELECT name FROM employees WHERE salary > ALL (SELECT salary
FROM managers);

Q45. What is a CROSS APPLY / LATERAL JOIN?


LATERAL JOIN (standard SQL) or CROSS APPLY (SQL Server) allows the right-side subquery to
reference columns from the left side — essentially a correlated FROM clause.
■ More powerful than a regular join: can call table-valued functions per row
■ Can return different numbers of rows for each left row
■ Useful for: top-N per group, unnesting arrays, calling functions
-- Top 3 orders per customer (PostgreSQL) SELECT [Link], o.order_id, [Link] FROM
customers c CROSS JOIN LATERAL ( SELECT order_id, amount FROM orders WHERE customer_id
= [Link] ORDER BY amount DESC LIMIT 3 ) o;

Q46. How do you pivot rows to columns in SQL?


Pivoting transforms unique values from rows into columns — useful for reporting.
-- Manual pivot using CASE WHEN SELECT dept, SUM(CASE WHEN year = 2022 THEN sales ELSE
0 END) AS sales_2022, SUM(CASE WHEN year = 2023 THEN sales ELSE 0 END) AS sales_2023,
SUM(CASE WHEN year = 2024 THEN sales ELSE 0 END) AS sales_2024 FROM sales_data GROUP
BY dept; -- SQL Server has built-in PIVOT operator

■ PostgreSQL has crosstab() in the tablefunc extension. Oracle has PIVOT/UNPIVOT syntax.

Q47. What is the difference between OLTP and OLAP?


OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) are designed for very
different workloads:
■ OLTP: Short, fast transactions (inserts/updates/selects), high concurrency, normalized schema (3NF),
row-oriented storage
■ OLAP: Complex analytical queries on large datasets, few concurrent users, denormalized schema
(star/snowflake), column-oriented storage
■ OLTP examples: Banking, e-commerce, ERP (MySQL, PostgreSQL, Oracle)
■ OLAP examples: Business intelligence, reporting, data warehousing (Redshift, BigQuery, Snowflake,
Clickhouse)
■ A Data Warehouse is an OLAP system that aggregates data from multiple OLTP sources for analysis.
Quick Reference Card
SQL Concept Syntax / Key Point
Primary Key UNIQUE + NOT NULL; one per table

Foreign Key References PK of parent table

1NF Atomic values, no repeating groups

2NF 1NF + no partial dependencies

3NF 2NF + no transitive dependencies

BCNF Every determinant is a superkey

INNER JOIN Only matching rows from both tables

LEFT JOIN All from left + matching from right

UNION Combines & removes duplicates

UNION ALL Combines, keeps duplicates (faster)

WHERE vs HAVING WHERE = before GROUP, HAVING = after

Clustered Index Data stored in index order; 1 per table

B-Tree Index O(log n) lookup; default index type

ACID Atomicity, Consistency, Isolation, Durability

Dirty Read Fixed by READ COMMITTED isolation

Phantom Read Fixed by SERIALIZABLE isolation

MVCC Readers don't block writers (InnoDB)

Nth Highest Salary DENSE_RANK() OVER (ORDER BY salary DESC)

Deadlock Wait-for cycle; resolved by aborting victim txn

Sharding Horizontal data split across servers

Replication Data copies for HA and read scaling

OLTP vs OLAP Transactions vs. Analytics workloads

CAP Theorem Choose 2 of: Consistency, Availability, Partition Tolerance

Master the concepts, practise writing queries, and you'll ace the interview! ■

You might also like