Database Systems Question Bank Guide
In relational databases, a primary key uniquely identifies each row within a table, ensuring entity integrity by preventing duplicate (and null) key values. A foreign key establishes a relationship between tables by referencing a primary key in another table, enforcing referential integrity by ensuring that every referenced row actually exists. For example, in a university database, StudentID as the primary key of the Student table identifies each student uniquely. A foreign key in an Enrolment table referencing StudentID ensures that enrolments can only be recorded for existing students, maintaining relational consistency.
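The primary and foreign key rules above can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch, assuming an in-memory SQLite database and simplified, hypothetical `Student` and `Enrolment` tables. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

# In-memory database for illustration; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE Student (
        StudentID INTEGER PRIMARY KEY,
        Name      TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE Enrolment (
        StudentID INTEGER NOT NULL REFERENCES Student(StudentID),
        CourseID  TEXT    NOT NULL,
        PRIMARY KEY (StudentID, CourseID)
    )
""")

conn.execute("INSERT INTO Student VALUES (1, 'Ada')")
conn.execute("INSERT INTO Enrolment VALUES (1, 'DB101')")  # OK: student 1 exists


def try_insert(sql: str) -> str:
    """Run an INSERT and report whether the integrity checks let it through."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected ({exc})"


# Entity integrity: a second row with StudentID 1 violates the primary key.
print(try_insert("INSERT INTO Student VALUES (1, 'Grace')"))
# Referential integrity: student 99 does not exist, so the enrolment is refused.
print(try_insert("INSERT INTO Enrolment VALUES (99, 'DB101')"))
```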
The three-schema architecture in a DBMS consists of the internal level (physical storage), the conceptual level (logical structure), and the external level (user views). The internal level describes how data is physically stored, such as file organizations and access paths, and is tuned for efficient storage and retrieval. The conceptual level offers a unified view of the entire database structure, independent of physical storage details; because the conceptual-to-internal mapping absorbs changes in storage, it supports physical data independence. Finally, the external level provides user-specific views of the database; the external-to-conceptual mapping lets the logical schema evolve without breaking those views, which is logical data independence. For example, in an online banking application, users see only a view of their own accounts and transactions, while the conceptual and internal details remain hidden.
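A SQL view is one common way to realize the external level. The sketch below uses SQLite through Python's `sqlite3`; the `Account` table, its `RiskScore` column, the `CustomerAccounts` view, and the `accounts_for` helper are all hypothetical. Filtering by customer happens in the application query here, whereas a production DBMS would also restrict access through privileges.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the full logical schema (hypothetical banking table).
conn.execute("""
    CREATE TABLE Account (
        AccountNo  INTEGER PRIMARY KEY,
        CustomerID INTEGER NOT NULL,
        Balance    REAL    NOT NULL,
        RiskScore  REAL    -- internal-use column customers should not see
    )
""")
conn.executemany(
    "INSERT INTO Account VALUES (?, ?, ?, ?)",
    [(1001, 7, 250.0, 0.2), (1002, 7, 40.0, 0.9), (2001, 8, 900.0, 0.1)],
)

# External level: a view exposing only what a customer-facing app needs.
conn.execute("""
    CREATE VIEW CustomerAccounts AS
    SELECT AccountNo, CustomerID, Balance FROM Account
""")


def accounts_for(customer_id: int) -> list[tuple]:
    """What the app shows one customer: their own rows, without RiskScore."""
    return conn.execute(
        "SELECT AccountNo, Balance FROM CustomerAccounts "
        "WHERE CustomerID = ? ORDER BY AccountNo",
        (customer_id,),
    ).fetchall()


print(accounts_for(7))

# Logical data independence: adding a column to the conceptual schema
# does not break programs written against the view.
conn.execute("ALTER TABLE Account ADD COLUMN OpenedOn TEXT")
print(accounts_for(7))
```

Both calls return the same rows, since the view's column list is unaffected by the schema change.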
Indexing improves data retrieval efficiency by maintaining an auxiliary data structure, such as a B-tree or a hash index, that supports fast lookups; in a hospital system, for example, an index lets queries find records by patient ID or name without scanning the whole table. Hashing, on the other hand, provides direct access by applying a hash function to a key field, retrieving records in expected constant time, which makes it well suited to exact-match lookups on unique identifiers. However, ordered indexes such as B-trees generally excel when queries involve range searches or sorted output, which hashing cannot support, and hashing performance degrades as collisions increase, especially when the data volume is unpredictable and the hash structure is not resized to match. In a hospital system, hashing could speed up lookups by patient ID, while an ordered index might be preferable for queries involving ranges or multiple attributes, such as all visits within a given date range.
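To contrast the two access methods, the sketch below uses a Python `dict` (a hash table) for exact patient-ID lookups and a sorted list searched with `bisect` as a simplified stand-in for a B-tree index on visit date. The visit records and the `visits_between` helper are hypothetical, and a real DBMS index stores keys in disk pages rather than in a Python list.

```python
from bisect import bisect_left, bisect_right

# Hypothetical patient visit records: (patient_id, visit_date, ward).
visits = [
    ("P003", "2024-03-02", "Cardiology"),
    ("P001", "2024-01-15", "Oncology"),
    ("P002", "2024-02-20", "Cardiology"),
    ("P004", "2024-03-28", "Radiology"),
]

# Hash-style access: the dict maps each key straight to its record
# (expected O(1) lookups), but keys are unordered, so only equality is served.
by_patient = {pid: (pid, date, ward) for pid, date, ward in visits}

# Ordered index: keys kept sorted (as a B-tree keeps them), so a key range
# can be located by binary search and then scanned in order.
date_index = sorted((date, pid) for pid, date, _ in visits)
dates = [d for d, _ in date_index]


def visits_between(start: str, end: str) -> list[str]:
    """Patient IDs with a visit date in [start, end]; ISO dates sort as text."""
    lo = bisect_left(dates, start)
    hi = bisect_right(dates, end)
    return [pid for _, pid in date_index[lo:hi]]


print(by_patient["P002"])                          # equality lookup via hash
print(visits_between("2024-02-01", "2024-03-15"))  # range query via ordered index
```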
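Lock-based concurrency control requires transactions to acquire locks on data items before accessing them. Protocols such as two-phase locking (2PL), in which a transaction acquires all of its locks before releasing any, guarantee conflict-serializable schedules; basic 2PL does not, however, prevent deadlock, so the system must also detect or avoid deadlocks. While effective, locking may create performance bottlenecks due to locking overhead, blocking, and potential deadlocks. Timestamp-based protocols instead assign each transaction a timestamp that fixes the serialization order; because no locks are held and transactions never wait, these protocols are deadlock-free. While timestamp methods avoid blocking, including for read transactions, they may lead to higher rollback rates when conflicting operations arrive out of timestamp order. Each method's suitability depends on the system's needs and workload: as a rule of thumb, timestamp ordering suits workloads with few conflicts, while under heavy contention its repeated rollbacks can cost more than the waiting that locking imposes.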
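The basic timestamp-ordering rules can be sketched in a few lines. This is an illustrative, single-threaded model assuming integer timestamps where a smaller value means an older transaction; the `Item`, `read`, `write`, and `Rollback` names are hypothetical, and refinements such as Thomas's write rule are omitted.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """A data item with the timestamps used by basic timestamp ordering."""
    value: int = 0
    read_ts: int = 0   # largest timestamp of any transaction that read it
    write_ts: int = 0  # timestamp of the transaction that last wrote it


class Rollback(Exception):
    """Raised when a transaction must abort and restart with a new timestamp."""


def read(ts: int, item: Item) -> int:
    # A younger transaction already overwrote the value: this read is too late.
    if ts < item.write_ts:
        raise Rollback(f"T{ts} read rejected (write_ts={item.write_ts})")
    item.read_ts = max(item.read_ts, ts)
    return item.value


def write(ts: int, item: Item, value: int) -> None:
    # A younger transaction already read or wrote the item: this write is too late.
    if ts < item.read_ts or ts < item.write_ts:
        raise Rollback(
            f"T{ts} write rejected (read_ts={item.read_ts}, write_ts={item.write_ts})"
        )
    item.value, item.write_ts = value, ts


x = Item(value=100)
print(read(1, x))   # T1 (older) reads x
print(read(2, x))   # T2 (younger) reads x
write(2, x, 150)    # T2 writes: allowed, no younger reader or writer exists
try:
    write(1, x, 80)  # T1 writes after T2 read and wrote x: T1 must roll back
except Rollback as exc:
    print(exc)
```

Note that `T1` never waits: its conflicting write is rejected immediately, and the transaction would restart with a new, larger timestamp.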
NoSQL databases support Big Data analytics by providing scalable, flexible data models capable of handling large volumes of diverse data, including unstructured and semi-structured data. Column-family stores like HBase can store vast datasets with sparse, evolving schemas, which suits real-time analytics in finance, where transaction data grows rapidly. Document databases like MongoDB are advantageous in e-commerce because they can store product descriptions with differing attributes without a fixed schema. The large-scale data ingestion and high read/write throughput of NoSQL systems also suit Internet of Things (IoT) scenarios, where sensor data must be processed continuously and efficiently.
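As a rough illustration of the document model, the products below are plain Python dictionaries with differing fields, and `find` is a hypothetical helper that performs a simple equality filter on top-level fields. It only mimics the idea of a document query and is not MongoDB's API.

```python
from typing import Any

# Hypothetical product catalogue stored as schemaless documents: each product
# carries only the attributes that make sense for it.
products: list[dict[str, Any]] = [
    {"_id": 1, "name": "Trail Shoe", "category": "shoes", "sizes": [41, 42, 43]},
    {"_id": 2, "name": "E-Reader", "category": "electronics",
     "specs": {"screen_in": 6.8, "storage_gb": 16}},
    {"_id": 3, "name": "Rain Jacket", "category": "clothing",
     "sizes": ["S", "M", "L"], "waterproof": True},
]


def find(collection: list[dict[str, Any]], **criteria: Any) -> list[dict[str, Any]]:
    """Hypothetical helper: documents whose top-level fields equal every criterion.
    Documents lacking a field simply do not match; no schema change is needed."""
    return [doc for doc in collection
            if all(k in doc and doc[k] == v for k, v in criteria.items())]


# A new kind of product with new fields is just another insert.
products.append({"_id": 4, "name": "Smart Bulb", "category": "electronics",
                 "specs": {"watts": 9}, "voice_assistant": True})

print([d["name"] for d in find(products, category="electronics")])
print([d["name"] for d in find(products, waterproof=True)])
```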
Violating the ACID properties (Atomicity, Consistency, Isolation, Durability) leads to transaction failures, inconsistent database states, data anomalies, and potential data loss. Serializability, a key aspect of Isolation, guarantees that the outcome of concurrently executing transactions is the same as that of some serial (one-at-a-time) execution of those transactions, which is crucial for data accuracy under concurrency. Ignoring it can produce unpredictable results, such as lost updates, risking business logic failures in systems like financial applications, where the order of transactions matters.
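The classic lost-update anomaly shows what goes wrong without serializability. In this deterministic sketch, each hypothetical withdrawal is split into a read step and a write step on a shared balance held in a Python dictionary; the interleaved schedule yields a result that neither serial order could produce.

```python
from typing import Callable


def withdraw_steps(amount: int) -> tuple[Callable[[dict], int], Callable[[dict, int], None]]:
    """One transaction as two separate steps: read the balance, then write it back."""
    def read(db: dict) -> int:
        return db["balance"]

    def write(db: dict, seen: int) -> None:
        db["balance"] = seen - amount

    return read, write


def run_serial() -> int:
    db = {"balance": 100}
    for amount in (20, 10):          # T1 runs to completion, then T2
        read, write = withdraw_steps(amount)
        write(db, read(db))
    return db["balance"]


def run_interleaved() -> int:
    db = {"balance": 100}
    r1, w1 = withdraw_steps(20)      # T1
    r2, w2 = withdraw_steps(10)      # T2
    seen1 = r1(db)                   # T1 reads 100
    seen2 = r2(db)                   # T2 reads 100 before T1 writes
    w1(db, seen1)                    # T1 writes 80
    w2(db, seen2)                    # T2 writes 90, overwriting T1's update
    return db["balance"]


print(run_serial())       # 70, the same as either serial order
print(run_interleaved())  # 90: T1's withdrawal is lost
```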
The hierarchical data model organizes data in a tree-like structure with a single root, in which each child record has exactly one parent, making it suitable for systems whose relationships are one-to-many. The network model allows more complex many-to-many relationships through a graph structure, providing more flexibility than the hierarchical model, but it can be complex to manage because applications must navigate explicit links between records. The relational model employs tables (relations) to represent both data and relationships, offering simplicity and powerful declarative querying via SQL; together with the ACID transaction support of relational DBMSs, this flexibility and simplicity have made it the default choice for most modern applications.
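To show why many-to-many relationships strain a pure tree, the sketch below stores the same hypothetical employee-project assignments twice: once as a nested Python structure mimicking the hierarchical model, and once as relational-style rows with a separate link table. The names and the `projects_of` helper are illustrative only.

```python
# Hypothetical data: Ravi works on two projects, so Employee-Project is many-to-many.

# Hierarchical model: each child record has exactly one parent, so an
# employee assigned to two projects must be stored twice in a pure tree.
hierarchy = {
    "Apollo": [{"emp_id": "E1", "name": "Ravi"}, {"emp_id": "E2", "name": "Mei"}],
    "Zephyr": [{"emp_id": "E1", "name": "Ravi"}],
}

# Relational model: each fact is stored once; the M:N link has its own table.
employees = {"E1": "Ravi", "E2": "Mei"}                                # Employee(emp_id, name)
works_on = {("E1", "Apollo"), ("E2", "Apollo"), ("E1", "Zephyr")}      # WorksOn(emp_id, project)


def projects_of(emp_id: str) -> list[str]:
    """Rough equivalent of: SELECT project FROM WorksOn WHERE emp_id = ?"""
    return sorted(p for e, p in works_on if e == emp_id)


copies_in_hierarchy = sum(
    1 for staff in hierarchy.values() for rec in staff if rec["emp_id"] == "E1"
)
print(copies_in_hierarchy)  # Ravi's record appears in two places in the tree
print(projects_of("E1"))    # both projects, from a single employee row
```

Renaming Ravi touches one row in the relational version but every duplicate in the tree, which is the kind of redundancy the network and relational models were designed to avoid.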
Traditional file systems generally focus on basic storage and retrieval operations, without support for advanced querying, concurrency control, or data integrity enforcement. A DBMS, by contrast, offers structured storage with query capabilities, support for concurrent access, robust data integrity through constraints, and improved security features. These characteristics allow a DBMS to handle large volumes of data efficiently and, through ACID-compliant transactions, to keep data reliable and accessible even in multi-user environments.
The CAP theorem states that a distributed system cannot simultaneously guarantee all three of Consistency, Availability, and Partition Tolerance; since network partitions cannot be ruled out in practice, a system must choose between Consistency and Availability while a partition lasts. NoSQL systems must balance these properties based on specific use cases. For instance, a system prioritizing Consistency and Partition Tolerance may temporarily refuse requests, restricting Availability, as in banking systems that put transactional integrity first. Conversely, systems like Cassandra, whose consistency level is tunable per operation, are commonly configured to prioritize Availability and Partition Tolerance, tolerating temporarily inconsistent replicas to keep serving requests during network splits, which suits real-time analytics applications. Developers therefore face trade-offs in choosing the configuration that matches their application's needs, given network reliability and the scope of user interaction.
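The CP versus AP choice during a partition can be made concrete with a toy two-node register. This is a deliberately simplified sketch: the `ReplicatedRegister` class and its `mode` flag are hypothetical, replication is instantaneous while the nodes are connected, and reconciliation after the partition heals is not modeled.

```python
from typing import Optional


class ReplicatedRegister:
    """Hypothetical toy: one value copied on two nodes, 'a' and 'b'."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.copies: dict[str, Optional[str]] = {"a": None, "b": None}
        self.partitioned = False  # True while a and b cannot exchange messages

    def write(self, node: str, value: str) -> bool:
        """Write through `node`; returns False if the write is refused."""
        if not self.partitioned:
            self.copies = {"a": value, "b": value}  # replicate to both nodes
            return True
        if self.mode == "CP":
            return False  # stay consistent: refuse a write that cannot be replicated
        self.copies[node] = value  # stay available: accept on the reachable node only
        return True

    def read(self, node: str) -> Optional[str]:
        return self.copies[node]


for mode in ("CP", "AP"):
    reg = ReplicatedRegister(mode)
    reg.write("a", "v1")
    reg.partitioned = True          # network split
    accepted = reg.write("a", "v2")
    print(mode, accepted, reg.read("a"), reg.read("b"))
```

The CP register gives up availability (the write is refused) so both nodes keep agreeing, while the AP register accepts the write and node `b` serves a stale value until the partition is resolved.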
Normalization is the process of structuring a relational database to reduce redundancy and improve data integrity. Boyce-Codd Normal Form (BCNF), a stricter form than Third Normal Form (3NF), addresses anomalies that 3NF can leave in place by requiring every determinant to be a candidate key; more precisely, the left-hand side of every non-trivial functional dependency must be a superkey. The process involves identifying functional dependencies and decomposing any relation in which attributes depend on something other than a candidate key. Applying BCNF protects the database from the insertion, update, and deletion anomalies caused by functional dependencies, which is essential for maintaining consistent and reliable data. For instance, in a university database, ensuring that course details are determined only by the course ID, and enrolments only by student and course IDs, avoids duplicating the same data across records.
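BCNF violations can be checked mechanically by computing attribute closures: a non-trivial dependency X → Y violates BCNF when the closure of X does not cover the whole relation. The sketch below assumes the dependencies are given for the relation being tested, and uses a hypothetical Enrolment(StudentID, Course, Instructor) relation in which each instructor teaches exactly one course, a standard case that is in 3NF but not in BCNF. The `fd`, `closure`, and `bcnf_violations` helpers are illustrative names.

```python
# An FD is a pair (determinant, dependents) of attribute-name sets.
FD = tuple[frozenset, frozenset]


def fd(lhs: str, rhs: str) -> FD:
    """Hypothetical helper: build an FD from comma-separated names, e.g. fd('A,B', 'C')."""
    return frozenset(lhs.split(",")), frozenset(rhs.split(","))


def closure(attrs: frozenset, fds: list[FD]) -> frozenset:
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def bcnf_violations(relation: frozenset, fds: list[FD]) -> list[FD]:
    """Non-trivial FDs whose determinant is not a superkey of `relation`."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not relation <= closure(lhs, fds)]


# Hypothetical relation: each instructor teaches exactly one course.
enrolment = frozenset({"StudentID", "Course", "Instructor"})
fds = [fd("StudentID,Course", "Instructor"), fd("Instructor", "Course")]

for lhs, rhs in bcnf_violations(enrolment, fds):
    print(sorted(lhs), "->", sorted(rhs), "violates BCNF")
```

Decomposing into (Instructor, Course) and (StudentID, Instructor) removes the violation, at the cost that the dependency StudentID, Course → Instructor can no longer be checked within a single table.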