Understanding Database Management Systems
Understanding Database Management Systems
A DBMS is a system that allows users to create, modify and query databases while ensuring
data integrity, security and efficient data access.
Unlike traditional file systems, DBMS minimizes data redundancy, prevents inconsistencies
and simplifies data management with features like concurrent access and backup
mechanisms.
It organizes data into tables, views, schemas and reports, providing a structured approach to
data management.
Example: A university database can store and manage student information, faculty records
and administrative data, allowing seamless retrieval, insertion and deletion of information as
required.
Key Features of DBMS
1. Data Modeling: Tools to create and modify data models, defining the structure and
relationships within the database.
2. Data Storage and Retrieval: Efficient mechanisms for storing data and executing queries
to retrieve it quickly.
3. Concurrency Control: Ensures multiple users can access the database simultaneously
without conflicts.
4. Data Integrity and Security: Enforces rules to maintain accurate and secure data,
including access controls and encryption.
5. Backup and Recovery: Protects data with regular backups and enables recovery in case of
system failures.
Types of DBMS
There are several types of Database Management Systems (DBMS), each tailored to different
data structures, scalability requirements and application needs. The most common types are
as follows:
Database Languages
Database languages are specialized sets of commands and instructions used to define,
manipulate and control data within a database. Each language type plays a distinct role
in database management, ensuring efficient storage, retrieval and security of data. The
primary database languages include:
Database Languages
1. Data Definition Language (DDL)
DDL is the short name for Data Definition Language, which deals with database schemas and
descriptions, of how the data should reside in the database.
CREATE: to create a database and its objects like (table, index, views, store procedure,
function and triggers)
ALTER: alters the structure of the existing database
DROP: delete objects from the database
TRUNCATE: remove all records from a table, including all spaces allocated for the records
are removed
COMMENT: add comments to the data dictionary
RENAME: rename an object
2. Data Manipulation Language (DML)
DML focuses on manipulating the data stored in the database, enabling users to retrieve, add,
update and delete data.
SELECT: retrieve data from a database
INSERT: insert data into a table
UPDATE: updates existing data within a table
DELETE: Delete all records from a database table
MERGE: UPSERT operation (insert or update)
CALL: call a PL/SQL or Java subprogram
EXPLAIN PLAN: interpretation of the data access path
LOCK TABLE: concurrency Control
3. Data Control Language (DCL)
DCL commands manage access permissions, ensuring data security by controlling who can
perform certain actions on the database.
GRANT: Provides specific privileges to a user (e.g., SELECT, INSERT).
REVOKE: Removes previously granted permissions from a user.
4. Transaction Control Language (TCL)
TCL commands oversee transactional data to maintain consistency, reliability and atomicity.
ROLLBACK: Undoes changes made during a transaction.
COMMIT: Saves all changes made during a transaction.
SAVEPOINT: Sets a point within a transaction to which one can later roll back.
5. Data Query Language (DQL)
DQL is a subset of DML, specifically focused on data retrieval.
SELECT: The primary DQL command, used to query data from the database without altering
its structure or contents.
Paradigm Shift from File System to DBMS
Before the advent of modern DBMS, data was managed using basic file systems on hard
drives. While this approach allowed users to store, retrieve and update files as needed, it
came with numerous challenges.
A typical example can be seen in a file-based university management system, where data was
stored in separate sections such as Departments, Academics, Results, Accounts and Hostels.
Certain information like student names and phone numbers was repeated across multiple
files, leading to the following issues:
1. Redundancy of data: When the same data exists in multiple places, any update must be
manually repeated everywhere. For instance, if a student changes their phone number, it must
be updated across all sections. Failure to do so leads to unnecessary duplication and wasted
storage.
2. Inconsistency of Data: Data is said to be inconsistent if multiple copies of the same data do
not match each other. If the Phone number is different in Accounts Section and Academics
Section, it will be inconsistent. Inconsistency may be because of typing errors or not updating
all copies of the same data.
3. Complex Data Access: A user should know the exact location of the file to access data, so
the process is very cumbersome and tedious. If the user wants to search the student hostel
allotment number of a student from 10000 unsorted students’ records, how difficult it can be.
4. Lack of Security: File systems provided limited control over who could access certain data.
A student who gained access to a file with grades might easily alter it without proper
authorization, compromising data integrity.
5. No Concurrent Access: File systems were not designed for multiple users working at the
same time. If one user was editing a file, others had to wait, which hindered collaboration and
slowed down workflows.
6. No Backup and Recovery: File systems lacked built-in mechanisms for creating backups or
recovering data after a loss. If a file was accidentally deleted or corrupted, there was no easy
way to restore it, potentially causing permanent data loss.
Applications of DBMS
Application of DBMS
Enterprise Information: Sales, accounting, human resources, Manufacturing, online retailers.
Banking and Finance Sector: Banks maintaining the customer details, accounts, loans,
banking transactions, credit card transactions. Finance: Storing the information about sales
and holdings, purchasing of financial stocks and bonds.
University: Maintaining the information about student course enrolled information, student
grades, staff roles.
Airlines: Reservations and schedules.
Telecommunications: Prepaid, postpaid bills maintenance.
Advantages of DBMS
Data organization: A DBMS allows for the organization and storage of data in a structured
manner, making it easy to retrieve and query the data as needed.
Data integrity: A DBMS provides mechanisms for enforcing data integrity constraints, such
as constraints on the values of data and access controls that restrict who can access the data.
Concurrent access: A DBMS provides mechanisms for controlling concurrent access to
the database, to ensure that multiple users can access the data without conflicting with each
other.
Data security: A DBMS provides tools for managing the security of the data, such as
controlling access to the data and encrypting sensitive data.
Backup and recovery: A DBMS provides mechanisms for backing up and recovering the data
in the event of a system failure.
Data sharing: A DBMS allows multiple users to access and share the same data, which can be
useful in a collaborative work environment.
Disadvantages of DBMS
Complexity: DBMS can be complex to set up and maintain, requiring specialized knowledge
and skills.
Performance overhead: The use of a DBMS can add overhead to the performance of an
application, especially in cases where high levels of concurrency are required.
Scalability: The use of a DBMS can limit the scalability of an application, since it requires
the use of locking and other synchronization mechanisms to ensure data consistency.
Cost: The cost of purchasing, maintaining and upgrading a DBMS can be high, especially for
large or complex systems.
Limited Use Cases: Not all use cases are suitable for a DBMS, some solutions don't need
high reliability, consistency or security and may be better served by other types of data
storage.
File System
The file system is basically a way of arranging the files in a storage medium like a hard disk.
The file system organizes the files and helps in the retrieval of files when they are required.
File systems consist of different files which are grouped into directories. The directories
further contain other folders and files. The file system performs basic operations like
management, file naming, giving access rules, etc.
Example: NTFS(New Technology File System) , EXT(Extended File System).
File System
DBMS ( Database Management System)
Database Management System is basically software that manages the collection of related
data. It is used for storing data and retrieving the data effectively when it is needed. It also
provides proper security measures for protecting the data from unauthorized access. In
Database Management System the data can be fetched by SQL queries and relational
algebra. It also provides mechanisms for data recovery and data backup.
Example:
Oracle, MySQL, MS SQL server.
DBMS
Difference Between File System and DBMS
Structure The file system is a way of arranging DBMS is software for managing
the files in a storage medium within the database.
a computer.
Backup and It doesn't provide Inbuilt mechanism It provides in house tools for
Recovery for backup and recovery of data if it backup and recovery of data even
is lost. if it is lost.
User Access Only one user can access data at a Multiple users can access data at a
time. time.
Meaning The users are not required to write The user has to write procedures
procedures. for managing databases
Sharing Data is distributed in many files. So, Due to centralized nature data
it is not easy to share data. sharing is easy
Data It give details of storage and It hides the internal details
Abstraction representation of data of Database
External level
It is also called view level. The reason this level is called “view” is because several users can
view their desired data from this level which is internally fetched from database with the help
of conceptual and internal level mapping. The user doesn’t need to know the database schema
details such as data structure, table definition etc. user is only concerned about data which is
what returned back to the view level after it has been fetched from database (present at the
internal level). External level is the “top level” of the Three Level DBMS Architecture.
2. Conceptual level It is also called logical level. The whole design of the database such as
relationship among data, schema of data etc. are described in this level. Database constraints
and security are also implemented in this level of architecture. This level is maintained by
DBA (database administrator)
3. . Internal level This level is also known as physical level. This level describes how the
data is actually stored in the storage devices. This level is also responsible for allocating
space to the data. This is the lowest level of the architecture.
Caching
Caching is used to implement a high-speed system with a large number of users. A cache is a
high-speed data storage that stores data temporarily to serve future requests faster.
Database caching is like a helper for your primary database (DB). It is a mechanism that
stores frequently accessed data in temporary memory. Whenever the application requests the
data again, that can quickly get it from this helper, instead of from the main database. Cache
helps to reduce the database workloads. So it increases system speed by reducing the need to
fetch data from DB.
Why is Database Caching Strategy Important?
Improved System Performance: In general, the frequently used data is stored in the
cache. So the applications can retrieve that data more quickly from the cache instead
of the primary DB. It improves the system performance and reduces latency.
Increase Data Availability: Caching increases data availability if the primary DB goes
down, caching acts as a backup source. Depending on where the cache is stored, it can
still serve data to the application even if the primary DB is unavailable.
Cost Savings: The caching mechanism reduces DB server load. Caching can help to
build the infrastructure with lower costs compared to DB servers. It allows
organizations to achieve the same level of performance with fewer resources.
Improved User Experience: Using the Caching technique responses are returned
faster. It leads to a better overall user experience, particularly for applications that
require real-time or near-real-time access to data.
What is ResultSet Caching?
ResultSet Caching is a Dedicated SQL Pools feature which caches a dataset created using a
SELECT query, for example a query that aggregates data, or returns all the data from a table
(e.g. in a Power BI table load). If the query satisfies certain criteria (listed below) then it gets
added to the Dedicated SQL Pools cache. Then when a request comes in that matches the
query definition then the result can be served from the cache. This is really beneficial for a
couple of reasons:
It speeds up the retrieval of the result as it can just return from the cache
It also does not use up any resource allocation % for the Dedicated SQL Pool.
Criteria
There are a few criteria a query must meet before it can be added to the ResultSet Cache:
ResultSet Caching must be enabled on the Dedicated SQL Pool database
No data changes in the table
No schema changes in the table
Queries using user-defined functions are non-deterministic functions like GetDate()
Dedicated SQL Pool tables with Row-Level Security
Queries which return a dataset larger than 10GB
Queries with row sizes larger that 64K…that is roughly 16 VARCHAR(4000)
columns
Controls the result set caching behavior for the current client session.
Applies to Azure Synapse Analytics
Use SQL to turn on or turn off statement caching for an individual database session when
statement caching is enabled for the database server. The following statement shows how to
use SQL to turn on caching for the current database session:
The following statement shows how to use SQL to turn off caching for the current database
session:
Example:
Imagine a website displaying product prices. Instead of querying the database for each price
request, the application can cache frequently accessed product prices in a cache. When a user
requests a product price, the application first checks the cache. If the price is cached, it's
returned immediately. Otherwise, the price is fetched from the database, stored in the cache,
and then returned.
Considerations:
Cache invalidation: When the data in the database changes, the cache needs to be
updated or invalidated to ensure data consistency.
Cache size: The cache should be appropriately sized to hold frequently accessed data
without causing performance issues due to cache misses or eviction.
Cache eviction policies: Algorithms like Least Recently Used (LRU) or Least Frequently
Used (LFU) can be used to manage the cache when it reaches its capacity.
Introduction of Shadow Paging
Shadow paging is a fundamental recovery technique used in database management systems
(DBMS) to ensure the reliability and consistency of data. It plays a crucial role in maintaining
atomicity and durability which are the two core properties of transaction management
by maintaining two versions of the database state: the shadow page table and the current page
table. This technique is also known as Cut-of-Place updating.
This technique ensures that a database can recover seamlessly from failures without losing
data integrity. During a transaction, updates are made to a new version of the database pages
tracked by the current page table, while the shadow page table preserves the pre-transaction
state. This dual-table approach allows for efficient crash recovery and simplifies the commit
and rollback processes.
Page Table : A page table is a data structure that maps logical pages (a logical division of
data) to physical pages (actual storage on disk).
Each entry in the page table corresponds to a physical page location on the disk.
The database uses the page table to retrieve or modify data.
How Shadow Paging Works ?
Shadow paging is a recovery technique that views the database as a collection of fixed-sized
logical storage units, known as pages, which are mapped to physical storage blocks using a
structure called the page table. The page table enables the system to efficiently locate and
manage database pages.
Here’s how shadow paging works in detail:
Start of Transaction:
The shadow page table is created by copying the current page table.
The shadow page table represents the original, unmodified state of the database.
This table is saved to disk and remains unchanged throughout the transaction.
P1 Address_1 Address_1
P2 Address_2 Address_2
P3 Address_3 Address_3
Transaction Execution:
Updates are made to the database by creating new pages.
The current page table reflects these changes, while the shadow page table remains
unchanged.
Page Modification:
If a logical page (e.g. P2) needs to be updated:
A new version of the page (P2’) is created in memory and written to a new physical storage
block.
The current page table entry for P2 is updated to point to P2’.
The shadow page table still points to the original page P2, ensuring it is unaffected by the
changes.
Logical Page Shadow Page Table Current Page
(Disk) Table
P1 Address_1 Address_1
P3 Address_3 Address_3
Commit:
If the transaction is successful, the shadow page table is replaced by the current page table.
This replacement makes the changes permanent.
P1 Address_1 Address_1
P3 Address_3 Address_3
Abort:
If the transaction is aborted, the current page table is discarded, leaving the shadow page
table intact.
Since the shadow page table still points to the original pages, no changes are reflected in the
database.
Consider the below example:
To understand concept, consider above figure. In this 2 write operations are performed on
page 3 and 5. Before start of write operation on page 3, current page table points to old page
3. When write operation starts following steps are performed :
Firstly, search start for available free block in disk blocks.
After finding free block, it copies page 3 to free block which is represented by Page 3 (New).
Now current page table points to Page 3 (New) on disk but shadow page table points to old
page 3 because it is not modified.
The changes are now propagated to Page 3 (New) which is pointed by current page table.
COMMIT Operation : To commit transaction following steps should be done :
All the modifications which are done by transaction which are present in buffers are
transferred to physical database.
Log based Recovery in DBMS
Log-based recovery in DBMS ensures data can be maintained or restored in the event of a
system failure. The DBMS records every transaction on stable storage, allowing for easy data
recovery when a failure occurs. For each operation performed on the database, a log file is
created. Transactions are logged and verified before being applied to the database, ensuring
data integrity.
A log is a sequence of records that document the operations performed during database
transactions. Logs are stored in a log file for each transaction, providing a mechanism to
recover data in the event of a failure. For every operation executed on the database, a
corresponding log record is created. It is critical to store these logs before the actual
transaction operations are applied to the database, ensuring data integrity and consistency
during recovery processes.
For example, consider a transaction to modify a student's city. This transaction generates the
following logs:
Start Log: When the transaction begins, a log is created to indicate the start of the transaction.
Format:<Tn, Start>
Here, Tn represents the transaction identifier.
Example: <T1, Start> indicates that Transaction 1 has started.
Operation Log: When the city is updated, a log is recorded to capture the old and new values
of the operation.
Format:<Tn, Attribute, Old_Value, New_Value>
Example: <T1, City, 'Gorakhpur', 'Noida'> shows that in Transaction 1, the value of
the City attribute has changed from 'Gorakhpur' to 'Noida'.
Commit Log: Once the transaction is successfully completed, a final log is created to indicate
that the transaction has been completed and the changes are now permanent.
Format:<Tn, Commit>
Example: <T1, Commit> signifies that Transaction 1 has been successfully completed.
These logs play a crucial role in ensuring that the database can recover to a consistent state
after a system crash. If a failure occurs, the DBMS can use these logs to either roll back
incomplete transactions or redo committed transactions to maintain data consistency.
Key Operations in Log-Based Recovery
Undo Operation
The undo operation reverses the changes made by an uncommitted transaction, restoring the
database to its previous state.
Example of Undo:
Consider a transaction T1 that updates a bank account balance but fails before committing:
Initial State:
Account balance = 500.
Transaction T1:
Update balance to 600.
Log entry:
<T1, Balance, 500, 600>
Failure:
T1 fails before committing.
Undo Process:
Use the old value from the log to revert the change.
Set balance back to 500.
Final log entry after undo:
<T1, Abort>
Redo Operation
The redo operation re-applies the changes made by a committed transaction to ensure
consistency in the database.
Example of Redo:
Consider a transaction T2 that updates an account balance but the database crashes before
changes are permanently reflected:
Initial State:
Account balance = 300.
Transaction T2:
Update balance to 400.
Log entries:
<T2, Start><T2, Balance, 300, 400><T2, Commit>
Crash:
Changes are not reflected in the database.
Redo Process:
Use the new value from the log to reapply the committed change.
Set balance to 400.
Undo-Redo Example:
Assume two transactions:
T1: Failed transaction (requires undo).
T2: Committed transaction (requires redo).
Log File:
<T1, Start><T1, Balance, 500, 600><T2, Start><T2, Balance, 300, 400><T2, Commit><T1,
Abort>
Recovery Steps:
Identify Committed and Uncommitted Transactions:
T1: Not committed → Undo.
T2: Committed → Redo.
Undo T1:
Revert balance from 600 to 500.
Redo T2:
Reapply balance change from 300 to 400.
Introduction of ER Model
The Entity-Relationship Model (ER Model) is a conceptual model for designing a databases.
This model represents the logical structure of a database, including entities, their attributes
and relationships between them.
Entity: An objects that is stored as data such as Student, Course or Company.
Attribute: Properties that describes an entity such as StudentID, CourseName,
or EmployeeEmail.
Relationship: A connection between entities such as "a Student enrolls in a Course".
Entity Set
We can represent the entity sets in an ER Diagram but we can't represent individual entities
because an entity is like a row in a table, and an ER diagram shows the structure and
relationships of data, not specific data entries (like rows and columns). An ER diagram is a
visual representation of the data model, not the actual data itself.
Types of Entity
There are two main types of entities:
1. Strong Entity
A Strong Entity is a type of entity that has a key Attribute that can uniquely identify each
instance of the entity. A Strong Entity does not depend on any other Entity in the Schema for
its identification. It has a primary key that ensures its uniqueness and is represented by a
rectangle in an ER diagram.
2. Weak Entity
A Weak Entity cannot be uniquely identified by its own attributes alone. It depends on a
strong entity to be identified. A weak entity is associated with an identifying entity (strong
entity), which helps in its identification. A weak entity are represented by a double rectangle.
The participation of weak entity types is always total. The relationship between the weak
entity type and its identifying strong entity type is called identifying relationship and it is
represented by a double diamond.
Example:
A company may store the information of dependents (Parents, Children, Spouse) of an
Employee. But the dependents can't exist without the employee. So dependent will be a Weak
Entity Type and Employee will be identifying entity type for dependent, which means it is
Strong Entity Type.
Strong Entity and Weak Entity
Attributes in ER Model
Attributes are the properties that define the entity type. For example, for a Student entity
Roll_No, Name, DOB, Age, Address, and Mobile_No are the attributes that define entity type
Student. In ER diagram, the attribute is represented by an oval.
Attribute
Types of Attributes
1. Key Attribute
The attribute which uniquely identifies each entity in the entity set is called the key attribute.
For example, Roll_No will be unique for each student. In ER diagram, the key attribute is
represented by an oval with an underline.
Key Attribute
2. Composite Attribute
An attribute composed of many other attributes is called a composite attribute. For example,
the Address attribute of the student Entity type consists of Street, City, State, and Country. In
ER diagram, the composite attribute is represented by an oval comprising of ovals.
Composite Attribute
3. Multivalued Attribute
An attribute consisting of more than one value for a given entity. For example, Phone_No
(can be more than one for a given student). In ER diagram, a multivalued attribute is
represented by a double oval.
Multivalued Attribute
4. Derived Attribute
An attribute that can be derived from other attributes of the entity type is known as a derived
attribute. e.g.; Age (can be derived from DOB). In ER diagram, the derived attribute is
represented by a dashed oval.
Derived Attribute
The Complete Entity Type Student with its Attributes can be represented as:
Entity and Attributes
Relationship Type and Relationship Set
A Relationship Type represents the association between entity types. For example, ‘Enrolled
in’ is a relationship type that exists between entity type Student and Course. In ER diagram,
the relationship type is represented by a diamond and connecting the entities with lines.
Entity-Relationship Set
A set of relationships of the same type is known as a relationship set. The following
relationship set depicts S1 as enrolled in C2, S2 as enrolled in C1, and S3 as registered in C3.
Relationship Set
Degree of a Relationship Set
The number of different entity sets participating in a relationship set is called the degree of a
relationship set.
1. Unary Relationship: When there is only ONE entity set participating in a relation, the
relationship is called a unary relationship. For example, one person is married to only one
person.
Unary Relationship
2. Binary Relationship: When there are TWO entities set participating in a relationship, the
relationship is called a binary relationship. For example, a Student is enrolled in a Course.
Binary Relationship
3. Ternary Relationship: When there are three entity sets participating in a relationship, the
relationship is called a ternary relationship.
4. N-ary Relationship: When there are n entities set participating in a relationship, the
relationship is called an n-ary relationship.
Cardinality in ER Model
The maximum number of times an entity of an entity set participates in a relationship set is
known as cardinality.
Cardinality can be of different types:
1. One-to-One
When each entity in each entity set can take part only once in the relationship, the cardinality
is one-to-one. Let us assume that a male can marry one female and a female can marry one
male. So the relationship will be one-to-one.
The diagram depicts the ‘Enrolled in’ relationship set with Student Entity set having total
participation and Course Entity set having partial participation.