0% found this document useful (0 votes)
21 views40 pages

Understanding Database Management Systems

A Database Management System (DBMS) is software that enables users to create, modify, and query databases while ensuring data integrity, security, and efficient access. It offers advantages over traditional file systems, such as reduced data redundancy, improved data management, and support for concurrent access. DBMS architectures include 1-tier, 2-tier, and 3-tier systems, each with varying complexities and use cases.

Uploaded by

Navya gupta
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
21 views40 pages

Understanding Database Management Systems

A Database Management System (DBMS) is software that enables users to create, modify, and query databases while ensuring data integrity, security, and efficient access. It offers advantages over traditional file systems, such as reduced data redundancy, improved data management, and support for concurrent access. DBMS architectures include 1-tier, 2-tier, and 3-tier systems, each with varying complexities and use cases.

Uploaded by

Navya gupta
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

What is a DBMS?

A DBMS is a system that allows users to create, modify and query databases while ensuring
data integrity, security and efficient data access.
Unlike traditional file systems, DBMS minimizes data redundancy, prevents inconsistencies
and simplifies data management with features like concurrent access and backup
mechanisms.
It organizes data into tables, views, schemas and reports, providing a structured approach to
data management.
Example: A university database can store and manage student information, faculty records
and administrative data, allowing seamless retrieval, insertion and deletion of information as
required.
Key Features of DBMS
1. Data Modeling: Tools to create and modify data models, defining the structure and
relationships within the database.
2. Data Storage and Retrieval: Efficient mechanisms for storing data and executing queries
to retrieve it quickly.
3. Concurrency Control: Ensures multiple users can access the database simultaneously
without conflicts.
4. Data Integrity and Security: Enforces rules to maintain accurate and secure data,
including access controls and encryption.
5. Backup and Recovery: Protects data with regular backups and enables recovery in case of
system failures.
Types of DBMS
There are several types of Database Management Systems (DBMS), each tailored to different
data structures, scalability requirements and application needs. The most common types are
as follows:
Database Languages
Database languages are specialized sets of commands and instructions used to define,
manipulate and control data within a database. Each language type plays a distinct role
in database management, ensuring efficient storage, retrieval and security of data. The
primary database languages include:

Database Languages
1. Data Definition Language (DDL)
DDL is the short name for Data Definition Language, which deals with database schemas and
descriptions, of how the data should reside in the database.
CREATE: to create a database and its objects like (table, index, views, store procedure,
function and triggers)
ALTER: alters the structure of the existing database
DROP: delete objects from the database
TRUNCATE: remove all records from a table, including all spaces allocated for the records
are removed
COMMENT: add comments to the data dictionary
RENAME: rename an object
2. Data Manipulation Language (DML)
DML focuses on manipulating the data stored in the database, enabling users to retrieve, add,
update and delete data.
SELECT: retrieve data from a database
INSERT: insert data into a table
UPDATE: updates existing data within a table
DELETE: Delete all records from a database table
MERGE: UPSERT operation (insert or update)
CALL: call a PL/SQL or Java subprogram
EXPLAIN PLAN: interpretation of the data access path
LOCK TABLE: concurrency Control
3. Data Control Language (DCL)
DCL commands manage access permissions, ensuring data security by controlling who can
perform certain actions on the database.
GRANT: Provides specific privileges to a user (e.g., SELECT, INSERT).
REVOKE: Removes previously granted permissions from a user.
4. Transaction Control Language (TCL)
TCL commands oversee transactional data to maintain consistency, reliability and atomicity.
ROLLBACK: Undoes changes made during a transaction.
COMMIT: Saves all changes made during a transaction.
SAVEPOINT: Sets a point within a transaction to which one can later roll back.
5. Data Query Language (DQL)
DQL is a subset of DML, specifically focused on data retrieval.
SELECT: The primary DQL command, used to query data from the database without altering
its structure or contents.
Paradigm Shift from File System to DBMS
Before the advent of modern DBMS, data was managed using basic file systems on hard
drives. While this approach allowed users to store, retrieve and update files as needed, it
came with numerous challenges.
A typical example can be seen in a file-based university management system, where data was
stored in separate sections such as Departments, Academics, Results, Accounts and Hostels.
Certain information like student names and phone numbers was repeated across multiple
files, leading to the following issues:
1. Redundancy of data: When the same data exists in multiple places, any update must be
manually repeated everywhere. For instance, if a student changes their phone number, it must
be updated across all sections. Failure to do so leads to unnecessary duplication and wasted
storage.
2. Inconsistency of Data: Data is said to be inconsistent if multiple copies of the same data do
not match each other. If the Phone number is different in Accounts Section and Academics
Section, it will be inconsistent. Inconsistency may be because of typing errors or not updating
all copies of the same data.
3. Complex Data Access: A user should know the exact location of the file to access data, so
the process is very cumbersome and tedious. If the user wants to search the student hostel
allotment number of a student from 10000 unsorted students’ records, how difficult it can be.
4. Lack of Security: File systems provided limited control over who could access certain data.
A student who gained access to a file with grades might easily alter it without proper
authorization, compromising data integrity.
5. No Concurrent Access: File systems were not designed for multiple users working at the
same time. If one user was editing a file, others had to wait, which hindered collaboration and
slowed down workflows.
6. No Backup and Recovery: File systems lacked built-in mechanisms for creating backups or
recovering data after a loss. If a file was accidentally deleted or corrupted, there was no easy
way to restore it, potentially causing permanent data loss.
Applications of DBMS

Application of DBMS
Enterprise Information: Sales, accounting, human resources, Manufacturing, online retailers.
Banking and Finance Sector: Banks maintaining the customer details, accounts, loans,
banking transactions, credit card transactions. Finance: Storing the information about sales
and holdings, purchasing of financial stocks and bonds.
University: Maintaining the information about student course enrolled information, student
grades, staff roles.
Airlines: Reservations and schedules.
Telecommunications: Prepaid, postpaid bills maintenance.
Advantages of DBMS
Data organization: A DBMS allows for the organization and storage of data in a structured
manner, making it easy to retrieve and query the data as needed.
Data integrity: A DBMS provides mechanisms for enforcing data integrity constraints, such
as constraints on the values of data and access controls that restrict who can access the data.
Concurrent access: A DBMS provides mechanisms for controlling concurrent access to
the database, to ensure that multiple users can access the data without conflicting with each
other.
Data security: A DBMS provides tools for managing the security of the data, such as
controlling access to the data and encrypting sensitive data.
Backup and recovery: A DBMS provides mechanisms for backing up and recovering the data
in the event of a system failure.
Data sharing: A DBMS allows multiple users to access and share the same data, which can be
useful in a collaborative work environment.
Disadvantages of DBMS
Complexity: DBMS can be complex to set up and maintain, requiring specialized knowledge
and skills.
Performance overhead: The use of a DBMS can add overhead to the performance of an
application, especially in cases where high levels of concurrency are required.
Scalability: The use of a DBMS can limit the scalability of an application, since it requires
the use of locking and other synchronization mechanisms to ensure data consistency.
Cost: The cost of purchasing, maintaining and upgrading a DBMS can be high, especially for
large or complex systems.
Limited Use Cases: Not all use cases are suitable for a DBMS, some solutions don't need
high reliability, consistency or security and may be better served by other types of data
storage.

File System
The file system is basically a way of arranging the files in a storage medium like a hard disk.
The file system organizes the files and helps in the retrieval of files when they are required.
File systems consist of different files which are grouped into directories. The directories
further contain other folders and files. The file system performs basic operations like
management, file naming, giving access rules, etc.
Example: NTFS(New Technology File System) , EXT(Extended File System).
File System
DBMS ( Database Management System)
Database Management System is basically software that manages the collection of related
data. It is used for storing data and retrieving the data effectively when it is needed. It also
provides proper security measures for protecting the data from unauthorized access. In
Database Management System the data can be fetched by SQL queries and relational
algebra. It also provides mechanisms for data recovery and data backup.

Example:
Oracle, MySQL, MS SQL server.

DBMS
Difference Between File System and DBMS

Basics File System DBMS

Structure The file system is a way of arranging DBMS is software for managing
the files in a storage medium within the database.
a computer.

Data Redundant data can be present in a In DBMS there is no redundant


Redundancy file system. data.

Backup and It doesn't provide Inbuilt mechanism It provides in house tools for
Recovery for backup and recovery of data if it backup and recovery of data even
is lost. if it is lost.

Query There is no efficient query Efficient query processing is there


processing processing in the file system. in DBMS.

Consistency There is more data consistency


There is less data consistency in the
because of the process
file system.
of normalization .

Complexity It is less complex as compared to It has more complexity in handling


DBMS. as compared to the file system.

Security DBMS has more security


Constraints File systems provide less security in
mechanisms as compared to file
comparison to DBMS.
systems.

Cost It has a comparatively higher cost


It is less expensive than DBMS.
than a file system.

Data In DBMS data


Independence independence exists, mainly of
two types:
There is no data independence.
1) Logical Data Independence .
2)Physical Data Independence.

User Access Only one user can access data at a Multiple users can access data at a
time. time.

Meaning The users are not required to write The user has to write procedures
procedures. for managing databases

Sharing Data is distributed in many files. So, Due to centralized nature data
it is not easy to share data. sharing is easy
Data It give details of storage and It hides the internal details
Abstraction representation of data of Database

Integrity Integrity Constraints are difficult to Integrity constraints are easy to


Constraints implement implement

Attribute s To access data in a file , user requires


attributes such as file name, file No such attributes are required.
location.

Example Cobol , C++ Oracle , SQL Server

DBMS Architecture 1-level, 2-Level, 3-Level


A DBMS architecture defines how users interact with the database to read, write, or update
information. A well-designed architecture and schema (a blueprint detailing tables, fields, and
relationships) ensure data consistency, improve performance, and keep data secure.
Types of DBMS Architecture
There are several types of DBMS Architecture that we use according to the usage
requirements.
1-Tier Architecture
2-Tier Architecture
3-Tier Architecture
1-Tier Architecture
In 1-Tier Architecture, the user works directly with the database on the same system. This
means the client, server, and database are all in one application. The user can open the
application, interact with the data, and perform tasks without needing a separate server or
network connection.
A common example is Microsoft Excel. Everything from the user interface to the logic and
data storage happens on the same device. The user enters data, performs calculations, and
saves files directly on their computer.
This setup is simple and easy to use, making it ideal for personal or standalone applications.
It does not require a network or complex setup, which is why it's often used in small-scale or
individual use cases.
This architecture is simple and works well for personal, standalone applications where no
external server or network connection is needed.
DBMS 1-Tier Architecture
Advantages of 1-Tier Architecture
Below mentioned are the advantages of 1-Tier Architecture.
Simple Architecture: 1-Tier Architecture is the most simple architecture to set up, as only a
single machine is required to maintain it.
Cost-Effective: No additional hardware is required for implementing 1-Tier Architecture,
which makes it cost-effective.
Easy to Implement: 1-Tier Architecture can be easily deployed, and hence it is mostly used in
small projects.
Disadvantages of 1-Tier Architecture
Limited to Single User: Only one person can use the application at a time. It’s not designed
for multiple users or teamwork.
Poor Security: Since everything is on the same machine, if someone gets access to the
system, they can access both the data and the application easily.
No Centralized Control: Data is stored locally, so there's no central database. This makes it
hard to manage or back up data across multiple devices.
Hard to Share Data: Sharing data between users is difficult because everything is stored on
one computer.
2-Tier Architecture
The 2-tier architecture is similar to a basic client-server model. The application at the client
end directly communicates with the database on the server side. APIs
like ODBC and JDBC are used for this interaction. The server side is responsible for
providing query processing and transaction management functionalities.
On the client side, the user interfaces and application programs are run. The application on
the client side establishes a connection with the server side to communicate with the DBMS.
For Example: A Library Management System used in schools or small organizations is a
classic example of two-tier architecture.
Client Layer (Tier 1): This is the user interface that library staff or users interact with. For
example they might use a desktop application to search for books, issue them, or check due
dates.
Database Layer (Tier 2): The database server stores all the library records such as book
details, user information, and transaction logs.
The client layer sends a request (like searching for a book) to the database layer which
processes it and sends back the result. This separation allows the client to focus on the user
interface, while the server handles data storage and retrieval.

DBMS 2-Tier Architecture

Advantages of 2-Tier Architecture


Easy to Access: 2-Tier Architecture makes easy access to the database, which makes fast
retrieval.
Scalable: We can scale the database easily, by adding clients or upgrading hardware.
Low Cost: 2-Tier Architecture is cheaper than 3-Tier Architecture and Multi-Tier
Architecture.
Easy Deployment: 2-Tier Architecture is easier to deploy than 3-Tier Architecture.
Simple: 2-Tier Architecture is easily understandable as well as simple because of only two
components.
Disadvantages of 2-Tier Architecture
Limited Scalability: As the number of users increases, the system performance can slow
down because the server gets overloaded with too many requests.
Security Issues: Clients connect directly to the database, which can make the system more
vulnerable to attacks or data leaks.
Tight Coupling: The client and the server are closely linked. If the database changes, the
client application often needs to be updated too.
Difficult Maintenance: Managing updates, fixing bugs, or adding features becomes harder
when the number of users or systems increases.
3-Tier Architecture
In 3-Tier Architecture, there is another layer between the client and the server. The client
does not directly communicate with the server. Instead, it interacts with an application server
which further communicates with the database system and then the query processing and
transaction management takes place.
This intermediate layer acts as a medium for the exchange of partially processed data
between the server and the client. This type of architecture is used in the case of large web
applications.
Example: E-commerce Store
User: You visit an online store, search for a product and add it to your cart.
Processing: The system checks if the product is in stock, calculates the total price and applies
any discounts.
Database: The product details, your cart and order history are stored in the database for future
reference.

DBMS 3-Tier Architecture


Advantages of 3-Tier Architecture
Enhanced scalability: Scalability is enhanced due to the distributed deployment of application
servers. Now, individual connections need not be made between the client and server.
Data Integrity: 3-Tier Architecture maintains Data Integrity. Since there is a middle layer
between the client and the server, data corruption can be avoided/removed.
Security: 3-Tier Architecture Improves Security. This type of model prevents direct
interaction of the client with the server thereby reducing access to unauthorized data.
Disadvantages of 3-Tier Architecture
More Complex: 3-Tier Architecture is more complex in comparison to 2-Tier Architecture.
Communication Points are also doubled in 3-Tier Architecture.
Difficult to Interact: It becomes difficult for this sort of interaction to take place due to the
presence of middle layers.
Slower Response Time: Since the request passes through an extra layer (application server), it
may take more time to get a response compared to 2-Tier systems.
Higher Cost: Setting up and maintaining three separate layers (client, server, and database)
requires more hardware, software, and skilled people. This makes it more expensive.
For more information, you can refer to the Advantages and Disadvantages of 3-Tier
Architecture in DBMS.
What is DBMS architecture?
DBMS architecture is the design structure that defines how data is stored, managed, and
accessed in a database system. It organizes the database into layers or tiers to ensure efficient
data processing, better scalability, and easier management.
What is tier 1 and tier 2 and tier 3 architecture?
Tier 1 (Single-Tier Architecture): The database and application are on the same system.
Tier 2 (Two-Tier Architecture): The application and database are separate, directly connected
via a network.
Tier 3 (Three-Tier Architecture): Divides into three parts: user interface, logic and database,
all working together.
What are the levels of DBMS?
The levels of DBMS are:
Internal Level: Deals with how data is physically stored in the database.
Conceptual Level: Focuses on the structure of the entire database and how data is organized.
External Level: Shows the database to users in a way that suits their needs, such as views or
reports.

External level
It is also called view level. The reason this level is called “view” is because several users can
view their desired data from this level which is internally fetched from database with the help
of conceptual and internal level mapping. The user doesn’t need to know the database schema
details such as data structure, table definition etc. user is only concerned about data which is
what returned back to the view level after it has been fetched from database (present at the
internal level). External level is the “top level” of the Three Level DBMS Architecture.
2. Conceptual level It is also called logical level. The whole design of the database such as
relationship among data, schema of data etc. are described in this level. Database constraints
and security are also implemented in this level of architecture. This level is maintained by
DBA (database administrator)
3. . Internal level This level is also known as physical level. This level describes how the
data is actually stored in the storage devices. This level is also responsible for allocating
space to the data. This is the lowest level of the architecture.
Caching
Caching is used to implement a high-speed system with a large number of users. A cache is a
high-speed data storage that stores data temporarily to serve future requests faster.
Database caching is like a helper for your primary database (DB). It is a mechanism that
stores frequently accessed data in temporary memory. Whenever the application requests the
data again, that can quickly get it from this helper, instead of from the main database. Cache
helps to reduce the database workloads. So it increases system speed by reducing the need to
fetch data from DB.
Why is Database Caching Strategy Important?
 Improved System Performance: In general, the frequently used data is stored in the
cache. So the applications can retrieve that data more quickly from the cache instead
of the primary DB. It improves the system performance and reduces latency.
 Increase Data Availability: Caching increases data availability if the primary DB goes
down, caching acts as a backup source. Depending on where the cache is stored, it can
still serve data to the application even if the primary DB is unavailable.
 Cost Savings: The caching mechanism reduces DB server load. Caching can help to
build the infrastructure with lower costs compared to DB servers. It allows
organizations to achieve the same level of performance with fewer resources.
 Improved User Experience: Using the Caching technique responses are returned
faster. It leads to a better overall user experience, particularly for applications that
require real-time or near-real-time access to data.
What is ResultSet Caching?
ResultSet Caching is a Dedicated SQL Pools feature which caches a dataset created using a
SELECT query, for example a query that aggregates data, or returns all the data from a table
(e.g. in a Power BI table load). If the query satisfies certain criteria (listed below) then it gets
added to the Dedicated SQL Pools cache. Then when a request comes in that matches the
query definition then the result can be served from the cache. This is really beneficial for a
couple of reasons:
 It speeds up the retrieval of the result as it can just return from the cache
 It also does not use up any resource allocation % for the Dedicated SQL Pool.
Criteria
There are a few criteria a query must meet before it can be added to the ResultSet Cache:
 ResultSet Caching must be enabled on the Dedicated SQL Pool database
 No data changes in the table
 No schema changes in the table
 Queries using user-defined functions are non-deterministic functions like GetDate()
 Dedicated SQL Pool tables with Row-Level Security
 Queries which return a dataset larger than 10GB
 Queries with row sizes larger that 64K…that is roughly 16 VARCHAR(4000)
columns
 Controls the result set caching behavior for the current client session.
 Applies to Azure Synapse Analytics

 Transact-SQL syntax conventions


 Syntax
 syntaxsqlCopy
 SET RESULT_SET_CACHING { ON | OFF };
 Note
 This syntax is not supported by serverless SQL pool in Azure Synapse Analytics.
 Remarks
 Run this command when connected to the user database where you want to configure
the result_set_caching setting for.
 ON
Enables result set caching for the current client session. Result set caching cannot be
turned ON for a session if it is turned OFF at the database level.
 OFF
Disable result set caching for the current client session.

Use SQL to turn on or turn off statement caching for an individual database session when
statement caching is enabled for the database server. The following statement shows how to
use SQL to turn on caching for the current database session:

SET STATEMENT CACHE ON

The following statement shows how to use SQL to turn off caching for the current database
session:

SET STATEMENT CACHE OFF

Object or field-level caching in a database management system (DBMS) involves storing


frequently accessed data in a temporary, high-speed storage area (cache) to reduce the need
for repeated database queries. This can significantly improve performance by reducing
database load and speeding up data retrieval.
Caching at the field level in a DBMS involves storing frequently accessed field values in a
faster storage medium (like cache memory) to improve query performance and reduce
database load. This technique can significantly speed up data retrieval, especially for read-
heavy workloads, by minimizing the need to access slower storage like hard drives.
Purpose:
Caching field values (or parts of rows) aims to reduce the number of database reads,
especially for frequently accessed fields or when dealing with large datasets.
How it works:
When a query is executed, the DBMS first checks the cache. If the required field value is
found in the cache (a cache hit), it's returned directly, bypassing the slower database
lookup. If the value is not in the cache (a cache miss), the DBMS fetches it from the database,
stores it in the cache for future use, and then returns it.
Benefits:
Faster query execution: Retrieving data from the cache is significantly faster than accessing
the disk, leading to quicker response times.
Reduced database load: By caching frequently accessed data, the load on the database server
is reduced, allowing it to handle more requests efficiently.
Improved scalability: Caching helps applications scale better, especially those with high read
traffic.
Where caching happens:
Caching can be implemented at different levels:
Database-level caching: Many DBMSs have built-in mechanisms like buffer pools (e.g.,
PostgreSQL, MySQL) that cache data pages (blocks of data from tables or indexes) in
memory.
Application-level caching: Applications can also implement their own caching mechanisms,
often using dedicated caching systems like Redis or Memcached.
Standalone caching layers: Dedicated caching systems can act as a separate layer between the
application and the database.

Example:
Imagine a website displaying product prices. Instead of querying the database for each price
request, the application can cache frequently accessed product prices in a cache. When a user
requests a product price, the application first checks the cache. If the price is cached, it's
returned immediately. Otherwise, the price is fetched from the database, stored in the cache,
and then returned.
Considerations:

Cache invalidation: When the data in the database changes, the cache needs to be
updated or invalidated to ensure data consistency.
Cache size: The cache should be appropriately sized to hold frequently accessed data
without causing performance issues due to cache misses or eviction.
Cache eviction policies: Algorithms like Least Recently Used (LRU) or Least Frequently
Used (LFU) can be used to manage the cache when it reaches its capacity.
Introduction of Shadow Paging
Shadow paging is a fundamental recovery technique used in database management systems
(DBMS) to ensure the reliability and consistency of data. It plays a crucial role in maintaining
atomicity and durability which are the two core properties of transaction management
by maintaining two versions of the database state: the shadow page table and the current page
table. This technique is also known as Cut-of-Place updating.
This technique ensures that a database can recover seamlessly from failures without losing
data integrity. During a transaction, updates are made to a new version of the database pages
tracked by the current page table, while the shadow page table preserves the pre-transaction
state. This dual-table approach allows for efficient crash recovery and simplifies the commit
and rollback processes.
Page Table : A page table is a data structure that maps logical pages (a logical division of
data) to physical pages (actual storage on disk).
Each entry in the page table corresponds to a physical page location on the disk.
The database uses the page table to retrieve or modify data.
How Shadow Paging Works ?
Shadow paging is a recovery technique that views the database as a collection of fixed-sized
logical storage units, known as pages, which are mapped to physical storage blocks using a
structure called the page table. The page table enables the system to efficiently locate and
manage database pages.
Here’s how shadow paging works in detail:
Start of Transaction:
The shadow page table is created by copying the current page table.
The shadow page table represents the original, unmodified state of the database.
This table is saved to disk and remains unchanged throughout the transaction.

Logical Page Shadow Page Table Current Page


(Disk) Table

P1 Address_1 Address_1

P2 Address_2 Address_2

P3 Address_3 Address_3

Transaction Execution:
Updates are made to the database by creating new pages.
The current page table reflects these changes, while the shadow page table remains
unchanged.
Page Modification:
If a logical page (e.g. P2) needs to be updated:
A new version of the page (P2’) is created in memory and written to a new physical storage
block.
The current page table entry for P2 is updated to point to P2’.
The shadow page table still points to the original page P2, ensuring it is unaffected by the
changes.
Logical Page Shadow Page Table Current Page
(Disk) Table

P1 Address_1 Address_1

P2 Address_2 Address_4 (P2')

P3 Address_3 Address_3

Commit:
If the transaction is successful, the shadow page table is replaced by the current page table.
This replacement makes the changes permanent.

Logical Page Shadow Page Table Current Page


(Disk) Table

P1 Address_1 Address_1

P2 Address_4 (P2') Address_4 (P2')

P3 Address_3 Address_3

Abort:
If the transaction is aborted, the current page table is discarded, leaving the shadow page
table intact.
Since the shadow page table still points to the original pages, no changes are reflected in the
database.
Consider the below example:
To understand concept, consider above figure. In this 2 write operations are performed on
page 3 and 5. Before start of write operation on page 3, current page table points to old page
3. When write operation starts following steps are performed :
Firstly, search start for available free block in disk blocks.
After finding free block, it copies page 3 to free block which is represented by Page 3 (New).
Now current page table points to Page 3 (New) on disk but shadow page table points to old
page 3 because it is not modified.
The changes are now propagated to Page 3 (New) which is pointed by current page table.
COMMIT Operation : To commit transaction following steps should be done :
All the modifications which are done by transaction which are present in buffers are
transferred to physical database.
Log based Recovery in DBMS
Log-based recovery in DBMS ensures data can be maintained or restored in the event of a
system failure. The DBMS records every transaction on stable storage, allowing for easy data
recovery when a failure occurs. For each operation performed on the database, a log file is
created. Transactions are logged and verified before being applied to the database, ensuring
data integrity.
A log is a sequence of records that document the operations performed during database
transactions. Logs are stored in a log file for each transaction, providing a mechanism to
recover data in the event of a failure. For every operation executed on the database, a
corresponding log record is created. It is critical to store these logs before the actual
transaction operations are applied to the database, ensuring data integrity and consistency
during recovery processes.
For example, consider a transaction to modify a student's city. This transaction generates the
following logs:
Start Log: When the transaction begins, a log is created to indicate the start of the transaction.
Format:<Tn, Start>
Here, Tn represents the transaction identifier.
Example: <T1, Start> indicates that Transaction 1 has started.
Operation Log: When the city is updated, a log is recorded to capture the old and new values
of the operation.
Format:<Tn, Attribute, Old_Value, New_Value>
Example: <T1, City, 'Gorakhpur', 'Noida'> shows that in Transaction 1, the value of
the City attribute has changed from 'Gorakhpur' to 'Noida'.
Commit Log: Once the transaction is successfully completed, a final log is created to indicate
that the transaction has been completed and the changes are now permanent.
Format:<Tn, Commit>
Example: <T1, Commit> signifies that Transaction 1 has been successfully completed.
These logs play a crucial role in ensuring that the database can recover to a consistent state
after a system crash. If a failure occurs, the DBMS can use these logs to either roll back
incomplete transactions or redo committed transactions to maintain data consistency.
Key Operations in Log-Based Recovery
Undo Operation
The undo operation reverses the changes made by an uncommitted transaction, restoring the
database to its previous state.
Example of Undo:
Consider a transaction T1 that updates a bank account balance but fails before committing:
Initial State:
Account balance = 500.
Transaction T1:
Update balance to 600.
Log entry:
<T1, Balance, 500, 600>

Failure:
T1 fails before committing.
Undo Process:
Use the old value from the log to revert the change.
Set balance back to 500.
Final log entry after undo:
<T1, Abort>
Redo Operation
The redo operation re-applies the changes made by a committed transaction to ensure
consistency in the database.
Example of Redo:
Consider a transaction T2 that updates an account balance but the database crashes before
changes are permanently reflected:
Initial State:
Account balance = 300.
Transaction T2:
Update balance to 400.
Log entries:
<T2, Start><T2, Balance, 300, 400><T2, Commit>

Crash:
Changes are not reflected in the database.
Redo Process:
Use the new value from the log to reapply the committed change.
Set balance to 400.
Undo-Redo Example:
Assume two transactions:
T1: Failed transaction (requires undo).
T2: Committed transaction (requires redo).
Log File:
<T1, Start><T1, Balance, 500, 600><T2, Start><T2, Balance, 300, 400><T2, Commit><T1,
Abort>
Recovery Steps:
Identify Committed and Uncommitted Transactions:
T1: Not committed → Undo.
T2: Committed → Redo.
Undo T1:
Revert balance from 600 to 500.
Redo T2:
Reapply balance change from 300 to 400.

Operation Trigger Action

For uncommitted/failed Revert changes using the old values in the


Undo
transactions log.

Reapply changes using the new values in


Redo For committed transactions
the log
Full backups
A full backup involves creating a complete copy of the entire database, including all data
files, schemas, and other database objects, at a specific point in time. It acts as a self-
contained snapshot of your database.
Pros of full backups
Complete Data Protection: Offers the highest level of data protection as it captures everything
in the database.
Faster Recovery: Restoring from a full backup is generally the quickest way to recover a
database as it requires only a single backup file.
Simplicity: The process is straightforward and easy to manage.
Cons of full backups
Time-Consuming: Can be very time-consuming, especially for large databases.
Resource Intensive: Requires significant storage space and can consume considerable
network bandwidth and I/O resources during creation.
Less Flexible for Frequent Backups: Due to resource consumption, it may not be practical to
perform full backups daily.
When to use full backups
As the initial backup in any backup strategy.
Periodically, for example, weekly or monthly, to establish new baselines.
Before major system changes like upgrades or migrations.
When a complete data snapshot is required for archival or compliance purposes.
2. Differential backups
A differential backup copies all the data that has changed or been added since the last full
backup. It builds upon the most recent full backup, capturing cumulative changes over time.
Pros of differential backups
Faster to Create than Full Backups: As they only copy changed data, differential backups are
faster to generate than full backups.
Faster Recovery than Incremental Backups: Restoring data from a differential backup is
typically quicker than incremental backups, requiring only the last full backup and the latest
differential backup.
Moderately Efficient Storage Use: Requires less storage space compared to performing full
backups every time.
Cons of differential backups
Requires More Storage than Incremental: Differential backups accumulate all changes since
the last full backup, potentially becoming larger than incremental backups over time.
Backup Size Increases Over Time: Each subsequent differential backup grows in size until a
new full backup is created.
Potential for Failed Recovery: If the initial full backup is compromised, restoring from
subsequent differential backups may not be possible.
When to use differential backups
When a balance between backup speed, storage consumption, and restoration time is desired.
In environments where data changes are significant but not so massive as to make differential
backups excessively large.
As part of a strategy with weekly full backups and daily differential backups to ensure
efficient daily data protection
Physical level (or Internal View / Schema): The lowest level of abstraction describes how
the data are actually stored. The physical level describes complex low-level data structures
in detail.
• Logical level (or Conceptual View / Schema): The next-higher level of abstraction
describes what data are stored in the database, and what relationships exist among those
data. The logical level thus describes the entire database in terms of a small number of
relatively simple structures. Although implementation of the simple structures at the logical
level may involve complex physical-level structures, the user of the logical level does not
need to be aware of this complexity. This is referred to as physical data independence.
Database administrators, who must decide what information to keep in the database, use
the logical level of abstraction.
• View level (or External View / Schema): The highest level of abstraction describes only
part of the entire database. Even though the logical level uses simpler structures, complexity
remains because of the variety of information stored in a large database. Many users of the
database system do not need all this information; instead, they need to access only a part of
the database. The view level of abstraction exists to simplify their interaction with the
system. The system may provide many views for the same database. Figure 1.2 shows the
relationship among the three levels of abstraction.
A columnar database,
also known as a column-oriented database, is a type of Database Management System
(DBMS) that stores data in columns rather than rows,. This contrasts with traditional row-
oriented databases (like most relational databases), where all the values of a single record
(row) are stored together.
Therefore, databases that are organized in columns provide better performances for
operations that require applications to read data such as data analytics and data warehousing.
This approach enhances the rate of query processing and reduces disk I/O making it suitable
for cases that involve a large number of queries that are processed on large amounts of data.

Example of Columnar Database


Here is an example of a simple database table with four columns and three rows.

ID Number Last Name First Name Bonus

534782 Miller Ginny 6000

585523 Parker Peter 8000

479148 Stacy Gwen 2000

In a Columnar DBMS, the data stored is in this format:


534782, 585523, 479148; Miller, Parker, Stacy; Ginny, Peter, Gwen; 6000, 8000, 2000.
In a Row-oriented DBMS, the data stored is in this format:
534782, Miller, Ginny, 6000; 585523, Parker, Peter, 8000; 479148, Stacy, Gwen, 2000.
When to use the Columnar Database
Queries that involve only a few columns.
Compression but column-wise only.
Clustering queries against a huge amount of data.
Advantages of Columnar Database
Columnar databases can be used for different tasks such as when the applications that are
related to big data comes into play then the column-oriented databases have greater attention
in such case.
The data in the columnar database has a highly compressible nature and has different
operations like (AVG), (MIN, MAX), which are permitted by the compression.
Efficiency and Speed: The speed of Analytical queries that are performed is faster in
columnar databases.
Self-indexing: Another benefit of a column-based DBMS is self-indexing, which uses less
disk space than a relational database management system containing the same data.
Limitation of Columnar Database
For loading incremental data, traditional databases are more relevant as compared to column-
oriented databases.
For Online transaction processing (OLTP) applications, Row oriented databases are more
appropriate than columnar databases.

Introduction of ER Model
The Entity-Relationship Model (ER Model) is a conceptual model for designing a databases.
This model represents the logical structure of a database, including entities, their attributes
and relationships between them.
Entity: An objects that is stored as data such as Student, Course or Company.
Attribute: Properties that describes an entity such as StudentID, CourseName,
or EmployeeEmail.
Relationship: A connection between entities such as "a Student enrolls in a Course".

Why Use ER Diagrams In DBMS?


ER diagrams represent the E-R model in a database, making them easy to convert into
relations (tables).
These diagrams serve the purpose of real-world modeling of objects which makes them
intently useful.
Unlike technical schemas, ER diagrams require no technical knowledge of the underlying
DBMS used.
They visually model data and its relationships, making complex systems easier to understand.
Symbols Used in ER Model
ER Model is used to model the logical view of the system from a data perspective which
consists of these symbols:
Rectangles: Rectangles represent entities in the ER Model.
Ellipses: Ellipses represent attributes in the ER Model.
Diamond: Diamonds represent relationships among Entities.
Lines: Lines represent attributes to entities and entity sets with other relationship types.
Double Ellipse: Double ellipses represent multi-valued Attributes, such as a student's
multiple phone numbers
Double Rectangle: Represents weak entities, which depend on other entities for
identification.

Symbols used in ER Diagram


What is an Entity?
An Entity represents a real-world object, concept or thing about which data is stored in a
database. It act as a building block of a database. Tables in relational database represent these
entities.
Example of entities:
Real-World Objects: Person, Car, Employee etc.
Concepts: Course, Event, Reservation etc.
Things: Product, Document, Device etc.
The entity type defines the structure of an entity, while individual instances of that type
represent specific entities.
What is an Entity Set?
An entity refers to an individual object of an entity type, and the collection of all entities of a
particular type is called an entity set. For example, E1 is an entity that belongs to the entity
type "Student," and the group of all students forms the entity set.
In the ER diagram below, the entity type is represented as:

Entity Set
We can represent the entity sets in an ER Diagram but we can't represent individual entities
because an entity is like a row in a table, and an ER diagram shows the structure and
relationships of data, not specific data entries (like rows and columns). An ER diagram is a
visual representation of the data model, not the actual data itself.
Types of Entity
There are two main types of entities:
1. Strong Entity
A Strong Entity is a type of entity that has a key Attribute that can uniquely identify each
instance of the entity. A Strong Entity does not depend on any other Entity in the Schema for
its identification. It has a primary key that ensures its uniqueness and is represented by a
rectangle in an ER diagram.
2. Weak Entity
A Weak Entity cannot be uniquely identified by its own attributes alone. It depends on a
strong entity to be identified. A weak entity is associated with an identifying entity (strong
entity), which helps in its identification. A weak entity are represented by a double rectangle.
The participation of weak entity types is always total. The relationship between the weak
entity type and its identifying strong entity type is called identifying relationship and it is
represented by a double diamond.
Example:
A company may store the information of dependents (Parents, Children, Spouse) of an
Employee. But the dependents can't exist without the employee. So dependent will be a Weak
Entity Type and Employee will be identifying entity type for dependent, which means it is
Strong Entity Type.
Strong Entity and Weak Entity
Attributes in ER Model
Attributes are the properties that define the entity type. For example, for a Student entity
Roll_No, Name, DOB, Age, Address, and Mobile_No are the attributes that define entity type
Student. In ER diagram, the attribute is represented by an oval.

Attribute
Types of Attributes
1. Key Attribute
The attribute which uniquely identifies each entity in the entity set is called the key attribute.
For example, Roll_No will be unique for each student. In ER diagram, the key attribute is
represented by an oval with an underline.

Key Attribute
2. Composite Attribute
An attribute composed of many other attributes is called a composite attribute. For example,
the Address attribute of the student Entity type consists of Street, City, State, and Country. In
ER diagram, the composite attribute is represented by an oval comprising of ovals.
Composite Attribute
3. Multivalued Attribute
An attribute consisting of more than one value for a given entity. For example, Phone_No
(can be more than one for a given student). In ER diagram, a multivalued attribute is
represented by a double oval.

Multivalued Attribute
4. Derived Attribute
An attribute that can be derived from other attributes of the entity type is known as a derived
attribute. e.g.; Age (can be derived from DOB). In ER diagram, the derived attribute is
represented by a dashed oval.

Derived Attribute
The Complete Entity Type Student with its Attributes can be represented as:
Entity and Attributes
Relationship Type and Relationship Set
A Relationship Type represents the association between entity types. For example, ‘Enrolled
in’ is a relationship type that exists between entity type Student and Course. In ER diagram,
the relationship type is represented by a diamond and connecting the entities with lines.

Entity-Relationship Set
A set of relationships of the same type is known as a relationship set. The following
relationship set depicts S1 as enrolled in C2, S2 as enrolled in C1, and S3 as registered in C3.

Relationship Set
Degree of a Relationship Set
The number of different entity sets participating in a relationship set is called the degree of a
relationship set.
1. Unary Relationship: When there is only ONE entity set participating in a relation, the
relationship is called a unary relationship. For example, one person is married to only one
person.

Unary Relationship
2. Binary Relationship: When there are TWO entities set participating in a relationship, the
relationship is called a binary relationship. For example, a Student is enrolled in a Course.

Binary Relationship
3. Ternary Relationship: When there are three entity sets participating in a relationship, the
relationship is called a ternary relationship.
4. N-ary Relationship: When there are n entities set participating in a relationship, the
relationship is called an n-ary relationship.
Cardinality in ER Model
The maximum number of times an entity of an entity set participates in a relationship set is
known as cardinality.
Cardinality can be of different types:
1. One-to-One
When each entity in each entity set can take part only once in the relationship, the cardinality
is one-to-one. Let us assume that a male can marry one female and a female can marry one
male. So the relationship will be one-to-one.

One to One Cardinality


Using Sets, it can be represented as:
Set Representation of One-to-One
2. One-to-Many
In one-to-many mapping as well where each entity can be related to more than one entity. Let
us assume that one surgeon department can accommodate many doctors. So the Cardinality
will be 1 to M. It means one department has many Doctors.

one to many cardinality


Using sets, one-to-many cardinality can be represented as:

Set Representation of One-to-Many


3. Many-to-One
When entities in one entity set can take part only once in the relationship set and entities in
other entity sets can take part more than once in the relationship set, cardinality is many to
one.
Let us assume that a student can take only one course but one course can be taken by many
students. So the cardinality will be n to 1. It means that for one course there can be n students
but for one student, there will be only one course.
many to one cardinality
Using Sets, it can be represented as:

Set Representation of Many-to-One


In this case, each student is taking only 1 course but 1 course has been taken by many
students.
4. Many-to-Many
When entities in all entity sets can take part more than once in the relationship cardinality is
many to many. Let us assume that a student can take more than one course and one course
can be taken by many students. So the relationship will be many to many.

many to many cardinality


Using Sets, it can be represented as:
Many-to-Many Set Representation
In this example, student S1 is enrolled in C1 and C3 and Course C3 is enrolled by S1, S3, and
S4. So it is many-to-many relationships.
Participation Constraint
Participation Constraint is applied to the entity participating in the relationship set.
1. Total Participation: Each entity in the entity set must participate in the relationship. If each
student must enroll in a course, the participation of students will be total. Total participation
is shown by a double line in the ER diagram.
2. Partial Participation: The entity in the entity set may or may NOT participate in the
relationship. If some courses are not enrolled by any of the students, the participation in the
course will be partial.

The diagram depicts the ‘Enrolled in’ relationship set with Student Entity set having total
participation and Course Entity set having partial participation.

Total Participation and Partial Participation


Using Set, it can be represented as,
Set representation of Total Participation and Partial Participation
Every student in the Student Entity set participates in a relationship but there exists a course
C4 that is not taking part in the relationship.
How to Draw an ER Diagram
1. Identify Entities: The very first step is to identify all the Entities. Represent these entities
in a Rectangle and label them accordingly.
2. Identify Relationships: The next step is to identify the relationship between them and
represent them accordingly using the Diamond shape. Ensure that relationships are not
directly connected to each other.
3. Add Attributes: Attach attributes to the entities by using ovals. Each entity can have
multiple attributes (such as name, age, etc.), which are connected to the respective entity.
4. Define Primary Keys: Assign primary keys to each entity. These are unique identifiers
that help distinguish each instance of the entity. Represent them with underlined attributes.
5. Remove Redundancies: Review the diagram and eliminate unnecessary or repetitive
entities and relationships.
6. Review for Clarity: Review the diagram make sure it is clear and effectively conveys the
relationships between the entities.

You might also like