0% found this document useful (0 votes)
3 views156 pages

Physical Data Structures in DBMS Overview

The document discusses Physical Data Structures in Database Management Systems (DBMS), detailing various storage formats such as files, pages, and indexing structures that optimize data access and retrieval. It also covers query optimization techniques, including cost-based and heuristic optimization methods, to enhance query execution efficiency. Additionally, it explains transaction processing, emphasizing the ACID properties (Atomicity, Consistency, Isolation, Durability) essential for maintaining data integrity during transactions.

Uploaded by

barhatedamodar25
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
3 views156 pages

Physical Data Structures in DBMS Overview

The document discusses Physical Data Structures in Database Management Systems (DBMS), detailing various storage formats such as files, pages, and indexing structures that optimize data access and retrieval. It also covers query optimization techniques, including cost-based and heuristic optimization methods, to enhance query execution efficiency. Additionally, it explains transaction processing, emphasizing the ACID properties (Atomicity, Consistency, Isolation, Durability) essential for maintaining data integrity during transactions.

Uploaded by

barhatedamodar25
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Physical Data Structures:

In DBMS (Database Management Systems), Physical Data Structures refer to the


actual storage formats and data organization techniques used to manage data on
storage devices like hard disks or SSDs. These structures are crucial for e icient data
access, retrieval, and storage.

key Physical Data Structures in DBMS:

1. Files

Files are the basic unit of storage for any DBMS.

Data files: Store actual data records.

Index files: Store indexes for faster data retrieval.

Log files: Store transactional logs for recovery purposes.

2. Pages and Blocks

Data is stored in pages (or blocks), typically 4KB or 8KB in size.

A page is the smallest unit of data transfer between disk and memory.

3. Record Storage

Data inside pages is stored as records (tuples).

Fixed-length records: Easier to manage but waste space.

Variable-length records: More space-e icient but complex to manage.

4. Heap Files

Unordered collection of records.

New records are added at the end.

Simple, but searching is slow (linear scan).

5. Sorted Files (Sequential Files)

Records are stored in a sorted order based on one or more fields.

Faster for range queries and sequential access.

Insertion/deletion is more costly than heap.

6. Indexing Structures

Used to improve search performance.


Types of Indexes:

Single-level index

Multi-level index

Primary index (on primary key)

Secondary index (on non-primary fields)

Clustered index

Non-clustered index

Common Index Data Structures:

B+ Trees: Balanced tree used in most databases; great for range and point queries.

Hash Indexes: Fast for exact matches, not suitable for range queries.

7. Hashing

Used in hash-based indexing.

Static Hashing: Fixed size hash table.

Dynamic Hashing: Hash table grows/shrinks (e.g., Extendible Hashing, Linear


Hashing).

8. Clustering

Related records are physically stored close together to reduce disk I/O.

9. Bu er Management

Caches frequently accessed pages in memory.

Uses algorithms like LRU (Least Recently Used) for page replacement.

Relationship to Logical Data Structures:

The physical data structures are the concrete implementation of the logical data
structures. The logical data structures represent the abstract view of how data is organized,
while the physical data structures define how that organization is realized in memory or on
storage devices. For example, a logical view of a table might be defined as a set of columns
and rows. The physical implementation would then involve how those rows are stored in
files, pages, and possibly indexed for faster access.

Importance of Physical Data Structures:

Understanding physical data structures is crucial for:

Database Performance Optimization:

Choosing the right storage structures and indexing strategies can significantly impact the
speed of data retrieval and updates.
Database Design:

Database designers need to consider physical storage constraints and performance


requirements when creating the database schema.

Database Administration:

Database administrators need to manage physical storage and tune database


configurations for optimal performance.

Data Security:

Physical storage structures also play a role in data security, as they determine how data is
stored and accessed.
Syllabus
UNIT IV
Query Optimization: Introduction, steps of optimization,
various algorithms to implement select, project and join
operations of relational algebra, optimization methods:
heuristic based, cost estimation based.
Transaction Processing, Concurrency Control and
Recovery Management: Transaction Model properties, State
Serializability, Lock base protocols, Two Phase Locking,
Time Stamping Protocols for Concurrency Control, and
Validation Based Protocol, Multiple Granularities, Granularity
of Data Item. Multi version schemes, Recovery with
Concurrent Transaction, Recovery technique based on
Deferred Update and Immediate Update, Shadow Paging,
Recovery in Multi Database System and Database Backup and
Recovery from Catastrophic Failure
Query Optimization
• Query: A query is a request for information from a database.

Query Processing:
• Query Processing is a translation of high-level queries into low-
level expression.
• It is a step wise process that can be used at the physical level of
the file system, query optimization and actual execution of the
query to get the result.
• It requires the basic concepts of relational algebra and file
structure.
• It refers to the range of activities that are involved in extracting
data from the database.
• It includes translation of queries in high-level database languages
into expressions that can be implemented at the physical level of
the file system.
Query Optimization:
• A single query can be executed through different algorithms or
re-written in different forms and structures.
• Hence, the question of query optimization comes into the
picture – Which of these forms or pathways is the most optimal?
The query optimizer attempts to determine the most efficient
way to execute a given query by considering the possible query
plans.
Importance: The goal of query optimization is to reduce the
system resources required to fulfill a query, and ultimately
provide the user with the correct result set faster.
• In query processing, we will understand how these queries
are processed and how they are optimized.
• Detailed Diagram is drawn as:.
In the above diagram,
1. Parser & Translator:- The first step is to transform the query
into a standard form.
• A query is translated into SQL and into a relational algebraic
expression.
• During this process, Parser perform the syntax check ,
Semantic check and Shared pool check and verifies the
relations and the attributes which are used in the query.
• Parser performs the following checks as (refer detailed
diagram):
A. Syntax check – concludes SQL syntactic validity.
Example: SELECT * FORM employee
• Here error of wrong spelling of FROM is given by this check.
1. Parser & Translator:-

B. Semantic check – determines whether the statement is


meaningful or not. Example: query contains a tablename which
does not exist is checked by this check.

[Link] Pool check – Every query possess a hash code during


its execution. So, this check determines existence of written hash
code in shared pool if code exists in shared pool then database
will not take additional steps for optimization and execution.
2. The second step is Query Optimizer. In this, it transforms the
query into equivalent expressions that are more efficient to
execute.
•During optimization stage, database must perform a hard parse
atleast for one unique DML statement and perform optimization
during this parse. This database never optimizes DDL unless it
includes a DML component such as subquery that require
optimization.
•It is a process in which multiple query execution plan for
satisfying a query are examined and most efficient query plan is
satisfied for execution.
•Database catalog stores the execution plans and then optimizer
passes the lowest cost plan for execution.

[Link] third step is Query evaluation. It executes the above query


execution plan and returns the result.
Translating SQL Queries into Relational Algebra

Example: Fetch the details of 1st semesterstudents


Studentinfo

ID NAME SEM
1 A 5
2 B 6
3 C 1
4 D 1

SQL Query:
Select*from studentinfo where sem=1;

Relational Algebra:
∏: Projection
σ : Selection
∏ ID,name,sem ( σ sem=1)(studentinfo)
Example:
Query:
SELECT Ename FROM Employee WHERE Salary > 5000;

Translated into Relational Algebra Expression


σ Salary > 5000 (π Ename (Employee))
OR
π Ename (σ Salary > 5000 (Employee))

•A sequence of primitive operations that can be used to evaluate a


query is a Query Execution Plan or Query Evaluation Plan.
• The above diagram indicates that the query execution engine
takes a query execution plan and returns the answers to the query.
• Query Execution Plan minimizes the cost of query evaluation.
Methods of Query Optimization

There are two methods of query optimization


• Cost based Optimization (Physical)
• Heuristic Optimization (Logical)
1. Cost based Optimization (Physical):
• This is based on the cost of the query. The query can use
different paths based on indexes, constraints, sorting methods
etc.
• This method mainly uses the statistics like record size, number
of records, number of records per block, number of blocks,
table size, whether whole table fits in a block, organization of
tables, uniqueness of column values, size of columns etc.
• Assign cost to Different strategies then it estimate and
compare their cost of different queries.
Methods of Query Optimization….

Cost based Optimization (Physical):


The cost of executing a query includes…
1. Cost of accessing secondary storage: cost of searching,
reading and writing disk blocks
2. Storage cost: Cost of storing intermediate results
3. Computation Cost: Cost of performing in memory
operation
4. Memory usage Cost: Cost related to number of occupied
memory buffers.
5. Communication cost: Cost of communicating the query
result from one side to another.
Methods of Query Optimization..

Cost based Optimization (Physical):

Problems:
1. Cost function required to assign cost to each
2. Number of execution strategies to be considered not fixed.
3. Sometime be time being to compare cost
4. Does not guarantee finding best strategies
5. Not time effective to consider every possible strategies
6. It is time being and expensive technique
2. Heuristic Optimization (Logical):
• Optimizer apply heuristics rules to find the optimized query
execution strategies. Rules are called heuristics, Usually it helps
to reduce the cost of execution of query.
• This method is also known as rule-based optimization.
• This is based on the equivalence rule on relational expressions;
hence the number of combination of queries get reduces here.
Hence the cost of the query too reduces.
• It is based on query tree
• This method creates relational tree for the given query based on
the equivalence rules. These equivalence rules by providing an
alternative way of writing and evaluating the query, gives the
better path to evaluate the query. This rule need not be true in all
cases. It needs to be examined after applying those rules. The
most important set of rules followed in this method is listed
below:
Heuristic Optimization (Logical):
• Perform all the selection operation as early as possible in the query.
This should be first and foremost set of actions on the tables in the
query. By performing the selection operation, we can reduce the
number of records involved in the query, rather than using the
whole tables throughout the query.
Heuristic Rules:
• It used as an optimized technique to modify the internal
representation of the queries.
• Uses set of rules that typically improve execution performance of
input query.
• 1. Apply select operation before project operation then join or
binary operations are used
• 2. Apply Project operation before join or other binary operations.
• Select and project reduce the size of file hence applied before join .
Or binary.
• Size of resulting relation form a binary operation like join is very
large.
Transaction
• A transaction is an action or series of actions that are being
performed by a single user or application program, which reads or
updates the contents of the database.

• A transaction can be defined as a logical unit of work on the


database. This may be an entire program, a piece of a program, or
a single command (like the SQL commands such as INSERT or
UPDATE), and it may engage in any number of operations on the
database.

• Let’s take an example of a simple transaction. Suppose a bank


employee transfers Rs 500 from A's account to B's account. This
very simple and small transaction involves several low-level
tasks.
A’s Account
Open_Account(A)
Old_Balance = [Link]
New_Balance = Old_Balance - 500
[Link] = New_Balance
Close_Account(A)

B’s Account
Open_Account(B)
Old_Balance = [Link]
New_Balance = Old_Balance + 500
[Link] = New_Balance
Close_Account(B)
Operations of Transaction:
Following are the main operations of transaction:
Read(X): Read operation is used to read the value of X from the
database and stores it in a buffer in main memory.
Write(X): Write operation is used to write the value back to the
database from the buffer.
Let's take an example to debit transaction from an account which
consists of following operations:
Let's assume the value of X before starting of the transaction is 4000.
The first operation reads X's value from database and stores it in a
buffer.
The second operation will decrease the value of X by 500. So buffer
will contain 3500.
The third operation will write the buffer's value to the database. So X's
final value will be 3500.
But it may be possible that because of the failure of hardware, software
or power, etc. that transaction may fail before finished all the operations
in the set.

For example: If in the above transaction, the debit transaction fails


after executing operation 2 then X's value will remain 4000 in the
database which is not acceptable by the bank.
To solve this problem, we have two important operations:
Commit: It is used to save the work done permanently.
Rollback: It is used to undo the work done.
Properties of Transaction (ACID PROPERTIES):

• The transaction has the four properties. These are used


to maintain consistency in a database, before and after
the transaction.
• A transaction in a database system must
maintain Atomicity, Consistency, Isolation, and
Durability commonly known as ACID properties in
order to ensure accuracy, completeness, and data
integrity.
1. Atomicity
It states that all operations of the transaction take place at once if not, the transaction is aborted.
There is no midway, i.e., the transaction cannot occur partially. Each transaction is treated as
one unit and either run to completion or is not executed at all.
Atomicity involves the following two operations:
Abort: If a transaction aborts then all the changes made are not visible.
Commit: If a transaction commits then all the changes made are visible.
Example: Let's assume that following transaction T consisting of T1 and T2. A consists of Rs
600 and B consists of Rs 300. Transfer Rs 100 from account A to account B.

After completion of the transaction, A consists of Rs 500 and B consists of Rs 400.


If the transaction T fails after the completion of transaction T1 but before completion
of transaction T2, then the amount will be deducted from A but not added to B. This
shows the inconsistent database state.
In order to ensure correctness of database state, the transaction must be executed
in entirety.
2. Consistency − The database must remain in a consistent state after
any transaction. No transaction should have any adverse effect on the
data residing in the database. If the database was in a consistent state
before the execution of a transaction, it must remain consistent after the
execution of the transaction as well.

Referring to the example above,


The total amount before and after the transaction must be
maintained.
For example: The total amount must be maintained before or after the
transaction.

Total before T occurs = 500 + 200 = 700.


Total after T occurs = 400 + 300 = 700.

Therefore, database is consistent. Inconsistency occurs in case T1 completes


but T2 fails. As a result T is incomplete.
3. Durability
The durability property is used to indicate the
performance of the database's consistent state. It states
that the transaction made the permanent changes.
They cannot be lost by the erroneous operation of a faulty
transaction or by the system failure. When a transaction is
completed, then the database reaches a state known as the
consistent state. That consistent state cannot be lost, even
in the event of a system's failure.
The recovery subsystem of the DBMS has the
responsibility of Durability property.
4. Isolation −
• In a database system where more than one
transaction are being executed simultaneously and
in parallel, the property of isolation states that all
the transactions will be carried out and executed as
if it is the only transaction in the system. No
transaction will affect the existence of any other
transaction.
States of Transactions

A transaction in a database can be in one of the following states −


• Active − In this state, the transaction is being
executed. This is the initial state of every transaction.
• For example: Insertion or deletion or updating a
record is done here. But all the records are still not
saved to the database.

• Partially Committed − When a transaction executes


its final operation, it is said to be in a partially
committed state. but the data is still not saved to the
database.
• Failed − A transaction is said to be in a failed state if
any of the checks made by the database recovery
system fails. A failed transaction can no longer
proceed further.
• Aborted − If any of the checks fails and the transaction has
reached a failed state, then the recovery manager rolls back
all its write operations on the database to bring the
database back to its original state where it was prior to the
execution of the transaction. Transactions in this state are
called aborted. The database recovery module can select
one of the two operations after a transaction aborts −
Re-start the transaction
Kill the transaction
If the transaction fails in the middle of the transaction then
before executing the transaction, all the executed
transactions are rolled back to its consistent state.
• Committed − If a transaction executes all its operations
successfully, it is said to be committed. All its effects are
now permanently established on the database system.
Schedule
• This schedule determines the exact order of operations
that are going to be performed on database. A series of
operation from one transaction to another transaction
is known as schedule. It is used to preserve the order
of the operation in each of the individual transaction.
• It is a bundle of transaction executing together.
• When transactions are executing concurrently in an
interleaved fashion, then the order of execution of
operation from the various transaction is known as
schedule.

• Transaction processing system usually allow


multiple transaction to run concurrently.
For example:
• Suppose T1 & T2 be two
transactions that transfer funds
from one account to another.
Transaction T1 transfer Rs50 from
account A to account B. It is
defined in in Fig 16.4
• Transaction T2 Transfers 10
percent of the balance from account
A to account B. It is defined in fig
16.5
Types of Schedules
Types of Schedule
1. Serial Schedule:
• In Serial schedule, a transaction is executed completely before starting the
execution of another transaction.
• In other words, you can say that in serial schedule, a transaction does not
start execution until the currently running transaction finished execution.
• This type of execution of transaction is also known as non-
interleaved execution. The example we have seen above is the serial
schedule.
In this schedule,
• There are two transactions T1 and T2 executing serially one after the
other.
• Transaction T1 executes first.
• After T1 completes its execution, transaction T2 executes.
So, this schedule is an example of a Serial Schedule.
Another Example:
In this schedule,
• There are two transactions T1 and T2
executing serially one after the other.
• Transaction T2 executes first.
• After T2 completes its execution,
transaction T1 executes.
So, this schedule is an example of a
Serial Schedule.
Types of Serializability…

2. Non-Serial Schedules-
• In non-serial schedules,
• Multiple transactions execute concurrently.
• Operations of all the transactions are inter leaved or mixed with each
other.

In this schedule,
• There are two transactions T1
and T2 executing concurrently.
• The operations of T1 and T2 are
interleaved.
• So, this schedule is an example of
a Non-Serial Schedule.
Non-Serial Schedules
Types of Non-Serial Schedules
1. Serializable schedule:
• A serializable schedule always leaves the database in consistent
state.
• A serial schedule is always a serializable schedule because in
serial schedule, a transaction only starts when the other
transaction finished execution. However a non-serial schedule
needs to be checked for Serializability.
• A non-serial schedule of n number of transactions is said to be
serializable schedule, if it is equivalent to the serial schedule of
those n transactions.
• Some non-serial schedules may lead to inconsistency of the
database.
• Serializability is a concept that helps to identify which non-serial
schedules are correct and will maintain the consistency of the
database.
Types of Serializability
There are two types of Serializability.
1) Conflict Serializability
2) View Serializability

• Conflict Serializable:
• A schedule is called conflict serializable if it can be transformed
into a serial schedule by swapping non-conflicting operations.

Two operations are said to be in conflict, if they satisfy all the


following three conditions:
• Both the operations should belong to different transactions.
• Both the operations are working on same data item.
• At least one of the operation is a write operation.
Checking Whether a Schedule is Conflict Serializable Or Not-
Follow the following steps to check whether a given non-serial schedule is
conflict serializable or not-

Step-01: Find and list all the conflicting operations.

Step-02: Start creating a precedence graph by drawing one node for each
transaction.

Step-03:Draw an edge for each conflict pair such that if Xi (V) and Yj (V)
forms a conflict pair then draw an edge from Ti toTj.
This ensures that Ti gets executed beforeTj.

Step-04: Check if there is any cycle formed in the graph. If there is no cycle
found, then the schedule is conflict serializable otherwise not.
EXAMPLE
T2
Types of Serializability……
View Serializable schedule
• If a given schedule is found to be view equivalent to some
serial schedule, then it is called as a view serializable schedule
• Consider two schedules S1 and S2 each consisting of two
transactions T1 and T2.
• Schedules S1 and S2 are called view equivalent if the
following three conditions hold true for them-

• “Initial readers must be same for all the data items”.


• “Write-read sequence must be same.”.
• “Final writers must be same for all the data items”.
Non serializable Schedules
•A non-serial schedule which is not serializable is called as a
non-serializable schedule.
•A non-serializable schedule is not guaranteed to produce the
the same effect as produced by some serial schedule on any
consistent database.

Characteristics-

Non-serializable schedules-
•may or may not be consistent
•may or may not be recoverable
• .
Types of Non serializable
1. Recoverable schedule:
• Schedules in which transactions commit only after all transactions
whose changes they read commit are called recoverable
schedules.
• In other words, if some transaction Tj is reading value updated or
written by some other transaction Ti, then the commit of Tj must
occur after the commit of Ti.
Here,
•The commit operation of the transaction that performs the dirty read is
delayed.
•This ensures that it still has a chance to recover if the uncommitted
transaction fails later.
Types of Non serializable
1. Recoverable schedule…………
Here,
•T2 performs a dirty read operation.
•The commit operation of T2 is delayed till T1 commits or roll
backs.
•T1 commits later.
•T2 is now allowed to commit.
•In case, T1 would have failed, T2 has a chance to recover by
rolling back.
Types of Non serializable……
2. Non-recoverable schedule:
• If in a schedule, A transaction performs a dirty read operation
from an uncommitted transaction
• And commits before the transaction from which it has read the
value then such a schedule is known as an Irrecoverable
Schedule.
Types of Non serializable……
2. Non-recoverable schedule……
Here,
•T2 performs a dirty read operation.
•T2 commits before T1.
•T1 fails later and roll backs.
•The value that T2 read now stands to be incorrect.
•T2 can not recover since it has already committed.
Types of Recoverable schedule
1. Cascading Schedule :
• If in a schedule, failure of one transaction causes several other
dependent transactions to rollback or abort, then such a schedule
is called as a Cascading Schedule or Cascading Rollback or
Cascading Abort.
• It simply leads to the wastage of CPU time.
Here,
• Transaction T2 depends on transaction T1.
• Transaction T3 depends on transaction T2.
• Transaction T4 depends on transaction T3.

In this schedule,
• The failure of transaction T1 causes the transaction T2 to rollback.
• The rollback of transaction T2 causes the transaction T3 to
rollback.
• The rollback of transaction T3 causes the transaction T4 to
rollback.
• Such a rollback is called as a Cascading Rollback.
Types of Recoverable schedule………
2. Cascadeless Schedule-
• If in a schedule, a transaction is not allowed to read a data item until
the last transaction that has written it is committed or aborted, then
such a schedule is called as a Cascadeless Schedule.
• In other words, Cascadeless schedule allows only committed read
operations. Therefore, it avoids cascading roll back and thus saves
CPU time.
Types of Recoverable schedule………
3. Strict Schedule-
If in a schedule, a transaction is neither allowed to read nor write a
data item until the last transaction that has written it is committed or
aborted, then such a schedule is called as a Strict Schedule.
In other words,
•Strict schedule allows only committed read and write operations.
•Clearly, strict schedule implements more restrictions than
cascadeless schedule.
Types of Recoverable schedule………
Remember-
•Strict schedules are stricter than Cascadeless schedules.
•All strict schedules are Cascadeless schedules.
•All cascadeless schedules are not strict schedules.
Concurrency Problems in DBMS-

•When multiple transactions execute concurrently in an


uncontrolled or unrestricted manner, then it might lead to
several problems.
•Such problems are called as concurrency problems.
1. Dirty Read Problem-
This read is called as dirty read because-
•There is always a chance that the uncommitted transaction
might roll back later.
•Thus, uncommitted transaction might make other
transactions read a value that does not even exist.
•This leads to inconsistency of the database.
2. Unrepeatable Read Problem-
his problem occurs when a transaction gets to read
unrepeated i.e. different values of the same variable in
its different read operations even when it has not
updated its value.
3. Lost Update Problem-
This problem occurs when multiple transactions
execute concurrently and updates from one or more
transactions get lost.
4. Phantom Read Problem-
This problem occurs when a transaction reads some
variable from the buffer and when it reads the same
variable later, it finds that the variable does not exist.
Avoiding Concurrency Problems-

•To ensure consistency of the database, it is very important


to prevent the occurrence of above problems.
•Concurrency Control Protocols help to prevent the
occurrence of above problems and maintain the consistency
of the database.
•Such problems are called as concurrency problems.
Concurrency Control
• Concurrency Control in Database Management System is a
procedure of managing simultaneous operations without
conflicting with each other.
• It ensures that Database transactions are performed
concurrently and accurately to produce correct results without
violating data integrity of the respective Database.
• When more than one transactions are running simultaneously
there are chances of a conflict to occur which can leave
database to an inconsistent state.
• To handle these conflicts we need concurrency control in
DBMS, which allows transactions to run simultaneously but
handles them in such a way so that the integrity of data remains
intact.
Concurrency Control
• The technique is used to protect data when multiple users
are accessing same data concurrently (same time) is called
concurrency control
Concurrency control protocols can be broadly divided into
two categories

1)Lock based protocols


A lock is a mechanism to control concurrent access to a data item
Data items can be locked in two modes :

1. exclusive (X) mode. Data item can be both read as well as


written. X-lock is requested using lock-X instruction.

2. shared (S) mode. Data item can only be read. S-lock is


requested using lock-S instruction.
Lock requests are made to the concurrency-control manager by the
programmer. Transaction can proceed only after request is granted.
Lock-Based Protocols (Cont.)

Lock-compatibility matrix

A transaction may be granted a lock on an item if the


requested lock is compatible with locks already held on
the item by other transactions
Any number of transactions can hold shared locks on an
item,
But if any transaction holds an exclusive on the item no other
transaction may hold any lock on the item.
If a lock cannot be granted, the requesting transaction is
made to wait till all incompatible locks held by other
transactions have been released. The lock is then granted.
Lock-Based Protocols (Cont.)
Example of a transaction performing locking:
T2: lock-S(A);
read (A);
unlock(A);
lock-S(B);
read (B);
unlock(B);
display(A+B)
Locking as above is not sufficient to guarantee serializability —
if A and B get updated in-between the read of A and B, the
displayed sum would be wrong.
A locking protocol is a set of rules followed by all transactions
while requesting and releasing locks. Locking protocols restrict
the set of possible schedules.
Concurrency control protocols can be broadly
divided into two categories

1)Lock based protocols…..


• Lock-compatibility Matrix :
• S X S TRUE FALSE X FALSE FALSE
• • A transaction may be granted a lock on an item if the
requested lock is compatible with locks already held on
the item by other transaction • Any number of
transactions can hold shared locks on an item, But if
any transaction holds an exclusive on the item no other
transaction may hold any lock on the item
Lock Based Protocol
• If a lock cannot be granted, the requesting transaction
is made to wait till all incompatible locks held by other
transaction have been released. The lock is then granted.
• Example:
T1: lock-S(A);
// Grant-S(A,
T1) read (A);
unlock(A);
lock-S(B);
// Grant-S(B,T1) read (B);
unlock(B);
display(A+B
Pitfalls of Lock Based Protocol

➢ Locking as above is not sufficient to guarantee


serializability - if A and B get updated in-between the
read of A and B, the displayed sum would be wrong.
➢ A locking protocol is a set of rules followed by all
transactions while requesting and releasing locks.
Locking protocols restrict the set of possible
schedules.
➢ Locking as above is not sufficient to guarantee
serializability - if A and B get updated in-between the
read of A and B, the displayed sum would be wrong.
➢ A locking protocol is a set of rules followed by all
transactions while requesting and releasing locks.
Locking protocols restrict the set of possible
schedules.
Pitfalls of Lock Based Protocol

➢ Starvation is also possible if concurrency


control manager is badly designed.
➢ For example: – A transaction may be waiting for
an X-lock on an item, while a sequence of other
transactions request and are granted
➢ an S-lock on the same item. – The same
transaction is repeatedly rolled back due to
deadlocks.
➢ • Concurrency control manager can be designed
to prevent starvation.
Two Phase Locking Protocol
• This is a protocol which ensures conflict-serializable
schedules.
• Phase 1: Growing Phase – transaction may obtain
locks – transaction may not release locks
• Phase 2: Shrinking Phase – transaction may release
locks – transaction may not obtain locks
• The protocol assures serializability. It can be proved
that the transactions can be serialized in the order of
their lock points (i.e. the point where a transaction
acquired its final lock).
Two Phase Locking Protocol ………
Two-phase locking does not ensure freedom from
deadlocks • Cascading roll-back is possible under two-
phase locking. To avoid this, follow a modified protocol
called strict two-phase locking. Here a transaction must
hold all its exclusive locks till it commits/aborts. •
Rigorous two-phase locking is even stricter: here all
locks are held till commit/abort. In this protocol
transactions can be serialized in the order in which they
commit.
Two Phase Locking Protocol ………
Two-phase locking does not ensure freedom from
deadlocks • Cascading roll-back is possible under two-
phase locking. To avoid this, follow a modified protocol
called strict two-phase locking. Here a transaction must
hold all its exclusive locks till it commits/aborts. •
Rigorous two-phase locking is even stricter: here all
locks are held till commit/abort. In this protocol
transactions can be serialized in the order in which they
commit.
The Two-Phase Locking Protocol

This protocol ensures conflict-serializable schedules.


Phase 1: Growing Phase
Transaction may obtain locks
Transaction may not release locks
Phase 2: Shrinking Phase
Transaction may release locks
Transaction may not obtain locks
The protocol assures serializability. It can be proved that
the transactions can be serialized in the order of their lock
points (i.e., the point where a transaction acquired its
final lock).
The Two-Phase Locking Protocol (Cont.)

There can be conflict serializable schedules that cannot


be obtained if two-phase locking is used.
However, in the absence of extra information (e.g.,
ordering of access to data), two-phase locking is needed
for conflict serializability in the following sense:
Given a transaction Ti that does not follow two-phase locking, we can
find a transaction Tj that uses two-phase locking, and a schedule for Ti
and Tj that is not conflict serializable.
Lock Conversions

Two-phase locking with lock conversions:


– First Phase:
can acquire a lock-S on item
can acquire a lock-X on item
can convert a lock-S to a lock-X (upgrade)
– Second Phase:
can release a lock-S
can release a lock-X
can convert a lock-X to a lock-S (downgrade)
This protocol assures serializability. But still relies on
the programmer to insert the various locking
instructions.
Automatic
Acquisition of
Locks
A transaction Ti issues the standard read/write
instruction, without explicit locking calls.
The operation read(D) is processed as:
if Ti has a lock on D
then
read(D)
else begin
if necessary wait until no other
transaction has a lock-X on D
grant Ti a lock-S on D;
read(D)
end
Automatic
Acquisition of
Locks (Cont.)
write(D) is processed as:
if Ti has a lock-X on D
then
write(D)
else begin
if necessary wait until no other transaction has any lock
on D,
if Ti has a lock-S on D
then
upgrade lock on D to lock-X
else
grant Ti a lock-X on D
write(D)
end;
All locks are released after commit or abort
Deadlocks
Deadlock •A set of processes is deadlocked if each process
in the set is waiting for an event that only another process in
the set can cause. Ti Tj
Consider the partial schedule

Neither T3 nor T4 can make progress — executing lock-S(B)


causes T4 to wait for T3 to release its lock on B, while
executing lock-X(A) causes T3 to wait for T4 to release its
lock on A.
Such a situation is called a deadlock.
To handle a deadlock one of T3 or T4 must be rolled back
and its locks released.
Deadlocks (Cont.)

Two-phase locking does not ensure freedom from


deadlocks.
In addition to deadlocks, there is a possibility of
starvation.
Starvation occurs if the concurrency control manager
is badly designed. For example:
A transaction may be waiting for an X-lock on an item, while a
sequence of other transactions request and are granted an S-lock on
the same item.
The same transaction is repeatedly rolled back due to deadlocks.
Concurrency control manager can be designed to
prevent starvation.
Deadlocks (Cont.)

The potential for deadlock exists in most locking


protocols. Deadlocks are a necessary evil.
When a deadlock occurs there is a possibility of
cascading roll-backs.
Cascading roll-back is possible under two-phase
locking. To avoid this, follow a modified protocol called
strict two-phase locking -- a transaction must hold all
its exclusive locks till it commits/aborts.
Rigorous two-phase locking is even stricter. Here, all
locks are held till commit/abort. In this protocol
transactions can be serialized in the order in which they
commit.
Implementation of
Locking

A lock manager can be implemented as a separate


process to which transactions send lock and unlock
requests
The lock manager replies to a lock request by sending a
lock grant messages (or a message asking the transaction
to roll back, in case of a deadlock)
The requesting transaction waits until its request is
answered
The lock manager maintains a data-structure called a
lock table to record granted locks and pending requests
The lock table is usually implemented as an in-memory
hash table indexed on the name of the data item being
locked
Lock Table

Dark blue rectangles indicate granted locks; light


blue indicate waiting requests
Lock table also records the type of lock granted or
requested
New request is added to the end of the queue of
requests for the data item, and granted if it is
compatible with all earlier locks
Unlock requests result in the request being deleted,
and later requests are checked to see if they can now
be granted
If transaction aborts, all waiting or granted requests
of the transaction are deleted
lock manager may keep a list of locks held by
each transaction, to implement this efficiently
Deadlock Handling

System is deadlocked if there is a set of transactions such


that every transaction in the set is waiting for another
transaction in the set.
Deadlock prevention protocols ensure that the system will
never enter into a deadlock state. Some prevention
strategies :
Require that each transaction locks all its data items before it begins
execution (predeclaration).
Impose partial ordering of all data items and require that a transaction
can lock data items only in the order specified by the partial order.
More Deadlock
Prevention
Strategies
Following schemes use transaction timestamps for the sake of
deadlock prevention alone.
wait-die scheme — non-preemptive
older transaction may wait for younger one to release data item. (older
means smaller timestamp) Younger transactions never Younger transactions
never wait for older ones; they are rolled back instead.
a transaction may die several times before acquiring needed data item
wound-wait scheme — preemptive
older transaction wounds (forces rollback) of younger transaction instead of
waiting for it. Younger transactions may wait for older ones.
may be fewer rollbacks than wait-die scheme.
Deadlock
prevention (Cont.)

Both in wait-die and in wound-wait schemes, a rolled back


transactions is restarted with its original timestamp. Older
transactions thus have precedence over newer ones, and
starvation is hence avoided.
Timeout-Based Schemes:
a transaction waits for a lock only for a specified amount of time. If the
lock has not been granted within that time, the transaction is rolled back
and restarted,
Thus, deadlocks are not possible
simple to implement; but starvation is possible. Also difficult to
determine good value of the timeout interval.
Deadlock Detection

Deadlocks can be described as a wait-for graph, which


consists of a pair G = (V,E),
V is a set of vertices (all the transactions in the system)
E is a set of edges; each element is an ordered pair Ti →Tj.
If Ti → Tj is in E, then there is a directed edge from Ti to Tj,
implying that Ti is waiting for Tj to release a data item.
When Ti requests a data item currently being held by Tj,
then the edge Ti → Tj is inserted in the wait-for graph. This
edge is removed only when Tj is no longer holding a data
item needed by Ti.
The system is in a deadlock state if and only if the wait-for
Deadlock Detection (Cont.)

Wait-for graph without a cycle Wait-for graph with a cycle


Deadlock Recovery

When deadlock is detected :


Some transaction will have to rolled back (made a victim) to break
deadlock. Select that transaction as victim that will incur minimum cost.
Rollback -- determine how far to roll back transaction
Total rollback: Abort the transaction and then restart it.
More effective to roll back transaction only as far as necessary to
break deadlock.
Starvation happens if same transaction is always chosen as victim. Include
the number of rollbacks in the cost factor to avoid starvation
Multiple Granularity

Allow data items to be of various sizes and define a


hierarchy of data granularities, where the small
granularities are nested within larger ones
Can be represented graphically as a tree.
When a transaction locks a node in the tree explicitly, it
implicitly locks all the node's descendents in the same
mode.
Granularity of locking (level in tree where locking is
done):
fine granularity (lower in tree): high concurrency, high locking overhead
coarse granularity (higher in tree): low locking overhead, low
Example of
Granularity
Hierarchy

The levels, starting from the coarsest (top) level are


database
area
file
record
Intention Lock
Modes

In addition to S and X lock modes, there are three


additional lock modes with multiple granularity:
intention-shared (IS): indicates explicit locking at a lower level of the tree
but only with shared locks.
intention-exclusive (IX): indicates explicit locking at a lower level with
exclusive or shared locks
shared and intention-exclusive (SIX): the subtree rooted by that node is
locked explicitly in shared mode and explicit locking is being done at a
lower level with exclusive-mode locks.
intention locks allow a higher level node to be locked in S
or X mode without having to check all descendent nodes.
Compatibility Matrix with Intention Lock
Modes
The compatibility matrix for all lock modes is:
Multiple Granularity
Locking Scheme
Transaction Ti can lock a node Q, using the following rules:
1. The lock compatibility matrix must be observed.
2. The root of the tree must be locked first, and may be locked in any mode.
3. A node Q can be locked by Ti in S or IS mode only if the parent of Q is
currently locked by Ti in either IX or IS mode.
4. A node Q can be locked by Ti in X, SIX, or IX mode only if the parent of Q is
currently locked by Ti in either IX or SIX mode.
5. Ti can lock a node only if it has not previously unlocked any node (that is, Ti is
two-phase).
6. Ti can unlock a node Q only if none of the children of Q are currently locked
by Ti.
Observe that locks are acquired in root-to-leaf order, whereas
they are released in leaf-to-root order.
Lock granularity escalation: in case there are too many locks at
a particular level, switch to higher granularity S or X lock
Timestamp-Based
Protocols

Each transaction is issued a timestamp when it enters the


system. If an old transaction Ti has time-stamp TS(Ti), a new
transaction Tj is assigned time-stamp TS(Tj) such that TS(Ti)
<TS(Tj).
The protocol manages concurrent execution such that the time-
stamps determine the serializability order.
In order to assure such behavior, the protocol maintains for
each data Q two timestamp values:
W-timestamp(Q) is the largest time-stamp of any transaction that executed
write(Q) successfully.
R-timestamp(Q) is the largest time-stamp of any transaction that executed
read(Q) successfully.
Timestamp-Based Protocols (Cont.)

The timestamp ordering protocol ensures that any


conflicting read and write operations are executed in
timestamp order.
Suppose a transaction Ti issues a read(Q)
1. If TS(Ti)  W-timestamp(Q), then Ti needs to read a value of Q that
was already overwritten.
 Hence, the read operation is rejected, and Ti is rolled back.
2. If TS(Ti)  W-timestamp(Q), then the read operation is executed, and
R-timestamp(Q) is set to max(R-timestamp(Q), TS(Ti)).
Timestamp-Based
Protocols (Cont.)

Suppose that transaction Ti issues write(Q).


1. If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was
needed previously, and the system assumed that that value would never
be produced.
 Hence, the write operation is rejected, and Ti is rolled back.
2. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete
value of Q.
 Hence, this write operation is rejected, and Ti is rolled back.
3. Otherwise, the write operation is executed, and W-timestamp(Q) is set
to TS(Ti).
Example Use of the Protocol

A partial schedule for several data items for transactions with


timestamps 1, 2, 3, 4, 5
Correctness of Timestamp-Ordering
Protocol
The timestamp-ordering protocol guarantees serializability
since all the arcs in the precedence graph are of the form:

Thus, there will be no cycles in the precedence graph


Timestamp protocol ensures freedom from deadlock as no
transaction ever waits.
But the schedule may not be cascade-free, and may not
even be recoverable.
Recoverability and
Cascade Freedom
Problem with timestamp-ordering protocol:
Suppose Ti aborts, but Tj has read a data item written by Ti
Then Tj must abort; if Tj had been allowed to commit earlier, the schedule is
not recoverable.
Further, any transaction that has read a data item written by Tj must abort
This can lead to cascading rollback --- that is, a chain of rollbacks
Solution 1:
A transaction is structured such that its writes are all performed at the end of
its processing
All writes of a transaction form an atomic action; no transaction may execute
while a transaction is being written
A transaction that aborts is restarted with a new timestamp
Solution 2: Limited form of locking: wait for data to be
committed before reading it
Solution 3: Use commit dependencies to ensure recoverability
Thomas’ Write Rule

Modified version of the timestamp-ordering protocol in


which obsolete write operations may be ignored under
certain circumstances.
When Ti attempts to write data item Q, if TS(Ti) < W-
timestamp(Q), then Ti is attempting to write an obsolete
value of {Q}.
Rather than rolling back Ti as the timestamp ordering protocol would
have done, this {write} operation can be ignored.
Otherwise this protocol is the same as the timestamp
ordering protocol.
Thomas' Write Rule allows greater potential concurrency.
Validation-Based
Protocol

Execution of transaction Ti is done in three phases.


1. Read and execution phase: Transaction Ti writes only to
temporary local variables
2. Validation phase: Transaction Ti performs a ''validation
test''
to determine if local variables can be written without
violating
serializability.
3. Write phase: If Ti is validated, the updates are applied to
the
database; otherwise, Ti is rolled back.
The three phases of concurrently executing transactions can be
interleaved, but each transaction must go through the three
phases in that order.
Assume for simplicity that the validation and write phase occur together,
atomically and serially
I.e., only one transaction executes validation/write at a time.
Also called as optimistic concurrency control since transaction
executes fully in the hope that all will go well during validation
Validation-Based
Protocol (Cont.)

Each transaction Ti has 3 timestamps


Start(Ti) : the time when Ti started its execution
Validation(Ti): the time when Ti entered its validation phase
Finish(Ti) : the time when Ti finished its write phase
Serializability order is determined by timestamp given at
validation time; this is done to increase concurrency.
Thus, TS(Ti) is given the value of Validation(Ti).
This protocol is useful and gives greater degree of
concurrency if probability of conflicts is low.
because the serializability order is not pre-decided, and
relatively few transactions will have to be rolled back.
Validation Test for
Transaction Tj

If for all Ti with TS (Ti) < TS (Tj) either one of the


following condition holds:
finish(Ti) < start(Tj)
start(Tj) < finish(Ti) < validation(Tj) and the set of data items written by Ti
does not intersect with the set of data items read by Tj.
then validation succeeds and Tj can be committed.
Otherwise, validation fails and Tj is aborted.
Justification: Either the first condition is satisfied, and
there is no overlapped execution, or the second condition is
satisfied and
 the writes of Tj do not affect reads of Ti since they occur after Ti has
finished its reads.
 the writes of Ti do not affect reads of Tj since Tj does not read any
item written by Ti.
Schedule Produced
by Validation

Example of schedule produced using validation


Multiversion
Schemes

Multiversion schemes keep old versions of data item to


increase concurrency.
Multiversion Timestamp Ordering
Multiversion Two-Phase Locking
Each successful write results in the creation of a new
version of the data item written.
Use timestamps to label versions.
When a read(Q) operation is issued, select an appropriate
version of Q based on the timestamp of the transaction, and
return the value of the selected version.
reads never have to wait as an appropriate version is
Multiversion
Timestamp
Ordering

Each data item Q has a sequence of versions <Q1, Q2,....,


Qm>. Each version Qk contains three data fields:
Content -- the value of version Qk.
W-timestamp(Qk) -- timestamp of the transaction that created (wrote)
version Qk
R-timestamp(Qk) -- largest timestamp of a transaction that successfully
read version Qk
When a transaction Ti creates a new version Qk of Q, Qk's
W-timestamp and R-timestamp are initialized to TS(Ti).
R-timestamp of Qk is updated whenever a transaction Tj
reads Qk, and TS(Tj) > R-timestamp(Qk).
Multiversion
Timestamp
Ordering (Cont)
Suppose that transaction Ti issues a read(Q) or write(Q)
operation. Let Qk denote the version of Q whose write
timestamp is the largest write timestamp less than or equal to
TS(Ti).
1. If transaction Ti issues a read(Q), then the value returned is the content
of version Qk.
2. If transaction Ti issues a write(Q)
1. if TS(Ti) < R-timestamp(Qk), then transaction Ti is rolled back.
2. if TS(Ti) = W-timestamp(Qk), the contents of Qk are overwritten
3. else a new version of Q is created.
Observe that
Reads always succeed
A write by Ti is rejected if some other transaction Tj that (in the serialization
order defined by the timestamp values) should read
Ti's write, has already read a version created by a transaction older than Ti.
Protocol guarantees serializability
Multiversion Two-
Phase Locking

Differentiates between read-only transactions and update


transactions
Update transactions acquire read and write locks, and hold all
locks up to the end of the transaction. That is, update
transactions follow rigorous two-phase locking.
Each successful write results in the creation of a new version of the data item
written.
Each version of a data item has a single timestamp whose value is obtained
from a counter ts-counter that is incremented during commit processing.
Read-only transactions are assigned a timestamp by reading the
current value of ts-counter before they start execution; they
follow the multiversion timestamp-ordering protocol for
performing reads.
Multiversion Two-
Phase Locking
(Cont.)
When an update transaction wants to read a data item:
it obtains a shared lock on it, and reads the latest version.
When it wants to write an item
it obtains X lock on; it then creates a new version of the item and sets this
version's timestamp to .
When update transaction Ti completes, commit processing
occurs:
Ti sets timestamp on the versions it has created to ts-counter + 1
Ti increments ts-counter by 1
Read-only transactions that start after Ti increments ts-counter
will see the values updated by Ti.
Read-only transactions that start before Ti increments the
ts-counter will see the value before the updates by Ti.
Only serializable schedules are produced.
MVCC:
Implementation
Issues

Creation of multiple versions increases storage overhead


Extra tuples
Extra space in each tuple for storing version information
Versions can, however, be garbage collected
E.g. if Q has two versions Q5 and Q9, and the oldest active transaction
has timestamp > 9, than Q5 will never be required again
Snapshot Isolation

Motivation: Decision support queries that read large amounts


of data have concurrency conflicts with OLTP transactions that
update a few rows
Poor performance results
Solution 1: Give logical “snapshot” of database state to read
only transactions, read-write transactions use normal locking
Multiversion 2-phase locking
Works well, but how does system know a transaction is read only?
Solution 2: Give snapshot of database state to every
transaction, updates alone use 2-phase locking to guard against
concurrent updates
Problem: variety of anomalies such as lost update can result
Partial solution: snapshot isolation level (next slide)
Proposed by Berenson et al, SIGMOD 1995
Variants implemented in many database systems
E.g. Oracle, PostgreSQL, SQL Server 2005
Snapshot Isolation

A transaction T1 executing with Snapshot Isolation T1 T2 T3


takes snapshot of committed data at start
always reads/modifies data in its own snapshot W(Y := 1)
updates of concurrent transactions are not Commit
visible to T1
writes of T1 complete when it commits Start
First-committer-wins rule: R(X) → 0
Commits only if no other concurrent
transaction has already written data that T1 R(Y)→ 1
intends to write. W(X:=2)
W(Z:=3)
Commit
R(Z) → 0
R(Y) → 1
W(X:=3)
Concurrent updates not visible
Own updates are visible Commit-Req
Not first-committer of X Abort
Serialization error, T2 is rolled back
Snapshot Read
 Concurrent updates invisible to snapshot read
Snapshot Write:
First Committer Wins

Variant: “First-updater-wins”
Check for concurrent updates when write occurs by locking item
But lock should be held till all concurrent transactions have finished
(Oracle uses this plus some extra features)
Differs only in when abort occurs, otherwise equivalent
Benefits of SI

Reading is never blocked,


and also doesn’t block other txns activities
Performance similar to Read Committed
Avoids the usual anomalies
No dirty read
No lost update
No non-repeatable read
Predicate based selects are repeatable (no phantoms)
Problems with SI
SI does not always give serializable executions
Serializable: among two concurrent txns, one sees the effects of the
other
In SI: neither sees the effects of the other
Result: Integrity constraints can be violated
Snapshot Isolation

E.g. of problem with SI


T1: x:=y
T2: y:= x
Initially x = 3 and y = 17
Serial execution: x = ??, y = ??
if both transactions start at the same time, with snapshot isolation:
x = ?? , y = ??
Called skew write
Skew also occurs with inserts
E.g:
Find max order number among all orders
Create a new order with order number = previous max + 1
Snapshot Isolation
Anomalies

SI breaks serializability when txns modify different items, each based on a previous state
of the item the other modified
Not very common in practice
E.g., the TPC-C benchmark runs correctly under SI
when txns conflict due to modifying different data, there is usually also a
shared item they both modify too (like a total quantity) so SI will abort one
of them
But does occur
Application developers should be careful about write skew
SI can also cause a read-only transaction anomaly, where read-only transaction may see
an inconsistent state even if updaters are serializable
We omit details
Using snapshots to verify primary/foreign key integrity can lead to inconsistency
Integrity constraint checking usually done outside of snapshot
SI In Oracle and
PostgreSQL

Warning: SI used when isolation level is set to serializable, by


Oracle, and PostgreSQL versions prior to 9.1
PostgreSQL’s implementation of SI (versions prior to 9.1) described in Section
[Link]
Oracle implements “first updater wins” rule (variant of “first committer wins”)
concurrent writer check is done at time of write, not at commit time
Allows transactions to be rolled back earlier
Oracle and PostgreSQL < 9.1 do not support true serializable execution
PostgreSQL 9.1 introduced new protocol called “Serializable Snapshot Isolation”
(SSI)
Which guarantees true serializabilty including handling predicate reads
(coming up)
SI In Oracle and
PostgreSQL

Can sidestep SI for specific queries by using select .. for update in


Oracle and PostgreSQL
E.g.,
1. select max(orderno) from orders for update
2. read value into local variable maxorder
3. insert into orders (maxorder+1, …)
Select for update (SFU) treats all data read by the query as if it were also updated,
preventing concurrent updates
Does not always ensure serializability since phantom phenomena can occur
(coming up)
In PostgreSQL versions < 9.1, SFU locks the data item, but
releases locks when the transaction completes, even if other
concurrent transactions are active
Not quite same as SFU in Oracle, which keeps locks until all
concurrent transactions have completed
Insert and Delete
Operations

If two-phase locking is used :


A delete operation may be performed only if the transaction deleting
the tuple has an exclusive lock on the tuple to be deleted.
A transaction that inserts a new tuple into the database is given an X-
mode lock on the tuple
Insertions and deletions can lead to the phantom
phenomenon.
A transaction that scans a relation
(e.g., find sum of balances of all accounts in Perryridge)
and a transaction that inserts a tuple in the relation
(e.g., insert a new account at Perryridge)
(conceptually) conflict in spite of not accessing any tuple in
common.
If only tuple locks are used, non-serializable schedules can result
E.g. the scan transaction does not see the new account, but reads
Insert and Delete
Operations (Cont.)

The transaction scanning the relation is reading information that


indicates what tuples the relation contains, while a transaction
inserting a tuple updates the same information.
The conflict should be detected, e.g. by locking the information.
One solution:
Associate a data item with the relation, to represent the information about what
tuples the relation contains.
Transactions scanning the relation acquire a shared lock in the data item,
Transactions inserting or deleting a tuple acquire an exclusive lock on the data
item. (Note: locks on the data item do not conflict with locks on individual tuples.)
Above protocol provides very low concurrency for
insertions/deletions.
Index locking protocols provide higher concurrency while
preventing the phantom phenomenon, by requiring locks
on certain index buckets.
Index Locking
Protocol

Index locking protocol:


Every relation must have at least one index.
A transaction can access tuples only after finding them through one or more
indices on the relation
A transaction Ti that performs a lookup must lock all the index leaf nodes that it
accesses, in S-mode
Even if the leaf node does not contain any tuple satisfying the index lookup
(e.g. for a range query, no tuple in a leaf is in the range)
A transaction Ti that inserts, updates or deletes a tuple ti in a relation r
must update all indices to r
must obtain exclusive locks on all index leaf nodes affected by the
insert/update/delete
The rules of the two-phase locking protocol must be observed
Guarantees that phantom phenomenon won’t occur
Next-Key Locking

Index-locking protocol to prevent phantoms required


locking entire leaf
Can result in poor concurrency if there are many inserts
Alternative: for an index lookup
Lock all values that satisfy index lookup (match lookup value, or fall in
lookup range)
Also lock next key value in index
Lock mode: S for lookups, X for insert/delete/update
Ensures that range queries will conflict with
inserts/deletes/updates
Regardless of which happens first, as long as both are concurrent
Concurrency in
Index Structures

Indices are unlike other database items in that their only job is
to help in accessing data.
Index-structures are typically accessed very often, much more
than other database items.
Treating index-structures like other database items, e.g. by 2-phase locking of
index nodes can lead to low concurrency.
There are several index concurrency protocols where locks on
internal nodes are released early, and not in a two-phase
fashion.
It is acceptable to have nonserializable concurrent access to an index as long as
the accuracy of the index is maintained.
In particular, the exact values read in an internal node of a
B+-tree are irrelevant so long as we land up in the correct leaf node.
Concurrency in
Index Structures
(Cont.)
Example of index concurrency protocol:
Use crabbing instead of two-phase locking on the nodes of the B+-tree, as follows. During
search/insertion/deletion:
First lock the root node in shared mode.
After locking all required children of a node in shared mode, release the lock on the
node.
During insertion/deletion, upgrade leaf node locks to exclusive mode.
When splitting or coalescing requires changes to a parent, lock the parent in exclusive
mode.
Above protocol can cause excessive deadlocks
Searches coming down the tree deadlock with updates going up the tree
Can abort and restart search, without affecting transaction
Better protocols are available; see Section 16.9 for one such protocol, the B-link tree protocol
Intuition: release lock on parent before acquiring lock on child
And deal with changes that may have happened between lock release and acquire
Weak Levels of
Consistency

Degree-two consistency: differs from two-phase locking in that


S-locks may be released at any time, and locks may be acquired
at any time
X-locks must be held till end of transaction
Serializability is not guaranteed, programmer must ensure that no erroneous
database state will occur]
Cursor stability:
For reads, each tuple is locked, read, and lock is immediately released
X-locks are held till end of transaction
Special case of degree-two consistency
Weak Levels of
Consistency in SQL
SQL allows non-serializable executions
Serializable: is the default
Repeatable read: allows only committed records to be read, and repeating a
read should return the same value (so read locks should be retained)
However, the phantom phenomenon need not be prevented
T1 may see some records inserted by T2, but may not see others
inserted by T2
Read committed: same as degree two consistency, but most systems
implement it as cursor-stability
Read uncommitted: allows even uncommitted data to be read
In many database systems, read committed is the default
consistency level
has to be explicitly changed to serializable when required
set isolation level serializable
Transactions across User Interaction

Many applications need transaction support across user interactions


Can’t use locking
Don’t want to reserve database connection per user
Application level concurrency control
Each tuple has a version number
Transaction notes version number when reading tuple
select [Link], [Link] into :A, :version
from r where acctId =23
When writing tuple, check that current version number is same as the version
when tuple was read
update r set [Link] = [Link] + :deposit
where acctId = 23 and [Link] = :version
Equivalent to optimistic concurrency control without validating read set
Used internally in Hibernate ORM system, and manually in many applications
Version numbering can also be used to support first committer wins check of snapshot
isolation
Unlike SI, reads are not guaranteed to be from a single snapshot
End of Module 16
Deadlocks

Consider the following two transactions:


T1: write (X) T2: write(Y)
write(Y) write(X)

Schedule with deadlock


Transaction in DBMS
In a Database Management System (DBMS), a transaction is a sequence of operations performed
as a single logical unit of work. These operations may involve reading, writing, updating, or
deleting data in the database. A transaction is considered complete only if all its operations are
successfully executed, Otherwise the transaction must be rolled back, ensuring the database
remains in a consistent state.

What does a Transaction mean in DBMS?


A transaction refers to a sequence of one or more operations (such as read, write, update, or delete)
performed on the database as a single logical unit of work. A transaction ensures that either all the
operations are successfully executed (committed) or none of them take effect (rolled back).
Transactions are designed to maintain the integrity, consistency and reliability of the database,
even in the case of system failures or concurrent access.

Transaction

All types of database access operation which are held between the beginning and end transaction
statements are considered as a single logical transaction. During the transaction the database is
inconsistent. Only once the database is committed the state is changed from one consistent state to
another.

Example:

Let’s consider an online banking application:

 Transaction: When a user performs a money transfer, several operations occur, such as:
o Reading the account balance of the sender.
o Writing the deducted amount from the sender’s account.
o Writing the added amount to the recipient’s account.
In a transaction, all these steps should either complete successfully or, if any error occurs, the
database should rollback to its previous state, ensuring no partial data is written to the system.

Facts about Database Transactions


 A transaction is a program unit whose execution may or may not change the contents of a
database.

 The transaction is executed as a single unit.

 If the database operations do not update the database but only retrieve data, this type of
transaction is called a read-only transaction.

 A successful transaction can change the database from one CONSISTENT STATE to
another.

 DBMS transactions must be atomic, consistent, isolated and durable.

 If the database were in an inconsistent state before a transaction, it would remain in the
inconsistent state after the transaction.

Operations of Transaction
A user can make different types of requests to access and modify the contents of a database. So,
we have different types of operations relating to a transaction. They are discussed as follows:

1) Read(X)

A read operation is used to read the value of a particular database element X and stores it in a
temporary buffer in the main memory for further actions such as displaying that value.

Example:

For a banking system, when a user checks their balance, a Read operation is performed on their
account balance:

SELECT balance FROM accounts WHERE account_id = 'A123';

This updates the balance of the user's account after withdrawal

2) Write(X)

A write operation is used to write the value to the database from the buffer in the main memory.
For a write operation to be performed, first a read operation is performed to bring its value in
buffer, and then some changes are made to it, e.g. some set of arithmetic operations are performed
on it according to the user's request, then to store the modified value back in the database, a write
operation is performed.

Example:

For the banking system, if a user withdraws money, a Write operation is performed after the
balance is updated:

UPDATE accounts SET balance = balance - 100 WHERE account_id = 'A123';

This updates the balance of the user’s account after withdrawal.

3) Commit

This operation in transactions is used to maintain integrity in the database. Due to some failure of
power, hardware, or software, etc., a transaction might get interrupted before all its operations are
completed. This may cause ambiguity in the database, i.e. it might get inconsistent before and after
the transaction.

Example:

After a successful money transfer in a banking system, a Commit operation finalizes the
transaction:

COMMIT;

Once the transaction is committed, the changes to the database are permanent, and the
transaction is considered successful.

4) Rollback

If an error occurs, the Rollback operation undoes all the changes made by the transaction, reverting
the database to its last consistent state. In simple words, it can be said that a rollback operation
does undo the operations of transactions that were performed before its interruption to achieve a
safe state of the database and avoid any kind of ambiguity or inconsistency.

Example:

Suppose during the money transfer process, the system encounters an issue, like insufficient funds
in the sender’s account. In that case, the transaction is rolled back:

ROLLBACK;

This will undo all the operations performed so far and ensure that the database remains
consistent.
Transaction Schedules
When multiple transaction requests are made at the same time, we need to decide their order of
execution. Thus, a transaction schedule can be defined as a chronological order of execution of
multiple transactions.

There are broadly two types of transaction schedules discussed as follows:

i) Serial Schedule

In this kind of schedule, when multiple transactions are to be executed, they are executed serially,
i.e. at one time only one transaction is executed while others wait for the execution of the current
transaction to be completed. This ensures consistency in the database as transactions do not execute
simultaneously.

But, it increases the waiting time of the transactions in the queue, which in turn lowers the
throughput of the system, i.e. number of transactions executed per time. To improve the throughput
of the system, another kind of schedule are used which has some more strict rules which help the
database to remain consistent even when transactions execute simultaneously.

Example:

In a serial schedule, the first transaction completes before the second transaction starts:

1. Transaction 1: Read balance → Update balance → Commit

2. Transaction 2: Read balance → Update balance → Commit

ii) Non-Serial Schedule

To reduce the waiting time of transactions in the waiting queue and improve the system efficiency,
we use non-serial schedules which allow multiple transactions to start before a transaction is
completely executed. This may sometimes result in inconsistency and errors in database operation.

So, these errors are handled with specific algorithms to maintain the consistency of the database
and improve CPU throughput as well. Non-serial schedules are also sometimes referred to as
parallel schedules, as transactions execute in parallel in these kinds of schedules.

Example:

Transaction 1 and Transaction 2 can be executed in parallel:

 Transaction 1: Read balance → Update balance

 Transaction 2: Read balance → Update balance


Without proper isolation mechanisms, this may cause inconsistencies.

Serializable

Serializability in DBMS is the property of a non-serial schedule that determines whether it would
maintain the database consistency or not. The non-serial schedule which ensures that the database
would be consistent after the transactions are executed in the order determined by that schedule is
said to be Serializable Schedules.

The serial schedules always maintain database consistency as a transaction starts only when the
execution of the other transaction has been completed under it. Thus, serial schedules are always
serializable. A transaction is a series of operations, so various states occur in its completion
journey.

Properties of Transaction
 As transactions deal with accessing and modifying the contents of the database, they must
have some basic properties which help maintain the consistency and integrity of the
database before and after the transaction. Transactions follow 4 properties, namely,
Atomicity, Consistency, Isolation, and Durability.

 Generally, these are referred to as ACID properties of transactions in DBMS. ACID is


the acronym used for transaction properties. A brief description of each property of the
transaction is as follows.

ACID Properties

1) Atomicity

This property ensures that either all operations of a transaction are executed or it is aborted. In any
case, a transaction can never be completed partially. Each transaction is treated as a single unit
(like an atom).

Atomicity is achieved through commit and rollback operations, i.e. changes are made to the
database only if all operations related to a transaction are completed, and if it gets interrupted, any
changes made are rolled back using rollback operation to bring the database to its last saved state.
Example:

If a user is transferring money from one account to another, both the debit and credit operations
must be successful, or none should happen. If any step fails, the transaction will be rolled back
entirely.

2) Consistency

 This property of a transaction keeps the database consistent before and after a transaction
is completed.

 Execution of any transaction must ensure that after its execution, the database is either in
its prior stable state or a new stable state.

 In other words, the result of a transaction should be the transformation of a database from
one consistent state to another consistent state.

 Consistency, here means, that the changes made in the database are a result of logical
operations only which the user desired to perform and there is not any ambiguity.

Example:

If an account balance before the transaction is 1000, after a successful transaction of 200, the new
balance should be 800, ensuring the database stays in a consistent state.

3) Isolation

This property states that two transactions must not interfere with each other, i.e. if some data is
used by a transaction for its execution, then any other transaction can not concurrently access that
data until the first transaction has completed. It ensures that the integrity of the database is
maintained and we don't get any ambiguous values. Thus, any two transactions are isolated from
each other.

Example:

 Transaction 1: Withdraw $100 from account A.

 Transaction 2: Withdraw $200 from account A.

Both transactions should execute as if they are isolated from each other, and their changes should
not conflict.

iv) Durability

 This property ensures that the changes made to the database after a transaction is
completely executed, are durable.
 It indicates that permanent changes are made by the successful execution of a transaction.

 In the event of any system failures or crashes, the consistent state achieved after the
completion of a transaction remains intact. The recovery subsystem of DBMS is
responsible for enforcing this property.

Example:

After transferring money between two accounts, if the system crashes, the changes made by the
transaction should still be saved when the system restarts.
Transaction States in DBMS

Transaction in DBMS is a set of logically related operations executed as a single unit. These
logic are followed to perform modification on data while maintaining integrity and
consistency. Transactions are performed in a way that concurrent actions from di erent
users don’t malfunction the database. Transfer of money from one account to another in a
bank management system is the best example of Transaction.

A transaction goes through several states during its lifetime. These states indicate the
current status of the transaction and guide how it will proceed. They determine whether the
transaction will be successfully completed (committed) or stopped (aborted). These states
also use a transaction log to keep track of the process.

What is a Transaction State?

A transaction is a set of operations or tasks performed to complete a logical process, which


may or may not change the data in a database. To handle di erent situations, like system
failures, a transaction is divided into di erent states.

A transaction state refers to the current phase or condition of a transaction during its
execution in a database. It represents the progress of the transaction and determines
whether it will successfully complete (commit) or fail (abort).

A transaction involves two main operations:

1. Read Operation: Reads data from the database, stores it temporarily in memory
(bu er), and uses it as needed.

2. Write Operation: Updates the database with the changed data using the bu er.

From the start of executing instructions to the end, these operations are treated as a single
transaction. This ensures the database remains consistent and reliable throughout the
process.

Di erent Types of Transaction States in DBMS

Transaction States

These are di erent types of Transaction States :

1. Active State – This is the first stage of a transaction, when the transaction’s instructions
are being executed.
 It is the first stage of any transaction when it has begun to execute. The execution of
the transaction takes place in this state.

 Operations such as insertion, deletion, or updation are performed during this state.

 During this state, the data records are under manipulation and they are not saved to
the database, rather they remain somewhere in a bu er in the main memory.

2. Partially Committed –

 The transaction has finished its final operation, but the changes are still not saved to
the database.

 After completing all read and write operations, the modifications are initially stored
in main memory or a local bu er. If the changes are made permanent on the
DataBase then the state will change to “committed state” and in case of failure it
will go to the “failed state”.

3. Failed State –If any of the transaction-related operations cause an error during the
active or partially committed state, further execution of the transaction is stopped and it is
brought into a failed state. Here, the database recovery system makes sure that the
database is in a consistent state.

5. Aborted State- If a transaction reaches the failed state due to a failed check, the
database recovery system will attempt to restore it to a consistent state. If recovery is not
possible, the transaction is either rolled back or cancelled to ensure the database remains
consistent.

In the aborted state, the DBMS recovery system performs one of two actions:

 Kill the transaction: The system terminates the transaction to prevent it from
a ecting other operations.

 Restart the transaction: After making necessary adjustments, the system reverts
the transaction to an active state and attempts to continue its execution.

6. Commuted- This state of transaction is achieved when all the transaction-related


operations have been executed successfully along with the Commit operation, i.e. data is
saved into the database after the required manipulations in this state. This marks the
successful completion of a transaction.

7. Terminated State – If there isn’t any roll-back or the transaction comes from the
“committed state”, then the system is consistent and ready for new transaction and the old
transaction is terminated.

Example of Transaction States

Imagine a bank transaction where a user wants to transfer $500 from Account A to
Account B.

Transaction States:
1. Active State:
The transaction begins. It reads the balance of Account A and checks if it has
enough funds.

o Example: Read balance of Account A = $1000.

2. Partially Committed State:


The transaction performs all its operations but hasn’t yet saved (committed) the
changes to the database.

o Example: Deduct $500 from Account A’s balance ($1000 – $500 = $500) and
temporarily update Account B’s balance (add $500).

3. Committed State:
The transaction successfully completes, and the changes are saved permanently in
the database.

o Example: Account A’s new balance = $500; Account B’s new balance =
$1500. Changes are written to the database.

4. Failed State:
If something goes wrong during the transaction (e.g., power failure, system crash),
the transaction moves to this state.

o Example: System crashes after deducting $500 from Account A but before
adding it to Account B.

5. Aborted State:
The failed transaction is rolled back, and the database is restored to its original
state.

o Example: Account A’s balance is restored to $1000, and no changes are


made to Account B.

These states ensure that either the transaction completes successfully (committed) or the
database is restored to its original state (aborted), maintaining consistency and preventing
errors.
Serializability in DBMS

In this article, we are going to explain the serializability concept and how this concept affects the
DBMS deeply, we also understand the concept of serializability with some examples, and we
will finally conclude this topic with an example of the importance of serializability. The DBMS
form is the foundation of the most modern applications, and when we design the form properly,
it provides high-performance and relative storage solutions to our application.

What is a serializable schedule, and what is it used for?


If a non-serial schedule can be transformed into its corresponding serial schedule, it is said to be
serializable. Simply said, a non-serial schedule is referred to as a serializable schedule if it yields
the same results as a serial timetable.

Non-serial Schedule

A schedule where the transactions are overlapping or switching places. As they are used to carry
out actual database operations, multiple transactions are running at once. It's possible that these
transactions are focusing on the same data set. Therefore, it is crucial that non-serial schedules
can be serialized in order for our database to be consistent both before and after the transactions
are executed.

Example:

Transaction-1 Transaction-2

R(a)

W(a)

R(b)

W(b)

R(b)

R(a)

W(b)

W(a)

We can observe that Transaction-2 begins its execution before Transaction-1 is finished, and
they are both working on the same data, i.e., "a" and "b", interchangeably. Where "R"-Read,
"W"-Write
Serializability testing

We can utilize the Serialization Graph or Precedence Graph to examine a schedule's


serializability. A schedule's full transactions are organized into a Directed Graph, what a
serialization graph is.

Precedence Graph

It can be described as a Graph G(V, E) with vertices V = "V1, V2, V3,..., Vn" and directed edges
E = "E1, E2, E3,..., En". One of the two operations—READ or WRITE—performed by a certain
transaction is contained in the collection of edges. Where Ti -> Tj, means Transaction-Ti is
either performing read or write before the transaction-Tj.

Types of Serializability

There are two ways to check whether any non-serial schedule is serializable.

Types of Serializability - Conflict & View

1. Conflict serializability
Conflict serializability refers to a subset of serializability that focuses on maintaining the
consistency of a database while ensuring that identical data items are executed in an order. In a
DBMS each transaction has a value and all the transactions, in the database rely on this
uniqueness. This uniqueness ensures that no two operations with the conflict value can occur
simultaneously.

For example lets consider an order table and a customer table as two instances. Each order is
associated with one customer even though a single client may place orders. However there are
restrictions for achieving conflict serializability in the database. Here are a few of them.

1. Different transactions should be used for the two procedures.

2. The identical data item should be present in both transactions.

3. Between the two operations, there should be at least one write operation.

Example
Three transactions—t1, t2, and t3—are active on a schedule "S" at once. Let's create a graph of
precedence.

Transaction - 1 (t1) Transaction - 2 (t2) Transaction - 3 (t3)

R(a)

R(b)

R(b)

W(b)

W(a)

W(a)

R(a)

W(a)

It is a conflict serializable schedule as well as a serial schedule because the graph (a DAG) has
no loops. We can also determine the order of transactions because it is a serial schedule.
DAG of transactions

As there is no incoming edge on Transaction 1, Transaction 1 will be executed first. T3 will run
second because it only depends on T1. Due to its dependence on both T1 and T3, t2 will finally
be executed.

Therefore, the serial schedule's equivalent order is: t1 --> t3 --> t2

Note: A schedule is unquestionably consistent if it is conflicting serializable. A non-conflicting


serializable schedule, on the other hand, might or might not be serial. We employ the idea of
View Serializability to further examine its serial behavior.

2. View Serializability

View serializability is a kind of operation in a serializable in which each transaction should


provide some results, and these outcomes are the output of properly sequentially executing the
data item. The view serializability, in contrast to conflict serialized, is concerned with avoiding
database inconsistency. The view serializability feature of DBMS enables users to see databases
in contradictory ways.
To further understand view serializability in DBMS, we need to understand the schedules S1 and
S2. The two transactions T1 and T2 should be used to establish these two schedules. Each
schedule must follow the three transactions in order to retain the equivalent of the transaction.
These three circumstances are listed below.

1. The first prerequisite is that the same kind of transaction appears on every schedule. This
requirement means that the same kind of group of transactions cannot appear on both
schedules S1 and S2. The schedules are not equal to one another if one schedule commits
a transaction but it does not match the transaction of the other schedule.

2. The second requirement is that different read or write operations should not be used in
either schedule. On the other hand, we say that two schedules are not similar if schedule
S1 has two write operations whereas schedule S2 only has one. The number of the write
operation must be the same in both schedules, however there is no issue if the number of
the read operation is different.

3. The second to last requirement is that there should not be a conflict between either
timetable. execution order for a single data item. Assume, for instance, that schedule S1's
transaction is T1, and schedule S2's transaction is T2. The data item A is written by both
the transaction T1 and the transaction T2. The schedules are not equal in this instance.
However, we referred to the schedule as equivalent to one another if it had the same
number of all write operations in the data item.

What is view equivalency?


Schedules (S1 and S2) must satisfy these two requirements in order to be viewed as equivalent:

1. The same piece of data must be read for the first time. For instance, if transaction t1 is
reading "A" from the database in schedule S1, then t1 must also read A in schedule S2.

2. The same piece of data must be used for the final write. As an illustration, if transaction
t1 updated A last in S1, it should also conduct final write in S2.
3. The middle sequence need to follow suit. As an illustration, if in S1 t1 is reading A, and
t2 updates A, then in S2 t1 should read A, and t2 should update A.
View Serializability refers to the process of determining whether a schedule's views are
equivalent.

Example
We have a schedule "S" with two concurrently running transactions, "t1" and "t2."

Schedule - S:

Transaction-1 (t1) Transaction-2 (t2)

R(a)

W(a)

R(a)
Transaction-1 (t1) Transaction-2 (t2)

W(a)

R(b)

W(b)

R(b)

W(b)

By switching between both transactions' mid-read-write operations, let's create its view
equivalent schedule (S').

Schedule - S':

Transaction-1 (t1) Transaction-2 (t2)

R(a)

W(a)

R(b)

W(b)

R(a)

W(a)

R(b)

W(b)

It is a view serializable schedule since a view similar schedule is conceivable.


Note: A conflict serializable schedule is always viewed as serializable, but vice versa is not
always true.
Advantages of Serializability

1. Execution is predictable: In serializable, the DBMS's threads are all performed


simultaneously. The DBMS doesn't include any such surprises. In DBMS, no data loss or
corruption occurs and all variables are updated as intended.
2. DBMS executes each thread independently, making it much simpler to understand and
troubleshoot each database thread. This can greatly simplify the debugging process. The
concurrent process is therefore not a concern for us.

3. Lower Costs: The cost of the hardware required for the efficient operation of the database
can be decreased with the aid of the serializable property. It may also lower the price of
developing the software.

4. Increased Performance: Since serializable executions provide developers the opportunity


to optimize their code for performance, they occasionally outperform non-serializable
equivalents.

For a DBMS transaction to be regarded as serializable, it must adhere to the ACID properties. In
DBMS, serializability comes in a variety of forms, each having advantages and disadvantages of
its own. Most of the time, choosing the best sort of serializability involves making a choice
between performance and correctness.
ACID Properties in DBMS

In the world of Database Management Systems (DBMS), transactions are fundamental


operations that allow us to modify and retrieve data. However, to ensure the integrity of a
database, it is important that these transactions are executed in a way that maintains
consistency, correctness, and reliability. This is where the ACID properties come into play.

ACID stands for Atomicity, Consistency, Isolation, and Durability. These four key
properties define how a transaction should be processed in a reliable and predictable
manner, ensuring that the database remains consistent, even in cases of failures or
concurrent accesses.

What Are Transactions in DBMS?

A transaction in DBMS refers to a sequence of operations performed as a single unit of


work. These operations may involve reading or writing data to the database. To maintain
data integrity, DBMS ensures that each transaction adheres to the ACID properties. Think
of a transaction like an ATM withdrawal. When we withdraw money from our account, the
transaction involves several steps:

 Checking your balance.

 Deducting the money from your account.

 Adding the money to the bank’s record.

For the transaction to be successful, all steps must be completed. If any part of this
process fails (e.g., if there’s a system crash), the entire transaction should fail, and no data
should be altered. This ensures the database remains in a consistent state.

The Four ACID Properties

1. Atomicity: “All or Nothing”


Atomicity ensures that a transaction is atomic, it means that either the entire transaction
completes fully or doesn’t execute at all. There is no in-between state i.e. transactions do
not occur partially. If a transaction has multiple operations, and one of them fails, the
whole transaction is rolled back, leaving the database unchanged. This avoids partial
updates that can lead to inconsistency.

 Commit: If the transaction is successful, the changes are permanently applied.

 Abort/Rollback: If the transaction fails, any changes made during the transaction
are discarded.

Example: Consider the following transaction T consisting of T1 and T2 : Transfer of $100


from account X to account Y .

Example

If the transaction fails after completion of T1 but before completion of T2 , the database
would be left in an inconsistent state. With Atomicity, if any part of the transaction fails, the
entire process is rolled back to its original state, and no partial changes are made.

2. Consistency: Maintaining Valid Data States

Consistency ensures that a database remains in a valid state before and after a
transaction. It guarantees that any transaction will take the database from one consistent
state to another, maintaining the rules and constraints defined for the data. In simple
terms, a transaction should only take the database from one valid state to another. If a
transaction violates any database rules or constraints, it should be rejected, ensuring that
only consistent data exists after the transaction.

Example: Suppose the sum of all balances in a bank system should always be constant.
Before a transfer, the total balance is $700. After the transaction, the total balance should
remain $700. If the transaction fails in the middle (like updating one account but not the
other), the system should maintain its consistency by rolling back the transaction
Total before T occurs = 500 + 200 = 700 .
Total after T occurs = 400 + 300 = 700 .

3. Isolation: Ensuring Concurrent Transactions Don’t Interfere

This property ensures that multiple transactions can occur concurrently without leading
to the inconsistency of the database state. Transactions occur independently without
interference. Changes occurring in a particular transaction will not be visible to any other
transaction until that particular change in that transaction is written to memory or has been
committed.

This property ensures that when multiple transactions run at the same time, the result will
be the same as if they were run one after another in a specific order. This property prevents
issues such as dirty reads (reading uncommitted data), non-repeatable reads (data
changing between two reads in a transaction), and phantom reads (new rows appearing in
a result set after the transaction starts).

Example: Let’s consider two transactions:Consider two transactions T and T”.

 X = 500, Y = 500

Transaction T:

 T wants to transfer $50 from X to Y.


 T reads Y (value: 500), deducts $50 from X (new X = 450), and adds $50 to Y (new Y =
550).

Transaction T”:

 T” starts and reads X (value: 500) and Y (value: 500), then calculates the sum: 500 +
500 = 1000.

But, by the time T finishes, X and Y have changed to 450 and 550 respectively, so the
correct sum should be 450 + 550 = 1000. Isolation ensures that T” should not see the old
values of X and Y while T is still in progress. Both transactions should be independent, and
T” should only see the final state after T commits. This prevents inconsistent data like the
incorrect sum calculated by T”

4. Durability: Persisting Changes

This property ensures that once the transaction has completed execution, the updates and
modifications to the database are stored in and written to disk and they persist even if a
system failure occurs. These updates now become permanent and are stored in non-
volatile memory. In the event of a failure, the DBMS can recover the database to the state
it was in after the last committed transaction, ensuring that no data is lost.

Example: After successfully transferring money from Account A to Account B, the changes
are stored on disk. Even if there is a crash immediately after the commit, the transfer
details will still be intact when the system recovers, ensuring durability.

How ACID Properties Impact DBMS Design and Operation

The ACID properties, in totality, provide a mechanism to ensure the correctness and
consistency of a database in a way such that each transaction is a group of operations that
acts as a single unit, produces consistent results, acts in isolation from other operations,
and updates that it makes are durably stored.

1. Data Integrity and Consistency

ACID properties safeguard the data integrity of a DBMS by ensuring that transactions
either complete successfully or leave no trace if interrupted. They prevent partial updates
from corrupting the data and ensure that the database transitions only between valid
states.

2. Concurrency Control

ACID properties provide a solid framework for managing concurrent transactions.


Isolation ensures that transactions do not interfere with each other, preventing data
anomalies such as lost updates, temporary inconsistency, and uncommitted data.

3. Recovery and Fault Tolerance


Durability ensures that even if a system crashes, the database can recover to a consistent
state. Thanks to the Atomicity and Durability properties, if a transaction fails midway, the
database remains in a consistent state.

Property Responsibility for maintaining properties

Atomicity Transaction Manager

Consistency Application programmer

Isolation Concurrency Control Manager

Durability Recovery

Advantages of ACID Properties in DBMS

1. Data Consistency: ACID properties ensure that the data remains consistent and
accurate after any transaction execution.

2. Data Integrity: It maintains the integrity of the data by ensuring that any changes to
the database are permanent and cannot be lost.

3. Concurrency Control: ACID properties help to manage multiple transactions


occurring concurrently by preventing interference between them.

4. Recovery: ACID properties ensure that in case of any failure or crash, the system
can recover the data up to the point of failure or crash.

Disadvantages of ACID Properties in DBMS

1. Performance Overhead: ACID properties can introduce performance costs,


especially when enforcing isolation between transactions or ensuring atomicity.

2. Complexity: Maintaining ACID properties in distributed systems (like microservices


or cloud environments) can be complex and may require sophisticated solutions
like distributed locking or transaction coordination.

3. Scalability Issues: ACID properties can pose scalability challenges, particularly in


systems with high transaction volumes, where traditional relational databases may
struggle under load.

ACID in the Real World: Where Is It Used?

In modern applications, ensuring the reliability and consistency of data is crucial. ACID
properties are fundamental in sectors like:
 Banking: Transactions involving money transfers, deposits, or withdrawals must
maintain strict consistency and durability to prevent errors and fraud.

 E-commerce: Ensuring that inventory counts, orders, and customer details are
handled correctly and consistently, even during high tra ic, requires ACID
compliance.

 Healthcare: Patient records, test results, and prescriptions must adhere to strict
consistency, integrity, and security standards.

You might also like