0% found this document useful (0 votes)
2 views19 pages

Understanding Transaction Processing in DBMS

This document provides an overview of transaction processing in databases, detailing the definition of transactions, their operations, states, and properties, particularly the ACID properties which ensure data integrity. It also discusses scheduling types, concurrency control techniques, and various locking protocols to manage simultaneous transactions without conflicts. Additionally, it explains the importance of serializability and conflict serializability in maintaining consistency during concurrent transaction execution.

Uploaded by

pghmsh48d8
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
2 views19 pages

Understanding Transaction Processing in DBMS

This document provides an overview of transaction processing in databases, detailing the definition of transactions, their operations, states, and properties, particularly the ACID properties which ensure data integrity. It also discusses scheduling types, concurrency control techniques, and various locking protocols to manage simultaneous transactions without conflicts. Additionally, it explains the importance of serializability and conflict serializability in maintaining consistency during concurrent transaction execution.

Uploaded by

pghmsh48d8
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

UNIT 3

TRANSACTION PROCESSING

Transactions
A transaction is a program including a collection of database operations, executed as a logical unit of
data processing. The operations performed in a transaction include one or more of database operations
like insert, delete, update or retrieve data. It is an atomic process that is either performed into
completion entirely or is not performed at all. A transaction involving only data retrieval without any
data update is called read-only transaction.

Each high level operation can be divided into a number of low level tasks or operations. For example, a
data update operation can be divided into three tasks −

read_item() − reads data item from storage to main memory.

modify_item() − change value of item in the main memory.

write_item() − write the modified value from main memory to storage.

Transaction Operations

The low level operations performed in a transaction are −

 begin_transaction − A marker that specifies start of transaction execution.

 read_item or write_item − Database operations that may be interleaved with main memory
operations as a part of transaction.

 end_transaction − A marker that specifies end of transaction.

 commit − A signal to specify that the transaction has been successfully completed in its entirety
and will not be undone.

 rollback − A signal to specify that the transaction has been unsuccessful and so all temporary
changes in the database are undone. A committed transaction cannot be rolled back.

Transaction States

A transaction may go through a subset of five states, active, partially committed, committed, failed
and aborted.
Active − The initial state where the transaction enters is the active state. The transaction remains in
this state while it is executing read, write or other operations.

Partially Committed − The transaction enters this state after the last statement of the transaction
has been executed.

Committed − The transaction enters this state after successful completion of the transaction and
system checks have issued commit signal.

Failed − The transaction goes from partially committed state or active state to failed state when it is
discovered that normal execution can no longer proceed or system checks fail.

Aborted − This is the state after the transaction has been rolled back after failure and the database
has been restored to its state that was before the transaction began.

Transaction property
The transaction has the four properties. These are used to maintain consistency in a database,
before and after the transaction.

Property of Transaction

Atomicity

Consistency
Isolation

Durability

ACID Properties

A transaction is a very small unit of a program and it may contain several lowlevel tasks. A transaction in
a database system must maintain Atomicity, Consistency, Isolation, and Durability − commonly known
as ACID properties − in order to ensure accuracy, completeness, and data integrity.

Atomicity

All changes to data are performed as if they are a single operation. That is, all the changes are
performed, or none of them are.

For example, in an application that transfers funds from one account to another, the atomicity property
ensures that, if a debit is made successfully from one account, the corresponding credit is made to the
other account.

Consistency

Data is in a consistent state when a transaction starts and when it ends.

For example, in an application that transfers funds from one account to another, the consistency
property ensures that the total value of funds in both the accounts is the same at the start and end of
each transaction.

Isolation

The intermediate state of a transaction is invisible to other transactions. As a result, transactions that run
concurrently appear to be serialized.

For example, in an application that transfers funds from one account to another, the isolation property
ensures that another transaction sees the transferred funds in one account or the other, but not in both,
nor in neither.

Durability

After a transaction successfully completes, changes to data persist and are not undone, even in the
event of a system failure.

For example, in an application that transfers funds from one account to another, the durability property
ensures that the changes made to each account will not be reversed.

Schedule
A series of operation from one transaction to another transaction is known as schedule. It is used to
preserve the order of the operation in each of the individual transaction.
Types of Schedules
There are two types of schedules −

Serial Schedules − In a serial schedule, at any point of time, only one transaction is active, i.e. there
is no overlapping of transactions. This is depicted in the following graph −

Parallel Schedules − In parallel schedules, more than one transactions are active simultaneously, i.e.
the transactions contain operations that overlap at time. This is depicted in the following graph −

Non-serial Schedule

if interleaving of operations is allowed, then there will be non-serial schedule.

It contains many possible orders in which the system can execute the individual operations of the
transactions.
Types of Schedules based Recoverability in DBMS

1. Recoverable Schedule –

A schedule is said to be recoverable if it is recoverable as name suggest. Only reads are allowed before
write operation on same data. Only reads (Ti->Tj) is permissible.

Example –

S1: R1(x), W1(x), R2(x), R1(y), R2(y),

W2(x), W1(y), C1, C2;

Given schedule follows order of Ti->Tj => C1->C2. Transaction T1 is executed before T2 hence there is no
chances of conflict occur. R1(x) appears before W1(x) and transaction T1 is committed before T2 i.e.
completion of first transaction performed first update on data item x, hence given schedule is
recoverable.

Lets see example of unrecoverable schedule to clear the concept more:

S2: R1(x), R2(x), R1(z), R3(x), R3(y), W1(x),

W3(y), R2(y), W2(z), W2(y), C1, C2, C3;

Ti->Tj => C2->C3 but W3(y) executed before W2(y) which leads to conflicts thus it must be committed
before T2 transaction. So given schedule is unrecoverable. if Ti->Tj => C3->C2 is given in schedule then it
will become recoverable schedule.

Note: A committed transaction should never be rollback. It means that reading value from uncommitted
transaction and commit it will enter the current transaction into inconsistent or unrecoverable state this
is called Dirty Read problem.

2. Cascadeless Schedule –

When no read or write-write occurs before execution of transaction then corresponding schedule is
called cascadeless schedule.

Example –

S3: R1(x), R2(z), R3(x), R1(z), R2(y), R3(y), W1(x), C1,

W2(z), W3(y), W2(y), C3, C2;

In this schedule W3(y) and W2(y) overwrite conflicts and there is no read, therefore given schedule is
cascadeless schedule.

3. Strict Schedule –

if schedule contains no read or write before commit then it is known as strict schedule. Strict schedule is
strict in nature.

Example –

S4: R1(x), R2(x), R1(z), R3(x), R3(y),


W1(x), C1, W3(y), C3, R2(y), W2(z), W2(y), C2;

In this schedule no read-write or write-write conflict arises before commit hence its strict schedule.

4. Cascading Abort –

Cascading Abort can also be rollback. If transaction T1 abort as T2 read data that written by T1 which is
not committed. Hence it’s cascading rollback.

Example:
Serializable schedule
The serializability of schedules is used to find non-serial schedules that allow the transaction to execute
concurrently without interfering with one another.

It identifies which schedules are correct when executions of the transaction have interleaving of their
operations.

A non-serial schedule will be serializable if its result is equal to the result of its transactions executed
serially.

Testing of Serializability
Serialization Graph is used to test the Serializability of a schedule.

Assume a schedule S. For S, we construct a graph known as precedence graph. This graph has a pair G =
(V, E), where V consists a set of vertices, and E consists a set of edges. The set of vertices is used to
contain all the transactions participating in the schedule.

The set of edges is used to contain all edges Ti ->Tj for which one of the three conditions holds:
 Create a node Ti → Tj if Ti executes write (Q) before Tj executes read (Q).

 Create a node Ti → Tj if Ti executes read (Q) before Tj executes write (Q).

 Create a node Ti → Tj if Ti executes write (Q) before Tj executes write (Q).

If a precedence graph contains a single edge Ti → Tj, then all the instructions of Ti are executed before
the first instruction of Tj is executed.

If a precedence graph for schedule S contains a cycle, then S is non-serializable. If the precedence graph
has no cycle, then S is known as serializable.

Explanation:

Read(A): In T1, no subsequent writes to A, so no new edges

Read(B): In T2, no subsequent writes to B, so no new edges

Read(C): In T3, no subsequent writes to C, so no new edges


Write(B): B is subsequently read by T3, so add edge T2 → T3

Write(C): C is subsequently read by T1, so add edge T3 → T1

Write(A): A is subsequently read by T2, so add edge T1 → T2

Write(A): In T2, no subsequent reads to A, so no new edges

Write(C): In T1, no subsequent reads to C, so no new edges

Write(B): In T3, no subsequent reads to B, so no new edges

The precedence graph for schedule S1 contains a cycle that's why Schedule S1 is non-serializable.

Conflict Serializable Schedule


A schedule is called conflict serializability if after swapping of non-conflicting operations, it can
transform into a serial schedule.

The schedule will be a conflict serializable if it is conflict equivalent to a serial schedule.

Conflicting Operations
The two operations become conflicting if all conditions satisfy:

1. Both belong to separate transactions.

2. They have the same data item.

3. They contain at least one write operation.

Example:

Swapping is possible only if S1 and S2 are logically equal.


Here, S1 = S2. That means it is non-conflict.

Here, S1 ≠ S2. That means it is conflict.

Conflict Equivalent

In the conflict equivalent, one can be transformed to another by swapping non-conflicting operations. In
the given example, S2 is conflict equivalent to S1 (S1 can be converted to S2 by swapping non-conflicting
operations).

Two schedules are said to be conflict equivalent if and only if:

They contain the same set of the transaction.

If each pair of conflict operations are ordered in the same way.


Example:

Schedule S2 is a serial schedule because, in this, all operations of T1 are performed before starting any
operation of T2. Schedule S1 can be transformed into a serial schedule by swapping non-conflicting
operations of S1.

After swapping of non-conflict operations, the schedule S1 becomes:

T1

Read(A)

Write(A)

Read(B)

Write(B)

T2

Read(A)

Write(A)

Read(B)

Write(B)

Since, S1 is conflict serializable.


Concurrency control
It is a procedure in DBMS which helps us for the management of two simultaneous
processes to execute without conflicts between each other, these conflicts occur in multi
user systems.

Concurrency can simply be said to be executing multiple transactions at a time. It is


required to increase time efficiency. If many transactions try to access the same data,
then inconsistency arises. Concurrency control required to maintain consistency data.

Advantages
The advantages of concurrency control are as follows −

 Waiting time will be decreased.

 Response time will decrease.

 Resource utilization will increase.

 System performance & Efficiency is increased.


The simultaneous execution of transactions over shared databases can create several data
integrity and consistency problems.

Main problems in using Concurrency


The problems which arise while using concurrency are as follows −

 Updates will be lost − One transaction does some changes and another transaction
deletes that change. One transaction nullifies the updates of another transaction.

 Uncommitted Dependency or dirty read problem − On variable has updated in one


transaction, at the same time another transaction has started and deleted the value of
the variable there the variable is not getting updated or committed that has been done
on the first transaction this gives us false values or the previous values of the variables
this is a major problem.

 Inconsistent retrievals − One transaction is updating multiple different variables,


another transaction is in a process to update those variables, and the problem occurs is
inconsistency of the same variable in different instances.
Concurrency control techniques
The concurrency control techniques are as follows −

Locking
Lock guaranties exclusive use of data items to a current transaction. It first accesses the
data items by acquiring a lock, after completion of the transaction it releases the lock.

Types of Locks

The types of locks are as follows −

 Shared Lock [Transaction can read only the data item values]

 Exclusive Lock [Used for both read and write data item values]

Time Stamping
Time stamp is a unique identifier created by DBMS that indicates relative starting time of a
transaction. Whatever transaction we are doing it stores the starting time of the transaction
and denotes a specific time.

This can be generated using a system clock or logical counter. This can be started whenever a
transaction is started. Here, the logical counter is incremented after a new timestamp has been
assigned.

Lock-Based Protocol
In this type of protocol, any transaction cannot read or write data until it acquires an
appropriate lock on it. There are two types of lock:

1. Shared lock:

o It is also known as a Read-only lock. In a shared lock, the data item can only read by the
transaction.

o It can be shared between the transactions because when the transaction holds a lock,
then it can't update the data on the data item.
2. Exclusive lock:

o In the exclusive lock, the data item can be both reads as well as written by the
transaction.

o This lock is exclusive, and in this lock, multiple transactions do not modify the same data
simultaneously.

There are four types of lock protocols available:


1. Simplistic lock protocol
It is the simplest way of locking the data while transaction. Simplistic lock-based protocols allow
all the transactions to get the lock on the data before insert or delete or update on it. It will
unlock the data item after completing the transaction.

2. Pre-claiming Lock Protocol


o Pre-claiming Lock Protocols evaluate the transaction to list all the data items on which
they need locks.

o Before initiating an execution of the transaction, it requests DBMS for all the lock on all
those data items.

o If all the locks are granted then this protocol allows the transaction to begin. When the
transaction is completed then it releases all the lock.

o If all the locks are not granted then this protocol allows the transaction to rolls back and
waits until all the locks are granted.

3. Two-phase locking (2PL)


o The two-phase locking protocol divides the execution phase of the transaction into three
parts.

o In the first part, when the execution of the transaction starts, it seeks permission for the
lock it requires.

o In the second part, the transaction acquires all the locks. The third phase is started as
soon as the transaction releases its first lock.
o In the third phase, the transaction cannot demand any new locks. It only releases the
acquired locks.

There are two phases of 2PL:

o Growing phase: In the growing phase, a new lock on the data item may be
acquired by the transaction, but none can be released.

o Shrinking phase: In the shrinking phase, existing lock held by the transaction
may be released, but no new locks can be acquired.

4. Strict Two-phase locking (Strict-2PL)


o The first phase of Strict-2PL is similar to 2PL. In the first phase, after acquiring all the
locks, the transaction continues to execute normally.

o The only difference between 2PL and strict 2PL is that Strict-2PL does not release a lock
after using it.

o Strict-2PL waits until the whole transaction to commit, and then it releases all the locks at
a time.

o Strict-2PL protocol does not have shrinking phase of lock release.

Timestamp Ordering Protocol


o The Timestamp Ordering Protocol is used to order the transactions based on their
Timestamps. The order of transaction is nothing but the ascending order of the
transaction creation.

o The priority of the older transaction is higher that's why it executes first. To determine
the timestamp of the transaction, this protocol uses system time or logical counter.

o The lock-based protocol is used to manage the order between conflicting pairs among
transactions at the execution time. But Timestamp based protocols start working as soon
as a transaction is created.
o Let's assume there are two transactions T1 and T2. Suppose the transaction T1 has
entered the system at 007 times and transaction T2 has entered the system at 009 times.
T1 has the higher priority, so it executes first as it is entered the system first.

o The timestamp ordering protocol also maintains the timestamp of last 'read' and 'write'
operation on a data.

Basic Timestamp ordering protocol works as follows:

1. Check the following condition whenever a transaction Ti issues a Read (X) operation:

o If W_TS(X) >TS(Ti) then the operation is rejected.

o If W_TS(X) <= TS(Ti) then the operation is executed.

o Timestamps of all the data items are updated.

2. Check the following condition whenever a transaction Ti issues a Write(X) operation:

o If TS(Ti) < R_TS(X) then the operation is rejected.

o If TS(Ti) < W_TS(X) then the operation is rejected and Ti is rolled back otherwise
the operation is executed.

Advantages and Disadvantages of TO protocol:


o TO protocol ensures serializability since the precedence graph is as follows:

o TS protocol ensures freedom from deadlock that means no transaction ever


waits.

o But the schedule may not be recoverable and may not even be cascade- free.
Multiversion Concurrency Control Techniques

Other protocols for concurrency control keep the old values of a data item when the item is
updated. These are known as multiversion concurrency control, because several versions
(values) of an item are maintained. When a transaction requires access to an item,
an appropriate version is chosen to maintain the serializability of the currently executing
schedule, if possible.

The idea is that some read operations that would be rejected in other techniques can still be
accepted by reading an older version of the item to maintain serializability. When a transaction
writes an item, it writes a new version and the old version(s) of the item are retained. Some
multiver-sion concurrency control algorithms use the concept of view serializability rather than
conflict serializability.

1. Multiversion Technique Based on Timestamp Ordering

In this method, several versions X1, X2, ..., Xk of each data item X are maintained. For each
version, the value of version Xi and the following two timestamps are kept:

read_TS(Xi). The read timestamp of Xi is the largest of all the timestamps of transactions
that have successfully read version Xi.
write_TS(Xi). The write timestamp of Xi is the timestamp of the transaction that wrote the
value of version Xi.

2. Multiversion Two-Phase Locking Using Certify Locks

In this multiple-mode locking scheme, there are three locking modes for an item:
read, write, and certify, instead of just the two modes (read, write). Hence, the
state of LOCK(X) for an item X can be one of read-locked, write-locked,
certify-locked, or unlocked. In the standard locking scheme, with only read and
write locks ,a write lock is an exclusive lock. We can describe the relationship
between read and write locks in the standard scheme by means of the lock
compatibility table . An entry of Yes means that if a transaction T holds the type
of lock specified in the column header on item X and if transaction T requests the
type of lock specified in the row header on the same item X, then T can obtain the
lock because the locking modes are compatible. On the other hand, an entry
of No in the table indicates that the locks are not compatible, so T must
wait until T releases the lock.

You might also like