UNIT - 6
- Transaction Concept
- Transaction State
- Implementation of Atomicity and Durability
– Concurrent – Executions
– Serializability
- Recoverability
– Implementation of Isolation
– Testing for serializability
- Lock –Based Protocols
– Timestamp Based Protocols
- Validation- Based Protocols
– Multiple Granularity.
Transaction Concept
A transaction is a unit of program execution that accesses
and possibly updates various data items
Usually, a transaction is initiated by a user program written
in a data-manipulation language or programming language
(for example SQL, Java), where it is delimited by statements
(or function calls) of the form begin transaction and end
transaction
The database system has to maintain the following
properties of the transactions:
Atomicity: Either all operations of the transaction are
reflected properly in the database, or none are
Consistency: Execution of a transaction in isolation (that is,
with no other transaction executing concurrently) preserves
the consistency of the database
Isolation: Even though multiple transactions may execute
concurrently, the system guarantees that, for every pair of
transactions Ti and Tj , it appears to Ti that either Tj
finished execution before Ti started, or Tj started execution
after Ti finished. Thus, each transaction is unaware of other
transactions executing concurrently in the system
Durability: After a transaction completes successfully, the
changes it has made to the database persist, even if there are
system failures
These properties are often called the ACID properties
Consider a simplified banking system consisting of several
accounts and a set of transactions that access and update
those accounts
For the time being, we assume that the database
permanently resides on disk, but that some portion of it is
temporarily residing in main memory
Transactions access data using two operations:
read(X): which transfers the data item X from the database
to a local buffer belonging to the transaction that executed
the read operation
write(X): which transfers the data item X from the local
buffer of the transaction that executed the write back to the
database
Let Ti be a transaction that transfers $50 from account A to
account B
This transaction can be defined as
Ti : read(A);
A := A − 50;
write(A);
read(B);
B := B + 50;
write(B)
Let us now consider each of the ACID requirements:
Consistency:
The consistency requirement here is that the sum of A and
B be unchanged by the execution of the transaction
Ensuring consistency for an individual transaction is the
responsibility of the application programmer who codes the
transaction
This task may be facilitated by automatic testing of
integrity constraints
Atomicity:
Suppose that, just before the execution of transaction Ti the
values of accounts A and B are $1000 and $2000,
respectively
Suppose that, during the execution of transaction Ti, a
failure occurs that prevents Ti from completing its execution
successfully
Failures include power failures, hardware failures, and
software errors
Suppose that the failure happened after the write(A)
operation but before the write(B) operation
In this case, the values of accounts A and B reflected in the
database are $950 and $2000. The system destroyed $50 as
a result of this failure
In particular, we note that the sum A + B is no longer
preserved
Because of the failure, the state of the system changed to
inconsistent state
Ensuring atomicity is the responsibility of the database
system itself; specifically, it is handled by a component
called the transaction-management component
Durability:
The durability property guarantees that, once a transaction
completes successfully, all the updates that it carried out on
the database persist, even if there is a system failure after the
transaction completes execution
Ensuring durability is the responsibility of a component of
the database system called the recovery-management
component
Isolation:
If several transactions are executed concurrently, their
operations may interleave in some undesirable way,
resulting in an inconsistent state
A way to avoid the problem of concurrently executing
transactions is to execute transactions serially—that is, one
after the other
Ensuring the isolation property is the responsibility of a
component of the database system called the concurrency-
control component
Transaction State
In the absence of failures, all transactions complete
successfully
A transaction may not always complete its execution
successfully. Such a transaction is termed aborted
If we are to ensure the atomicity property, an aborted
transaction must have no effect on the state of the database
Once the changes caused by an aborted transaction have
been undone, we say that the transaction has been rolled
back
A transaction that completes its execution successfully is
said to be committed
Once a transaction has committed, we cannot undo its
effects by aborting it
A transaction must be in one of the following states:
Active, the initial state; the transaction stays in this state
while it is executing
Partially committed, after the final statement has been
executed
Failed, after the discovery that normal execution can no
longer proceed
Aborted, after the transaction has been rolled back and the
database has been restored to its state prior to the start of the
transaction
Committed, after successful completion
The state diagram corresponding to a transaction appears in
Figure 15.1
A transaction starts in the active state. When it finishes its
final statement, it enters the partially committed state
At this point, the transaction has completed its execution,
but it is still possible that it may have to be aborted, since
the actual output may still be temporarily residing in main
memory, and thus a hardware failure may preclude its
successful completion
After writing information to the disk, the transaction enters
the committed state
A transaction enters the failed state after the system
determines that the transaction can no longer proceed with
its normal execution
Such a transaction must be rolled back. Then, it enters the
aborted state
At this point, the system has two options:
[Link] can restart the transaction
2. It can kill the transaction
Implementation of Atomicity and Durability
The transaction-management component and recovery-
management component of a database system can support
atomicity and durability by a variety of schemes
shadow copy scheme: It is based on making copies of the
database, called shadow copies
The scheme also assumes that the database is simply a file
on disk
A pointer called db-pointer is maintained on disk; it points
to the current copy of the database
In the shadow-copy scheme, a transaction that wants to
update the database first creates a complete copy of the
database
All updates are done on the new database copy, leaving the
original copy, the shadow copy, untouched
If at any point the transaction has to be aborted, the system
merely deletes the new copy. The old copy of the database
has not been affected
If the transaction completes, it is committed as follows:
First, the operating system is asked to make sure that all
pages of the new copy of the database have been written out
to disk
After the operating system has written all the pages to disk,
the database system updates the pointer db-pointer to point
to the new copy of the database
The new copy then becomes the current copy of the
database
The old copy of the database is then deleted
Figure 15.2 depicts the scheme
The transaction is said to have been committed at the point
where the updated db-pointer is written to disk
Now consider how the technique handles transaction and
system failures:
Transaction failure: If the transaction fails at any time before
db-pointer is updated, the old contents of the database are
not affected
We can abort the transaction by just deleting the new copy
of the database
Once the transaction has been committed, all the updates
that it performed are in the database pointed to by db-pointer
System failure: Suppose that the system fails at any time
before the updated db-pointer is written to disk
When the system restarts, it will read db-pointer and will
thus see the original contents of the database, and none of
the effects of the transaction will be visible on the database
Next, suppose that the system fails after db-pointer has
been updated on disk
Before the pointer is updated, all updated pages of the new
copy of the database were written to disk
We assume that, once a file is written to disk, its contents
will not be damaged even if there is a system failure
Concurrent Executions
Transaction-processing systems usually allow multiple
transactions to run concurrently
Allowing multiple transactions to update data concurrently
causes several complications with consistency of the data
There are two good reasons for allowing concurrency:
Improved throughput and resource utilization:
A transaction consists of many steps. Some involve I/O
activity; others involve CPU activity
The CPU and the disks in a computer system can operate in
parallel. Therefore, I/O activity can be done in parallel with
processing at the CPU
All of this increases the throughput of the system-that is,
the number of transactions executed in a given amount of
time
Correspondingly, the processor and disk utilization also
increase
Reduced waiting time:
There may be a mix of transactions running on a system,
some short and some long
If transactions run serially, a short transaction may have to
wait for a preceding long transaction to complete
Which can lead to unpredictable delays in running a
transaction
If the transactions are operating on different parts of the
database, it is better to let them run concurrently, sharing
the CPU cycles and disk accesses among them
It also reduces the average response time: the average
time for a transaction to be completed after it has been
submitted
The database system must control the interaction among
the concurrent transactions to prevent them from destroying
the consistency of the database
Example
Transaction T1 transfers $50 from account A to account B.
It is defined as
T1: read(A);
A := A − 50;
write(A);
read(B);
B := B + 50;
write(B)
Transaction T2 transfers 10 percent of the balance from
account A to account B. It is defined as
T2: read(A);
temp := A * 0.1;
A := A − temp;
write(A);
read(B); B := B + temp; write(B)
Suppose the current values of accounts A and B are $1000
and $2000, respectively
The execution sequences just described are called
schedules. These schedules are serial
Suppose that the two transactions are executed
concurrently. One possible schedule appears in Figure 15.5.
In Schedules 1, 2 and 3, the sum A + B is preserved
The following concurrent schedule does not preserve the
value of (A + B)
Serializability
Basic Assumption – Each transaction preserves database
consistency
Thus serial execution of a set of transactions preserves
database consistency
A (possibly concurrent) schedule is serializable if it is
equivalent to a serial schedule
Different forms of schedule equivalence give rise to the
notions of:
1. conflict serializability
2. view serializability
We ignore operations other than read and write
instructions
Our simplified schedules consist of only read and write
instructions
Conflicting Instructions
Instructions li and lj of transactions Ti and Tj respectively,
conflict if and only if there exists some item Q accessed by
both li and lj, and at least one of these instructions write Q
1. li = read(Q), lj = read(Q). li and lj don’t conflict
2. li = read(Q), lj = write(Q). They conflict
3. li = write(Q), lj = read(Q). They conflict
4. li = write(Q), lj = write(Q). They conflict
Intuitively, a conflict between li and lj forces a (logical)
temporal order between them
If li and lj are consecutive in a schedule and they do not
conflict, their results would remain the same even if they
had been interchanged in the schedule
Conflict Serializability
If a schedule S can be transformed into a schedule S' by a
series of swaps of non-conflicting instructions, we say that
S and S' are conflict equivalent
We say that a schedule S is conflict serializable if it is
conflict equivalent to a serial schedule
Schedule 3 can be transformed into Schedule 5, by series
of swaps of non-conflicting instructions
Therefore Schedule 3 is conflict serializable
Schedule 3
Example of a schedule that is not conflict serializable:
We are unable to swap instructions in the above schedule to
obtain either the serial schedule < T3, T4 >, or the serial
schedule < T4, T3 >
View Serializability
Let S and S' be two schedules with the same set of
transactions. S and S' are view equivalent if the following
three conditions are met:
[Link] each data item Q, if transaction Ti reads the initial
value of Q in schedule S, then transaction Ti must, in
schedule S', also read the initial value of Q
[Link] each data item Q if transaction Ti executes read(Q)
in schedule S, and that value was produced by transaction Tj
(if any), then transaction Ti must in schedule S' also read the
value of Q that was produced by transaction Tj
[Link] each data item Q, the transaction (if any) that
performs the final write(Q) operation in schedule S must
perform the final write(Q) operation in schedule S'
As can be seen, view equivalence is also based purely on
reads and writes alone
A schedule S is view serializable it is view equivalent to a
serial schedule
Every conflict serializable schedule is also view
serializable
In our previous examples, schedule 1 (Fig 15.3) is not
view equivalent to schedule 2 (Fig 15.4) , since, in schedule
1, the value of account A read by transaction T2 was
produced by T1, whereas this case does not hold in schedule
2
However, schedule 1 is view equivalent to schedule 3 (Fig
15.5)
Recoverability
We now address the effect of transaction failures during
concurrent execution
If a transaction Ti fails, for whatever reason, we need to
undo the effect of this transaction to ensure the atomicity
property of the transaction
Any transaction Tj that is dependent on Ti (that is, Tj has
read data written by Ti) is also aborted
There are two types of schedules:
[Link] Schedules
[Link] Schedules
Recoverable Schedules
Consider schedule 11 in Figure 15.13
Suppose that the system allows T9 to commit immediately
after executing the read(A) instruction
Thus, T9 commits before T8 does
Now suppose that T8 fails before it commits
Since T9 has read the value of data item A written by T8, we
must abort T9 to ensure transaction atomicity
However, T9 has already committed and cannot be aborted
Thus, we have a situation where it is impossible to recover
correctly from the failure of T8
Schedule 11, with the commit happening immediately after
the read(A) instruction, is an example of a non recoverable
schedule
Database system require that all schedules be recoverable
A recoverable schedule is one where, for each pair of
transactions Ti and Tj such that Tj reads a data item
previously written by Ti, the commit operation of Ti appears
before the commit operation of Tj
Cascading rollback
A single transaction failure leads to a series of transaction
rollbacks
Consider the following schedule where none of the
transactions has yet committed (so the schedule is
recoverable)
If T10 fails, T11 and T12 must also be rolled back
Cascadeless schedules
Cascading rollbacks cannot occur; for each pair of
transactions Ti and Tj such that Tj reads a data item
previously written by Ti, the commit operation of Ti
appears before the read operation of Tj
Every cascadeless schedule is also recoverable
It is desirable to restrict the schedules to those that are
cascadeless
Implementation of Isolation
1. Different concurrency-control schemes
To ensure that database remains consistent when multiple
transactions are executed concurrently.
2. Example for concurrency-control scheme is locking.
A transaction acquires a lock on the entire db before it starts and
releases the lock after it has committed.
While a transaction holds a lock, no other transaction is allowed
to acquire the lock, and all must therefore wait for the lock to be
released.
This scheme leads to poor performance.
The goal of concurrency-control schemes is to provide a high degree
of concurrency without violating any acid property.
Testing for Serializability
How to determine a given schedule S is serializable or not.
Consider a schedule S.
Construct a directed graph, called a precedence graph, from S.
The graph consists of a pair G=(V,E), where V is a set of vertices
and E is s set of edges.
The set of vertices consists of all the transactions participating in the
schedule.
The set of edges consists of all edges Ti Tj for which one of three
conditions holds:
1. Ti executes write(Q) before Tj executes read(Q)
2. Ti executes read(Q) before Tj executes write(Q)
3. Ti executes write(Q) before Tj executes write(Q)
If an edge Ti Tj exists in the precedence graph, then,
in any serial schedule S1 equivalent to S, Ti must appear
before Tj.
Examples:
T1 T2 , since all the instructions of T1 are executed
before the first instruction of T2 is executed.
Similarly, T2 T1 , since all the instructions of T2 are
executed before the first instruction of T1 is executed.
If the precedence graph for S has a cycle, then schedule S
is not conflict serializable.
If the graph contains no cycles, then the schedule S is
conflict serializable.
A serializability order of the transactions can be obtained
through topological sorting, which determines a linear
order consistent with the partial order of the precedence
graph.
In general, several possible linear orders that can be
obtained through a topological sorting.
Concurrency control schemes
1. Lock-based protocols
2. Timestamp-based protocols
3. Validation-based protocols
4. Multiple granularity
Note: All the above schemes are based on the serializability
property.
Lock-Based Protocols
A lock is a mechanism to control concurrent access to a
data item
Data items can be locked in two modes :
1. exclusive (X) mode: Data item can be both read as
well as written. X-lock is requested using lock-X
instruction
2. shared (S) mode: Data item can only be read. S-lock is
requested using lock-S instruction
Lock requests are made to concurrency-control manager
Transaction can proceed only after request is granted
Lock-compatibility matrix
A transaction may be granted a lock on an item if the
requested lock is compatible with locks already held on the
item by other transactions
Any number of transactions can hold shared locks on an
item, but if any transaction holds an exclusive on the item
no other transaction may hold any lock on the item
If a lock cannot be granted, the requesting transaction is
made to wait till all incompatible locks held by other
transactions have been released. The lock is then granted
Let T1 transfers $50 from account B to account A.
T2 displays the total amount of money in accounts A and B.
Suppose the values of accounts A and B are $100 and $2000 respectively.
T1: lock-X(B)
read(B)
B:=B-50
write(B)
unlock(B)
lock-X(A)
read(A)
A:=A+50
write(A)
unlock(A)
T2: lock-S(A)
read(A)
unlock(A)
lock-S(B)
read(B)
unlock(B)
display(A+B)
T1 T2 Concurrency-control manager
lock-x(B)
grant-x(B,T1)
read(B)
B:=B-50
write(B)
unlock(B)
lock-s(A)
grant-s(A,T2)
read(A)
unlock(A)
lock-s(B)
grant-s(B,T2)
read(B)
unlock(B)
display(A+B)
lock-X(A)
grant-x(A,T2)
read(A)
A:=A+50
write(A)
unlock(A)
Delay in Unlock (T1 ,T2 are changed to T3 and T4 respectively).
T3: lock-X(B) T4: lock-S(A)
read(B) read(A)
B:=B-50 lock-S(B)
write(B) read(B)
lock-X(A) display(A+B)
read(A) unlock(B)
A:=A+50 unlock(A)
write(A)
unlock(B)
unlock(A)
A locking protocol is a set of rules followed by all
transactions while requesting and releasing locks. Locking
protocols restrict the set of possible schedules.
Pitfalls of Lock-Based Protocols
Consider the partial schedule
Neither T3 nor T4 can make progress - executing lock-S(B)
causes T4 to wait for T3 to release its lock on B, while
executing lock-X(A) causes T3 to wait for T4 to release its
lock on A
Such a situation is called a deadlock. To handle a deadlock
one of T3 or T4 must be rolled back
and its locks released
Starvation If a transaction never get a chance to execute then it is
said to be starved.
Example:
T2 has a shared-mode lock on a data item, and T1 requests an
exclusive-mode lock.
Meanwhile, a transaction T3 may request a shared mode lock on the
same data item. The lock request is compatible with the lock granted
to T2,so T3 may be granted the shared-mode lock.
At the this point T2 may release the lock, but T1 has to wait for T3
to finish.
Still if a transaction T4 request a shared-mode lock on the same data
item, and may be granted.
Granting of locks
We can avoid starvation of transactions by granting locks in
the following manner.
When a transaction Ti requests a lock on a data item Q in a
particular mode M, the concurrency-control manager grants
the lock provided that
1. There is no other transaction holding a lock on Q in a
mode that conflicts with M.
2. There is no other transaction that is waiting for a lock on
Q and that made its lock request before Ti.
The Two-Phase Locking Protocol: This protocol requires that
each transaction issue lock and unlock requests in two phases.
Phase 1: Growing Phase
Transaction may obtain locks, may not release any lock.
Phase 2: Shrinking Phase
Transaction may release locks, may not obtain any new locks.
Initially a transaction is in the growing phase.
The transaction acquires locks as needed.
Once the transaction releases a lock, it enters the shrinking phase,
and it can issue no more lock requests.
For example, T3 and T4 are two phase, but T1 and T2 are not two
phase.
We can show that the two-phase locking protocol ensures
conflict serializability.
Consider any transaction.
Lock points: The point in the schedule where the
transaction has obtained its final lock (the end if its
growing phase) is called the lock point of the transaction.
Now, transactions can be ordered according to their lock
points, this order is a serializability ordering for the
transactions.
Two-phase locking does not ensure freedom from deadlock.
Types:
1. Strict two-phase locking protocol: This protocol
requires not only that locking called be two phase, but
also that all exclusive-mode locks taken by a transaction
be held until that transaction commits.
It avoids cascading rollbacks.
2. Rigorous two-phase locking protocol: It requires that
all locks be held until the transaction commits.
Lock Conversions
Consider the following two transactions.
T8: read(a1); T9: read(a1);
read(a2); read(a2);
… display(a1+a2);
read(an);
write(a1);
Upgrade: conversion from shared to exclusive mode.
Downgrade: conversion from exclusive to shared mode.
Lock Conversions (contd)
T8 T9
lock-S(a1)
lock-S(a1)
lock-S(a2)
lock-S(a2)
lock-S(a3)
lock-S(a4)
unlock(a1)
unlock(a2)
lock-S(an)
upgrade(a1)
Upgrade: conversion from shared to exclusive mode.
Downgrade: conversion from exclusive to shared mode.
Automatic Acquisition of Locks: It is on the basis of read
and write requests from the transaction.
When a transaction Ti issues a read(Q) operation, the
system issues a lock-S(Q) instruction followed by the
read(Q) instruction.
When a transaction Ti issues a write(Q) operation, the
system checks to see whether Ti already holds a shared lock
on Q. If it does, then the system issues an upgrade(Q)
instruction, followed by the write(Q). Otherwise, the system
issues a lock-X(Q) instruction, followed by the write(Q)
instruction.
All locks obtained by a transaction are unlocked after that
transaction commits or aborts.
Implementation of Locking
A lock manager can be implemented as a separate process
to which transactions send lock and unlock requests
The lock manager replies to a lock request by sending a
lock grant messages (or a message asking the transaction to
roll back, in case of a deadlock)
The requesting transaction waits until its request is
answered
The lock manager maintains a data-structure called a lock
table to record granted locks and pending requests
The lock table is usually implemented as an in-memory
hash table indexed on the name of the data item being
locked
Lock Table
Black rectangles indicate granted locks, white ones indicate
waiting requests
Lock table also records the type of lock granted or
requested.
Timestamp-Based Protocols
Each transaction is issued a timestamp when it enters the
system. If an old transaction Ti has time-stamp TS(Ti), a new
transaction Tj is assigned time-stamp TS(Tj) such that TS(Ti)
<TS(Tj)
The protocol manages concurrent execution such that the
time-stamps determine the serializability order
In order to assure such behavior, the protocol maintains
for each data Q two timestamp values:
W-timestamp(Q) is the largest time-stamp of any
transaction that executed write(Q) successfully
R-timestamp(Q) is the largest time-stamp of any
transaction that executed read(Q) successfully
The timestamp ordering protocol ensures that any
conflicting read and write operations are executed in
timestamp order
Validation-Based Protocol
Execution of transaction Ti is done in three phases.
1. Read and execution phase: Transaction Ti writes only
to temporary local variables
2. Validation phase: Transaction Ti performs a "validation
test" to determine if local variables can be written without
violating serializability
3. Write phase: If Ti is validated, the updates are applied
to the database; otherwise, Ti is rolled back
Each transaction Ti has 3 timestamps
Start(Ti) : the time when Ti started its execution
Validation(Ti): the time when Ti entered its validation
phase
Finish(Ti) : the time when Ti finished its write phase
Validation Test for Transaction Tj
If for all Ti with TS (Ti) < TS (Tj) either one of the
following condition holds:
• finish(Ti) < start(Tj)
• start(Tj) < finish(Ti) < validation(Tj) and the set of
data items written by Ti does not intersect with the set
of data items read by Tj, then validation succeeds and
Tj can be committed. Otherwise, validation fails and Tj
is aborted
Multiple Granularity
Allow data items to be of various sizes and define a hierarchy of data
granularities, where the small granularities are nested within larger ones
Can be represented graphically as a tree
When a transaction locks a node in the tree explicitly, it implicitly locks
all the node's descendents in the same mode.
Granularity of locking (level in tree where locking is done):
– fine granularity (lower in tree): high concurrency, high locking
overhead
– coarse granularity (higher in tree): low locking overhead, low
concurrency
Example of Granularity Hierarchy
The levels, starting from the
coarsest (top) level are
– database
– area
– file
– record
Intention Lock Modes
In addition to S and X lock modes, there are three additional lock modes
with multiple granularity:
– intention-shared (IS): indicates explicit locking at a lower level of
the tree but only with shared locks.
– intention-exclusive (IX): indicates explicit locking at a lower level
with exclusive or shared locks
– shared and intention-exclusive (SIX): the subtree rooted by that
node is locked explicitly in shared mode and explicit locking is
being done at a lower level with exclusive-mode locks.
Each transaction can lock a node Q by following rules:
It must observe the lock-compatibility function.
It must lock the root of the tree first, and can lock it in any mode.
It can lock a node Q in S or IS mode only if it currently has the parent of Q locked
in either IS or SIX mode.
It can lock a node Q in X, SIX or IX mode only if it currently has the parent of Q
locked in either IX or SIX mode.
It can lock a node only if it has not previously unlocked any node.
It can unlock a node Q only if it currently has none of the children of Q locked.
The multiple granularity requires that locks be
acquired in top-down order, whereas locks must be
released in bottom-up order.
Suppose that transaction T18 reads record ra2 in file
Fa. Then, T18 needs to lock the database, area A1 and
Fa in IS mode, and finally to lock ra2 in S mode.
Intention Lock Modes
Compatibility Matrix with
Intention Lock Modes
IS IX S S IX X
IS
IX
S
S IX
X