GPREC Centre for Digital Education
Transaction Concepts
By
Sri. [Link]
Associate Professor
Emerging Technologies in Computer Science (ECS)
G. Pulla Reddy Engineering College (Autonomous):
Kurnool, Andhra Pradesh, India.
e-mail: [Link]@[Link]
[Link]@[Link]
Transaction Concept
In a Database Management System (DBMS), a
transaction is a sequence of one or more
operations performed as a single logical unit of
work.
Collections of operations that form a single logical
unit of work are called transactions.
The transaction consists of all operations executed
between the begin transaction and end
transaction.
A transaction must follow the ACID properties
(Atomicity, Consistency, Isolation, Durability) to
ensure database integrity.
Transaction Concept
ACID Properties:
Atomicity: Ensures that either all operations
of a transaction are completed or none of
them are.
Consistency: Ensures that the database
remains in a valid state before and after the
transaction.
Isolation: Ensures that transactions do not
interfere with each other.
Durability: Ensures that once a transaction is
committed, it remains permanent even if
there are system failures.
Transaction Concept
States of a Transaction:
Active: The transaction is in progress.
Partially Committed: All operations are
performed but not yet saved.
Committed: The transaction is successfully
saved to the database.
Failed: The transaction is aborted due to an
error.
Aborted: The changes made by the
transaction are rolled back.
Transaction Concept
Types of Transactions:
Read-Only Transactions: Transactions that only
retrieve data without modifying it.
Read-Write Transactions: Transactions that modify
the database.
Transaction Control Commands:
COMMIT: Saves all changes made by the transaction.
ROLLBACK: Reverts changes made by the transaction.
SAVEPOINT: Creates a checkpoint to roll back to a
specific point.
Transaction Concept
Concurrency Control:
Uses mechanisms like Locks, Timestamps,
and Multi-version Concurrency Control
(MVCC) to handle multiple transactions
occurring simultaneously.
Recovery Techniques:
Log-Based Recovery: Keeps a log of all
transactions to restore changes in case of
failure.
Shadow Paging: Uses shadow copies of pages
to ensure consistency.
Transaction Concept
ACID Properties with an Example
A transaction is a unit of program execution that
accesses and possibly updates various data
items.
Transactions access data using two operations:
read(X), which transfers the data item X from
the database to a local buffer belonging to
the transaction that executed the read
operation.
write(X), which transfers the data item X from
the local buffer of the transaction that
executed the write back to the database.
ACID Properties with an Example
Example: A transaction T1 to transfer $50
from account A to account B:
T1
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
ACID Properties with an Example
Atomicity requirement
If the transaction fails after step 3 and before
step 6, money will be “lost” leading to an
inconsistent database state
Failure could be due to software errors or
hardware failure or power failure
The system should ensure that updates of a
partially executed transaction are not
reflected in the database
If the atomicity property is present, all actions
of the transaction are reflected in the
database, or none are.
ACID Properties with an Example
Consistency requirement
The sum of A and B is unchanged by the execution of
the transaction.
A transaction must see a consistent database.
Consistency can be verified easily that, if the database
is consistent before an execution of the transaction,
the database remains consistent after the execution of
the transaction.
During transaction execution the database may be
temporarily inconsistent.
When the transaction completes successfully the
database must be consistent
Erroneous transaction logic can lead to
inconsistency
ACID Properties with an Example
Isolation requirement
If between steps 3 and 6, another transaction T2 is
allowed to access the partially updated database, it
will see an inconsistent database (the sum A + B will
be less than it should be).
T1 T2
1. read(A)
2. A := A – 50
3. write(A)
read(A), read(B), print(A+B)
4. read(B)
5. B := B + 50
6. write(B)
ACID Properties with an Example
Isolation requirement
Isolation can be ensured trivially by running
transactions serially that is, one after the other.
However, executing multiple transactions
concurrently has significant benefits, as we will see
later.
Durability requirement
Once the user has been notified that the transaction
has completed (i.e., the transfer of the $50 has
taken place), the updates to the database by the
transaction must persist even if there are software
or hardware failures.
Transaction State
Active – the initial state; the transaction stays in
this state while it is executing
Partially committed – after the final statement
has been executed.
Failed – after the discovery that normal
execution can no longer proceed.
Aborted – after the transaction has been rolled
back and the database restored to its state prior
to the start of the transaction. Two options after
it has been aborted:
restart the transaction
kill the transaction
Transaction State
Committed – after successful completion. A
committed transaction that has performed
updates transforms the database into a new
consistent state, which must persist even if there
is a system failure.
Implementation of Atomicity and Durability
The recovery-management component of a
database system implements the support for
atomicity and durability.
Example: The shadow-database scheme:
• all updates are made on a shadow copy of the
database
• db_pointer is made to point to the updated
shadow copy after the transaction reaches
partial commit and all updated pages have
been flushed to disk.
Implementation of Atomicity and Durability
Implementation of Atomicity and Durability
db_pointer always points to the current
consistent copy of the database.
• In case transaction fails, old consistent copy
pointed to by db_pointer can be used, and the
shadow copy can be deleted.
The shadow-database scheme:
• Assumes that only one transaction is active at
a time.
• Assumes disks do not fail
• Useful for text editors, but extremely
inefficient for large databases
• Does not handle concurrent transactions
Concurrent Executions
Multiple transactions are allowed to run
concurrently in the system. Advantages are:
Improved throughput and resource utilization
Throughput – Number of transactions executed
in a given amount of time.
Resource Utilization – processor and disk
utilization increases.
E.g. one transaction can be using the CPU while
another is reading from or writing to the disk
Reduced waiting time for transactions: short
transactions need not wait behind long ones.
Concurrent Executions
Concurrency control schemes – mechanisms to
achieve isolation
• that is, to control the interaction among the
concurrent transactions in order to prevent them
from destroying the consistency of the database
Schedules
Schedule – a sequences of instructions that specify the
chronological order in which instructions of concurrent
transactions are executed
• A schedule for a set of transactions must consist of all
instructions of those transactions
• Must preserve the order in which the instructions appear in
each individual transaction.
• Thus, for a set of n transactions, there exist n! different valid
serial schedules.
A transaction that successfully completes its execution will have
a commit instructions as the last statement. By default
transaction assumed to execute commit instruction as its last
step.
A transaction that fails to successfully complete its execution will
have an abort instruction as the last statement
Schedules
Schedule 1 –
• Let T1 transfer $50 from A to B, and T2 transfer 10% of
the balance from A to B.
• A serial schedule in which T1 is followed by T2.
Schedules
Schedule 2 –
• A serial schedule where T2 is followed by T1
Schedules
Schedule 3 –
• Let T1 and T2 be the transactions defined previously.
The following schedule is not a serial schedule, but it is
equivalent to Schedule 1.
• In Schedules 1, 2 and 3, the sum A + B is preserved.
Schedules
Schedule 4 –
• The following concurrent schedule does not preserve
the value of (A + B ).
Serializability
Basic Assumption – Each transaction preserves database
consistency.
Thus serial execution of a set of transactions preserves database
consistency.
A (possibly concurrent) schedule is serializable if it is equivalent
to a serial schedule. Different forms of schedule equivalence
give rise to the notions of:
1. Conflict serializability
2. View serializability
Simplified view of transactions
• We ignore operations other than read and write instructions
• We assume that transactions may perform arbitrary
computations on data in local buffers in between reads and
writes.
• Our simplified schedules consist of only read and write
instructions.
Serializability
Instructions li and lj of transactions Ti and Tj respectively, conflict
if and only if there exists some item Q accessed by both li and lj,
and at least one of these instructions wrote Q.
1. li = read(Q), lj = read(Q). li and lj don’t conflict.
2. li = read(Q), lj = write(Q). They conflict.
3. li = write(Q), lj = read(Q). They conflict
4. li = write(Q), lj = write(Q). They conflict
5. Ii = read(P) / write(P), Ij = read(Q) / write(Q). li and lj
don’t conflict
Intuitively, a conflict between li and lj forces a (logical) temporal
order between them.
If li and lj are consecutive in a schedule and they do not conflict,
their results would remain the same even if they had been
interchanged in the schedule.
Serializability
If a schedule S can be transformed into a schedule S´ by a series
of swaps of non-conflicting instructions, we say that S and S´ are
conflict equivalent.
We say that a schedule S is conflict serializable if it is conflict
equivalent to a serial schedule
Schedule 3 can be transformed into Schedule 6, a serial schedule
where T2 follows T1, by series of swaps of non-conflicting
instructions.
Therefore Schedule 3 is conflict serializable.
Schedule 3 Schedule 6
Serializability
Example of a schedule that is not conflict serializable:
We are unable to swap instructions in the above schedule to
obtain either the serial schedule < T3, T4 >, or the serial schedule
< T4, T3 >.
Serializability
Let S and S´ be two schedules with the same set of transactions. S
and S´ are view equivalent if the following three conditions are
met, for each data item Q,
1. If in schedule S, transaction Ti reads the initial value of Q, then
in schedule S’ also transaction Ti must read the initial value of
Q.
2. If in schedule S transaction Ti executes read(Q), and that value
was produced by transaction Tj (if any), then in schedule S’ also
transaction Ti must read the value of Q that was produced by
the same write(Q) operation of transaction Tj .
3. The transaction (if any) that performs the final write(Q)
operation in schedule S must also perform the final write(Q)
operation in schedule S’.
As can be seen, view equivalence is also based purely on reads and
writes alone.
Serializability
A schedule S is view serializable if it is view equivalent to a serial
schedule.
Schedule 1 and Schedule 3 are view equivalent and schedule 3 is
said to be view serializable
Every conflict serializable schedule is also view serializable.
Schedule 1 Schedule 3
Serializability
Below is a schedule which is view-serializable but not conflict
serializable.
What serial schedule is above equivalent to?
Every view serializable schedule that is not conflict serializable
has blind writes.
Testing for Serializability
When designing concurrency control schemes, we must show
that schedules generated by the scheme are serializable.
We now present a simple and efficient method for determining
conflict serializability of a schedule. Consider a schedule S. We
construct a directed graph, called a precedence graph, from S.
This graph consists of a pair G = (V, E), where V is a set of
vertices and E is a set of edges. The set of vertices consists of all
the transactions participating in the schedule.
If an edge Ti → Tj exists in the precedence graph, then, in any
serial schedule S equivalent to S’, Ti must appear before Tj . For
example, the precedence graph for schedule 1 contains the
single edge T1 → T2, since all the instructions of T1 are
executed before the first instruction of T2 is executed.
Testing for Serializability
The precedence graph for schedule 4. It contains the edge T1 →T2,
because T1 executes read(A) before T2 executes write(A). It also contains
the edge T2 → T1, because T2 executes read(B) before T1 executes
write(B).
Schedule 4
Testing for Serializability
If the precedence graph for S has a cycle, then schedule S is
not conflict serializable. If the graph contains no cycles,
then the schedule S is conflict serializable.
Thus, to test for conflict serializability, we need to
construct the precedence graph and to invoke a cycle-
detection algorithm.
Testing for Serializability
• Testing for view serializability is rather complicated.
• Thus, almost certainly there exists no efficient
algorithm to test for view serializability.
• However, concurrency-control schemes can still use
sufficient conditions for view serializability. That is, if
the sufficient conditions are satisfied, the schedule is
view serializable, but there may be view-serializable
schedules that do not satisfy the sufficient
conditions.
Recoverability
Recoverable Schedules:
Recoverable schedules — if a transaction Tj reads a data item
previously written by a transaction Ti , then the commit
operation of Ti appears before the commit operation of Tj.
The following schedule is not recoverable if T9 commits
immediately after the read.
If T8 should abort, T9 would have read (and possibly shown to
the user) an inconsistent database state. Hence, database must
ensure that schedules are recoverable.
Recoverability
Cascadeless Schedules:
Cascading rollback – a single transaction failure leads to a series
of transaction rollbacks. Consider the following schedule where
none of the transactions has yet committed (so the schedule is
recoverable)
If T10 fails, T11 and T12 must also be rolled back.
Can lead to the undoing of a significant amount of work
Recoverability
Cascadeless schedules — cascading rollbacks cannot occur; for
each pair of transactions Ti and Tj such that Tj reads a data item
previously written by Ti, the commit operation of Ti appears before
the read operation of Tj.
Every cascadeless schedule is also recoverable
It is desirable to restrict the schedules to those that are
cascadeless.
Key Characteristics of Cascadeless Schedules:
• No Dirty Reads: Transactions only read committed data,
meaning they do not depend on uncommitted changes.
• Prevents Cascading Rollbacks: Since transactions do not rely
on uncommitted changes, failure of one transaction does not
force multiple rollbacks.
• Ensures Strict Recoverability: Cascadeless schedules help in
maintaining recoverability, ensuring that if a transaction
commits, all transactions dependent on it will have consistent
data.
Implementation of Isolation
Schedules must be conflict or view serializable, and
recoverable, for the sake of database consistency, and
preferably cascadeless.
As a trivial example of a concurrency-control scheme,
consider this scheme: A transaction acquires a lock on the
entire database before it starts and releases the lock after it
has committed.
A policy in which only one transaction can execute at a time
generates serial schedules, but provides a poor degree of
concurrency.
The goal of concurrency-control schemes is to provide a high
degree of concurrency, while ensuring that all schedules that
can be generated are conflict or view serializable, and are
cascadeless.
Implementation of Isolation
Concurrency-control schemes tradeoff between the
amount of concurrency they allow and the amount of
overhead that they incur.
Some schemes allow only conflict-serializable schedules
to be generated, while others allow view-serializable
schedules that are not conflict-serializable.
Lock-Based Protocols
Locks:
A lock is a mechanism to control concurrent access to a data
item
Data items can be locked in two modes :
1. Exclusive (X) mode. Data item can be both read as well
as written. X-lock is requested using lock-X instruction.
2. Shared (S) mode. Data item can only be read. S-lock is
requested using lock-S instruction.
Lock requests are made to the concurrency-control manager
by the transaction. Transaction can proceed with the
operation only after the concurrency-control manager grants
the lock to the transaction.
A transaction can unlock a data item using unlock
instruction.
Lock-Based Protocols
Lock-compatibility matrix
A transaction may be granted a lock on an item if the requested
lock is compatible with locks already held on the item by other
transactions
Any number of transactions can hold shared locks on an item,
But if any transaction holds an exclusive on the item no other
transaction may hold any lock on the item.
If a lock cannot be granted, the requesting transaction is made to
wait till all incompatible locks held by other transactions have
been released. The lock is then granted.
Lock-Based Protocols
Note that shared mode is compatible with shared mode, but not
with exclusive mode. At any time, several shared-mode locks can
be held simultaneously (by different transactions) on a particular
data item. A subsequent exclusive-mode lock request has to wait
until the currently held shared-mode locks are released.
A transaction requests a shared lock on data item Q by executing
the lock-S(Q) instruction. Similarly, a transaction requests an
exclusive lock through the lock-X(Q) instruction. A transaction can
unlock a data item Q by the unlock(Q) instruction.
Transaction Ti may unlock a data item that it had locked at some
earlier point. Note that a transaction must hold a lock on a data
item as long as it accesses that item. Moreover, for a transaction
to unlock a data item immediately after its final access of that data
item is not always desirable, since serializability may not be
ensured.
Lock-Based Protocols
Lock-Based Protocols
• Let A and B be two accounts that are accessed by
transactions T1 and T2. Transaction T1 transfers $50 from
account B to account A (Figure 16.2). Transaction T2 displays
the total amount of money in accounts A and B—that is, the
sum A + B (Figure 16.3).
• Suppose that the values of accounts A and B are $100 and
$200, respectively. If these two transactions are executed
serially, either in the order T1, T2 or the order T2, T1, then
transaction T2 will display the value $300.
• If, however, these transactions are executed concurrently,
then schedule 1, in Figure 16.4 is possible. In this case,
transaction T2 displays $250, which is incorrect. The reason
for this mistake is that the transaction T1 unlocked data item
B too early, as a result of which T2 saw an inconsistent state.
Lock-Based Protocols
Lock-Based Protocols
Lock-Based Protocols
Suppose now that unlocking is delayed to the end of the
transaction. Transaction T3 corresponds to T1 with unlocking
delayed (Figure 16.5). Transaction T4 corresponds to T2 with
unlocking delayed (Figure 16.6).
You should verify that the sequence of reads and writes in
schedule 1, which lead to an incorrect total of $250 being
displayed, is no longer possible with T3 and T4.
Lock-Based Protocols
Consider the partial schedule of Figure 16.7 for T3 and T4. Since
T3 is holding an exclusive-mode lock on B and T4 is requesting a
shared-mode lock on B, T4 is waiting for T3 to unlock B. Similarly,
since T4 is holding a shared-mode lock on A and T3 is requesting
an exclusive-mode lock on A, T3 is waiting for T4 to unlock A.
Thus, we have arrived at a state where neither of these
transactions can ever proceed with its normal execution. This
situation is called deadlock.
When deadlock occurs, the system must roll back one of the two
transactions. Once a transaction has been rolled back, the data
items that were locked by that transaction are unlocked. These
data items are then available to the other transaction, which can
continue with its execution
Lock-Based Protocols
We shall require that each transaction in the system follow a set of
rules, called a locking protocol, indicating when a transaction may
lock and unlock each of the data items. Locking protocols restrict
the number of possible schedules. The set of all such schedules is
a proper subset of all possible serializable schedules. We shall
present several locking protocols that allow only conflict-
serializable schedules.
We say that a schedule S is legal under a given locking protocol if S
is a possible schedule for a set of transactions that follow the rules
of the locking protocol. We say that a locking protocol ensures
conflict serializability if and only if all legal schedules are conflict
serializable; in other words, for all legal schedules the associated
→ relation is acyclic.
Lock-Based Protocols
Granting of Locks
When a transaction requests a lock on a data item in a particular
mode, and no other transaction has a lock on the same data item
in a conflicting mode, the lock can be granted. However, care must
be taken to avoid the following scenario.
Suppose a transaction T2 has a shared-mode lock on a data item,
and another transaction T1 requests an exclusive-mode lock on
the data item. Clearly, T1 has to wait for T2 to release the shared-
mode lock. Meanwhile, a transaction T3 may request a shared-
mode lock on the same data item. The lock request is compatible
with the lock granted to T2, so T3 may be granted the shared-
mode lock. At this point T2 may release the lock, but still T1 has to
wait for T3 to finish. The transaction T1 may never make progress,
and is said to be starved.
Lock-Based Protocols
We can avoid starvation of transactions by granting locks
in the following manner: When a transaction Ti requests
a lock on a data item Q in a particular mode M, the
concurrency-control manager grants the lock provided
that
1. There is no other transaction holding a lock on Q in a
mode that conflicts with M.
2. There is no other transaction that is waiting for a
lock on Q, and that made its lock request before Ti.
Thus, a lock request will never get blocked by a lock
request that is made later.
Lock-Based Protocols
The Two-Phase Locking Protocol
One protocol that ensures serializability is the two-phase locking protocol.
This protocol requires that each transaction issue lock and unlock requests
in two phases:
1. Growing phase: A transaction may obtain locks, but may not release any
lock.
2. Shrinking phase: A transaction may release locks, but may not obtain any
new locks.
Initially, a transaction is in the growing phase. The transaction acquires locks
as needed. Once the transaction releases a lock, it enters the shrinking
phase, and it can issue no more lock requests. For example, transactions T3
and T4 are two phase. On the other hand, transactions T1 and T2 are not
two phase.
Two-phase locking does not ensure freedom from deadlock. Observe that
transactions T3 and T4 are two phase, but, in schedule 2 (Figure 16.7), they
are deadlocked.
Lock-Based Protocols
In addition to being serializable, schedules should be cascadeless.
Cascading rollback may occur under two-phase locking. As an
illustration, consider the partial schedule of Figure 16.8. Each
transaction observes the two-phase locking protocol, but the
failure of T5 after the read(A) step of T7 leads to cascading
rollback of T6 and T7.
Cascading rollbacks can be avoided by a modification of two-phase
locking called the strict two-phase locking protocol. This protocol
requires not only that locking be two phase, but also that all
exclusive-mode locks taken by a transaction be held until that
transaction commits. This requirement ensures that any data
written by an uncommitted transaction are locked in exclusive
mode until the transaction commits, preventing any other
transaction from reading the data.
Lock-Based Protocols
Lock-Based Protocols
Another variant of two-phase locking is the rigorous two-phase
locking protocol, which requires that all locks be held until the
transaction commits. With rigorous two-phase locking,
transactions can be serialized in the order in which they commit.
Most database systems implement either strict or rigorous two-
phase locking.
Consider the following two transactions, for which we have shown
only some of the significant read and write operations:
Lock-Based Protocols
If we employ the two-phase locking protocol, then T8 must lock a1
in exclusive mode. Therefore, any concurrent execution of both
transactions amounts to a serial execution. Notice, however, that
T8 needs an exclusive lock on a1 only at the end of its execution,
when it writes a1. Thus, if T8 could initially lock a1 in shared
mode, and then could later change the lock to exclusive mode, we
could get more concurrency, since T8 and T9 could access a1 and
a2 simultaneously. This observation leads us to a refinement of
the basic two-phase locking protocol, in which lock conversions
are allowed.
We denote conversion from shared to exclusive modes by
upgrade, and from exclusive to shared by downgrade. Lock
conversion cannot be allowed arbitrarily. Rather, upgrading can
take place in only the growing phase, whereas downgrading can
take place in only the shrinking phase.
Lock-Based Protocols
Returning to our example, transactions T8 and T9 can run
concurrently under the refined two-phase locking protocol, as
shown in the incomplete schedule of Figure 16.9, where only
some of the locking instructions are shown.
Lock-Based Protocols
Note that a transaction attempting to upgrade a lock on
an item Q may be forced to wait. This enforced wait
occurs if Q is currently locked by another transaction in
shared mode.
Just like the basic two-phase locking protocol, two-phase
locking with lock conversion generates only conflict-
serializable schedules, and transactions can be serialized
by their lock points. Further, if exclusive locks are held
until the end of the transaction, the schedules are
cascadeless.
Strict two-phase locking and rigorous two-phase locking
(with lock conversions) are used extensively in
commercial database systems.
Lock-Based Protocols
A simple but widely used scheme automatically generates the
appropriate lock and unlock instructions for a transaction, on
the basis of read and write requests from the transaction:
When a transaction Ti issues a read(Q) operation, the system
issues a lock-S(Q) instruction followed by the read(Q)
instruction.
When Ti issues a write(Q) operation, the system checks to see
whether Ti already holds a shared lock on Q. If it does, then the
system issues an upgrade(Q) instruction, followed by the
write(Q) instruction. Otherwise, the system issues a lock-X(Q)
instruction, followed by the write(Q) instruction.
All locks obtained by a transaction are unlocked after that
transaction commits or aborts.
Lock-Based Protocols
Implementation of Locking:
A lock manager can be implemented as a separate process to
which transactions send lock and unlock requests.
The lock manager replies to a lock request by sending a lock grant
messages (or a message asking the transaction to roll back, in case
of a deadlock).
The requesting transaction waits until its request is answered
The lock manager maintains a data-structure called a lock table to
record granted locks and pending requests.
The lock table is usually implemented as an in-memory hash table
indexed on the name of the data item being locked.
This algorithm guarantees freedom from starvation for lock
requests, since a request can never be granted while a request
received earlier is waiting to be granted.
Lock-Based Protocols
Dark blue rectangles indicate granted
locks; light blue indicate waiting
requests
Lock table also records the type of lock
granted or requested
New request is added to the end of the
queue of requests for the data item, and
granted if it is compatible with all earlier
locks
Unlock requests result in the request
being deleted, and later requests are
checked to see if they can now be
granted
If transaction aborts, all waiting or
granted requests of the transaction are
deleted
lock manager may keep a list of
Fig. Lock table locks held by each transaction, to
implement this efficiently
Deadlock Handling
System is deadlocked if there is a set of transactions such that every
transaction in the set is waiting for another transaction in the set.
The only remedy to this undesirable situation is for the system to
invoke some drastic action, such as rolling back some of the
transactions involved in the deadlock. Rollback of a transaction may
be partial: That is, a transaction may be rolled back to the point
where it obtained a lock whose release resolves the deadlock.
There are two principal methods for dealing with the deadlock
problem. We can use a deadlock prevention protocol to ensure that
the system will never enter a deadlock state. Alternatively, we can
allow the system to enter a deadlock state, and then try to recover
by using a deadlock detection and deadlock recovery scheme.
Prevention is commonly used if the probability that the system
would enter a deadlock state is relatively high; otherwise, detection
and recovery are more efficient.
Deadlock Handling
Deadlock Prevention:
There are two approaches to deadlock prevention. One approach is
Advance locking of data items in random order or ordered way i.e.,
ascending or descending order. It ensures that no cyclic waits can
occur by ordering the requests for locks, or requiring all locks to be
acquired together.
The simplest scheme under the first approach requires that each
transaction locks all its data items before it begins execution.
Moreover, either all are locked in one step or none are locked. There
are two main disadvantages to this protocol:
(1) it is often hard to predict, before the transaction begins,
what data items need to be locked;
(2) data-item utilization may be very low, since many of the
data items may be locked but unused for a long time.
Deadlock Handling
The second approach for preventing deadlocks is to use preemption
and transaction rollbacks. In preemption, when a transaction T2
requests a lock that transaction T1 holds, the lock granted to T1 may
be preempted by rolling back of T1, and granting of the lock to T2.
To control the preemption, we assign a unique timestamp to each
transaction. The system uses these timestamps only to decide
whether a transaction should wait or roll back.
Two different deadlock prevention schemes using timestamps have
been proposed:
1. The wait–die scheme is a non-preemptive technique. When
transaction Ti requests a data item currently held by Tj , Ti is allowed
to wait only if it has a timestamp smaller than that of Tj (that is, Ti is
older than Tj ). Otherwise, Ti is rolled back (dies).
Deadlock Handling
For example, suppose that transactions T22, T23, and T24 have
timestamps 5, 10, and 15, respectively. If T22 requests a data item held by
T23, then T22 (wait) will wait. If T24 requests a data item held by T23, then
T24 (dies) will be rolled back.
2. The wound–wait scheme is a preemptive technique. It is a
counterpart to the wait–die scheme. When transaction Ti requests a data
item currently held by Tj , Ti is allowed to wait only if it has a timestamp
larger than that of Tj (that is, Ti is younger than Tj ). Otherwise, Tj is rolled
back (Tj is wounded by Ti). Returning to our example, with transactions
T22, T23, and T24, if T22 requests a data item held by T23, then the data
item will be preempted from T23 (rollback-wounded), and T23 will be
rolled back. If T24 requests a data item held by T23, then T24 will wait.
Whenever the system rolls back transactions, it is important to ensure that
there is no starvation—that is, no transaction gets rolled back repeatedly
and is never allowed to make progress.
Both the wound–wait and the wait–die schemes avoid starvation
Deadlock Handling
There are, however, significant differences in the way that the two
schemes operate.
In the wait–die scheme, an older transaction must wait for a
younger one to release its data item. Thus, the older the transaction
gets, the more it tends to wait. By contrast, in the wound–wait
scheme, an older transaction never waits for a younger transaction.
In the wait–die scheme, if a transaction Ti dies and is rolled back
because it requested a data item held by transaction Tj, then Ti may
reissue the same sequence of requests when it is restarted. If the data
item is still held by Tj , then Ti will die again. Thus, Ti may die several
times before acquiring the needed data item. Contrast this series of
events with what happens in the wound–wait scheme. Transaction Ti is
wounded and rolled back because Tj requested a data item that it
holds. When Ti is restarted and requests the data item now being held
by Tj , Ti waits. Thus, there may be fewer rollbacks in the wound–wait
scheme.
Deadlock Handling
Timeout-Based Schemes:
A transaction waits for a lock only for a specified amount
of time. If the lock has not been granted within that time,
the transaction is rolled back and restarted,
Thus, deadlocks are not possible
Simple to implement; but starvation is possible. Also
difficult to determine good value of the timeout interval.
Deadlock Handling
Deadlock Detection:
Deadlocks can be described precisely in terms of a directed graph called a
wait-for graph. This graph consists of a pair G = (V, E), where V is a set of
vertices and E is a set of edges. The set of vertices consists of all the
transactions in the system. Each element in the set E of edges is an
ordered pair Ti → Tj.
If Ti → Tj is in E, then there is a directed edge from transaction Ti to Tj ,
implying that transaction Ti is waiting for transaction Tj to release a data
item that it needs.
When transaction Ti requests a data item currently being held by
transaction Tj , then the edge Ti → Tj is inserted in the wait-for graph. A
deadlock exists in the system if and only if the wait-for graph contains a
cycle.
To detect deadlocks, the system needs to maintain the wait-for graph, and
periodically to invoke an algorithm that searches for a cycle in the graph.
Deadlock Handling
Since the graph has no cycle, the system is not in a deadlock state.
This time, the graph contains the cycle T26 → T28 →T27 →T26
implying that transactions T26, T27, and T28 are all deadlocked.
Deadlock Handling
Deadlock Recovery:
When a detection algorithm determines that a deadlock exists, the system
must recover from the deadlock. The most common solution is to roll back
one or more transactions to break the deadlock. Three actions need to be
taken:
1. Selection of a victim. Given a set of deadlocked transactions, we must
determine which transaction (or transactions) to roll back to break the
deadlock. We should roll back those transactions that will incur the
minimum cost.
2. Rollback. Once we have decided that a particular transaction must be
rolled back, we must determine how far this transaction should be rolled
back. The simplest solution is a total rollback: Abort the transaction and
then restart it. However, it is more effective to roll back the transaction
only as far as necessary to break the deadlock. Such partial rollback
requires the system to maintain additional information about the state of
all the running transactions.
Deadlock Handling
3. Starvation. In a system where the selection of victims is
based primarily on cost factors, it may happen that the
same transaction is always picked as a victim. As a result,
this transaction never completes its designated task, thus
there is starvation. We must ensure that transaction can
be picked as a victim only a (small) finite number of times.
The most common solution is to include the number of
rollbacks in the cost factor.
Timestamp-Based Protocols
Another method for determining the serializability order is to select an
ordering among transactions in advance. The most common method for
doing so is to use a timestamp-ordering scheme.
Timestamps
If a transaction Ti has been assigned timestamp TS(Ti), and a new
transaction Tj enters the system, then TS(Ti) < TS(Tj ). There are two simple
methods for implementing this scheme:
1. Use the value of the system clock as the timestamp; that is, a
transaction’s timestamp is equal to the value of the clock when the
transaction enters the system.
2. Use a logical counter that is incremented after a new timestamp
has been assigned; that is, a transaction’s timestamp is equal to the value
of the counter when the transaction enters the system.
The timestamps of the transactions determine the serializability order.
Thus, if TS(Ti) < TS(Tj ), then the system must ensure that the produced
schedule is equivalent to a serial schedule in which transaction Ti appears
before transaction Tj .
Timestamp-Based Protocols
To implement this scheme, we associate with each data
item Q two timestamp values:
W-timestamp(Q) denotes the largest timestamp of any
transaction that executed write(Q) successfully.
R-timestamp(Q) denotes the largest timestamp of any
transaction that executed read(Q) successfully.
These timestamps are updated whenever a new read(Q)
or write(Q) instruction is executed.
Timestamp-Based Protocols
The Timestamp-Ordering Protocol:
The timestamp-ordering protocol ensures that any conflicting read and
write operations are executed in timestamp order. This protocol
operates as follows:
1. Suppose that transaction Ti issues read(Q).
a. If TS(Ti) < W-timestamp(Q), then Ti needs to read a
value of Q that was already overwritten. Hence, the read operation is
rejected, and Ti is rolled back.
b. If TS(Ti) ≥ W-timestamp(Q), then the read operation is
executed, and sets R-timestamp(Q) as TS(Ti).
2. Suppose that transaction Ti issues write(Q).
a. If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is
producing was needed previously, and the system assumed that the
value would never be produced. Hence, the system rejects the write
operation and rolls Ti back.
Timestamp-Based Protocols
b. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an
obsolete value of Q. Hence, the system rejects this write operation
and rolls Ti back.
c. Otherwise, the system executes the write operation and sets
W-timestamp(Q) as TS(Ti).
The timestamp-ordering protocol ensures conflict serializability. This
is because conflicting operations are processed in timestamp order.
The protocol ensures freedom from deadlock, since no transaction
ever waits.
However, there is a possibility of starvation of long transactions if a
sequence of conflicting short transactions causes repeated
restarting of the long transaction. If a transaction is found to be
getting restarted repeatedly, conflicting transactions need to be
temporarily blocked to enable the transaction to finish.
Timestamp-Based Protocols
Thus, in schedule 3 of Figure 16.13, TS(T14) < TS(T15), and the
schedule is possible under the timestamp protocol.
Timestamp-Based Protocols
Thomas Write rule:
We now present a modification to the timestamp-ordering protocol that
allows greater potential concurrency than timestamp-ordering protocol.
Let us consider schedule 4 of Figure 16.14, and apply the timestamp-
ordering protocol. Since T16 starts before T17, we shall assume that
TS(T16) < TS(T17). The read(Q) operation of T16 succeeds, as does the
write(Q) operation of [Link] T16 attempts its write(Q) operation, we
find that TS(T16) < W-timestamp(Q), since W-timestamp(Q) = TS(T17).
Thus, the write(Q) by T16 is rejected and transaction T16 must be rolled
back.
Timestamp-Based Protocols
Although the rollback of T16 is required by the timestamp-ordering protocol, it is
unnecessary. Since T17 has already written Q, the value that T16 is attempting to
write is one that will never need to be read.
This observation leads to a modified version of the timestamp-ordering protocol
in which obsolete write operations can be ignored under certain circumstances.
The protocol rules for read operations remain unchanged. The protocol rules for
write operations, however, are slightly different from the timestamp-ordering
protocol
The modification to the timestamp-ordering protocol, called Thomas’ write rule,
is this: Suppose that transaction Ti issues write(Q).
1. If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is producing was
previously needed, and it had been assumed that the value would never be
produced. Hence, the system rejects the write operation and rolls Ti back.
2. If TS(Ti) < W-timestamp(Q), then Ti is attempting to write an obsolete
value of Q. Hence, this write operation can be ignored.
3. Otherwise, the system executes the write operation and sets W-
timestamp(Q) as TS(Ti).
Timestamp-Based Protocols
Thomas’ write rule makes use of view serializability by, in
effect, deleting obsolete write operations from the
transactions that issue them.
For example, schedule 4 of Figure 16.14 is not conflict
serializable and, thus, is not possible under any of two-
phase locking, the tree protocol, or the timestamp-
ordering protocol. Under Thomas’ write rule, the write(Q)
operation of T16 would be ignored. The result is a schedule
that is view equivalent to the serial schedule <T16, T17>.
Failure Classification
Transaction failure :
Logical errors: transaction cannot complete due to some internal error
condition such as bad input, data not found, overflow or resource limit
exceeded
System errors: the database system must terminate an active
transaction due to an undesirable state. (e.g., deadlock)
System crash: a power failure or other hardware or software failure causes
the system to crash.
Fail-stop assumption: non-volatile storage contents are assumed to
not be corrupted as result of a system crash
Database systems have numerous integrity checks to prevent
corruption of disk data
Disk failure: a head crash or similar disk failure destroys all or part of disk
storage. Copies of the data on other disks, or archival backups on tertiary
media, such as tapes, are used to recover from the failure.
Storage Structure
STORAGE TYPES
Volatile storage:
does not survive system crashes
examples: main memory, cache memory
Nonvolatile storage:
survives system crashes
examples: disk, tape, flash memory,
non-volatile (battery backed up) RAM
but may still fail, loss of information
Stable storage:
a mythical form of storage that survives all failures
approximated by maintaining multiple copies on distinct nonvolatile
media
Information residing in stable storage is never lost
Storage Structure
Stable-Storage Implementation
Maintain multiple copies of each block on separate disks
copies can be at remote sites to protect against disasters such as fire or
flooding – remote backup.
copies are present at the same location in case of mirrored disks
Failure during data transfer can still result in inconsistent copies.
Block transfer can result in
Successful completion
Partial failure: destination block has incorrect information
Total failure: destination block was never updated
Protecting storage media from failure during data transfer (one solution):
Execute output operation as follows (assuming two copies of each block):
Write the information onto the first physical block.
When the first write successfully completes, write the same information
onto the second physical block.
The output is completed only after the second write successfully
completes.
Storage Structure
Protecting storage media from failure during data transfer (cont.):
Copies of a block may differ due to failure during output operation. To
recover from failure:
First find inconsistent blocks:
Expensive solution: Compare the two copies of every disk block.
Better solution:
Record in-progress disk writes on non-volatile storage (Non-
volatile RAM or special area of disk).
Use this information during recovery to find blocks that may be
inconsistent, and only compare copies of these.
Used in hardware RAID systems
If either copy of an inconsistent block is detected to have an error (bad
checksum), overwrite it by the other copy. If both have no error, but
contents are different, overwrite the second block by the first block.
Storage Structure
Data Access
Physical blocks are those blocks residing on the disk.
Buffer blocks are the blocks residing temporarily in main
memory.
Block movements between disk and main memory are initiated
through the following two operations:
input(B) transfers the physical block B to main memory.
output(B) transfers the buffer block B to the disk, and
replaces the appropriate physical block there.
We assume, for simplicity, that each data item fits in, and is
stored inside, a single block.
Figure 17.1 illustrates this scheme.
Storage Structure
Storage Structure
Each transaction Ti has its private work-area in which local copies of all data
items accessed and updated by it are kept.
Ti's local copy of a data item X is denoted by xi.
BX denotes block containing X
Transferring data items between system buffer blocks and its private work-
area done by:
read(X) assigns the value of data item X to the local variable xi.
write(X) assigns the value of local variable xi to data item {X} in the buffer
block.
Transactions
Must perform read(X) before accessing X for the first time (subsequent
reads can be from local copy)
The write(X) can be executed at any time before the transaction commits
Note that output(BX) need not immediately follow write(X). System can
perform the output operation when it deems fit.
Storage Structure
The output(BX) operation for the buffer block BX on which X resides
does not need to take effect immediately after write(X) is executed,
since the block BX may contain other data items that are still being
accessed. Thus, the actual output may take place later. Notice that,
if the system crashes after the write(X) operation was executed but
before output(BX) was executed, the new value of X is never written
to disk and, thus, is lost.
Storage Structure
buffer
Buffer Block A input(A)
X A
Buffer Block B Y B
output(B)
read(X)
Write(Y)
x1
y1
Private work area of T1
memory disk
Recovery and Atomicity
Consider again our simplified banking system and transaction Ti that
transfers $50 from account A to account B, with initial values of A and
B being $1000 and $2000, respectively.
Suppose that a system crash has occurred during the execution of Ti,
after output(BA) has taken place, but before output(BB) was executed,
where BA and BB denote the buffer blocks on which A and B reside.
Since the memory contents were lost, we do not know the fate of the
transaction; thus, we could invoke one of two possible recovery
procedures:
Reexecute Ti. This procedure will result in the value of A becoming
$900, rather than $950. Thus, the system enters an inconsistent
state.
Do not reexecute Ti. The current system state has values of $950
and $2000 for A and B, respectively. Thus, the system enters an
inconsistent state.
Recovery and Atomicity
To ensure atomicity despite failures, we first output information describing
the modifications to stable storage without modifying the database itself.
In either case, the database is left in an inconsistent state, and thus this
simple recovery scheme does not work. The reason for this difficulty is
that we have modified the database without having assurance that the
transaction will indeed commit. Our goal is to perform either all or no
database modifications made by Ti.
However, if Ti performed multiple database modifications, several output
operations may be required, and a failure may occur after some of these
modifications have been made, but before all of them are made.
To achieve our goal of atomicity, we must first output information
describing the modifications to stable storage, without modifying the
database itself.
As we shall see, this procedure will allow us to output all the modifications
made by a committed transaction, despite failures.
Recovery and Atomicity
Two techniques to achieve atomicity,
We study log-based recovery mechanisms in detail
Less used alternative: shadow-paging
Log-Based Recovery
The most widely used structure for recording database modifications is the
log.
The log is a sequence of log records, recording all the update activities in
the database.
An update log record describes a single database write. It has these fields:
Transaction identifier is the unique identifier of the transaction that
performed the write operation.
Data-item identifier is the unique identifier of the data item written.
Typically, it is the location on disk of the data item.
Old value is the value of the data item prior to the write.
New value is the value that the data item will have after the write.
We denote the various types of log records as:
<Ti start>. Transaction Ti has started.
<Ti, Xj, V1, V2>. Transaction Ti has performed a write on data item Xj .
Xj had value V1 before the write, and will have value V2 after the
write.
Log-Based Recovery
<Ti commit>. Transaction Ti has committed.
<Ti abort>. Transaction Ti has aborted.
For log records to be useful for recovery from system and disk failures, the log
must reside in stable storage.
Deferred Database Modification:
The deferred-modification technique ensures transaction atomicity by
recording all database modifications in the log, but deferring the execution of
all write operations of a transaction until the transaction partially commits.
When a transaction partially commits, the information on the log associated
with the transaction is used in executing the deferred writes. If the system
crashes before the transaction completes its execution, or if the transaction
aborts, then the information on the log is simply ignored.
Since a failure may occur while this updating is taking place, we must ensure
that, before the start of these updates, all the log records are written out to
stable storage. Once they have been written, the actual updating takes place,
and the transaction enters the committed state.
Log-Based Recovery
Observe that only the new value of the data item is required by the deferred
modification technique. Thus, we can simplify the general update-log record
structure that we saw in the previous section, by omitting the old-value field.
Using the log, the system can handle any failure that results in the loss of
information on volatile storage. The recovery scheme uses the following
recovery procedure:
redo(Ti) sets the value of all data items updated by transaction Ti to the
new values.
The set of data items updated by Ti and their respective new values can
be found in the log.
After a failure, the recovery subsystem consults the log to determine which
transactions need to be redone. Transaction Ti needs to be redone if and only
if the log contains both the record <Ti start> and the record <Ti commit>.
To illustrate, reconsider our simplified banking system. Let T0 be a transaction
that transfers $50 from account A to account B. Let T1 be a transaction that
withdraws $100 from account C:
Log-Based Recovery
Suppose that these transactions are executed serially, in the order T0
followed by T1, and that the values of accounts A, B, and C before the
execution took place were $1000, $2000, and $700, respectively.
Figure 17.2 shows the log that results from the complete execution of T0
and T1. Fig 17.4 shows three logs where failure has been occurred in three
places.
Log-Based Recovery
Log-Based Recovery
Assume that the crash occurs just after the log record for the step
write(B) of transaction T0 has been written to stable storage. The
log at the time of the crash appears in Figure 17.4a.
When the system comes back up, no redo actions need to be taken,
since no commit record appears in the log. The values of accounts A
and B remain $1000 and $2000, respectively. The log records of the
incomplete transaction T0 can be deleted from the log.
Log-Based Recovery
Now, let us assume the crash comes just after the log record for the step
write(C) of transaction T1 has been written to stable storage. In this case,
the log at the time of the crash is as in Figure 17.4b. When the system
comes back up, the operation redo(T0) is performed, since the record <T0
commit> appears in the log on the disk.
After this operation is executed, the values of accounts A and B are $950
and $2050, respectively. The value of account C remains $700. As before,
the log records of the incomplete transaction T1 can be deleted from the
log.
For each commit record <Ti commit> found in the log, the system
performs the operation redo(Ti).
Immediate Database Modification:
The Immediate-modification technique allows database modifications to
be output to the database while the transaction is still in the active state.
Data modifications written by active transactions are called uncommitted
modifications.
Log-Based Recovery
In the event of a crash or a transaction failure, the system must use the
old-value field of the log records to restore the modified data items to the
value they had prior to the start of the transaction. The undo operation,
described next, accomplishes this restoration.
Using the log, the system can handle any failure that does not result in the
loss of information in non-volatile storage. The recovery scheme uses two
recovery procedures:
undo(Ti) restores the value of all data items updated by transaction Ti
to the old values.
redo(Ti) sets the value of all data items updated by transaction Ti to
the new values.
The set of data items updated by Ti and their respective old and new
values can be found in the log.
After a failure has occurred, the recovery scheme consults the log to
determine which transactions need to be redone, and which need to be
undone:
Log-Based Recovery
Transaction Ti needs to be undone if the log contains the record <Ti start>,
but does not contain the record <Ti commit>.
Transaction Ti needs to be redone if the log contains both the record <Ti
start> and the record <Ti commit>.
Log-Based Recovery
Checkpoints:
When a system failure occurs, we must consult the log to determine those
transactions that need to be redone and those that need to be undone. In
principle, we need to search the entire log to determine this information.
There are two major difficulties with this approach:
1. The search process is time consuming.
2. Most of the transactions that, according to our algorithm, need to be
redone have already written their updates into the database. Although
redoing them will cause no harm, it will nevertheless cause recovery to take
longer.
To reduce these types of overhead, we introduce checkpoints. the system
periodically performs checkpoints, which require the following sequence of
actions to take place:
1. Output onto stable storage all log records currently residing in main
memory.
2. Output to the disk all modified buffer blocks.
3. Output onto stable storage a log record <checkpoint>.
Log-Based Recovery
The presence of a <checkpoint> record in the log allows the system to
streamline its recovery procedure.
Consider a transaction Ti that committed prior to the checkpoint. For such a
transaction, the <Ti commit> record appears in the log before the
<checkpoint> record. Any database modifications made by Ti must have been
written to the database either prior to the checkpoint or as part of the
checkpoint itself.
Thus, at recovery time, there is no need to perform a redo operation on Ti.
After a failure has occurred, the recovery scheme examines the log to
determine the most recent transaction Ti that started executing before the
most recent checkpoint took place. It can find such a transaction by searching
the log backward, from the end of the log, until it finds the first <checkpoint>
record, then it continues the search backward until it finds the next <Ti start>
record. This record identifies a transaction Ti.
Once the system has identified transaction Ti, the redo and undo operations
need to be applied to only transaction Ti and all transactions Tj that started
executing after transaction Ti.
Log-Based Recovery
Let us denote these transactions by the set T. The remainder (earlier
part) of the log can be ignored, and can be erased whenever
desired. The exact recovery operations to be performed depend on
the modification technique being used.
As an illustration, consider the set of transactions {T0, T1, . . ., T100}
executed in the order of the subscripts. Suppose that the most
recent checkpoint took place during the execution of transaction
T67. Thus, only transactions T67, T68, . . ., T100 need to be
considered during the recovery scheme. Each of them needs to be
redone if it has committed; otherwise, it needs to be undone.
Shadow Paging
An alternative to log-based crash-recovery techniques is shadow paging. The
shadow-paging technique is essentially an improvement on the shadow-copy
technique.
Under certain circumstances, shadow paging may require fewer disk accesses
than do the log-based methods.
As before, the database is partitioned into some number of fixed-length blocks,
which are referred to as pages. The term page is borrowed from operating
systems, since we are using a paging scheme for memory management.
Assume that there are n pages, numbered 1 through n. These pages do not need
to be stored in any particular order on disk.
However, there must be a way to find the ith page of the database for any given i.
We use a page table, as in Figure 17.8, for this purpose.
The page table has n entries—one for each database page. Each entry contains a
pointer to a page on disk.
The key idea behind the shadow-paging technique is to maintain two page tables
during the life of a transaction: the current page table and the shadow page
table. When the transaction starts, both page tables are identical.
Shadow Paging
The shadow page table is never changed over the duration of the transaction.
The current page table may be changed when a transaction performs a write
operation. All input and output operations use the current page table to
locate database pages on disk.
Suppose that the transaction Tj performs a write(X) operation, and that X
resides on the ith page. The system executes the write operation as follows:
1. If the ith page (that is, the page on which X resides) is not already in main
memory, then the system issues input(X).
2. If this is the write first performed on the ith page by this transaction, then
the system modifies the current page table as follows:
a. It finds an unused page on disk. Usually, the database system has
access to a list of unused (free) pages.
b. It deletes the page found in step 2a from the list of free page frames; it
copies the contents of the ith page to the page found in step 2a.
c. It modifies the current page table so that the ith entry points to the
page found in step 2a.
3. It assigns the value of xj to X in the buffer page.
Shadow Paging
Figure 17.9 shows the shadow and current page tables for a
transaction performing a write to the fourth page of a database
consisting of 10 pages.
Shadow Paging
Intuitively, the shadow-page approach to recovery is to store the
shadow page table in non-volatile storage, so that the state of the
database prior to the execution of the transaction can be recovered in
the event of a crash, or transaction abort.
A simple way of finding it is to choose one fixed location in stable
storage that contains the disk address of the shadow page table.
The current page table may be kept in main memory (volatile storage).
We do not care whether the current page table is lost in a crash, since
the system recovers by using the shadow page table.
When the transaction commits, the system writes the current page
table to non-volatile storage. The current page table then becomes the
new shadow page table, and the next transaction is allowed to begin
execution.
Unlike our log-based schemes, shadow paging needs to invoke no undo
operations.
Shadow Paging
To commit a transaction, we must do the following:
1. Ensure that all buffer pages in main memory that have been
changed by the transaction are output to disk.
2. Output the current page table to disk. Note that we must not
overwrite the shadow page table, since we may need it for
recovery from a crash.
3. Output the disk address of the current page table to the fixed
location in stable storage containing the address of the shadow
page table. This action overwrites the address of the old shadow
page table. Therefore, the current page table has become the
shadow page table, and the transaction is committed.
Shadow paging offers several advantages over log-based techniques.
The overhead of log-record output is eliminated, and recovery from
crashes is significantly faster (since no undo or redo operations are
needed).
Shadow Paging
However, there are drawbacks to the shadow-page technique:
Commit overhead. The commit of a single transaction using
shadow paging requires multiple blocks to be output—the actual
data blocks, the current page table, and the disk address of the
current page table.
Data fragmentation. we considered strategies to ensure locality —
that is, to keep related database pages close physically on the disk.
Locality allows for faster data transfer. Shadow paging causes
database pages to change location when they are updated. As a
result, either we lose the locality property of the pages or we must
resort to more complex, higher-overhead schemes for physical
storage management.
Garbage collection. Periodically, it is necessary to find all the
garbage pages, and to add them to the list of free pages. This
process, called garbage collection, imposes additional overhead
and complexity on the system.