Transaction Processing Concepts: -Transaction System,
Testing of Serializability, Serializability of schedules, conflict
& view serializable schedule, recoverability, Recovery from
transaction failures. Log based recovery. Checkpoints
deadlock handling. Concurrency Control Techniques:
Concurrency Control, locking Techniques for concurrency
control, time stamping protocols for concurrency control,
validation based protocol, multiple granularity. Multi version
schemes, Recovery with concurrent transaction.
Introduction to Distributed databases, data mining, data
warehousing, Object Technology and DBMS, Comparative
study of OODBMS Vs DBMS . Temporal, Deductive,
Multimedia, Web & Mobile database
Transaction Management
• What is Transaction
• ACID Properties
• Transaction Management
• Concurrency and problem
• Conflict and view serialzability
Transaction Management
A transaction is a unit of program execution that accesses and possibly
updates various data items.
A transaction is a logical unit of work that contains one or more SQL
statements.
A transaction is an atomic unit. The effects of all the SQL statements in a
transaction can be either all committed (applied to the database) or all
rolled back (undone from the database).
A transaction in an executing program that forms logical unit of database
processing. A transaction includes one or more database access
operations – these can include insertion, deletion and modification or
retrieval operations. The database operations that form a transaction can
either be embedded within an application program or they can be specified
interactively via a high-level query language such as SQL.
The basic database access operations are –
1. read_item(X) : reads a database item names X into a program
variable.
2. Write_item(X) : Writes the value of program variable X into the
database.
Executing a read_item command includes following –
a. Find the address of the disk block that contains item X
b. Copy the disk block into a buffer in main memory.
c. Copy item X from the buffer to the program variable named X.
Executing a write_item command includes following –
a. Find the address of the disk block that contains item X.
b. Copy the disk block into a buffer in main memory.
c. Copy item X from the program variable named X into its
correct location in the buffer.
d. Store the update block from the buffer back to disk. (Updating
of Database)
ACID Properties
To preserve integrity of data, the database system
must ensure:
• Atomicity.
Either all operations of the transaction are
properly reflected in the database or none are.
• Consistency / Serializability.
Execution of a transaction in isolation preserves
the consistency of the database.
ACID Properties (cont…)
• Isolation.
Although multiple transactions may execute
concurrently, each transaction must be unaware
of other concurrently executing transactions.
Intermediate transaction results must be hidden
from other concurrently executed transactions.
• Durability.
After a transaction completes successfully,
the changes it has made to the database persist,
even if there are system failures.
Example of Fund Transfer
• Transaction to transfer $50 from account A to account B:
– 1. read(A)
– 2. A := A – 50
– 3. write(A)
– 4. read(B)
– 5. B := B + 50
– 6. write(B)
• Consistency requirement – the sum of A and B is
unchanged by the execution of the transaction.
• Atomicity requirement — if the transaction fails after step
3 and before step 6, the system should ensure that its
updates are not reflected in the database, else an
inconsistency will result.
• Durability requirement — once the user has been
notified that the transaction has completed (i.e., the
transfer of the $50 has taken place), the updates to the
database by the transaction must persist despite failures.
• Isolation requirement — if between steps 3 and 6,
another transaction is allowed to access the partially
updated database, it will see an inconsistent database
(the sum A + B will be less than it should be).
Can be ensured trivially by running transactions serially,
that is one after the other. However, executing multiple
transactions concurrently has significant benefits.
Trans. Mgt. with SQL
• COMMIT statement – ends the SQL trans.;
effects permanently recorded within DB
• ROLLBACK statement – DB is rolled back to its
previous consistent state and all the changes
are aborted
• Reach end of the program successfully – similar
to COMMIT
• Program abnormally terminated – similar to
ROLLBACK
Transaction State
Active – the initial state; the transaction stays in this state while it is executing
Partially committed – after the final statement has been executed.
Failed -- after the discovery that normal execution can no longer proceed.
Aborted – after the transaction has been rolled back and the database restored to
its state prior to the start of the transaction. Two options after it has been aborted:
• restart the transaction; can be done only if no internal logical error
• kill the transaction
Committed – after successful completion.
Implementation of Atomicity and Durability
• The recovery-management component of a database
system implements the support for atomicity and
durability.
• The shadow-database scheme:
– assume that only one transaction is active at a time.
– a pointer called db_pointer always points to the current consistent
copy of the database.
– all updates are made on a shadow copy of the database, and
db_pointer is made to point to the updated shadow copy only
after the transaction reaches partial commit and all updated
pages have been saved to disk.
– in case transaction fails, old consistent copy pointed to by
db_pointer can be used, and the shadow copy can be deleted.
The shadow-database scheme:
Concurrent Executions
• Multiple transactions are allowed to run concurrently in the system.
Advantages are:
– increased processor and disk utilization, leading to better
transaction throughput: one transaction can be using the CPU
while another is reading from or writing to the disk
– reduced average response time for transactions: short
transactions need not wait behind long ones.
• Concurrency control schemes – mechanisms to achieve
isolation; that is, to control the interaction among the concurrent
transactions in order to prevent them from destroying the
consistency of the database
Schedules
• Schedule – a sequences of instructions that specify
the chronological order in which instructions of
concurrent transactions are executed
– a schedule for a set of transactions must consist
of all instructions of those transactions
– must preserve the order in which the instructions
appear in each individual transaction.
• A transaction that successfully completes its
execution will have a commit instructions as the last
statement
• A transaction that fails to successfully complete its
execution will have an abort instructions as the last
statement
T1 T2
R(A) R(B)
W(A) W(B)
R(B) R(A)
W(B W(A)
Schedule 1
S1 S2
T1 T1 T2
R(A) R(A)
W(A) W(A)
R(B) R(B)
W(B) W(B)
T2 R(B)
R(B) R(A)
W(B)
R(A) W(B)
W(A) W(A)
Serial Schedule Non Serial Schedule
Schedule 1
• Let T1 transfer $50 from A to B, and T2 transfer
10% of the balance from A to B.
• A serial schedule in which T1 is followed by T2:
Schedule 2
• A serial schedule where T2 is followed by T1
Schedule 3
• Let T1 and T2 be the transactions defined previously.
The following schedule is not a serial schedule,
Schedule 4
• The following concurrent schedule does not
preserve the value of (A + B).
Serializability
• Basic Assumption – Each transaction preserves
database consistency.
• Thus serial execution of a set of transactions
preserves database consistency.
• A (possibly concurrent) schedule is serializable if it is
equivalent to a serial schedule. Different forms of
schedule equivalence give rise to the notions of:
– 1. conflict serializability
– 2. view serializability
• We ignore operations other than read and write
instructions, and we assume that transactions may
perform arbitrary computations on data in local
buffers in between reads and writes. Our simplified
schedules consist of only read and write instructions.
Ii Ij Ii Ij S2
R(A) R(A) T1 T2
W(B) W(A)
--------------------------------------
----------------------------------------
W(B) W(A) R(A)
R(A) W(A)
R(A)
R(B)
W(B)
Ii Ij R(B)
R(A) R(A)
R(A)
--------------------------------------
R(A)
W(B)
R(A) W(A)
Non Serial Schedule
Conflicting Instructions
• Instructions li and lj of transactions Ti and Tj respectively,
conflict if and only if there exists some item Q accessed by
both li and lj, and at least one of these instructions wrote Q.
1. li = read(Q), lj = read(Q). li and lj don’t conflict.
2. li = read(Q), lj = write(Q). They conflict.
3. li = write(Q), lj = read(Q). They conflict
4. li = write(Q), lj = write(Q). They conflict
• Intuitively, a conflict between li and lj forces a (logical)
temporal order between them.
– If li and lj are consecutive in a schedule and they do not
conflict, their results would remain the same even if they
had been interchanged in the schedule.
View Serializability
• Let S and S´ be two schedules with the same set of
transactions. S and S´ are view equivalent if the following
three conditions are met:
1. For each data item Q, if transaction Ti reads the initial value
of Q in schedule S, then transaction Ti must, in schedule S´,
also read the initial value of Q.
2. For each data item Q if transaction Ti executes read(Q) in
schedule S, and that value was produced by transaction Tj
(if any), then transaction Ti must in schedule S´ also read the
value of Q that was produced by transaction Tj .
3. For each data item Q, the transaction (if any) that performs
the final write(Q) operation in schedule S must perform the
final write(Q) operation in schedule S´.
As can be seen, view equivalence is also based purely on reads
and writes alone.
Concurrency Control
• A database must provide a mechanism that will ensure
that all possible schedules are
– either conflict or view serializable, and
– are recoverable and preferably cascadeless
• A policy in which only one transaction can execute at a
time generates serial schedules, but provides a poor
degree of concurrency
– Are serial schedules recoverable/cascadeless?
• Testing a schedule for serializability after it has
executed is a little too late!
• Goal – to develop concurrency control protocols that
will assure serializability.
Time Stamping Protocol
• The main aim of any database management system is to control requests
for the same data, at the same time, from multiple users.
• Each transaction is issued a timestamp when it starts.
• Concurrency control techniques based on timestamp ordering, thus
deadlock can’t occur (No transaction ever waits)
• The protocol manages concurrent execution such that the time stamps
determine the serializability order.
• If an old transaction Ti has time stamp TS(Ti), a new transaction Tj is
assigned time stamp TS(Tj) such that TS(Ti) < TS (Tj).
Timestamp--Ordering Protocol
Timestamp
• The timestamp-ordering protocol guarantees
serializability since all the arcs in the precedence graph
are of the form:
Thus, there will be no cycles in the precedence graph
• Timestamp protocol ensures freedom from deadlock as
no transaction ever waits.
• But the schedule may not be cascade-free, and may
not even be recoverable.
In order to assure such behavior, the protocol maintains
for each data
Q two timestamp values:
• W-timestamp(Q) is the largest time-stamp of any
transaction that executed write(Q) successfully.
• R-timestamp(Q) is the largest time-stamp of any
transaction that executed read(Q) successfully.
The timestamp ordering protocol ensures that any
conflicting read and write operations are executed in
timestamp order.
Suppose a transaction Ti issues a read(Q)
1. If TS(Ti ) ≤ W-timestamp(Q), then Ti needs to read a
value of Q that was already overwritten. Hence, the
read operation is rejected, and Ti is rolled back.
2. If TS(Ti ) ≥ W-timestamp(Q), then the read operation
is executed, and R-timestamp(Q) is set to the
maximum of R-timestamp(Q) and TS(Ti ).
Suppose that transaction Ti issues write(Q).
• If TS(Ti ) < R-timestamp(Q), then the value of Q that Ti
is producing was needed previously, and the system
assumed that that value would never be produced.
Hence, the write operation is rejected, and Ti is rolled
back.
• If TS(Ti ) < W-timestamp(Q), then Ti is attempting to
write an obsolete value of Q. Hence, this write operation
is rejected, and Ti is rolled back.
• Otherwise, the write operation is executed, and W-
timestamp(Q) is set to TS(Ti ).
• TS(Ti) ≥ R-timestamp(Q)
• TS(Ti) ≥ W-timestamp(Q)
Lock--Based Protocols
Lock
• A lock is a mechanism to control concurrent access to a
data item
• Data items can be locked in two modes :
1. exclusive (X) mode. Data item can be both read as well
as written. X-lock is requested using lock-X instruction.
2. shared (S) mode. Data item can only be read. S-lock is
requested using lock-S instruction.
• Lock requests are made to the concurrency-control
manager by the programmer. Transaction can proceed only
after request is granted.
• Lock-compatibility matrix
• A transaction may be granted a lock on an item if the requested
lock is compatible with locks already held on the item by other
transactions
• Any number of transactions can hold shared locks on an item,
– But if any transaction holds an exclusive on the item no
other transaction may hold any lock on the item.
• If a lock cannot be granted, the requesting transaction is made
to wait till all incompatible locks held by other transactions have
been released. The lock is then granted.
Example of a transaction performing locking:
T2: lock-S(A);
read (A);
unlock(A);
lock-S(B);
read (B);
unlock(B);
display(A+B)
Validation--Based Protocol
Validation
• Execution of transaction Ti is done in three phases.
1. Read and execution phase: Transaction Ti writes only to
temporary local variables
2. Validation phase: Transaction Ti performs a ''validation
test'' to determine if local variables can be written without
violating serializability.
3. Write phase: If Ti is validated, the updates are applied to
the database; otherwise, Ti is rolled back.
• The three phases of concurrently executing transactions
can be interleaved, but each transaction must go through
the three phases in that order.
– Assume for simplicity that the validation and write phase
occur together, atomically and serially
• I.e., only one transaction executes validation/write at
a time.
• Also called as optimistic concurrency control since
transaction executes fully in the hope that all will go well
during validation
Each transaction Ti has 3 timestamps
Start(Ti) : the time when Ti started its execution
Validation(Ti): the time when Ti entered its validation phase
Finish(Ti) : the time when Ti finished its write phase
Transaction Log
• Keep track of all transactions that update the DB
• If failure occurs, information that was stored here
will be used for recovery
• It is triggered by ROLL BACK statement,
program abnormal termination, or system failure
• It states before-and-after data of the DB and the
tables, rows and attribute values that
participated in the transaction
Transaction Log (cont…)
• The transaction log is subject to dangers such
as disk full conditions and disk crashes
• It has to be managed like other DBs
• Transaction log will increase the processing
overhead – but it is worthwhile
Failure Classification
• Transaction failure :
– Logical errors: transaction cannot complete due to some internal error
condition
– System errors: the database system must terminate an active
transaction due to an error condition (e.g., deadlock)
• System crash: a power failure or other hardware or software
failure causes the system to crash.
– Fail-stop assumption: non-volatile storage contents are assumed to
not be corrupted by system crash
• Database systems have numerous integrity checks to prevent
corruption of disk data
• Disk failure: a head crash or similar disk failure destroys all
or part of disk storage
– Destruction is assumed to be detectable: disk drives use checksums to
detect failures
Log-Based Recovery
Transaction identifier : is the unique identifier of the
transaction that performed the write operation.
Data-item identifier : is the unique identifier of the data
item written, i.e. the location on disk of the data item.
Old Value : is the value of the data item prior to the
write.
New Value : is the value that the data item will have
after the write.
Example :-
<Ti start> “Transaction Ti has started”
<Ti, Xj, V1, V2> “Transaction Ti has performed a write on data item Xj. Xj had
value V1 before the write, and will have value V2 after the write.
<Ti commit> “Transaction Ti has Committed”
<Ti abort> “Transaction Ti has aborted”
• A log is kept on stable storage.
– The log is a sequence of log records, and maintains a record of
update activities on the database.
• When transaction Ti starts, it registers itself by writing a
<Ti start>log record
• Before Ti executes write(X), a log record <Ti, X, V1, V2> is written, where
V1 is the value of X before the write, and V2 is the value to be written to X.
– Log record notes that Ti has performed a write on data item Xj Xj had
value V1 before the write, and will have value V2 after the write.
• When Ti finishes it last statement, the log record <Ti commit> is written.
• We assume for now that log records are written directly to stable storage
(that is, they are not buffered)
Checkpoints
• Problems in recovery procedure as discussed earlier :
1. searching the entire log is time-consuming
2. we might unnecessarily redo transactions which have
already written their updates to the database.
• Streamline recovery procedure by periodically performing
checkpointing
1. Output all log records currently residing in main
memory onto stable storage.
2. Output all modified buffer blocks to the disk.
3. Write a log record < checkpoint> onto stable storage.
• During recovery we need to consider only the most recent
transaction Ti that started before the checkpoint, and
transactions that started after Ti.
1. Scan backwards from end of log to find the most recent
<checkpoint> record
2. Continue scanning backwards till a record <Ti start> is
found.
3. Need only consider the part of log following above start
record. Earlier part of log can be ignored during recovery,
and can be erased whenever desired.
4. For all transactions (starting from Ti or later) with no <Ti
commit>, execute undo(Ti). (Done only in case of
immediate modification.)
5. Scanning forward in the log, for all transactions starting
from Ti or later with a <Ti commit>, execute redo(Ti).
Example of Checkpoints
Tc Tf
T1
T2
T3
T4
checkpoint system failure