Recovery System
Failure Classification
• Transaction failure :
– Logical errors: transaction cannot complete due to some internal error
condition
– System errors: the database system must terminate an active transaction due
to an error condition (e.g., deadlock)
• System crash: a power failure or other hardware or software failure causes the
system to crash.
– Fail-stop assumption: non-volatile storage contents are assumed to not be
corrupted by system crash
• Database systems have numerous integrity checks to prevent corruption
of disk data
• Disk failure: a head crash or similar disk failure destroys all or part of disk storage
– Destruction is assumed to be detectable: disk drives use checksums to detect
failures
Recovery Algorithms
• Database recovery is the process of restoring the database to the
most recent consistent state that existed just before the failure.
• Recovery algorithms are techniques to ensure database consistency
and transaction atomicity and durability despite failures
• Recovery algorithms have two parts
1. Actions taken during normal transaction processing to ensure
enough information exists to recover from failures
2. Actions taken after a failure to recover the database contents to
a state that ensures atomicity, consistency and durability
• Purpose of Database Recovery
– To bring the database into the last consistent state, which
existed prior to the failure.
– To preserve transaction properties (Atomicity, Consistency,
Isolation and Durability).
Storage Medium
• Volatile storage:
– does not survive system crashes
– examples: main memory, cache memory
• Nonvolatile storage:
– survives system crashes
– examples: disk, tape, flash memory,
non-volatile (battery backed up) RAM
• Stable storage:
– a mythical form of storage that survives all failures
– approximated by maintaining multiple copies on distinct nonvolatile media
Data Access
• Datbase is partitioned into fixed length storage unit called blocks.
• Blocks are the unit of data transfer to and from disk.
• Physical blocks are those blocks residing on the disk.
• Buffer blocks are the blocks residing temporarily in main memory.
• The area of the memory where block resides temporarily is called disk buffer.
• Block movements between disk and main memory are initiated through the
following two operations:
– input(B) transfers the physical block B to main memory.
– output(B) transfers the buffer block B to the disk, and replaces the
appropriate physical block there.
• Each transaction Ti has its private work-area in which local copies of all data items
accessed and updated by it are kept.
– Ti's local copy of a data item X is called xi.
Data Access
Data Access
• Transaction transfers data items between system buffer blocks and its private work-area
using the following operations :
– read(X) assigns the value of data item X to the local variable xi.
• if block BX on which X resides is not in main memory, it issues input(BX)
• it assigns to Xi the value of X from the buffer block
– write(X) assigns the value of local variable xi to data item X in the buffer block.
• if block BX on which X resides is not in main memory, it issues input(BX)
• it assigns the value of Xi to X in buffer block
– both these commands may necessitate the issue of an input(BX) instruction before the
assignment, if the block BX in which X resides is not already in memory.
• Transactions
– Perform read(X) while accessing X for the first time;
– All subsequent accesses are to the local copy.
– After last access, transaction executes write(X).
• write(X) can be executed at any time before the transaction commits
• output(BX) need not immediately follow write(X). System can perform the output
operation when it deems fit.
buffer
Buffer Block A input(A)
X A
Buffer Block B Y B
output(B)
read(X)
write(Y)
x2
x1
y1
work area work area
of T1 of T2
Recovery And Atomicity
• To ensure atomicity despite failures, we first
output information describing the
modifications to stable storage without
modifying the database itself.
– It allows us to output all the modification made by
a committed transaction.
• Commonly used techniques for recovery
Log Based Recovery(Based on Log records)
Shadow copying
Log Based Recovery
• Log records the database modification on a stable storage.
• Log is a sequence of log records, records all the update activities in the database
• An update log record has following field
– Transaction identifier
– Data item identifier
– Old value
– New Value
• Example <Ti, Xj, V1, V2> => Transaction Ti has update the data item Xj value to V2
from V1.
• When transaction Ti starts, it registers itself by writing a
<Ti start> to log record
• When Ti executes write(X), a log record
<Ti, X, V1, V2>
is written, where V1 is the value of X before the write (the old value), and V2 is the
value to be written to X (the new value).
• When Ti finishes it last statement, the log record <Ti commit> is written.
Log Based Recovery
• For a transaction performs write operation
– Log record is created and added to the log before the data base is
modified
• Log records allow
– Undo the changes made by an aborted transaction
– Redo the changes made by a committed transaction in case of system
crash
• There are following steps executed by a transaction while modifying a data
item
– Perform some computation in the own private main memory
– Modifies the data block in the disk buffer in main memory holding
data item
– Executes the output operation that writes the data block to disk.
• Two approaches using logs
– Deferred database modification
– Immediate database modification
Deferred Database Modification
• The deferred database modification scheme records all
modifications to the log, but defers all the writes to after
partial commit.
• Transaction starts by writing <Ti start> record to log.
• A write(X) operation results in a log record <Ti, X, V> being
written, where V is the new value for X
– Note: old value is not needed for this scheme
• The write is not performed on X at this time, but is deferred.
• When Ti partially commits, <Ti commit> is written to the log
• Finally, the log records are read and used to actually execute
the previously deferred writes.
Deferred Database Modification
• During recovery after a crash, a transaction needs to be redone if and only
if both <Ti start> and<Ti commit> are there in the log.
• Redoing a transaction Ti ( redoTi) sets the value of all data items updated
by the transaction to the new values.
• After a failure recovery subsystem consult the log record to determine
which transaction need to be redone.
• Ti needs to be redone if and only if log contains both < T i , start> and < Ti ,
commit>
• example transactions T0 and T1 (T0 executes before T1):
T0: read (A) T1 : read (C)
A = A - 50 C=C- 100
Write (A) write (C)
read (B)
B= B + 50
write (B)
Deferred Database Modification
• Below we show the log as it appears at three instances of time.
• If log on stable storage at time of crash is as in case:
(a) No redo actions need to be taken
(b) redo(T0) must be performed since <T0 commit> is present
(c) redo(T0) must be performed followed by redo(T1) since
<T0 commit> and <Ti commit> are present
Immediate Database Modification
• The immediate-modification scheme allows updates of an
uncommitted or active transaction to be made to the buffer, or
the disk itself, before the transaction commits
– since undoing may be needed, update logs must have both old
value and new value
• Data modification written by active transaction are called uncommitted
modifications.
• Update log record must be written before database item is written
– We assume that the log record is output directly to stable
storage
• Output of updated blocks to stable storage can take place at any
time before or after transaction commit
• Order in which blocks are output can be different from the order
in which they are written.
Immediate Database Modification
T0: read (A) T1 : read (C)
A = A - 50 C=C- 100
Write (A) write (C)
read (B)
B= B + 50
write (B)
Log Database
<T0 start>
<T0, A, 1000, 950>
<To, B, 2000, 2050>
A = 950
B = 2050
<T0 commit>
<T1 start>
<T1, C, 700, 600>
C = 600
<T1 commit>
• Note: BX denotes block containing X.
Immediate DB Modification Recovery Example
Below we show the log as it appears at three instances of time.
Recovery actions in each case above are:
(a) undo (T0): B is restored to 2000 and A to 1000, and log records
<T0, B, 2000>, <T0, A, 1000>, <T0, abort> are written out
(b) redo (T0) and undo (T1): A and B are set to 950 and 2050 and C is restored to
700. Log records <T1, C, 700>, <T1, abort> are written out.
(c) redo (T0) and redo (T1): A and B are set to 950 and 2050
respectively. Then C is set to 600
Recovery Algorithm
• A transaction is said to have committed when its commit log record is output to
stable storage
– all previous log records of the transaction must have been output already
• Writes performed by a transaction may still be in the buffer when the transaction
commits, and may be output later.
• The database modification must be preceded by
– Creation of log
– It allows the system to perform
• Undo(Ti): Restores the value of all data items updated by the transaction Ti to
old values
• Redo(Ti): Sets the value of all data items updated by the transaction Ti to new
values
• When recovering after failure:
– Transaction Ti needs to be undone if the log contains the record
<Ti start>, but does not contain the record <Ti commit>.
– Transaction Ti needs to be redone if the log contains both the record <Ti start>
and the record <Ti commit>.
• Undo operations are performed first, then redo operations.
Checkpoints
• When a system failure occurs, DBMA must consult log to determine those
transactions that need to be redone and those that undone.
• There are 2 major difficulties with this approach
1. Search process is time consuming
2. Most of the transaction that according to algorithm need to be
redone have already which have already output their updates to the
database.
• Streamline recovery procedure by periodically performing checkpointing
1. Store all log records currently residing in main memory onto stable
storage.
2. Output all modified buffer blocks to the disk.
3. Write a log record < checkpoint> onto stable storage.
– All updates are stopped while doing checkpointing
Checkpoints
• During recovery we need to consider only the most recent transaction Ti
that started executing before the checkpoint, and transactions that
started after Ti.
1. Scan backwards from end of log to find the most recent
<checkpoint> record
2. Continue scanning backwards till a record <Ti start> is found.
3. For all transactions (starting from Ti or later) with no <Ti commit>,
execute undo(Ti). (Done only in case of immediate modification.)
4. Scanning forward in the log, for all transactions starting from Ti or
later with a <Ti commit>, execute redo(Ti).
Example of Checkpoints
Tc Tf
T1
T2
T3
T4
checkpoint system failure
• T1 can be ignored (updates already output to disk due to checkpoint)
• T2 and T3 redone.
• T4 undone
Shadow Paging
• Shadow paging is an alternative to log-based recovery; this scheme is useful if
transactions execute serially
• The db is partitioned into number of fixed length disk block known as page.
• Assume db is partitioned into n pages.
• The page table has n entries for each db page where ith entry points to ith db page on
disk.
• Idea: Maintain two page tables during the lifetime of a transaction
– current page table (current directory) Main Memory and
– shadow page table(shadow directory) Stable Storage
• Store the shadow page table in nonvolatile storage, such that state of the database prior
to transaction execution may be recovered.
– Shadow page table is never modified during execution
• When a transaction current directory is copied to shadow directory. Only current page
table is used for data item accesses during execution of the transaction.
• The shadow page table is never changed over the duration of transaction.
• Whenever any page is about to be written for the first time
– A copy of this page is made onto an unused page.
– The current page table is then made to point to the copy
– The update is performed on the copy
Shadow Paging (Cont.)
Shadow Paging
• To manage access of data items by concurrent transactions
two directories (current and shadow) are used.
– The directory arrangement is illustrated below. Here a page is a
data item.
Sample Page Table
Example of Shadow Paging
Shadow and current page tables after write to page 4