Checkpointing & Rollback in Distributed Systems

Uploaded by

gamevortex076

Checkpointing and Rollback Recovery in Distributed Computing
By,
[Link] MUTHU,
III-CSE-’B’.
INTRODUCTION
• In distributed computing, multiple processes run on different systems and communicate through a network.
• If one process or system fails, the whole application shouldn’t stop.
• To handle this, we use Checkpointing and Rollback Recovery techniques.
➡ In simple words:
Checkpointing means “saving the current state” and Rollback Recovery means “restoring from that saved state after a failure.”
What is Checkpointing?
• Checkpointing means saving the current state of a process to stable storage.
• If a failure happens later, the process can restart from that saved point instead of from the beginning.
• It’s like the auto-save option in games or documents.
Example:
Imagine you are typing a document in Google Docs. Even if your laptop turns off suddenly, when you open the document again your writing is safe up to the last auto-save. That auto-save point is a checkpoint.
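The auto-save idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `save_checkpoint` and `load_checkpoint` and the file name `job.ckpt` are hypothetical, and the state is assumed to be a small picklable dictionary. The temporary-file-then-rename step is one common way to avoid leaving a half-written checkpoint behind if a crash happens during the save.

```python
import os
import pickle
import tempfile

def save_checkpoint(state, path):
    """Write state to a temp file, then atomically rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())       # push the bytes to disk before renaming
    os.replace(tmp_path, path)     # readers see the old or the new file, never half

def load_checkpoint(path, default):
    """Return the last saved state, or default if no checkpoint exists yet."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default

# Demo in a throwaway directory (the file name is hypothetical).
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "job.ckpt")
    save_checkpoint({"step": 3, "total": 6}, path)
    restored = load_checkpoint(path, default={"step": 0, "total": 0})
```

After the demo, `restored` holds the same dictionary that was saved, which is the state a restarted process would continue from.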
What is Rollback Recovery?
• Rollback Recovery means that when a system fails, it goes back (rolls back) to the last checkpoint and continues execution from there.
• So only the work done after the last checkpoint is lost.
Example:
In an online banking transaction, if the system crashes after debiting the amount but before confirmation, the bank’s server rolls back to the previous checkpoint. The transaction is canceled safely, and your money isn’t lost.
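A small simulation can make the "only the work after the last checkpoint is lost" point concrete. The sketch below is an assumption-laden toy, not a real recovery system: the `Worker` class, the `crash_at` switch, and the checkpoint interval are all invented for illustration, and the checkpoint is kept in memory where a real system would write it to stable storage.

```python
import copy

class Worker:
    """Adds up a list of numbers, checkpointing every `interval` items."""

    def __init__(self, items, interval=3):
        self.items = items
        self.interval = interval
        self.state = {"next": 0, "total": 0}
        self.checkpoint = copy.deepcopy(self.state)   # initial checkpoint

    def run(self, crash_at=None):
        while self.state["next"] < len(self.items):
            i = self.state["next"]
            if i == crash_at:                          # hypothetical failure switch
                raise RuntimeError(f"simulated crash before item {i}")
            self.state["total"] += self.items[i]
            self.state["next"] = i + 1
            if self.state["next"] % self.interval == 0:
                self.checkpoint = copy.deepcopy(self.state)
        return self.state["total"]

    def rollback(self):
        """Discard everything done since the last checkpoint."""
        self.state = copy.deepcopy(self.checkpoint)

worker = Worker(items=[1, 2, 3, 4, 5, 6, 7], interval=3)
try:
    worker.run(crash_at=5)
except RuntimeError:
    worker.rollback()              # back to the checkpoint taken after 3 items
    redo_from = worker.state["next"]
result = worker.run()              # items at index 3 and 4 are redone
```

Here the crash happens after five items were processed, the rollback returns to the checkpoint taken after three items, and only two items have to be redone before the run completes.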
Rollback Propagation (Domino Effect):
• Sometimes, when one process rolls back, it forces other connected processes to roll back as well, to maintain consistency.
• This chain reaction is called rollback propagation. When it cascades from process to process, possibly all the way back to the initial state, it is called the Domino Effect.
Example:
In a bank, money is sent from Branch A → Branch B → Branch C. If Branch B fails because of a network problem and rolls back, then Branch A and Branch C must also roll back to keep all account balances correct and the system consistent.
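The chain reaction can be traced with a short sketch. The model here is an assumption made for illustration: each process numbers its local events 1, 2, 3, ..., a checkpoint at position c covers events 1..c, and a message is a tuple (sender, send_event, receiver, recv_event). A message is an orphan when its receipt is still remembered but its send has been undone, and each orphan forces the receiver back to an earlier checkpoint. The function name `recovery_line` and the three-branch scenario are hypothetical.

```python
import math

def latest_checkpoint_before(positions, event):
    """Latest checkpoint taken strictly before the given local event."""
    return max(c for c in positions if c < event)

def recovery_line(checkpoints, messages, failed):
    """
    checkpoints: {process: list of checkpoint positions, always including 0}
    messages:    list of (sender, send_event, receiver, recv_event)
    failed:      the process that crashed
    Returns {process: restored position}; math.inf means "no rollback needed".
    """
    line = {p: math.inf for p in checkpoints}
    line[failed] = max(checkpoints[failed])
    changed = True
    while changed:                                   # repeat until no orphan is left
        changed = False
        for sender, send_ev, receiver, recv_ev in messages:
            send_undone = send_ev > line[sender]
            recv_kept = recv_ev <= line[receiver]
            if send_undone and recv_kept:            # orphan message
                line[receiver] = latest_checkpoint_before(
                    checkpoints[receiver], recv_ev)
                changed = True
    return line

# A sends to B, B checkpoints and sends to C, C checkpoints and replies to A.
checkpoints = {"A": [0, 3], "B": [0, 2], "C": [0, 1]}
messages = [("A", 1, "B", 1), ("B", 3, "C", 1), ("C", 2, "A", 3)]
line = recovery_line(checkpoints, messages, failed="B")
```

In this scenario B's failure undoes its message to C, C's rollback undoes its reply to A, and A's rollback undoes the message B had already recorded, so all three branches end up at position 0 and every checkpoint they took is wasted.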
Types of Checkpointing Techniques
1. Uncoordinated (Independent) Checkpointing:
Each process takes its checkpoints independently, without communicating with the others.
It’s simple to implement but can cause a domino effect, because the independently taken checkpoints may not form a consistent global state.
Example:
In a distributed weather monitoring system, if each sensor saves its data independently, one sensor failure can disturb overall system consistency.
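Whether a set of independently taken checkpoints "matches" can be checked mechanically. The sketch below assumes a simple model that is not from the slides: each process counts its local events, a checkpoint position is the number of events it covers, and a message is (sender, send_event, receiver, recv_event). The names `orphan_messages`, `is_consistent`, `sensor` and `aggregator` are hypothetical.

```python
def orphan_messages(cut, messages):
    """
    cut:      {process: checkpoint position chosen for that process}
    messages: list of (sender, send_event, receiver, recv_event)
    Returns the messages whose receipt is recorded in the receiver's
    checkpoint while the send is missing from the sender's checkpoint.
    """
    return [
        (sender, send_ev, receiver, recv_ev)
        for sender, send_ev, receiver, recv_ev in messages
        if send_ev > cut[sender] and recv_ev <= cut[receiver]
    ]

def is_consistent(cut, messages):
    """A set of checkpoints is usable for recovery only if it has no orphans."""
    return not orphan_messages(cut, messages)

# The sensor sends a reading as its 3rd event; the aggregator receives it as its 1st.
messages = [("sensor", 3, "aggregator", 1)]
stale_cut = {"sensor": 2, "aggregator": 1}    # sensor checkpointed before sending
fresh_cut = {"sensor": 3, "aggregator": 1}    # sensor checkpointed after sending
stale_ok = is_consistent(stale_cut, messages)
fresh_ok = is_consistent(fresh_cut, messages)
```

With the stale checkpoint, the aggregator remembers a reading that the restored sensor never sent, so that pair of checkpoints cannot be used together and one side has to roll back further.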

2. Coordinated Checkpointing:
All processes coordinate so that their checkpoints are taken together and form one consistent global state.
This avoids the domino effect, since every process can restart from the same set of checkpoints.
Example:
In online ticket booking, all modules like payment, seat booking, and confirmation take a checkpoint together.
So, if any failure occurs, the system can restore cleanly from that consistent point.
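One common way to coordinate is a two-phase exchange: every process first takes a tentative checkpoint, and the checkpoints become permanent only if all of them succeeded. The sketch below is a single-threaded toy under assumptions of our own: the `Process` class and the `can_checkpoint` switch are hypothetical, and it leaves out what a real protocol must also handle, such as messages that are in transit between the two phases.

```python
class Process:
    def __init__(self, name, state, can_checkpoint=True):
        self.name = name
        self.state = state
        self.can_checkpoint = can_checkpoint   # hypothetical failure switch
        self.tentative = None
        self.permanent = None

    def take_tentative(self):
        if not self.can_checkpoint:
            return False
        self.tentative = dict(self.state)      # copy of the current state
        return True

    def commit(self):
        self.permanent, self.tentative = self.tentative, None

    def abort(self):
        self.tentative = None

def coordinated_checkpoint(processes):
    """Phase 1: everyone takes a tentative checkpoint.
    Phase 2: commit if every process succeeded, otherwise discard them all."""
    ok = all([p.take_tentative() for p in processes])   # list: ask every process
    for p in processes:
        if ok:
            p.commit()
        else:
            p.abort()
    return ok

modules = [
    Process("payment", {"paid": 1}),
    Process("seat_booking", {"held": 2}),
    Process("confirmation", {"sent": 0}),
]
first_round = coordinated_checkpoint(modules)
```

Because the permanent checkpoints are replaced all together or not at all, the modules never end up with a mix of old and new checkpoints.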

3. Communication-Induced Checkpointing:
Here, processes take some checkpoints on their own, and additional (forced) checkpoints are triggered automatically when they communicate, based on information attached to messages.
It avoids explicit coordination messages while still keeping the checkpoints consistent.
Example (Collaborative Document Editing):
In Google Docs, multiple users edit the same document. Checkpoints are automatically saved when users make changes. If the system crashes, the latest edits are safe and the document stays consistent.
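One way to realize "information attached to messages" is an index-based scheme: each process numbers its checkpoints, attaches its current number to every message, and a receiver that sees a higher number takes a forced checkpoint before delivering the message. The sketch below illustrates that idea only. The `Process` class and its method names are hypothetical, and a real protocol would store checkpoints on stable storage.

```python
class Process:
    def __init__(self, name):
        self.name = name
        self.index = 0                    # index of the latest local checkpoint
        self.state = []                   # payloads delivered so far
        self.checkpoints = [(0, [])]      # (index, copy of state)

    def _checkpoint(self, index):
        self.index = index
        self.checkpoints.append((index, list(self.state)))

    def basic_checkpoint(self):
        """Checkpoint taken on the process's own schedule."""
        self._checkpoint(self.index + 1)

    def send(self, payload):
        """Attach the current checkpoint index to the outgoing message."""
        return {"index": self.index, "payload": payload}

    def receive(self, message):
        """Take a forced checkpoint first if the sender's index is ahead."""
        if message["index"] > self.index:
            self._checkpoint(message["index"])
        self.state.append(message["payload"])

p, q = Process("p"), Process("q")
p.basic_checkpoint()                  # p moves to index 1 on its own
q.receive(p.send("edit-1"))           # q is behind, so it checkpoints before delivery
p.receive(q.send("edit-2"))           # indexes are equal, no forced checkpoint
```

In this run q's forced checkpoint is taken before "edit-1" is delivered, so the checkpoints with index 1 on p and q record neither the send nor the receipt of that message and can be restored together.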

4. Log-Based Rollback Recovery:
This method uses both checkpoints and message/event logs.
It assumes the system’s behavior is piecewise deterministic (PWD): execution between nondeterministic events (such as receiving a message) is deterministic, so replaying the same logged events reproduces the same state.
After a failure, the system restores the last checkpoint and replays the logged events to rebuild the exact pre-failure state.
Example:
In databases, every transaction (like insert, update, delete) is logged.
If a crash happens, the database replays those logs to recover completed transactions safely.
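Checkpoint-plus-replay can be shown with a toy account whose only nondeterministic events are incoming amounts. Everything named here is an assumption for illustration: the `Account` class, its methods, and the in-memory list that stands in for a log on stable storage. The piecewise-deterministic assumption shows up in `apply`, which must give the same result every time it is run on the same logged amount.

```python
class Account:
    def __init__(self):
        self.balance = 0
        self.log = []                 # stands in for a log on stable storage
        self.checkpoint = (0, 0)      # (balance, number of log entries covered)

    def apply(self, amount):
        """Deterministic state change for one logged event."""
        self.balance += amount

    def deliver(self, amount):
        self.log.append(amount)       # log the event first, then apply it
        self.apply(amount)

    def take_checkpoint(self):
        self.checkpoint = (self.balance, len(self.log))

    def recover(self):
        """Restore the checkpoint, then replay the events logged after it."""
        balance, covered = self.checkpoint
        self.balance = balance
        for amount in self.log[covered:]:
            self.apply(amount)

account = Account()
account.deliver(100)
account.deliver(-30)
account.take_checkpoint()             # checkpoint records balance 70
account.deliver(50)
account.deliver(-20)
before_crash = account.balance
account.balance = None                # simulated crash: volatile state is lost
account.recover()
```

Recovery restores the checkpointed balance and replays the two amounts logged after it, which brings the account back to the balance it had just before the crash rather than only to the checkpoint.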

ADVANTAGES:
1. Fault Tolerance:
• Helps the system recover automatically after a failure without restarting completely.
2. Reduced Re-computation:
• Only the work done after the last checkpoint is lost, saving time and effort.
3. Consistency:
• Maintains a consistent state across all distributed processes after recovery.
4. Reduced Data Loss:
• Because states are saved periodically, very little data is lost during a crash.
5. System Reliability:
• Increases overall dependability and stability of distributed applications.

DISADVANTAGES:
1. Overhead:
• Saving checkpoints frequently uses more CPU, memory, and storage space.
2. Coordination:
• Synchronizing checkpoints among multiple processes is difficult.
3. Domino Effect:
• In uncoordinated checkpointing, one rollback can cause many others to roll back too.
4. Storage Requirement:
• Large storage is needed to maintain multiple checkpoints and logs.
5. Delay:
• Checkpointing and recovery can slow down normal system performance.
