CS 194: Two-Phase Commit Protocol

This document summarizes key concepts related to distributed commit and recovery in distributed systems:
1) Two-phase commit (2PC) is a protocol that allows processes to agree whether to commit or abort a transaction in a distributed system despite failures. It uses a coordinator and participants.
2) Stable storage is used to log actions during 2PC to allow processes to recover after crashes and determine their commit decision.
3) Checkpointing involves periodically saving process state to stable storage to enable backward or forward recovery from failures through rollback or restart from a known good state. Message logging is another recovery technique.

Uploaded by

Karthik Kannan

CS 194: Distributed Systems
Distributed Commit, Recovery

Scott Shenker and Ion Stoica
Computer Science Division
Department of Electrical Engineering and Computer Sciences
University of California, Berkeley
Berkeley, CA 94720-1776

Distributed Commit

Goal: Either all members of a group decide to perform an operation, or none of them perform the operation

Assumptions

Failures:
- Crash failures that can be recovered
- Communication failures detectable by timeouts

Notes:
- Commit requires a set of processes to agree, similar to the Byzantine generals problem, but the solution is much simpler because the assumptions are stronger

Two Phase Commit (2PC)

Coordinator
- send VOTE_REQ to all
- if (all votes yes): decide commit, send COMMIT to all
- else: decide abort, send ABORT to all who voted yes
- halt

Participants
- send vote to coordinator
- if (vote == no): decide abort, halt
- if receive ABORT, decide abort; else decide commit
- halt
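The message exchange above can be sketched as a small failure-free simulation. This is only an illustration under simplifying assumptions: there are no crashes or timeouts, messages are modeled as plain function logic, and the function name `two_phase_commit` and the dictionary layout are hypothetical, not part of the lecture.

```python
from typing import Dict


def two_phase_commit(votes: Dict[str, bool]) -> Dict[str, str]:
    """Simulate one failure-free 2PC round.

    votes maps a participant name to its vote (True = yes, False = no).
    Returns the decision of the coordinator and of every participant.
    """
    decisions: Dict[str, str] = {}

    # Phase 1: the coordinator sends VOTE_REQ and each participant votes.
    # A participant that votes no decides abort on its own.
    for name, vote in votes.items():
        if not vote:
            decisions[name] = "abort"

    # Phase 2: the coordinator commits only if every vote is yes.
    outcome = "commit" if all(votes.values()) else "abort"
    decisions["coordinator"] = outcome

    # COMMIT goes to everyone; ABORT goes only to those who voted yes,
    # since the no-voters have already decided abort.
    for name, vote in votes.items():
        if vote:
            decisions[name] = outcome
    return decisions


print(two_phase_commit({"p1": True, "p2": True}))
print(two_phase_commit({"p1": True, "p2": False}))
```

In the second call a single no vote forces every process to abort, which matches the goal that either all members perform the operation or none do.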

2PC State Machine

Figure: a) The finite state machine for the coordinator in 2PC. b) The finite state machine for a participant.

2PC: Crash Recovery Protocol

Stable storage is persistent memory that supports writes that are atomic with respect to failures

Log actions:
- c sends VOTE_REQ: write start
- p votes YES: write yes
- p votes NO: write abort
- c decides commit: write commit (commit point)
- c decides abort: write abort
- p receives decision: write decision

2PC: Crash Recovery Protocol

Upon recovery a process r starts reading the values logged to stable storage.

If there is a start then r was the coordinator:
- If there is a subsequent abort or commit then the decision was made; otherwise decide abort.

Otherwise, r was a participant:
- If there is abort or commit then the decision was made;
- If there is no yes then decide abort.
- Otherwise (i.e., there is a yes record) run termination protocol.

... when can these records be garbage collected?
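The recovery rules above amount to a scan of the log records. The sketch below assumes the log is a list of the record strings used on the previous slide (`start`, `yes`, `commit`, `abort`); the function name `recover_2pc` and the returned strings are hypothetical.

```python
from typing import List

DECISIONS = ("commit", "abort")


def recover_2pc(log: List[str]) -> str:
    """Return what a recovering process does, given its stable-storage log."""
    if "start" in log:
        # r was the coordinator: look for a decision logged after start.
        after_start = log[log.index("start") + 1:]
        decided = [rec for rec in after_start if rec in DECISIONS]
        return decided[-1] if decided else "abort"

    # r was a participant.
    decided = [rec for rec in log if rec in DECISIONS]
    if decided:
        return decided[-1]          # the decision was already made
    if "yes" not in log:
        return "abort"              # never voted yes, so abort is safe
    return "run termination protocol"  # voted yes, outcome unknown


print(recover_2pc(["start"]))          # coordinator crashed before deciding
print(recover_2pc(["yes"]))            # participant is uncertain
print(recover_2pc(["yes", "commit"]))  # participant already knows the outcome
```

The only case that cannot be resolved locally is a participant whose log holds a yes record and no decision, which is why that branch hands off to the termination protocol.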

Recovery Techniques: Checkpoints

Goal: recover a process from error

Backward recovery: checkpoint the state of the process periodically
- Go to previous checkpoint, if error
- Problem: same failure may repeat

Forward recovery: go to a known good state if error

- Problem: need to know in advance which error may occur

Example: Reliable Communication

Backward recovery: retransmit packet if lost

Forward recovery: use erasure coding

- Instead of sending k packets, send n > k using erasure coding
- As long as receiver gets at least k packets out of n, it can reconstruct the original k packets
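As a minimal illustration of forward recovery, the sketch below uses a single XOR parity packet, which is the simplest erasure code: n = k + 1, and any one lost packet can be rebuilt. It assumes all packets have equal length; the function names are hypothetical. Codes that tolerate more than one loss (for example Reed-Solomon) follow the same idea but are not shown here.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(packets: List[bytes]) -> List[bytes]:
    """Append one XOR parity packet: k data packets become n = k + 1."""
    return packets + [reduce(xor_bytes, packets)]


def decode(received: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the k data packets; a lost packet is represented by None."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost packet")
    repaired = list(received)
    if missing:
        present = [p for p in received if p is not None]
        # XOR of the k surviving packets equals the missing one.
        repaired[missing[0]] = reduce(xor_bytes, present)
    return repaired[:-1]  # drop the parity packet


data = [b"abcd", b"efgh", b"ijkl"]
sent = encode(data)
arrived = [sent[0], None, sent[2], sent[3]]  # second packet lost in transit
print(decode(arrived) == data)
```

No retransmission is needed here: the receiver repairs the loss from the packets it already holds, at the cost of sending one extra packet up front.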

Recovery Techniques: Message Logging

Sender based: sender logs message before sending it out

Receiver based: receiver logs message before delivering it

Replay logged messages between checkpoints to restore state beyond the most recent checkpoint
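A small sketch of receiver-based logging combined with a checkpoint follows. It assumes a deterministic process whose state is just a running total, so that replaying the same messages in the same order reproduces the same state; the class `Counter` and its methods are hypothetical.

```python
from typing import List


class Counter:
    """Hypothetical deterministic process: its state is a running total."""

    def __init__(self) -> None:
        self.total = 0
        self.checkpoint = 0        # last state saved to stable storage
        self.log: List[int] = []   # messages logged since that checkpoint

    def deliver(self, msg: int) -> None:
        self.log.append(msg)       # receiver based: log before delivering
        self.total += msg

    def take_checkpoint(self) -> None:
        self.checkpoint = self.total
        self.log.clear()           # older messages are covered by the checkpoint

    def crash_and_recover(self) -> None:
        self.total = self.checkpoint   # roll back to the checkpoint
        for msg in self.log:           # replay logged messages in order
            self.total += msg


p = Counter()
p.deliver(5)
p.take_checkpoint()
p.deliver(3)
p.deliver(4)
p.crash_and_recover()
print(p.total)  # state after the checkpoint is restored by replay
```

Without the log, recovery would stop at the checkpointed total of 5; the replay brings the process back to the state it had just before the crash.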

Distributed Checkpointing: Recovery Line

Recovery line: most recent snapshot

- If a process P has recorded the receipt of message m, there should be a process Q that recorded the sending of message m

How do you find a recovery line?
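The consistency condition above can be checked mechanically. The sketch below assumes each checkpoint records the ids of the messages the process has sent and received so far, and that message ids are globally unique; the `Checkpoint` type and `is_recovery_line` function are hypothetical names.

```python
from typing import Dict, NamedTuple, Set


class Checkpoint(NamedTuple):
    sent: Set[str]       # ids of messages recorded as sent
    received: Set[str]   # ids of messages recorded as received


def is_recovery_line(cut: Dict[str, Checkpoint]) -> bool:
    """True if every message recorded as received is also recorded as sent."""
    all_sent = set().union(*(c.sent for c in cut.values()))
    all_received = set().union(*(c.received for c in cut.values()))
    return all_received <= all_sent


consistent = {
    "P": Checkpoint(sent=set(), received={"m"}),
    "Q": Checkpoint(sent={"m"}, received=set()),
}
inconsistent = {
    "P": Checkpoint(sent=set(), received={"m"}),
    "Q": Checkpoint(sent=set(), received=set()),  # Q checkpointed before sending m
}
print(is_recovery_line(consistent), is_recovery_line(inconsistent))
```

A message that was sent but not yet received is allowed in a recovery line; only a receipt without a matching send makes the set of checkpoints inconsistent.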

Independent Checkpointing: The Domino Effect

Domino effect: cascaded rollback to find a recovery line

Solutions:
- Coordinated checkpointing: use a two-phase non-blocking protocol (see the book)
- Logging and replaying messages
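The cascade can be reproduced with a small rollback search over independently taken checkpoints. This is a sketch under stated assumptions, not the algorithm from the book: each checkpoint is a pair of (sent, received) message-id sets, checkpoint 0 of every process is the initial state with both sets empty, and the function name `find_recovery_line` is hypothetical.

```python
from typing import Dict, List, Set, Tuple

History = Dict[str, List[Tuple[Set[str], Set[str]]]]


def find_recovery_line(history: History) -> Dict[str, int]:
    """Return, per process, the index of the checkpoint on the recovery line.

    history[p] lists p's checkpoints, oldest first, as (sent, received) sets.
    """
    line = {p: len(cps) - 1 for p, cps in history.items()}
    changed = True
    while changed:
        changed = False
        sent = set().union(*(history[p][i][0] for p, i in line.items()))
        for p, i in line.items():
            # A message recorded as received but not as sent is an orphan.
            if i > 0 and not history[p][i][1] <= sent:
                line[p] = i - 1   # roll p back one checkpoint, then re-check
                changed = True
                break
    return line


# P and Q checkpoint independently while exchanging messages a, b, c.
history = {
    "P": [(set(), set()), ({"a"}, {"b"})],
    "Q": [(set(), set()), (set(), {"a"}), ({"b"}, {"a", "c"})],
}
print(find_recovery_line(history))
```

In this example Q's latest checkpoint records receiving c, which P never recorded sending, so Q rolls back; that undoes the send of b, which forces P back, which in turn undoes the send of a and forces Q back again. Both processes end at their initial state, which is the domino effect.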

Message Logging and Checkpointing

Incorrect replay of messages after recovery, leading to an orphan process

Stable Storage

Storage designed to survive anything except major calamities

Use two disks to record identical information

1) Write and verify sector on disk 1
2) Write and verify sector on disk 2

Recovery
- Verify all sectors
- If two corresponding sectors differ, copy sector from disk 1 to disk 2
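The two-disk write and recovery procedure can be sketched as follows. The model is an assumption made for illustration: each disk is a Python list of sectors, a sector is a (data, CRC32) pair, and a crash is simulated with a flag. Restoring a sector with a bad checksum from the other disk is also an assumption added here to cover the bad-spot case; the slide itself only states the rule for sectors that differ.

```python
import zlib
from typing import List, Tuple

Sector = Tuple[bytes, int]  # (data, CRC32 checksum of data)


def make_sector(data: bytes) -> Sector:
    return (data, zlib.crc32(data))


def is_valid(sector: Sector) -> bool:
    data, checksum = sector
    return zlib.crc32(data) == checksum


def stable_write(disk1: List[Sector], disk2: List[Sector], i: int,
                 data: bytes, crash_after_disk1: bool = False) -> None:
    disk1[i] = make_sector(data)   # 1) write sector on disk 1
    if crash_after_disk1:
        return                     # simulated crash before disk 2 is updated
    disk2[i] = make_sector(data)   # 2) write sector on disk 2


def recover_disks(disk1: List[Sector], disk2: List[Sector]) -> None:
    for i in range(len(disk1)):
        if not is_valid(disk1[i]):
            disk1[i] = disk2[i]    # bad spot on disk 1: restore from disk 2
        elif disk1[i] != disk2[i]:
            disk2[i] = disk1[i]    # sectors differ: copy disk 1 to disk 2


d1 = [make_sector(b"old")]
d2 = [make_sector(b"old")]
stable_write(d1, d2, 0, b"new", crash_after_disk1=True)
recover_disks(d1, d2)
print(d1[0][0], d2[0][0])  # both disks hold the new data after recovery
```

Because disk 1 is always written first, a mismatch after a crash means disk 1 holds the newer data, which is why the copy goes from disk 1 to disk 2.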

Stable Storage Recovery

Figure: a) Stable storage. b) Crash after drive 1 is updated. c) Bad spot.
