0% found this document useful (0 votes)
42 views15 pages

Understanding the Raft Algorithm

The Raft Consensus Algorithm is designed to achieve consensus in distributed systems, ensuring agreement on shared state and recovering from server failures. It features leader election, log replication, and safety mechanisms to maintain consistency even in the event of leader crashes. Raft aims to be more understandable than previous algorithms like Paxos, making it easier to implement in real-world systems.

Uploaded by

ayush gupta
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
42 views15 pages

Understanding the Raft Algorithm

The Raft Consensus Algorithm is designed to achieve consensus in distributed systems, ensuring agreement on shared state and recovering from server failures. It features leader election, log replication, and safety mechanisms to maintain consistency even in the event of leader crashes. Raft aims to be more understandable than previous algorithms like Paxos, making it easier to implement in real-world systems.

Uploaded by

ayush gupta
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd

The Raft Consensus Algorithm

Originally by :Diego Ongaro John Ousterhout


Stanford University

Edited using content from the paper as well.

[Link]
What is Consensus?
● Agreement on shared state (single system image)
● Recovers from server failures autonomously
 Minority of servers fail: no problem
 Majority fail: lose availability, retain consistency

Servers
● Key to building consistent storage systems

September 2014 Raft Consensus Algorithm Slide 2


Inside a Consistent System
● TODO: eliminate single point of failure
● An ad hoc algorithm
 “This case is rare and typically occurs as a result
of a network partition with replication lag.”

– OR –

● A consensus algorithm (built-in or library)


 Paxos, Raft, …
● A consensus service
 ZooKeeper, etcd, consul, …
September 2014 Raft Consensus Algorithm Slide 3
Replicated State Machines
Clients
z6
Consensus State Consensus State Consensus State
Module Machine Module Machine Module Machine
x 1 x 1 x 1
y 2 y 2 y 2
Servers
z 6 z 6 z 6
Log Log Log
x3 y2 x1 z6 x3 y2 x1 z6 x3 y2 x1 z6

● Replicated log  replicated state machine


 All servers execute same commands in same order

● Consensus module ensures proper log replication

● System makes progress as long as any majority of servers are up

● Failure model: fail-stop (not Byzantine), delayed/lost messages


September 2014 Raft Consensus Algorithm Slide 4
Paxos Protocol
● Leslie Lamport, 1989
● Nearly synonymous with consensus

“The dirty little secret of the NSDI community is that at


most five people really, truly understand every part of
Paxos ;-).” – NSDI reviewer

“There are significant gaps between the description of


the Paxos algorithm and the needs of a real-world
system…the final system will be based on an unproven
protocol.” – Chubby authors

September 2014 Raft Consensus Algorithm Slide 5


Raft’s Design for
Understandability
● We wanted the best algorithm for building real
systems
 Must be correct, complete, and perform well
 Must also be understandable
● “What would be easier to understand or explain?”
 Fundamentally different decomposition than Paxos
 Less complexity in state space
 Less mechanism

September 2014 Raft Consensus Algorithm Slide 6


Raft Overview
1. Leader election
 Select one of the servers to act as cluster leader
 Detect crashes, choose new leader
2. Log replication (normal operation)
 Leader takes commands from clients, appends them
to its log
 Leader replicates its log to other servers (overwriting
inconsistencies)
3. Safety
 How to keep log consistent even on leader crash.
Could crash half way between log replication.
 Only servers with uptodate logs can be leaders
September 2014 Raft Consensus Algorithm Slide 7
Leader election
Time out (random numbers ) on not receiving any heart beat.
First server to time out will send out leader election requests.
If gets vote from majority then he considers himself as leader and
starts sending heartbeat msgs
For every heart beat received all reset their timeout value

Log replication
Client when wants to write sends to leader.
Leader writes to its log
Sends requests to other servers to write into log
After replication done then marks that it can be used by state machine

September 2014 Raft Consensus Algorithm Slide 8


Terms

Detect obsolete leader.


One term is for one leader. When leader crashes then next term.
No global view of terms. Each server has his own understanding of term numbe
Updated when servers talk to each other and ensures he is follower if leader.
If gets client request then he wont execute and just steps down.

September 2014 Raft Consensus Algorithm Slide 9


Core Raft Review
1. Leader election
 Heartbeats and timeouts to detect crashes
 Randomized timeouts to avoid split votes
 Majority voting to guarantee at most one leader per term

2. Log replication (normal operation)


 Leader takes commands from clients, appends them to its log
 Leader replicates its log to other servers
 Built-in consistency check simplifies how logs may differ

3. Safety
 Only elect leaders with all committed entries in their logs
 New leader defers committing entries from prior terms

September 2014 Raft Consensus Algorithm Slide 10


Leader Operation
● Client sends request to leader
● Leader appends to its log and sends it to followers
to append
● Once leader gets enough response from followers
he then executes on his state machine and tells
client. For next RPCS it also informs other followers
who also executes on their SM.
● Leader keeps connecting to followers till its done if
follower crashes meanwhile

Assumption: majority of servers always up.

September 2014 Raft Consensus Algorithm Slide 11


Log entries

September 2014 Raft Consensus Algorithm Slide 12


Inconsistencies: Can this happen?

September 2014 Raft Consensus Algorithm Slide 13


Conclusions
● Consensus widely regarded as difficult
● Raft designed for understandability
 Easier to teach in classrooms
 Better foundation for building practical systems
● Paper/thesis covers much more
 Cluster membership changes (simpler in thesis)
 Log compaction (expanded tech report/thesis)
 Client interaction (expanded tech report/thesis)
 Evaluation (thesis)

September 2014 Raft Consensus Algorithm Slide 14


Questions

[Link]

September 2014 Raft Consensus Algorithm Slide 15

You might also like