0% found this document useful (0 votes)
9 views29 pages

Understanding PBFT in Distributed Systems

The document provides an overview of distributed consensus, focusing on algorithms like Paxos, Raft, and Practical Byzantine Fault Tolerance (PBFT). It explains the mechanisms of these algorithms, their advantages, and the challenges they address in achieving consensus in distributed systems. Additionally, it discusses the CAP theorem, which highlights the limitations in simultaneously achieving consistency, availability, and partition tolerance in distributed systems.

Uploaded by

Hosam Mohammed
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
9 views29 pages

Understanding PBFT in Distributed Systems

The document provides an overview of distributed consensus, focusing on algorithms like Paxos, Raft, and Practical Byzantine Fault Tolerance (PBFT). It explains the mechanisms of these algorithms, their advantages, and the challenges they address in achieving consensus in distributed systems. Additionally, it discusses the CAP theorem, which highlights the limitations in simultaneously achieving consistency, availability, and partition tolerance in distributed systems.

Uploaded by

Hosam Mohammed
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd

Distributed

Consensus
Supervisor

Dr. Ahmed Salem


Team Name 0 Abdullah Helmy Rashad
1
Innovision

02 Mahmoud Ibrahim Kamal

03 Abdelrahman Mohamed
Leve
Team Work l
Four ( 4 )
Abdelgwad

04 Ahmed Mahmoud Ahmed


Ibrahim

05 Abdelrahman Atef Mohamed


Course
Distributed 06 Abdelghfar Salah Abdelghfar
Systems
Definition
 Distributed consensus is a process used in distributed
systems to ensure that multiple nodes (computers or processes)
in a network agree on a single data value or decision.
Consensus Algorithms
 Paxos: A highly influential algorithm designed to achieve consensus in
a fault-tolerant manner.
 Raft: Simplifies Paxos and is widely used in systems like etcd.
 Byzantine Fault Tolerance (BFT): Handles malicious or faulty nodes,
used in systems like blockchain.
Paxos
 Paxos is a fault-tolerant distributed consensus algorithm
designed to enable a group of nodes to agree on a single
value, even in the presence of failures.
 Paxos Essentials
• We assume a client-server configuration, with initially one primary
server.
• In Paxos, the primary server is called the leader.
• To make the server more robust, we start with adding a backup server.
• To ensure that all commands are executed in the same order at both
servers, the primary server assigns unique sequence numbers to all
commands.
Paxos
 Some Paxos Terminology
• The leader sends an accept message ACCEPT(o, t) to backups when
assigning a timestamp t to command o
• A backup responds by sending a learn message: LEARN(o, t)
• When the leader notices that operation o has not yet been learned, it
retransmits ACCEPT(o, t) with the original timestamp.
Paxos
 Two-server situation
Paxos
 Two servers and one crash: problem

 Problem
• Primary crashes after executing an operation, but the backup never received
the accept message.
Paxos
 Two servers and one crash: solution

 Solution
• Never execute an operation before it is clear that is has been learned.
Paxos
 Reliable failure detection is practically impossible. A solution is to set timeouts,
but take into account that a detected failure may be false.
Paxos
 Observation
• Paxos needs at least three servers.

 Adapted fundamental rule


• In Paxos with three servers, a server S cannot execute an operation o until it
has received at least one (other) LEARN(o) message, so that it knows that a
majority of servers will execute o.
Paxos
 Three servers and two crashes: still a problem

 Scenario
• What happens when LEARN(o1) as sent by S2 to S1 is lost?
 General rule
• In Paxos, a server S cannot execute an operation o until it has received a LEARN(o)
from all other nonfaulty servers.
Paxos
 Why having 3k processes is not enough.
Paxos
 Why having 3k + 1 processes is enough.
Raft
 It is a consensus algorithm used in distributed systems to ensure
consistency of data and provide service even in the event of
breakdowns. Raft is especially designed to be easier to understand
and implement compared to other consensus algorithms such as
Paxos.
Why Raft Algorithm ? 09/25

Raft's Advantages: Simplicit


y
 Simplicity and understandability. Practical
implementation
 Practical implementation in distributed systems. Reliable
Distribut
 Strong consistency guarantees. Strong ed
consisten consens
 Flexible: Able to handle server breakdowns and cy us

change leaders easily. Flexible


How does the raft work ?
1. Leader election:
• The algorithm begins by electing a single HB

server to be the leader.


• This leader is responsible for making Followers
decisions and coordinating work among
Leader
the other servers.

(Note)
The leader in Raft provides centralized control, prevents conflicts,
improves performance, and ensures fault tolerance in distributed
systems. (*HB: heartbeat )

See how RAFT works


How does the raft work ?
• If leader is dead
Timeout
Candidate Timeout = 120ms
err(leader dead)
node Log length = 4

Candidate Timeout Timeout = 110ms


Leader node err(leader dead)
Log length = 4

Candidate Timeout Candidate


node err(leader dead) node Timeout = 100ms
Log length = 4

Candidate Timeout Candidate Timeout = 100ms


node err(leader dead) node
Log length = 2

Term:1 Term:2
How does the raft
work ?
2. Registration of Operations:
• The leader records all operations (such as adding a new
item, updating value, etc.) in a Log. Follower

• This record is copied to all other servers. nd


a
m
m
co a nd
m m Follower
co

comma
CLIENT
Request LEADER nd
Follower

comm
and

Follower

(Note)
If the follower receives a request, he forwards it to the leader.
How does the Raft
work?

3. Agreement on Operations:
• The process cannot be finalized until a majority of servers
have agreed to it.
Follower
ri es
Steps: endE nt
App
Sending: The leader broadcasts the log entry
to all followers. AppendEntries Follower
LEADER
Processing: Followers verify the log entry and
update their logs if valid, then acknowledge the Appe
leader. ndE ntrie
s Follower
Confirmation: The leader commits the log
entry once a majority of followers
acknowledge.
How does the raft
work?

[Link] to operations:
• Once all servers agree on a process, this process is
committed and cannot be undone
Paxos vs Raft

Aspect Paxos Raft


Simpler to understand and
Simplicity More complex and less intuitive
implement

Leader Election Implicit leader election Explicit leader election process

More complex log replication


Log Replication Leader-centric log replication
mechanism
Can be less efficient due to its Generally considered faster
Performance
complexity and more efficient
Practical Byzantine Fault Tolerance (PBFT)
 Refers to the ability of a distributed system to achieve consensus even if some
of the nodes exhibit Byzantine faults

 Assumptions
• A server may exhibit arbitrary failures
• Messages may be lost, delayed, and received out of order
• Messages have an identifiable sender (i.e., they are signed)
• Partially synchronous execution model

 Note
• A primary-backup approach with 3k + 1 replica servers
PBFT: four phases

• C is the client
• P is the primary
• B1, B2 , B3 are backups
• Assume B2 is faulty
PBFT: four phases(PRE-PREPARE)

• C requests operation o to be executed


• P timestamps o and sends PRE-PREPARE
• Backup Bi accepts the pre-prepare message if it is also is in v and has
not accepted a an operation with timestamp t before.
PBFT: four phases(PREPARE)

• Bi broadcasts PREPARE(t, v, o) to all (including the primary)


• Note: a nonfaulty server will eventually log 2k messages PREPARE(t, v, o)
(including its own) ⇒ consensus on the ordering of o.
• Note: it doesn’t matter what faulty B2 sends, it cannot affect joint
decisions by P, B1, B3 .
PBFT: four phases(COMMIT)

• All servers broadcast COMMIT(t, v, o)


• The commit is needed to also make sure that o can be executed now,
that is, in the current view v .
• When 2k messages have been collected, excluding its own, the server
can safely execute o then reply to the client.
CAP Theorem

 The CAP theorem states that a distributed computer system cannot guarantee
all of the following three properties at the same time.

• C: consistency, by which a shared and replicated data item appears as a


single, up-to-date copy
• A: availability, by which updates will always be eventually executed
• P: Tolerant partitioning of process group.

 Why can't the three characteristics be achieved together?


• The main reason is that these properties conflict with each other. For example, if we
want to achieve full consistency, we may have to delay some operations until we
make sure that all contracts are agreed on the same situation, which affects
availability.
Final
Slide

Thank You

Common questions

Powered by AI

Paxos has an implicit leader election process, often considered more complex, leading to its reputation as a challenging algorithm for understanding leader coordination. Raft, on the other hand, uses an explicit leader election, simplifying the process by establishing clear leader responsibilities. In terms of log replication, Paxos uses a more complex mechanism, whereas Raft is leader-centric, where the leader replicates logs to followers, ensuring simplicity and speed in committing operations upon achieving a majority agreement .

Distributed consensus algorithms such as Paxos and Raft require more than a single server to function effectively because these algorithms rely on achieving consensus among nodes, particularly in fault tolerance scenarios. Paxos expressly requires at least three servers to handle failures and maintain system operations through majority agreements on operations, which secure resilience in the face of node or communication failures. Additionally, multiple servers ensure that there is a fallback mechanism in case the leader fails, enabling seamless continuation and coordination of operations .

Paxos addresses the potential issue of a primary server crash, which may occur after executing an operation without the backup server receiving the accept message, by ensuring that an operation is never executed before it is confirmed as learned. This means that the operation execution is delayed until the leader is certain that the majority of servers have agreed to execute the operation. This is accomplished by receiving 'LEARN' messages from other servers, indicating acknowledgment from a majority .

The separation of phases in PBFT, namely Pre-Prepare, Prepare, and Commit, enhances its capacity to handle Byzantine faults by structuring the consensus process such that each phase provides successive levels of validation. Each phase requires a certain number of node agreements, mitigating the impact of any faulty nodes on consensus decisions. For example, the Prepare phase involves receiving 2k PREPARE messages to ensure that honest nodes agree on the operation's timestamp, while the Commit phase requires 2k COMMIT messages, including no faulty influences, before execution. This modular approach ensures rigorous agreement and prevents faulty nodes from compromising the system's integrity .

In Paxos, reliable failure detection is practically addressed using timeouts to assume a failure has occurred when expected responses are not received. However, this approach has inherent limitations, such as the possibility of false positives, where a timeout implies failure even if the server is running correctly but experiencing temporary network delays. This highlights the critical challenge of adjusting timeout durations wisely to mitigate incorrect failure detection while maintaining quick responsiveness to actual failures .

In the Raft algorithm, the leader node has central control responsibilities, coordinating tasks such as logging operations and ensuring followers replicate these logs. The leader ensures consistency and handles incoming requests. When a leader fails, followers detect the absence of heartbeats within a timeout period, triggering the election of a new leader. Candidate nodes solicit votes from followers, and the candidate with a majority wins. This rapid leader replacement ensures system robustness and continuity of operations without prolonged downtime .

Raft offers several advantages over Paxos, particularly in terms of simplicity and implementation. Raft is designed to be easier to understand, featuring a clear and explicit leader election process which improves system performance and fault tolerance. Its leader-centric log replication simplifies its model compared to Paxos, aiding in faster processing and decision-making. Raft's straightforward design enables more practical implementations in real-world distributed systems, making it more accessible for developers to implement and maintain .

The CAP theorem has a significant influence on the design of distributed systems by emphasizing that a system cannot simultaneously guarantee consistency, availability, and partition tolerance. Designing a system involves trading off between these properties. For instance, ensuring consistency often requires delaying availability during partition-tolerant scenarios, thus affecting system responsiveness. Similarly, prioritizing availability may lead to inconsistencies during network partitions. These limitations or trade-offs challenge designers to decide which aspects are most critical based on their specific application requirements .

The key phases of PBFT are Pre-Prepare, Prepare, and Commit. In the Pre-Prepare phase, the primary node timestamps the client's request and sends a PRE-PREPARE message to backups, ensuring the initial agreement on processing. In the Prepare phase, each backup broadcasts a PREPARE message to confirm receipt and agreement on the operation timing, helping form a consensus despite Byzantine faults. Finally, in the Commit phase, all nodes broadcast a COMMIT message to finalize and execute the operation once 2k COMMIT messages are collected, ensuring that the operation can be safely executed .

In PBFT, having '3k + 1' replicas is crucial because it ensures that the system can tolerate up to 'k' faults. The setup is designed so that while 'k' replicas can fail, the remaining '2k + 1' honest replicas can still reach a consensus. This configuration provides resilience against Byzantine faults where nodes can fail or act maliciously. By exceeding simple majority requirements, PBFT ensures that the honest replicas can override the bad or broken nodes, maintaining system integrity and reliability .

You might also like