UNIT 4
Q.1 What are the key issues in Replica Management? Explain the following with respect to
content replication and placement with suitable diagram.[9]
i) Permanent Replicas
ii) Server-Initiated Replicas
iii) Client-Initiated Replicas
→
1. Introduction
Replication is the process of maintaining multiple copies (replicas) of data or services at
different locations in a distributed system.
Replica management deals with deciding:
• Where replicas should be placed.
• How many replicas should exist.
• How replicas should be updated and maintained.
The main objective is to improve availability, reliability, fault tolerance, and performance.
2. Key Issues in Replica Management
1. Replica Placement
Determining where replicas should be stored in the network.
2. Replica Consistency
Ensuring all copies contain the same updated data.
3. Replica Selection
Choosing the best replica for serving client requests.
4. Update Propagation
Ensuring modifications are propagated to all replicas.
5. Scalability
Supporting a large number of users and replicas efficiently.
6. Fault Tolerance
Maintaining service even when some servers fail.
7. Load Balancing
Distributing requests among multiple replicas.
Content Replication and Placement
Content replication determines:
• Which content should be replicated.
• Where replicas should be placed.
• When replicas should be created or removed.
Three common approaches are:
1. Permanent Replicas
2. Server-Initiated Replicas
3. Client-Initiated Replicas
i) Permanent Replicas
Definition
Permanent replicas are fixed copies of data that are always present on designated servers.
They are created by the system administrator and remain permanently stored.
Diagram
Original Server
|
-------------------------
| | |
Replica1 Replica2 Replica3
Working
1. Original data is stored on primary server.
2. Copies are permanently maintained on several servers.
3. Clients access the nearest replica.
Advantages
• High availability.
• Improved reliability.
• Fast access for users.
Disadvantages
• Storage overhead.
• Maintenance cost.
• Consistency management required.
ii) Server-Initiated Replicas
Definition
In this approach, the server automatically creates replicas based on demand.
If a particular content becomes popular, the server generates additional replicas near users.
Diagram
Main Server
|
High Request Rate
|
v
-------------------------
| | |
Replica1 Replica2 Replica3
Working
1. Server monitors request frequency.
2. Popular content is detected.
3. Additional replicas are created automatically.
4. Requests are redirected to nearby replicas.
Advantages
• Reduces server load.
• Improves response time.
• Dynamic adaptation to workload.
Disadvantages
• Additional monitoring overhead.
• More complex management.
iii) Client-Initiated Replicas
Definition
In this approach, clients create local copies of frequently accessed data.
These replicas are usually cached near the client.
Diagram
Main Server
|
Client Request
|
Client
|
Local Replica
(Cache)
Working
1. Client requests data from server.
2. Server sends the data.
3. Client stores a local copy.
4. Future requests are served from local cache.
Advantages
• Faster access.
• Reduced network traffic.
• Lower server workload.
Disadvantages
• Cache consistency problems.
• Local storage required.
• Cached copy may become outdated.
Q.2 What is a primary-based protocol in a consistency protocol? Explain the working of
replicated write protocols with active replication.
→
Introduction
In distributed systems, multiple copies (replicas) of data are maintained to improve availability
and performance.
A Consistency Protocol ensures that all replicas remain consistent when updates occur.
One important consistency protocol is the Primary-Based Protocol, where one replica is
designated as the Primary Replica and controls all write operations.
What is a Primary-Based Protocol?
A Primary-Based Protocol is a consistency protocol in which one replica acts as the primary
server (master) and all other replicas act as backup replicas (slaves).
Main Idea
• All write operations are first performed on the primary replica.
• The primary replica propagates updates to all backup replicas.
• This ensures that all replicas remain consistent.
Diagram
Working
1. Client sends write request to primary replica.
2. Primary updates its local copy.
3. Primary propagates update to backup replicas.
4. Backups acknowledge update.
5. Client receives confirmation.
Advantages of Primary-Based Protocol
1. Simple consistency management.
2. Prevents conflicting updates.
3. Easy synchronization.
4. Maintains data consistency.
5. Suitable for distributed databases.
Active Replication
Active Replication is a replication technique in which every replica performs the same
operation in the same order.
All replicas receive identical requests and independently execute them.
As a result, every replica remains consistent.
Diagram of Active Replication
All replicas execute the operation simultaneously.
Working of Replicated Write Protocol with Active Replication
Step 1: Client Generates Write Request
Example:
Update Account Balance = 5000
Client sends request to the replication system.
Step 2: Request Ordering
A coordinator or sequencing mechanism ensures all replicas receive requests in the same
order.
Request Order:
R1 → R2 → R3
Step 3: Broadcast Request
The write request is broadcast to all replicas.
Client
|
+------> Replica 1
+------> Replica 2
+------> Replica 3
Step 4: Execute Operation
Each replica executes the same operation independently.
Example:
Balance = Balance + 5000
All replicas update their copies.
Step 5: Generate Identical State
Since all replicas execute the same operation in the same order, all reach the same state.
Replica1 = 5000
Replica2 = 5000
Replica3 = 5000
Step 6: Send Reply
One replica (or coordinator) sends the response to the client.
Operation Successful
Example
Assume three replicas maintain a bank account.
Initial Balance:
R1 = 1000
R2 = 1000
R3 = 1000
Client request:
Deposit 500
After active replication:
R1 = 1500
R2 = 1500
R3 = 1500
All replicas remain consistent.
Advantages of Active Replication
1. High Availability
2. Fault Tolerance
3. Fast Recovery
4. Consistent Data
5. Improved Reliability
Disadvantages of Active Replication
1. High Communication Cost
2. Increased Processing Overhead
3. Complex Ordering Mechanism
4. More Resource Consumption
Q.3 What is checkpointing in a distributed system? Explain the working of Coordinated
checkpointing recovery mechanism.
→
Introduction
In a distributed system, failures such as process crashes, communication failures, or node
failures may occur.
To recover from such failures without restarting the entire computation, checkpointing is
used.
Checkpointing saves the current state of processes periodically so that the system can restart
from the saved state after a failure.
What is Checkpointing?
Checkpointing is the process of saving the current state of a process or distributed application
onto stable storage at regular intervals.
The saved state is called a checkpoint.
If a failure occurs, the system rolls back to the latest checkpoint and resumes execution from
there.
Diagram
Time ----->
P1 : ---- C1 ------- C2 ------- C3 ----
P2 : ---- C1 ------- C2 ------- C3 ----
Failure occurs here
X
Recovery starts from C2
Where:
• C1, C2, C3 = Checkpoints
• X = Failure Point
Need for Checkpointing
1. Fault Recovery
Allows recovery from failures.
2. Reduces Re-computation
Execution resumes from the latest checkpoint instead of starting from the beginning.
3. Improves Reliability
Provides fault tolerance.
4. Supports Long Computations
Useful in scientific and distributed applications.
5. Maintains System Consistency
Helps restore a consistent system state.
Coordinated Checkpointing
In Coordinated Checkpointing, all processes coordinate with each other and take checkpoints
simultaneously.
The collection of checkpoints forms a consistent global checkpoint.
This eliminates inconsistencies during recovery.
Diagram
Coordinator
|
--------------------------
| | |
v v v
P1 P2 P3
All processes take checkpoint
at the same coordinated time
Working of Coordinated Checkpointing Recovery Mechanism
Step 1: Checkpoint Request
A coordinator initiates checkpointing and sends a checkpoint request to all processes.
Coordinator
|
+----> P1
+----> P2
+----> P3
Step 2: Temporary Blocking
Processes temporarily stop sending application messages.
This prevents inconsistent checkpoints.
Step 3: Local Checkpoint Creation
Each process saves its current state.
The saved information includes:
• Program counter
• Variables
• Message buffers
• Process state
P1 → Checkpoint C1
P2 → Checkpoint C1
P3 → Checkpoint C1
Step 4: Acknowledgement
Each process sends an acknowledgment to the coordinator.
P1 ---- ACK ----> Coordinator
P2 ---- ACK ----> Coordinator
P3 ---- ACK ----> Coordinator
Step 5: Global Checkpoint Formation
After receiving acknowledgments from all processes, the coordinator confirms a consistent
global checkpoint.
Global Checkpoint = {C1(P1), C1(P2), C1(P3)}
Step 6: Resume Execution
Processes continue normal execution.
Recovery After Failure
Suppose a failure occurs.
P1 ---- C1 ---- C2 ---- X
P2 ---- C1 ---- C2 ----
P3 ---- C1 ---- C2 ----
Where:
• X = Failure
Recovery Steps
1. Failure is detected.
2. All processes roll back to checkpoint C2.
3. Execution resumes from C2.
4. Lost computations after C2 are re-executed.
Recovery Diagram
Before Failure
P1 ---- C1 ---- C2 ---- X
P2 ---- C1 ---- C2 ----
P3 ---- C1 ---- C2 ----
After Recovery
P1 ---- C1 ---- C2
P2 ---- C1 ---- C2
P3 ---- C1 ---- C2
Advantages of Coordinated Checkpointing
1. Eliminates Domino Effect
2. Simple Recovery
3. Consistent Global State
4. Easy Implementation
5. Reliable Recovery
Disadvantages of Coordinated Checkpointing
1. Synchronization Overhead
2. Temporary Blocking
3. Communication Cost
4. Scalability Issues
Q.4 What are two primary reasons for replication? Explain the causal consistency model
with suitable example using distributed shared database.
→
Introduction
In distributed systems, data is often copied and stored at multiple locations. These copies are
called replicas.
Replication improves system performance, reliability, and availability by maintaining multiple
copies of the same data.
Two Primary Reasons for Replication
1. Improve Performance
Replication places copies of data closer to users.
Benefits:
• Reduced access time.
• Faster response.
• Reduced network traffic.
• Better load balancing.
Example
Users in Pune ---> Replica Server (India)
Users in USA ---> Replica Server (USA)
Users access nearby replicas instead of a distant server.
2. Improve Availability and Reliability
If one server fails, another replica can continue serving requests.
Benefits:
• Fault tolerance.
• High availability.
• Continuous service during failures.
Example
Primary Server
|
----------------
| |
Replica1 Replica2
If Primary fails,
Replica1 or Replica2 continues service.
Causal Consistency Model
Definition
Causal Consistency ensures that all processes observe causally related operations in the
same order, while concurrent operations may be seen in different orders.
In simple words:
If operation B is influenced by operation A, then every process must see A before B.
Concept of Causal Relationship
Suppose:
Write(X=10) → Read(X=10) → Write(Y=20)
Here:
• Write(X=10) causes Read(X=10)
• Read(X=10) causes Write(Y=20)
Therefore:
Write(X=10) → Write(Y=20)
This causal order must be preserved everywhere.
Example Using Distributed Shared Database
Assume a distributed shared database replicated at three sites:
Shared Database
Site A Site B Site C
Initial values:
X=0
Y=0
Step 1: User P1 Updates X
At Site A:
P1 : Write(X = 10)
Step 2: User P2 Reads X
At Site B:
P2 : Read(X = 10)
P2 now knows X = 10.
Step 3: User P2 Updates Y
Based on X value:
P2 : Write(Y = 20)
Because Y was updated after reading X:
Write(X=10) → Write(Y=20)
These operations are causally related.
Correct Causal Order
Every replica must observe:
Write(X=10)
Before
Write(Y=20)
Valid Observation
Site A
X=10
Y=20
Site B
X=10
Y=20
Site C
X=10
Y=20
This satisfies causal consistency.
Invalid Observation
Site C
Y=20
X=0
This violates causal consistency because Y depends on X.
Site C observed:
Write(Y=20)
Before
Write(X=10)
which is incorrect.
Characteristics of Causal Consistency
1. Preserves Cause-Effect Relationship
2. Allows Concurrent Operations
3. Weaker than Sequential Consistency
4. Suitable for Distributed Databases
Advantages
1. Preserves logical ordering of updates.
2. Better performance than strong consistency models.
3. Supports distributed shared databases.
4. Reduces synchronization overhead.
5. Maintains meaningful data consistency.
Disadvantages
1. More complex implementation.
2. Requires tracking causal dependencies.
3. Additional metadata overhead.
4. Not as strong as sequential consistency.
Q.5 Why replication is important and how it relates with Scalability? How are replicas kept
consistent? Explain the mechanism of Server-Initiated Replicas with suitable example.
→
Introduction
Replication is the technique of maintaining multiple copies of data or services at different
locations in a distributed system. These copies are called replicas. Replication plays a vital role
in improving performance, availability, reliability, and scalability of distributed systems. By
storing data at multiple sites, users can access information more quickly and the system can
continue functioning even if some servers fail.
Importance of Replication and its Relation with Scalability
Replication is important because it improves the performance of distributed systems by
placing copies of data closer to users. When users access a nearby replica instead of a distant
server, response time is reduced and data can be retrieved more quickly. Replication also
improves availability because if one server fails, another replica can continue providing the
required service. This increases system reliability and fault tolerance.
Replication is closely related to scalability because it allows the workload to be distributed
among multiple servers. As the number of users increases, additional replicas can be created
to handle the growing demand. Instead of a single server processing all requests, multiple
replicas share the workload, preventing bottlenecks and ensuring efficient resource
utilization. Thus, replication enables the system to scale and support a large number of users
without significant performance degradation.
How Replicas are Kept Consistent
Since multiple copies of data exist in the system, it is necessary to ensure that all replicas
contain the same updated information. This property is known as consistency. Whenever data
is modified at one replica, the update must be propagated to all other replicas. Consistency is
maintained using various consistency protocols such as primary-based protocols, active
replication, and update propagation mechanisms. In primary-based protocols, one replica acts
as the primary server and all updates are first performed on it before being propagated to
other replicas. In active replication, all replicas perform the same operation in the same order.
These mechanisms ensure that users always access the latest and correct version of data.
Server-Initiated Replicas
Server-Initiated Replication is a replication strategy in which the server automatically decides
when and where replicas should be created. The server continuously monitors the access
patterns and request rates of different data objects. When a particular file, webpage, or
service becomes highly popular and receives a large number of requests, the server detects
the increased demand and creates additional replicas at suitable locations.
These replicas are usually placed closer to the users generating the highest number of
requests. After creating the replicas, client requests are redirected to the nearest replica
server. This reduces network traffic, lowers response time, and distributes the load among
multiple servers. If the popularity of the content decreases, the server may remove
unnecessary replicas to save storage space and maintenance costs. Thus, the server
dynamically manages replicas according to workload conditions.
Suitable Example
Consider a video streaming platform where a newly released movie trailer becomes extremely
popular. Initially, all users access the trailer from the main server. As millions of requests start
arriving from different regions, the server detects the heavy workload and automatically
creates replicas of the trailer on servers located in India, Europe, and the USA. Users are then
redirected to the nearest replica server instead of accessing the original server. This reduces
access delay, balances the load, and improves the viewing experience. When the popularity
of the trailer decreases, some replicas may be removed automatically.
Diagram
Main Server
|
High Demand Detected
|
--------------------------------
| | |
v v v
Replica 1 Replica 2 Replica 3
(India) (Europe) (USA)
Users access the nearest replica server
Advantage
Improves system performance.
Reduces response time.
Supports scalability.
Balances load among servers.
Reduces network traffic.
Automatically adapts to workload changes
Disadvantages
Requires continuous monitoring of requests.
Additional storage is needed for replicas.
Maintaining consistency among replicas is difficult.
Replica management increases system complexity
Q.6 What is fault tolerance? Explain the transient, intermittent and permanent fault classes.
Explain in brief, various types of failures.
→
Introduction
A distributed system consists of multiple computers, processes, and communication networks
that work together to provide services. Since failures can occur in hardware, software, or
communication channels, the system should continue functioning correctly even when some
components fail. This capability is known as Fault Tolerance. Fault tolerance improves the
reliability, availability, and dependability of distributed systems by detecting faults and
recovering from them.
What is Fault Tolerance?
Fault Tolerance is the ability of a distributed system to continue providing services correctly
even when one or more components fail. The main objective of fault tolerance is to prevent
system failure and ensure continuous operation despite the occurrence of faults. Fault
tolerance is achieved through techniques such as replication, checkpointing, recovery
mechanisms, and redundancy. A fault-tolerant system can detect faults, isolate faulty
components, and recover from failures with minimal disruption to users.
Fault Classes
Faults are generally classified into three categories based on their occurrence pattern.
1. Transient Fault
A Transient Fault occurs only once and disappears automatically after a short period of time.
Once the fault disappears, the system starts functioning normally without requiring any repair.
These faults are usually caused by temporary environmental conditions such as power
fluctuations, electromagnetic interference, or temporary network congestion.
Example: A temporary communication error due to network congestion that disappears after
retransmission of the message.
2. Intermittent Fault
An Intermittent Fault occurs repeatedly and unpredictably. The fault appears, disappears, and
then reappears after some time. Such faults are difficult to diagnose because the system may
work correctly at one moment and fail at another.
Example: A loose network cable that sometimes disconnects and reconnects, causing
occasional communication failures.
3. Permanent Fault
A Permanent Fault continues to exist until the faulty component is repaired or replaced. The
system cannot recover from this type of fault automatically. Permanent faults are usually
caused by hardware damage or software defects.
Example: Failure of a hard disk, processor, or network interface card.
Diagram of Fault Classes
Transient Fault
|
Occurs Once
|
Disappears Automatically
Intermittent Fault
|
Appears → Disappears → Reappears
Permanent Fault
|
Occurs
|
Remains Until Repaired
Types of Failures
A failure occurs when a system component cannot perform its intended function. The major
types of failures in distributed systems are as follows:
1. Crash Failure
In a crash failure, a process or server stops functioning completely and does not produce any
further output. Once crashed, the process remains inactive until restarted.
Example: A server suddenly shuts down due to power failure.
2. Omission Failure
An omission failure occurs when a process fails to send or receive messages. The message may
be lost during transmission due to network problems.
Example: A client sends a request, but the server never receives it.
3. Timing Failure
A timing failure occurs when a process produces a response too early or too late compared to
the expected time limits. This type of failure is important in real-time systems.
Example: A banking transaction response arriving after the specified deadline.
[Link] Failure
A response failure occurs when a process produces an incorrect response even though it
responds within the expected time.
There are two forms:
a) Value Failure
The process returns an incorrect result.
Example: A calculator service returns 2 + 2 = 5.
b) State Transition Failure
The process performs an incorrect sequence of operations.
Example: A database enters an invalid state after processing a transaction.
Arbitrary (Byzantine) Failure
An arbitrary or Byzantine failure occurs when a process behaves unpredictably and produces
inconsistent or incorrect results. It may send different responses to different processes.
Example: A compromised server sending conflicting information to clients.
Advantages of Fault Tolerance
Fault tolerance provides continuous service availability,
improves system reliability,
minimizes downtime,
supports recovery from failures,
increases user confidence in distributed applications.
Q.7 iscuss and compare “Push versus Pull Protocols” propagation design issue for content
distribution
→
Introduction
In distributed systems, replicated data must be updated whenever changes occur. The process
of sending updates from one replica to other replicas is known as update propagation. Two
important approaches used for propagating updates are Push Protocol and Pull Protocol.
These protocols determine how updated information is distributed among replicas and play
an important role in maintaining consistency and performance in content distribution
systems.
Push Protocol
In the Push Protocol, the server actively sends updates to all replicas whenever the content is
modified. The replicas do not need to request updates because the server automatically
propagates the changes. This approach ensures that replicas receive updates immediately
after a modification occurs.
For example, consider a news website where breaking news is updated frequently. Whenever
the news content changes, the main server immediately sends the updated information to all
replica servers. As a result, users accessing any replica receive the latest news without delay.
Diagram
Advantages of Push Protocol
1. Updates are distributed immediately.
2. Replicas remain highly consistent.
3. Suitable for frequently changing data.
4. Users always access recent information.
Disadvantages of Push Protocol
1. High communication overhead.
2. Unnecessary updates may be sent.
3. Increased network traffic.
4. Less efficient when updates are rare.
Pull Protocol
In the Pull Protocol, replicas periodically check with the server to determine whether any
updates are available. The server does not send updates automatically. Instead, replicas
request updates whenever they need them.
For example, consider a software update service. Client systems periodically contact the
server to check whether a new software version is available. If an update exists, it is
downloaded; otherwise, no data is transferred.
Diagram
Advantages of Pull Protocol
1. Reduces unnecessary communication.
2. Efficient when updates are infrequent.
3. Lower network traffic.
4. Better scalability for large systems.
Disadvantages of Pull Protocol
1. Updates may be delayed.
2. Replicas may temporarily contain outdated data.
3. Consistency is weaker compared to push protocols.
4. Requires periodic polling.
Comparison between Push and Pull Protocols
Feature Push Protocol Pull Protocol
Update Initiation Server initiates updates Replica initiates updates
Consistency High consistency Lower consistency
Network Traffic Higher Lower
Update Delay Very low May be high
Suitable For Frequently changing data Rarely changing data
Communication Overhead High Low
Scalability Moderate Better
Q.8 Describe the following independent axes for defining inconsistencies with examples:
i) Deviation in numerical values between replicas.
ii) Deviation in staleness between replicas.
iii) Deviation with respect to ordering of update operations
→
Introduction
In distributed systems, multiple replicas of the same data are maintained at different
locations. Due to communication delays, network failures, and update propagation delays,
replicas may not always contain identical information. Such differences are known as
inconsistencies. Inconsistencies can be measured using three independent axes: deviation in
numerical values, deviation in staleness, and deviation in the ordering of update operations.
i) Deviation in Numerical Values Between Replicas
Deviation in numerical values refers to the difference in the actual data values stored at
different replicas. This type of inconsistency occurs when an update has been applied to one
replica but has not yet reached other replicas.
For example, consider a bank account replicated at two servers. Initially, both replicas contain
a balance of ₹10,000. If a deposit of ₹2,000 is made and only one replica is updated, then the
first replica will show ₹12,000 while the second replica will still show ₹10,000. The difference
of ₹2,000 represents the numerical deviation between the replicas.
Replica A = ₹12,000
Replica B = ₹10,000
Numerical Deviation = ₹2,000
This type of inconsistency directly affects the correctness of the data seen by users.
ii) Deviation in Staleness Between Replicas
Staleness refers to how outdated a replica is compared to the most recent version of the data.
A replica is considered stale if it has not yet received the latest updates.
For example, suppose a news website updates a headline at 10:00 AM on the primary server.
Replica A receives the update immediately, but Replica B receives it at 10:05 AM. During this
five-minute interval, Replica B contains outdated information and is considered stale.
Primary Server Updated : 10:00 AM
Replica A Updated : 10:00 AM
Replica B Updated : 10:05 AM
Staleness = 5 Minutes
Staleness is usually measured in terms of time delay or the number of missing updates.
iii) Deviation with Respect to Ordering of Update Operations
This type of inconsistency occurs when replicas receive the same updates but apply them in
different orders. Since some operations depend on the order in which they are executed,
different ordering can lead to different final states.
For example, consider two updates on a bank account:
U1: Deposit ₹1000
U2: Withdraw ₹500
Replica A applies the updates in the order:
U1 → U2
while Replica B applies them in the order:
U2 → U1
Although both replicas receive the same updates, different execution orders may temporarily
produce different results. In distributed databases, maintaining the correct ordering of
updates is essential for ensuring consistency.
Replica A : U1 → U2
Replica B : U2 → U1
This inconsistency is related to the sequence in which updates are processed rather than the
values themselves.
Q.9 What is client-centric consistency model? Explain monotonic read consistency with
suitable example.
→
Introduction
In distributed systems, data is often replicated at multiple servers to improve performance
and availability. Due to replication delays, different replicas may temporarily contain different
versions of the same data. The Client-Centric Consistency Model focuses on providing
consistency guarantees from the viewpoint of an individual client rather than ensuring strict
consistency among all replicas.
Client-centric consistency is particularly useful in systems where users frequently move
between different replica servers, such as web services, cloud storage systems, and distributed
databases.
What is Client-Centric Consistency Model?
A Client-Centric Consistency Model guarantees that a client observes data in a consistent
manner while interacting with different replicas. It ensures that the client's view of the data
becomes more consistent over time, even if the entire system is only eventually consistent.
The main objective of this model is to provide a predictable and correct view of data to
individual users despite the presence of multiple replicas and update delays.
The four important client-centric consistency models are:
1. Monotonic Read Consistency
2. Monotonic Write Consistency
3. Read Your Writes Consistency
4. Writes Follow Reads Consistency
Monotonic Read Consistency
Definition
Monotonic Read Consistency guarantees that once a client has read a particular version of a
data item, the client will never see an older version of that data item in subsequent reads.
In simple words, a client can move only forward in time and can never read stale data after
reading a newer version.
Working of Monotonic Read Consistency
When a client reads data from a replica, the system records the version that has been seen by
the client. If the client later accesses another replica, the system ensures that the new replica
contains at least the same version or a newer version of the data. If the replica is outdated, it
is first updated before serving the request. This prevents the client from seeing older
information.
Suitable Example
Easy Example of Monotonic Read Consistency
Consider WhatsApp messages.
Suppose you open WhatsApp on your phone and read a chat. You see the latest message:
"Meeting at 5 PM"
This message is stored at Replica Server A.
Later, you travel to another city and your phone connects to Replica Server B.
Monotonic Read Consistency ensures that you will not see an older version of the chat, such
as:
"Meeting at 4 PM"
because you have already seen the newer message "Meeting at 5 PM".
Before showing the chat, Server B must update itself and provide the same or newer
messages.
Diagram
Advantages of Monotonic Read Consistency
1. Prevents Reading Old Data
2. Improves User Experience
3. Suitable for Mobile Users
4. Easy to Understand
Disadvantages of Monotonic Read Consistency
1. Additional Synchronization Required
2. Increased Communication Overhead
3. May Increase Response Time.
Q.10 What is the distribution commit problem? Discuss how this problem is solved using
the two-phase commit protocol with suitable diagram.
→
Introduction
In a distributed system, a transaction may involve multiple sites or databases. For example, a
bank transaction may update accounts stored on different servers. The main challenge is to
ensure that either all participating sites commit the transaction or all sites abort it. If some
sites commit while others abort, the database becomes inconsistent. This problem is known
as the Distributed Commit Problem.
What is the Distributed Commit Problem?
The Distributed Commit Problem is the problem of ensuring that all participating processes
in a distributed transaction reach the same decision regarding commit or abort. The system
must guarantee atomicity, which means that the transaction is either completed successfully
at all sites or rolled back completely at all sites.
For example, during an online bank transfer of ₹5000 from Account A to Account B, money
must be deducted from Account A and added to Account B. If one operation succeeds and the
other fails, the database becomes inconsistent. Therefore, all participating servers must agree
on a common decision.
Two-Phase Commit (2PC) Protocol
The Two-Phase Commit Protocol (2PC) is a distributed transaction protocol used to solve the
distributed commit problem. It ensures that all participating sites either commit or abort the
transaction together.
The protocol consists of:
1. Coordinator Process
2. Participant Processes
The execution occurs in two phases:
Phase 1: Voting Phase (Prepare Phase)
Phase 2: Commit Phase (Decision Phase)
Diagram
Coordinator
|
--------------------------
| | |
v v v
P1 P2 P3
(Participants)
Phase 1: Voting (Prepare Phase)
In the first phase, the coordinator asks all participants whether they are ready to commit the
transaction.
The coordinator sends a PREPARE message to all participants. Each participant checks
whether it can successfully complete the transaction. If it is ready, it sends a YES vote. If it
encounters any problem, it sends a NO vote.
Coordinator
|
|---- PREPARE ----> P1
|---- PREPARE ----> P2
|---- PREPARE ----> P3
P1 ---- YES ---->
P2 ---- YES ---->
P3 ---- YES ---->
If all participants vote YES, the transaction can proceed to the commit phase.
Phase 2: Commit (Decision Phase)
After receiving votes from all participants, the coordinator makes the final decision.
Case 1: All Participants Vote YES
If all participants send YES votes, the coordinator sends a COMMIT message to all participants.
Each participant commits the transaction and sends an acknowledgment back to the
coordinator.
Coordinator
|
|---- COMMIT ----> P1
|---- COMMIT ----> P2
|---- COMMIT ----> P3
P1 ---- ACK ---->
P2 ---- ACK ---->
P3 ---- ACK ---->
The transaction is successfully completed.
Case 2: Any Participant Votes NO
If any participant sends a NO vote, the coordinator sends an ABORT message to all
participants. All participants roll back the transaction.
Coordinator
|
|---- ABORT ----> P1
|---- ABORT ----> P2
|---- ABORT ----> P3
Thus, no partial updates remain in the system.
Working Example
Consider a bank transfer transaction:
Transfer ₹5000
Account A → Account B
The transaction involves two servers:
• Server A (Debit Account A)
• Server B (Credit Account B)
The coordinator sends PREPARE messages to both servers. If both servers reply YES, the
coordinator sends COMMIT messages and the transfer is completed. If either server replies
NO, the coordinator sends ABORT messages and the transaction is cancelled. Thus, both
databases remain consistent.
Advantages of Two-Phase Commit Protocol
1. Ensures Atomicity
2. Maintains Consistency
3. Reliable Transaction Management
4. Widely Used
Disadvantages of Two-Phase Commit Protocol
1. Blocking Problem
2. Communication Overhead
3. Slow Performance
4. Coordinator as Single Point of Failure
Q.11 Discuss the two important reasons for wanting to replicate data and how does
replication relate to scalability
→
Introduction
Replication is the process of maintaining multiple copies of the same data at different
locations in a distributed system. These copies are called replicas. Replication is an important
technique used to improve the performance, availability, and reliability of distributed systems.
It also plays a significant role in achieving scalability by distributing workload among multiple
servers.
1. Improving Reliability and Availability
One of the most important reasons for replicating data is to improve the reliability and
availability of the system. In a distributed environment, servers may fail because of hardware
faults, software crashes, or network problems. If data is stored on only one server, users will
not be able to access it when that server fails. By maintaining multiple replicas at different
locations, the system can continue providing services even if one or more servers become
unavailable. Users can access the required data from another replica, ensuring uninterrupted
service. Thus, replication increases fault tolerance and makes the system more dependable.
Example
Consider an online banking system where customer account information is stored on three
different servers. If one server crashes, customers can still access their account information
through the other servers. Therefore, the banking service remains available despite the
failure.
2. Improving Performance
Another important reason for replication is to improve system performance. In distributed
systems, users may be located in different geographical regions. If all users access a single
central server, network delays and server overload can occur. By placing replicas closer to
users, data can be accessed more quickly, reducing response time and communication costs.
Replication also distributes requests among multiple servers, reducing the workload on any
single server and improving overall system efficiency.
Example
A video streaming platform such as YouTube stores copies of popular videos on servers located
in different countries. When a user requests a video, the nearest server provides the content,
resulting in faster loading and smoother playback.
Relationship Between Replication and Scalability
Scalability refers to the ability of a distributed system to handle increasing numbers of users,
requests, or workloads without significant performance degradation. Replication contributes
directly to scalability because it distributes workload among multiple servers. As the number
of users grows, additional replicas can be created to share the increased load. Instead of
sending all requests to a single server, requests are distributed among several replicas,
preventing bottlenecks and improving system capacity.
Replication also supports geographical scalability by allowing servers to be deployed in
different regions around the world. Users can access nearby replicas, reducing latency and
network congestion. As demand increases, more replicas can be added without major changes
to the system architecture.
Example
Suppose a website initially serves 10,000 users using one server. As the number of users grows
to 1 million, the server may become overloaded. By creating multiple replicas of the website
on different servers, user requests can be distributed among them. This enables the system
to handle the increased workload efficiently.
Diagram
Clients
|
----------------------------------
| | |
v v v
Replica 1 Replica 2 Replica 3
| | |
----------------------------------
|
Shared Data
In this system, requests are distributed among multiple replicas, improving performance,
availability, and scalability.
Advantages of Replication
1. Improves availability of data.
2. Provides fault tolerance.
3. Reduces response time.
4. Balances workload among servers.
5. Supports scalability.
6. Improves system performance and reliability.
Q.12 What is a primary-based protocol in a consistency protocol? Explain the working of
primary-backup protocol with suitable diagram
→
Introduction
In distributed systems, multiple copies of data are maintained at different locations to improve
availability and performance. However, whenever data is updated, all replicas must remain
consistent. A Primary-Based Protocol is a consistency protocol that ensures consistency by
assigning one replica as the primary replica. All write operations are first performed on the
primary replica, and then the updates are propagated to backup replicas.
What is a Primary-Based Protocol?
A Primary-Based Protocol is a consistency protocol in which one replica is designated as the
primary server (master) and all other replicas act as backup servers (slaves). The primary
replica is responsible for processing all write operations. After updating its own copy, it
propagates the changes to all backup replicas. Read operations may be performed on either
the primary or backup replicas depending on the system design.
This approach ensures that updates occur in a controlled manner and prevents conflicting
modifications from occurring at different replicas.
Primary-Backup Protocol
The Primary-Backup Protocol is the most common implementation of a primary-based
protocol. In this protocol, one server acts as the primary server while the remaining servers
maintain backup copies of the data. Whenever a client wants to modify data, the request is
sent to the primary server. The primary updates its local copy and then sends the updated
information to all backup servers. Once the backups acknowledge the update, the transaction
is considered complete.
Diagram
Client
|
|
Write Request
|
v
Primary Server
|
-------------------------
| |
v v
Backup Server 1 Backup Server 2
Working of Primary-Backup Protocol
When a client wants to update data, it sends a write request to the primary server. The primary
server first performs the update on its local copy of the data. After successfully updating its
own copy, the primary server propagates the updated data to all backup servers. Each backup
server updates its local replica and sends an acknowledgment back to the primary server.
Once acknowledgments are received from the backups, the primary server confirms
successful completion of the operation to the client. In this way, all replicas remain consistent.
Example
Consider a bank account replicated on three servers:
Primary Server : Balance = ₹10,000
Backup 1 : Balance = ₹10,000
Backup 2 : Balance = ₹10,000
A customer deposits ₹2,000.
Step 1
The client sends the deposit request to the primary server.
Deposit ₹2,000
Step 2
The primary server updates its balance.
Primary Server = ₹12,000
Step 3
The primary sends the update to Backup 1 and Backup 2.
Backup 1 = ₹12,000
Backup 2 = ₹12,000
Step 4
The backups acknowledge the update.
Step 5
The primary server informs the client that the transaction has been completed successfully.
Now all replicas contain the same updated balance.
Advantages of Primary-Backup Protocol
1. Simple Consistency Management
2. Prevents Conflicting Updates
3. High Availability
4. Easy Recovery
5. Reliable Data Management
Disadvantages of Primary-Backup Protocol
1. Primary Server Bottleneck
2. Single Point of Failure
3. Communication Overhead
4. Increased Update Latency