0% found this document useful (0 votes)

4 views30 pages

Logical Time and Global State in Systems

This document discusses logical time and global state in distributed systems, covering concepts such as scalar and vector time, message ordering, and snapshot recording algorithms. It details the rules governing logical clocks, physical clock synchronization using NTP, and various message ordering paradigms including FIFO and causal order. Additionally, it explores group communication, rendezvous systems, and the implications of asynchronous executions on program order and communication efficiency.

Uploaded by

subashkarthikeyan31

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

4 views30 pages

Logical Time and Global State in Systems

Uploaded by

subashkarthikeyan31

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

UNIT II

LOGICAL TIME AND GLOBAL STATE

Logical Time: Physical Clock Synchronization: NTP – A Framework for a

System of Logical Clocks- Scalar Time – Vector Time; Message Ordering and Group
Communication: Message Ordering Paradigms – Asynchronous Execution with
Synchronous Communication – Synchronous Program Order on Asynchronous System –
Group Communication – Causal Order – Total Order; Global State and Snapshot
Recording Algorithms: Introduction – System Model and Definitions – Snapshot
Algorithms for FIFO Channels
Protocol:
The protocol ensures that a process’s logical clock, and thus its view of the global time, is managed
consistently with the following rules:
Rule 1: Decides the updates of the logical clock by a process. It controls send, receive and other
operations.
Rule 2: Decides how a process updates its global logical clock to update its view of the global time and
global progress. It dictates what information about the logical time is piggybacked in a message and how
this information is used by the receiving process to update its view of the global time.

2.1.1 SCALAR TIME

Scalar time is designed by Lamport to synchronize all the events in distributed systems. A
Lamport logical clock is an incrementing counter maintained in each process. When a process receives a
message, it resynchronizes its logical clock with that sender maintaining causal relationship.
The Lamport’s algorithm is governed using the following rules:
• The algorithm of Lamport Timestamps can be captured in a few rules:
• All the process counters start with value 0.
• A process increments its counter for each event (internal event, message sending, message receiving) in
that process.
• When a process sends a message, it includes its (incremented) counter value with the message.
• On receiving a message, the counter of the recipient is updated to the greater of its current counter and the
timestamp in the received message, and then incremented by one.

• If Ci is the local clock for process Pi then,

• if a and b are two successive events in Pi, then Ci(b) = Ci(a) + d1, where d1 > 0
• if a is the sending of message m by Pi, then m is assigned timestamp tm = Ci(a)
if b is the receipt of m by Pj, then Cj(b) = max{Cj(b), tm + d2}, where d2 > 0
• Rules of Lamport’s clock

Rule 1: Ci(b) = Ci(a) + d1, where d1 > 0

Rule 2: The following actions are implemented when pi receives a message m with
timestamp Cm:
a) C = max(C , C
i i m)
b)
Execute Rule 1
c)
deliver the message

Fig 1.20: Evolution of scalar time

[Link] counting
If an event e has timestamp vh[i], vh[j] denotes the number of events executed by process pj that
causally precede e.

2.2 PHYSICAL CLOCK SYNCHRONIZATION: NEWTWORK TIME PROTOCOL (NTP)

Centralized systems do not need clock synchronization, as they work under a common clock.
But the distributed systems do not follow common clock: each system functions based on its own
internal clock and its own notion of time. The time in distributed systems is measured in the following
contexts:
• The time of the day at which an event happened on a specific machine in the network.
• The time interval between two events that happened on different machines in the network.
• The relative ordering of events that happened on different machines in the network.
Clock synchronization is the process of ensuring that physically distributed processors have a
common notion of time.
Due to different clocks rates, the clocks at various sites may diverge with time, and periodically a
clock synchronization must be performed to correct this clock skew in distributed systems. Clocks are
synchronized to an accurate real-time standard like UTC (Universal Coordinated Time). Clocks that
must not only be synchronized with each other but also have to adhere to physical time are termed
physical clocks. This degree of synchronization additionally enables to coordinate and schedule
actions between multiple computers connected to a common network.
Basic terminologies:
If Ca and Cb are two different clocks, then:
• Time: The time of a clock in a machine p is given by the function Cp(t),where Cp(t)= tfor a perfect
clock.
• Frequency: Frequency is the rate at which a clock progresses. The frequency at time t of clock Ca is
Ca’(t).
• Offset: Clock offset is the difference between the time reported by a clock and the real time. The
offset of the clock Ca is given by Ca(t)− t. The offset of clock C a relative toCb at time t ≥ 0 is given
by Ca(t)- Cb(t)
• Skew: The skew of a clock is the difference in the frequencies of the clock and theperfect clock. The
skew of a clock Ca relative to clock Cb at timet is Ca’(t)- Cb’(t).
• Drift (rate): The drift of clock Ca the second derivative of the clockvalue with respectto time. The

drift is calculated as:

small, the clock offset and roundtrip delay of B relative to A at time T4 are approximately given by the
following:

Each NTP message includes the latest three timestamps T1, T2, andT3, while T4 is determined upon
arrival.

2.3 MESSAGE ORDERING AND GROUP COMMUNICATION

As the distributed systems are a network of systems at various physical locations, the
coordination between them should always be preserved. The message ordering means the order of
delivering the messages to the intended recipients. The common message order schemes are First in
First out (FIFO), non FIFO, causal order and synchronous order. In case of group communication with
multicasting, the causal and total ordering scheme is followed. It is also essential to define the
behaviour of the system in case of failures. The following are the notations that are widely used in
this chapter:
• Distributed systems are denoted by a graph (N, L).
• The set of events are represented by event set {E}
• Message is denoted as mi: send and receive events as si and ri respectively.
• Send (M) and receive (M) indicates the message M send and received.
• a b denotes a and b occurs at the same process
• The send receive pairs ={(s, r) Ei x Ej corresponds to r}

2.3.1MESSAGE ORDERING PARADIGMS

The message orderings are
(i) non-FIFO
(ii) FIFO
(iii) causal order
(iv) synchronous order

There is always a trade-off between concurrency and ease of use and implementation.

Asynchronous Executions
An asynchronous execution (or A-execution) is an execution (E, ) for which the causality
relation is a partial order.
• There cannot be any causal relationship between events in asynchronous execution.
• The messages can be delivered in any order even in non FIFO.
• Though there is a physical link that delivers the messages sent on it in FIFO order due to the
physical properties of the medium, a may be formed as a composite of physical links and
multiple paths may exist between the two end points of the logical link.
Realizable Synchronous Communication (RSC)
A-execution can be realized under synchronous communication is called a realizable with
synchronous communication (RSC).
An execution can be modeled to give a total order that extends the partial order (E, ≺).
In an A-execution, the messages can be made to appear instantaneous if there exist a linear
extension of the execution, such that each send event is immediately followed by its corresponding
receive event in this linear extension.
Non-separated linear extension is an extension of (E, ≺) is a linear extension of (E, ≺) such that for
each pair (s, r) ∈ T, the interval { x∈ E s ≺ x ≺ r } is empty.
A-execution (E, ≺) is an RSC execution if and only if there exists a non-separated linear extension of
the partial order (E, ≺).
In the non-separated linear extension, if the adjacent send event and its corresponding receive event are
viewed atomically, then that pair of events shares a common past and a common future with each other.
Crown
Let E be an execution. A crown of size k in E is a sequence <(si, ri), i ∈{0,…, k-1}> of pairs of
corresponding send and receive events such that: s0 ≺ r1, s1 ≺ r2, sk−2 ≺ rk−1, sk−1 ≺ r0.
The crown is <(s1, r1) (s2, r2)> as we have s1 ≺ r2 and s2 ≺ r1. Cyclic dependencies may exist in a
crown. The crown criterion states that an A-computation is RSC, i.e., it can be realized on a system with
synchronous communication, if and only if it contains no crown.
Timestamp criterion for RSC execution
An execution (E, ≺) is RSC if and only if there exists a mapping from E to T (scalar timestamps)
such that

Hierarchy of ordering paradigms

The orders of executions are:

 Synchronous order (SYNC)
 Causal order (CO)
 FIFO order (FIFO)
 Non FIFO order (non-FIFO)
The Execution order have the following results
 For an A-execution, A is RSC if and only if A is an S-execution.
 RSC ⊂ CO ⊂ FIFO ⊂ A

 This hierarchy is illustrated in Figure 2.3(a), and example executions of each
class are shown side-by-side in Figure 2.3(b)
 The above hierarchy implies that some executions belonging to a class X will not
belong to any of the classes included in X. The degree of concurrency is most in A
and least in SYNC.
 A program using synchronous communication is easiest to develop and verify.
 A program using non-FIFO communication, resulting in an A execution, is hardest
to design and verify.

Fig (a) Fig (b)

Fig 2.3: Hierarchy of execution classes

Simulations
 The events in the RSC execution are scheduled as per some non-separated linear
extension, and adjacent (s, r) events in this linear extension are executed sequentially
in the synchronous system.
 The partial order of the asynchronous execution remains unchanged.
 If an A-execution is not RSC, then there is no way to schedule the events to make
them RSC, without actually altering the partial order of the given A-execution.
 However, the following indirect strategy that does not alter the partial order can be
used.
 Each channel Ci,j is modeled by a control process Pi,j that simulates the channel buffer.
 An asynchronous communication from i to j becomes a synchronous communication
from i to Pi,j followed by a synchronous communication from Pi,j to j.
 This enables the decoupling of the sender from the receiver, a feature that is essential
in asynchronous systems.

Fig 2.4: Modeling channels as processes to simulate an execution using

asynchronous primitives on synchronous system
Synchronous programs on asynchronous systems
 A (valid) S-execution can be trivially realized on an asynchronous system by
scheduling the messages in the order in which they appear in the S-execution.
 The partial order of the S-execution remains unchanged but the communication occurs
on an asynchronous system that uses asynchronous communication primitives.
 Once a message send event is scheduled, the middleware layer waits for
acknowledgment; after the ack is received, the synchronous send primitive completes.

2.5 SYNCHRONOUS PROGRAM ORDER ON AN ASYNCHRONOUS

SYSTEM

Non deterministic programs

The partial ordering of messages in the distributed systems makes the repeated runs of the
same program will produce the same partial order, thus preserving deterministic nature.
But sometimes the distributed systems exhibit non determinism:

• A receive call can receive a message from any sender who has sent a message, if the
expected sender is not specified.
• Multiple send and receive calls which are enabled at a process can be executed in an
interchangeable order.
• If i sends to j, and j sends to i concurrently using blocking synchronous calls, there
results a deadlock.
• There is no semantic dependency between the send and the immediately following
receive at each of the processes. If the receive call at one of the processes can be
scheduled before the send call, then there is no deadlock.

Rendezvous
Rendezvous systems are a form of synchronous communication among an arbitrary
number of asynchronous processes. All the processes involved meet with each other, i.e.,
communicate synchronously with each other at one time. Two types of rendezvous systems are
possible:

• Binary rendezvous: When two processes agree to synchronize.

• Multi-way rendezvous: When more than two processes agree to synchronize.

Features of binary rendezvous:

• For the receive command, the sender must be specified. However, multiple recieve
commands can exist. A type check on the data is implicitly performed.

• Send and received commands may be individually disabled or enabled. A command is

disabled if it is guarded and the guard evaluates to false. The guard would likely contain
an expression on some local variables.
• Synchronous communication is implemented by scheduling messages under the covers
using asynchronous communication.
• Scheduling involves pairing of matching send and receives commands that are both
enabled. The communication events for the control messages under the covers do not alter
the partial order of the execution.

2.3.2 Binary rendezvous algorithm

If multiple interactions are enabled, a process chooses one of them and tries to synchronize with the p
process. The problem reduces to one of scheduling messages satisfying the following constraints:

• Schedule on-line, atomically, and in a distributed manner.

• Schedule in a deadlock-free manner (i.e., crown-free).
• Schedule to satisfy the progress property in addition to the safety property.

Steps in Bagrodia algorithm

1. Receive commands are forever enabled from all processes.

2. A send command, once enabled, remains enabled until it completes, i.e., it is not possible that a
send command gets before the send is executed.

3. To prevent deadlock, process identifiers are used to introduce asymmetry to break potential
crowns that arise.

4. Each process attempts to schedule only one send event at any time.
(4) Message M arrival at Pi from a higher priority process Pj:

At the time a message M is processed by Pi, process Pi executes RECEIVE(M) (which is assumed to
be always enabled) and then send(ack(M)) to Pj .

(5) Processing when Pi is unblocked:

When Pi is unblocked, it dequeues the next (if any) message from the queue and processes it as a
message arrival (as per rules 3 or 4).

Fig 2.5: Bagrodia Algorithm

2.6 GROUP COMMUNICATION

Group communication is done by broadcasting of messages. A message broadcast is the sending
of a message to all members in the distributed system. The communication may be
• Multicast: A message is sent to a certain subset or a group.
• Unicasting: A point-to-point message communication.
The network layer protocol cannot provide the following functionalities:
▪ Application-specific ordering semantics on the order of delivery of messages.
▪ Adapting groups to dynamically changing membership.
▪ Sending multicasts to an arbitrary set of processes at each send event.
▪ Providing various fault-tolerance semantics.
▪ The multicast algorithms can be open or closed group.
Differences between closed and open group algorithms

Closed group algorithms Open group algorithms

If sender is also one of the receiver in the If sender is not a part of the communication
multicast algorithm, then it is closed group group, then it is open group algorithm.
algorithm.
They are specific and easy to implement. They are more general, difficult to design and
expensive.
It does not support large systems where client It can support large systems.
processes have short life.

2.7 CAUSAL ORDER (CO)

In the context of group communication, there are two modes of communication:
causal order and total order. Given a system with FIFO channels, causal order needs to be explicitly
enforced by a protocol. The following two criteria must be met by a causal ordering protocol:
• Safety: In order to prevent causal order from being violated, a message M that arrives at a
process may need to be buffered until all system wide messages sent in the causal past of the
send (M) event to that same destination have already arrived. The arrival of a message is
transparent to the application process. The delivery event corresponds to the receive event in
the execution model.
• Liveness: A message that arrives at a process must eventually be delivered to the process.
The Raynal–Schiper–Toueg algorithm
• Each message M should carry a log of all other messages sent causally before M’s send event,
and sent to the same destination dest(M).
•The Raynal–Schiper–Toueg algorithm canonical algorithm is a representative of several algorithms
that reduces the size of the local space and message space overhead by various techniques.
• This log can then be examined to ensure whether it is safe to deliver a message.

 Each message M should carry a log of all other messages sent causally before M’s
send event, and sent to the same destination dest(M).
 The Raynal–Schiper–Toueg algorithm canonical algorithm is a representative of
several algorithms that reduces the size of the local space and message space
overhead by various techniques.
 This log can then be examined to ensure whether it is safe to deliver a message.
 All algorithms aim to reduce this log overhead, and the space and time overhead of
maintaining the log information at the processes.
 To distribute this log information, broadcast and multicast communication is used.
 The hardware-assisted or network layer protocol assisted multicast cannot efficiently
provide features:
 Application-specific ordering semantics on the order of delivery of messages.
 Adapting groups to dynamically changing membership.
 Sending multicasts to an arbitrary set of processes at each send event.
 Providing various fault-tolerance semantics

2.1 Causal Order (CO)

An optimal CO algorithm stores in local message logs and propagates on messages,
information of the form d is a destination of M about a message M sent in the causal past, as
long as and only as long as:
Propagation Constraint I: it is not known that the message M is delivered to d.
Propagation Constraint II: it is not known that a message has been sent to d in the causal
future of Send(M), and hence it is not guaranteed using a reasoning based on transitivity that
the message M will be delivered to d in CO.
Fig 2.6: Conditions for causal ordering

“d ∈ [Link]” must not be stored or propagated, even to remember that (I) or (II) has been
The Propagation Constraints also imply that if either (I) or (II) is false, the information

falsified:

not in the causal future of e k, c where d ∈Mk,cDests and there is no

 not in the causal future of Deliverd(M1, a)


other message sent causally between Mi,a and Mk, c to the same
destination d.

Information about messages:

(i) not known to be delivered
(ii) not guaranteed to be delivered in CO, is explicitly tracked by the algorithm using
(source, timestamp, destination) information.
Information about messages already delivered and messages guaranteed to be delivered in
CO is implicitly tracked without storing or propagating it, and is derived from the explicit
information. The algorithm for the send and receive operations is given in Fig. 2.7 a) and b).
Procedure SND is executed atomically. Procedure RCV is executed atomically except for a
possible interruption in line 2a where a non-blocking wait is required to meet the Delivery
Condition.
Fig 2.7 a) Send algorithm by Kshemkalyani–Singhal to optimally implement causal
ordering

Fig 2.7 b) Receive algorithm by Kshemkalyani–Singhal to optimally implement causal

ordering

The data structures maintained are sorted row–major and then column–major:
1. Explicit tracking:
 Tracking of (source, timestamp, destination) information for messages (i) not known to be
delivered and (ii) not guaranteed to be delivered in CO, is done explicitly using the [Link]
field of entries in local logs at nodes and [Link] field of entries in messages.
 Sets li,aDestsand oi,a. Dests contain explicit information of destinations to which M i,ais not

 The information about d ∈Mi,a .Dests is propagated up to the earliest events on all causal paths
guaranteed to be delivered in CO and is not known to be delivered.

from (i, a) at which it is known that M i,a is delivered to d or is guaranteed to be delivered to d

in CO.

2. Implicit tracking:
 Tracking of messages that are either (i) already delivered, or (ii) guaranteed to be delivered
in CO, is performed implicitly.
 The information about messages (i) already delivered or (ii) guaranteed to be delivered in
CO is deleted and not propagated because it is redundant as far as enforcing CO is
concerned.
 It is useful in determining what information that is being carried in other messages and is
being storedin logs at other nodes has become redundant and thus can be purged.
 These mantics are implicitly stored and propagated. This information about messages that
are (i) already delivered or (ii) guaranteed to be delivered in CO is tracked without
explicitly storing it.
 The algorithm derives it from the existing explicit information about messages (i) not
known to be delivered and (ii) not guaranteed to be delivered in CO, by examining only
oi,aDests or li,aDests, which is a part of the explicit information.

Fig 2.8: Illustration of propagation constraints

Constraints Multicasts M5,1and M4,1

Message M5,1 sent to processes P4 and P6 contains the piggybacked information M5,1.

inserted in the local log Log5. When M5,1 is delivered to P6, the (new) piggybacked information P4 ∈
Dest= {P4, P6}. Additionally, at the send event (5, 1), the information M5,[Link] = {P4,P6} is also
M5,1 .Dests is stored in Log6 as M5,[Link] ={P4} information about P6 ∈ M5,[Link] which was
needed for routing, must not be stored in Log6 because of constraint
I.

information P6 ∈ M5,1 .Dests is inserted in Log4 as M5,[Link] =P6which is later propagated during
In the same way when M5,1 is delivered to process P4 at event (4, 1), only the new piggybacked

multicast M4,2.

At event (4, 3), the information P6 ∈M5,[Link] in Log4 is propagated on multicast M4,3only to
Multicast M4,3

process P6 to ensure causal delivery using the Delivery Condition. The piggybacked information on
message M4,3sent to process P3must not contain this information because of constraint II. As long as

in causal order w.r.t. M5,1. And as M5,1 is already delivered to P4, the information M5,1Dests = ∅ is
any future message sent to P6 is delivered in causal order w.r.t. M4,3sent to P6, it will also be delivered

piggybacked on M4,3 sent to P 3. Similarly, the information P6 ∈ M5,1Dests must be deleted from
Log4 as it will no longer be needed, because of constraint II. M5,1Dests = ∅ is stored in Log4 to
remember that M5,1 has been delivered or is guaranteed to be delivered in causal order to all its
destinations.

Learning implicit information at P2 and P3

When message M4,2is received by processes P2 and P3, they insert the (new) piggybacked information in
their local logs, as information M5,[Link] = P6. They both continue to store this in Log2 and Log3 and
propagate this information on multicasts until they learn at events(2, 4) and (3, 2) on receipt of messages
M3,3and M4,3, respectively, that any future message is expected to be delivered in causal order to process
P6, w.r.t. M5,1sent toP6. Hence by constraint II, this information must be deleted from Log2 andLog3. The

• When M4,3 with piggybacked information M5,1Dests = ∅ is received byP3at (3, 2), this is inferred
flow of events is given by;

to be valid current implicit information about multicast M5,1because the log Log3 already contains
explicit informationP6 ∈M5,[Link] about that multicast. Therefore, the explicit information in
Log3 is inferred to be old and must be deleted to achieve optimality. M5,1Dests is set to ∅ in Log3.
• The logic by which P2 learns this implicit knowledge on the arrival of M3,3is identical.

Processing at P6
When message M5,1 is delivered to P6, only M5,[Link] = P4 is added to Log6. Further, P6 propagates
only M5,[Link] = P4 on message M6,2, and this conveys the current implicit information M5,1 has
been delivered to P6 by its very absence in the explicit information.

• When the information P6 ∈ M5,1Dests arrives on M4,3, piggybacked as M5,1 .Dests = P6 it is

used only to ensure causal delivery of M4,3 using the Delivery
Condition,and is not inserted in Log6 (constraint I) – further, the presence of M5,1 .Dests = P4
in Log6 implies the implicit information that M5,1 has already been delivered to P6. Also, the
absence of P4 in M5,1 .Dests in the explicit piggybacked information implies the implicit

and, therefore, M5,1. Dests is set to ∅ in Log6.

information that M5,1 has been delivered or is guaranteed to be delivered in causal order to P4,

• When the information P6 ∈ M5,1 .Dests arrives on M5,2 piggybacked as M5,1. Dests = {P4,
P6} it is used only to ensure causal delivery of M4,3 using the Delivery Condition, and is not
inserted in Log6 because Log6 contains M5,1 .Dests = ∅, which gives the implicit information
that M5,1 has been delivered or is guaranteed to be delivered in causal order to both P4 and P6.

Processing at P1
• When M2,2arrives carrying piggybacked information M5,[Link] = P6 this (new)information
is inserted in Log1.

•
information M5,1has been delivered to P6 by the very absence of explicit information P6 ∈
When M6,2arrives with piggybacked information M5,[Link] ={P4}, P1learns implicit

M5,[Link] in the piggybacked information, and hence marks information P6 ∈ M5,1Dests

for deletion from Log1

The information “P6 ∈M5,[Link] piggybacked on M2,3,which arrives at P 1, is inferred to

be outdated using the implicit knowledge derived from M5,[Link]= ∅” inLog1.

2.2 TOTAL ORDER

For each pair of processes Pi and Pj and for each pair of messages Mx and My that are delivered to
both the processes, Pi is delivered Mx before My if and only if Pj is delivered Mxbefore My.

Centralized Algorithm for total ordering

Each process sends the message it wants to broadcast to a centralized process, which
relays all the messages it receives to every other process over FIFO channels.

Complexity: Each message transmission takes two message hops and exactly n
messages in a system of n processes.

Drawbacks: A centralized algorithm has a single point of failure and congestion, and is
not an elegant solution.

Three phase distributed algorithm

Three phases can be seen in both sender and receiver side.
Sender side

Phase 1
 In the first phase, a process multicasts the message M with a locally unique tag
and the local timestamp to the group members.

Phase 2
 The sender process awaits a reply from all the group members who respond with a
tentative proposal for a revised timestamp for that message M.
 The await call is non-blocking.

Phase 3
 The process multicasts the final timestamp to the group.

Fig 2.9: Sender side of three phase distributed algorithm

Receiver Side

Phase 1
 The receiver receives the message with a tentative timestamp. It updates the
variable priority that tracks the highest proposed timestamp, then revises the
proposed timestamp to the priority, and places the message with its tag and the
revised timestamp at the tail of the queue temp_Q. In the queue, the entry is marked
as undeliverable.

Phase 2
 The receiver sends the revised timestamp back to the sender. The receiver then waits
in a non-blocking manner for the final timestamp.

Phase 3
 The final timestamp is received from the multicaster. The corresponding
message entry in temp_Q is identified using the tag, and is marked as deliverable
after the revised timestamp is overwritten by the final timestamp.

 The queue is then resorted using the timestamp field of the entries as the key. As the
queue is already sorted except for the modified entry for the message under
consideration, that message entry has to be placed in its sorted position in the
queue.
 If the message entry is at the head of the temp_Q, that entry, and all consecutive
subsequent entries that are also marked as deliverable, are dequeued from
temp_Q, and enqueued in deliver_Q.

Complexity

This algorithm uses three phases, and, to send a message to n − 1 processes, it

uses 3(n – 1) messages and incurs a delay of three message hops

2.9 GLOBAL STATE AND SNAPSHOT RECORDING ALGORITHMS

• A distributed computing system consists of processes that do not share a common
memory and communicate asynchronously with each other by message passing.
• Each component of has a local state. The state of the process is the local memory
and a history of its activity.
• The state of a channel is characterized by the set of messages sent along the
channel less the messages received along the channel. The global state of a
distributed system is a collection of the local states of its components.
• If shared memory were available, an up-to-date state of the entire system would be
available to the processes sharing the memory.
• The absence of shared memory necessitates ways of getting a coherent and
complete view of the system based on the local states of individual processes.
• A meaningful global snapshot can be obtained if the components of the distributed
system record their local states at the same time.
• This would be possible if the local clocks at processes were perfectly
synchronized or if there were a global system clock that could be instantaneously
read by the processes.
• If processes read time from a single common clock, various in determinate
transmission delays during the read operation will cause the processes to identify
various physical instants as the same time.

2.9.1System Model
• The system consists of a collection of n processes, p1, p2,…,pn that are
connectedby channels.
• Let Cij denote the channel from process pi to process pj.
• Processes and channels have states associated with them.
• The state of a process at any time is defined by the contents of processor
registers, stacks, local memory, etc., and may be highly dependent on the local
context of the distributed application.
• The state of channel Cij, denoted by SCij, is given by the set of messages in
transit in the channel.
• The events that may happen are: internal event, send (send (mij)) and receive
(rec(mij)) events.
• The occurrences of events cause changes in the process state.
• A channel is a distributed entity and its state depends on the local states of the
processes on which it is incident.

• The transit function records the state of the channel Cij.

• In the FIFO model, each channel acts as a first-in first-out message queue and,
thus, message ordering is preserved by a channel.
• In the non-FIFO model, a channel acts like a set in which the sender process
adds messages and the receiver process removes messages from it in a random
order.

2.9.2A consistent global state

The global state of a distributed system is a collection of the local states of the
processes and the channels. The global state is given by:
The two conditions for global state are:

Condition 1 preserves law of conservation of messages. Condition C2 states that in

thecollected global state, for every effect, its cause must be present.

Law of conservation of messages: Every message m ijthat is recorded as sent in the local state of
a process pi must be captured in the state of the channel C ij or in the collected local state of the
receiver process pj.

➢ In a consistent global state, every message that is recorded as received is also

recorded as sent. Such a global state captures the notion of causality that a
message cannot be received if it was not sent.
➢ Consistent global states are meaningful global states and inconsistent global states
are not meaningful in the sense that a distributed system can never be in an
inconsistent state.

2.9.3Interpretation of cuts
• Cuts in a space–time diagram provide a powerful graphical aid in representing and
reasoning about the global states of a computation. A cut is a line joining an
arbitrary point on each process line that slices the space–time diagram into a
PAST and a FUTURE.
• A consistent global state corresponds to a cut in which every message received in
the PAST of the cut has been sent in the PAST of that cut. Sucha cut is known as a
consistent cut.
• In a consistent snapshot, all the recorded local states of processes are concurrent;
that is, the recorded local state of no process casually affects the recorded local
state of anyother process.

Issues in recording global state

The non-availability of global clock in distributed system, raises the following
issues:
Issue 1:
How to distinguish between the messages to be recorded in the snapshot from
those not to be recorded? Answer:
• Any message that is sent by a process before recording its snapshot, must
berecorded in the global snapshot (from C1).
• Any message that is sent by a process after recording its snapshot, must
not berecorded in the global snapshot (from C2).

Issue 2:
How to determine the instant when a process takes its snapshot? The answer
Answer:
A process pj must record its snapshot before processing a message mij that was sent
byprocess pi after recording its snapshot

2.9.4 SNAPSHOT ALGORITHMS FOR FIFO CHANNELS

Each distributed application has number of processes running on different
physical servers. These processes communicate with each other through
messaging channels.

A snapshot captures the local states of each process along with the state of each communication
channel.

Snapshots are required to:

• Check pointing
• Collecting garbage
• Detecting deadlocks
• Debugging
Chandy–Lamport algorithm
 The algorithm will record a global snapshot for each process channel.
 The Chandy-Lamport algorithm uses a control message, called a marker.
 After a site has recorded its snapshot, it sends a marker along all of its outgoing
channels before sending out any more messages.
 Since channels are FIFO, a marker separates the messages in the channel into those
to be included in the snapshot from those not to be recorded in the snapshot.
 This addresses issue I1. The role of markers in a FIFO system is to act as delimiters
for the messages in the channels so that the channel state recorded by the process
at the receiving end of the channel satisfies the condition C2.
Fig 2.10: Chandy–Lamport algorithm

Initiating a snapshot
 Process Pi initiates the snapshot
 Pi records its own state and prepares a special marker message.
 Send the marker message to all other processes.
 Start recording all incoming messages from channels Cij for j not equal to i.

Propagating a snapshot
 For all processes Pjconsider a message on channel Ckj.
 If marker message is seen for the first time:
 Pjrecords own sate and marks Ckj as empty

Send the marker message to all other processes.
 Record all incoming messages from channels Clj for 1 not equal to j or k.

Else add all messages from inbound channels.

Terminating a snapshot
 All processes have received a marker.
 All process have received a marker on all the N-1 incoming channels.
 A central server can gather the partial state to build a global snapshot.

Correctness of the algorithm

 Since a process records its snapshot when itreceives the first marker
on any incoming channel, no messages that followmarkers on the
channels incoming to it are recorded in the process’s snapshot.
 A process stops recording the state of an incoming channel whena
marker is received on that channel.
 Due to FIFO property of channels, itfollows that no message sent after
the marker on that channel is recorded inthe channel state. Thus,
condition C2 is satisfied.
 When a process pj receives message mij that precedes the marker on
channel Cij, it acts as follows: ifprocess pj has not taken its snapshot
yet, then it includes mij in its recorded snapshot. Otherwise, it
records mij in the state of the channel Cij. Thus,condition C1 is
satisfied.
Complexity
The recording part of a single instance of the algorithm requires O(e) messages
and O(d) time, where e is the number of edges in the network and d is the
diameter of the network.

Properties of the recorded global state

Fig) Timing diagram of two possible executions of the banking

examples

1. (Markers shown using dashed-and-dotted arrows.) Let site S1 initiate the

algorithm just after t1. Site S1 records its local state (account A=$550) and
sends a marker to site S2. The marker is received by site S2 after t4. When
site S2 receives the marker, it records its local state
(account B=$170), the state of channel C12 as $0, and sends a marker along
channel C21. When site S1 receives this marker, it records the state of
channel C21 as $80. The $800 amount in the system is conserved in the
recorded global state,
A=$550 B=$170 C12 =$0 C21 =$80

2. (Markers shown using dotted arrows.) Let site S1 initiate the algorithm just
after t0 and before Sending the $50 for S2. Site S1 records its local state
(account A = $600) and sends a marker to S2. The marker is received by site
S2 between t2 and t3. When site S2 receives the marker, it records its local
state (account B = $120), the state of channel C12 as $0, and sends a marker
along channel C21. When site S1 receives this marker, it records the state of
channel C21 as $80.
The $800 amount in the system is conserved in the recorded
global state, A=$600 B=$120 C12 =$0 C21 =$80

The recorded global state may not correspond to any of the global states that
occurred during the computation.
This happens because a process can change its state asynchronously before the
markers it sentare received by other sites and the other sites record their states.
But the system could have passed through the recorded global states in some equivalent
executions. The recorded global state is a valid state in an equivalent execution and if a
stable property (i.e., a property that persists) holds in the system before the snapshot
algorithm begins, it holds in the recorded global snapshot. Therefore, a recorded global
state is useful in detecting stable properties.

Physical and Logical Clock Synchronization
No ratings yet
Physical and Logical Clock Synchronization
75 pages
Logical Time in Distributed Computing
No ratings yet
Logical Time in Distributed Computing
8 pages
Logical Time in Distributed Computing
No ratings yet
Logical Time in Distributed Computing
43 pages
Logical Time and Synchronization in Distributed Systems
No ratings yet
Logical Time and Synchronization in Distributed Systems
8 pages
Logical Time and Global State in Systems
No ratings yet
Logical Time and Global State in Systems
9 pages
Understanding Logical Clocks in Distributed Systems
No ratings yet
Understanding Logical Clocks in Distributed Systems
27 pages
Causality in Distributed Systems
No ratings yet
Causality in Distributed Systems
27 pages
Logical Clocks in Distributed Systems
No ratings yet
Logical Clocks in Distributed Systems
33 pages
Logical Time and Message Ordering in Distributed Systems
No ratings yet
Logical Time and Message Ordering in Distributed Systems
4 pages
Logical Time and Global State in Systems
No ratings yet
Logical Time and Global State in Systems
29 pages
Message Ordering in Distributed Systems
No ratings yet
Message Ordering in Distributed Systems
19 pages
Synchronisation and Coordination in Systems
No ratings yet
Synchronisation and Coordination in Systems
9 pages
Logical Clocks in Distributed Systems
No ratings yet
Logical Clocks in Distributed Systems
27 pages
Logical Time in Distributed Systems
No ratings yet
Logical Time in Distributed Systems
24 pages
Chap 7 Notes - Sync1
No ratings yet
Chap 7 Notes - Sync1
9 pages
Understanding Logical Clocks in Distributed Systems
No ratings yet
Understanding Logical Clocks in Distributed Systems
6 pages
Clock Synchronization in Distributed Systems
No ratings yet
Clock Synchronization in Distributed Systems
56 pages
Logical Time and Global State in Systems
No ratings yet
Logical Time and Global State in Systems
113 pages
Logical Time and Clock Synchronization in Distributed Systems
No ratings yet
Logical Time and Clock Synchronization in Distributed Systems
39 pages
Logical Time in Distributed Computing
No ratings yet
Logical Time in Distributed Computing
77 pages
Logical Time in Distributed Systems
No ratings yet
Logical Time in Distributed Systems
42 pages
Understanding Logical Clocks in Distributed Systems
No ratings yet
Understanding Logical Clocks in Distributed Systems
23 pages
Distributed System Clock Synchronization
No ratings yet
Distributed System Clock Synchronization
71 pages
Clock Synchronization in Distributed Systems
No ratings yet
Clock Synchronization in Distributed Systems
39 pages
Clock Synchronization in Distributed Systems
No ratings yet
Clock Synchronization in Distributed Systems
55 pages
CS3551 Unit II: Logical Time Concepts
No ratings yet
CS3551 Unit II: Logical Time Concepts
35 pages
Time Synchronization in Distributed Systems
No ratings yet
Time Synchronization in Distributed Systems
14 pages
Election Algorithms and Logical Time in Distributed Computing
No ratings yet
Election Algorithms and Logical Time in Distributed Computing
72 pages
Synchronization in Distributed Systems
No ratings yet
Synchronization in Distributed Systems
40 pages
Distributed Computing Course Overview
No ratings yet
Distributed Computing Course Overview
9 pages
Logical Time and Global State in Systems
No ratings yet
Logical Time and Global State in Systems
69 pages
Understanding Clock Synchronization Techniques
No ratings yet
Understanding Clock Synchronization Techniques
40 pages
Understanding NTP and Clock Synchronization
No ratings yet
Understanding NTP and Clock Synchronization
30 pages
Clock Synchronization and Coordination Techniques
No ratings yet
Clock Synchronization and Coordination Techniques
56 pages
Process Communication Models in Distributed Systems
No ratings yet
Process Communication Models in Distributed Systems
10 pages
Causal Ordering with Logical and Vector Clocks
No ratings yet
Causal Ordering with Logical and Vector Clocks
26 pages
Logical Time and Message Ordering in Distributed Systems
No ratings yet
Logical Time and Message Ordering in Distributed Systems
3 pages
Cristian's Algorithm in Distributed Systems
No ratings yet
Cristian's Algorithm in Distributed Systems
6 pages
Understanding Logical Clocks in Distributed Systems
No ratings yet
Understanding Logical Clocks in Distributed Systems
5 pages
Synchronizing Clocks in Distributed Systems
No ratings yet
Synchronizing Clocks in Distributed Systems
28 pages
Lecture07 Coordination CH6 PartII
No ratings yet
Lecture07 Coordination CH6 PartII
33 pages
Synchronization in Distributed Systems
No ratings yet
Synchronization in Distributed Systems
8 pages
Clock Synchronization in Distributed Systems
No ratings yet
Clock Synchronization in Distributed Systems
61 pages
Synchronization in Distributed Systems
No ratings yet
Synchronization in Distributed Systems
46 pages
Logical Clocks in Distributed Computing
No ratings yet
Logical Clocks in Distributed Computing
27 pages
Distributed Computing Syllabus Overview
No ratings yet
Distributed Computing Syllabus Overview
43 pages
Understanding Lamport's Algorithm
No ratings yet
Understanding Lamport's Algorithm
3 pages
Synchronizing Clocks in Distributed Systems
No ratings yet
Synchronizing Clocks in Distributed Systems
16 pages
Time and State in Distributed Systems
No ratings yet
Time and State in Distributed Systems
35 pages
Understanding Distributed Systems Concepts
No ratings yet
Understanding Distributed Systems Concepts
38 pages
Understanding Clocks in Distributed Systems
No ratings yet
Understanding Clocks in Distributed Systems
13 pages
Understanding Logical Time in Distributed Systems
No ratings yet
Understanding Logical Time in Distributed Systems
31 pages
Chandy-Lamport Algorithm Overview
No ratings yet
Chandy-Lamport Algorithm Overview
67 pages
Synchronization in Distributed Systems
No ratings yet
Synchronization in Distributed Systems
79 pages
Chapter 5 Synchronization
No ratings yet
Chapter 5 Synchronization
8 pages
Vector Clocks in Distributed Systems
No ratings yet
Vector Clocks in Distributed Systems
14 pages
Understanding Matrix Clocks in Systems
No ratings yet
Understanding Matrix Clocks in Systems
55 pages
Lecture # 9-10 (Synchronization) DS - Spring24 - by Faraz
No ratings yet
Lecture # 9-10 (Synchronization) DS - Spring24 - by Faraz
36 pages
Time and Global States in Distributed Systems
No ratings yet
Time and Global States in Distributed Systems
17 pages
VEX V5 Robotics Competition Manual 2024
No ratings yet
VEX V5 Robotics Competition Manual 2024
135 pages
Stellar Water Heating Solutions
No ratings yet
Stellar Water Heating Solutions
36 pages
Exotic India Art Reviews and Offerings
No ratings yet
Exotic India Art Reviews and Offerings
8 pages
Axolotl Hug
No ratings yet
Axolotl Hug
1 page
Water Quality Index in Thrace Lakes
No ratings yet
Water Quality Index in Thrace Lakes
11 pages
Class 8 Physics Revision Worksheet
No ratings yet
Class 8 Physics Revision Worksheet
8 pages
Classification of Logic Families in Electronics
No ratings yet
Classification of Logic Families in Electronics
23 pages
Key Topics in Orthopaedic Trauma Surgury
82% (11)
Key Topics in Orthopaedic Trauma Surgury
318 pages
Tos93 If E4
No ratings yet
Tos93 If E4
348 pages
Overview of the Polavaram Project
No ratings yet
Overview of the Polavaram Project
3 pages
Edexcel International GCSE History Exam
No ratings yet
Edexcel International GCSE History Exam
36 pages
Petrovis Event Overview 2024
No ratings yet
Petrovis Event Overview 2024
10 pages
Community Development Insights
No ratings yet
Community Development Insights
18 pages
Evolution of the Sewing Machine
No ratings yet
Evolution of the Sewing Machine
1 page
Cellular Respiration and ATP Synthesis
100% (1)
Cellular Respiration and ATP Synthesis
15 pages
MATLAB Interpolation and Curve Fitting
No ratings yet
MATLAB Interpolation and Curve Fitting
25 pages
Drugged Driving Risks and Laws
No ratings yet
Drugged Driving Risks and Laws
44 pages
Trunking Theory and Grade of Service
No ratings yet
Trunking Theory and Grade of Service
41 pages
Orca Survival and Food Web Dynamics
No ratings yet
Orca Survival and Food Web Dynamics
10 pages
Rufouz Hitek Engineers Overview
No ratings yet
Rufouz Hitek Engineers Overview
40 pages
DaeRyuk: Expert Anti-Corrosion Coatings
No ratings yet
DaeRyuk: Expert Anti-Corrosion Coatings
5 pages
Petronas Twin Towers: Engineering Marvels
No ratings yet
Petronas Twin Towers: Engineering Marvels
3 pages
Welding Shield and Face Shield Diagrams
No ratings yet
Welding Shield and Face Shield Diagrams
1 page
Rewrite Sentences with 'Have Done'
No ratings yet
Rewrite Sentences with 'Have Done'
4 pages
EKS Setup with ArgoCD, Prometheus, Terraform
No ratings yet
EKS Setup with ArgoCD, Prometheus, Terraform
21 pages
Usha Sewing Machines: Empowering Lives
No ratings yet
Usha Sewing Machines: Empowering Lives
1 page
BPS+ +the+Biofunctional+Prosthetic+System
No ratings yet
BPS+ +the+Biofunctional+Prosthetic+System
20 pages
Metric Thread and Drill Size Chart
No ratings yet
Metric Thread and Drill Size Chart
12 pages
Applications of Communication Satellites
No ratings yet
Applications of Communication Satellites
17 pages
Switchboard Room Foundations Details
No ratings yet
Switchboard Room Foundations Details
6 pages

Logical Time and Global State in Systems

Uploaded by

Logical Time and Global State in Systems

Uploaded by

UNIT II

LOGICAL TIME AND GLOBAL STATE

Logical Time: Physical Clock Synchronization: NTP – A Framework for a

2.1.1 SCALAR TIME

• If Ci is the local clock for process Pi then,

Rule 1: Ci(b) = Ci(a) + d1, where d1 > 0

Fig 1.20: Evolution of scalar time

2.2 PHYSICAL CLOCK SYNCHRONIZATION: NEWTWORK TIME PROTOCOL (NTP)

drift is calculated as:

2.3 MESSAGE ORDERING AND GROUP COMMUNICATION

2.3.1MESSAGE ORDERING PARADIGMS

Hierarchy of ordering paradigms

The orders of executions are:

Fig (a) Fig (b)

Fig 2.3: Hierarchy of execution classes

Fig 2.4: Modeling channels as processes to simulate an execution using

2.5 SYNCHRONOUS PROGRAM ORDER ON AN ASYNCHRONOUS

Non deterministic programs

• Binary rendezvous: When two processes agree to synchronize.

Features of binary rendezvous:

• Send and received commands may be individually disabled or enabled. A command is

2.3.2 Binary rendezvous algorithm

• Schedule on-line, atomically, and in a distributed manner.

Steps in Bagrodia algorithm

1. Receive commands are forever enabled from all processes.

(5) Processing when Pi is unblocked:

Fig 2.5: Bagrodia Algorithm

2.6 GROUP COMMUNICATION

Closed group algorithms Open group algorithms

2.7 CAUSAL ORDER (CO)

2.1 Causal Order (CO)

not in the causal future of e k, c where d ∈Mk,cDests and there is no

Information about messages:

Fig 2.7 b) Receive algorithm by Kshemkalyani–Singhal to optimally implement causal

from (i, a) at which it is known that M i,a is delivered to d or is guaranteed to be delivered to d

Fig 2.8: Illustration of propagation constraints

Constraints Multicasts M5,1and M4,1

Learning implicit information at P2 and P3

• When the information P6 ∈ M5,1Dests arrives on M4,3, piggybacked as M5,1 .Dests = P6 it is

and, therefore, M5,1. Dests is set to ∅ in Log6.

M5,[Link] in the piggybacked information, and hence marks information P6 ∈ M5,1Dests

The information “P6 ∈M5,[Link] piggybacked on M2,3,which arrives at P 1, is inferred to

2.2 TOTAL ORDER

Centralized Algorithm for total ordering

Three phase distributed algorithm

Fig 2.9: Sender side of three phase distributed algorithm

This algorithm uses three phases, and, to send a message to n − 1 processes, it

2.9 GLOBAL STATE AND SNAPSHOT RECORDING ALGORITHMS

• The transit function records the state of the channel Cij.

2.9.2A consistent global state

Condition 1 preserves law of conservation of messages. Condition C2 states that in

➢ In a consistent global state, every message that is recorded as received is also

Issues in recording global state

2.9.4 SNAPSHOT ALGORITHMS FOR FIFO CHANNELS

Snapshots are required to:

Correctness of the algorithm

Properties of the recorded global state

Fig) Timing diagram of two possible executions of the banking

1. (Markers shown using dashed-and-dotted arrows.) Let site S1 initiate the

You might also like