DISTRIBUTED MUTEX AND DEADLOCK
Distributed Mutual exclusion Algorithms: Introduction – Preliminaries – Lamport’salgorithm
–Ricart- Agrawala’s Algorithm –– Token-Based Algorithms – Suzuki-Kasami’s Broadcast
Algorithm;Deadlock Detection in Distributed Systems: Introduction – System Model –
Preliminaries – Models of Deadlocks – Chandy-Misra-Haas Algorithm for the AND model and
OR Mode
[Link] Mutual Exclusion Algorithms: Introduction
Mutual exclusion is a fundamental problem in distributed computing systems. Mutual
exclusion ensures that concurrent access of processes to a shared resource ordata is
serialized, that is, executed in mutually exclusive manner.
Figure 1: Three processes accessing a shared resource (critical section) simultaneously.
Mutual exclusion in a distributed system states that only one process is allowed
toexecute the critical section (CS) at any given time.
Message passing is the sole means for implementing distributed mutual exclusion.
There are three basic approaches for implementing distributed mutual exclusion:
1. Token based approach
2. Non-token based approach
3. Quorum based approach
1. In the token-based approach, a unique token (also known as the PRIVILEGE message) is
shared among the sites. A site is allowed to enter its CS if it possesses the token and it
continues to hold the token until the execution of the CS is over. Mutual exclusion is ensured
because the token is unique.
2. In the non-token based approach, two or more successive rounds of messages are
exchanged among the sites to determine which site will enter the CS next. A site enters the
Critical Section (CS) when an assertion, defined on its local variables, becomes true.
Mutual Exclusion is enforced because the assertion becomes true only at one site at any given
time.
3. In the quorum-based approach, each site requests permission to execute the CS from a
subset of sites (called a quorum). The quorums are formed in such a way that when two sites
concurrently request access to the CS, one site receives both the requests and which
isresponsible to make sure that only one request executes the CS at any time.
2. Preliminaries
We describe here,
1. System model,
2. Requirements that mutual exclusion algorithms
3. Metrics we use to measure the performance of mutual exclusion algorithms.
1. System Model
The system consists of N sites, S1, S2, ..., SN. We assume that a single process is
running on each site.
The process at site Si is denoted by pi.
A process wishing to enter the CS, requests all other or a subset of processes by
sending REQUEST messages, and waits for appropriate replies before entering the
[Link] waiting the process is not allowed to make further requests to enter the CS.
A site can be in one of the following three states:
1. Requesting the Critical Section.
2. Executing the Critical Section.
3. Neither requesting nor executing the CS (i.e., idle).
In the ‘requesting the CS’ state, the site is blocked and can not make further
requestsfor the CS. In the ‘idle’ state, the site is executing outside the CS.
In the token-based algorithms, a site can also be in a state where a site holding the
token is executing outside the CS. Such state is referred to as the idle token state.
At any instant, a site may have several pending requests for CS. A site queues up
theserequests and serves them one at a time.
2. Requirements Of Mutual Exclusion Algorithms
A mutual exclusion algorithm should satisfy the following properties:
a. Safety Property: The safety property states that at any instant, only one process can
execute the critical section. This is an essential property of a mutual exclusion algorithm.
b. Liveness Property: This property states the absence of deadlock and starvation. Two
or more sites should not endlessly wait for messages which will never arrive. In addition, a
site must not wait indefinitely to execute the CS while other sites are repeatedly executing the
CS. That is, every requesting site should get an opportunity to execute the CS in finite time.
c. Fairness: Fairness in the context of mutual exclusion means that each process gets a fair
chance to execute the CS.
3. Performance Metrics
The performance of mutual exclusion algorithms is generally measured by the following fourmetrics:
a. Message complexity: It is the number of messages that are required per CS execution by
asite.
b. Synchronization delay: After a site leaves the CS, it is the time required and before the
nextsite enters the CS (sees Figure 9.1).
[Link]’s Algorithm
The algorithm is fair in the sense that a request for CS is executed in the order of
theirtimestamps and time is determined by logical clocks.
When a site processes a request for the CS, it updates its local clock and assigns therequest a
timestamp.
The algorithm executes CS requests in the increasing order of timestamps. Every
site Si keeps a queue, request_queuei,
This algorithm requires communication channels to deliver messages the FIFO order.
The Algorithm
1. Requesting the critical section:
• When a site Si wants to enter the CS, it broadcasts a REQUEST(tsi, i) message to all
other sites and places the request on request_queuei. ((tsi, i) denotes the timestamp
ofthe request.)
• When a site Sj receives the REQUEST(tsi, i) message from site Si, places site
Si’s Request on request_queuej and it returns a time stamped REPLY message
to Si.
2. Executing the critical section:
Site Si enters the CS when the following two conditions hold:
L1: Si has received a message with timestamp larger than (tsi, i) from all other sites.
L2: Si’s request is at the top of request_queuei.
3. Releasing the critical section:
Site Si, upon exiting the CS, removes its request from the top of its request queue
andbroadcasts a time stamped RELEASE message to all other sites.
When a site Sj receives a RELEASE message from site Si, it removes Si’s request
from its request queue. When a site removes a request from its request queue, its own
request may come at the top of the queue, enabling it to enter the CS. Clearly, when a
site receives a REQUEST, REPLY or RELEASE message, it updates its clock using
the timestamp in the message.
An Example
Step 1:
sites S1andS2 are making requests for the CS and send out REQUEST messages to other
sites. The time stamps of the requests are (1, 1) and (1, 2), respectively.
Step 2:
The sites S1 and S2 have received REPLY messages from all other sites. S1 has its request at the top of
its request_queue but site S2 does not have its request at the top of its request_queue. Consequently, site
S1 enters the CS.
Step 3:
S1 exits and sends RELEASE messages to all other sites.
Step 4:
site S2 has received REPLY from all other sites and also received a RELEASE message from
siteS1. Site S2 updates its request_queue and its request is now at the top of its request_queue.
Consequently, it enters the CS next.
Performance
for each CS invocation
(N-1)
REQUEST
(N- 1)
REPLY
(N-1) RELEASE,
Total 3(N-1) messages, synchronization delay Sd = average delay
4. Ricart-Agrawala Algorithm
The Ricart-Agrawala algorithm assumes the communication channels are FIFO.
The algorithm uses two types of messages: REQUEST and REPLY.
A process sends a REQUEST message to all other processes to request their
permissionto enter the critical section.
A process sends a REPLY message to a process to give its permission to that process.
Processes use Lamport-style logical clocks to assign a timestamp to critical section
requests. Timestamps are used to decide the priority of requests in case
of conflict – if a process pi that is waiting to execute the critical section, receives a
REQUEST message from process pj, then if the priority of pj’s request is lower, pi
defers the REPLY to pj and sends a REPLY message to pj only after executing the CS
for it spendingrequest.
Otherwise, pi sends a REPLY message to pj immediately, provided it is currently not
executing the CS. Thus, if several processes are requesting execution of the CS, the
highest priority request succeeds in collecting all the needed REPLY messages and
getsto execute the CS.
Each process pi maintains the Request-Deferred array, RDi, the size of which is the
same as the number of processes in the system. Initially, ∀i ∀j: RDi[j]=0. Whenever pi defer
the request sent by pj, it sets RDi[j]=1 and after it has sent a REPLY message to pj, it sets
RDi[j]=[Link]: Deferred – Postponed the request / waiting
Algorithm
1. Requesting the critical section:
(a) When a site Si wants to enter the CS, it broadcasts a time stamped REQUEST message
toall other sites.
(b) When site Sj receives a REQUEST message from site Si, it sends a REPLY message
to Site Si if site Sj is neither requesting nor executing the CS, or if the site Sj is
requesting And Si’s request’s timestamp is smaller than site Sj’s own request’s
timestamp. otherwise, the reply is deferred and Sj sets RDj[i]=1
2. Executing the critical section:
(c) Site Si enters the CS after it has received a REPLY message from every site it
senta REQUEST message to.
3. Releasing the critical section:
∀j ifRDi[j]=1, then send a REPLY message to Sj and set RDi[j]=0.
(d) When site Si exits the CS, it sends all the deferred REPLY messages:
When a site receives a message, it updates its clock using the timestamp in the message. Also,
when a site takes up a request for the CS for processing, it updates its local clock and assigns
a timestamp to the request. In this algorithm, a site’s REPLY messages are blocked only by
sites which are requesting the CS with higher priority (i.e., smaller timestamp).Thus, when a
site sends out differed REPLY messages, site with the next highest priority request receives
the last needed REPLY message and enters the CS. Execution of the CS requests in this
algorithm is always in the order of their timestamps.
An Example
Step 1:
sites S1 and S2 are making requests for the CS and send out REQUEST messages to other
sites. Thetimestamps of the requests are (2, 1) and (1, 2), respectively.
Step 2:
S2 has received REPLY messages from all other sites and consequently, it enters the CS.
Step 3:
S2 exitsthe CS and sends a REPLY message to site S1.
Step 4: site S1 has received REPLY from all other sites and enters the CS next.
Performance
For each CS execution, Ricart-Agrawala algorithm requires (N − 1) REQUEST messages and
(N−1) REPLY messages. Thus, it requires 2(N−1) messages per CS execution.
Synchronizationdelay in the algorithm is T.
[Link]-Kasami’s Broadcast Algorithm
• Suzuki–Kasami algorithm is a token-based algorithm for achieving
mutual exclusionin distributed systems.
• In token-based algorithms, A site is allowed to enter its critical
section if it possessesthe unique token.
• Non-token-based algorithms uses timestamp to order requests for
the critical sectionwhere as sequence number is used in token based
algorithms.
• Each request for critical section contains a sequence number. This
sequence numberis used to distinguish old and current requests.
In Suzuki-Kasami’s algorithm if a site that wants to enter the CS, does not have the token, it broadcasts a
REQUEST message for the token to all other sites. A site which possesses the token sends it to the requesting site
upon the receipt of its REQUEST message. If a site receives a REQUEST message when it is executing the CS, it
sends the token only after it has completed the execution of the CS
ALGORITHM
1. Requesting the critical section
(a) If requesting site Si does not have the token, then it increments its sequence
number, RNi[i], and sends a REQUEST(i, sn) message to all other sites. (‘sn’ is the
updated value of RNi[i].)
(b) When a site Sj receives this message, it sets RNj[i] to max(RNj[i], sn). If Sj has
the idle token, then it sends the token to Si if RNj[i]=LN[i]+1.
2. Executing the critical section
(c) Site Si executes the CS after it has received the token.
3. Releasing the critical section Having finished the execution of the CS, site Si takes
thefollowing actions:
(d) It sets LN[i] element of the token array equal to RNi[i].
(e) For every site Sj whose id is not in the token queue, it appends its id to the token
queue if RNi[j]=LN[j]+1.
(f) If the token queue is nonempty after the above update, Si deletes the top site id
from the token queue and sends the token to the site indicated by the id. Thus, after
executing the CS, a site gives priority to other sites with outstanding requests for the
CS (over its pending requests for the CS). Note that Suzuki- Kasami’s algorithm is not
symmetric because a site retains the token even if it does not have a request for the
CS, which is contrary to the spirit of Ricart and Agrawala’s definition of symmetric
algorithm: “no site possesses the right to access its CS when it has not been
requested”.
Performance
Beauty of Suzuki-Kasami algorithm lies in its simplicity and efficiency. No message is
needed and the synchronization delay is zero if a site holds the idle token at the time of its
request. If a site does not hold the token when it makes a request, the algorithm requires N
messages to obtain the token. Synchronization delay in this algorithm is 0 or T.
5. Deadlock Detection In Distributed Systems : Introduction
A deadlock is a condition where a process cannot proceed because it needs to obtain a
resource held by another process and it itself is holding a resource thatthe other process
needs.
We can consider two types of deadlock:
• Communication deadlock occurs when process A is trying to send a
message to process B, which is trying to send a message to process C
which is trying to send a message to A.
• A resource deadlock occurs when processes are trying to get exclusive access to
devices, files, locks, servers, or other resources. We will not differentiate between
these types of deadlock since we can consider communication channels to be
resources without loss of generality.
“A deadlock can be defined as a condition where a set of processes request resources that
are held by other processes in the set.”
Deadlock deals with various components like deadlock prevention, deadlock avoidance
other then deadlock detection.
Deadlock prevention is commonly achieved by either having a process acquire all
the needed resources simultaneously before it begins execution or by pre- empting a
process that hold the needed resource.
In the deadlock avoidance approach to distributed system, a resource is granted to
aprocess if the resulting global system is safe.
Deadlock detection requires an examination of the status of the process- resources
interaction for the presence of a deadlock condition. To resolve the deadlock, we have
to abort a deadlocked process.
7. System Model
A distributed system consists of a set of processors that are connected by a
communication network.
The communication delay is finite but unpredictable.
A distributed program is composed of a set of n asynchronous processes p1, p2, .
. . , pi, . . . , pn that communicates by message passing over the communication
network.
Without loss of generality we assume that each process is running on a
differentprocessor.
The processors do not share a common global memory and communicate solely
bypassing messages over the communication network.
The communication medium may deliver messages out of order, messages may be
lost garbled or duplicated due to timeout and retransmission, processors may fail
andcommunication links may go down.
The system can be modelled as a directed graph in which vertices represent the
processes and edge represent unidirectional communication channels.
We make the following assumptions:
• The systems have only reusable resources.
• Processes are allowed to make only exclusive access to resources.
• There is only one copy of each resource.
A process can be in two states: running or blocked. In the running state (also called active
state), a process has all the needed resources and is either executing or is ready for
[Link] the blocked state, a process is waiting to acquire some resource.
Wait-For-Graph (WFG)
In distributed systems, the state of the system can be modelled by directed
graph,called a wait for graph (WFG).
In a WFG, nodes are processes and there is a directed edge from node P1 to mode
P2if P1 is blocked and is waiting for P2 to release some resource.
A system is deadlocked if and only if there exists a directed cycle or knot in the WFG.
where process P11 of site 1 has an edge to process P21 of site 1 and P32 of site 2 is waiting
for a resource which is currently held by process P21. At the same time process P32 is
waiting on process P33 to release a resource. If P21 is waiting on process P11, then processes
P11, P32 and P21 form a cycle and all the four processes are involved in a deadlock
depending upon the request model.
[Link]
Deadlock Handling Strategies
There are three strategies for handling deadlocks,
i. Deadlock Prevention,
ii. Deadlock Avoidance,
iii. Deadlock Detection.
Issues in Deadlock Detection
Deadlock handling using the approach of deadlock detection entails addressing two
basic issues:
1. Detection of existing deadlocks
2. Resolution of detected deadlocks.
1. Detection of Deadlocks
Detection of deadlocks involves addressing two issues: maintenance of the WFG
and searching of the WFG for the presence of cycles (or knots).
Since in distributed systems, a cycle or knot may involve several sites, the search
for cycles greatly depends upon how the WFG of the system is represented
across the system.
Depending upon the way WFG information is maintained and search for cycles is
carried out, there are centralized, distributed, and hierarchical algorithms for
deadlock detection in distributed systems .
2. Resolution of a Detected Deadlock
Deadlock resolution involves breaking existing wait-for dependencies between the
processes to resolve the deadlock.
It involves rolling back one or more deadlocked processes and assigning their
resources to blocked processes so that they can resume execution.
when a wait-for dependency is broken, the corresponding information should be
immediately cleaned from the system.
If this information is not cleaned in timely manner, it may result in detection of
phantom deadlocks.
9. Models Of Deadlocks
The models of deadlocks are explained based on their hierarchy.
The diagrams illustrate the working of the deadlock models. Pa, Pb, Pc, Pdare passive
processes that had already acquired the resources. Pe is active process that is requesting the
resource.
Single Resource Model
A process can have at most one outstanding request for only one unit of a
resource.
The maximum out-degree of a node in a WFG for the single resource model can
be 1,
the presence of a cycle in the WFG shall indicate that there is a deadlock.
AND Model
In the AND model, a passive process becomes active (i.e., its activation condition is
fulfilled) only after a message from each process in its dependent set has arrived.
In the AND model, a process can request more than one resource simultaneously and the
request is satisfied only after all the requested resources are granted to the process.
The requested resources may exist at different locations.
The out degree of a node in the WFG for AND model can be more than 1.
The presence of a cycle in the WFG indicates a deadlock in the AND model.
Each node of the WFG in such a model is called an AND node.
In the AND model, if a cycle is detected in the WFG, it implies a deadlock but not vice
versa. That is, a process may not be a part of a cycle, it can still be deadlocked.
The OR Model
A process shall move from an idle to an active state on receiving a grant message
from any of the processes in its dependent set.
A process is permanently blocked if it never receives a grant message from any of the
Processes in its dependent set.
A set of processes S is deadlocked if all the processes in S are permanently blocked.
In short, a processis deadlocked or permanently blocked, if the following conditions
are met:
1. Each of the process is the set S is blocked.
2. The dependent set for each process in S is a subset of S.
3. No grant message is in transit between any two processes in set S.
This is a variation of AND-OR model.
This allows a request to obtain any k available resources from a pool of n resources.
Both the models are the same in expressive power.
This favours more compact formation of a request.
Every request in this model can be expressed in the AND-OR model and vice-versa.
Unrestricted Model
In the unrestricted model, no assumptions are made regarding the underlying
structure of resource requests.
In this model, only one assumption that the deadlock is stable is made and hence
it is the most general model.
Algorithms can be used to detect other stable properties as they deal with this
general model.
10. Chandy-Misra-Haas Algorithm For The And Model
Chandy-Misra-Haas’s distributed deadlock detection algorithm for AND model
that is based on edge-chasing.
The algorithm uses a special message called probe, which is a triplet (i, j, k),
denoting that it belongs to a deadlock detection initiated for process Pi and it is
being sent by the home site of process Pj to the home site of process Pk.
A probe message travels along the edges of the global WFG graph, and a
deadlock is detected when a probe message returns to the process that initiated it.
A process Pj is said to be dependent on another process Pk if there exists a
sequence of processes Pj,Pi1, Pi2, ..., Pim, Pk such that each process except Pk in
the sequence is blocked and each process, except the Pj, holds a resource for
which the previous process in the sequence is waiting.
Process Pj is said to be locally dependent upon process Pk if Pj is dependent
upon Pk and both the processes are on the same site.
Data Structures
Each process Pi maintains a boolean array, dependent i, where dependent i(j) is true only
if Pi knows that Pj is dependent on it. Initially, dependent i(j) is false for all i and j.
Therefore, a probe message is continuously circulated along the edges of the global
WFG graph and a deadlock is detected when a probe message returns to its initiating
process.
Performance Analysis
In the algorithm, one probe message (per deadlock detection initiation) is sent on
every edge of the WFG which that two sites.
Thus, the algorithm exchanges at most m(n − 1)/2 messages to detect a deadlock
that involves m processes and that spans over n sites.
The size of messages is fixed and is very small (only 3 integer words). Delay in
detecting a deadlock is O(n).
11. Chandy-Misra-Haas Algorithm For The Or Model
Chandy-Misra-Haas distributed deadlock detection algorithm for OR model that
is based on the approach of diffusion-computation.
A blocked process determines if it is deadlocked by initiating a diffusion
computation.
Two types of messages are used in a diffusion computation: query(i, j, k) and
reply(i, j, k), denoting that they belong to a diffusion computation initiated by a
process Pi and are being sent from process Pj to process Pk.
Basic Idea
A blocked process initiates deadlock detection by sending query messages to all processes
in its dependent set (i.e., processes from which it is waiting to receive a message).
If an active process receives a query or reply message, it discards it. When a blocked
process Pk receives a query(i, j,k) message, it takes the following actions:
If this is the first query message received by Pk for the deadlock detection initiated by Pi
(called the engaging query), then it propagates the query to all the processes in its
dependent set and sets a local variable numk(i) to the number of query messages sent.
If this is not the engaging query, then Pk returns a reply message to it immediately
provided Pk has been continuously blocked since it received the corresponding engaging
query. Otherwise, it discards the query.
Process Pk maintains a boolean variable waitk(i) that denotes the fact that it has been
continuously blocked since it received the last engaging query from process Pi.
When a blocked process Pk receives a reply(i, j, k) message, it decrements numk(i) only if
waitk(i) holds.
A process sends a reply message in response to an engaging query only after it has
received a reply to every query message it had sent out for this engaging query.
The initiator process detects a deadlock when it receives reply messages to all the query
messages it had sent out.
Performance Analysis
For every deadlock detection, the algorithm exchanges e query messages and e reply messages,
where e=n(n-1) is the number of edges.