Module 5
Distributed Mutual Exclusion
Process 2
Process 1 Process 3
…
Shared Process n
resource
• Mutual exclusion very important
– Prevent interference
– Ensure consistency when accessing the resources
2
Distributed Mutual Exclusion
Mutual exclusion useful when the server managing the resources don’t use
locks
• Critical section
Enter() enter critical section – blocking
•
Access shared resources in critical section
•
•
Exit() Leave critical section
3
Distributed Mutual Exclusion
• Distributed mutual exclusion: no shared
variables, only message passing
Properties:
[ME1]Safety: At most one process may execute in the critical
section at a time
[ME2]Liveness: Requests to enter and exit the critical section
eventually succeed
No deadlock and no starvation
[ME3]Ordering: If one request to enter the CS happened-before
another, then entry to the CS is granted in that order
4
Mutual Exclusion Algorithms
Central Server Algorithm
Ring-Based Algorithm
Maekawa’s Voting Algorithm
5
Central Server Algorithm
• server keeps track of a token---permission to
enter critical region
• a process requests the server for the token
• the server grants the token if it has the token
• a process can enter if it gets the token,
otherwise waits
• when done, a process sends release and exits
Central Server Algorithm
Server
Queue of
Holds the token
requests 4
2
2 3) Grant
token
1) Request 2) Release
token token P4
P1 Waiting
P2 P3
Holds the token
7
Properties
• safety,
• liveness, are met by this algorithm
• Happened -before ordering not guaranteed.
Performance
• enter overhead: two messages (request and
grant)
• enter delay: time between request and grant
• exit overhead: one message (release)
• exit delay: none
• synchronization delay: between release and
grant
Ring-Based Algorithm
P1 Enter()
P2 • Critical
•
• Section
Pn Exit()
P3
P4
Token navigates
around the ring
10
Ring-Based Algorithm
Arrange processes in a logical ring to rotate a token
– Wait for the token if it requires to enter the critical section
– The ring could be unrelated to the physical configuration
• pi sends messages to p(i+1) mod N
– when a process requires to enter the critical section, waits
for the token, when a process holds the token
• If it requires to enter the critical section, it can enter
– when a process releases a token (exit), it sends to its
neighbor
• If it doesn’t, just immediately forwards the token to its
neighbor
Properties
• – safety
• – liveness
• – HB ordering not guaranteed, why?
Performance
• This algorithm continuously consumes network bandwidth (except
when a process is inside the critical section): the processes send
messages around the ring even when no process requires entry to
the critical section.
• The delay experienced by a process requesting entry to the critical
section is between 0 messages (when it has just received the token)
and N messages (when it has just passed on the token).
• To exit the critical section requires only one message.
• The synchronization delay between one process’s exit from the
critical section and the next process’s entry is anywhere from 1 to N
message transmissions.
Using Multicast and logical clocks
• Mutual exclusion between N peer processes based upon multicast.
• Processes that require entry to a critical section multicast a request message, and
can enter it only when all the other processes have replied to this message.
• The condition under which a process replies to a request are designed to ensure
ME1 ME2 and ME3 are met.
• Each process pi keeps a Lamport clock. Message requesting entry are of the
form<T, pi>.
• Each process records its state of either RELEASE, WANTED or HELD in a
variable state.
– If a process requests entry and all other processes is RELEASED, then all
processes reply immediately.
– If some process is in state HELD, then that process will not reply until it is
finished.
– If some process is in state WANTED and has a smaller timestamp than the
incoming request, it will queue the request until it is finished.
– If two or more processes request entry at the same time, then whichever bears
the lowest timestamp will be the first to collect N-1 replies.
Ricart and Agrawala’s algorithm
On initialization
state := RELEASED;
To enter the section
state := WANTED;
Multicast request to all processes; request processing deferred here
T := request’s timestamp;
Wait until (number of replies received = (N – 1));
state := HELD;
On receipt of a request <Ti, pi> at pj (i ≠ j)
if (state = HELD or (state = WANTED and (T, pj) < (Ti, pi)))
then
queue request from pi without replying;
else
reply immediately to pi;
end if
To exit the critical section
state := RELEASED;
reply to any queued requests;
Mutual Exclusion using Multicast
and Logical Clocks
Waiting
queue 19 P3
19
2
P1 OK
OK 23
OK
Enter() 23 P1 and P2 request
• entering the critical
• OK section simultaneously
• 19 23
Exit()
P2
Critical Section
15
P1 and P2 request CS concurrently. The timestamp of P1 is 19
and for P2 is 23. When P3 receives their requests, it replies
immediately.
When P1 receives P2’s request, it finds its own request has
the lower timestamp, and so does not reply, holding P2 request
in queue. However, P2 will reply. P1 will enter CS.
After P1 finishes, P1 reply P2 and P2 will enter CS.
Granting entry takes 2(N-1) messages, N-1 to multicast
request and N-1 replies. Bandwidth consumption is high.
Client delay is again 1 round trip time
Synchronization delay is one message transmission time.
Maekawa’s voting algorithm
• It is not necessary for all of its peers to grant it access.
• Processes need only obtain permission to enter from
subsets of their peers, as long as the subsets used by any
two processes overlap.
• processes as voting for one another to enter the critical
section.
• A ‘candidate’ process must collect sufficient votes to enter.
• Processes in the intersection of two sets of voters ensure
the safety property ME1, that at most one process can enter
the critical section, by casting their votes for only one
candidate.
Maekawa’s approach
– subsets Vi,Vj for process Pi, Pj
• Pi ∈Vi, Pj ∈ Vj
• Vi ∩ Vj ≠ ∅, there is at least one common
member
• subset |Vi|=K, to be fair, each process should
have the same size
– Pi cannot enter the critical section until it has
received all K reply messages
Maekawa’s voting algorithm
On initialization
state := RELEASED;
voted := FALSE;
For pi to enter the critical section
state := WANTED;
Multicast request to all processes in Vi;
Wait until
(number of replies received = K);
state := HELD;
On receipt of a request from pi at pj
if (state = HELD or voted = TRUE)
then
queue request from pi without replying;
else
send reply to pi;
voted := TRUE;
For pi to exit the critical section
state := RELEASED;
Multicast release to all processes in Vi;
On receipt of a release from pi at pj
if (queue of requests is non-empty)
then
remove head of queue – from pk, say;
send reply to pk;
voted := TRUE;
else
voted := FALSE;
end if
Maekawa’s algorithm
Example. Let there be seven processes 0, 1, 2, 3, 4, 5, 6
V0 = {0, 1, 2}
V1 = {1, 3, 5}
V2 = {2, 4, 5}
V3 = {0, 3, 4}
V4 = {1, 4, 6}
V5 = {0, 5, 6}
V6 = {2, 3, 6}
Maekawa’s algorithm
V0 = {0, 1, 2}
ME1. At most one process can enter its critical
section at any time. V1 = {1, 3, 5}
V2 = {2, 4, 5}
Let i and j attempt to enter their Critical Sections
Vi ∩ Vj ≠ ∅ implies there is a process k ∊ Vi ⋂ Vj V3 = {0, 3, 4}
Process k will never send ack to both.
V4 = {1, 4, 6}
So it will act as the arbitrator and establishes ME1
V5 = {0, 5, 6}
V6 = {2, 3, 6}
Maekawa’s algorithm
ME2. No deadlock. Unfortunately deadlock is V0 = {0, 1, 2}
possible! Assume 0, 1, 2 want to enter their
V1 = {1, 3, 5}
critical sections.
V2 = {2, 4, 5}
From V0= {0,1,2}, 0,2 send ack to 0, but 1 sends ack to 1;
V3 = {0, 3, 4}
From V1= {1,3,5}, 1,3 send ack to 1, but 5 sends ack to 2;
From V2= {2,4,5}, 4,5 send ack to 2, but 2 sends ack to 0; V4 = {1, 4, 6}
Now, 0 waits for 1 (to send a release), 1 waits for 2 (to send a V5 = {0, 5, 6}
release), , and 2 waits for 0 (to send a release), . So
deadlock is possible! V6 = {2, 3, 6}
QUESTION
• Does Maekawa's algorithm access the critical
section according to the increasing order of
timestamps? Justify with an example.
• Consider the following situation, with three
sites:
• V1 = { P1, P2 }
V2 = { P2, P3 }
V3 = { P1, P3 }
• These satisfy the conditions for Maekawa's
algorithm.
• Let the clocks at sites 1, 2, and 3 be C1 = 10, C2 = 20, and C3 = 30,
respectively. Then:
P2 sends REQUEST(2, 20) to P2 and P3
P2 receives REQUEST(2,20) from P2
P2 sends REPLY(2, 21) to P2
P2 receives REPLY(2, 21) from P2
P3 sends REQUEST(3, 30) to P1 and P3
P3 receives REQUEST(3,30) from P3
P3 sends REPLY(3, 31) to P3
P3 receives REPLY(3, 31) from P3
P1 receives REQUEST(3, 30) from P3
P1 sends REPLY(1, 31) to P3
P3 receives REPLY(1, 31) from P1
• At this point, P3 enters the critical section even
though its request has a timestamp greater
than that of P2.
• This works because Maekawa's algorithm
sends a REPLY to the first message that a
process receives. If a later request comes with
a lower timestamp, either a FAILED message is
sent or the REPLY is held.
Election Algorithms
Objective: Elect one process pi from a group of processes p1…pN
Utility: Elect a primary manager, a master process, a coordinator or a central server
Each process pi maintains the identity of the elected in the variable Electedi (NIL if it isn’t defined yet)
Properties to satisfy: pi,
Safety: Electedi = NIL or ElectedA =non-crashed
P process with the
largest identifier
Liveness: pi participates and sets Electedi NIL, or crashes
28
Election Algorithms
Ring-Based Election Algorithm
Bully Algorithm
29
Ring-Based Election Algorithm (1)
5
5
16
16
9
25
Process 5 starts 25
the election
25
30
Ring-Based Election Algorithm
Initialization
Participanti := FALSE;
Electedi := NIL
Pi starts an election
Participanti := TRUE;
Send the message <election, pi> to its neighbor
Receipt of a message <elected, pj > at pi
Participanti := FALSE;
If pi pj
Then Send the message <elected, pj> to its neighbor
31
Ring-Based Election Algorithm
Receipt of the election’s message <election, pi> at pj
If pi > pj
Then Send the message <election, pi> to its neighbor
Participantj := TRUE;
Else If pi < pj AND Participantj = FALSE
Then Send the message <election, pj> to its neighbor
Participantj := TRUE;
Else If pi = pj
Then Electedj := TRUE;
Participantj := FALSE;
Send the message <elected, pj> to its neighbor
32
Ring-Based Election: Example
33
17
Initiator
4
•In the example: The election was
started by process 17. 24
The highest process identifier
encountered so far is 24. 9
(final leader will be 33)
1
•The worst-case scenario occurs
15
when the counter-clockwise
neighbor (@ the initiator) has the 28 24
highest id.
Ring-Based Election: Analysis
The worst-case scenario occurs
when the counter-clockwise 33
neighbor has the highest id. 17
4
In a ring of N processes, in the
worst case:
24
A total of N-1 messages are
required to reach the new
coordinator-to-be (election 9
messages).
Another N messages are 1
required until the new coordinator-
to-be ensures it is the new
15
coordinator (election messages –
no changes). 24
28
Another N messages are required
to circulate the elected messages.
Total Message Complexity = 3N-1
Turnaround time = 3N-1
Bully Algorithm
Each process knows which processes have higher identifiers, and it can
communicate with all such processes
Three types of messages:
Election: starts an election
OK: sent in response to an election message
Coordinator: announces the new coordinator
Election started by a process when it notices, through timeouts, that the coordinator has failed
35
Bully Algorithm
2
Co
3 or 6
d in
C oo to a
rdcti n
ElO
K i no r
EOle ator
eCcKotio
ornd.
Process 5 detects it
Coordina
first OK tor
Election
n
5 7 New Coordinator
tio
c
Ele
tor
r d.
n
a
ord in tio r
C oo
Co
Ele
to
ec
a
El
c
i n
tio
1 r d 4
n
C oo
8 Coordinator
Coordinator failed
36
Bully Algorithm
• If P has the highest process id, it sends a Coordinator message to
all other processes and becomes the new Coordinator. Otherwise, P
broadcasts an Election message to all other processes with higher
process IDs than itself.
• If P receives no Answer after sending an Election message, then it
broadcasts a Coordinator message to all other processes and
becomes the Coordinator.
• If P receives an Answer from a process with a higher ID, it sends no
further messages for this election and waits for a Coordinator
message. (If there is no Coordinator message after a period of time,
it restarts the process at the beginning.)
• If P receives an Election message from another process with a lower
ID it sends an OK message back and starts the election process at
the beginning, by sending an Election message to higher-numbered
processes.
• If P receives a Coordinator message, it treats the sender as the
coordinator.
Bully Algorithm
Initialization
Electedi := NIL
pi starts the election
Send the message (Election, pi) to pj , i.e., pj > pi
Waits until messages (OK, pj) from pj are received;
If no message (OK, pj) arrives during T
Then Elected := pi;
Send the message (Coordinator, pi) to pj , i.e., pj < pi
Else waits until receipt of the message (coordinator)
(if it doesn’t arrive during another timeout T’, it begins another election)
38
Bully Algorithm
Receipt of the message (Coordinator, pj)
Elected := pj;
Receipt of the message (Election, pj ) at pi
Send the message (OK, pi) to pj
Start the election unless it has begun one already
When a process is started to replace a crashed process: it begins an election
39