0% found this document useful (0 votes)
4 views94 pages

Cs3551 - Distributed Computing - Part B & C

The document provides an overview of distributed systems, detailing their definitions, challenges, and differences from parallel systems. Key issues include heterogeneity, transparency, security, and scalability, while it also discusses various transparencies such as access, location, and failure transparency. Additionally, it contrasts communication models like message passing and shared memory, and outlines trends in distributed systems such as mobile computing and cloud computing.

Uploaded by

sivavbtechit
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
4 views94 pages

Cs3551 - Distributed Computing - Part B & C

The document provides an overview of distributed systems, detailing their definitions, challenges, and differences from parallel systems. Key issues include heterogeneity, transparency, security, and scalability, while it also discusses various transparencies such as access, location, and failure transparency. Additionally, it contrasts communication models like message passing and shared memory, and outlines trends in distributed systems such as mobile computing and cloud computing.

Uploaded by

sivavbtechit
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

UNIT – I INTRODUCTION

PART – B & C
1. Define Distributed Systems. What are the significant issues and challenges of the distributed
systems? (Apr/May 2018) [UN]
Designing a distributed system does not come as easy and straight forward. A number of challenges
need to be overcome in order to get the ideal system.

1. Heterogeneity
2. Transparency
3. Openness
4. Concurrency
5. Security
6. Scalability
7. Failure Handling
2. Explain how a parallel system differs from a distributed system. [UN]
Parallel and distributed computing are important technologies that have key differences in their primary
function. Read on to learn more about them and how they differ.
Parallel computing, also known as parallel processing speeds up a computational task by dividing it
into smaller jobs across multiple processors inside one computer. Distributed computing, on the other
hand, uses a distributed system, such as the internet, to increase the available computing power and
enable larger, more complex tasks to be executed across multiple machines.

Figure 1: A distributed computing system compared to a parallel computing system.


What is Parallel Computing?
Parallel computing is the process of performing computational tasks across multiple processors at once
to improve computing speed and efficiency. It divides tasks into sub-tasks and executes them
simultaneously through different processors.
There are three main types, or “levels,” of parallel computing: bit, instruction, and task.
 Bit-level parallelism: Uses larger “words,” which is a fixed-sized piece of data handled as a unit
by the instruction set or the hardware of the processor, to reduce the number of instructions the
processor needs to perform an operation.
 Instruction-level parallelism: Employs a stream of instructions to allow processors to execute
more than one instruction per clock cycle (the oscillation between high and low states within a
digital circuit).
 Task-level parallelism: Runs computer code across multiple processors to run multiple tasks at
the same time on the same data.
What is Distributed Computing?
Distributed computing is the process of connecting multiple computers via a local network or wide area
network so that they can act together as a single ultra-powerful computer capable of performing
computations that no single computer within the network would be able to perform on its own.
3. Elaborate the challenges to be considered in the design of distributed system. (Apr / May 2023)
[CR]

1. Heterogeneity: Heterogeneity is applied to the network, computer hardware, operating system


and implementation of different developers.
2. Openness: The openness of the distributed system is determined primarily by the degree to which
new resource-sharing services can be made available to the users.
3. Scalability: Scalability of the system should remain efficient even with a significant increase in
the number of users and resources connected. It shouldn’t matter if a programme has 10 or 100
nodes; performance shouldn’t vary. A distributed system’s scaling requires consideration of a number
of elements, including size, geography, and management.
4. Security: Security of information system has three components confidentially, integrity and
availability. Encryption protects shared resources, keeps sensitive information secrets when
transmitted.
5. Failure Handling: When some faults occur in hardware and the software program, it may produce
incorrect results or they may stop before they have completed the intended computation so corrective
measures should to implement to handle this case.
6. Concurrency: There is a possibility that several clients will attempt to access a shared resource at
the same time. Multiple users make requests on the same resources, i.e read, write, and update.
7. Transparency: Transparency ensures that the distributes system should be perceived as a single
entity by the users or the application programmers rather than the collection of autonomous systems,
which is cooperating.
4. Write detailed notes on transparencies with respect to distributed systems. (Apr / May 2023)
[UN]
There are different kinds of transparencies that the distributed system has to incorporate. The following
are the different transparencies encountered in the distributed systems.
1. Access Transparency: Clients should be unaware of the distribution of the files. Applications written
for the local file should be able to be executed even for the remote files. The examples illustrating this
property are the File system in Network File System (NFS), SQL queries, and Navigation of the web.
2. Location Transparency: A location transparent name contains no information about the named
object’s physical location. The location and access transparencies together are sometimes referred as
Network transparency. The examples are File system in NFS and the pages of the web.
3. Concurrency Transparency: The concurrency control and its implementation is a hard task. The
examples are NFS, Automatic Teller machine (ATM) network. CSE 6306 Advanced Operating Systems.
4. Replication Transparency: This kind of transparency should be mainly incorporated for the
distributed file systems, which replicate the data at two or more sites for more reliability. The client
generally should not be aware that a replicated copy of the data exists. The clients should also expect
operations to return only one set of values. The examples are Distributed DBMS and Mirroring of Web
pages.
5. Failure Transparency: Enables the concealment of faults, allowing user and application programs to
complete their tasks despite the failure of hardware or software components. Fault tolerance is provided
by the mechanisms that relate to access transparency.
6. Migration Transparency: This transparency allows the user to be unaware of the movement of
information or processes within a system without affecting the operations of the users and the applications
that are running.
7. Performance Transparency: Allows the system to be reconfigured to improve the performance as the
load varies.
8. Scaling Transparency: A system should be able to grow without affecting application algorithms.
Graceful growth and evolution is an important requirement for most enterprises. A system should also be
capable of scaling down to small environments where required, and be space and/or time efficient as
required. The best-distributed system example implementing this transparency is the World Wide Web.
5. Compare the blocking and non blocking in distributed system. [AN]
A blocking RECEIVE does not RETURN until the message has arrived and been copied into the buffer of
the receiver process. A non-blocking RECEIVE can RETURN before the message has arrived and been
copied into the buffer of the receiver process.

In general, a process can send a communication in one of two ways: blocking or non blocking. After
sending a blocking communication, the process goes in to the waiting state until it receives are sponge.
Non blocking communication allows the process to continue execution after sending the communication.
Both types of communication are useful.
6. Illustrate the difference between Message passing and shared memory process Communication
model. (Nov /Dec 2022) [UN]
What is Shared Memory?
The fundamental model of inter-process communication is the shared memory system. In a shared
memory system, the collaborating communicates with each other by establishing the shared memory
region in the address space region.
If the process wishes to initiate communication and has data to share, create a shared memory region in
its address space. After that, if another process wishes to communicate and tries to read the shared data,
it must attach to the starting process's shared address space.
What is Message Passing?
In this message passing process model, the processes communicate with others by exchanging messages.
A communication link between the processes is required for this purpose, and it must provide at least
two operations: transmit (message) and receive (message). Message sizes might be flexible or fixed.
The main differences between the Shared Memory and the Message Passing are as follows:

Shared Memory Message Passing

It is mainly used for data communication. It is mainly used for communication.

It offers a maximum speed of computation because It takes a huge time because it is performed via
communication is completed via the shared the kernel (system calls).
memory, so the system calls are only required to
establish the shared memory.

The code for reading and writing the data from the No such code is required in this case because
shared memory should be written explicitly by the the message passing feature offers a method for
developer. communication and synchronization of
activities executed by the communicating
processes.
It is used to communicate between the single It is most commonly utilized in a distributed
processor and multiprocessor systems in which the setting when communicating processes are
processes to be communicated are on the same spread over multiple devices linked by a
machine and share the same address space. network.

It is a faster communication strategy than the It is a relatively slower communication strategy


message passing. than the shared memory.

Make sure that processes in shared memory aren't It is useful for sharing little quantities of
writing to the same address simultaneously. data without causing disputes.

7. Explain the need of Distributed systems its characteristics with example. [UN]
 Fault tolerance
 Consistency
 Security
 Reliability
 Concurrent transactions
8. Elaborate the examples of distributed systems. [CR]
To place distributed systems in a realistic context through examples: the Internet, an intranet and mobile
computing.
1. The Internet:
A vast interconnected collection of computer networks of many different [Link] message by
employing a common means of communication (Internet Protocol). The web is not equal to the Internet.
2. Intranet: An intranet is a private network that is contained within an enterprise.
It may consist of many interlinked local area networks and also use leased lines in the Wide Area
Network. It separately administrated and enforces local security policies. It is connected to the Internet
via a router. It uses firewall to protect an Intranet by preventing unauthorized messages leaving or
entering. Some are isolated from the Internet. Users in an intranet share data by means of file services.

3. Mobile and Ubiquitous Computing


a. Distributed systems techniques are equally applicable to mobile computing involving laptops, PDAs
and wearable computing devices.
b. Mobile computing (nomadic computing) - perform of computing tasks while moving (nomadic
computing)
c. Ubiquitous computing - small computers embedded in appliances
i. harness of many small, cheap computation devices
ii. It benefits users while they remain in a single environment such as home.
The goal of this section is to provide motivational examples of contemporary distributed systems and
the great diversity of the associated applications. As mentioned in the introduction,
networks are everywhere and underpin many everyday services that we now take for granted: the
Internet and he associated World Wide Web, web search, online gaming, email, social networks, E-
Commerce, etc.
9. Summarize the challenges to be considered in the design of DS. [UN]
 Heterogeneity
 Openness
 Security
 Scalability
 Failure handling
 Concurrency
 Transparency
 Quality of service
10. Survey the different trends in distributed Systems. (Apr / May 2018) [AP]
1. Trends in distributed systems
2. Pervasive technology – Modern Internet – Collection of internet worked devices- wired &
wireless – Pervasive resources and devices can be connected at any time and in anyplace
3. Trends in distributed systems intranet ISP desktop computer: backbone satellite link server:
network link: A typical portion of the Internet
4. Trends in distributed systems • Mobile & ubiquitous computing – Small and portable devices are
possible to be used within distributed systems • E.g. laptop computers, handheld devices,
wearable devices, devices embedded in appliances
5. Mobile computing: portability of the devices and the ability to connect to networks in different
places – Ubiquitous computing: small computing devices that available everywhere and are
easily attached to networks
6. Trends in distributed systems Portable & hand held devices in a distributed system
7. Distributed multimedia systems–The use of multimedia contents in distributed systems
8. Multimedia support – Major benefits of multimedia support
9. Distributed multimedia computing can be accessed through desktop or mobile devices. E.g. live
TV broadcast, video-on-demand, IP telephony, web casting, etc.
10. Cloud computing
11. Design the procedure for causality in an asynchronous execution with a suitable example.
(Nov / Dec 2022). [CR]
Asynchronous Executions an asynchronous execution (or A-execution) is an execution (E, ≺) for
which the causality relation is a partial order.
Though there is a physical link that delivers the messages sent on it in FIFO order due to the physical
properties of the medium, a logical link may be formed as a composite of physical links and multiple
paths may exist between the two end points of the logical link.
FIFO executions

Two send events s and s’ are related by causality ordering (not physical time ordering), then a causally
ordered execution requires that their corresponding receive events r and r’ occur in the same order at all
common destinations.
 If s and s’ are not related by causality, then CO is vacuously (blankly)satisfied.
 Causal order is used in applications that update shared data, distributed shared memory, or fair
resource allocation.
 The delayed message m is then given to the application for processing. The event of an application
processing an arrived message is referred to as a delivery event.
12. Discuss the Model of Distributed Computations. [CR]
A Distributed Program is composed of a set of n asynchronous processes, p1, p2, ..., pi , ..., pn. The
processes do not share a global memory and communicate solely by passing messages. The processes
do not share a global clock that is instantaneously accessible to these processes. Process execution and
message transfer are asynchronous. Without loss of generality, we assume that each process is running
on a different processor. Let Cij denote the channel from process pi to process pj and let mij denote a
message sent by pi to pj . The message transmission delay is finite and unpredictable.
A Model of Distributed Executions The execution of a process consists of a sequential execution of its
actions. The actions are atomic and the actions of a process are modeled as three types of events,
namely, internal events, message send events, and message receive events.
A send event changes the state of the process that sends the message and the state of the channel on
which the message is sent. A receive event changes the state of the process that receives the message
and the state of the channel on which the message is received.
The execution of process pi produces a sequence of events e 1 i , e 2 i , ..., e x i , e x+1 i , ... and is
denoted by Hi where Hi = (hi , →i) hi is the set of events produced by pi and binary relation →i defines
a linear order on these events. Relation →i expresses causal dependencies among the events of pi .
A Model of Distributed Executions The evolution of a distributed execution is depicted by a space-time
diagram. A horizontal line represents the progress of the process; a dot indicates an event; a slant arrow
indicates a message transfer. Since we assume that an event execution is atomic (hence, indivisible and
instantaneous), it is justified to denote it as a dot on a process line. In the Figure 2.1, for process p1, the
second event is a message send event, the third event is an internal event, and the
fourth event is a message receive event.
13. Build the Models of Communication Networks. [CR]
There are several models of the service provided by communication networks, namely, FIFO, Non-
FIFO, and causal ordering. In the FIFO model, each channel acts as a first-in first-out message queue
and thus, message ordering is preserved by a channel. In the non-FIFO model, a channel acts like a set
in which the sender process adds messages and the receiver process removes messages from it in a
random order
The “causal ordering” model is based on Lamport’s “happens before” relation. A system that supports
the causal ordering model satisfies the following property: CO: For any two messages mij and mkj , if
send(mij) −→ send(mkj), then rec(mij) −→ rec(mkj). This property ensures that causally related
messages destined to the same destination are delivered in an order that is consistent with their
causality relation.
Causally ordered delivery of messages implies FIFO message delivery. (Note that CO ⊂ FIFO ⊂ Non-
FIFO.) Causal ordering model considerably simplifies the design of distributed algorithms because it
provides a built-in synchronization.
The three main types of communication models in distributed systems are:
FIFO (first-in, first-out): each channel acts as a FIFO message queue.
Non-FIFO (N-FIFO): a channel acts like a set in which a sender process adds messages and receiver
removes messages in random order.
Causal Ordering (CO): It follows Lamport’s law. N- FIFO  FIFO o The relation between the
three models is given by CO A system that supports the causal ordering model satisfies the following
property:
GLOBAL STATE
Distributed Snapshot represents a state in which the distributed system might have been in. A snapshot
of the system is a single configuration of the system. The global state of a distributed system is a
collection of the local states of its components, namely, the processes and the communication channels.
The state of a process at any time is defined by the contents of processor registers, stacks, local
memory, etc. and depends on the local context of the distributed application. The state of a channel is
given by the set of messages in transit in the channel.
The state of a channel is difficult to state formally because a channel is a distributed entity and its state
depends upon the states of the processes it connects.
A distributed snapshot should reflect a consistent state. A global state is consistent if it could have been
observed by an external observer. For a successful Global State, all states must be consistent: If we
have recorded that a process P has received a message from a process Q, then we should have also
recorded that process Q had actually send that message.
Otherwise, a snapshot will contain the recording of messages that have been received but never sent.
The reverse condition (Q has sent a message that P has not received) is allowed. The notion of a global
state can be graphically represented by what is called a cut. A cut represents the last event that has been
recorded for each process. The history of each process if given by: Each event either is an internal
action of the process. We denote by si k the state of process pi immediately before the kth event occurs.
The state si in the global state S corresponding to the cut C is that of pi immediately after the last event
processed by pi in the cut – eici. The set of events eici is called the frontier of the cut.
Fig : Types of cuts Consistent states: The states should not violate causality. Such states are called
consistent global states and are meaningful global states. Inconsistent global states: They are not
meaningful in the sense that a distributed system can never be in an inconsistent state.

14. Justify the Global State of a Distributed System. [EV]


“A collection of the local states of its components, namely, the processes and the communication
channels.” The state of a process is defined by the contents of processor registers, stacks, local memory,
etc. and depends on the local context of the distributed application. The state of channel is given by the
set of messages in transit in the channel. The occurrence of events changes the states of respective
processes and channels.
An internal event changes the state of the process at which it occurs. A send event changes the state of
the process that sends the message and the state of the channel on which the message is sent. A receive
event changes the state of the process that or receives the message and the state of the channel on which
the message is received.
Notations L Sxi denotes the state of process pi after the occurrence of event e x i and before the event e
x+1 i . LS0 i denotes the initial state of process pi .LSxi is a result of the execution of all the events
executed by process pi till e x i . Let send(m)≤LSxi denote the fact that ∃y:1≤y≤x :: e y i =send(m). Let
rec(m)6≤LSx i denote the fact that ∀y:1≤y≤x :: e y i 6=rec(m)
A Channel State The state of a channel depends upon the states of the processes it connects. Let SCx,yij
denote the state of a channel Cij . The state of a channel is defined as follows: SCx,yij ={mij | send(mij)
≤ e x i V rec(mij) 6≤ e y j } Thus, channel state SCx,yij denotes all messages that pi sent upto event e x
i and which process pj had not received until event e y j
Global State The global state of a distributed system is a collection of the local states of the processes
and the channels.

In Figure 2.2: A global state GS1 = {LS1 1 , LS3 2 , LS3 3 , LS2 4 } is inconsistent because the state of
p2 has recorded the receipt of message m12, however, the state of p1 has not recorded its send. A
global state GS2 consisting of local states {LS2 1 , LS4 2 , LS4 3 , LS2 4 } is consistent; all the
channels are empty except C21 that contains message m21.
15. Why global states are essential in distributed computing systems? Elaborate with an
example.
[CR]
Thus, the ability to construct a global state and evaluate a predicate over such a state constitutes the
core of solutions to many problems in distributed computing. The global state of a distributed system
is the union of the states of the individual processes.
The global state of a distributed system is a collection of the local states of the processes and the
channels.
● A global state computed along a consistent cut is correct.
● The global state of a consistent cut comprises the local. state of each process at the time the cut
event happens.
Any process in the distributed system can initiate this global state recording algorithm using a special
message called MARKER. This marker traverse the distributed system across all communication
channel and cause each process to record its own state. In the end, the state of entire system (Global
state) is recorded.
Distributed applications/services execute concurrently on multiple machines.
• A Snapshot of the distributed application, i.e. a global picture is useful ○ Checkpointing: can restart
distributed application on failure ○ Garbage collection of objects: objects at servers that don’t have
any other objects (at any servers) with pointers to them ○ Deadlock detection: Useful in database
transaction systems ○ Termination of computation: Useful in batch computing systems like
Folding@Home, SETI@HomeGlobal Snapshot = Global State Individual state of each process in the
distributed system + Individual state of each communication channel in the distributed system.
• Capture the instantaneous state of each process
• Capture the instantaneous state of each communication channel, i.e., messages in transit on the
channels.
The notions of global time and global state are closely related.
● But, merely synchronizing clocks and taking local snapshots is not enough
● Need to account for messages in transit
● A process can (without freezing the whole computation) compute the best possible approximation
of a global state [Chandy & Lamport 85]
● A global state that could have occurred
● No process in the system can decide whether the state did really occur
● Guarantee stable properties (i.e. once they become true, they remain true)
UNIT – II LOGICAL TIME AND GLOBAL STATE
PART – B & C
1. Explain Physical clock synchronization algorithm. [UN]
Every computer contains a clock which is an electronic device that counts the oscillations in a
crystal at a particular frequency. Synchronization of these physical clocks to some known high
degree of accuracy is needed. This helps to measure the time relative to each local clock to
determine order between events.
i. Christian’s Algorithm

[Link] Berkley Algorithm


 This is an active time server approach where the time server periodically broadcasts its
clock time and the other nodes receive the message to correct their own clocks.
 The time server readjusts its own clock to this value and instead of sending the current time
to other computers it sends the amount of time each computer needs for readjustment. This
can be positive or negative value and is calculated based on the knowledge the time server
has about the propagation of message.
2. Distributed algorithms
Distributed algorithms overcome the problems of centralized by internally synchronizing for better
accuracy. One of the two approaches can be used:
i. Global Averaging Distributed Algorithms
ii. Localized Averaging Distributes Algorithms
2. Demonstrate the Christian’s method for synchronizing Clocks [UN]
Christian’s Algorithm is a clock synchronization algorithm is used to synchronize time with a
time server by client processes.
Algorithm:
filter_nonebright
ness_4
#Python3 program imitating a clock server
import socket
import date time
#function used to initiate the Clock Server definitiate Clock Server():
s=[Link]()
print("Socket successfully created")
#Server port port = 8000
[Link](('',port))
#Start listening to requests
[Link](5)
print("Socketislistening...")
#Clock Server Running forever
while True:
# Establish connection with client connection, address=s. accept()print('Server connected to',
address)
#Respond the client with server clock time
connection. send
(str (date time. Date time. now()).encode())
#Close the connection with the client process
[Link]()
#Driver function
If name =='main':
#Trigger the Clock Server
initiate Clock Server()
Output:
Socket successfully created Socket is listening...
3. Elaborate the Logical time and logical clocks. [CR]
Instead of synchronizing clocks, event ordering can be used
If two events occurred at the same process pi(i= 1,2,…N) then they occurred in the order observed
by pi, that is order→i
When a message, misspent between two processes, send(m)happened before receive(m)
Lamport[1978] generalized these two relationships into the happened-before relation: e →i
e'HB1:ife→ie'inprocesspi,thene→e'
HB2:for anymessage m,send(m)→receive(m)
HB3:ife→ e'and e'→ e'',thene→ e''

Lamport‘s logical clocks


Each process pi has a logical clock LiA monotonically increasing software counter not related to a
physical clock Apply Lamport times tamps to events with happened-before relation LC1: Li is
incremented by 1 before each event at process pi LC2:When process pi sends message m, it piggy
back st=Li When pj receives(m,t),itsetsLj:=max(Lj,t)and applies LC1 before time stamping the
event receive(m) e→e‗implies L(e)<L(e‗),but L(e)<L(e') doesnot implye→e‗
Totally ordered logical clocks
Some pairs of distinct events, generated by different processes, may have numerically identical
Lamport time stamps. Different processes may have same Lamport time Totally ordered logical
clocks.
If e is an event occurring at pi with local time stamp Ti, and if e‗is an event occurring at pj with
local time stamp Tj
Define global logical timestamps for the events to be (Ti, i) and (Tj, j)
Define (Ti,i)<(Tj,j) iff
Ti<TjorTi=Tjandi<j
Vector clocks
Short coming of Lamport clocks:
L(e)<L(e')doesn'timplye→e'
Vector clock: an array of N integers for a system of N processes each process keeps its own vector
clock Vi to timestamp local events Piggyback vector time stamps on messages
Rules for updating vector clocks:
Vi[i]]is the number of events that pi has time stamped
Viji](j≠i)is the number of events at pj that pi has been affected byVC1: Initially,
Vi[j]:=0forpi,j=1..N(N processes)
VC2: before pi time stamps an event, Vi[i]:=Vi[i]+1VC3:pipiggybackst=Vi on every message it
sends
VC4: when pi receives a time stamp t, it sets Vi[j]:=max(Vi[j ],t[j])for
j=1. N (merge operation)

Compare vector time stamps V=V‗if f V[j]=V‗[j]for j=1..N


V>=V‗iff V[j]<=V‗[j] for j=1..NV<V‗iff V<= V‗^V!=V‗ a→f since V(a)<V(f)
c||e since neither V(c) <=V(e)nor V(e) <=V(c)
4. Explain the types of group communications used in distributed system. (Nov / Dec 2022)
[UN]
Goal: Control how we communicate with a group of processes
Introduction

Most of our communications between computers is point-to-point. This is known as unicast. It is


what we use to communicate between a client and a server. It‘s what we use for implementing
higher-level protocols, such as remote procedure calls (RPC) and web services.
A variant of unicast communication is anycast. Anycast was created for IPv6 networking and is also
point-to-point communication (as in unicast) but the receiver is the nearest one of multiple receivers
with the same address.
For example, IPv6 uses this to allow a host to update the routing table of the nearest host.
5. What are the four different types of ordering the messages? Explain. (Nov / Dec 2022)
[UN]
Causal ordering of messages is one of the four semantics of multicast communication
namely unordered, totally ordered, causal, and sync-ordered communication.
Distributed program logic greatly depends on the order of delivery of messages. Several orderings
on messages have been defined: (i) non-FIFO, (ii) FIFO, (iii) causal order, and (iv) synchronous
order.
In the non-FIFO model, a channel acts like a set in which the sender process adds messages and the
receiver process removes messages from it in a random order. The ―causal ordering‖ model is based
on Lamport's ―happens before‖ relation.
FIFO order means messages are shown in the order they are sent. Casual order means messages are
shipped in a random order after receiving another message. Total order means all communications
are sent to all group members in the same order.
The causal ordering of messages describes the causal relationship between a message send event
and a message receive event. For example, if send(M1) -> send(M2) then every recipient of both
the messages M1 and M2 must receive the message M1 before receiving the message M2.
In synchronous communications, multiple parties continually listen for and act upon replies from
each other. One way to visualize the concept of synchronous communications is to imagine a real-
time online chat system designed for a retailer's customer support.
6. Elucidate on the Total and Casual Order in distributed system with a neat diagram.
(Nov / Dec 2022) [EV]
Types of Ordering
There are two ways to order.
1. Total Order
2. Partial Order
Total Order
Total order is used in single-machine systems where it is easy to figure out the order of events
because they happen one after the other. For any events that happen at the same time, the machine's
clock can be used since it‘s a single global clock for all processes.
Total ordering, on the other hand, is not possible in distributed systems because different machines
have different clocks and concurrent events cannot be ordered.
Casual Order
The causal ordering of messages describes the causal relationship between a message send event
and a message receive event. For example, if send(M1) -> send(M2) then every recipient of both
the messages M1 and M2 must receive the message M1 before receiving the message M2.
7. Design the procedure for casuality in a asynchronous execution with a suitable example.
(Nov / Dec 2022) [CR]
Asynchronous Definition
In general, asynchronous is a Greek word, which is made up of two words; "asyn," meaning "not
with," and "Chronos," meaning "time". It means asynchronous defines those objects or events that
are not synced or coordinated with time. The asynchronous word is used in different fields with
different aspects, but the general meaning always remains the same. In this topic, we will
understand the asynchronous definition and the various fields in which this term is being used.
What is Asynchronous?
o The term Asynchronous means not occurring or happening at the same time. It is the
opposite of synchronous, which means the events which are occurring at the same time.
o It defines the relationship between two or more objects that interact with each other within
the same environment or system, but at different time phases or time instant, and also does
not affect each other such as Offline Learning. In offline learning, the same content is
available for all the candidates, but each candidate takes the sessions at any random time
that does not match with other candidates.
o We can also understand it with a simple example of an email, as in this, people respond as
per their convenience; hence it is an asynchronous event.
Usage of Asynchronous in different fields
Asynchronous is a very broad term that is used in different fields and has various applications.
Some common fields are given below:
Asynchronous in Programming
In computer programming, it is used as an asynchronous process. The asynchronous process defines
a process that is operated independently with other processes. In computer programming, some
languages such as Python, JavaScript, and C#, also use asynchronous methods, which are also
known as the nonblocking code.
Asynchronous in Telecommunication.
Asynchronous transfer mode (ATM) is a data communication protocol used in telecommunications
to allow the asynchronous transmission of data without need of signals to pass through a mainframe
or centralized clock.
Asynchronous in Electrical
The asynchronous term is used in Electrical in the form of Asynchronous machine, Asynchronous
electrical machine, or asynchronous motor.
o Asynchronous electric machine: In such electric machines, the magnetic field and
the rotation are not the same.
o Asynchronous machines: These machines do not operate at synchronous speed,
which means their speed varies with the load.
o Asynchronous motor: It is an AC motor whose speed decreases with increasing
torque.
Asynchronous in Digital World
The asynchronous term is also used in the digital world.
o The main use is synchronous and asynchronous learning.
Asynchronous learning is opposite to synchronous learning. It means it takes place at
different times, and hence the instructor, the learner, and other participants are not engaged
in the learning process simultaneously.
o There is no real-time interaction with other people.
We can understand it with an example of corporate training, which is about providing online
content with pre-recorded videos or publishing an on-demand online exam.
The benefits of asynchronous learning
o Each candidate can learn on their own time and schedule.
o It needs less work for trainers as well as HR managers.
8. Elaborate any two logical clock categories in distributed systems with an example.
(Nov / Dec 2021) [CR]

o Distributed systems offer us the ability to solve problems we couldn‘t possibly solve with a
single machine. However, I have recently become painfully aware that every solution to a
problem brings with it a series of new problems and considerations to be made.

o Within the realm of distributed systems, one of the largest problems we gain over a single
machine system is attempting to keep a correct timeline of events. This is needed for a
variety of reasons — one of the most crucial is understanding the ordering and causality of
events within your system. This is what allows analysis to be performed when evaluating
why your system executed a specific action.
o For instance, if you want to know why a particular value was written to your database, you
may search for the series of events leading up to the point of the INSERT request being
created. You can then place these events in chronological order using their timestamp,
traversing from latest → oldest in order to understand the causality of actions leading up to
the final value being written [Figure 1].

9. Discuss the purpose of message ordering paradigms and provide example for
asynchronous execution communication in detail. Nov / Dec 2021) [CR]

o For any two events a and b, where each can be either a send or a receive event, the
notation.

o a ~ b denotes that a and b occur at the same process, i.e., a ∈Ei and b ∈Ei for some process
i. The send and receive event pair for a message called pair of corresponding events.

o For a given execution E, let the set of all send–receive event pairs be denoted as = {(s,r)
∈Ei × Ej | s corresponds to r}. 2.1 Message ordering paradigms

o Distributed program logic greatly depends on the order of delivery ofmessages.

o Several orderings on messages have been defined: (i) non-FIFO, (ii) FIFO, (iii) causal
order, and (iv) synchronous order. Asynchronous executions Definition 6.1 (A-execution):
An asynchronous execution (or A-execution) is an execution (E,≺) for which the causality
relation is a partial order.

o On a logical link between two nodes (is formed as multiple paths may exist) in the system,
if the messages are delivered in any order then it is known as non-FIFO executions.
Example: IPv4.
10. Explain global states and consistent cuts with example Time and Global States. [UN]
There are two formal models of distributed systems: synchronous and asynchronous.
Synchronous distributed systems have the following characteristics:
Generally, timing is a challenging an important issue in building distributed systems. Consider a
couple of examples:
 Suppose we want to build a distributed system to track the battery usage of a bunch of
laptop computers and we'd like to record the percentage of the battery each has remaining at
exactly 2pm.
 Suppose we want to build a distributed, real time auction and we want to know which of two
bidders submitted their bid first.
 Suppose we want to debug a distributed system and we want to know whether variable x1in
process p1 ever differs by more than 50 from variable x2 in process 2.
Clock Synchronization
Every computer has a physical clock that counts oscillations of a crystal. This hardware clock is
used by the computer's software clock to track the current time. However, the hardware clock is
subject to drift -- the clock's frequency varies and the time becomes inaccurate. As a result, any two
clocks are likely to be slightly different at any given time. The difference between two clocks is
called their skew.
There are several methods for synchronizing physical clocks.
1. External synchronization means that all computers in the system are synchronized with an
external source of time (e.g., a UTC signal).
2. Internal synchronization means that all computers in the system are synchronized with one
another, but the time is not necessarily accurate with respect to UTC.
The second process sets its clock to t + (max+min)/2wheremax and min are the upper and lower
bounds for the message transmission time respectively. This guarantees that the skew is at
most(max-min)/2.
Instead, a process p1 requests the current time from another process p2and measures the RTT
(Tround) of the request/reply. When p1 receives the time t from p2 it sets its time to t+T round/2.
The Berkeley algorithm, developed for collections of computers running Berkeley UNIX, is an
internal synchronization mechanism t hat works by electing a master to coordinate the
synchronization. The master polls the other computers (called slaves) for their times, computes an
average, and tells each computer by how much it should adjust its clock.
Global States
It is often desirable to determine whether a particular property is true of a distributed system as it
executes.
In general, this problem is referred to as Global Predicate Evaluation. "A global state predicate is a
function that maps from the set of global state of processes in the system ρ to {True,False}."
Cuts
Because physical time cannot be perfectly synchronized in a distributed system it is not possible to
gather the global state of the system at a particular time. Cuts provide the ability to" assemble a
meaningful global state from local states recorded at different times".
Definitions:
 Ρ is a system of N processes pi (i=1,2, ...,N)
 History (pi)= hi= <e i0,e i1,...>
 Hi k=< ei0,e i1,...,e ik>-a finite prefix of the process's history
 Si k is the state of the process pi immediately before the k th event occurs
 All processes record sending and receiving of messages. If a process pi records the sending
of message m to process pj and pj has not recorded receipt of the message, then m is part of
the state of the channel between pi and pj.
 A cut is consistent if, for all events e and e':
o(e∈Cande'→e)⇒e '∈C

Distributed Debugging
To further examine how you might produce consistent cuts, we'll use the distributed
debuggingexample. Recall that we have several processes, each with a variable x i. "The safety
condition required in this example is |xi-xj|<=δ(i,j=1,2,...,N)."
The algorithm we'll discuss is a centralized algorithm that determines post hoc whether thesafety
condition was ever violated. The processes in the system, p1, p2, ..., pN, send their states to a
passive monitoring process, p0.p0 is not part of the system. Based on the states collected, p0 can
evaluate the safety condition.

11. Explain an algorithm using multicast and logical clocks for mutual exclusion. [UN]
Mutual exclusion is a concurrency control property which is introduced to prevent race conditions.
It is the requirement that a process cannot enter its critical section while another con current process
is currently present or executing in its critical section i.e only one process is allowed to execute the
critical section at any given instance of time.
In Distributed systems, we neither have shared memory nor a common physical clock and there for
we cannot solve mutual exclusion problem using shared variables.
A site in Distributed System do not have complete information of state of the system due to lack of
shared memory and a common physical clock.
Requirements of Mutual exclusion Algorithm:
 No Deadlock:
Two or more site should not endlessly wait for any message that will never arrive.
 No Starvation:
Every site who wants to execute critical section should get an opportunity to execute it in
finite time. Any site should not wait indefinitely to execute critical section while other site
are repeatedly executing critical section
 Fairness:
Each site should get a fair chance to execute critical section. Any request to execute critical
section must be executed in the order they are made i.e Critical
sectionexecutionrequestsshouldbeexecutedintheorderoftheirarrivalin the system.
 Fault Tolerance:
In case of failure, it should be able to recognize it by itself in order to continue functioning
without any disruption.
1. Token Based Algorithm:
o A unique token is shared among all the sites.
o If a site possesses the unique token, it is allowed to enter its critical section
o This approach uses sequence number to order requests for the critical section.
o Each requests for critical section contains a sequence number. This sequence number
is used to distinguish hold and current requests.
o This approach in Mutual exclusion as the token is unique
o Example:
o Suzuki-Kasami‘s Broadcast Algorithm
2. Non-token based approach:
o A site communicates with other sites in order to determine which sites should
execute critical section next. This requires exchange of two or more successive
round of messages among sites.
o This approach use timestamps instead of sequence number to order requests for the
critical section.
o Whenever a site make request for critical section, it gets a time stamp. Time stamp is
also used to resolve any conflict between critical section requests.
o All algorithm which follows non-token based approach maintains a logical clock.
Logical clocks get updated according to Lamport‘s scheme
o Example:
o Lamport's algorithm, Ricart – Agrawala algorithm
3. Quorum based approach:
o Instead of requesting permission to execute the critical section from all other sites,
Each site requests only a subset of sites which is called a quorum.
o Any two subsets of sites or Quorum contain a common site. This common site is
responsible to ensure mutual exclusion.
12. Write short notes on locks with suitable example. [UN]
Lock-Based Protocol
In this type of protocol, any transaction cannot read or write data until it acquires an appropriate
lock on it. There are two types of lock:
1. Shared lock:
 [Link],thedataitemcanonlyreadbythetransaction.
 Itcanbesharedbetweenthetransactionsbecausewhenthetransactionholdsalock,thenitcan'tupdat
ethe data on the data item.
2. Exclusive lock:
 In the exclusive lock, the data item can be both reads as well as written by the transaction.
This lock is exclusive, and in this lock, multiple transactions do not modify the same
Data simultaneously.
There are four types of lock protocols available:
1. Simplistic clock protocol
It is the simplest way of locking the data while transaction. Simplistic lock-based protocols allow all
the transactions to get the lock on the data before insert or delete or update on it. It will unlock the
data item after completing the transaction.
2. Pre - Claiming Lock Protocol
 Pre- Claiming Lock Protocols evaluate the transaction to list all the data items on which
they need locks.
 Before initiating an execution of the transaction, it requests DBMS for all the lock on all
those data items.
 If all the locks are granted then this protocol allows the transaction to begin. When the
transaction is completed then it releases all the lock.
 If all the locks are not granted then this protocol allows the transaction to rolls back and
waits until all the locks are granted.

3. Two-phase locking(2PL)
 The two-phase lockingprotocoldividestheexecutionphaseofthetransactionintothree parts.
 In the first part, when the execution of the transaction starts, it seeks permission for the lock
it requires.

There are two phases of 2PL:
Growing phase: In the growing phase, a new lock on the data item may be acquired by the
transaction, but none can be released.
Shrinking phase: In the shrinking phase, existing lock held by the transaction may be released, but
no new locks can be acquired.
In the below example, if lock conversion is allowed then the following phase can happen:
13. Explain in detail about asynchronous execution with synchronous communication. [UN]
 Synchronous communication: The calling party requests a service, and waits for the service
to complete. Only when it receives the result of the service it continues with its work.
 Asynchronous communication: The calling party initiates a service call, but does not wait
for the result. The caller immediately continues with its work without caring for the result.
 A non-blocking GUI usually has nothing to do with the low-level communication contracts
and can be achieved by different means, e.g. parallel processing of the inter active and
communication tasks. The truth is that synchronous communication on a certain level of
abstraction can be implemented with asynchronous interfaces on another level of abstraction
and vice versa, if needed.
File based communication is often considered to be asynchronous. One party writes a file but does
not care if the other party is active, fetches the file or is able to process it. However it is possible to
implement another layer of functionality so that the second (reading) party gives feedback, e.g. by
writing a short result file, so that the first (writing) party can wait and poll for the result of the file
processing.
Communication over a database often is implemented by one party writing execution orders into a
special table and the other party reads this table periodically and
processesnewentries,markingthemas―done‖or―failed‖[Link]
communication pattern.
Synchronous services are easy to implement, since they keep the complexity of the communication
low by providing immediate feedback. They avoid the need to keep the context of a call on the
client and server side, including e.g. the caller‘s address or a message id, beyond the life time of the
request

 Connections with a lack of stability (e.g. a dial-in network connection is not available all the
time).
 The caller is not interested in the result of the call or cannot wait for the result for some
reason, [Link] must free its resources.
The simplest asynchronous message exchange pattern is called fire-and-forget and means that a
message is sent but no feedback is required (at least on that level of abstraction!).

14. How Casual order and Total Order is implemented in Synchronization. [UN]
Causal ordering
Causal ordering is a vital tool for thinking about distributed systems.
Messages sent between machines may arrive zero or more times at any point after they are sent
This is the sole reason that building distributed systems is hard.
For example, because of this property it is impossible for two computers communicating
Over a network to agree on the exact time. You can send me a message saying" it is now know how
long it took for that message to arrive. We can send messages back and forth all day but we will
never know for sure that we are synchronized.
If we can't agree on the time then we can't always agree on what order things happen in. Suppose I
say "my user logged on at10:00:00" and you say "my user logged on at10:00:01". Maybe mine was
first or maybe my clock is just fast relative to yours.
This concept is called causal ordering and is written like this:
A->B(event A is causally ordered before event B)
Let's define it a little more formally. We model the world as follows: We have a number of
machines on which we observe a series of events. These events are either specific to one machine
(eg user input) or are communications between machines.
If A and B happen on the same machine and A happens before B then A -> B If I send
you some message M and you receive it then(send M)->(recv M)
If A->B and B->C then A->C
In contrast, causal ordering is only a partial order – some times events happen with no possible
causal relationship i.e. not(A-> B or B->A).
If not(A->B)then A cannot possibly have caused B
The lack of a total global order is not just an accidental property of computer systems, it is a
fundamental property of the laws of physics. I claimed that understanding causal order makes many
other concepts much simpler. Let's skim over some examples.
Vector Clocks
Lamport‘s clocks and Vector clocks are data-structures which efficiently approximate the causal
ordering and so can be used by programs to reason about causality.
If A->B then LC_A<LC_B
If VC_A<VC_B then A->B
Different types of vector clock trade-off compression Vs accuracy by storing smaller or larger
portions of the causal history of an event.
Consistency
A distributed system is consistent exactly when the outside world can never observe two different
serialisations.
CAP Theorem
The CAP (Consistency-Availability-Partition) theorem also boils down to causality. When a
machine in a distributed system is asked to perform an action that depends on its current state it
must decide that state by choosing a serialisation of the events it has seen. It has two options:
 Choose a serialization of its current events immediately
 Wait until it is sure it has seen all concurrent events before choosing a serialisation
Ordering requires waiting
Even your hardware cannot escape this law. It provides the illusion of synchronous access to
memory at the cost of availability. If you want to write fast parallel programs then you need to
understand the messaging model used by the underlying hardware.
Eventual Consistency
A system is eventually consistent if the final state of each machine is the same regardless of how we
choose to serialize update events. An eventually consistent system allows us to sacrifice consistency
for availability without having the state of different machines diverge irreparably.
15. What is group communication? What are the Key areas of applications of group
communication? Explain the programming model for group communication.(Apr /May 2018)
[UN]
Group Communication.
A group is an operating system abstraction for a collective of related processes. A set of cooperative
processes may, for example, form a group to provide an extendable, efficient, available and reliable
service.
Multicast algorithms can be built on top of lower-level communication primitives such as point-to-
point sends and receives or perhaps by availing of specific network mechanisms designed for this
purpose.
The recipients would have to buffer messages until safe for delivery in case they were called on for
this role.
Ordered Reliable Multicasts
A FIFO ordered protocol guarantees that messages by the same sender are delivered in the order
that they were sent.
Every process p executes the following:-
multicast(R, m):
Tag m with sender (m) and seq
#(m)send(m) to all group including p
The receive(R, m) occurs as follows: up on
arrival(m)do
If p has not previously executed receive(R, m)then if
sender(m)<>p then
send(m)to all group
receive(R, m)
It is easy to use Reliable Multicast to build a FIFO Multicast algorithm. To F-multicast a message
m, a process q simply R-multicasts m. The difference is at the receiver which orders the delivery.
Multicast (F, m):
multicast (R, m)
If m is the it h message from q, then m is tagged sender (m)=q and seq#(m)=i.
For each q, every process p maintains a counter next[q] that indicates the sequence number of the
next F-multicast from q that p is willing to F-deliver.
16. Discuss in detail about Snapshot algorithms for FIFO channels. (Apr / May2021).
[CR]
Each distributed application has number of processes running on different physical servers. These
processes communicate with each other through messaging channels. A snapshot captures the local
states of each process along with the state of each communication channel.
Snapshots are required to:
Check pointing
Collecting garbage
Detecting deadlocks
Debugging
Chandy – Lamport algorithm
The algorithm will record a global snapshot for each process channel.
 The Chandy-Lamport algorithm uses a control message, called a marker.
 After a site has recorded its snapshot, it sends a marker along all of its outgoing channels
before sending out any more messages.
 Since channels are FIFO, a marker separates the messages in the channel into those to be
included in the snapshot from those not to be recorded in the snapshot.
 This addresses issue I1. The role of markers in a FIFO system is to act as delimiters for
the messages in the channels so that the channel state recorded by the process at the
receiving end of the channel satisfies the condition C2.
Initiating a snapshot
Process Pi initiates the snapshot
Pi records its own state and prepares a special marker message.
Send the marker message to all other processes.
Start recording all incoming messages from channels Cij for j not equal to i.
UNIT – III DISTRIBUTED MUTEX AND DEADLOCK
PART – B & C

1. Discuss in detail the requirements that mutual exclusion algorithms should satisfy and
discuss what metric we use to measure the performance of mutual exclusion algorithms.
(Apr / May 2021) [CR]
Requirements of Mutual exclusion Algorithm:
 No Deadlock: Two or more site should not endlessly wait for any message that will never
arrive.
 No Starvation: Every site who wants to execute critical section should get an opportunity
to execute it in finite time.
Mutual exclusion algorithm ensures that if a process is already performing write operation
on a data object no other process/thread is allowed to access the same object until the first
process has finished writing upon the data object and released the object for other
processes to read and write upon.

Performance Metrics
The performance is generally measured by the following four metrics: Message complexity: The
number of messages required per CS execution by a site. Synchronization delay: After a site
leaves the CS, it is the time required and before the next site enters the CS (see Figure 1).
Response time:
The time interval a request waits for its CS execution to be over after its request messages have
been sent out (see Figure 2).
System throughput: The rate at which the system executes requests for the CS. system
throughput=1/(SD+E) where SD is the synchronization delay and E is the average critical section
execution time.

2. Explain Maekawa’s algorithm for mutual exclusion in distributed system and its
drawback. (Apr /May 2022) [UN]
A distributed mutual exclusion algorithm, such as Maekawa's algorithm, makes sure that
concurrent processes in a distributed system exclude one another. The voting-based algorithm was
proposed in 1985 by Mamoru Maekawa and Toshimitsu Masuzawa.
The algorithm divides the total number of processes in a distributed system into quorums or pairs
of mutually intersecting subsets. Each process has a specific quorum allocated to it, and it can only
allow access to the shared resource if it has gotten consent from every other process in its quorum
as well as from every other process it has already received consent from.
A procedure can only grant permission if it is not already in its crucial portion in order to ensure
mutual exclusion. Additionally, a process can only ask another process for permission if that other
process isn't right now in its vital portion.
The technique ensures that a process will finally be able to access the shared resource since it
ensures that at least one process from each quorum will eventually grant permission.
The mutual exclusion method developed by Maekawa is scalable, decentralized, and appropriate
for massively distributed systems. The algorithm's performance, however, may be impacted by the
need for several messages to be sent back and forth between processes.
The voting ring structure in Maekawa's algorithm
One of the most important elements of Maekawa's algorithm is the voting ring structure. This
program creates a logical voting ring out of the processes in a distributed system. According to
their allocated unique identities, each process is placed in a circular arrangement within the voting
ring.
Which processes are permitted to grant authorization to use the shared resource is determined by
the voting ring structure. The quorum for each procedure is determined by where it is in the voting
ring. The process itself and a smaller subset of the other processes in the voting ring make up the
quorum.
Each quorum's size is determined with care to ensure that it intersects with the others. Because it
ensures that each process can request approval from at least one process in each quorum, the
intersection property is significant.
A process must first wait for approval from all processes in its quorum before deciding which
processes can provide permission. It can only issue permits to other processes in its quorum and to
all processes that have asked for it once it has obtained consent from all processes in its quorum.
Drawbacks of Maekawa's Algorithm:
This algorithm is deadlock prone because a site is exclusively locked by other sites and requests
are not prioritized by their timestamp.
3. Outline Lamport’s algorithm with an example. (Nov / Dec 2021) [UN]
Lamport's Algorithm Lamport was the first to give a distributed mutual exclusion algorithm as
an illustration of his clock synchronization scheme. Let Ri be the request set of site Si , i.e. the
set of sites from which Si needs permission when it wants to enter CS.
In Lamport's algorithm, ∀i : 1 ≤ i ≤ N :: Ri= {S1, S2,...,SN}. Every site Si keeps a queue,
request_queuei, which contains mutual exclusion requests ordered by their timestamps. This
algorithm requires messages to be delivered in the FIFO order between every pair of sites. The
Algorithm Requesting the critical section.
1. When a site Si wants to enter the CS, it sends REQUEST(tsi, i) message to all the sites in its
request set Ri and places the request on request_queuei (tsi is the timestamp of the request).
2. When a site Sj receives the REQUEST(t si, i) message from site Si, it returns a time stamped
REPLY message to Si and places site S' is request on request_queue j.
Executing the critical section. 1. Site Si enters the CS when the two following conditions hold: a)
[L1:] Si has received a message with timestamp larger than (tsi, i) from all other sites.
b) [L2:] S'is request is at the top request_queuei. Releasing the critical section.
1. Site Si, upon exiting the CS, removes its request from the top of its request queue and sends a
time stamped RELEASE message to all the sites in its request set.
2. When a site Sj receives a RELEASE message from site Si, it removes S'is request from its
request queue. When a site removes a request from its request queue, its own request may come
at the top of the queue, enabling it to enter CS. The algorithm executes CS requests in the
increasing order of timestamps.
4. Explain the ‘Snapshot’ algorithm of Lamport’s. [UN]
Chandy–Lamport’s global state recording algorithm
Each distributed system has a number of processes running on a number of different physical
servers. These processes communicate with each other via communication channels using text
messaging. These processes neither have a shared memory nor a common physical clock; this
makes the process of determining the instantaneous global state difficult.
A process could record it own local state at a given time but the messages that are in transit(on its
way to be delivered) would not be included in the recorded state and hence the actual state of the
system would be incorrect after the time in transit message is delivered.
The main idea behind proposed algorithm is that if we know that all message that hat have been sent
by one process have been received by another then we can record the global state of the system.
Any process in the distributed system can initiate this global state recording algorithm using a
special message called MARKER. This marker traverses the distributed system across all
communication channels and causes each process to record its own state. In the end, the state of
entire system (Global state) is recorded. This algorithm does not interfere with normal execution of
processes.
Assumptions of the algorithm:
 There is finite number of processes in the distributed system and they do not share memory
and clocks.
 There is finite number of communication channels and they are unidirectional and FIFO
ordered.
 There exists a communication path between any two processes in the system
 On a channel, messages are received in the same order as they are sent.
Algorithm:
 Marker sending rule for a process P:
o Process p records its own local state
o For each outgoing channel C from process P, P sends marker along C before
sending any other messages along
C.(Note:ProcessQwillreceivethismarkeronhisincomingchannelC1.)
 Marker receiving rule for a process Q:
o If process Q has not yet recorded its own local state then
 Record the state of incoming channel C1 as an empty sequence or null.
 After recording the state of incoming channel C1, process Q Follows the
marker sending rule
o If process Q has already recorded its state
 Record the state of incoming channel C1 as the sequence of messages
received along channel C1 after the state of Q was recorded and
beforeQreceivedthemarkeralongC1fromprocess P.
Need of taking snapshot or recording global state of the system:
 Check pointing: It helps in creating check point. If somehow application fails, this check
point can be used
 Garbage collection: It can be used to remove objects that do not have any references.
 It can be used in deadlock and termination detection.
It is also helpful in other debugging.
5. External synchronization ensures internal synchronization. But the vice versa
does not stand true, Justify. Explain Lamport’s algorithm in brief. (Apr / May 2021).
[UN]
External synchronization means that all computers in the system are synchronized with an
external source of time (e.g., a UTC signal).
Internal synchronization means that all computers in the system are synchronized with one
another, but the time is not necessarily accurate with respect to UTC.
In a system that provides guaranteed bounds on message transmission time, synchronization is
straightforward since upper and lower bounds on the transmission time for a message are known.
One process sends a message to another process indicating its current time, t. The second process
sets its clock to t + (max+min)/2 where max and min are the upper and lower bounds for the
message transmission time respectively. This guarantees that the skew is at most (max-min)/2.
Cristian's method for synchronization in systems that do not provide such guarantees is similar, but
does not rely on a predetermined max and min transmission time. Instead, a process p1 requests the
current time from another process p2 and measures the RTT (Tround) of the request/reply.
Whenp1 receives the time t from p2 it sets its time to t + Tround/2.
The Berkeley algorithm, developed for collections of computers running Berkeley UNIX, is an
internal synchronization mechanism that works by electing a master to coordinate the
synchronization. The master polls the other computers (called slaves) for their times, computes an
average, and tells each computer by how much it should adjust its clock.
The Network Time Protocol (NTP) is yet another method for synchronizing clocks that uses a
hierarchical architecture where he top level of the hierarchy (stratum 1) are servers connected to a
UTC time source such as a GPS unit. The TPSN protocol is very similar to NTP.
Synchronization in WSN
Some classifications of synchronization protocols:
 Physical time versus logical time
 External versus internal synchronization
 A priori versus posteriori synchronization - post-facto synchronization in WSN has been
proposed in several papers. Post-facto synchronization happens on demand, after the event
occurs. In this way, synchronization -- which requires resources such as energy -- only
happens when necessary.
Performance metrics:
 Precision - synchronization error
 Energy cost
 Memory requirements
 Fault tolerance
 Scalability
Decomposition of packet delay:
 Send time - Time for the packet to pass from the application layer to the MAC layer --
highly variable based on delays introduced by OS.
 Access time - Time the MAC layer must wait for the channel to become free -- highly
variable based on make protocol and one of the most critical factors contributing to delay.
 Transmission time - Time to transmit entire packet -- mainly deterministic and can be
estimated based on packet size and radio speed.
 Propagation time - Time to traverse the link -- typically negligible in WSN.
 Reception time - Time for the receiver radio to receive bits and pass to MAC layer --
mainly deterministic.
 Receive time - Time for MAC layer to construct packet and send to application layer --
changes based on delays of OS.
6. Explain Ricart-Agrawala Algorithm with an example. [UN]
The Ricart–Agrawala algorithm is an algorithm for mutual exclusion on a distributed system. This
algorithm is an extension and optimization of Lamport's Distributed Mutual Exclusion Algorithm,
by removing the need for. messages. It was developed by Glenn Ricart and Ashok Agrawala.
Three type of messages ( REQUEST, REPLY and RELEASE) are used. exiting the critical section.
REPLY message to site Si if it has not sent a REPLY message to the site from the time it received
the last RELEASE message. Otherwise, it queues up the request.
Ricart–Agrawala algorithm is an algorithm for mutual exclusion in a distributed system proposed
by Glenn Ricart and Ashok Agrawala. This algorithm is an extension and optimization of
Lamport's Distributed Mutual Exclusion Algorithm.
Ricart - Agarwala s a non token based algorithm that uses broadcast technique for mutual
calculation.
Consider a scenario when process PO & P1 went to enter critical sector. Both P0 & P1 send
broadcast to all 3 processes in the system along with time stamp.

Since P2 does not want to enter critical section hence it sends "OK" to both the processes.
Since P1 finds Ts of P0 to be lesser than Ts of self therefore P1 sends to P0.

Now P0 can enter critical section, as it is received (n-1) OK messages.


Once P0 is done with critical section, it sends "Ok" message to P1.

Now P1 can enter CS.


One of the problem with this algorithm is that once a process P1 receives "OK" from all (n-1)
processes, it can enter CS subsequently without sending repeat message.
Secondly if the node having process fails then all other processes stare forever. this can be solved
by detecting failure of nodes.
7. Show that in Ricart agrawala algorithm the critical section is accessed in increasing order
of timestamp. (Apr /May 2022) [UN]
Ricart-Agrawala algorithm requires (N − 1) REQUEST messages and (N − 1) REPLY messages.
Thus, it requires 2(N − 1) messages per CS execution. Most mutual exclusion algorithms use a
static approach to invoke mutual exclusion.
A timestamp is given to each critical section request using Lamport's logical clock. Timestamp is
used to determine priority of critical section requests. Smaller timestamp gets high priority over
larger timestamp. The execution of critical section request is always in the order of their
timestamp.
Ricart & Agrawala put forth a fully distributed mutual exclusion algorithm in 1981. It requires the
following: total ordering of all events in a system (e.g. Lamport's timestamp algorithm with unique
timestamps) messages are reliable (the delivery of every message is acknowledged).
The Ricart-Agrawala [21] algorithm assumes the communication channels are FIFO. The algo-
rithm uses two types of messages: REQUEST and REPLY. A process sends a REQUEST message
to all other processes to request their permission to enter the critical section.

8. Explain in detail about Suzuki-Kasami’s algorithm with its critical section with an example.
(Apr / May 2023) [UN]
Suzuki–Kasami algorithm is a token-based algorithm for achieving mutual exclusion in
distributed [Link] is modification of Ricart–Agrawala algorithm, a permission based
(Non-token based) algorithm which uses REQUEST and REPLY messages to ensure mutual
exclusion.
In token-based algorithms, A site is allowed to enter its critical section if it possesses the
unique token. Non-token based algorithms uses timestamp to order requests for the critical
section where as sequence number is used in token based algorithms.
Each requests for critical section contains a sequence number. This sequence number is used to
distinguish old and current requests.
Data structure and Notations:
An array of integers RN[1…N]
A site Si keeps RNi[1…N], where RNi[j] is the largest sequence number received so far
through
REQUEST message from site S i.
 An array of integer LN[1…N]
This array is used by the [Link][J] is the sequence number of the request that is recently
executed by site Sj.
 A queue Q
This data structure is used by the token to keep record of ID of sites waiting for t he token
Algorithm:
 To enter Critical section:
 When a site Si wants to enter the critical section and it does not have the token
then it increments its sequence number RNi[i] and sends a request
message REQUEST(i, sn) to all other sites in order to request the token.
Here sn is update value of RNi[i]
 When a site Sj receives the request message REQUEST(i, sn) from site Si, it
sets RNj[i] to maximum of RNj[i] and sn i.e RNj[i] = max(RNj[i], sn).
 After updating RNj[i], Site Sj sends the token to site S i if it has token
and RNj[i] = LN[i] + 1
 To execute the critical section:
 Site Si executes the critical section if it has acquired the token.
To release the critical section:
After finishing the execution Site Si exits the critical section and does following:
sets LN[i] = RNi[i] to indicate that its critical section request RNi[i] has been executed
For every site Sj, whose ID is not present in the token queue Q, it appends its ID
to Q if RNi[j] = LN[j] + 1 to indicate that site Sj has an outstanding request.
After above updation, if the Queue Q is non-empty, it pops a site ID from the Q and sends the
token to site indicated by popped ID.
If the queue Q is empty, it keeps the token
Message Complexity:
The algorithm requires 0 message invocation if the site already holds the idle token at the time of
critical section request or maximum of N message per critical section execution. This N messages
involves (N – 1) request messages 1 reply message
9. Justify the drawbacks of Suzuki–Kasami Algorithm. [EV]
Non-symmetric Algorithm: A site retains the token even if it does not have requested for critical
section. According to definition of symmetric algorithm
“No site possesses the right to access its critical section when it has not been requested.”
Performance:
Synchronization delay is 0 and no message is needed if the site holds the idle token at the time of
its request.
In case site does not holds the idle token, the maximum synchronization delay is equal to
maximum message transmission time and a maximum of N message is required per critical section
invocation.
10. Name and explain different types of deadlock models in distributed system with the
commonly used strategies to handle deadlocks with a neat diagram. (Nov / Dec 2022) [UN]
Various deadlock handling strategies in the distributed system are as follows: There are mainly
three approaches to handling deadlocks: deadlock prevention, deadlock avoidance, and deadlock
detection.
Deadlock detection algorithms in Distributed System
Various deadlock detection algorithms in the distributed system are as follows:
1. Path-Pushing Algorithms
2. Edge-chasing Algorithms
3. Diffusing Computations Based Algorithms
4. Global State Detection Based Algorithms
Path-Pushing Algorithms
Path-pushing algorithms detect distributed deadlocks by keeping an explicit global WFG. The
main concept is to create a global WFG for each distributed system site. When a site in this class
of algorithms performs a deadlock computation, it sends its local WFG to all neighboring sites.
The term path-pushing algorithm was led to feature the sending around the paths of global WFG.
Edge-Chasing Algorithms
An edge-chasing method verifies a cycle in a distributed graph structure by sending special
messages called probes along the graph's edges. These probing messages are distinct from request
and response messages. If a site receives the matching probe that it previously transmitted, it can
cancel the formation of the cycle.
Diffusing Computations Based Algorithms
In this algorithm, deadlock detection computation is diffused over the system's WFG. These
techniques use echo algorithms to detect deadlocks, and the underlying distributed computation is
superimposed on this computation. If this computation fails, the initiator reports a deadlock global
state detection.
Global State Detection Based Algorithms
Deadlock detection algorithms based on global state detection take advantage of the following
facts:
1. A consistent snapshot of a distributed system may be taken without freezing the underlying
computation.
2. If a stable property exists in the system before the snapshot collection starts, it will be
preserved.
11. Discuss with suitable example to show that a deadlock cannot occur if anyone of the four
conditions is absent. (Apr /May 2022) [CR]
A deadlock in OS is a situation in which more than one process is blocked because it is holding a
resource and also requires some resource that is acquired by some other process. The four
necessary conditions for a deadlock situation are mutual exclusion, no preemption, hold and wait
and circular set.
Deadlock can be prevented by eliminating any of the four necessary conditions, which are mutual
exclusion, hold and wait, no preemption, and circular wait. Mutual exclusion, hold and wait and no
preemption cannot be violated practically.
Mutual Exclusion: A resource can be held by only one process at a time. In other words, if a
process P1 is using some resource R at a particular instant of time, then some other process P2
can't hold or use the same resource R at that particular instant of time.
Disadvantages in Deadlock
1. Deadlock is an infinite Process. There is only detection, resolution, prevention techniques.
But, there is no Deadlock stopping techniques.

2. The Process Waiting for the Process

Conditions Required for Deadlock in Operating Systems (OS)


There are four conditions because of which Deadlock is going to occur in Operating Systems. The
Conditions are:
1. Hold and Wait

2. No Pre Emption

3. Mutual Exclusion

4. Circular Wait
12. List out the four classes of distributed deadlock detection algorithms and explain any
two of them. (Apr / May 2021) [UN]
 Path push-in – Path info- dispatched to waiting node to blocking off node.
 Edge-chasing – Probe message area unit dispatched on graph edge.
 Diffusion computation
 Global nation detection
Path-Pushing Algorithms
Path-pushing algorithms detect distributed deadlocks by keeping an explicit global WFG. The
main concept is to create a global WFG for each distributed system site. When a site in this class
of algorithms performs a deadlock computation, it sends its local WFG to all neighboring sites.
The term path-pushing algorithm was led to feature the sending around the paths of global WFG.
Edge-Chasing Algorithms
An edge-chasing method verifies a cycle in a distributed graph structure by sending special
messages called probes along the graph's edges. These probing messages are distinct from request
and response messages. If a site receives the matching probe that it previously transmitted, it can
cancel the formation of the cycle.
Diffusing Computations Based Algorithms
In this algorithm, deadlock detection computation is diffused over the system's WFG. These
techniques use echo algorithms to detect deadlocks, and the underlying distributed computation is
superimposed on this computation. If this computation fails, the initiator reports a deadlock global
state detection.
Global State Detection Based Algorithms
Deadlock detection algorithms based on global state detection take advantage of the following
facts:
1. A consistent snapshot of a distributed system may be taken without freezing the underlying
computation.
2. If a stable property exists in the system before the snapshot collection starts, it will be
preserved.
13. What is a deadlock? How deadlock can be recovered? Explain distributed deadlocks.
[UN]
Deadlock:
Deadlock is a situation where each of the computer process waits for a resource which is being
assigned to some another process. In this situation, none of the process gets executed since the
resource it needs, is held by some other process which is also waiting for some other resource to be
released.
Deadlock Detection
1. If resources have single instance:
In this case for Deadlock detection we can run an algorithm to check for cycle in the
Resource Allocation Graph. Presence of cycle in the graph is the sufficient condition for
deadlock.

In the above diagram, resource 1 and resource 2 have single instances. There is a cycle
R1→P1→R2→[Link],DeadlockisConfirmed.
Deadlock Recovery
This involves a careful analysis of the system's state to determine which resources can be safely
released without compromising the system's integrity. There are four primary methods of deadlock
recovery: process termination, resource pre-emption, priority inversion, and rollback.
There are four primary methods of deadlock recovery: process termination, resource pre-
emption, priority inversion, and rollback.
Distributed Deadlocks
Distributed deadlocks can occur when distributed transactions or concurrency control are utilized
in distributed systems. It may be identified via a distributed technique like edge chasing or by
creating a global wait-for graph (WFG) from local wait-for graphs at a deadlock detector. Phantom
deadlocks are identified in a distributed system but do not exist due to internal system delays.
14. How we can achieve deadlock detection in distributed systems? Provide various
models to carry out the same. [AN]
Distributed deadlocks can develop in distributed systems when distributed transactions or
concurrency control are used. It is found via a distributed approach such as edge chasing or
constructing a global wait-for graph (WFG) from local wait-for graphs at a deadlock
detector. Phantom deadlocks are discovered in a distributed system but do not occur owing to
internal system delays.
As a distributed system is so large, deadlock cannot be prevented or avoided. As a result, only
deadlock detection is feasible. The following are necessary for distributed system deadlock
detection techniques:
 Progress: The technique should be able to detect all of the system's deadlocks.
 Safety: The procedure should not identify phantom or fake deadlocks.
Methods to detect deadlock
In distributed systems, there are three methods to detect deadlocks:
 Centralized approach: One resource is responsible for detecting deadlock. This technique
is beneficial because it is simple and quick to implement. Still, the disadvantages include a
high burden at one node, single-point failure, and reduced reliability.
 Distributed approach: It involves several nodes working together to detect deadlocks.
There is no one point of failure since the burden is evenly distributed across all nodes. The
discovery of deadlocks becomes faster as well.
 Hierarchical method: It is the most advantageous method. In a distributed system, it
integrates both centralized and distributed techniques for deadlock detection. In this
technique, some nodes or clusters of nodes are in charge of deadlock detection, which is
managed by a single node.
Algorithms for deadlock detection
The following Knapps' classification algorithms can be used to request resources in distributed
systems in a variety of ways:
Path pushing algorithm
The main idea is to generate a global WFG (wait-for graph) for each distributed system site:
1. When an algorithm class conducts a deadlock calculation, it broadcasts its local WFG to all
surrounding areas.
2. When it receives information from its neighbors, it does the exact computation and returns
the resulting local WFGs.
3. This process repeats until all locations have received their local WFGs.
Edge changing algorithms
This is a straightforward approach for detecting the existence of cycles in a distributed network.
Messages are classified into two types:
 Request
 Reply
A probe message is distinct from a recommendation or a reply message. Probe messages are
limited in length and size. A request message is sent from one site to another to access a requested
resource, and the responding site sends a reply message.
If the graph contains a cycle, the first site will get the matching reply message from the second site
through another channel (the second site is not directly connected to the first site). This is due to
the presence of a cycle.

15. Demonstrate the Chandy-Misra-Hass Detection Algorithm. [UN]

Another fully distributed deadlock detection algorithm is given by Chandy, Misra, and Hass
(1983). This is considered an edge-chasing, probe-based algorithm.

It is also considered one of the best deadlock detection algorithms for distributed systems.

If a process makes a request for a resource which fails or times out, the process generates a probe
message and sends it to each of the processes holding one or more of its requested resources.

Each probe message contains the following information:

 the id of the process that is blocked (the one that initiates the probe message);
 the id of the process is sending this particular version of the probe message; and
 the id of the process that should receive this probe message.

When a process receives a probe message, it checks to see if it is also waiting for resources.

If not, it is currently using the needed resource and will eventually finish and release the resource.
If it is waiting for resources, it passes on the probe message to all processes it knows to be holding
resources it has itself requested.

The process first modifies the probe message, changing the sender and receiver ids.

If a process receives a probe message that it recognizes as having initiated, it knows there is a
cycle in the system and thus, deadlock.

The following example is based on the same data used in the Silberschatz-Galvin algorithm
example. In this case P1 initiates the probe message, so that all the messages shown have P1 as the
[Link] the probe message is received by process P3, it modifies it and sends it to two more
[Link], the probe message returns to process P1. Deadlock!

The advantages of this algorithm include the following:

 It is easy to implement.
 Each probe message is of fixed length.
 There is very little computation.
 There is very little overhead.
 There is no need to construct a graph, nor to pass graph information to other sites.
 This algorithm does not find false (phantom) deadlock.
 There is no need for special data structures.
UNIT – IV CONSENSUS AND RECOVERY
PART – B & C
1. What are the key assumption underlying while designing agreement algorithms and brief
them. (Apr / May 2023) [CR]
Consensus algorithms necessarily assume that some processes and systems will be unavailable and
that some communications will be lost. Hence these algorithms must be fault-tolerant.
 Examples of consensus algorithm: Deciding whether to commit a distributed transaction to a
database.
 Synchronizing state machine replicas and ensuring consistency among them.
 Assumptions in Consensus algorithms  Failure models:
 Some of the processes may be faulty in distributed systems. A faulty process can behave in any
manner allowed by the failure model assumed. Some of the well known failure models includes
fail-stop, send omission and receive omission, and Byzantine failures. Fail stop model: a process
may crash in the middle of a step, which could be the execution of a local operation or processing
of a message for a send or receive event. it may send a message to only a subset of the destination
set before crashing. Byzantine failure model: a process may behave arbitrarily. The choice of the
failure model determines the feasibility and complexity of solving consensus.
Synchronous/asynchronous communication:
 If a failure-prone process chooses to send a message to process but fails, then intended process
cannot detect the non-arrival of the message.
This is a major hurdle in asynchronous system. In a synchronous system, a unsent message
scenario can be identified by the intended recipient, at the end of the round. The intended
recipient can deal with the non-arrival of the expected message by assuming the arrival of a
message containing some default data, and then proceeding with the next round of the algorithm.
Network connectivity: The system has full logical connectivity, i.e., each process can
communicate with any other by direct message passing. Sender identification:
 A process that receives a message always knows the identity of the sender process. When
multiple messages are expected from the same sender in a single round, a scheduling algorithm is
employed that sends these messages in sub-rounds, so that each message sent within the round can
be uniquely identified. Channel reliability:
 The channels are reliable, and only the processes may fail. Authenticated vs. non-authenticated
messages:
 With unauthenticated messages, when a faulty process relays a message to other processes (i) it
can forge the message and claim that it was received from another process, (ii) it can also tamper
with the contents of a received message before relaying it. Thus, faulty processes can inflict less
damage. Agreement variable:
 The agreement variable may be boolean or multi valued, and need not be an integer. This
simplifying assumption does not affect the results for other data types, but helps in the abstraction
while presenting the algorithms. Byzantine General problem The Byzantine Generals’ Problem
(BGP) is a classic problem faced by any distributed computer system network.
The army has many divisions and each division has a general.
 The generals communicate between each as well as between all lieutenants within their division
only through messengers. All the generals or commanders have to agree upon one of the two plans
of action.
 Exact time to attack all at once or if faced by fierce resistance then the time to retreat all at once.
The army cannot hold on forever.
2. Illustrate the different types of failure in distributed systems and explain how to prevent
them. (Nov / Dec 2022) [UN]
In a distributed database system, we need to deal with four types of failures: transaction failures
(aborts), site (system) failures, media (disk) failures, and communication line failures. Some of
these are due to hardware and others are due to software.
In a distributed database system, we need to deal with four types of failures: transaction failures
(aborts), site (system) failures, media (disk) failures, and communication line failures. Some of
these are due to hardware and others are due to software.
1. Transaction Failures: Transactions can fail for a number of reasons. Failure can be due to an
error in the transaction caused by incorrect input data as well as the detection of a present or
potential [Link] usual approach to take in cases of transaction failure is to abort the
transaction, thus resetting the database to its state prior to the start of this transaction.
2. Site (System) Failures: The reasons for system failure can be traced back to a hardware or to a
software failure. The system failure is always assumed to result in the loss of main memory
contents. Therefore, any part of the database that was in main memory buffers is lost as a result of
a system failure. Total failure refers to the simultaneous failure of all sites in the distributed
system; partial failure indicates the failure of only some sites while the others remain operational.
3. Media Failures: Media failure refers to the failures of the secondary storage devices that store
the database. Such failures may be due to operating system errors, as well as to hardware faults
such as head crashes or controller failures. The important point is that all or part of the database
that is on the secondary storage is considered to be destroyed and inaccessible.
4. Communication Failures There are a number of types of communication failures. The most
common ones are the errors in the messages, improperly ordered messages, lost messages, and
communication line failures.
3. Illustrate briefly the two kinds of checkpoints for checkpoint algorithm. (Nov / Dec 2022)
[UN]
There are two main approaches for check pointing in the distributed computing
systems: coordinated check pointing and uncoordinated check pointing. In the coordinated check
pointing approach, processes must ensure that their checkpoints are consistent.
In the coordinated check pointing approach, processes must ensure that their checkpoints are
consistent. This is usually achieved by some kind of two-phase commit protocol algorithm. In the
uncoordinated check pointing, each process checkpoints its own state independently.
Asynchronous check pointing: – After executing an event, the triplet is recorded without any
synchronization with other processes. – Local checkpoint consists of set of records, first are stored
in volatile log, then moved to stable log.
The product supports two checkpoint algorithms: the time-based algorithm and the record based
algorithm. Both are explained in the following sections.
Time-based algorithm
The time-based checkpoint algorithm commits global transactions at a specified time interval. The
following example declares a time-based algorithm in xJCL:
<checkpoint-algorithm name="timebased">
<classname>[Link]</classname>
<props>
<prop name="interval" value="15" />
<prop name="TransactionTimeOut" value="30" />
</props>
</checkpoint-algorithm>
The units of interval and TransactionTimeOut properties in the previous example are expressed in
seconds.
Record-based algorithm
The record-based checkpoint algorithm commits global transactions at a specified number of
iterations of the processJobStep method of batch step. Each call to the processJobStep method is
treated as iterating through one record. The processJobStep method can retrieve multiple records
from a batch data stream on each call. However, for this checkpoint algorithm one record is the
equivalent one call to the process Job Step method.
The following example declares a record-based algorithm in xJCL:
<checkpoint-algorithm name="record based">
<classname>[Link]</classname>
<props>
<prop name="record count" value="1000" />
<prop name="Transaction Time Out" value="60" />
</props>
</checkpoint-algorithm>
The unit of the Transaction Time Out property in the previous example is expressed in seconds.
If not specified in x JCL, the default transaction timeout is 60 seconds and the default record
count is 10000.
4. Discuss the issues in failure recovery with an example. (Apr / May 2022) [CR]

In a failure recovery, we must not only restore the system to a consistent state, but also
appropriately handle messages that are left in an abnormal state due to the failure and recovery
The computation comprises of three processes Pi, Pj , and Pk, connected through a
communication network.

The processes communicate solely by exchanging messages over faultfree, FIFO communication
channels
Processes Pi, Pj , and Pk have taken checkpoints The rollback of process 𝑃𝑖 to checkpoint 𝐶𝑖,1
created an orphan message H

 Orphan message I is created due to the roll back of process 𝑃𝑗 to checkpoint 𝐶𝑗,1

 Messages C, D, E, and F are potentially problematic

 – Message C: a delayed message – Message D: a lost message since the send event for D is
recorded in the restored state for 𝑃𝑗, but the receive event has been undone at process 𝑃𝑖. – Lost
messages can be handled by having processes keep a message log of all the sent messages –
Messages E, F: delayed orphan messages. After resuming execution from their checkpoints,
processes will generate both of these messages.

5. Explain the agreement statement that should be followed in synchronous system with
failure. (Apr / May 2022) [UN]

In a failure-free system, consensus can be reached by collecting information from the different
processes, arriving at a decision, and distributing this decision in the system.

A distributed mechanism would have each process broadcast its values to others, and each process
computes the same function on the values received. The decision can be reached by using an
application specific function.

 Algorithms to collect the initial values and then distribute the decision may be based on the
token circulation on a logical ring, or the three-phase tree-based broadcast converge cast:
broadcast, or direct communication with all nodes. In a synchronous system, this can be done
simply in a constant number of rounds.

6. Write about the issues in failure recovery and discuss about any two recovery
mechanisms. (Nov / Dec 2021) [CR]

Issues in Failure Recovery

In a failure recovery, we must not only restore the system to a consistent state, but also
appropriately handle messages that are left in an abnormal state due to the failure and recovery.

The computation comprises of three processes Pi, Pj, and Pk, connected through a
communication network. The processes communicate solely by exchanging messages over fault
free, FIFO communication channels. Processes Pi, Pj , and Pk have taken checkpoints.

The rollback of process 𝑃𝑖 to checkpoint 𝐶𝑖,1 created an orphan message H

 Orphan message I is created due to the roll back of process 𝑃𝑗 to checkpoint 𝐶𝑗,1

 Messages C, D, E, and F are potentially problematic – Message C: a delayed message –


Message D: a lost message since the send event for D is recorded in the restored state for 𝑃𝑗, but
the receive event has been undone at process 𝑃𝑖. – Lost messages can be handled by having
processes keep a message log of all the sent messages – Messages E, F: delayed orphan
messages. After resuming execution from their checkpoints, processes will generate both of
these messages.
7. Outline the role of consensus and agreement algorithms in distributed, system
along with its underlying structure, benefits and an example. (Nov / Dec 2021) [UN]

System assumptions Failure models Synchronous/ Asynchronous communication Network


connectivity Sender identification Channel reliability Authenticated vs. non-authenticated
messages Agreement variable.

Problem Specifications Byzantine Agreement (single source has an initial value) Agreement: All
non-faulty processes must agree on the same value. Validity: If the source process is non-faulty,
then the agreed upon value by all the non-faulty processes must be the same as the initial value
of the source. Termination: Each non-faulty process must eventually decide on a value.
Consensus Problem (all processes have an initial value) Agreement: All non-faulty processes
must agree on the same (single) value. Validity: If all the non-faulty processes have the same
initial value, then the agreed upon value by all the non-faulty processes must be that same value.
Termination: Each non-faulty process must eventually decide on a value. Interactive Consistency
(all processes have an initial value) Agreement: All non-faulty processes must agree on the same
array of values A[v1 . . . vn]. Validity: If process i is non-faulty and its initial value is vi , then all
non-faulty processes agree on vi as the ith element of the array A. If process j is faulty, then the
non-faulty processes can agree on any value for A[j]. Termination: Each non-faulty process must
eventually decide on the array A. These problems are equivalent to one another! Show using
reductions.

 Further, common knowledge of the decision value can be obtained using an additional round.
In an asynchronous system, consensus can similarly be reached in a constant number of message
hops. Further, concurrent common knowledge of the consensus value can also be attained.
8. Construct Byzantine agreement problem in distributed system. [AP]
To achieve reliability, the fault-tolerance scheme of the distributed computing system must be
revised. This kind of problem is known as a Byzantine agreement (BA) problem. It requires all
fault-free processors to agree on a common value, even if some components are corrupt.
The Byzantine agreement problem necessitates that a single value be agreed upon by all non-
faulty processors, with the initial value chosen by an arbitrarily chosen processor.
he byzantine General's problem is an illustrative example of byzantine faults: A number of
different armies want to besiege an enemy city. Success requires agreement on a common plan
amongst the disbursed army; communication is only by messages.
here are two categories of failures that are considered. One is fail-stop(in which the node fails and
stops operating) and other is arbitrary-node failure. Some of the arbitrary node failures are given
below : Failure to return a result.

9. Summarize the Agreement in a Failure-Free System . [UN]

In a failure-free system, consensus can be reached by collecting information from the different
processes, arriving at a decision, and distributing this decision in the system.

A distributed mechanism would have each process broadcast its values to others, and each process
computes the same function on the values received.
The decision can be reached by using an application specific function.

Algorithms to collect the initial values and then distribute the decision may be based on the token
circulation on a logical ring, or the three-phase tree-based broadcast converge cast: broadcast, or
direct communication with all nodes. In a synchronous system, this can be done simply in a
constant number of rounds.

Further, common knowledge of the decision value can be obtained using an additional round. In
an asynchronous system, consensus can similarly be reached in a constant number of message
hops.

Further, concurrent common knowledge of the consensus value can also be attained.

 Agreement in (message-passing) synchronous systems with failures Consensus algorithm for


crash failures (synchronous system) Consensus algorithm for crash failures message passing
synchronous system.

 The consensus algorithm for n processes where up to f processes where f < n may fail in a fail
stop failure model. Here the consensus variable x is integer value; each process has initial value
xi.

10. Classification of Agreement Protocol in Distributed Systems. [AN]

Byzantine Agreement Problem: The Byzantine agreement problem necessitates that a single
value be agreed upon by all non-faulty processors, with the initial value chosen by an arbitrarily
chosen processor.

The source processor broadcasts this initial value to all other processors, and the solution must
assure agreement on the same value among all non-faulty processors, with the exception that if the
source processor is not faulty, the agreed-upon value must match the source’s initial value.

In the event that the source processor is faulty, non-faulty processors can agree on any common
value, regardless of what faulty processors agree on or agree on at all.

The Consensus Problem: Each processor has its own initial value in the Consensus issue, and
non-faulty processors must agree on a single, common value.
Each processor sends out a broadcast its starting value to all other processors, which may be
different, and the solution must assure agreement on the same single value across all non-faulty
processors.

If the initial values are different, all non-faulty processors can agree on any common value. The
values reached by defective processors are meaningless.

Interactive Consistency Problem: The Interactive consistency problem necessitates the


agreement of all non-faulty processors on a set of common values, each with its unique initial
value.

Each CPU broadcasts its initial value to all other processors, and the solution must assure
agreement on the same vector across all non-faulty processes (V1, V2, …… , Vn). If the ith
processor is not defective and its initial value is vi, then the ith value that all non-faulty processors
must agree on must be vi. If the jth processor is malfunctioning, the remaining processors can
agree on any common value for vj.

11. Comparison of Agreement Protocols in Distributed Systems. [EV]

Agreement Failure
Purpose Requirements
Protocol Model

Agree upon a Each node must broadcast an initial


Byzantine single value among Byzantine value, and the solution must ensure
Agreement all non-faulty failure model agreement on the same value among
nodes all non-faulty nodes.

Agree upon a Each node has its own initial value,


Fail-stop or
single common and the solution must ensure
Consensus Crash failure
value among all agreement on the same value among
model
nodes all non-faulty nodes.
Agreement Failure
Purpose Requirements
Protocol Model

Agree on a set of Each node broadcasts its initial value,


Fail-stop or
Interactive common values and the solution must ensure
Crash failure
Consistency with unique initial agreement on the same vector among
model
values all non-faulty nodes.

12. Elaborate the Check pointing and rollback recovery. [CR]

Rollback recovery protocols restore the system back to a consistent state after a failure, It achieves
fault tolerance by periodically saving the state of a process during the failure free execution.

It treats a distributed system application as a collection of processes that communicate over a


network Checkpoints The saved state is called a checkpoint, and the procedure of restarting from
a previously check pointed state is called rollback recovery.

A checkpoint can be saved on either the stable storage or the volatile storage

Why is rollback recovery of distributed systems complicated?

Messages induce inter-process dependencies during failure-free operation Rollback propagation


The dependencies among messages may force some of the processes that did not fail to roll back.

This phenomenon of cascaded rollback is called the domino effect. Uncoordinated check pointing
If each process takes its checkpoints independently, then the system cannot avoid the domino
effect – this scheme is called independent or uncoordinated check pointing Techniques that avoid
domino effect

1. Coordinated check pointing rollback recovery - Processes coordinate their checkpoints to form
a system-wide consistent state
2. Communication-induced check pointing rollback recovery - Forces each process to take
checkpoints based on information piggybacked on the application.

The saved state is called a checkpoint, and the procedure of restarting from a previously check
pointed state is called rollback recovery. A checkpoint can be saved on either the stable storage or
the volatile storage depending on the failure scenarios to be tolerated.

13. Construct the Background and definitions of System model. [CR]

A distributed system consists of a fixed number of processes, P1, P2,…_ PN , which communicate
only through messages. Processes cooperate to execute a distributed application and interact with
the outside world by receiving and sending input and output messages, respectively.

Rollback-recovery protocols generally make assumptions about the reliability of the inter-process
communication. Some protocols assume that the communication uses first-in-first-out (FIFO)
order, while other protocols assume that the communication subsystem can lose, duplicate, or
reorder messages. Rollback-recovery protocols therefore must maintain information about the
internal interactions among processes and also the external interactions with the outside world.

14. Justify the Checkpoint-based recovery in Distributed Systems. [EV]

Checkpoint-based rollback-recovery techniques can be classified into three categories:

1. Uncoordinated check pointing

2. Coordinated check pointing

3. Communication-induced check pointing


1. Uncoordinated Check pointing: Each process has autonomy in deciding when to take
checkpoints

Advantages

 The lower runtime overhead during normal execution

Disadvantages

i. Domino effect during a recovery

ii. Recovery from a failure is slow because processes need to iterate to find a consistent set of
checkpoints

iii. Each process maintains multiple checkpoints and periodically invoke a garbage collection
algorithm

iv. Not suitable for application with frequent output commits The processes record the
dependencies among their checkpoints caused by message exchange during failure-free operation
The following direct dependency tracking technique is commonly used in uncoordinated check
pointing.

2. Coordinated Check pointing

In coordinated check pointing, processes orchestrate their check pointing activities so that all local
checkpoints form a consistent global state Types

i. Blocking Check pointing: After a process takes a local checkpoint, to prevent orphan messages,
it remains blocked until the entire check pointing activity is complete Disadvantages: The
computation is blocked during the check pointing

ii. Non-blocking Check pointing: The processes need not stop their execution while taking
checkpoints. A fundamental problem in coordinated check pointing is to prevent a process from
receiving application messages that could make the checkpoint inconsistent.
3. Communication-induced Check pointing

Communication-induced check pointing is another way to avoid the domino effect, while allowing
processes to take some of their checkpoints independently. Processes may be forced to take
additional checkpoints. Two types of checkpoints

i. Autonomous checkpoints

ii. Forced checkpoints

15. Explain the Koo and Toueg Coordinated Check pointing and Recovery Technique. [UN]

Koo and Toueg coordinated check pointing and recovery technique takes a consistent set of
checkpoints and avoids the domino effect and live lock problems during the recovery.

• Includes 2 parts: the check pointing algorithm and the recovery algorithm .

The Check pointing Algorithm The checkpoint algorithm makes the following assumptions about
the distributed system: Processes communicate by exchanging messages through communication
channels.

 Communication channels are FIFO.

 Assume that end-to-end protocols (the sliding window protocol) exist to handle with message
loss due to rollback recovery and communication failure. Communication failures do not divide
the network.

 The checkpoint algorithm takes two kinds of checkpoints on the stable storage: Permanent and
Tentative.

First Phase

1. An initiating process Pi takes a tentative checkpoint and requests all other processes to take
tentative checkpoints. Each process informs Pi whether it succeeded in taking a tentative
checkpoint.

2. A process says “no” to a request if it fails to take a tentative checkpoint 3. If Pi learns that all
the processes have successfully taken tentative checkpoints, Pi decides that all tentative
checkpoints should be made permanent; otherwise, Pi decides that all the tentative checkpoints
should be thrown-away.

Second Phase

1. Pi informs all the processes of the decision it reached at the end of the first phase.

2. A process, on receiving the message from Pi will act accordingly.

3. Either all or none of the processes advance the checkpoint by taking permanent checkpoints.

4. The algorithm requires that after a process has taken a tentative checkpoint, it cannot send
messages related to the basic computation until it is informed of Pi’s decision.

16. Determine the Algorithm for asynchronous check pointing and recovery in Distributed
Systems. [EV]

The algorithm of Juang and Venkatesan for recovery in a system that uses asynchronous check
pointing.

A. System Model and Assumptions

The algorithm makes the following assumptions about the underlying system:

The message transmission delay is arbitrary, but finite.

Underlying computation/application is event-driven: process P is at state s, receives message m,


processes the message, moves to state s’ and send messages out. So the triplet (s, m, msgs_sent)
represents the state of P

Two type of log storage are maintained: – Volatile log: short time to access but lost if processor
crash. Move to stable log periodically. – Stable log: longer time to access but remained if crashed
A. Asynchronous Check pointing

After executing an event, the triplet is recorded without any synchronization with other processes.
– Local checkpoint consists of set of records, first are stored in volatile log, then moved to stable
log.
B. The Recovery Algorithm

Notations and data structure The following notations and data structure are used by the algorithm:

• RCVDi←j(CkPti) represents the number of messages received by processor pi from processor pj


, from the beginning of the computation till the checkpoint CkPti.

Basic idea

Since the algorithm is based on asynchronous check pointing, the main issue in the recovery is to
find a consistent set of checkpoints to which the system can be restored.

The recovery algorithm achieves this by making each processor keep track of both the number of
messages it has sent to other processors as well as the number of messages it has received from
other processors. Whenever a processor rolls back, it is necessary for all other processors to find
out if any message has become an orphan message.

Orphan messages are discovered by comparing the number of messages sent to and received from
neighboring processors.

For example, if RCVDi←j(CkPti) > SENTj→i(CkPtj) (that is, the number of messages received
by processor pi from processor pj is greater than the number of messages sent by processor pj to
processor pi, according to the current states the processors), then one or more messages at
processor pj are orphan messages.
UNIT – V CLOUD COMPUTING
PART B & C
1. Originate of the Cloud Computing. [CR]
Before emerging the cloud computing, there was Client/Server computing which is basically a
centralized storage in which all the software applications, all the data and all the controls are
resided on the server side.
If a single user wants to access specific data or run a program, he/she need to connect to the server
and then gain appropriate access, and then he/she can do his/her business.
Then after, distributed computing came into picture, where all the computers are networked
together and share their resources when needed.
On the basis of above computing, there was emerged of cloud computing concepts that
later implemented.
At around in 1961, John MacCharty suggested in a speech at MIT that computing can be sold like
a utility, just like a water or electricity. It was a brilliant idea, but like all brilliant ideas, it was
ahead if its time, as for the next few decades, despite interest in the model, the technology simply
was not ready for it.
But of course time has passed and the technology caught that idea and after few years we
mentioned that:
In 1999, [Link] started delivering of applications to users using a simple website. The
applications were delivered to enterprises over the Internet, and this way the dream of computing
sold as utility were true.
In 2002, Amazon started Amazon Web Services, providing services like storage, computation and
even human intelligence. However, only starting with the launch of the Elastic Compute Cloud in
2006 a truly commercial service open to everybody existed.
In 2009, Google Apps also started to provide cloud computing enterprise applications.
Of course, all the big players are present in the cloud computing evolution, some were earlier,
some were later. In 2009, Microsoft launched Windows Azure, and companies like Oracle and
HP have all joined the game. This proves that today, cloud computing has become mainstream.
2. Demonstrate the Cloud Computing Architecture. [UN]
As we know, cloud computing technology is used by both small and large organizations to store
the information in cloud and access it from anywhere at anytime using the internet connection.
Cloud computing architecture is a combination of service-oriented architecture and event-
driven architecture.
Cloud computing architecture is divided into the following two parts -
o Front End
o Back End
The below diagram shows the architecture of cloud computing -

Front End
The front end is used by the client. It contains client-side interfaces and applications that are
required to access the cloud computing platforms. The front end includes web servers (including
Chrome, Firefox, internet explorer, etc.), thin & fat clients, tablets, and mobile devices.
Back End
The back end is used by the service provider. It manages all the resources that are required to
provide cloud computing services. It includes a huge amount of data storage, security mechanism,
virtual machines, deploying models, servers, traffic control mechanisms, etc.
3. Explain the Components of Cloud Computing Architecture. [UN]
There are the following components of cloud computing architecture -
1. Client Infrastructure
Client Infrastructure is a Front end component. It provides GUI (Graphical User Interface) to
interact with the cloud.
2. Application
The application may be any software or platform that a client wants to access.
3. Service
A Cloud Services manages that which type of service you access according to the client’s
requirement.
Cloud computing offers the following three type of services:
i. Software as a Service (SaaS) – It is also known as cloud application services. Mostly, SaaS
applications run directly through the web browser means we do not require to download and install
these applications. Some important example of SaaS is given below –
Example: Google Apps, Salesforce Dropbox, Slack, Hubspot, Cisco WebEx.
ii. Platform as a Service (PaaS) – It is also known as cloud platform services. It is quite similar
to SaaS, but the difference is that PaaS provides a platform for software creation, but using SaaS,
we can access software over the internet without the need of any platform.
Example: Windows Azure, [Link], Magento Commerce Cloud, OpenShift.
iii. Infrastructure as a Service (IaaS) – It is also known as cloud infrastructure services. It is
responsible for managing applications data, middleware, and runtime environments.
Example: Amazon Web Services (AWS) EC2, Google Compute Engine (GCE), Cisco Metapod.
4. Runtime Cloud
Runtime Cloud provides the execution and runtime environment to the virtual machines.
5. Storage
Storage is one of the most important components of cloud computing. It provides a huge amount
of storage capacity in the cloud to store and manage data.
6. Infrastructure
It provides services on the host level, application level, and network level. Cloud infrastructure
includes hardware and software components such as servers, storage, network devices,
virtualization software, and other storage resources that are needed to support the cloud computing
model.
7. Management
Management is used to manage components such as application, service, runtime cloud, storage,
infrastructure, and other security issues in the backend and establish coordination between them.
8. Security
Security is an in-built back end component of cloud computing. It implements a security
mechanism in the back end.
9. Internet
The Internet is medium through which front end and back end can interact and communicate with
each other.
4. Summarize the Cloud Computing Technologies. [UN]
A list of cloud computing technologies are given below -
o Virtualization
o Service-Oriented Architecture (SOA)
o Grid Computing
o Utility Computing
Virtualization
Virtualization is the process of creating a virtual environment to run multiple applications and
operating systems on the same server. The virtual environment can be anything, such as a single
instance or a combination of many operating systems, storage devices, network application servers,
and other environments.
Grid Computing
Grid computing is also known as distributed computing. It is a processor architecture that
combines various different computing resources from multiple locations to achieve a common
goal. In grid computing, the grid is connected by parallel nodes to form a computer cluster. These
computer clusters are in different sizes and can run on any operating system.
Grid computing contains the following three types of machines -
1. Control Node: It is a group of server which administrates the whole network.
2. Provider: It is a computer which contributes its resources in the network resource pool.
3. User: It is a computer which uses the resources on the network.
Mainly, grid computing is used in the ATMs, back-end infrastructures, and marketing
research.
Utility Computing
Utility computing is the most trending IT service model. It provides on-demand computing
resources (computation, storage, and programming services via API) and infrastructure based on
the pay per use method. It minimizes the associated costs and maximizes the efficient use of
resources. The advantage of utility computing is that it reduced the IT cost, provides greater
flexibility, and easier to manage.
Large organizations such as Google and Amazon established their own utility services for
computing storage and application.
5. Classify the Cloud Computing Applications. [UN]
Cloud service providers provide various applications in the field of art, business, data storage and
backup services, education, entertainment, management, social networking, etc.
The most widely used cloud computing applications are given below -
1. Art Applications
Cloud computing offers various art applications for quickly and easily design attractive cards,
booklets, and images. Some most commonly used cloud art applications are given below:
i Moo
Moo is one of the best cloud art applications. It is used for designing and printing business cards,
postcards, and mini cards.
ii. Vistaprint
Vistaprint allows us to easily design various printed marketing products such as business cards,
Postcards, Booklets, and wedding invitations cards.
iii. Adobe Creative Cloud
Adobe creative cloud is made for designers, artists, filmmakers, and other creative professionals.
It is a suite of apps which includes PhotoShop image editing programming, Illustrator, InDesign,
TypeKit, Dreamweaver, XD, and Audition.
2. Business Applications
Business applications are based on cloud service providers. Today, every organization requires the
cloud business application to grow their business. It also ensures that business applications are
24*7 available to users.
There are the following business applications of cloud computing -
i. MailChimp
MailChimp is an email publishing platform which provides various options to design,
send, and save templates for emails.
iii. Salesforce
Salesforce platform provides tools for sales, service, marketing, e-commerce, and more. It also
provides a cloud development platform.
iv. Chatter
Chatter helps us to share important information about the organization in real time.
v. Bitrix24
Bitrix24 is a collaboration platform which provides communication, management, and social
collaboration tools.
vi. Paypal
Paypal offers the simplest and easiest online payment mode using a secure internet account.
Paypal accepts the payment through debit cards, credit cards, and also from Paypal account
holders.
vii. Slack
Slack stands for Searchable Log of all Conversation and Knowledge. It provides a user-
friendly interface that helps us to create public and private channels for communication.
viii. Quickbooks
Quickbooks works on the terminology "Run Enterprise anytime, anywhere, on any device." It
provides online accounting solutions for the business. It allows more than 20 users to work
simultaneously on the same system.
3. Data Storage and Backup Applications
Cloud computing allows us to store information (data, files, images, audios, and videos) on the
cloud and access this information using an internet connection. As the cloud provider is responsible
for providing security, so they offer various backup recovery application for retrieving the lost
data.
A list of data storage and backup applications in the cloud are given below -
i. [Link]
Box provides an online environment for secure content management,
workflow, and collaboration. It allows us to store different files such as Excel, Word, PDF, and
images on the cloud. The main advantage of using box is that it provides drag & drop service for
files and easily integrates with Office 365, G Suite, Salesforce, and more than 1400 tools.
ii. Mozy
Mozy provides powerful online backup solutions for our personal and business data. It schedules
automatically back up for each day at a specific time.
iii. Joukuu
Joukuu provides the simplest way to share and track cloud-based backup files. Many users use
joukuu to search files, folders, and collaborate on documents.
iv. Google G Suite
Google G Suite is one of the best cloud storage and backup application. It includes Google
Calendar, Docs, Forms, Google+, Hangouts, as well as cloud storage and tools for managing cloud
apps. The most popular app in the Google G Suite is Gmail. Gmail offers free email services to
users.
4. Education Applications
Cloud computing in the education sector becomes very popular. It offers various online distance
learning platforms and student information portals to the students. The advantage of using
cloud in the field of education is that it offers strong virtual classroom environments, Ease of
accessibility, secure data storage, scalability, greater reach for the students, and minimal hardware
requirements for the applications.
There are the following education applications offered by the cloud -
i. Google Apps for Education
Google Apps for Education is the most widely used platform for free web-based email, calendar,
documents, and collaborative study.
ii. Chromebooks for Education
Chromebook for Education is one of the most important Google's projects. It is designed for the
purpose that it enhances education innovation.
iii. Tablets with Google Play for Education
It allows educators to quickly implement the latest technology solutions into the classroom and
make it available to their students.
iv. AWS in Education
AWS cloud provides an education-friendly environment to universities, community colleges, and
schools.
5. Entertainment Applications
Entertainment industries use a multi-cloud strategy to interact with the target audience. Cloud
computing offers various entertainment applications such as online games and video conferencing.
i. Online games
Today, cloud gaming becomes one of the most important entertainment media. It offers various
online games that run remotely from the cloud. The best cloud gaming services are Shaow,
GeForce Now, Vortex, Project xCloud, and PlayStation Now.
ii. Video Conferencing Apps
Video conferencing apps provides a simple and instant connected experience. It allows us to
communicate with our business partners, friends, and relatives using a cloud-based video
conferencing. The benefits of using video conferencing are that it reduces cost, increases
efficiency, and removes interoperability.
6. Management Applications
Cloud computing offers various cloud management tools which help admins to manage all types
of cloud activities, such as resource deployment, data integration, and disaster recovery. These
management tools also provide administrative control over the platforms, applications, and
infrastructure.
Some important management applications are -
i. Toggl
Toggl helps users to track allocated time period for a particular project.
ii. Evernote
Ever note allows you to sync and save your recorded notes, typed notes, and other notes in one
convenient place. It is available for both free as well as a paid version.
It uses platforms like Windows, macOS, Android, iOS, Browser, and Unix.
iii. Outright
Outright is used by management users for the purpose of accounts. It helps to track income,
expenses, profits, and losses in real-time environment.
iv. Go To Meeting
GoToMeeting provides Video Conferencing and online meeting apps, which allows you to start
a meeting with your business partners from anytime, anywhere using mobile phones or tablets.
Using GoToMeeting app, you can perform the tasks related to the management such as join
meetings in seconds, view presentations on the shared screen, get alerts for upcoming meetings,
etc.
6. Illustrate the Security Risks of Cloud Computing. [UN]
Cloud computing provides various advantages, such as improved collaboration, excellent
accessibility, Mobility, Storage capacity, etc. But there are also security risks in cloud computing.
Some most common Security Risks of Cloud Computing are given below-
I. Data Loss
Data loss is the most common cloud security risks of cloud computing. It is also known as data
leakage. Data loss is the process in which data is being deleted, corrupted, and unreadable by a
user, software, or application.
II. Hacked Interfaces and Insecure APIs
As we all know, cloud computing is completely depends on Internet, so it is compulsory to protect
interfaces and APIs that are used by external users. APIs are the easiest way to communicate with
most of the cloud services. In cloud computing, few services are available in the public domain.
These services can be accessed by third parties, so there may be a chance that these services easily
harmed and hacked by hackers.
III. Data Breach
Data Breach is the process in which the confidential data is viewed, accessed, or stolen by the third
party without any authorization, so organization's data is hacked by the hackers.
IV. Vendor lock-in
Vendor lock-in is the of the biggest security risks in cloud computing. Organizations may face
problems when transferring their services from one vendor to another. As different vendors
provide different platforms, that can cause difficulty moving one cloud to another.
V. Increased complexity strains IT staff
Migrating, integrating, and operating the cloud services is complex for the IT staff. IT staff must
require the extra capability and skills to manage, integrate, and maintain the data to the cloud.
VI. Spectre & Meltdown
Spectre & Meltdown allows programs to view and steal data which is currently processed on
computer. It can run on personal computers, mobile devices, and in the cloud. It can store the
password, your personal information such as images, emails, and business documents in the
memory of other running programs.
VII. Denial of Service (DoS) attacks
Denial of service (DoS) attacks occur when the system receives too much traffic to buffer the
server. Mostly, DoS attackers target web servers of large organizations such as banking sectors,
media companies, and government organizations. To recover the lost data, DoS attackers charge a
great deal of time and money to handle the data.
VIII. Account hijacking
Account hijacking is a serious security risk in cloud computing. It is the process in which
individual user's or organization's cloud account (bank account, e-mail account, and social media
account) is stolen by hackers. The hackers use the stolen account to perform unauthorized
activities.
7. Classify the Different types of Cloud Computing. [AN]
Types of Cloud
There are the following 4 types of cloud that you can deploy according to the organization's needs-
o Public Cloud
o Private Cloud
o Hybrid Cloud
o Community Cloud
Public Cloud
Public cloud is open to all to store and access information via the Internet using the pay-per-usage
method.
In public cloud, computing resources are managed and operated by the Cloud Service Provider
(CSP).
Example: Amazon elastic compute cloud (EC2), IBM SmartCloud Enterprise, Microsoft, Google
App Engine, Windows Azure Services Platform.
Advantages of Public Cloud
There are the following advantages of Public Cloud -
o Public cloud is owned at a lower cost than the private and hybrid cloud.
o Public cloud is maintained by the cloud service provider, so do not need to worry about the
maintenance.
Disadvantages of Public Cloud
o Public Cloud is less secure because resources are shared publicly.
o Performance depends upon the high-speed internet network link to the cloud provider.
o The Client has no control of data.
Private Cloud
Private cloud is also known as an internal cloud or corporate cloud. It is used by organizations
to build and manage their own data centers internally or by the third party. It can be deployed using
Opensource tools such as Openstack and Eucalyptus.
Based on the location and management, National Institute of Standards and Technology (NIST)
divide private cloud into the following two parts-
o On-premise private cloud
o Outsourced private cloud
Advantages of Private Cloud
There are the following advantages of the Private Cloud -
o Private cloud provides a high level of security and privacy to the users.
o Private cloud offers better performance with improved speed and space capacity.
Disadvantages of Private Cloud
o Skilled people are required to manage and operate cloud services.
o Private cloud is accessible within the organization, so the area of operations is limited.
Hybrid Cloud
Hybrid Cloud is a combination of the public cloud and the private cloud. we can say:
Hybrid Cloud = Public Cloud + Private Cloud
Hybrid cloud is partially secure because the services which are running on the public cloud can be
accessed by anyone, while the services which are running on a private cloud can be accessed only
by the organization's users.
Example: Google Application Suite (Gmail, Google Apps, and Google Drive), Office 365 (MS
Office on the Web and One Drive), Amazon Web Services.
Advantages of Hybrid Cloud
There are the following advantages of Hybrid Cloud -
o Hybrid cloud is suitable for organizations that require more security than the public cloud.
o Hybrid cloud helps you to deliver new products and services more quickly.
Disadvantages of Hybrid Cloud
o In Hybrid Cloud, security feature is not as good as the private cloud.
o Managing a hybrid cloud is complex because it is difficult to manage more than one type
of deployment model.
o In the hybrid cloud, the reliability of the services depends on cloud service providers.
Community Cloud
Community cloud allows systems and services to be accessible by a group of several organizations
to share the information between the organization and a specific community. It is owned, managed,
and operated by one or more organizations in the community, a third party, or a combination of
them.
Example: Health Care community cloud
Advantages of Community Cloud
There are the following advantages of Community Cloud -
o Community cloud is cost-effective because the whole cloud is being shared by several
organizations or communities.
o It provides better security than the public cloud.
o It provdes collaborative and distributive environment.
Disadvantages of Community Cloud
o Community cloud is not a good choice for every organization.
o Security features are not as good as the private cloud.
o It is not suitable if there is no collaboration.
o The fixed amount of data storage and bandwidth is shared among all community members.
8. Difference between public cloud, private cloud, hybrid cloud, and community cloud.
[AN]
The below table shows the difference between public cloud, private cloud, hybrid cloud, and
community cloud.

Parameter Public Cloud Private Cloud Hybrid Cloud Community


Cloud

Host Service Enterprise (Third Enterprise (Third Community


provider party) party) (Third party)

Users General public Selected users Selected users Community


members

Access Internet Internet, VPN Internet, VPN Internet, VPN

Owner Service Enterprise Enterprise Community


provider

9. Construct the Cloud Deployment Model. [CR]


What is Cloud Deployment Model?
It works as your virtual computing environment with a choice of deployment model depending on
how much data you want to store and who has access to the Infrastructure.
Different Types of Cloud Computing Deployment Models
Most cloud hubs have tens of thousands of servers and storage devices to enable fast loading. It is
often possible to choose a geographic area to put the data "closer" to users. Thus, deployment
models for cloud computing are categorized based on their location. To know which model would
best fit the requirements of your organization, let us first learn about the various types.
Public Cloud
The name says it all. It is accessible to the public. Public deployment models in the cloud are
perfect for organizations with growing and fluctuating demands. It also makes a great choice for
companies with low-security concerns.
Benefits of Public Cloud
o Minimal Investment - As a pay-per-use service, there is no large upfront cost and is ideal
for businesses who need quick access to resources
o No Hardware Setup - The cloud service providers fully fund the entire Infrastructure
o No Infrastructure Management - This does not require an in-house team to utilize the public
cloud.
Limitations of Public Cloud
o Data Security and Privacy Concerns - Since it is accessible to all, it does not fully protect
against cyber-attacks and could lead to vulnerabilities.
o Reliability Issues - Since the same server network is open to a wide range of users, it can
lead to malfunction and outages
o Service/License Limitation - While there are many resources you can exchange with
tenants, there is a usage cap.
Private Cloud
It means that it will be integrated with your data center and managed by your IT team.
Alternatively, you can also choose to host it externally. The private cloud offers bigger
opportunities that help meet specific organizations' requirements when it comes to customization.
It's also a wise choice for mission-critical processes that may have frequently changing
requirements.
Benefits of Private Cloud
o Data Privacy - It is ideal for storing corporate data where only authorized personnel gets
access
o Security - Segmentation of resources within the same Infrastructure can help with better
access and higher levels of security.
o Supports Legacy Systems - This model supports legacy systems that cannot access the
public cloud.
Limitations of Private Cloud
o Higher Cost - With the benefits you get, the investment will also be larger than the public
cloud. Here, you will pay for software, hardware, and resources for staff and training.
o Fixed Scalability - The hardware you choose will accordingly help you scale in a certain
direction
o High Maintenance - Since it is managed in-house, the maintenance costs also increase.
Community Cloud
The community cloud operates in a way that is similar to the public cloud. There's just one
difference - it allows access to only a specific set of users who share common objectives and use
cases. This type of deployment model of cloud computing is managed and hosted internally or by
a third-party vendor. However, you can also choose a combination of all three.
Benefits of Community Cloud
o Smaller Investment - A community cloud is much cheaper than the private & public cloud
and provides great performance
o Setup Benefits - The protocols and configuration of a community cloud must align with
industry standards, allowing customers to work much more efficiently.
Limitations of Community Cloud
o Shared Resources - Due to restricted bandwidth and storage capacity, community resources
often pose challenges.
o Not as Popular - Since this is a recently introduced model, it is not that popular or available
across industries
Hybrid Cloud
As the name suggests, a hybrid cloud is a combination of two or more cloud architectures. While
each model in the hybrid cloud functions differently, it is all part of the same architecture. Further,
as part of this deployment of the cloud computing model, the internal or external providers can
offer resources.
Benefits of Hybrid Cloud
o Cost-Effectiveness - The overall cost of a hybrid solution decreases since it majorly uses
the public cloud to store data.
o Security - Since data is properly segmented, the chances of data theft from attackers are
significantly reduced.
o Flexibility - With higher levels of flexibility, businesses can create custom solutions that
fit their exact requirements
Limitations of Hybrid Cloud
o Complexity - It is complex setting up a hybrid cloud since it needs to integrate two or more
cloud architectures
o Specific Use Case - This model makes more sense for organizations that have multiple use
cases or need to separate critical and sensitive data
10. Classify the Cloud Service Models. [AN]
There are the following three types of cloud service models -
1. Infrastructure as a Service (IaaS)
2. Platform as a Service (PaaS)
3. Software as a Service (SaaS)
1. Infrastructure as a Service (IaaS)
IaaS is also known as Hardware as a Service (HaaS). It is a computing infrastructure managed
over the internet. The main advantage of using IaaS is that it helps users to avoid the cost and
complexity of purchasing and managing the physical servers.
Characteristics of IaaS
There are the following characteristics of IaaS -
o Resources are available as a service
o Services are highly scalable
o Dynamic and flexible
o GUI and API-based access
o Automated administrative tasks
Example: DigitalOcean, Linode, Amazon Web Services (AWS), Microsoft Azure, Google
Compute Engine (GCE), Rackspace, and Cisco Metacloud.
Platform as a Service (PaaS)
PaaS cloud computing platform is created for the programmer to develop, test, run, and manage
the applications.
Characteristics of PaaS
There are the following characteristics of PaaS -
o Accessible to various users via the same development application.
o Integrates with web services and databases.
o Builds on virtualization technology, so resources can easily be scaled up or down as per
the organization's need.
o Support multiple languages and frameworks.
o Provides an ability to "Auto-scale".
Example: AWS Elastic Beanstalk, Windows Azure, Heroku, [Link], Google App Engine,
Apache Stratos, Magento Commerce Cloud, and OpenShift.
Software as a Service (SaaS)
SaaS is also known as "on-demand software". It is a software in which the applications are hosted
by a cloud service provider. Users can access these applications with the help of internet
connection and web browser.
Characteristics of SaaS
There are the following characteristics of SaaS -
o Managed from a central location
o Hosted on a remote server
o Accessible over the internet
o Users are not responsible for hardware and software updates. Updates are applied
automatically.
o The services are purchased on the pay-as-per-use basis
Example: BigCommerce, Google Apps, Salesforce, Dropbox, ZenDesk, Cisco WebEx, ZenDesk,
Slack, and GoToMeeting.
Difference between IaaS, PaaS, and SaaS
The below table shows the difference between IaaS, PaaS, and SaaS -

IaaS Paas SaaS


It provides a virtual data center to It provides virtual platforms and It provides web
store information and create tools to create, test, and deploy software and apps to
platforms for app development, apps. complete business
testing, and deployment. tasks.

It provides access to resources It provides runtime environments It provides software as


such as virtual machines, virtual and deployment tools for a service to the end-
storage, etc. applications. users.

It is used by network architects. It is used by developers. It is used by end users.

IaaS provides only Infrastructure. PaaS provides SaaS provides


Infrastructure+Platform. Infrastructure+Platform
+Software.

11. Demonstrate the Factors Driving Cloud Adoption in the Financial Sector. [UN]
A number of factors are driving its adoption at an accelerating rate.
1. Need for Agility and Flexibility
These systems are monolithic, heavily siloed, and difficult both to change and scale. In contrast,
cloud native platforms are scalable on-demand. When software features are released to distributed,
microservice-based applications by DevOps teams, organizations are able to innovate in response
to market changes and pivot as needed.
2. Transition towards OpEx Models
Traditionally, financial services organizations used on-prem data centers that required massive
CapEx investments to purchase space, equipment, software, and trained workforces. The cloud
offers a great opportunity to switch IT spending from CapEx to OpEx models.
3. Need for Real-Time Information
The need for real-time information is present in all industries but is especially strong in the
financial sector. These organizations often leverage high-frequency trading applications to analyze
tick-by-tick data and search for signals in markets, such as price changes and movements in rates.
4. Massive and Steadily Increasing Data Storage Requirements
Financial services organizations have always had to store massive amounts of data, and storing
this data properly has always been an important part of maintaining operations while minimizing
cost.
5. Changing Customer Behaviour
The behaviour of customers is changing quickly. A survey by Gallup found that, in 2018, only
66% of millennials had visited a physical bank branch in the past year. Younger generations are
increasingly using online banking services, especially on their phones. Banks need to offer
customers personalized digital experiences, services that are enabled by cloud native technologies.
6. New Kinds of Services
In addition to supporting more mobile experiences, cloud native technologies also offer banks the
ability to provide unique types of services in the marketplace. Customers are also looking for new
kinds of applications to support new investment types.
7. Security and Compliance
The financial industry operates around capital markets, banking, and insurance, and each
industry’s profile has very specific regulatory requirements. Offering improving security
mechanisms, public cloud providers can actually be much better than traditional systems at
systemic security services, such as identifying potential breaches. Organizations that
follow container security best practices can secure their applications against a wider range of
attacks.
8. Green IT
The urgency to reduce greenhouse gas emissions means financial services organizations are
looking for ways to lower the carbon footprint of their IT. The cloud will play a pivotal role in
enabling both.
12. Explain the Virtualization in Cloud Computing. [UN]
Virtualization is the "creation of a virtual (rather than actual) version of something, such as a
server, a desktop, a storage device, an operating system or network resources".
What is the concept behind the Virtualization?
Creation of a virtual machine over existing operating system and hardware is known as Hardware
Virtualization. A Virtual machine provides an environment that is logically separated from the
underlying hardware.
Types of Virtualization:
1. Hardware Virtualization.
2. Operating system Virtualization.
3. Server Virtualization.
4. Storage Virtualization.
1) Hardware Virtualization:
Hardware virtualization is mainly done for the server platforms, because controlling virtual
machines is much easier than controlling a physical server.
2) Operating System Virtualization:
Operating System Virtualization is mainly used for testing the applications on different platforms
of OS.
3) Server Virtualization:
Server virtualization is done because a single physical server can be divided into multiple servers
on the demand basis and for balancing the load.
4) Storage Virtualization:
Storage virtualization is the process of grouping the physical storage from multiple network
storage devices so that it looks like a single storage device.
How does virtualization work in cloud computing?
Virtualization plays a very important role in the cloud computing technology, normally in the
cloud computing, users share the data present in the clouds like application etc, but actually with
the help of virtualization users shares the Infrastructure.
13. Categorize of the Cloud Service Provider Companies. [AN]
Cloud Service providers (CSP) offers various services such as Software as a Service, Platform
as a service, Infrastructure as a service, network services, business applications, mobile
applications, and infrastructure in the cloud. The cloud service providers host these services in
a data center, and users can access these services through cloud provider companies using an
Internet connection.
There are the following Cloud Service Providers Companies -
1. Amazon Web Services (AWS)
AWS (Amazon Web Services) is a secure cloud service platform provided by Amazon. It offers
various services such as database storage, computing power, content delivery, Relational Database,
Simple Email, Simple Queue, and other functionality to increase the organization's growth.
Features of AWS
o AWS is cost-effective as it works on a pay-as-you-go pricing model.
o It provides various flexible storage options.
o It can efficiently manage and secure Windows workloads.
2. Microsoft Azure
Microsoft Azure is also known as Windows Azure. It supports various operating systems,
databases, programming languages, frameworks that allow IT professionals to easily build, deploy,
and manage applications through a worldwide network. It also allows users to create different
groups for related utilities.
Features of Microsoft Azure
o Microsoft Azure provides scalable, flexible, and cost-effective
o It allows developers to quickly manage applications and websites.
o It managed each resource individually.
3. Google Cloud Platform
Google cloud platform is a product of Google. It consists of a set of physical devices, such as
computers, hard disk drives, and virtual machines. It also helps organizations to simplify the
migration process.
Features of Google Cloud
o Google cloud includes various big data services such as Google BigQuery, Google
CloudDataproc, Google CloudDatalab, and Google Cloud Pub/Sub.
o It offers various scalable and high-performance
o GCP provides various serverless services such as Messaging, Data Warehouse, Database,
Compute, Storage, Data Processing, and Machine learning (ML)
4. IBM Cloud Services
IBM Cloud is an open-source, faster, and more reliable platform. It is built with a suite of advanced
data and AI tools. It offers various services such as Infrastructure as a service, Software as a
service, and platform as a service. You can access its services like compute power, cloud data &
Analytics, cloud use cases, and storage networking using internet connection.
Feature of IBM Cloud
o IBM cloud improves operational efficiency.
o Its speed and agility improve the customer's satisfaction.
o It offers Infrastructure as a Service (IaaS), Platform as a Service (PaaS), as well as Software
as a Service (SaaS)
o It offers various cloud communications services to our IT environment.
5. VMware Cloud
VMware cloud is a Software-Defined Data Center (SSDC) unified platform for the Hybrid Cloud.
It allows cloud providers to build agile, flexible, efficient, and robust cloud services.
Features of VMware
o VMware cloud works on the pay-as-per-use model and monthly subscription
o It provides better customer satisfaction by protecting the user's data.
o It eliminates the time and cost complexity.
6. Oracle cloud
Oracle cloud platform is offered by the Oracle Corporation. It combines Platform as a Service,
Infrastructure as a Service, Software as a Service, and Data as a Service with cloud infrastructure.
Features of Oracle cloud
o Its infrastructure uses various languages including, Java, Ruby, PHP, [Link].
o It integrates with Docker, VMware, and other DevOps tools.
o It maximizes the value of IT investments.
7. Red Hat
Red Hat virtualization is an open standard and desktop virtualization platform produced by Red
Hat. It is very popular for the Linux environment to provide various infrastructure solutions for
virtualized servers as well as technical workstations.
Features of Rad Hat
o Red Hat cloud includes OpenShift, which is an app development platform that allows
developers to access, modernize, and deploy apps
o It supports up to 16 virtual machines, each having up to 256GB of RAM.
o It offers better reliability, availability, and serviceability.
8. Digital Ocean
Digital Ocean is the unique cloud provider that offers computing services to the organization. It
was founded in 2011 by Moisey Uretsky and Ben. It is one of the best cloud provider that allows
us to manage and deploy web applications.
Features of DigitalOcean
o It uses the KVM hypervisor to allocate physical resources to the virtual servers.
o It provides high-quality performance.
o It offers a digital community platform that helps to answer queries and holding feedbacks.
9. Rackspace
Rackspace offers cloud computing services such as hosting web applications, Cloud Backup,
Cloud Block Storage, Databases, and Cloud Servers. The main aim to designing Rackspace is to
easily manage private and public cloud deployments. Its data centers operating in the USA, UK,
Hong Kong, and Australia.
Features of Rackspace
o Rackspace provides various tools that help organizations to collaborate and communicate
more efficiently.
o We can access files that are stored on the Rackspace cloud drive, anywhere, anytime using
any device.
o It offers 6 globally data centers.
o It can manage both virtual servers and dedicated physical servers on the same network.
10. Alibaba Cloud
Alibaba Cloud is used to develop data management and highly scalable cloud computing services.
It offers various services, including Elastic Computing, Storage, Networking, Security, Database
Services, Application Services, Media Services, Cloud Communication, and Internet of Things.
Features of Alibaba Cloud
o It globally deals with its 14 data centers.
o It offers scalable and reliable data storage.
14. How to create Amazon EC2 window instances? [RE]
To launch an instance
1. Sign in to the AWS Console and open the Amazon EC2 console.
2. From the navigation bar, select the region for the instance. Here we are going to choose
Singapore data region. Otherwise, this choice is important because some Amazon EC2
resources can be shared between regions, while others can't be.
3. On console dashboard, click Launch Instance.
4. To Choose an Amazon Machine Image (AMI) page displays a list of basic configurations
called Amazon Machine Images (AMIs) that serve as templates for your instance. Select
the 64-bit version of Microsoft Windows Server 2008 R2. Notice that this configuration is
marked as Free tier eligible.
5. To Choose an Instance Type page, you can select the hardware configuration for your
instance. The [Link] instance will be selected by default. Click Review and Launch to
let the wizard complete or not with other configuration settings for you, so you can get
started quickly.
6. To Review Instance Launch page, you need to go to the settings for your instance.
7. Click on Launch.
8. In the Select an existing key pair or create a new key pair dialog box, you can
select Choose an existing key pair, to select a key pair you already created.
9. A confirmation page will open to know that your instance is launching. Click View
Instances to close the confirmation page and return to the console.
10. On the Instances page, you can view the status of the launch. It takes a short time for an
instance to launch. When you launch an instance, its initial state is pending. After the
instance starts, its state changes to running and it receives a public DNS name.
11. Record the public DNS name for your instance because you'll need it for the next step.
12. (Optional) After your instance is launched, you can view its security group rules. From the
Instances page, select the instance. In the Description tab, find Security groups and
click view rules.
15. Contrast between AWS, Azure, and Google Cloud Platform. [UN]

Parameter AWS Azure Google Cloud


Platform

App Testing It uses device farm It uses DevTest labs It uses Cloud
Test labs.

API Management Amazon API gateway Azure API gateway Cloud


endpoints.

Kubernetes EKS Kubernetes service Kubernetes


Management engine

Git Repositories AWS source Azure source Cloud source


repositories repositories repositories.
Data warehouse Redshift SQL warehouse Big Query

Object Storage S3 Block Blobs and files Google cloud


storage.

Relational DB RDS Relational DBs Google Cloud


SQL

Block Storage EBS Page Blobs Persistent disks

Marketplace AWS Azure G suite

File Storage EFS Azure Files ZFS and Avere

Media Services Amazon Elastic Azure media services Cloud video


transcoder intelligence API

Virtual network VPC VNet Subnet

Pricing Per hour Per minute Per minute

Maximum processors in 128 128 96


VM

Maximum memory in 3904 3800 1433


VM (GiB)

Catching ElasticCache RedisCache CloudCDN

Load Balancing Elastic Load Load Balancer Cloud Load


Configuration Balancing Application Gateway Balancing

Global Content Delivery CloudFront Content Delivery Cloud


Networks Network Interconnect
16. Explain the Different Types of Load Balancing Algorithms in Cloud Computing. [UN]
1. Static Algorithm
Static algorithms are built for systems with very little variation in load. The entire traffic is divided
equally between the servers in the static algorithm. This algorithm requires in-depth knowledge of
server resources for better performance of the processor, which is determined at the beginning of
the implementation.
2. Dynamic Algorithm
The dynamic algorithm first finds the lightest server in the entire network and gives it priority for
load balancing. This requires real-time communication with the network which can help increase
the system's traffic. Here, the current state of the system is used to control the load.
The characteristic of dynamic algorithms is to make load transfer decisions in the current system
state. In this system, processes can move from a highly used machine to an underutilized machine
in real time.
3. Round Robin Algorithm
Processors assign each process circularly without defining any priority. It gives fast response in
case of uniform workload distribution among the processes. All processes have different loading
times. Therefore, some nodes may be heavily loaded, while others may remain under-utilised.
4. Weighted Round Robin Load Balancing Algorithm
Weighted Round Robin Load Balancing Algorithms have been developed to enhance the most
challenging issues of Round Robin Algorithms. In this algorithm, there are a specified set of
weights and functions, which are distributed according to the weight values.
5. Opportunistic Load Balancing Algorithm
The opportunistic load balancing algorithm allows each node to be busy. It never considers the
current workload of each system. Regardless of the current workload on each node, OLB
distributes all unfinished tasks to these nodes.
6. Minimum to Minimum Load Balancing Algorithm
Under minimum to minimum load balancing algorithms, first of all, those tasks take minimum
time to complete. Among them, the minimum value is selected among all the functions. According
to that minimum time, the work on the machine is scheduled.
17. Elaborate the Scaling in Cloud Computing. [CR]
Cloud scalability in cloud computing refers to increasing or decreasing IT resources as needed to
meet changing demand. Scalability is one of the hallmarks of the cloud and the primary driver of
its explosive popularity with businesses.
Data storage capacity, processing power, and networking can all be increased by using existing
cloud computing infrastructure. Scaling can be done quickly and easily, usually without any
disruption or downtime.
Types of scaling
o Vertical Scalability (Scaled-up)
o horizontal scalability
o diagonal scalability
Vertical Scaling
To understand vertical scaling, imagine a 20-story hotel. There are innumerable rooms inside this
hotel from where the guests keep coming and going. Often there are spaces available, as not all
rooms are filled at once.
Horizontal Scaling
Horizontal scaling refers to adding more servers to your network, rather than simply adding
resources like with vertical scaling. This method tends to take more time and is more complex, but
it allows you to connect servers together, handle traffic efficiently and execute concurrent
workloads.
Diagonal Scaling
It is a mixture of both Horizontal and Vertical scalability where the resources are added both
vertically and horizontally. Well, you get diagonal scaling, which allows you to experience the
most efficient infrastructure scaling.
Why is cloud scalable?
Scalable cloud architecture is made possible through virtualization. Unlike physical machines
whose resources and performance are relatively set, virtual machines virtual machines (VMs) are
highly flexible and can be easily scaled up or down. They can be moved to a different server or
hosted on multiple servers at once; workloads and applications can be shifted to larger VMs as
needed.
Third-party cloud providers also have all the vast hardware and software resources already in place
to allow for rapid scaling that an individual business could not achieve cost-effectively on its own.
Benefits of cloud scalability
o Convenience
o Flexibility and speed
o Cost Savings
o Disaster recovery
When to Use Cloud Scalability?
Successful businesses use scalable business models to grow rapidly and meet changing demands.
It's no different with their IT. Cloud scalability benefits help businesses stay agile and competitive.
18. Illustrate the Cloud Monitoring. [UN]
Cloud monitoring comprises a series of strategies and practices for analyzing, tracking, and
managing cloud-based services and applications. Cloud monitoring provides an efficient way to
achieve this visibility while providing an enterprise with actionable insights to improve availability
and user experiences.
Website monitoring. Every business needs to ensure its websites and web services remain highly
accessible, optimized, and secure. Cloud monitoring tools help to identify minor and large-scale
hardware failures and security gaps so that developers and administrators can take corrective action
before problems affect user experiences.
Database monitoring. Ensuring that the performance and health of your database management
system remain intact is a vital part of business continuity. Cloud monitoring solutions actively
monitor cloud database resources, tracking processes, queries, and availability of services to ensure
the accuracy and reliability of database management systems, regardless of the number of
instances and how they are deployed.
19. Summarize the Amazon EC2. [UN]
Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides secure, resizable
compute capacity in the cloud. It is designed to make web-scale computing easier for developers.
Instance types
Amazon EC2 passes on to you the financial benefits of Amazon scale. You pay a very low rate for
the compute capacity you actually consume. Refer to Amazon EC2 Instance Purchasing
Options for a more detailed description.
 On-Demand Instances
o Users that prefer the low cost and flexibility of Amazon EC2 without any up-front payment or
long-term commitment
o Applications with short-term, spiky, or unpredictable workloads that cannot be interrupted
o Applications being developed or tested on Amazon EC2 for the first time
o Spot Instances
o Applications that have flexible start and end times
o Applications that are only feasible at very low compute prices
o Users with urgent computing needs for large amounts of additional capacity
 Reserved Instances — Reserved Instances provide you with a significant discount (up to 72%)
compared to On-Demand Instance pricing. You have the flexibility to change families, operating
system types, and tenancies while benefitting from Reserved Instance pricing when you use
Convertible Reserved Instances.
 C7g Instances — C7g Instances, powered by the latest generation AWS Graviton3 processors,
provide the best price performance in Amazon EC2 for compute-intensive workloads.
 Inf2 Instances — Inf2 Instances are purpose--built for deep learning inference. They deliver high
performance at the lowest cost in Amazon EC2 for generative AI models, including large language
models (LLMs) and vision transformers.
 M7g Instances — M7g instances, powered by the latest generation AWS Graviton3 processors,
provide the best price performance in Amazon EC2 for general purpose workloads. M7g instances
are ideal for applications built on open-source software such as application servers, microservices,
gaming servers, mid-size data stores, and caching fleets.
 R7g Instances — R7g Instances, powered by the latest generation AWS Graviton3 processors,
provide the best price performance in Amazon EC2 for memory-intensive workloads.
 Tm1 Instances — Trn1 Instances, powered by AWS Trainium accelerators, are purpose-built for
high-performance deep learning training of generative AI models, including LLMs and latent
diffusion models.

You might also like