Cs3551 - Distributed Computing - Part B & C
Cs3551 - Distributed Computing - Part B & C
PART – B & C
1. Define Distributed Systems. What are the significant issues and challenges of the distributed
systems? (Apr/May 2018) [UN]
Designing a distributed system does not come as easy and straight forward. A number of challenges
need to be overcome in order to get the ideal system.
1. Heterogeneity
2. Transparency
3. Openness
4. Concurrency
5. Security
6. Scalability
7. Failure Handling
2. Explain how a parallel system differs from a distributed system. [UN]
Parallel and distributed computing are important technologies that have key differences in their primary
function. Read on to learn more about them and how they differ.
Parallel computing, also known as parallel processing speeds up a computational task by dividing it
into smaller jobs across multiple processors inside one computer. Distributed computing, on the other
hand, uses a distributed system, such as the internet, to increase the available computing power and
enable larger, more complex tasks to be executed across multiple machines.
In general, a process can send a communication in one of two ways: blocking or non blocking. After
sending a blocking communication, the process goes in to the waiting state until it receives are sponge.
Non blocking communication allows the process to continue execution after sending the communication.
Both types of communication are useful.
6. Illustrate the difference between Message passing and shared memory process Communication
model. (Nov /Dec 2022) [UN]
What is Shared Memory?
The fundamental model of inter-process communication is the shared memory system. In a shared
memory system, the collaborating communicates with each other by establishing the shared memory
region in the address space region.
If the process wishes to initiate communication and has data to share, create a shared memory region in
its address space. After that, if another process wishes to communicate and tries to read the shared data,
it must attach to the starting process's shared address space.
What is Message Passing?
In this message passing process model, the processes communicate with others by exchanging messages.
A communication link between the processes is required for this purpose, and it must provide at least
two operations: transmit (message) and receive (message). Message sizes might be flexible or fixed.
The main differences between the Shared Memory and the Message Passing are as follows:
It offers a maximum speed of computation because It takes a huge time because it is performed via
communication is completed via the shared the kernel (system calls).
memory, so the system calls are only required to
establish the shared memory.
The code for reading and writing the data from the No such code is required in this case because
shared memory should be written explicitly by the the message passing feature offers a method for
developer. communication and synchronization of
activities executed by the communicating
processes.
It is used to communicate between the single It is most commonly utilized in a distributed
processor and multiprocessor systems in which the setting when communicating processes are
processes to be communicated are on the same spread over multiple devices linked by a
machine and share the same address space. network.
Make sure that processes in shared memory aren't It is useful for sharing little quantities of
writing to the same address simultaneously. data without causing disputes.
7. Explain the need of Distributed systems its characteristics with example. [UN]
Fault tolerance
Consistency
Security
Reliability
Concurrent transactions
8. Elaborate the examples of distributed systems. [CR]
To place distributed systems in a realistic context through examples: the Internet, an intranet and mobile
computing.
1. The Internet:
A vast interconnected collection of computer networks of many different [Link] message by
employing a common means of communication (Internet Protocol). The web is not equal to the Internet.
2. Intranet: An intranet is a private network that is contained within an enterprise.
It may consist of many interlinked local area networks and also use leased lines in the Wide Area
Network. It separately administrated and enforces local security policies. It is connected to the Internet
via a router. It uses firewall to protect an Intranet by preventing unauthorized messages leaving or
entering. Some are isolated from the Internet. Users in an intranet share data by means of file services.
Two send events s and s’ are related by causality ordering (not physical time ordering), then a causally
ordered execution requires that their corresponding receive events r and r’ occur in the same order at all
common destinations.
If s and s’ are not related by causality, then CO is vacuously (blankly)satisfied.
Causal order is used in applications that update shared data, distributed shared memory, or fair
resource allocation.
The delayed message m is then given to the application for processing. The event of an application
processing an arrived message is referred to as a delivery event.
12. Discuss the Model of Distributed Computations. [CR]
A Distributed Program is composed of a set of n asynchronous processes, p1, p2, ..., pi , ..., pn. The
processes do not share a global memory and communicate solely by passing messages. The processes
do not share a global clock that is instantaneously accessible to these processes. Process execution and
message transfer are asynchronous. Without loss of generality, we assume that each process is running
on a different processor. Let Cij denote the channel from process pi to process pj and let mij denote a
message sent by pi to pj . The message transmission delay is finite and unpredictable.
A Model of Distributed Executions The execution of a process consists of a sequential execution of its
actions. The actions are atomic and the actions of a process are modeled as three types of events,
namely, internal events, message send events, and message receive events.
A send event changes the state of the process that sends the message and the state of the channel on
which the message is sent. A receive event changes the state of the process that receives the message
and the state of the channel on which the message is received.
The execution of process pi produces a sequence of events e 1 i , e 2 i , ..., e x i , e x+1 i , ... and is
denoted by Hi where Hi = (hi , →i) hi is the set of events produced by pi and binary relation →i defines
a linear order on these events. Relation →i expresses causal dependencies among the events of pi .
A Model of Distributed Executions The evolution of a distributed execution is depicted by a space-time
diagram. A horizontal line represents the progress of the process; a dot indicates an event; a slant arrow
indicates a message transfer. Since we assume that an event execution is atomic (hence, indivisible and
instantaneous), it is justified to denote it as a dot on a process line. In the Figure 2.1, for process p1, the
second event is a message send event, the third event is an internal event, and the
fourth event is a message receive event.
13. Build the Models of Communication Networks. [CR]
There are several models of the service provided by communication networks, namely, FIFO, Non-
FIFO, and causal ordering. In the FIFO model, each channel acts as a first-in first-out message queue
and thus, message ordering is preserved by a channel. In the non-FIFO model, a channel acts like a set
in which the sender process adds messages and the receiver process removes messages from it in a
random order
The “causal ordering” model is based on Lamport’s “happens before” relation. A system that supports
the causal ordering model satisfies the following property: CO: For any two messages mij and mkj , if
send(mij) −→ send(mkj), then rec(mij) −→ rec(mkj). This property ensures that causally related
messages destined to the same destination are delivered in an order that is consistent with their
causality relation.
Causally ordered delivery of messages implies FIFO message delivery. (Note that CO ⊂ FIFO ⊂ Non-
FIFO.) Causal ordering model considerably simplifies the design of distributed algorithms because it
provides a built-in synchronization.
The three main types of communication models in distributed systems are:
FIFO (first-in, first-out): each channel acts as a FIFO message queue.
Non-FIFO (N-FIFO): a channel acts like a set in which a sender process adds messages and receiver
removes messages in random order.
Causal Ordering (CO): It follows Lamport’s law. N- FIFO FIFO o The relation between the
three models is given by CO A system that supports the causal ordering model satisfies the following
property:
GLOBAL STATE
Distributed Snapshot represents a state in which the distributed system might have been in. A snapshot
of the system is a single configuration of the system. The global state of a distributed system is a
collection of the local states of its components, namely, the processes and the communication channels.
The state of a process at any time is defined by the contents of processor registers, stacks, local
memory, etc. and depends on the local context of the distributed application. The state of a channel is
given by the set of messages in transit in the channel.
The state of a channel is difficult to state formally because a channel is a distributed entity and its state
depends upon the states of the processes it connects.
A distributed snapshot should reflect a consistent state. A global state is consistent if it could have been
observed by an external observer. For a successful Global State, all states must be consistent: If we
have recorded that a process P has received a message from a process Q, then we should have also
recorded that process Q had actually send that message.
Otherwise, a snapshot will contain the recording of messages that have been received but never sent.
The reverse condition (Q has sent a message that P has not received) is allowed. The notion of a global
state can be graphically represented by what is called a cut. A cut represents the last event that has been
recorded for each process. The history of each process if given by: Each event either is an internal
action of the process. We denote by si k the state of process pi immediately before the kth event occurs.
The state si in the global state S corresponding to the cut C is that of pi immediately after the last event
processed by pi in the cut – eici. The set of events eici is called the frontier of the cut.
Fig : Types of cuts Consistent states: The states should not violate causality. Such states are called
consistent global states and are meaningful global states. Inconsistent global states: They are not
meaningful in the sense that a distributed system can never be in an inconsistent state.
In Figure 2.2: A global state GS1 = {LS1 1 , LS3 2 , LS3 3 , LS2 4 } is inconsistent because the state of
p2 has recorded the receipt of message m12, however, the state of p1 has not recorded its send. A
global state GS2 consisting of local states {LS2 1 , LS4 2 , LS4 3 , LS2 4 } is consistent; all the
channels are empty except C21 that contains message m21.
15. Why global states are essential in distributed computing systems? Elaborate with an
example.
[CR]
Thus, the ability to construct a global state and evaluate a predicate over such a state constitutes the
core of solutions to many problems in distributed computing. The global state of a distributed system
is the union of the states of the individual processes.
The global state of a distributed system is a collection of the local states of the processes and the
channels.
● A global state computed along a consistent cut is correct.
● The global state of a consistent cut comprises the local. state of each process at the time the cut
event happens.
Any process in the distributed system can initiate this global state recording algorithm using a special
message called MARKER. This marker traverse the distributed system across all communication
channel and cause each process to record its own state. In the end, the state of entire system (Global
state) is recorded.
Distributed applications/services execute concurrently on multiple machines.
• A Snapshot of the distributed application, i.e. a global picture is useful ○ Checkpointing: can restart
distributed application on failure ○ Garbage collection of objects: objects at servers that don’t have
any other objects (at any servers) with pointers to them ○ Deadlock detection: Useful in database
transaction systems ○ Termination of computation: Useful in batch computing systems like
Folding@Home, SETI@HomeGlobal Snapshot = Global State Individual state of each process in the
distributed system + Individual state of each communication channel in the distributed system.
• Capture the instantaneous state of each process
• Capture the instantaneous state of each communication channel, i.e., messages in transit on the
channels.
The notions of global time and global state are closely related.
● But, merely synchronizing clocks and taking local snapshots is not enough
● Need to account for messages in transit
● A process can (without freezing the whole computation) compute the best possible approximation
of a global state [Chandy & Lamport 85]
● A global state that could have occurred
● No process in the system can decide whether the state did really occur
● Guarantee stable properties (i.e. once they become true, they remain true)
UNIT – II LOGICAL TIME AND GLOBAL STATE
PART – B & C
1. Explain Physical clock synchronization algorithm. [UN]
Every computer contains a clock which is an electronic device that counts the oscillations in a
crystal at a particular frequency. Synchronization of these physical clocks to some known high
degree of accuracy is needed. This helps to measure the time relative to each local clock to
determine order between events.
i. Christian’s Algorithm
o Distributed systems offer us the ability to solve problems we couldn‘t possibly solve with a
single machine. However, I have recently become painfully aware that every solution to a
problem brings with it a series of new problems and considerations to be made.
o Within the realm of distributed systems, one of the largest problems we gain over a single
machine system is attempting to keep a correct timeline of events. This is needed for a
variety of reasons — one of the most crucial is understanding the ordering and causality of
events within your system. This is what allows analysis to be performed when evaluating
why your system executed a specific action.
o For instance, if you want to know why a particular value was written to your database, you
may search for the series of events leading up to the point of the INSERT request being
created. You can then place these events in chronological order using their timestamp,
traversing from latest → oldest in order to understand the causality of actions leading up to
the final value being written [Figure 1].
9. Discuss the purpose of message ordering paradigms and provide example for
asynchronous execution communication in detail. Nov / Dec 2021) [CR]
o For any two events a and b, where each can be either a send or a receive event, the
notation.
o a ~ b denotes that a and b occur at the same process, i.e., a ∈Ei and b ∈Ei for some process
i. The send and receive event pair for a message called pair of corresponding events.
o For a given execution E, let the set of all send–receive event pairs be denoted as = {(s,r)
∈Ei × Ej | s corresponds to r}. 2.1 Message ordering paradigms
o Several orderings on messages have been defined: (i) non-FIFO, (ii) FIFO, (iii) causal
order, and (iv) synchronous order. Asynchronous executions Definition 6.1 (A-execution):
An asynchronous execution (or A-execution) is an execution (E,≺) for which the causality
relation is a partial order.
o On a logical link between two nodes (is formed as multiple paths may exist) in the system,
if the messages are delivered in any order then it is known as non-FIFO executions.
Example: IPv4.
10. Explain global states and consistent cuts with example Time and Global States. [UN]
There are two formal models of distributed systems: synchronous and asynchronous.
Synchronous distributed systems have the following characteristics:
Generally, timing is a challenging an important issue in building distributed systems. Consider a
couple of examples:
Suppose we want to build a distributed system to track the battery usage of a bunch of
laptop computers and we'd like to record the percentage of the battery each has remaining at
exactly 2pm.
Suppose we want to build a distributed, real time auction and we want to know which of two
bidders submitted their bid first.
Suppose we want to debug a distributed system and we want to know whether variable x1in
process p1 ever differs by more than 50 from variable x2 in process 2.
Clock Synchronization
Every computer has a physical clock that counts oscillations of a crystal. This hardware clock is
used by the computer's software clock to track the current time. However, the hardware clock is
subject to drift -- the clock's frequency varies and the time becomes inaccurate. As a result, any two
clocks are likely to be slightly different at any given time. The difference between two clocks is
called their skew.
There are several methods for synchronizing physical clocks.
1. External synchronization means that all computers in the system are synchronized with an
external source of time (e.g., a UTC signal).
2. Internal synchronization means that all computers in the system are synchronized with one
another, but the time is not necessarily accurate with respect to UTC.
The second process sets its clock to t + (max+min)/2wheremax and min are the upper and lower
bounds for the message transmission time respectively. This guarantees that the skew is at
most(max-min)/2.
Instead, a process p1 requests the current time from another process p2and measures the RTT
(Tround) of the request/reply. When p1 receives the time t from p2 it sets its time to t+T round/2.
The Berkeley algorithm, developed for collections of computers running Berkeley UNIX, is an
internal synchronization mechanism t hat works by electing a master to coordinate the
synchronization. The master polls the other computers (called slaves) for their times, computes an
average, and tells each computer by how much it should adjust its clock.
Global States
It is often desirable to determine whether a particular property is true of a distributed system as it
executes.
In general, this problem is referred to as Global Predicate Evaluation. "A global state predicate is a
function that maps from the set of global state of processes in the system ρ to {True,False}."
Cuts
Because physical time cannot be perfectly synchronized in a distributed system it is not possible to
gather the global state of the system at a particular time. Cuts provide the ability to" assemble a
meaningful global state from local states recorded at different times".
Definitions:
Ρ is a system of N processes pi (i=1,2, ...,N)
History (pi)= hi= <e i0,e i1,...>
Hi k=< ei0,e i1,...,e ik>-a finite prefix of the process's history
Si k is the state of the process pi immediately before the k th event occurs
All processes record sending and receiving of messages. If a process pi records the sending
of message m to process pj and pj has not recorded receipt of the message, then m is part of
the state of the channel between pi and pj.
A cut is consistent if, for all events e and e':
o(e∈Cande'→e)⇒e '∈C
Distributed Debugging
To further examine how you might produce consistent cuts, we'll use the distributed
debuggingexample. Recall that we have several processes, each with a variable x i. "The safety
condition required in this example is |xi-xj|<=δ(i,j=1,2,...,N)."
The algorithm we'll discuss is a centralized algorithm that determines post hoc whether thesafety
condition was ever violated. The processes in the system, p1, p2, ..., pN, send their states to a
passive monitoring process, p0.p0 is not part of the system. Based on the states collected, p0 can
evaluate the safety condition.
11. Explain an algorithm using multicast and logical clocks for mutual exclusion. [UN]
Mutual exclusion is a concurrency control property which is introduced to prevent race conditions.
It is the requirement that a process cannot enter its critical section while another con current process
is currently present or executing in its critical section i.e only one process is allowed to execute the
critical section at any given instance of time.
In Distributed systems, we neither have shared memory nor a common physical clock and there for
we cannot solve mutual exclusion problem using shared variables.
A site in Distributed System do not have complete information of state of the system due to lack of
shared memory and a common physical clock.
Requirements of Mutual exclusion Algorithm:
No Deadlock:
Two or more site should not endlessly wait for any message that will never arrive.
No Starvation:
Every site who wants to execute critical section should get an opportunity to execute it in
finite time. Any site should not wait indefinitely to execute critical section while other site
are repeatedly executing critical section
Fairness:
Each site should get a fair chance to execute critical section. Any request to execute critical
section must be executed in the order they are made i.e Critical
sectionexecutionrequestsshouldbeexecutedintheorderoftheirarrivalin the system.
Fault Tolerance:
In case of failure, it should be able to recognize it by itself in order to continue functioning
without any disruption.
1. Token Based Algorithm:
o A unique token is shared among all the sites.
o If a site possesses the unique token, it is allowed to enter its critical section
o This approach uses sequence number to order requests for the critical section.
o Each requests for critical section contains a sequence number. This sequence number
is used to distinguish hold and current requests.
o This approach in Mutual exclusion as the token is unique
o Example:
o Suzuki-Kasami‘s Broadcast Algorithm
2. Non-token based approach:
o A site communicates with other sites in order to determine which sites should
execute critical section next. This requires exchange of two or more successive
round of messages among sites.
o This approach use timestamps instead of sequence number to order requests for the
critical section.
o Whenever a site make request for critical section, it gets a time stamp. Time stamp is
also used to resolve any conflict between critical section requests.
o All algorithm which follows non-token based approach maintains a logical clock.
Logical clocks get updated according to Lamport‘s scheme
o Example:
o Lamport's algorithm, Ricart – Agrawala algorithm
3. Quorum based approach:
o Instead of requesting permission to execute the critical section from all other sites,
Each site requests only a subset of sites which is called a quorum.
o Any two subsets of sites or Quorum contain a common site. This common site is
responsible to ensure mutual exclusion.
12. Write short notes on locks with suitable example. [UN]
Lock-Based Protocol
In this type of protocol, any transaction cannot read or write data until it acquires an appropriate
lock on it. There are two types of lock:
1. Shared lock:
[Link],thedataitemcanonlyreadbythetransaction.
Itcanbesharedbetweenthetransactionsbecausewhenthetransactionholdsalock,thenitcan'tupdat
ethe data on the data item.
2. Exclusive lock:
In the exclusive lock, the data item can be both reads as well as written by the transaction.
This lock is exclusive, and in this lock, multiple transactions do not modify the same
Data simultaneously.
There are four types of lock protocols available:
1. Simplistic clock protocol
It is the simplest way of locking the data while transaction. Simplistic lock-based protocols allow all
the transactions to get the lock on the data before insert or delete or update on it. It will unlock the
data item after completing the transaction.
2. Pre - Claiming Lock Protocol
Pre- Claiming Lock Protocols evaluate the transaction to list all the data items on which
they need locks.
Before initiating an execution of the transaction, it requests DBMS for all the lock on all
those data items.
If all the locks are granted then this protocol allows the transaction to begin. When the
transaction is completed then it releases all the lock.
If all the locks are not granted then this protocol allows the transaction to rolls back and
waits until all the locks are granted.
3. Two-phase locking(2PL)
The two-phase lockingprotocoldividestheexecutionphaseofthetransactionintothree parts.
In the first part, when the execution of the transaction starts, it seeks permission for the lock
it requires.
There are two phases of 2PL:
Growing phase: In the growing phase, a new lock on the data item may be acquired by the
transaction, but none can be released.
Shrinking phase: In the shrinking phase, existing lock held by the transaction may be released, but
no new locks can be acquired.
In the below example, if lock conversion is allowed then the following phase can happen:
13. Explain in detail about asynchronous execution with synchronous communication. [UN]
Synchronous communication: The calling party requests a service, and waits for the service
to complete. Only when it receives the result of the service it continues with its work.
Asynchronous communication: The calling party initiates a service call, but does not wait
for the result. The caller immediately continues with its work without caring for the result.
A non-blocking GUI usually has nothing to do with the low-level communication contracts
and can be achieved by different means, e.g. parallel processing of the inter active and
communication tasks. The truth is that synchronous communication on a certain level of
abstraction can be implemented with asynchronous interfaces on another level of abstraction
and vice versa, if needed.
File based communication is often considered to be asynchronous. One party writes a file but does
not care if the other party is active, fetches the file or is able to process it. However it is possible to
implement another layer of functionality so that the second (reading) party gives feedback, e.g. by
writing a short result file, so that the first (writing) party can wait and poll for the result of the file
processing.
Communication over a database often is implemented by one party writing execution orders into a
special table and the other party reads this table periodically and
processesnewentries,markingthemas―done‖or―failed‖[Link]
communication pattern.
Synchronous services are easy to implement, since they keep the complexity of the communication
low by providing immediate feedback. They avoid the need to keep the context of a call on the
client and server side, including e.g. the caller‘s address or a message id, beyond the life time of the
request
Connections with a lack of stability (e.g. a dial-in network connection is not available all the
time).
The caller is not interested in the result of the call or cannot wait for the result for some
reason, [Link] must free its resources.
The simplest asynchronous message exchange pattern is called fire-and-forget and means that a
message is sent but no feedback is required (at least on that level of abstraction!).
14. How Casual order and Total Order is implemented in Synchronization. [UN]
Causal ordering
Causal ordering is a vital tool for thinking about distributed systems.
Messages sent between machines may arrive zero or more times at any point after they are sent
This is the sole reason that building distributed systems is hard.
For example, because of this property it is impossible for two computers communicating
Over a network to agree on the exact time. You can send me a message saying" it is now know how
long it took for that message to arrive. We can send messages back and forth all day but we will
never know for sure that we are synchronized.
If we can't agree on the time then we can't always agree on what order things happen in. Suppose I
say "my user logged on at10:00:00" and you say "my user logged on at10:00:01". Maybe mine was
first or maybe my clock is just fast relative to yours.
This concept is called causal ordering and is written like this:
A->B(event A is causally ordered before event B)
Let's define it a little more formally. We model the world as follows: We have a number of
machines on which we observe a series of events. These events are either specific to one machine
(eg user input) or are communications between machines.
If A and B happen on the same machine and A happens before B then A -> B If I send
you some message M and you receive it then(send M)->(recv M)
If A->B and B->C then A->C
In contrast, causal ordering is only a partial order – some times events happen with no possible
causal relationship i.e. not(A-> B or B->A).
If not(A->B)then A cannot possibly have caused B
The lack of a total global order is not just an accidental property of computer systems, it is a
fundamental property of the laws of physics. I claimed that understanding causal order makes many
other concepts much simpler. Let's skim over some examples.
Vector Clocks
Lamport‘s clocks and Vector clocks are data-structures which efficiently approximate the causal
ordering and so can be used by programs to reason about causality.
If A->B then LC_A<LC_B
If VC_A<VC_B then A->B
Different types of vector clock trade-off compression Vs accuracy by storing smaller or larger
portions of the causal history of an event.
Consistency
A distributed system is consistent exactly when the outside world can never observe two different
serialisations.
CAP Theorem
The CAP (Consistency-Availability-Partition) theorem also boils down to causality. When a
machine in a distributed system is asked to perform an action that depends on its current state it
must decide that state by choosing a serialisation of the events it has seen. It has two options:
Choose a serialization of its current events immediately
Wait until it is sure it has seen all concurrent events before choosing a serialisation
Ordering requires waiting
Even your hardware cannot escape this law. It provides the illusion of synchronous access to
memory at the cost of availability. If you want to write fast parallel programs then you need to
understand the messaging model used by the underlying hardware.
Eventual Consistency
A system is eventually consistent if the final state of each machine is the same regardless of how we
choose to serialize update events. An eventually consistent system allows us to sacrifice consistency
for availability without having the state of different machines diverge irreparably.
15. What is group communication? What are the Key areas of applications of group
communication? Explain the programming model for group communication.(Apr /May 2018)
[UN]
Group Communication.
A group is an operating system abstraction for a collective of related processes. A set of cooperative
processes may, for example, form a group to provide an extendable, efficient, available and reliable
service.
Multicast algorithms can be built on top of lower-level communication primitives such as point-to-
point sends and receives or perhaps by availing of specific network mechanisms designed for this
purpose.
The recipients would have to buffer messages until safe for delivery in case they were called on for
this role.
Ordered Reliable Multicasts
A FIFO ordered protocol guarantees that messages by the same sender are delivered in the order
that they were sent.
Every process p executes the following:-
multicast(R, m):
Tag m with sender (m) and seq
#(m)send(m) to all group including p
The receive(R, m) occurs as follows: up on
arrival(m)do
If p has not previously executed receive(R, m)then if
sender(m)<>p then
send(m)to all group
receive(R, m)
It is easy to use Reliable Multicast to build a FIFO Multicast algorithm. To F-multicast a message
m, a process q simply R-multicasts m. The difference is at the receiver which orders the delivery.
Multicast (F, m):
multicast (R, m)
If m is the it h message from q, then m is tagged sender (m)=q and seq#(m)=i.
For each q, every process p maintains a counter next[q] that indicates the sequence number of the
next F-multicast from q that p is willing to F-deliver.
16. Discuss in detail about Snapshot algorithms for FIFO channels. (Apr / May2021).
[CR]
Each distributed application has number of processes running on different physical servers. These
processes communicate with each other through messaging channels. A snapshot captures the local
states of each process along with the state of each communication channel.
Snapshots are required to:
Check pointing
Collecting garbage
Detecting deadlocks
Debugging
Chandy – Lamport algorithm
The algorithm will record a global snapshot for each process channel.
The Chandy-Lamport algorithm uses a control message, called a marker.
After a site has recorded its snapshot, it sends a marker along all of its outgoing channels
before sending out any more messages.
Since channels are FIFO, a marker separates the messages in the channel into those to be
included in the snapshot from those not to be recorded in the snapshot.
This addresses issue I1. The role of markers in a FIFO system is to act as delimiters for
the messages in the channels so that the channel state recorded by the process at the
receiving end of the channel satisfies the condition C2.
Initiating a snapshot
Process Pi initiates the snapshot
Pi records its own state and prepares a special marker message.
Send the marker message to all other processes.
Start recording all incoming messages from channels Cij for j not equal to i.
UNIT – III DISTRIBUTED MUTEX AND DEADLOCK
PART – B & C
1. Discuss in detail the requirements that mutual exclusion algorithms should satisfy and
discuss what metric we use to measure the performance of mutual exclusion algorithms.
(Apr / May 2021) [CR]
Requirements of Mutual exclusion Algorithm:
No Deadlock: Two or more site should not endlessly wait for any message that will never
arrive.
No Starvation: Every site who wants to execute critical section should get an opportunity
to execute it in finite time.
Mutual exclusion algorithm ensures that if a process is already performing write operation
on a data object no other process/thread is allowed to access the same object until the first
process has finished writing upon the data object and released the object for other
processes to read and write upon.
Performance Metrics
The performance is generally measured by the following four metrics: Message complexity: The
number of messages required per CS execution by a site. Synchronization delay: After a site
leaves the CS, it is the time required and before the next site enters the CS (see Figure 1).
Response time:
The time interval a request waits for its CS execution to be over after its request messages have
been sent out (see Figure 2).
System throughput: The rate at which the system executes requests for the CS. system
throughput=1/(SD+E) where SD is the synchronization delay and E is the average critical section
execution time.
2. Explain Maekawa’s algorithm for mutual exclusion in distributed system and its
drawback. (Apr /May 2022) [UN]
A distributed mutual exclusion algorithm, such as Maekawa's algorithm, makes sure that
concurrent processes in a distributed system exclude one another. The voting-based algorithm was
proposed in 1985 by Mamoru Maekawa and Toshimitsu Masuzawa.
The algorithm divides the total number of processes in a distributed system into quorums or pairs
of mutually intersecting subsets. Each process has a specific quorum allocated to it, and it can only
allow access to the shared resource if it has gotten consent from every other process in its quorum
as well as from every other process it has already received consent from.
A procedure can only grant permission if it is not already in its crucial portion in order to ensure
mutual exclusion. Additionally, a process can only ask another process for permission if that other
process isn't right now in its vital portion.
The technique ensures that a process will finally be able to access the shared resource since it
ensures that at least one process from each quorum will eventually grant permission.
The mutual exclusion method developed by Maekawa is scalable, decentralized, and appropriate
for massively distributed systems. The algorithm's performance, however, may be impacted by the
need for several messages to be sent back and forth between processes.
The voting ring structure in Maekawa's algorithm
One of the most important elements of Maekawa's algorithm is the voting ring structure. This
program creates a logical voting ring out of the processes in a distributed system. According to
their allocated unique identities, each process is placed in a circular arrangement within the voting
ring.
Which processes are permitted to grant authorization to use the shared resource is determined by
the voting ring structure. The quorum for each procedure is determined by where it is in the voting
ring. The process itself and a smaller subset of the other processes in the voting ring make up the
quorum.
Each quorum's size is determined with care to ensure that it intersects with the others. Because it
ensures that each process can request approval from at least one process in each quorum, the
intersection property is significant.
A process must first wait for approval from all processes in its quorum before deciding which
processes can provide permission. It can only issue permits to other processes in its quorum and to
all processes that have asked for it once it has obtained consent from all processes in its quorum.
Drawbacks of Maekawa's Algorithm:
This algorithm is deadlock prone because a site is exclusively locked by other sites and requests
are not prioritized by their timestamp.
3. Outline Lamport’s algorithm with an example. (Nov / Dec 2021) [UN]
Lamport's Algorithm Lamport was the first to give a distributed mutual exclusion algorithm as
an illustration of his clock synchronization scheme. Let Ri be the request set of site Si , i.e. the
set of sites from which Si needs permission when it wants to enter CS.
In Lamport's algorithm, ∀i : 1 ≤ i ≤ N :: Ri= {S1, S2,...,SN}. Every site Si keeps a queue,
request_queuei, which contains mutual exclusion requests ordered by their timestamps. This
algorithm requires messages to be delivered in the FIFO order between every pair of sites. The
Algorithm Requesting the critical section.
1. When a site Si wants to enter the CS, it sends REQUEST(tsi, i) message to all the sites in its
request set Ri and places the request on request_queuei (tsi is the timestamp of the request).
2. When a site Sj receives the REQUEST(t si, i) message from site Si, it returns a time stamped
REPLY message to Si and places site S' is request on request_queue j.
Executing the critical section. 1. Site Si enters the CS when the two following conditions hold: a)
[L1:] Si has received a message with timestamp larger than (tsi, i) from all other sites.
b) [L2:] S'is request is at the top request_queuei. Releasing the critical section.
1. Site Si, upon exiting the CS, removes its request from the top of its request queue and sends a
time stamped RELEASE message to all the sites in its request set.
2. When a site Sj receives a RELEASE message from site Si, it removes S'is request from its
request queue. When a site removes a request from its request queue, its own request may come
at the top of the queue, enabling it to enter CS. The algorithm executes CS requests in the
increasing order of timestamps.
4. Explain the ‘Snapshot’ algorithm of Lamport’s. [UN]
Chandy–Lamport’s global state recording algorithm
Each distributed system has a number of processes running on a number of different physical
servers. These processes communicate with each other via communication channels using text
messaging. These processes neither have a shared memory nor a common physical clock; this
makes the process of determining the instantaneous global state difficult.
A process could record it own local state at a given time but the messages that are in transit(on its
way to be delivered) would not be included in the recorded state and hence the actual state of the
system would be incorrect after the time in transit message is delivered.
The main idea behind proposed algorithm is that if we know that all message that hat have been sent
by one process have been received by another then we can record the global state of the system.
Any process in the distributed system can initiate this global state recording algorithm using a
special message called MARKER. This marker traverses the distributed system across all
communication channels and causes each process to record its own state. In the end, the state of
entire system (Global state) is recorded. This algorithm does not interfere with normal execution of
processes.
Assumptions of the algorithm:
There is finite number of processes in the distributed system and they do not share memory
and clocks.
There is finite number of communication channels and they are unidirectional and FIFO
ordered.
There exists a communication path between any two processes in the system
On a channel, messages are received in the same order as they are sent.
Algorithm:
Marker sending rule for a process P:
o Process p records its own local state
o For each outgoing channel C from process P, P sends marker along C before
sending any other messages along
C.(Note:ProcessQwillreceivethismarkeronhisincomingchannelC1.)
Marker receiving rule for a process Q:
o If process Q has not yet recorded its own local state then
Record the state of incoming channel C1 as an empty sequence or null.
After recording the state of incoming channel C1, process Q Follows the
marker sending rule
o If process Q has already recorded its state
Record the state of incoming channel C1 as the sequence of messages
received along channel C1 after the state of Q was recorded and
beforeQreceivedthemarkeralongC1fromprocess P.
Need of taking snapshot or recording global state of the system:
Check pointing: It helps in creating check point. If somehow application fails, this check
point can be used
Garbage collection: It can be used to remove objects that do not have any references.
It can be used in deadlock and termination detection.
It is also helpful in other debugging.
5. External synchronization ensures internal synchronization. But the vice versa
does not stand true, Justify. Explain Lamport’s algorithm in brief. (Apr / May 2021).
[UN]
External synchronization means that all computers in the system are synchronized with an
external source of time (e.g., a UTC signal).
Internal synchronization means that all computers in the system are synchronized with one
another, but the time is not necessarily accurate with respect to UTC.
In a system that provides guaranteed bounds on message transmission time, synchronization is
straightforward since upper and lower bounds on the transmission time for a message are known.
One process sends a message to another process indicating its current time, t. The second process
sets its clock to t + (max+min)/2 where max and min are the upper and lower bounds for the
message transmission time respectively. This guarantees that the skew is at most (max-min)/2.
Cristian's method for synchronization in systems that do not provide such guarantees is similar, but
does not rely on a predetermined max and min transmission time. Instead, a process p1 requests the
current time from another process p2 and measures the RTT (Tround) of the request/reply.
Whenp1 receives the time t from p2 it sets its time to t + Tround/2.
The Berkeley algorithm, developed for collections of computers running Berkeley UNIX, is an
internal synchronization mechanism that works by electing a master to coordinate the
synchronization. The master polls the other computers (called slaves) for their times, computes an
average, and tells each computer by how much it should adjust its clock.
The Network Time Protocol (NTP) is yet another method for synchronizing clocks that uses a
hierarchical architecture where he top level of the hierarchy (stratum 1) are servers connected to a
UTC time source such as a GPS unit. The TPSN protocol is very similar to NTP.
Synchronization in WSN
Some classifications of synchronization protocols:
Physical time versus logical time
External versus internal synchronization
A priori versus posteriori synchronization - post-facto synchronization in WSN has been
proposed in several papers. Post-facto synchronization happens on demand, after the event
occurs. In this way, synchronization -- which requires resources such as energy -- only
happens when necessary.
Performance metrics:
Precision - synchronization error
Energy cost
Memory requirements
Fault tolerance
Scalability
Decomposition of packet delay:
Send time - Time for the packet to pass from the application layer to the MAC layer --
highly variable based on delays introduced by OS.
Access time - Time the MAC layer must wait for the channel to become free -- highly
variable based on make protocol and one of the most critical factors contributing to delay.
Transmission time - Time to transmit entire packet -- mainly deterministic and can be
estimated based on packet size and radio speed.
Propagation time - Time to traverse the link -- typically negligible in WSN.
Reception time - Time for the receiver radio to receive bits and pass to MAC layer --
mainly deterministic.
Receive time - Time for MAC layer to construct packet and send to application layer --
changes based on delays of OS.
6. Explain Ricart-Agrawala Algorithm with an example. [UN]
The Ricart–Agrawala algorithm is an algorithm for mutual exclusion on a distributed system. This
algorithm is an extension and optimization of Lamport's Distributed Mutual Exclusion Algorithm,
by removing the need for. messages. It was developed by Glenn Ricart and Ashok Agrawala.
Three type of messages ( REQUEST, REPLY and RELEASE) are used. exiting the critical section.
REPLY message to site Si if it has not sent a REPLY message to the site from the time it received
the last RELEASE message. Otherwise, it queues up the request.
Ricart–Agrawala algorithm is an algorithm for mutual exclusion in a distributed system proposed
by Glenn Ricart and Ashok Agrawala. This algorithm is an extension and optimization of
Lamport's Distributed Mutual Exclusion Algorithm.
Ricart - Agarwala s a non token based algorithm that uses broadcast technique for mutual
calculation.
Consider a scenario when process PO & P1 went to enter critical sector. Both P0 & P1 send
broadcast to all 3 processes in the system along with time stamp.
Since P2 does not want to enter critical section hence it sends "OK" to both the processes.
Since P1 finds Ts of P0 to be lesser than Ts of self therefore P1 sends to P0.
8. Explain in detail about Suzuki-Kasami’s algorithm with its critical section with an example.
(Apr / May 2023) [UN]
Suzuki–Kasami algorithm is a token-based algorithm for achieving mutual exclusion in
distributed [Link] is modification of Ricart–Agrawala algorithm, a permission based
(Non-token based) algorithm which uses REQUEST and REPLY messages to ensure mutual
exclusion.
In token-based algorithms, A site is allowed to enter its critical section if it possesses the
unique token. Non-token based algorithms uses timestamp to order requests for the critical
section where as sequence number is used in token based algorithms.
Each requests for critical section contains a sequence number. This sequence number is used to
distinguish old and current requests.
Data structure and Notations:
An array of integers RN[1…N]
A site Si keeps RNi[1…N], where RNi[j] is the largest sequence number received so far
through
REQUEST message from site S i.
An array of integer LN[1…N]
This array is used by the [Link][J] is the sequence number of the request that is recently
executed by site Sj.
A queue Q
This data structure is used by the token to keep record of ID of sites waiting for t he token
Algorithm:
To enter Critical section:
When a site Si wants to enter the critical section and it does not have the token
then it increments its sequence number RNi[i] and sends a request
message REQUEST(i, sn) to all other sites in order to request the token.
Here sn is update value of RNi[i]
When a site Sj receives the request message REQUEST(i, sn) from site Si, it
sets RNj[i] to maximum of RNj[i] and sn i.e RNj[i] = max(RNj[i], sn).
After updating RNj[i], Site Sj sends the token to site S i if it has token
and RNj[i] = LN[i] + 1
To execute the critical section:
Site Si executes the critical section if it has acquired the token.
To release the critical section:
After finishing the execution Site Si exits the critical section and does following:
sets LN[i] = RNi[i] to indicate that its critical section request RNi[i] has been executed
For every site Sj, whose ID is not present in the token queue Q, it appends its ID
to Q if RNi[j] = LN[j] + 1 to indicate that site Sj has an outstanding request.
After above updation, if the Queue Q is non-empty, it pops a site ID from the Q and sends the
token to site indicated by popped ID.
If the queue Q is empty, it keeps the token
Message Complexity:
The algorithm requires 0 message invocation if the site already holds the idle token at the time of
critical section request or maximum of N message per critical section execution. This N messages
involves (N – 1) request messages 1 reply message
9. Justify the drawbacks of Suzuki–Kasami Algorithm. [EV]
Non-symmetric Algorithm: A site retains the token even if it does not have requested for critical
section. According to definition of symmetric algorithm
“No site possesses the right to access its critical section when it has not been requested.”
Performance:
Synchronization delay is 0 and no message is needed if the site holds the idle token at the time of
its request.
In case site does not holds the idle token, the maximum synchronization delay is equal to
maximum message transmission time and a maximum of N message is required per critical section
invocation.
10. Name and explain different types of deadlock models in distributed system with the
commonly used strategies to handle deadlocks with a neat diagram. (Nov / Dec 2022) [UN]
Various deadlock handling strategies in the distributed system are as follows: There are mainly
three approaches to handling deadlocks: deadlock prevention, deadlock avoidance, and deadlock
detection.
Deadlock detection algorithms in Distributed System
Various deadlock detection algorithms in the distributed system are as follows:
1. Path-Pushing Algorithms
2. Edge-chasing Algorithms
3. Diffusing Computations Based Algorithms
4. Global State Detection Based Algorithms
Path-Pushing Algorithms
Path-pushing algorithms detect distributed deadlocks by keeping an explicit global WFG. The
main concept is to create a global WFG for each distributed system site. When a site in this class
of algorithms performs a deadlock computation, it sends its local WFG to all neighboring sites.
The term path-pushing algorithm was led to feature the sending around the paths of global WFG.
Edge-Chasing Algorithms
An edge-chasing method verifies a cycle in a distributed graph structure by sending special
messages called probes along the graph's edges. These probing messages are distinct from request
and response messages. If a site receives the matching probe that it previously transmitted, it can
cancel the formation of the cycle.
Diffusing Computations Based Algorithms
In this algorithm, deadlock detection computation is diffused over the system's WFG. These
techniques use echo algorithms to detect deadlocks, and the underlying distributed computation is
superimposed on this computation. If this computation fails, the initiator reports a deadlock global
state detection.
Global State Detection Based Algorithms
Deadlock detection algorithms based on global state detection take advantage of the following
facts:
1. A consistent snapshot of a distributed system may be taken without freezing the underlying
computation.
2. If a stable property exists in the system before the snapshot collection starts, it will be
preserved.
11. Discuss with suitable example to show that a deadlock cannot occur if anyone of the four
conditions is absent. (Apr /May 2022) [CR]
A deadlock in OS is a situation in which more than one process is blocked because it is holding a
resource and also requires some resource that is acquired by some other process. The four
necessary conditions for a deadlock situation are mutual exclusion, no preemption, hold and wait
and circular set.
Deadlock can be prevented by eliminating any of the four necessary conditions, which are mutual
exclusion, hold and wait, no preemption, and circular wait. Mutual exclusion, hold and wait and no
preemption cannot be violated practically.
Mutual Exclusion: A resource can be held by only one process at a time. In other words, if a
process P1 is using some resource R at a particular instant of time, then some other process P2
can't hold or use the same resource R at that particular instant of time.
Disadvantages in Deadlock
1. Deadlock is an infinite Process. There is only detection, resolution, prevention techniques.
But, there is no Deadlock stopping techniques.
2. No Pre Emption
3. Mutual Exclusion
4. Circular Wait
12. List out the four classes of distributed deadlock detection algorithms and explain any
two of them. (Apr / May 2021) [UN]
Path push-in – Path info- dispatched to waiting node to blocking off node.
Edge-chasing – Probe message area unit dispatched on graph edge.
Diffusion computation
Global nation detection
Path-Pushing Algorithms
Path-pushing algorithms detect distributed deadlocks by keeping an explicit global WFG. The
main concept is to create a global WFG for each distributed system site. When a site in this class
of algorithms performs a deadlock computation, it sends its local WFG to all neighboring sites.
The term path-pushing algorithm was led to feature the sending around the paths of global WFG.
Edge-Chasing Algorithms
An edge-chasing method verifies a cycle in a distributed graph structure by sending special
messages called probes along the graph's edges. These probing messages are distinct from request
and response messages. If a site receives the matching probe that it previously transmitted, it can
cancel the formation of the cycle.
Diffusing Computations Based Algorithms
In this algorithm, deadlock detection computation is diffused over the system's WFG. These
techniques use echo algorithms to detect deadlocks, and the underlying distributed computation is
superimposed on this computation. If this computation fails, the initiator reports a deadlock global
state detection.
Global State Detection Based Algorithms
Deadlock detection algorithms based on global state detection take advantage of the following
facts:
1. A consistent snapshot of a distributed system may be taken without freezing the underlying
computation.
2. If a stable property exists in the system before the snapshot collection starts, it will be
preserved.
13. What is a deadlock? How deadlock can be recovered? Explain distributed deadlocks.
[UN]
Deadlock:
Deadlock is a situation where each of the computer process waits for a resource which is being
assigned to some another process. In this situation, none of the process gets executed since the
resource it needs, is held by some other process which is also waiting for some other resource to be
released.
Deadlock Detection
1. If resources have single instance:
In this case for Deadlock detection we can run an algorithm to check for cycle in the
Resource Allocation Graph. Presence of cycle in the graph is the sufficient condition for
deadlock.
In the above diagram, resource 1 and resource 2 have single instances. There is a cycle
R1→P1→R2→[Link],DeadlockisConfirmed.
Deadlock Recovery
This involves a careful analysis of the system's state to determine which resources can be safely
released without compromising the system's integrity. There are four primary methods of deadlock
recovery: process termination, resource pre-emption, priority inversion, and rollback.
There are four primary methods of deadlock recovery: process termination, resource pre-
emption, priority inversion, and rollback.
Distributed Deadlocks
Distributed deadlocks can occur when distributed transactions or concurrency control are utilized
in distributed systems. It may be identified via a distributed technique like edge chasing or by
creating a global wait-for graph (WFG) from local wait-for graphs at a deadlock detector. Phantom
deadlocks are identified in a distributed system but do not exist due to internal system delays.
14. How we can achieve deadlock detection in distributed systems? Provide various
models to carry out the same. [AN]
Distributed deadlocks can develop in distributed systems when distributed transactions or
concurrency control are used. It is found via a distributed approach such as edge chasing or
constructing a global wait-for graph (WFG) from local wait-for graphs at a deadlock
detector. Phantom deadlocks are discovered in a distributed system but do not occur owing to
internal system delays.
As a distributed system is so large, deadlock cannot be prevented or avoided. As a result, only
deadlock detection is feasible. The following are necessary for distributed system deadlock
detection techniques:
Progress: The technique should be able to detect all of the system's deadlocks.
Safety: The procedure should not identify phantom or fake deadlocks.
Methods to detect deadlock
In distributed systems, there are three methods to detect deadlocks:
Centralized approach: One resource is responsible for detecting deadlock. This technique
is beneficial because it is simple and quick to implement. Still, the disadvantages include a
high burden at one node, single-point failure, and reduced reliability.
Distributed approach: It involves several nodes working together to detect deadlocks.
There is no one point of failure since the burden is evenly distributed across all nodes. The
discovery of deadlocks becomes faster as well.
Hierarchical method: It is the most advantageous method. In a distributed system, it
integrates both centralized and distributed techniques for deadlock detection. In this
technique, some nodes or clusters of nodes are in charge of deadlock detection, which is
managed by a single node.
Algorithms for deadlock detection
The following Knapps' classification algorithms can be used to request resources in distributed
systems in a variety of ways:
Path pushing algorithm
The main idea is to generate a global WFG (wait-for graph) for each distributed system site:
1. When an algorithm class conducts a deadlock calculation, it broadcasts its local WFG to all
surrounding areas.
2. When it receives information from its neighbors, it does the exact computation and returns
the resulting local WFGs.
3. This process repeats until all locations have received their local WFGs.
Edge changing algorithms
This is a straightforward approach for detecting the existence of cycles in a distributed network.
Messages are classified into two types:
Request
Reply
A probe message is distinct from a recommendation or a reply message. Probe messages are
limited in length and size. A request message is sent from one site to another to access a requested
resource, and the responding site sends a reply message.
If the graph contains a cycle, the first site will get the matching reply message from the second site
through another channel (the second site is not directly connected to the first site). This is due to
the presence of a cycle.
Another fully distributed deadlock detection algorithm is given by Chandy, Misra, and Hass
(1983). This is considered an edge-chasing, probe-based algorithm.
It is also considered one of the best deadlock detection algorithms for distributed systems.
If a process makes a request for a resource which fails or times out, the process generates a probe
message and sends it to each of the processes holding one or more of its requested resources.
the id of the process that is blocked (the one that initiates the probe message);
the id of the process is sending this particular version of the probe message; and
the id of the process that should receive this probe message.
When a process receives a probe message, it checks to see if it is also waiting for resources.
If not, it is currently using the needed resource and will eventually finish and release the resource.
If it is waiting for resources, it passes on the probe message to all processes it knows to be holding
resources it has itself requested.
The process first modifies the probe message, changing the sender and receiver ids.
If a process receives a probe message that it recognizes as having initiated, it knows there is a
cycle in the system and thus, deadlock.
The following example is based on the same data used in the Silberschatz-Galvin algorithm
example. In this case P1 initiates the probe message, so that all the messages shown have P1 as the
[Link] the probe message is received by process P3, it modifies it and sends it to two more
[Link], the probe message returns to process P1. Deadlock!
It is easy to implement.
Each probe message is of fixed length.
There is very little computation.
There is very little overhead.
There is no need to construct a graph, nor to pass graph information to other sites.
This algorithm does not find false (phantom) deadlock.
There is no need for special data structures.
UNIT – IV CONSENSUS AND RECOVERY
PART – B & C
1. What are the key assumption underlying while designing agreement algorithms and brief
them. (Apr / May 2023) [CR]
Consensus algorithms necessarily assume that some processes and systems will be unavailable and
that some communications will be lost. Hence these algorithms must be fault-tolerant.
Examples of consensus algorithm: Deciding whether to commit a distributed transaction to a
database.
Synchronizing state machine replicas and ensuring consistency among them.
Assumptions in Consensus algorithms Failure models:
Some of the processes may be faulty in distributed systems. A faulty process can behave in any
manner allowed by the failure model assumed. Some of the well known failure models includes
fail-stop, send omission and receive omission, and Byzantine failures. Fail stop model: a process
may crash in the middle of a step, which could be the execution of a local operation or processing
of a message for a send or receive event. it may send a message to only a subset of the destination
set before crashing. Byzantine failure model: a process may behave arbitrarily. The choice of the
failure model determines the feasibility and complexity of solving consensus.
Synchronous/asynchronous communication:
If a failure-prone process chooses to send a message to process but fails, then intended process
cannot detect the non-arrival of the message.
This is a major hurdle in asynchronous system. In a synchronous system, a unsent message
scenario can be identified by the intended recipient, at the end of the round. The intended
recipient can deal with the non-arrival of the expected message by assuming the arrival of a
message containing some default data, and then proceeding with the next round of the algorithm.
Network connectivity: The system has full logical connectivity, i.e., each process can
communicate with any other by direct message passing. Sender identification:
A process that receives a message always knows the identity of the sender process. When
multiple messages are expected from the same sender in a single round, a scheduling algorithm is
employed that sends these messages in sub-rounds, so that each message sent within the round can
be uniquely identified. Channel reliability:
The channels are reliable, and only the processes may fail. Authenticated vs. non-authenticated
messages:
With unauthenticated messages, when a faulty process relays a message to other processes (i) it
can forge the message and claim that it was received from another process, (ii) it can also tamper
with the contents of a received message before relaying it. Thus, faulty processes can inflict less
damage. Agreement variable:
The agreement variable may be boolean or multi valued, and need not be an integer. This
simplifying assumption does not affect the results for other data types, but helps in the abstraction
while presenting the algorithms. Byzantine General problem The Byzantine Generals’ Problem
(BGP) is a classic problem faced by any distributed computer system network.
The army has many divisions and each division has a general.
The generals communicate between each as well as between all lieutenants within their division
only through messengers. All the generals or commanders have to agree upon one of the two plans
of action.
Exact time to attack all at once or if faced by fierce resistance then the time to retreat all at once.
The army cannot hold on forever.
2. Illustrate the different types of failure in distributed systems and explain how to prevent
them. (Nov / Dec 2022) [UN]
In a distributed database system, we need to deal with four types of failures: transaction failures
(aborts), site (system) failures, media (disk) failures, and communication line failures. Some of
these are due to hardware and others are due to software.
In a distributed database system, we need to deal with four types of failures: transaction failures
(aborts), site (system) failures, media (disk) failures, and communication line failures. Some of
these are due to hardware and others are due to software.
1. Transaction Failures: Transactions can fail for a number of reasons. Failure can be due to an
error in the transaction caused by incorrect input data as well as the detection of a present or
potential [Link] usual approach to take in cases of transaction failure is to abort the
transaction, thus resetting the database to its state prior to the start of this transaction.
2. Site (System) Failures: The reasons for system failure can be traced back to a hardware or to a
software failure. The system failure is always assumed to result in the loss of main memory
contents. Therefore, any part of the database that was in main memory buffers is lost as a result of
a system failure. Total failure refers to the simultaneous failure of all sites in the distributed
system; partial failure indicates the failure of only some sites while the others remain operational.
3. Media Failures: Media failure refers to the failures of the secondary storage devices that store
the database. Such failures may be due to operating system errors, as well as to hardware faults
such as head crashes or controller failures. The important point is that all or part of the database
that is on the secondary storage is considered to be destroyed and inaccessible.
4. Communication Failures There are a number of types of communication failures. The most
common ones are the errors in the messages, improperly ordered messages, lost messages, and
communication line failures.
3. Illustrate briefly the two kinds of checkpoints for checkpoint algorithm. (Nov / Dec 2022)
[UN]
There are two main approaches for check pointing in the distributed computing
systems: coordinated check pointing and uncoordinated check pointing. In the coordinated check
pointing approach, processes must ensure that their checkpoints are consistent.
In the coordinated check pointing approach, processes must ensure that their checkpoints are
consistent. This is usually achieved by some kind of two-phase commit protocol algorithm. In the
uncoordinated check pointing, each process checkpoints its own state independently.
Asynchronous check pointing: – After executing an event, the triplet is recorded without any
synchronization with other processes. – Local checkpoint consists of set of records, first are stored
in volatile log, then moved to stable log.
The product supports two checkpoint algorithms: the time-based algorithm and the record based
algorithm. Both are explained in the following sections.
Time-based algorithm
The time-based checkpoint algorithm commits global transactions at a specified time interval. The
following example declares a time-based algorithm in xJCL:
<checkpoint-algorithm name="timebased">
<classname>[Link]</classname>
<props>
<prop name="interval" value="15" />
<prop name="TransactionTimeOut" value="30" />
</props>
</checkpoint-algorithm>
The units of interval and TransactionTimeOut properties in the previous example are expressed in
seconds.
Record-based algorithm
The record-based checkpoint algorithm commits global transactions at a specified number of
iterations of the processJobStep method of batch step. Each call to the processJobStep method is
treated as iterating through one record. The processJobStep method can retrieve multiple records
from a batch data stream on each call. However, for this checkpoint algorithm one record is the
equivalent one call to the process Job Step method.
The following example declares a record-based algorithm in xJCL:
<checkpoint-algorithm name="record based">
<classname>[Link]</classname>
<props>
<prop name="record count" value="1000" />
<prop name="Transaction Time Out" value="60" />
</props>
</checkpoint-algorithm>
The unit of the Transaction Time Out property in the previous example is expressed in seconds.
If not specified in x JCL, the default transaction timeout is 60 seconds and the default record
count is 10000.
4. Discuss the issues in failure recovery with an example. (Apr / May 2022) [CR]
In a failure recovery, we must not only restore the system to a consistent state, but also
appropriately handle messages that are left in an abnormal state due to the failure and recovery
The computation comprises of three processes Pi, Pj , and Pk, connected through a
communication network.
The processes communicate solely by exchanging messages over faultfree, FIFO communication
channels
Processes Pi, Pj , and Pk have taken checkpoints The rollback of process 𝑃𝑖 to checkpoint 𝐶𝑖,1
created an orphan message H
Orphan message I is created due to the roll back of process 𝑃𝑗 to checkpoint 𝐶𝑗,1
– Message C: a delayed message – Message D: a lost message since the send event for D is
recorded in the restored state for 𝑃𝑗, but the receive event has been undone at process 𝑃𝑖. – Lost
messages can be handled by having processes keep a message log of all the sent messages –
Messages E, F: delayed orphan messages. After resuming execution from their checkpoints,
processes will generate both of these messages.
5. Explain the agreement statement that should be followed in synchronous system with
failure. (Apr / May 2022) [UN]
In a failure-free system, consensus can be reached by collecting information from the different
processes, arriving at a decision, and distributing this decision in the system.
A distributed mechanism would have each process broadcast its values to others, and each process
computes the same function on the values received. The decision can be reached by using an
application specific function.
Algorithms to collect the initial values and then distribute the decision may be based on the
token circulation on a logical ring, or the three-phase tree-based broadcast converge cast:
broadcast, or direct communication with all nodes. In a synchronous system, this can be done
simply in a constant number of rounds.
6. Write about the issues in failure recovery and discuss about any two recovery
mechanisms. (Nov / Dec 2021) [CR]
In a failure recovery, we must not only restore the system to a consistent state, but also
appropriately handle messages that are left in an abnormal state due to the failure and recovery.
The computation comprises of three processes Pi, Pj, and Pk, connected through a
communication network. The processes communicate solely by exchanging messages over fault
free, FIFO communication channels. Processes Pi, Pj , and Pk have taken checkpoints.
Orphan message I is created due to the roll back of process 𝑃𝑗 to checkpoint 𝐶𝑗,1
Problem Specifications Byzantine Agreement (single source has an initial value) Agreement: All
non-faulty processes must agree on the same value. Validity: If the source process is non-faulty,
then the agreed upon value by all the non-faulty processes must be the same as the initial value
of the source. Termination: Each non-faulty process must eventually decide on a value.
Consensus Problem (all processes have an initial value) Agreement: All non-faulty processes
must agree on the same (single) value. Validity: If all the non-faulty processes have the same
initial value, then the agreed upon value by all the non-faulty processes must be that same value.
Termination: Each non-faulty process must eventually decide on a value. Interactive Consistency
(all processes have an initial value) Agreement: All non-faulty processes must agree on the same
array of values A[v1 . . . vn]. Validity: If process i is non-faulty and its initial value is vi , then all
non-faulty processes agree on vi as the ith element of the array A. If process j is faulty, then the
non-faulty processes can agree on any value for A[j]. Termination: Each non-faulty process must
eventually decide on the array A. These problems are equivalent to one another! Show using
reductions.
Further, common knowledge of the decision value can be obtained using an additional round.
In an asynchronous system, consensus can similarly be reached in a constant number of message
hops. Further, concurrent common knowledge of the consensus value can also be attained.
8. Construct Byzantine agreement problem in distributed system. [AP]
To achieve reliability, the fault-tolerance scheme of the distributed computing system must be
revised. This kind of problem is known as a Byzantine agreement (BA) problem. It requires all
fault-free processors to agree on a common value, even if some components are corrupt.
The Byzantine agreement problem necessitates that a single value be agreed upon by all non-
faulty processors, with the initial value chosen by an arbitrarily chosen processor.
he byzantine General's problem is an illustrative example of byzantine faults: A number of
different armies want to besiege an enemy city. Success requires agreement on a common plan
amongst the disbursed army; communication is only by messages.
here are two categories of failures that are considered. One is fail-stop(in which the node fails and
stops operating) and other is arbitrary-node failure. Some of the arbitrary node failures are given
below : Failure to return a result.
In a failure-free system, consensus can be reached by collecting information from the different
processes, arriving at a decision, and distributing this decision in the system.
A distributed mechanism would have each process broadcast its values to others, and each process
computes the same function on the values received.
The decision can be reached by using an application specific function.
Algorithms to collect the initial values and then distribute the decision may be based on the token
circulation on a logical ring, or the three-phase tree-based broadcast converge cast: broadcast, or
direct communication with all nodes. In a synchronous system, this can be done simply in a
constant number of rounds.
Further, common knowledge of the decision value can be obtained using an additional round. In
an asynchronous system, consensus can similarly be reached in a constant number of message
hops.
Further, concurrent common knowledge of the consensus value can also be attained.
The consensus algorithm for n processes where up to f processes where f < n may fail in a fail
stop failure model. Here the consensus variable x is integer value; each process has initial value
xi.
Byzantine Agreement Problem: The Byzantine agreement problem necessitates that a single
value be agreed upon by all non-faulty processors, with the initial value chosen by an arbitrarily
chosen processor.
The source processor broadcasts this initial value to all other processors, and the solution must
assure agreement on the same value among all non-faulty processors, with the exception that if the
source processor is not faulty, the agreed-upon value must match the source’s initial value.
In the event that the source processor is faulty, non-faulty processors can agree on any common
value, regardless of what faulty processors agree on or agree on at all.
The Consensus Problem: Each processor has its own initial value in the Consensus issue, and
non-faulty processors must agree on a single, common value.
Each processor sends out a broadcast its starting value to all other processors, which may be
different, and the solution must assure agreement on the same single value across all non-faulty
processors.
If the initial values are different, all non-faulty processors can agree on any common value. The
values reached by defective processors are meaningless.
Each CPU broadcasts its initial value to all other processors, and the solution must assure
agreement on the same vector across all non-faulty processes (V1, V2, …… , Vn). If the ith
processor is not defective and its initial value is vi, then the ith value that all non-faulty processors
must agree on must be vi. If the jth processor is malfunctioning, the remaining processors can
agree on any common value for vj.
Agreement Failure
Purpose Requirements
Protocol Model
Rollback recovery protocols restore the system back to a consistent state after a failure, It achieves
fault tolerance by periodically saving the state of a process during the failure free execution.
A checkpoint can be saved on either the stable storage or the volatile storage
This phenomenon of cascaded rollback is called the domino effect. Uncoordinated check pointing
If each process takes its checkpoints independently, then the system cannot avoid the domino
effect – this scheme is called independent or uncoordinated check pointing Techniques that avoid
domino effect
1. Coordinated check pointing rollback recovery - Processes coordinate their checkpoints to form
a system-wide consistent state
2. Communication-induced check pointing rollback recovery - Forces each process to take
checkpoints based on information piggybacked on the application.
The saved state is called a checkpoint, and the procedure of restarting from a previously check
pointed state is called rollback recovery. A checkpoint can be saved on either the stable storage or
the volatile storage depending on the failure scenarios to be tolerated.
A distributed system consists of a fixed number of processes, P1, P2,…_ PN , which communicate
only through messages. Processes cooperate to execute a distributed application and interact with
the outside world by receiving and sending input and output messages, respectively.
Rollback-recovery protocols generally make assumptions about the reliability of the inter-process
communication. Some protocols assume that the communication uses first-in-first-out (FIFO)
order, while other protocols assume that the communication subsystem can lose, duplicate, or
reorder messages. Rollback-recovery protocols therefore must maintain information about the
internal interactions among processes and also the external interactions with the outside world.
Advantages
Disadvantages
ii. Recovery from a failure is slow because processes need to iterate to find a consistent set of
checkpoints
iii. Each process maintains multiple checkpoints and periodically invoke a garbage collection
algorithm
iv. Not suitable for application with frequent output commits The processes record the
dependencies among their checkpoints caused by message exchange during failure-free operation
The following direct dependency tracking technique is commonly used in uncoordinated check
pointing.
In coordinated check pointing, processes orchestrate their check pointing activities so that all local
checkpoints form a consistent global state Types
i. Blocking Check pointing: After a process takes a local checkpoint, to prevent orphan messages,
it remains blocked until the entire check pointing activity is complete Disadvantages: The
computation is blocked during the check pointing
ii. Non-blocking Check pointing: The processes need not stop their execution while taking
checkpoints. A fundamental problem in coordinated check pointing is to prevent a process from
receiving application messages that could make the checkpoint inconsistent.
3. Communication-induced Check pointing
Communication-induced check pointing is another way to avoid the domino effect, while allowing
processes to take some of their checkpoints independently. Processes may be forced to take
additional checkpoints. Two types of checkpoints
i. Autonomous checkpoints
15. Explain the Koo and Toueg Coordinated Check pointing and Recovery Technique. [UN]
Koo and Toueg coordinated check pointing and recovery technique takes a consistent set of
checkpoints and avoids the domino effect and live lock problems during the recovery.
• Includes 2 parts: the check pointing algorithm and the recovery algorithm .
The Check pointing Algorithm The checkpoint algorithm makes the following assumptions about
the distributed system: Processes communicate by exchanging messages through communication
channels.
Assume that end-to-end protocols (the sliding window protocol) exist to handle with message
loss due to rollback recovery and communication failure. Communication failures do not divide
the network.
The checkpoint algorithm takes two kinds of checkpoints on the stable storage: Permanent and
Tentative.
First Phase
1. An initiating process Pi takes a tentative checkpoint and requests all other processes to take
tentative checkpoints. Each process informs Pi whether it succeeded in taking a tentative
checkpoint.
2. A process says “no” to a request if it fails to take a tentative checkpoint 3. If Pi learns that all
the processes have successfully taken tentative checkpoints, Pi decides that all tentative
checkpoints should be made permanent; otherwise, Pi decides that all the tentative checkpoints
should be thrown-away.
Second Phase
1. Pi informs all the processes of the decision it reached at the end of the first phase.
3. Either all or none of the processes advance the checkpoint by taking permanent checkpoints.
4. The algorithm requires that after a process has taken a tentative checkpoint, it cannot send
messages related to the basic computation until it is informed of Pi’s decision.
16. Determine the Algorithm for asynchronous check pointing and recovery in Distributed
Systems. [EV]
The algorithm of Juang and Venkatesan for recovery in a system that uses asynchronous check
pointing.
The algorithm makes the following assumptions about the underlying system:
Two type of log storage are maintained: – Volatile log: short time to access but lost if processor
crash. Move to stable log periodically. – Stable log: longer time to access but remained if crashed
A. Asynchronous Check pointing
After executing an event, the triplet is recorded without any synchronization with other processes.
– Local checkpoint consists of set of records, first are stored in volatile log, then moved to stable
log.
B. The Recovery Algorithm
Notations and data structure The following notations and data structure are used by the algorithm:
Basic idea
Since the algorithm is based on asynchronous check pointing, the main issue in the recovery is to
find a consistent set of checkpoints to which the system can be restored.
The recovery algorithm achieves this by making each processor keep track of both the number of
messages it has sent to other processors as well as the number of messages it has received from
other processors. Whenever a processor rolls back, it is necessary for all other processors to find
out if any message has become an orphan message.
Orphan messages are discovered by comparing the number of messages sent to and received from
neighboring processors.
For example, if RCVDi←j(CkPti) > SENTj→i(CkPtj) (that is, the number of messages received
by processor pi from processor pj is greater than the number of messages sent by processor pj to
processor pi, according to the current states the processors), then one or more messages at
processor pj are orphan messages.
UNIT – V CLOUD COMPUTING
PART B & C
1. Originate of the Cloud Computing. [CR]
Before emerging the cloud computing, there was Client/Server computing which is basically a
centralized storage in which all the software applications, all the data and all the controls are
resided on the server side.
If a single user wants to access specific data or run a program, he/she need to connect to the server
and then gain appropriate access, and then he/she can do his/her business.
Then after, distributed computing came into picture, where all the computers are networked
together and share their resources when needed.
On the basis of above computing, there was emerged of cloud computing concepts that
later implemented.
At around in 1961, John MacCharty suggested in a speech at MIT that computing can be sold like
a utility, just like a water or electricity. It was a brilliant idea, but like all brilliant ideas, it was
ahead if its time, as for the next few decades, despite interest in the model, the technology simply
was not ready for it.
But of course time has passed and the technology caught that idea and after few years we
mentioned that:
In 1999, [Link] started delivering of applications to users using a simple website. The
applications were delivered to enterprises over the Internet, and this way the dream of computing
sold as utility were true.
In 2002, Amazon started Amazon Web Services, providing services like storage, computation and
even human intelligence. However, only starting with the launch of the Elastic Compute Cloud in
2006 a truly commercial service open to everybody existed.
In 2009, Google Apps also started to provide cloud computing enterprise applications.
Of course, all the big players are present in the cloud computing evolution, some were earlier,
some were later. In 2009, Microsoft launched Windows Azure, and companies like Oracle and
HP have all joined the game. This proves that today, cloud computing has become mainstream.
2. Demonstrate the Cloud Computing Architecture. [UN]
As we know, cloud computing technology is used by both small and large organizations to store
the information in cloud and access it from anywhere at anytime using the internet connection.
Cloud computing architecture is a combination of service-oriented architecture and event-
driven architecture.
Cloud computing architecture is divided into the following two parts -
o Front End
o Back End
The below diagram shows the architecture of cloud computing -
Front End
The front end is used by the client. It contains client-side interfaces and applications that are
required to access the cloud computing platforms. The front end includes web servers (including
Chrome, Firefox, internet explorer, etc.), thin & fat clients, tablets, and mobile devices.
Back End
The back end is used by the service provider. It manages all the resources that are required to
provide cloud computing services. It includes a huge amount of data storage, security mechanism,
virtual machines, deploying models, servers, traffic control mechanisms, etc.
3. Explain the Components of Cloud Computing Architecture. [UN]
There are the following components of cloud computing architecture -
1. Client Infrastructure
Client Infrastructure is a Front end component. It provides GUI (Graphical User Interface) to
interact with the cloud.
2. Application
The application may be any software or platform that a client wants to access.
3. Service
A Cloud Services manages that which type of service you access according to the client’s
requirement.
Cloud computing offers the following three type of services:
i. Software as a Service (SaaS) – It is also known as cloud application services. Mostly, SaaS
applications run directly through the web browser means we do not require to download and install
these applications. Some important example of SaaS is given below –
Example: Google Apps, Salesforce Dropbox, Slack, Hubspot, Cisco WebEx.
ii. Platform as a Service (PaaS) – It is also known as cloud platform services. It is quite similar
to SaaS, but the difference is that PaaS provides a platform for software creation, but using SaaS,
we can access software over the internet without the need of any platform.
Example: Windows Azure, [Link], Magento Commerce Cloud, OpenShift.
iii. Infrastructure as a Service (IaaS) – It is also known as cloud infrastructure services. It is
responsible for managing applications data, middleware, and runtime environments.
Example: Amazon Web Services (AWS) EC2, Google Compute Engine (GCE), Cisco Metapod.
4. Runtime Cloud
Runtime Cloud provides the execution and runtime environment to the virtual machines.
5. Storage
Storage is one of the most important components of cloud computing. It provides a huge amount
of storage capacity in the cloud to store and manage data.
6. Infrastructure
It provides services on the host level, application level, and network level. Cloud infrastructure
includes hardware and software components such as servers, storage, network devices,
virtualization software, and other storage resources that are needed to support the cloud computing
model.
7. Management
Management is used to manage components such as application, service, runtime cloud, storage,
infrastructure, and other security issues in the backend and establish coordination between them.
8. Security
Security is an in-built back end component of cloud computing. It implements a security
mechanism in the back end.
9. Internet
The Internet is medium through which front end and back end can interact and communicate with
each other.
4. Summarize the Cloud Computing Technologies. [UN]
A list of cloud computing technologies are given below -
o Virtualization
o Service-Oriented Architecture (SOA)
o Grid Computing
o Utility Computing
Virtualization
Virtualization is the process of creating a virtual environment to run multiple applications and
operating systems on the same server. The virtual environment can be anything, such as a single
instance or a combination of many operating systems, storage devices, network application servers,
and other environments.
Grid Computing
Grid computing is also known as distributed computing. It is a processor architecture that
combines various different computing resources from multiple locations to achieve a common
goal. In grid computing, the grid is connected by parallel nodes to form a computer cluster. These
computer clusters are in different sizes and can run on any operating system.
Grid computing contains the following three types of machines -
1. Control Node: It is a group of server which administrates the whole network.
2. Provider: It is a computer which contributes its resources in the network resource pool.
3. User: It is a computer which uses the resources on the network.
Mainly, grid computing is used in the ATMs, back-end infrastructures, and marketing
research.
Utility Computing
Utility computing is the most trending IT service model. It provides on-demand computing
resources (computation, storage, and programming services via API) and infrastructure based on
the pay per use method. It minimizes the associated costs and maximizes the efficient use of
resources. The advantage of utility computing is that it reduced the IT cost, provides greater
flexibility, and easier to manage.
Large organizations such as Google and Amazon established their own utility services for
computing storage and application.
5. Classify the Cloud Computing Applications. [UN]
Cloud service providers provide various applications in the field of art, business, data storage and
backup services, education, entertainment, management, social networking, etc.
The most widely used cloud computing applications are given below -
1. Art Applications
Cloud computing offers various art applications for quickly and easily design attractive cards,
booklets, and images. Some most commonly used cloud art applications are given below:
i Moo
Moo is one of the best cloud art applications. It is used for designing and printing business cards,
postcards, and mini cards.
ii. Vistaprint
Vistaprint allows us to easily design various printed marketing products such as business cards,
Postcards, Booklets, and wedding invitations cards.
iii. Adobe Creative Cloud
Adobe creative cloud is made for designers, artists, filmmakers, and other creative professionals.
It is a suite of apps which includes PhotoShop image editing programming, Illustrator, InDesign,
TypeKit, Dreamweaver, XD, and Audition.
2. Business Applications
Business applications are based on cloud service providers. Today, every organization requires the
cloud business application to grow their business. It also ensures that business applications are
24*7 available to users.
There are the following business applications of cloud computing -
i. MailChimp
MailChimp is an email publishing platform which provides various options to design,
send, and save templates for emails.
iii. Salesforce
Salesforce platform provides tools for sales, service, marketing, e-commerce, and more. It also
provides a cloud development platform.
iv. Chatter
Chatter helps us to share important information about the organization in real time.
v. Bitrix24
Bitrix24 is a collaboration platform which provides communication, management, and social
collaboration tools.
vi. Paypal
Paypal offers the simplest and easiest online payment mode using a secure internet account.
Paypal accepts the payment through debit cards, credit cards, and also from Paypal account
holders.
vii. Slack
Slack stands for Searchable Log of all Conversation and Knowledge. It provides a user-
friendly interface that helps us to create public and private channels for communication.
viii. Quickbooks
Quickbooks works on the terminology "Run Enterprise anytime, anywhere, on any device." It
provides online accounting solutions for the business. It allows more than 20 users to work
simultaneously on the same system.
3. Data Storage and Backup Applications
Cloud computing allows us to store information (data, files, images, audios, and videos) on the
cloud and access this information using an internet connection. As the cloud provider is responsible
for providing security, so they offer various backup recovery application for retrieving the lost
data.
A list of data storage and backup applications in the cloud are given below -
i. [Link]
Box provides an online environment for secure content management,
workflow, and collaboration. It allows us to store different files such as Excel, Word, PDF, and
images on the cloud. The main advantage of using box is that it provides drag & drop service for
files and easily integrates with Office 365, G Suite, Salesforce, and more than 1400 tools.
ii. Mozy
Mozy provides powerful online backup solutions for our personal and business data. It schedules
automatically back up for each day at a specific time.
iii. Joukuu
Joukuu provides the simplest way to share and track cloud-based backup files. Many users use
joukuu to search files, folders, and collaborate on documents.
iv. Google G Suite
Google G Suite is one of the best cloud storage and backup application. It includes Google
Calendar, Docs, Forms, Google+, Hangouts, as well as cloud storage and tools for managing cloud
apps. The most popular app in the Google G Suite is Gmail. Gmail offers free email services to
users.
4. Education Applications
Cloud computing in the education sector becomes very popular. It offers various online distance
learning platforms and student information portals to the students. The advantage of using
cloud in the field of education is that it offers strong virtual classroom environments, Ease of
accessibility, secure data storage, scalability, greater reach for the students, and minimal hardware
requirements for the applications.
There are the following education applications offered by the cloud -
i. Google Apps for Education
Google Apps for Education is the most widely used platform for free web-based email, calendar,
documents, and collaborative study.
ii. Chromebooks for Education
Chromebook for Education is one of the most important Google's projects. It is designed for the
purpose that it enhances education innovation.
iii. Tablets with Google Play for Education
It allows educators to quickly implement the latest technology solutions into the classroom and
make it available to their students.
iv. AWS in Education
AWS cloud provides an education-friendly environment to universities, community colleges, and
schools.
5. Entertainment Applications
Entertainment industries use a multi-cloud strategy to interact with the target audience. Cloud
computing offers various entertainment applications such as online games and video conferencing.
i. Online games
Today, cloud gaming becomes one of the most important entertainment media. It offers various
online games that run remotely from the cloud. The best cloud gaming services are Shaow,
GeForce Now, Vortex, Project xCloud, and PlayStation Now.
ii. Video Conferencing Apps
Video conferencing apps provides a simple and instant connected experience. It allows us to
communicate with our business partners, friends, and relatives using a cloud-based video
conferencing. The benefits of using video conferencing are that it reduces cost, increases
efficiency, and removes interoperability.
6. Management Applications
Cloud computing offers various cloud management tools which help admins to manage all types
of cloud activities, such as resource deployment, data integration, and disaster recovery. These
management tools also provide administrative control over the platforms, applications, and
infrastructure.
Some important management applications are -
i. Toggl
Toggl helps users to track allocated time period for a particular project.
ii. Evernote
Ever note allows you to sync and save your recorded notes, typed notes, and other notes in one
convenient place. It is available for both free as well as a paid version.
It uses platforms like Windows, macOS, Android, iOS, Browser, and Unix.
iii. Outright
Outright is used by management users for the purpose of accounts. It helps to track income,
expenses, profits, and losses in real-time environment.
iv. Go To Meeting
GoToMeeting provides Video Conferencing and online meeting apps, which allows you to start
a meeting with your business partners from anytime, anywhere using mobile phones or tablets.
Using GoToMeeting app, you can perform the tasks related to the management such as join
meetings in seconds, view presentations on the shared screen, get alerts for upcoming meetings,
etc.
6. Illustrate the Security Risks of Cloud Computing. [UN]
Cloud computing provides various advantages, such as improved collaboration, excellent
accessibility, Mobility, Storage capacity, etc. But there are also security risks in cloud computing.
Some most common Security Risks of Cloud Computing are given below-
I. Data Loss
Data loss is the most common cloud security risks of cloud computing. It is also known as data
leakage. Data loss is the process in which data is being deleted, corrupted, and unreadable by a
user, software, or application.
II. Hacked Interfaces and Insecure APIs
As we all know, cloud computing is completely depends on Internet, so it is compulsory to protect
interfaces and APIs that are used by external users. APIs are the easiest way to communicate with
most of the cloud services. In cloud computing, few services are available in the public domain.
These services can be accessed by third parties, so there may be a chance that these services easily
harmed and hacked by hackers.
III. Data Breach
Data Breach is the process in which the confidential data is viewed, accessed, or stolen by the third
party without any authorization, so organization's data is hacked by the hackers.
IV. Vendor lock-in
Vendor lock-in is the of the biggest security risks in cloud computing. Organizations may face
problems when transferring their services from one vendor to another. As different vendors
provide different platforms, that can cause difficulty moving one cloud to another.
V. Increased complexity strains IT staff
Migrating, integrating, and operating the cloud services is complex for the IT staff. IT staff must
require the extra capability and skills to manage, integrate, and maintain the data to the cloud.
VI. Spectre & Meltdown
Spectre & Meltdown allows programs to view and steal data which is currently processed on
computer. It can run on personal computers, mobile devices, and in the cloud. It can store the
password, your personal information such as images, emails, and business documents in the
memory of other running programs.
VII. Denial of Service (DoS) attacks
Denial of service (DoS) attacks occur when the system receives too much traffic to buffer the
server. Mostly, DoS attackers target web servers of large organizations such as banking sectors,
media companies, and government organizations. To recover the lost data, DoS attackers charge a
great deal of time and money to handle the data.
VIII. Account hijacking
Account hijacking is a serious security risk in cloud computing. It is the process in which
individual user's or organization's cloud account (bank account, e-mail account, and social media
account) is stolen by hackers. The hackers use the stolen account to perform unauthorized
activities.
7. Classify the Different types of Cloud Computing. [AN]
Types of Cloud
There are the following 4 types of cloud that you can deploy according to the organization's needs-
o Public Cloud
o Private Cloud
o Hybrid Cloud
o Community Cloud
Public Cloud
Public cloud is open to all to store and access information via the Internet using the pay-per-usage
method.
In public cloud, computing resources are managed and operated by the Cloud Service Provider
(CSP).
Example: Amazon elastic compute cloud (EC2), IBM SmartCloud Enterprise, Microsoft, Google
App Engine, Windows Azure Services Platform.
Advantages of Public Cloud
There are the following advantages of Public Cloud -
o Public cloud is owned at a lower cost than the private and hybrid cloud.
o Public cloud is maintained by the cloud service provider, so do not need to worry about the
maintenance.
Disadvantages of Public Cloud
o Public Cloud is less secure because resources are shared publicly.
o Performance depends upon the high-speed internet network link to the cloud provider.
o The Client has no control of data.
Private Cloud
Private cloud is also known as an internal cloud or corporate cloud. It is used by organizations
to build and manage their own data centers internally or by the third party. It can be deployed using
Opensource tools such as Openstack and Eucalyptus.
Based on the location and management, National Institute of Standards and Technology (NIST)
divide private cloud into the following two parts-
o On-premise private cloud
o Outsourced private cloud
Advantages of Private Cloud
There are the following advantages of the Private Cloud -
o Private cloud provides a high level of security and privacy to the users.
o Private cloud offers better performance with improved speed and space capacity.
Disadvantages of Private Cloud
o Skilled people are required to manage and operate cloud services.
o Private cloud is accessible within the organization, so the area of operations is limited.
Hybrid Cloud
Hybrid Cloud is a combination of the public cloud and the private cloud. we can say:
Hybrid Cloud = Public Cloud + Private Cloud
Hybrid cloud is partially secure because the services which are running on the public cloud can be
accessed by anyone, while the services which are running on a private cloud can be accessed only
by the organization's users.
Example: Google Application Suite (Gmail, Google Apps, and Google Drive), Office 365 (MS
Office on the Web and One Drive), Amazon Web Services.
Advantages of Hybrid Cloud
There are the following advantages of Hybrid Cloud -
o Hybrid cloud is suitable for organizations that require more security than the public cloud.
o Hybrid cloud helps you to deliver new products and services more quickly.
Disadvantages of Hybrid Cloud
o In Hybrid Cloud, security feature is not as good as the private cloud.
o Managing a hybrid cloud is complex because it is difficult to manage more than one type
of deployment model.
o In the hybrid cloud, the reliability of the services depends on cloud service providers.
Community Cloud
Community cloud allows systems and services to be accessible by a group of several organizations
to share the information between the organization and a specific community. It is owned, managed,
and operated by one or more organizations in the community, a third party, or a combination of
them.
Example: Health Care community cloud
Advantages of Community Cloud
There are the following advantages of Community Cloud -
o Community cloud is cost-effective because the whole cloud is being shared by several
organizations or communities.
o It provides better security than the public cloud.
o It provdes collaborative and distributive environment.
Disadvantages of Community Cloud
o Community cloud is not a good choice for every organization.
o Security features are not as good as the private cloud.
o It is not suitable if there is no collaboration.
o The fixed amount of data storage and bandwidth is shared among all community members.
8. Difference between public cloud, private cloud, hybrid cloud, and community cloud.
[AN]
The below table shows the difference between public cloud, private cloud, hybrid cloud, and
community cloud.
11. Demonstrate the Factors Driving Cloud Adoption in the Financial Sector. [UN]
A number of factors are driving its adoption at an accelerating rate.
1. Need for Agility and Flexibility
These systems are monolithic, heavily siloed, and difficult both to change and scale. In contrast,
cloud native platforms are scalable on-demand. When software features are released to distributed,
microservice-based applications by DevOps teams, organizations are able to innovate in response
to market changes and pivot as needed.
2. Transition towards OpEx Models
Traditionally, financial services organizations used on-prem data centers that required massive
CapEx investments to purchase space, equipment, software, and trained workforces. The cloud
offers a great opportunity to switch IT spending from CapEx to OpEx models.
3. Need for Real-Time Information
The need for real-time information is present in all industries but is especially strong in the
financial sector. These organizations often leverage high-frequency trading applications to analyze
tick-by-tick data and search for signals in markets, such as price changes and movements in rates.
4. Massive and Steadily Increasing Data Storage Requirements
Financial services organizations have always had to store massive amounts of data, and storing
this data properly has always been an important part of maintaining operations while minimizing
cost.
5. Changing Customer Behaviour
The behaviour of customers is changing quickly. A survey by Gallup found that, in 2018, only
66% of millennials had visited a physical bank branch in the past year. Younger generations are
increasingly using online banking services, especially on their phones. Banks need to offer
customers personalized digital experiences, services that are enabled by cloud native technologies.
6. New Kinds of Services
In addition to supporting more mobile experiences, cloud native technologies also offer banks the
ability to provide unique types of services in the marketplace. Customers are also looking for new
kinds of applications to support new investment types.
7. Security and Compliance
The financial industry operates around capital markets, banking, and insurance, and each
industry’s profile has very specific regulatory requirements. Offering improving security
mechanisms, public cloud providers can actually be much better than traditional systems at
systemic security services, such as identifying potential breaches. Organizations that
follow container security best practices can secure their applications against a wider range of
attacks.
8. Green IT
The urgency to reduce greenhouse gas emissions means financial services organizations are
looking for ways to lower the carbon footprint of their IT. The cloud will play a pivotal role in
enabling both.
12. Explain the Virtualization in Cloud Computing. [UN]
Virtualization is the "creation of a virtual (rather than actual) version of something, such as a
server, a desktop, a storage device, an operating system or network resources".
What is the concept behind the Virtualization?
Creation of a virtual machine over existing operating system and hardware is known as Hardware
Virtualization. A Virtual machine provides an environment that is logically separated from the
underlying hardware.
Types of Virtualization:
1. Hardware Virtualization.
2. Operating system Virtualization.
3. Server Virtualization.
4. Storage Virtualization.
1) Hardware Virtualization:
Hardware virtualization is mainly done for the server platforms, because controlling virtual
machines is much easier than controlling a physical server.
2) Operating System Virtualization:
Operating System Virtualization is mainly used for testing the applications on different platforms
of OS.
3) Server Virtualization:
Server virtualization is done because a single physical server can be divided into multiple servers
on the demand basis and for balancing the load.
4) Storage Virtualization:
Storage virtualization is the process of grouping the physical storage from multiple network
storage devices so that it looks like a single storage device.
How does virtualization work in cloud computing?
Virtualization plays a very important role in the cloud computing technology, normally in the
cloud computing, users share the data present in the clouds like application etc, but actually with
the help of virtualization users shares the Infrastructure.
13. Categorize of the Cloud Service Provider Companies. [AN]
Cloud Service providers (CSP) offers various services such as Software as a Service, Platform
as a service, Infrastructure as a service, network services, business applications, mobile
applications, and infrastructure in the cloud. The cloud service providers host these services in
a data center, and users can access these services through cloud provider companies using an
Internet connection.
There are the following Cloud Service Providers Companies -
1. Amazon Web Services (AWS)
AWS (Amazon Web Services) is a secure cloud service platform provided by Amazon. It offers
various services such as database storage, computing power, content delivery, Relational Database,
Simple Email, Simple Queue, and other functionality to increase the organization's growth.
Features of AWS
o AWS is cost-effective as it works on a pay-as-you-go pricing model.
o It provides various flexible storage options.
o It can efficiently manage and secure Windows workloads.
2. Microsoft Azure
Microsoft Azure is also known as Windows Azure. It supports various operating systems,
databases, programming languages, frameworks that allow IT professionals to easily build, deploy,
and manage applications through a worldwide network. It also allows users to create different
groups for related utilities.
Features of Microsoft Azure
o Microsoft Azure provides scalable, flexible, and cost-effective
o It allows developers to quickly manage applications and websites.
o It managed each resource individually.
3. Google Cloud Platform
Google cloud platform is a product of Google. It consists of a set of physical devices, such as
computers, hard disk drives, and virtual machines. It also helps organizations to simplify the
migration process.
Features of Google Cloud
o Google cloud includes various big data services such as Google BigQuery, Google
CloudDataproc, Google CloudDatalab, and Google Cloud Pub/Sub.
o It offers various scalable and high-performance
o GCP provides various serverless services such as Messaging, Data Warehouse, Database,
Compute, Storage, Data Processing, and Machine learning (ML)
4. IBM Cloud Services
IBM Cloud is an open-source, faster, and more reliable platform. It is built with a suite of advanced
data and AI tools. It offers various services such as Infrastructure as a service, Software as a
service, and platform as a service. You can access its services like compute power, cloud data &
Analytics, cloud use cases, and storage networking using internet connection.
Feature of IBM Cloud
o IBM cloud improves operational efficiency.
o Its speed and agility improve the customer's satisfaction.
o It offers Infrastructure as a Service (IaaS), Platform as a Service (PaaS), as well as Software
as a Service (SaaS)
o It offers various cloud communications services to our IT environment.
5. VMware Cloud
VMware cloud is a Software-Defined Data Center (SSDC) unified platform for the Hybrid Cloud.
It allows cloud providers to build agile, flexible, efficient, and robust cloud services.
Features of VMware
o VMware cloud works on the pay-as-per-use model and monthly subscription
o It provides better customer satisfaction by protecting the user's data.
o It eliminates the time and cost complexity.
6. Oracle cloud
Oracle cloud platform is offered by the Oracle Corporation. It combines Platform as a Service,
Infrastructure as a Service, Software as a Service, and Data as a Service with cloud infrastructure.
Features of Oracle cloud
o Its infrastructure uses various languages including, Java, Ruby, PHP, [Link].
o It integrates with Docker, VMware, and other DevOps tools.
o It maximizes the value of IT investments.
7. Red Hat
Red Hat virtualization is an open standard and desktop virtualization platform produced by Red
Hat. It is very popular for the Linux environment to provide various infrastructure solutions for
virtualized servers as well as technical workstations.
Features of Rad Hat
o Red Hat cloud includes OpenShift, which is an app development platform that allows
developers to access, modernize, and deploy apps
o It supports up to 16 virtual machines, each having up to 256GB of RAM.
o It offers better reliability, availability, and serviceability.
8. Digital Ocean
Digital Ocean is the unique cloud provider that offers computing services to the organization. It
was founded in 2011 by Moisey Uretsky and Ben. It is one of the best cloud provider that allows
us to manage and deploy web applications.
Features of DigitalOcean
o It uses the KVM hypervisor to allocate physical resources to the virtual servers.
o It provides high-quality performance.
o It offers a digital community platform that helps to answer queries and holding feedbacks.
9. Rackspace
Rackspace offers cloud computing services such as hosting web applications, Cloud Backup,
Cloud Block Storage, Databases, and Cloud Servers. The main aim to designing Rackspace is to
easily manage private and public cloud deployments. Its data centers operating in the USA, UK,
Hong Kong, and Australia.
Features of Rackspace
o Rackspace provides various tools that help organizations to collaborate and communicate
more efficiently.
o We can access files that are stored on the Rackspace cloud drive, anywhere, anytime using
any device.
o It offers 6 globally data centers.
o It can manage both virtual servers and dedicated physical servers on the same network.
10. Alibaba Cloud
Alibaba Cloud is used to develop data management and highly scalable cloud computing services.
It offers various services, including Elastic Computing, Storage, Networking, Security, Database
Services, Application Services, Media Services, Cloud Communication, and Internet of Things.
Features of Alibaba Cloud
o It globally deals with its 14 data centers.
o It offers scalable and reliable data storage.
14. How to create Amazon EC2 window instances? [RE]
To launch an instance
1. Sign in to the AWS Console and open the Amazon EC2 console.
2. From the navigation bar, select the region for the instance. Here we are going to choose
Singapore data region. Otherwise, this choice is important because some Amazon EC2
resources can be shared between regions, while others can't be.
3. On console dashboard, click Launch Instance.
4. To Choose an Amazon Machine Image (AMI) page displays a list of basic configurations
called Amazon Machine Images (AMIs) that serve as templates for your instance. Select
the 64-bit version of Microsoft Windows Server 2008 R2. Notice that this configuration is
marked as Free tier eligible.
5. To Choose an Instance Type page, you can select the hardware configuration for your
instance. The [Link] instance will be selected by default. Click Review and Launch to
let the wizard complete or not with other configuration settings for you, so you can get
started quickly.
6. To Review Instance Launch page, you need to go to the settings for your instance.
7. Click on Launch.
8. In the Select an existing key pair or create a new key pair dialog box, you can
select Choose an existing key pair, to select a key pair you already created.
9. A confirmation page will open to know that your instance is launching. Click View
Instances to close the confirmation page and return to the console.
10. On the Instances page, you can view the status of the launch. It takes a short time for an
instance to launch. When you launch an instance, its initial state is pending. After the
instance starts, its state changes to running and it receives a public DNS name.
11. Record the public DNS name for your instance because you'll need it for the next step.
12. (Optional) After your instance is launched, you can view its security group rules. From the
Instances page, select the instance. In the Description tab, find Security groups and
click view rules.
15. Contrast between AWS, Azure, and Google Cloud Platform. [UN]
App Testing It uses device farm It uses DevTest labs It uses Cloud
Test labs.