0% found this document useful (0 votes)
224 views12 pages

Distributed Systems Overview and Challenges

The document discusses key concepts in distributed systems including: 1) Distributed systems are collections of independent computers that work together to perform tasks. They face challenges of transparency, flexibility, and scaling. 2) Middleware provides services to simplify programming distributed systems and hides heterogeneity between nodes. 3) Techniques for scaling distributed systems include decentralization, hiding communication latencies, and replication. Distributed algorithms must work with partial information and node failures.
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
224 views12 pages

Distributed Systems Overview and Challenges

The document discusses key concepts in distributed systems including: 1) Distributed systems are collections of independent computers that work together to perform tasks. They face challenges of transparency, flexibility, and scaling. 2) Middleware provides services to simplify programming distributed systems and hides heterogeneity between nodes. 3) Techniques for scaling distributed systems include decentralization, hiding communication latencies, and replication. Distributed algorithms must work with partial information and node failures.
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd

A distributed system is a collection of  Reliability

independent computers that are  Performance


used jointly to perform a single task  Scalability
or to provide a single service.

Flexibility:
Network OS:
➜ Build a system out of (only)
Properties required components
 No single system image ➜ Extensibility: Components/services
 Individual nodes are highly can be changed or added
autonomous
 All distribution of tasks is ➜ Openness of interfaces and
explicit to the user specification
 Examples: Linux, Windows ➜ Allows reimplementation
and extension

Middleware OS: ➜ Interoperability

 Similar to a uniprocessor OS ➜ Separation of policy and


but mechanism
 System independent interface ➜ Standardized internal
for distributed programming
interfaces
 Improves transparency(e.g,
hides heterogeneity)
 Provides services(e.g., naming Techniques for Scaling:
services)
 Decentralisation
 Provides programming
model(e.g., distributed objects)  Hiding communication
latencies(asynchronous
communication, reduce
The distributed nature of a system communication)
gives rise to the following challenges:
 Distribution ( spreading
 Transparency data and control around)
 Flexibility
 Replication (making  Synchronization
copies of data and  Replication and Consistency
processes).  Naming
 Fault Tolerance
 Security
Decentralisation:
Avoid centralising
Most distributed systems are built
 Services( e.g., based on a particular paradigm (or
single server) model):
 Data (e.g., central  Shared memory
directories)  Distributed objects
 Algorithms(e.g.,  Distributed file system
based on  Shared documents
complete  Distributed coordination
information)  Agents

With regards to algorithms: MISCELLANEOUS ‘RULES OF THUMB’


➜ Do not require any machine  Trade-offs: Many of the
to hold complete system state challenges provide conflicting
➜ Allow nodes to make requirements. For example
decisions based on local info better scalability can cause
➜ Algorithms must survive worse overall performance.
failure of nodes Have to make trade-offs -what
➜ No assumption of a global is more important?
clock  Separation of Concerns: Split a
problem into individual
concerns and address each
Several key principles underlying all separately.
distributed systems:
 End-to-End Argument: Some
 Concurrency communication functions can
 Communication
only be reliably implemented
at the application level
STRUCTURED OVERLAY
 Policy vs. Mechanism: A
⮚ Nodes have identifier and
system should build
range
mechanisms that allow flexible
application of policies. Avoid ⮚ Data has identifier
built-in policies.
⮚ Nodes is responsible for data
that falls in its range
System Architecture ⮚ Search is routed to appropriate
⮚ Placement of machines node

⮚ Placement of software
machines Combination of architectures:
Example:
Software Architecture ⮚ Superpeer networks
Logical organization and roles of ⮚ Collaborative distributed
software components systems
⮚ Layered ⮚ Edge-server systems
⮚ Object-oriented

⮚ Data-centered STATEFUL VS STATELESS SERVERS:


⮚ Event-based Stateful:
⮚ Keeps persistent
information about
UNSTRUCTURED OVERLAY
clients
⮚ Data stored at random nodes
⮚ Improved
⮚ Partial view: node’s list of performance
neighbours
⮚ Expensive crash
⮚ Exchange partial views with recovery
neighbours to update
⮚ Must track clients ➀Data oriented vs control oriented
communication
Stateless:
➁ Synchronous vs asynchronous
⮚ Does not keeps state of clients
communication
⮚ Soft state design: limited client
➂ Transient vs persistent
state
communication
⮚ Can change own state
without informing clients
Data-Oriented vs Control-Oriented
⮚ No cleanup after crash
Communication:
⮚ Easy to replicate
Data-oriented communication
⮚ Increase communication
➜ Facilitates data exchange
between threads
Code Mobility: ➜ Shared address space,
shared memory & message passing
Weak vs Strong Mobility
⮚ Weak transfer only code Control-oriented communication
⮚ Strong transfer code and ➜ Associates a transfer of
execution segment control with communication
➜ Active messages, remote
procedure call (RPC) & remote
Sender vs Receiver initiated method invocation (RMI)
migration:
COMMUNICATION ABSTRACTIONS:
⮚ Sender Send program to
compute server Abstractions above simple message
passing make communication easier
⮚ Receiver Download applets
for the programmer.
Provided by higher level APIs

COMMUNICATION MODES
• ➀ Remote Procedure ➜ Service discovery
Call (RPC) & Remote
➜ Event notification
Method Invocation (RMI)
• ➁ Message-Oriented
Communication GOSSIP-BASED COMMUNICATION

• ➂ Group Communication Data Streams:

• ➃ Streams ➜ Sequence of data units

➜ Can apply to discrete and


continuos media (e.g., TCP
REMOTE PROCEDURE CALL(RPC):
connection is a stream)
Idea: Replace I/O oriented message
➜ Simple stream: single sequence of
passing model by execution of a
data
procedure call on a remote node
[BN84]: ➜ Complex stream: several related
streams (substreams) (e.g.,audio and
video)
➜ Synchronous - based on blocking
messages
Transmission modes:
➜ Message-passing details hidden
from application  Asynchronous no timing
constraints (e.g., file transfer)
➜ Procedure call parameters used to
 Synchronous maximum end-to-
transmit data
end delay for each data unit
➜ Client calls local “stub” which does (e.g., sending sampled data.
messaging and marshalling Must not be too old when it
reaches destination)
 Isochronous maximum and
GROUP-BASED COMMUNICATION minimum end-to-end delay
(e.g., video)
Used for:

➜ Replication of services
Timing model of a distributed system
➜ Replication of data
Affected by: ➜ Execution speed/time of processes
➜ Execution speed/time of processes ➜ Communication delay
➜ Communication delay ➜ Clocks & clock drift
➜ Clocks & clock drift ➜ (Partial) failure
➜ (Partial) failure
With coordination you should agree
on Values:
EVALUATING DISTRIBUTED
ALGORITHMS ➜ Agree on global value
General Properties: ➜ Agree on environment
➜ Performance ➜ Agree on state
 number of messages
exchanged
Main issues of coordination and
 response/wait time
synchronization.
 Delay
 throughput: 1/(delay + • Time and Clocks:
executiontime) • Global State:
 complexity: O()
• Concurrency Control:
➜ Efficiency
 resource usage: memory, CPU,
etc. Synchronization Modes NETWORK
TIME PROTOCOL (NTP):
➜ Scalability

➜ Reliability Multicast: for LAN, low accuracy


Procedure Call: clients poll,
 number of points of failure
reasonable accuracy
(low is good)
Symmetric: Between peer servers.
highest accuracy
Timing model of a distributed system
Affected by:
Read about LOGICAL CLOCKS – 1. Released: Outside of critical
Lamport’s logical clock section
Read about VECTOR CLOCKS 2. Wanted: Waiting to enter
critical section
Read about: CHANDY & LAMPORT’S
SNAPSHOTS 3. Held: Inside critical section

DISTRIBUTION MUTUAL EXCLUSION EVALUATING DISTRIBUTED


Requirements: ALGORITHMS
• ➀ Safety: At most one process General Properties:
may execute the critical section at
➜ Performance
a time
number of messages exchanged
• ➁ Liveness: Requests to enter and
exit the critical section eventually response/wait time
succeed Delay
• ➂ Ordering: Requests are throughput: 1/(delay +
processed in happened-before executiontime)
ordering
complexity: O()

METHOD 1: CENTRAL SERVER ➜ Efficiency

METHOD 2: TOKEN RING resource usage: memory, CPU, etc.

METHOD 3: USING MULTICAST AND ➜ Scalability


LOGICAL CLOCKS ➜ Reliability
number of points of failure (low is
Algorithm by Ricart & Agrawala: good)

➜ Processes pi maintain a Lamport


clock and can communicate pairwise
➜ Processes are in one of three
states:
SYNCHRONISATION AND OTHER Multicast ISSUES
COORDINATION 1
• Multicast
Performance:
• Elections ➜ Bandwidth
• Transactions ➜ Delay
Efficiency:
Examples of Multi-Cast
➜ Avoid sending a message over
Fault Tolerance a link multiple times (stress)
Service Discovery ➜ Distribution tree
Performance ➜ Hardware support (e.g.,
Event or Notification propagation Ethernet broadcast)
Network-level vs Application-level:
➜ Network routers understand
multicast
➜ Applications (or middleware)
send unicasts to group members
➜ Overlay distribution tree

Types of MultiCast:
BASIC MULTICAST
FIFO MULTICAST
CAUSAL MULTICAST
TOTALLY ORDERED MULTICAST
Sequence Based
Properties of Multicast
Aggreement-based
Group membership:
Open vs Closed group: Other possibilities:
Reliability: ➜ Moving sequencer
Ordering: ➜ Logical clock based
• each receiver
determines order independently
• delivery based REPLICATION I SSUES
on sender timestamp ordering
Updates

➜ Token based Replica placement


➜ Physical clock ordering Redirection/Routing
Hybrid Ordering:
➜ FIFO + Total
➜ Causal + Total Replica INCONSISTENCY
Staleness:

Research Bully Algorithm: Operation order:

Three types of messages in Bully Conflicting Data


algorithm:
• Election: announce election
• Answer: response to election TYPES OF DATA-CENTRIC
• Coordinator: announce CONSISTENCY MODELS Y:
elected coordinator STRICT CONSISTENCY
Read about Election RING SEQUENTIAL CONSISTENCY
ALGORITHM
CAUSAL CONSISTENCY
WEAK CONSISTENCY
ACID PROPERTIES OF TRANSACTIONS
RELEASE CONSISTENCY
atomic
ENTRY CONSISTENC
consistent
isolated
durable
CLIENT-CENTRIC CONSISTENCY
MODELS:
CLASSIFICATION OF TRANSACTION
Flat
Nested
Distributed CONSISTENCY PROTOCOLS:
Primary-Based Protocols:
➜ Remote-write protocols • Maintainability: how easily a
failed system can be repaired
➜ Local-write protocols
Replicated-Write Protocols:
CATEGORISING FAULTS AND
➜ Active Replication FAILURES
➜ Quorum-Based Protocols • Types of Faults:
• Transient Fault: occurs
PUSH vs PULL once then disappear

REPLICA PLACEMENT • Intermittent Fault:


occurs, vanishes,
DYNAMIC REPLICATION reoccurs, vanishes, etc.
• Permanent Fault:
DEPENDABILITY DEPENDS ON: persists until faulty
component is replaced
• Failure
• Types of Failures:
• Reliable Communication
• Process Failure: process
• Process Resilience
proceeds incorrectly or
• Recovery not at all
• Storage Failure: “stable”
secondary storage is
DEPENDABILITY DEPENDS ON:
inaccessible
• Availability: system is ready to
Communication Failure:
be used immediately
communication link or
• Reliability: system can run node failure
continuously without failure
• Safety: when a system
FAILURE MODELS:
(temporarily) fails to operate
correctly, • Crash Failure: a server halts,
nothing catastrophic happens but works correctly until it
halts
• Fail-Stop: server will Arbitrary Failure: a server may
stop in a way that clients produce arbitrary response at
can tell that it has arbitrary times (aka Byzantine failure)
halted.
• Fail-Resume server will
• FAULT TOLERANCE Techniques:
stop, then resume
execution at a later time. • ➜ Prevention: prevent
or reduce occurrence of
• Fail-Silent: clients do not
faults
know server has halted
• ➜ Prediction: predict
• Omission Failure: a server fails
the faults that can occur
to respond to incoming
and deal with them
requests
• Receive Omission: fails • ➜ Masking: hide the
to receive incoming occurrence of the fault
messages • ➜ Recovery: restore an
• Send Omission: fails to erroneous state to an
send messages error-free state

• Timing Failure: a server’s


response lies outside the STUDY Byzantine Fault Tolerance
specified
time interval
• Response Failure: a server’s DS CHALLENGES
response is incorrect Transparency
• Value Failure: the value Flexibility
of the response is wrong
Dependability
• State Transition Failure:
Performance
the server deviates from
the correct flow Scalability
of control

Three designs of replication:


➜ Explicit replication: The client
explicitly writes files to multiple
servers (not transparent).
➜ Lazy file replication: Server
automatically copies files to other
servers after file is written.
➜ Group file replication: WRITEs
simultaneously go to a group of
servers.

Read about case studies:


➜ Network File System (NFS)

➜ Andrew File System (AFS) & Coda

➜ Google File System (GFS)

Common questions

Powered by AI

Consistency models in distributed systems set the rules for data visibility and access, directly impacting data reliability and accessibility . Strict consistency models ensure that every read receives the most recent write, enhancing data reliability at the cost of accessibility due to the potential for increased latency . Sequential and eventual consistency models relax these rules, allowing systems to be more accessible and scalable by providing eventual data convergence . However, this can initially lead to divergent views at different nodes, affecting immediate data accuracy and might require reconciliation efforts . Thus, choosing a consistency model requires balancing between immediate data reliability and long-term accessibility .

The separation of policy and mechanism allows distributed systems to be more flexible by decoupling the decision-making processes (policies) from the implementation tasks (mechanisms). This separation enables the system to apply different policies without altering the underlying mechanisms, facilitating easy adaptation and reconfiguration . It also permits the creation of standardized interfaces, enhancing interoperability and extensibility . As such, system components can focus on executing mechanisms while varying policies to cater to specific needs, thus supporting a versatile and adaptive system architecture .

Improving scalability in distributed systems often involves decentralization, distribution, and replication to handle growing amounts of data and control . While these approaches help in handling increased loads, they can lead to performance issues such as increased complexity in data consistency and higher communication overhead . For instance, replication improves data availability but can result in data inconsistency which requires additional resources for synchronization . Therefore, these trade-offs involve balancing scalability improvements with the overhead costs and potential degradation of system performance .

Weak code mobility involves transferring only the code, leaving the execution state behind, which simplifies communication but limits execution context carrying . This means the code can be moved to another node, but it must recompute or reacquire any required state . In contrast, strong code mobility includes transferring both the code and its execution state, allowing execution to resume seamlessly at the destination . While this approach provides more context and continuity in execution, it demands more sophisticated support systems to manage the transfer of execution states, making implementations more complex and resource-intensive .

The main challenges of transparency in distributed systems include hiding the complexity and distribution of resources, ensuring location transparency, and abstracting the underlying heterogeneous components . These challenges can impact system design by necessitating middleware solutions that provide a system-independent interface and hide heterogeneity . This leads to enhanced extensibility and interoperability, allowing the system to adapt to changes and seamlessly integrate various components . Transparency challenges thus influence the choice of architecture and middleware to ensure an intuitive and cohesive user experience despite the complex back-end operations .

Multicast communication enhances fault tolerance in distributed systems by allowing messages to be sent to multiple receivers simultaneously, supporting efficient propagation for replication and redundancy . This capability is advantageous for service discovery, event notification, and replicating data or services without overloading the network with multiple unicast messages . However, multicast communication can introduce complexities in ensuring message ordering and reliability, especially concerning constraint fulfillment like FIFO or causal ordering . Moreover, network-level multicast requires direct support from network infrastructures, which can limit deployment flexibility and increase the dependency on network topology . As such, while multicast provides scalable and efficient fault-tolerant communication, it requires careful handling of order and reliability constraints .

The Network Time Protocol (NTP) affects synchronization by allowing distributed systems to align clocks across different nodes, ensuring accurate timekeeping . Among the NTP approaches, the symmetric mode provides the highest accuracy because it involves peer-to-peer interactions that account for variable network delays, leading to more precise time adjustments compared to multicast mode for LAN or client polling methods . This improves the coordination of time-sensitive operations and reduces potential errors due to clock drift in distributed networks .

Data-oriented communication in distributed systems focuses on data exchange, leveraging shared memory or message passing, which facilitates high data throughput and efficient data sharing between threads . It enhances performance in data-intensive tasks due to reduced overhead in data transmission. On the other hand, control-oriented communication integrates control transfers with data communication, like in Remote Procedure Calls (RPCs) and Remote Method Invocation (RMI), supporting complex interaction patterns and synchronization between distributed components . While data-oriented models optimize for data transfer efficiency and simplicity, control-oriented models provide enhanced functionality and coordination capabilities, though at the potential cost of performance overhead due to increased control messaging .

Fault tolerance techniques play a crucial role in maintaining dependability by ensuring systems continue to operate correctly even in the presence of faults . Techniques such as prevention, prediction, masking, and recovery can prevent system failures, predict potential faults to preemptively mitigate them, hide faults from users to maintain operational consistency, and restore systems to error-free states after faults occur . These methods help achieve high availability, reliability, safety, and maintainability, which are essential attributes of dependable distributed systems . Fault-tolerant systems are designed to handle transient, intermittent, and permanent faults, thereby ensuring continuous service provision and reducing downtime .

Remote Procedure Calls (RPCs) enhance communication in distributed systems by abstracting the complexities of message-passing methods, allowing procedures to be executed on remote machines as if they were local calls . RPCs hide the details of networking and data transmission, making it easier for developers to focus on higher-level application logic . They encapsulate communication details within a 'stub,' which automatically handles messaging, marshaling, and unmarshaling of data between the client and server . This paradigm shift from an I/O oriented model to an execution-based model simplifies the programming of distributed applications, improves code readability, and maintains consistency across networked components .

You might also like