Key Concepts in Distributed Systems
In synchronous executions, processes advance in lock-step rounds driven by a global clock or by known bounds on message delay and processing time, which makes event ordering predictable and the system easier to reason about and debug. An example is a tightly coupled parallel computation that proceeds in synchronized steps. Asynchronous executions do not rely on global synchronization: each process runs at its own pace and coordinates only through message passing. This can improve efficiency and fault tolerance, but event ordering must then be handled explicitly. An example is an asynchronous web server handling multiple client requests independently.
Message-passing systems communicate through explicit messages between processes. This gives a clear separation between processes and better scalability, but requires communication protocols that are more complex to design. Shared-memory systems let processes access a common memory space directly, which makes programming easier and data access faster, but keeping that memory coherent is difficult and limits scalability. Message passing is advantageous in distributed environments where direct memory access across nodes is impractical, while shared memory is better suited to tightly coupled systems.
Lamport's algorithm achieves mutual exclusion using timestamped requests that are broadcast to all other processes in the system. Each process maintains a priority queue of requests ordered by timestamp, with ties broken by process ID. A process enters the critical section only when its own request is at the head of its queue and it has received a message with a later timestamp from every other process; on exit it broadcasts a release so that the others can remove its request. This serves requests in timestamp order and prevents conflicts, but it assumes FIFO channels and adds latency: each critical-section entry costs 3(N-1) messages in a system of N processes, and a requester must wait for replies from all others.
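The request, reply, and release steps can be sketched as a small single-threaded simulation. This is a minimal sketch under stated assumptions, not a production implementation: the names Process, simulate, and the single shared deque standing in for reliable FIFO channels are all hypothetical, and every process that wants the critical section requests it once at the start.

```python
import heapq
from collections import deque

REQUEST, REPLY, RELEASE = "REQUEST", "REPLY", "RELEASE"


class Process:
    def __init__(self, pid, n, network):
        self.pid, self.n, self.network = pid, n, network
        self.clock = 0        # Lamport scalar clock
        self.queue = []       # min-heap of pending requests as (timestamp, pid)
        self.request = None   # this process's own outstanding request
        self.latest = {}      # last timestamp received from each other process

    def _broadcast(self, kind):
        # One timestamp is shared by all copies of a broadcast message.
        self.clock += 1
        for dest in range(self.n):
            if dest != self.pid:
                self.network.append((dest, self.pid, kind, self.clock))
        return self.clock

    def request_cs(self):
        ts = self._broadcast(REQUEST)
        self.request = (ts, self.pid)
        heapq.heappush(self.queue, self.request)

    def on_message(self, src, kind, ts):
        self.clock = max(self.clock, ts) + 1
        self.latest[src] = ts
        if kind == REQUEST:
            # Queue the remote request and acknowledge it with a timestamped reply.
            heapq.heappush(self.queue, (ts, src))
            self.clock += 1
            self.network.append((src, self.pid, REPLY, self.clock))
        elif kind == RELEASE:
            # The sender has left the critical section: drop its request.
            self.queue = [r for r in self.queue if r[1] != src]
            heapq.heapify(self.queue)

    def can_enter(self):
        # Condition 1: own request is at the head of the local queue.
        if self.request is None or self.queue[0] != self.request:
            return False
        # Condition 2: a later-stamped message has arrived from every other process.
        return all((self.latest.get(p, 0), p) > self.request
                   for p in range(self.n) if p != self.pid)

    def release_cs(self):
        self.queue.remove(self.request)
        heapq.heapify(self.queue)
        self.request = None
        self._broadcast(RELEASE)


def simulate(n, requesters):
    """Return the order in which the requesting processes enter the critical section."""
    network = deque()   # one global FIFO queue; this preserves per-channel FIFO order
    procs = [Process(pid, n, network) for pid in range(n)]
    for pid in requesters:
        procs[pid].request_cs()
    order = []
    while len(order) < len(requesters):
        while network:
            dest, src, kind, ts = network.popleft()
            procs[dest].on_message(src, kind, ts)
        entering = [p for p in procs if p.can_enter()]
        assert len(entering) == 1, "mutual exclusion violated or no progress"
        order.append(entering[0].pid)
        entering[0].release_cs()
    return order
```

In this sketch, simulate(3, (2, 0)) lets processes 2 and 0 request concurrently. Both requests carry timestamp 1, so the process-ID tie-break decides and process 0 enters first.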
Cloud computing is characterized by on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service. It differs from traditional IT infrastructure by offering users the ability to quickly scale resources up or down, access resources over the internet from any location, and pay based on usage. Additionally, cloud computing supports multi-tenancy, allowing multiple users to share the underlying resources efficiently.
Checkpointing and rollback recovery periodically save the state of a system so that it can roll back to a safe point after a failure. Coordinated checkpointing synchronizes all processes so that their checkpoints together form a consistent global state, which simplifies recovery by ruling out the inconsistencies that partial checkpointing can cause. Uncoordinated (asynchronous) checkpointing lets each process checkpoint without explicit synchronization. This reduces waiting and improves performance during normal operation, but recovery becomes more complex: rollback dependencies between processes can force cascading rollbacks (the domino effect), and the recorded states may not form a consistent global state.
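Coordinated checkpointing is often organized as two phases: every process first takes a tentative checkpoint, and the checkpoints become permanent only if all processes succeed. The following is a minimal in-memory sketch of that idea. The names Worker, coordinated_checkpoint, and recover are hypothetical, state is a plain dictionary, and the sketch assumes no messages are in transit while the checkpoint is taken, which a real protocol would have to enforce or record.

```python
import copy


class Worker:
    def __init__(self, name):
        self.name = name
        self.state = {"counter": 0}
        self._tentative = None
        self._committed = copy.deepcopy(self.state)   # initial state is the first checkpoint

    def take_tentative(self):
        # Phase 1: save a private copy of the current state.
        self._tentative = copy.deepcopy(self.state)
        return True

    def commit(self):
        # Phase 2: the tentative checkpoint becomes the recovery point.
        self._committed, self._tentative = self._tentative, None

    def discard(self):
        self._tentative = None

    def rollback(self):
        # Restore the last committed checkpoint.
        self.state = copy.deepcopy(self._committed)


def coordinated_checkpoint(workers):
    """Commit a checkpoint on all workers, or on none of them."""
    if all(w.take_tentative() for w in workers):
        for w in workers:
            w.commit()
        return True
    for w in workers:
        w.discard()
    return False


def recover(workers):
    """Roll every worker back to the same coordinated checkpoint."""
    for w in workers:
        w.rollback()
```

Because every worker commits in the same round, recovery restores all of them to one global state. No worker has to search through older checkpoints for a consistent combination.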
Virtualization enables cloud computing by allowing multiple virtual machines to run on a single physical machine, providing scalable and isolated environments. It enhances scalability by allowing resources to be allocated dynamically as demand fluctuates, and it facilitates load balancing by distributing workloads across multiple virtualized instances. This improves resource utilization and system performance, and lets the cloud absorb varying loads without tying each workload to a specific physical machine.
In synchronous systems, consensus and agreement algorithms rely on a known upper bound on message delivery time and processing delay, which simplifies fault detection and decision-making. Failures are more straightforward to handle because a process that does not respond within the bound can safely be declared failed. In asynchronous systems there is no such bound, so a failed process cannot be distinguished from a slow one. Algorithms must fall back on mechanisms such as timeout-based failure suspicion, which can be wrong, or round-based agreement protocols that may need an unbounded number of rounds to terminate, which complicates failure handling.
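The difficulty of telling a slow process from a failed one can be illustrated with a timeout-based failure detector. This is a minimal sketch with hypothetical names (TimeoutFailureDetector, heartbeat, suspects). Time is passed in explicitly as a number so that the example is deterministic.

```python
class TimeoutFailureDetector:
    def __init__(self, timeout):
        self.timeout = timeout
        self.last_heartbeat = {}   # pid -> time of the most recent heartbeat

    def heartbeat(self, pid, now):
        self.last_heartbeat[pid] = now

    def suspects(self, now):
        # A process is suspected once it has been silent for longer than the timeout.
        return {pid for pid, seen in self.last_heartbeat.items()
                if now - seen > self.timeout}


detector = TimeoutFailureDetector(timeout=3)
for pid in ("p1", "p2", "p3"):
    detector.heartbeat(pid, now=0)

detector.heartbeat("p1", now=2)
detector.heartbeat("p1", now=4)
suspected_at_4 = detector.suspects(now=4)   # p2 is only slow, p3 has crashed

detector.heartbeat("p2", now=5)             # the delayed heartbeat finally arrives
suspected_at_5 = detector.suspects(now=5)
```

At time 4 the detector suspects both p2 and p3, although only p3 has failed. The suspicion of p2 is withdrawn when its late heartbeat arrives. In a synchronous system, a timeout chosen above the known delay bound would never produce such a false suspicion.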
Design issues and challenges in distributed systems include handling concurrency, ensuring the system's reliability and availability despite failures, maintaining security across networked components, and achieving efficient communication and synchronization between distributed processes. Unlike centralized systems where these aspects are managed within a single system unit, distributed systems must address these issues over a network of interconnected devices, introducing complexities like network latency, partitioning, and independent failures.
Logical clocks are used in distributed systems to order events and maintain a consistent state across nodes without relying on precise physical time synchronization. Scalar time, such as Lamport timestamps, is simple and, with ties broken by process ID, gives a total order of events that is consistent with causality: if event a happened before event b, then a has the smaller timestamp. The converse does not hold, so a smaller timestamp does not show that two events are causally related. Vector time addresses this by giving each process one counter per process in the system. Comparing two vector timestamps tells whether one event causally precedes the other or whether the two are concurrent, at the cost of larger timestamps.
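The difference between scalar and vector time shows up in a two-process example. The sketch below uses hypothetical class and function names (LamportClock, VectorClock, happened_before, concurrent) and applies the standard update rules: increment on a local or send event, and take the maximum and then increment on a receive.

```python
class LamportClock:
    """Scalar logical clock: one counter per process."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event or send event.
        self.time += 1
        return self.time

    def receive(self, msg_time):
        # Jump past the sender's timestamp, then count the receive event.
        self.time = max(self.time, msg_time) + 1
        return self.time


class VectorClock:
    """Vector logical clock: one counter per process in the system."""

    def __init__(self, pid, n):
        self.pid = pid
        self.v = [0] * n

    def tick(self):
        self.v[self.pid] += 1
        return tuple(self.v)

    def receive(self, msg_v):
        # Component-wise maximum, then count the receive event locally.
        self.v = [max(a, b) for a, b in zip(self.v, msg_v)]
        self.v[self.pid] += 1
        return tuple(self.v)


def happened_before(a, b):
    # a -> b iff a <= b in every component and a != b.
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a, b):
    return not happened_before(a, b) and not happened_before(b, a)


# P1 performs local event b; P0 performs event a, then sends (event s); P1 receives (event r).
l0, l1 = LamportClock(), LamportClock()
v0, v1 = VectorClock(0, 2), VectorClock(1, 2)

b_scalar, b_vector = l1.tick(), v1.tick()
a_scalar, a_vector = l0.tick(), v0.tick()
s_scalar, s_vector = l0.tick(), v0.tick()
r_scalar, r_vector = l1.receive(s_scalar), v1.receive(s_vector)
```

Here b has scalar timestamp 1 and s has scalar timestamp 2, which suggests that b came first. The vector timestamps (0, 1) and (2, 0) show that the two events are in fact concurrent. For the causally related pair s and r, both clocks agree on the order.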
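The Chandy-Lamport snapshot algorithm records a consistent global state of a distributed system without halting its operation, which is useful for consistency checks and debugging. It assumes reliable FIFO channels. The initiating process records its local state and sends a marker message on each of its outgoing channels. A process that receives a marker for the first time records its own state, records the channel the marker arrived on as empty, and sends markers on all of its outgoing channels. Messages that arrive on its other incoming channels before the marker for that channel are recorded as that channel's state. The resulting snapshot may not match the state of the system at any single instant, but it is consistent with the causal dependencies of the system: no message is recorded as received without also being recorded as sent.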
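The marker rules can be sketched with three nodes that transfer money to one another, where a correct snapshot must account for the full total even though a transfer is in flight. This is a minimal single-threaded sketch under stated assumptions: the names Node and run_demo are hypothetical, channels are in-memory FIFO deques, and messages are delivered in a fixed order chosen for the example.

```python
from collections import deque

MARKER = "MARKER"


class Node:
    def __init__(self, pid, balance, peers):
        self.pid, self.balance, self.peers = pid, balance, peers
        self.recorded_state = None   # local state captured by the snapshot
        self.channel_state = {}      # src -> messages recorded as in transit
        self.recording = set()       # incoming channels still being recorded

    def start_snapshot(self, channels):
        # Record the local state, then send a marker on every outgoing channel.
        self.recorded_state = self.balance
        self.recording = set(self.peers)
        self.channel_state = {p: [] for p in self.peers}
        for p in self.peers:
            channels[(self.pid, p)].append(MARKER)

    def send(self, dest, amount, channels):
        self.balance -= amount
        channels[(self.pid, dest)].append(amount)

    def receive(self, src, channels):
        msg = channels[(src, self.pid)].popleft()
        if msg == MARKER:
            if self.recorded_state is None:
                self.start_snapshot(channels)   # first marker: record and relay
            self.recording.discard(src)         # stop recording this channel
        else:
            self.balance += msg
            if src in self.recording:
                # Arrived after the local record but before the channel's marker.
                self.channel_state[src].append(msg)


def run_demo():
    ids = [0, 1, 2]
    nodes = [Node(i, 100, [j for j in ids if j != i]) for i in ids]
    channels = {(i, j): deque() for i in ids for j in ids if i != j}

    nodes[0].send(1, 10, channels)      # transfer sent before the snapshot starts
    nodes[0].start_snapshot(channels)   # node 0 initiates
    nodes[1].send(0, 20, channels)      # in flight while the snapshot runs
    nodes[2].receive(0, channels)       # node 2 sees its first marker

    # Deliver everything that is still queued, one message per channel per pass.
    while any(channels.values()):
        for (src, dest), ch in channels.items():
            if ch:
                nodes[dest].receive(src, channels)

    states = [n.recorded_state for n in nodes]
    in_transit = {(src, n.pid): msgs
                  for n in nodes for src, msgs in n.channel_state.items() if msgs}
    return states, in_transit
```

In this run the recorded balances sum to 280 and the remaining 20 is captured as the state of the channel from node 1 to node 0, so the snapshot accounts for the full 300 without any node pausing its transfers.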