
Overview of Shared Memory Systems

Shared memory systems connect processors to a global shared memory. Communication between processors occurs through reading and writing to shared memory. Performance can be impacted by contention when multiple processors access memory simultaneously. Cache coherency issues can also arise when copies of data in caches become inconsistent. Uniform memory access (UMA) systems provide equal access times to all memory for all processors. Non-uniform memory access (NUMA) systems attach local memory to each processor, resulting in non-uniform access times depending on data location.

Uploaded by

Pranav Kasliwal
Copyright
© All Rights Reserved

Shared Memory Architecture

Figure 1: Shared memory systems.

• Shared memory systems form a major category of multiprocessors. In this category, all processors share a global memory (see Fig. 1).
• Communication between tasks running on different processors is performed through writing to and reading from the global memory.
• All interprocessor coordination and synchronization is also accomplished via the global memory.
• Two main problems need to be addressed when designing a shared memory system:
1. Performance degradation due to contention. Performance may degrade when multiple processors try to access the shared memory simultaneously. A typical design uses caches to reduce contention.
2. Coherence problems. Having multiple copies of data spread throughout the caches can lead to a coherence problem. The copies in the caches are coherent if they all hold the same value. If one of the processors overwrites the value of one copy, that copy becomes inconsistent because it no longer equals the value of the other copies.
• Scalability remains the main drawback of a shared memory system.
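
The coherence problem described above can be illustrated with a small sketch. This is a toy model, not a real cache design: the `Cache` class, the `copies_coherent` helper, and the address `0x10` are hypothetical names chosen for illustration, and no coherence protocol is modeled.

```python
class Cache:
    """A private cache holding copies of shared-memory words (no coherence protocol)."""

    def __init__(self, memory):
        self.memory = memory  # reference to the global shared memory
        self.lines = {}       # address -> locally cached value

    def read(self, addr):
        # On a miss, fetch from global memory; on a hit, use the local copy.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Write only to the local copy: nothing notifies the other caches.
        self.lines[addr] = value


def copies_coherent(caches, addr):
    """True if every cache that holds addr holds the same value."""
    values = {c.lines[addr] for c in caches if addr in c.lines}
    return len(values) <= 1


memory = {0x10: 5}
p1, p2 = Cache(memory), Cache(memory)
p1.read(0x10)
p2.read(0x10)
coherent_before = copies_coherent([p1, p2], 0x10)  # both copies hold 5
p1.write(0x10, 9)                                  # P1 overwrites its copy
coherent_after = copies_coherent([p1, p2], 0x10)   # copies now differ
stale_value = p2.read(0x10)                        # P2 hits its old copy
```

In this sketch `p2` still reads the old value after `p1` writes a new one, which is the kind of inconsistency a coherence mechanism would have to prevent.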

Classification of Shared Memory Systems

Figure 2: Shared memory via two ports.

• The simplest shared memory system consists of one memory module (M) that can be accessed from two processors, P1 and P2 (see Fig. 2).
  o Requests arrive at the memory module through its two ports. An arbitration unit within the memory module passes requests through to a memory controller.
  o If the memory module is not busy and a single request arrives, then the arbitration unit passes that request to the memory controller and the request is satisfied.
  o The module is placed in the busy state while a request is being serviced. If a new request arrives while the memory is busy servicing a previous request, the memory module sends a wait signal, through the memory controller, to the processor making the new request.
  o In response, the requesting processor may hold its request on the line until the memory becomes free, or it may repeat its request some time later.
  o If the arbitration unit receives two requests, it selects one of them and passes it to the memory controller. Again, the denied request can either be held to be served next or be repeated some time later.
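
The request handling described above can be sketched as a small cycle-by-cycle simulation. Several details are assumptions made only for illustration: the fixed service time, the arbitration policy (prefer a processor other than the one granted last; the text does not say how the unit chooses), and the names `MemoryModule`, `cycle`, and `run`. Processors in this sketch always hold their request on the line rather than retrying later.

```python
class MemoryModule:
    """Toy model of a two-port memory module with an arbitration unit."""

    def __init__(self, service_cycles=2):
        self.service_cycles = service_cycles  # assumed fixed service time
        self.busy_for = 0                     # cycles left on the current request
        self.last_granted = None

    def _arbitrate(self, requests):
        # Assumed policy: prefer a processor other than the last one granted.
        for p in requests:
            if p != self.last_granted:
                return p
        return requests[0]

    def cycle(self, requests):
        """Advance one cycle; return {processor: 'granted' or 'wait'}."""
        if self.busy_for > 0:
            # Module is busy: every request on the line gets a wait signal.
            self.busy_for -= 1
            return {p: "wait" for p in requests}
        if not requests:
            return {}
        winner = self._arbitrate(requests)
        self.last_granted = winner
        self.busy_for = self.service_cycles - 1  # grant cycle counts as service
        return {p: ("granted" if p == winner else "wait") for p in requests}


def run(module, pending, max_cycles=10):
    """Processors hold their requests on the line until they are granted."""
    log = []
    pending = list(pending)
    for _ in range(max_cycles):
        if not pending:
            break
        replies = module.cycle(pending)
        log.append(replies)
        pending = [p for p in pending if replies[p] != "granted"]
    return log


log = run(MemoryModule(service_cycles=2), ["P1", "P2"])
for n, replies in enumerate(log, start=1):
    print(f"cycle {n}: {replies}")
```

With two simultaneous requests, one is granted, the other receives wait signals while the module is busy, and the held request is served once the module is free again.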

Uniform Memory Access (UMA)

Figure 3: Bus-based UMA (SMP) shared memory system.

• In the UMA system, a shared memory is accessible by all processors through an interconnection network in the same way a single processor accesses its memory.
• All processors have equal access time to any memory location. The interconnection network used in the UMA can be a single bus, multiple buses, or a crossbar switch.
• Because access to shared memory is balanced, these systems are also called SMP (symmetric multiprocessor) systems. Each processor has equal opportunity to read from and write to memory, including equal access speed.
  o A typical bus-structured SMP computer, as shown in Fig. 3, attempts to reduce contention for the bus by fetching instructions and data directly from each processor's individual cache as much as possible.
  o In the extreme, the bus contention might be reduced to zero after the cache memories are loaded from the global memory, because it is possible for all instructions and data to be completely contained within the caches.
• This memory organization is the most popular among shared memory systems.
• Examples of this architecture are Sun Starfire servers, HP V series, Compaq AlphaServer GS, and Silicon Graphics Inc. multiprocessor servers.
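
The effect of caches on bus contention can be made concrete with a back-of-the-envelope model. This is only a sketch under stated assumptions: every processor issues a fixed number of memory references per cycle, only cache misses use the bus, and the bus serves at most one request per cycle. The function names and numbers are hypothetical.

```python
def bus_demand(num_processors, hit_rate, refs_per_cycle=1.0):
    """Average number of bus requests per cycle in this simple model."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    miss_rate = 1.0 - hit_rate  # only misses go out on the shared bus
    return num_processors * refs_per_cycle * miss_rate


def bus_saturated(num_processors, hit_rate, refs_per_cycle=1.0):
    """True if demand exceeds the assumed capacity of one request per cycle."""
    return bus_demand(num_processors, hit_rate, refs_per_cycle) > 1.0


for hit_rate in (0.0, 0.5, 0.95, 1.0):
    demand = bus_demand(8, hit_rate)
    print(f"hit rate {hit_rate:.2f}: {demand:.2f} bus requests/cycle, "
          f"saturated={bus_saturated(8, hit_rate)}")
```

In this model a hit rate of 1.0 drives bus demand to zero, which corresponds to the extreme case above where all instructions and data fit in the caches.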

Nonuniform Memory Access (NUMA)

Figure 4: NUMA shared memory system.

• In the NUMA system, each processor has part of the shared memory attached (see Fig. 4).
• The memory has a single address space, so any processor can access any memory location directly using its real address. However, the access time to a memory module depends on its distance from the processor, which results in a nonuniform memory access time.
• A number of architectures are used to interconnect processors to memory modules in a NUMA system. Among these are the tree and the hierarchical bus networks.
• Examples of NUMA architecture are BBN TC-2000, SGI Origin 3000, and Cray T3E.
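
The dependence of access time on distance can be sketched with a simple linear model. The latency figures (100 ns for local memory, 60 ns per interconnect hop) and the function names are assumptions chosen for illustration only; they do not describe any of the machines listed above.

```python
def access_time_ns(hops, local_ns=100.0, per_hop_ns=60.0):
    """Assumed model: local latency plus a fixed cost per interconnect hop."""
    if hops < 0:
        raise ValueError("hops must be non-negative")
    return local_ns + hops * per_hop_ns


def average_access_time_ns(local_fraction, remote_hops,
                           local_ns=100.0, per_hop_ns=60.0):
    """Average latency when a fraction of accesses hit the local module."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    local = access_time_ns(0, local_ns, per_hop_ns)
    remote = access_time_ns(remote_hops, local_ns, per_hop_ns)
    return local_fraction * local + (1.0 - local_fraction) * remote


for fraction in (1.0, 0.9, 0.5):
    avg = average_access_time_ns(fraction, remote_hops=2)
    print(f"{fraction:.0%} local accesses: {avg:.0f} ns on average")
```

Under these assumptions the average access time rises as more references go to remote modules, which is why data placement matters on a NUMA machine.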

Common questions


In a bus-based Uniform Memory Access (UMA) system, contention is reduced by fetching instructions and data directly from each processor's cache as much as possible. Once the caches have been loaded from the global memory, the need to access the common bus decreases significantly, potentially reducing bus contention to zero. The primary advantage of this system is equal access time for all processors to any memory location, which makes it a balanced, symmetric (SMP) shared memory system.

NUMA (Nonuniform Memory Access) systems differ from UMA (Uniform Memory Access) systems primarily in the way memory is accessed. In NUMA architectures, each processor has part of the shared memory attached locally, so memory access times vary with the proximity of a processor to the memory module. This results in nonuniform access times but allows for greater scalability by dividing memory into processor-local segments. In contrast, UMA systems feature uniform access times because memory is shared equally among all processors. This fundamental difference affects how efficiently a system can scale and the strategies used to minimize access latency, especially in large-scale systems.

Uniform Memory Access (UMA) systems ensure that all processors have equal access time to any memory location, which gives balanced performance suited to applications where uniform speed is critical. In contrast, Nonuniform Memory Access (NUMA) systems attach part of the memory to each processor, so access times vary with the processor's distance to the memory module. While UMA provides consistent access times, NUMA systems can potentially offer better scalability and performance for large systems by reducing the reliance on centralized memory structures, at the cost of careful memory locality management to maintain performance.

A NUMA architecture can outperform UMA when applications are able to exploit data locality and need to scale. Where data can be partitioned to correspond to processor-local memory, as in large databases or high-performance computing tasks, a NUMA system can reduce memory access latency by keeping accesses within each processor's nearby memory modules. This reduces the bottleneck effect seen in UMA systems, making NUMA more suitable for environments requiring extensive parallel processing and high scalability, despite the complexity brought by variable memory access times and the need for sophisticated memory management.

In a shared memory system with two processors accessing a common memory module, the arbitration unit manages simultaneous requests. When only one request arrives and the module is not busy, the arbitration unit passes it to the memory controller immediately. If two requests arrive at the same time, the arbitration unit selects one to pass on, and the denied request is either held to be served next or repeated some time later. This keeps access to the shared resource orderly.

In UMA systems, the type of interconnection network (a single bus, multiple buses, or a crossbar switch) significantly affects the system's ability to balance memory access. A single bus may become a bottleneck under high demand, limiting the number of processors that can efficiently access memory concurrently. By contrast, multiple buses or a crossbar switch can improve concurrency by providing additional paths for data transfer, thus reducing contention. These configurations can more effectively manage increased traffic and provide balanced access times for multiple processors, enhancing overall system throughput.

Examples of systems using UMA architecture include Sun Starfire servers, HP V series, and Compaq AlphaServer GS systems. This architecture is favored in applications requiring balanced and equal memory access times for all processors, such as symmetric multiprocessing environments. It provides uniform memory access, which simplifies the development of parallel applications by ensuring predictable access speeds and uniform resource distribution.

While caches are often used in shared memory systems to mitigate contention by reducing direct memory access, they introduce potential coherence issues as a trade-off. Multiple cached copies of data can become inconsistent if one copy is updated while the others are not. Mechanisms such as cache coherence protocols are therefore needed to maintain data consistency across the processor caches. These mechanisms resolve the inconsistencies but can also add overhead and complexity to system design and operation, affecting overall performance and efficiency.

The two main challenges in designing a shared memory system are performance degradation due to contention and coherence problems. Contention occurs when multiple processors attempt to access the shared memory simultaneously, leading to performance issues. This can be mitigated with caches, which reduce the number of accesses that reach the shared memory. Coherence problems arise when different processors have copies of the same data in their caches, leading to inconsistencies if one processor updates its copy. Ensuring coherence requires mechanisms like cache coherence protocols to maintain consistency across caches.

The scalability issue in shared memory systems arises from the difficulty of efficiently managing and coordinating access to shared resources as the number of processors increases. It is a significant drawback because increased contention and coherence traffic can create bottlenecks that severely impact system performance. The bus or interconnection network that links processors to shared memory can become a limiting factor if it cannot accommodate the demand, which limits the system’s ability to scale effectively as more processors are added.
