Hardware Synchronization: Lecture 15.
Each running thread needs a core to execute on.
For a single core to run multiple threads, it must time-share between them.
A data race occurs when two or more threads in a concurrent program access the
same memory location concurrently without synchronization, where at least one
access is a write operation. The result of the program is then non-deterministic
and unpredictable, as the order in which the threads execute their operations is
not guaranteed.
For example, suppose there are two threads, T1 and T2, both accessing the same
memory location X, and T1 writes a value to X while T2 reads from X. If the
operations of T1 and T2 occur in an interleaved manner, such that T2 reads X before
T1 writes to it or vice versa, the program's output may vary depending on the order of
execution of the threads. This is a data race.
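The T1/T2 example can be replayed deterministically by running the two operations in each possible order. This is only a sketch: the initial value 0, the written value 1, and the helper name run are assumptions made for illustration, and the "threads" are plain functions so that each order can be forced.

```python
def run(order):
    """Execute T1's write and T2's read in the given order; return what T2 read."""
    memory = {"X": 0}                  # assumed initial value of X
    observed = {}

    def t1_write():
        memory["X"] = 1                # T1 writes a new value to X

    def t2_read():
        observed["T2"] = memory["X"]   # T2 reads X

    ops = {"T1": t1_write, "T2": t2_read}
    for name in order:
        ops[name]()
    return observed["T2"]


# The same two operations give different results depending on their order.
outcomes = {order: run(order) for order in [("T1", "T2"), ("T2", "T1")]}
print(outcomes)
```

In a real program the order is chosen by the scheduler and the hardware, not by the programmer, which is what makes the result unpredictable.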
To avoid data races and ensure deterministic behavior, concurrent programs use
synchronization mechanisms. Synchronization is typically done through user-level
routines that rely on hardware synchronization instructions provided by the
processor. These instructions ensure that multiple threads access the same
memory location atomically and in a defined order.
For example, in the case of the two threads T1 and T2 accessing X, a
synchronization mechanism such as a lock or a semaphore can be used to ensure
that only one thread at a time accesses X. This ensures that the order of
execution of the threads is well-defined, and the program's output is deterministic and
predictable.
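As a minimal sketch of the lock-based approach, the following uses Python's threading.Lock as the user-level routine. The counter variable plays the role of X; the names worker and run_workers and the thread and iteration counts are assumptions chosen for illustration.

```python
import threading

counter = 0                  # shared location, playing the role of X
lock = threading.Lock()      # user-level lock protecting counter


def worker(iterations):
    global counter
    for _ in range(iterations):
        with lock:           # only one thread at a time runs this block
            counter += 1     # read-modify-write on the shared value


def run_workers(num_threads=4, iterations=10_000):
    global counter
    counter = 0
    threads = [threading.Thread(target=worker, args=(iterations,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()             # wait until every worker has finished
    return counter


total = run_workers()
print(total)
```

Because every increment happens while the lock is held, no update can be lost, and the final value is the same on every run regardless of how the threads are scheduled.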
In summary, avoiding data races is essential in concurrent programming to ensure
deterministic and predictable behavior. This is achieved through synchronization
mechanisms that rely on hardware instructions to ensure atomic and ordered access to
shared memory locations.
Shared Memory and Caches
Shared memory and cache are important concepts in multi-core and multi-processor systems, which
can significantly impact performance.
In a multi-core system, each core has its own cache, which is a small amount of fast memory that stores
recently and frequently accessed data. When a thread or process accesses data in shared memory, a copy
is stored in the cache of the core that made the request. If another core requests the same data, some
coherence protocols let it obtain the data from the first core's cache rather than from shared memory.
Caching can greatly improve performance, as accessing data from a cache is much faster than accessing
it from shared memory.
However, caching can also introduce issues with data consistency. If one core writes
to a shared memory location, the other cores may not immediately see the updated value in their
cache, as they may still be using an older cached value. This can lead to race conditions and other
synchronization issues. To avoid this, multi-core and multi-processor systems typically use hardware
mechanisms such as cache coherence protocols to ensure that all caches have consistent data.
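The stale-value problem can be sketched with two private caches that have no coherence protocol at all. The Core class, the address 0x10, and the write-through policy are assumptions made for this illustration; real hardware does not behave this way precisely because a coherence protocol is in place.

```python
class Core:
    """A core with a private cache and, deliberately, no coherence protocol."""

    def __init__(self, memory):
        self.memory = memory          # shared memory, visible to all cores
        self.cache = {}               # private cache: address -> value

    def read(self, addr):
        if addr not in self.cache:    # miss: fetch from shared memory
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]       # hit: returns the cached copy, even if stale

    def write(self, addr, value):
        self.cache[addr] = value
        self.memory[addr] = value     # memory is updated, but other caches are not told


memory = {0x10: 1}
core0, core1 = Core(memory), Core(memory)

first = core1.read(0x10)   # core1 caches the value 1
core0.write(0x10, 2)       # core0 updates its cache and memory
stale = core1.read(0x10)   # core1 hits in its own cache and still sees 1
print(first, stale, memory[0x10])
```

A coherence protocol closes this gap by invalidating or updating core1's copy when core0 writes.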
In a multi-processor system, shared memory can also refer to a physically shared memory pool that
can be accessed by all processors. This type of system typically requires more sophisticated cache
coherence protocols to ensure that all processors have a consistent view of memory.
Overall, shared memory and cache are important concepts in multi-core and multi-processor systems
that can greatly impact performance. Caching can improve performance by reducing the need to
access shared memory, but it can also introduce issues with data consistency that must be carefully
managed through cache coherence protocols.
Q1 – How do they share data?
Multiprocessor systems share data through shared memory, which is a memory pool accessible by all
the processors. Each processor can read and write data to this shared memory pool, allowing for
interprocessor communication and synchronization.
Q2 – How do they coordinate?
Multiprocessor systems use various methods to coordinate their activities, including message passing
and synchronization on shared memory. Message passing involves passing messages between
processes or threads, typically using a communication channel or mailbox. Synchronization
mechanisms such as locks and semaphores are built on hardware atomic operations and are used to
control access to shared resources, for example by ensuring that only one processor accesses a
shared resource at a time.
Q3 – How many processors can be supported?
The number of processors that can be supported in a multiprocessor system depends on several
factors, including the hardware architecture and the operating system used. In theory, modern
multiprocessor systems can support hundreds or even thousands of processors, but the practical
limits may be much lower due to factors such as memory bandwidth, cache coherence, and
synchronization overhead. The number of processors that can be supported also depends on the
application workload and the degree of parallelism that can be achieved. In general, the more
parallelizable the workload, the more processors can be effectively utilized.
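One common way to quantify the link between parallelism and useful processor count is Amdahl's law, which these notes do not name but which matches the statement above. The sketch below assumes a workload whose parallel fraction p is known and ignores memory bandwidth, coherence, and synchronization overhead; the function name amdahl_speedup is chosen for illustration.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Ideal speedup on `processors` cores when `parallel_fraction` of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Compare a half-parallel workload with a 95%-parallel one as processors are added.
for p in (0.5, 0.95):
    row = [round(amdahl_speedup(p, n), 2) for n in (2, 8, 64, 1024)]
    print(p, row)
```

With p = 0.5 the speedup can never reach 2, no matter how many processors are added, while with p = 0.95 the limit is 20. The overheads listed above lower these ideal figures further in practice.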
Common Cache Coherency Protocol:
The MOESI protocol is a cache coherence protocol used in shared-memory multiprocessor systems.
It is an extension of the MSI protocol, which stands for Modified, Shared, and Invalid. The MESI
protocol adds an Exclusive state to MSI, and the MOESI protocol further adds an Owned state, both
to improve performance.
In the MOESI protocol, each cache line in a processor's cache can be in one of five states:
- Modified (M): Indicates that the cache line has been modified locally and is not consistent with the
memory. No other cache holds a copy.
- Owned (O): Indicates that the cache line has been modified locally and may not be consistent with
the memory. Other processors may still have the same cache line in the Shared state.
Their copies match the owner's copy; it is the memory that may be out of date. The cache in the
Owned state is responsible for supplying the line to other caches and for eventually writing it back
to memory.
- Exclusive (E): Indicates that the cache line is clean and only exists in the local cache. Other
processors do not have a copy of the cache line.
- Shared (S): Indicates that the cache line is clean and may exist in other caches as well. Multiple
processors may have the same cache line in the Shared state.
- Invalid (I): Indicates that the cache line is not valid and cannot be used.
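The five states can be summarized by a few properties and by which pairs of states two caches may hold for the same line at the same time. The sketch below follows the textbook description of MOESI; the property names (valid, must_write_back, silent_write) and the helper compatible are assumptions made for illustration, and real implementations differ in detail.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    OWNED = "O"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


# valid:           the copy may be read
# must_write_back: this cache is responsible for updating memory on eviction
# silent_write:    the cache may write without notifying other caches
PROPERTIES = {
    State.MODIFIED:  dict(valid=True,  must_write_back=True,  silent_write=True),
    State.OWNED:     dict(valid=True,  must_write_back=True,  silent_write=False),
    State.EXCLUSIVE: dict(valid=True,  must_write_back=False, silent_write=True),
    State.SHARED:    dict(valid=True,  must_write_back=False, silent_write=False),
    State.INVALID:   dict(valid=False, must_write_back=False, silent_write=False),
}


def compatible(a, b):
    """Can two different caches hold the same line in states a and b at once?"""
    if State.INVALID in (a, b):
        return True                       # an invalid copy conflicts with nothing
    sole_copy = {State.MODIFIED, State.EXCLUSIVE}
    if a in sole_copy or b in sole_copy:
        return False                      # M and E require that no other copy exists
    return not (a is State.OWNED and b is State.OWNED)   # at most one owner


print(compatible(State.OWNED, State.SHARED), compatible(State.MODIFIED, State.SHARED))
```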
Overall, the MOESI protocol is a cache coherence protocol that enables multiple processors to access
and modify shared memory in a coordinated manner by maintaining a consistent view of memory
across all caches in the system.
Let's say that Cache A has a copy of a cache line that contains some data, and caches B, C, and D also
have copies of the same cache line. At this point, all caches have the cache line in the Shared (S)
state.
If Cache A writes to its copy of the cache line, it first broadcasts an invalidation so that caches B,
C, and D move their copies to the Invalid (I) state. Cache A's copy then moves to the Modified (M)
state, indicating that the cache line has been modified locally and is not consistent with the
memory.
If Cache B now reads the same cache line, it misses, because its copy is Invalid. Cache A holds the
only up-to-date copy, so it supplies the data to Cache B. Cache B's copy enters the Shared state, and
Cache A's copy moves from Modified to Owned (O): the line is still not consistent with the memory,
and Cache A remains responsible for writing it back.
If Cache C or D then reads the cache line, the same process occurs: the owner, Cache A, supplies the
data and stays in the Owned state, and the requesting cache receives the line in the Shared state.
If a cache instead holds the line in the Exclusive (E) state and another cache reads it, both copies
end up in the Shared state. Memory is already up to date, so no write-back is needed. A cache holding
a line in the Exclusive state can also write to it without notifying any other cache, which moves the
line to the Modified state.
If any cache writes to a line that other caches hold in the Shared or Owned state, the writer
invalidates all other copies, and its own copy moves to the Modified state.
This process ensures that all caches in the system have a consistent view of the shared memory by
using messages to maintain coherence between caches. The MOESI protocol enables multiple
processors to access and modify shared memory in a coordinated manner while maintaining a
consistent view of memory across all caches in the system.
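A small state-machine sketch can replay this kind of scenario for caches A to D. It follows the textbook MOESI transitions for reads and writes only, tracks states rather than data, and ignores evictions; the function names cpu_read and cpu_write are assumptions made for illustration.

```python
def cpu_read(states, requester):
    """Apply a read by `requester` to a dict mapping cache name -> state letter."""
    if states[requester] != "I":
        return                               # read hit: no state change
    holders = [c for c in states if c != requester and states[c] != "I"]
    for c in holders:
        if states[c] == "M":
            states[c] = "O"                  # dirty holder supplies data, keeps ownership
        elif states[c] == "E":
            states[c] = "S"                  # clean sole copy becomes shared
    # With no other holder the line arrives from memory as Exclusive.
    states[requester] = "S" if holders else "E"


def cpu_write(states, requester):
    """Apply a write by `requester`: all other copies are invalidated."""
    for c in states:
        if c != requester:
            states[c] = "I"
    states[requester] = "M"


states = dict.fromkeys("ABCD", "S")          # all four caches start in Shared
cpu_write(states, "A")
after_write = dict(states)                   # A is Modified, the rest Invalid
cpu_read(states, "B")
after_read = dict(states)                    # A becomes Owned, B becomes Shared
cpu_write(states, "C")                       # C takes over; every other copy is invalidated
print(after_write, after_read, states)
```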
Cache coherence is the problem of maintaining consistency between multiple
copies of the same memory block stored in different
caches.
False sharing is a common effect of cache coherence that occurs when multiple processors access
different variables that are located in the same cache block, causing the block to be transferred
back and forth between the caches unnecessarily.
For example, let's consider three processors, P0, P1, and P2, with a shared memory system. P0 is
writing to variable X, located at memory address 4000, and P1 is writing to variable Y, located at
memory address 4012. Suppose the block size is 32 bytes and the cache line containing X also
contains Y.
Initially, P1 and P2 read the cache line containing X and Y from memory into their respective caches.
Meanwhile, P0 writes a new value to X in its cache and invalidates all other copies of the cache line in
other caches.
When P1 tries to access Y, it discovers that the cache line containing Y is invalid due to the
invalidation caused by P0's write to X. Therefore, P1 has to fetch the cache line containing Y
again (from memory or from P0's cache, depending on the protocol), even though it never accesses X.
This is a false sharing miss. When P2 tries to access X, it also has to fetch the cache line again,
but this is a true sharing miss, because X itself has changed. If P0 and P1 keep writing to X and Y,
the same cache block is transferred back and forth between their caches even though they share no
data. This constant transfer is called false sharing.
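The cost of this ping-pong effect can be estimated with a simple count of misses when P0 and P1 take turns writing. This is a sketch under strong assumptions: a write-invalidate protocol, strictly alternating writes, and no other traffic. The function name count_misses and the alternative address 4032 for Y are hypothetical.

```python
def count_misses(addr_x, addr_y, block_size, rounds):
    """P0 writes X and P1 writes Y in turn; count how often a writer lacks the block."""
    block_x = addr_x // block_size
    block_y = addr_y // block_size
    held = {"P0": set(), "P1": set()}     # blocks each processor holds in a valid state
    misses = 0
    for _ in range(rounds):
        for proc, other, block in (("P0", "P1", block_x), ("P1", "P0", block_y)):
            if block not in held[proc]:
                misses += 1               # block must be fetched before the write
                held[proc].add(block)
            held[other].discard(block)    # the write invalidates the other processor's copy
    return misses


shared_line = count_misses(4000, 4012, block_size=32, rounds=100)    # X and Y in one block
separate_lines = count_misses(4000, 4032, block_size=32, rounds=100)  # Y moved to next block
print(shared_line, separate_lines)
```

When X and Y share a block, every write misses. When they sit in different blocks, only the first write by each processor misses.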
False sharing misses are coherence misses, which are sometimes counted as a fourth category
alongside the 3Cs (compulsory, capacity, and conflict misses). The usual remedies for the 3Cs
interact with false sharing as follows:
1. Compulsory: Increasing the block size reduces compulsory misses, but it puts more variables in
each cache line, which increases the likelihood of false sharing and the miss penalty.
2. Capacity: Increasing the cache size reduces capacity misses. It does not remove false sharing, and
it may increase the access time.
3. Conflict: Increasing the associativity or improving the replacement policy reduces the likelihood of
multiple memory locations competing for the same cache location. It does not remove false sharing
either, and it may increase the access time as well.
To reduce false sharing itself, X and Y must be placed in separate cache lines, either by using a
smaller block size or by padding and aligning the data. For example, with an 8-byte block size, X
(address 4000) falls in block 500 and Y (address 4012) falls in block 501, so false sharing no longer
occurs. Increasing the cache line size to 64 bytes does not help: both addresses still fall in the
same block, which covers addresses 3968 to 4031. Smaller blocks can increase the number of
compulsory misses, so it's important to strike a balance between block size, cache size,
associativity, and access time.
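Whether X and Y end up in the same cache line depends only on their addresses and the block size, so it can be checked with integer division. The sketch below uses the addresses 4000 and 4012 from the example; the helper names same_block and next_block_start are assumptions made for illustration.

```python
def same_block(addr_a, addr_b, block_size):
    """True if both addresses fall in the same block of `block_size` bytes."""
    return addr_a // block_size == addr_b // block_size


def next_block_start(addr, block_size):
    """First address of the block after the one containing `addr` (a padding target)."""
    return (addr // block_size + 1) * block_size


X_ADDR, Y_ADDR = 4000, 4012

# Check several block sizes: only blocks small enough to split X and Y avoid sharing.
sharing = {size: same_block(X_ADDR, Y_ADDR, size) for size in (8, 16, 32, 64)}
print(sharing)

# Alternatively keep 32-byte blocks and move Y to the start of the next block.
padded_y = next_block_start(X_ADDR, 32)
print(padded_y, same_block(X_ADDR, padded_y, 32))
```

Padding wastes some memory but keeps the block size, and with it the miss penalty and the compulsory miss rate, unchanged.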