Understanding Hardware Synchronization

The document discusses hardware synchronization in concurrent programming, emphasizing the importance of avoiding data races through synchronization mechanisms like locks and semaphores. It also covers shared memory and cache in multi-core systems, highlighting the role of cache coherence protocols, such as MOESI, in maintaining data consistency. Additionally, it addresses issues like false sharing and suggests strategies to mitigate it by adjusting cache line sizes and configurations.

Uploaded by

mashhood

Hardware Synchronization: Lecture 15.

Each thread runs on one core at a time.

For a single core to run multiple threads, it must time-share between them.

A data race occurs when two or more threads in a concurrent program access the
same memory location concurrently, at least one access is a write operation, and
the accesses are not ordered by synchronization. The result of the program is then
non-deterministic and unpredictable, as the order in which the threads execute
their operations is not guaranteed.

For example, suppose there are two threads, T1 and T2, both accessing the same
memory location X, and T1 writes a value to X while T2 reads from X. If the
operations of T1 and T2 occur in an interleaved manner, such that T2 reads X before
T1 writes to it or vice versa, the program's output may vary depending on the order of
execution of the threads. This is a data race.
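
The two possible orders in the T1/T2 example can be enumerated directly. The following is a minimal sketch, not a real multithreaded run: it replays both orders sequentially, and the initial value 0 and written value 1 are assumptions chosen only for illustration.

```python
from itertools import permutations

def run(order):
    """Replay T1's write and T2's read in the given order; return what T2 read."""
    memory = {"X": 0}               # assumed initial value of X
    seen = None
    for op in order:
        if op == "T1_write":
            memory["X"] = 1         # T1 stores a new value (assumed to be 1)
        elif op == "T2_read":
            seen = memory["X"]      # T2 loads whatever X holds at this moment
    return seen

# Both orders of the two operations, and the value T2 observes in each.
outcomes = {order: run(order) for order in permutations(["T1_write", "T2_read"])}
for order, seen in outcomes.items():
    print(" then ".join(order), "-> T2 read", seen)
```

The two orders give T2 different values, which is exactly why the program's output depends on scheduling.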

To avoid data races and ensure deterministic behavior, concurrent programs use
synchronization mechanisms. Synchronization is typically done through user-level
routines that rely on hardware synchronization instructions provided by the
processor. These instructions let a thread read and update a memory location
atomically, so that accesses by multiple threads take effect in a defined order.
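
To illustrate how a user-level routine can sit on top of an atomic instruction, here is a minimal sketch of a spinlock built on test-and-set. Python cannot issue the hardware instruction directly, so the class `TestAndSetWord` is a hypothetical stand-in: its internal `threading.Lock` only emulates the atomicity that a processor would provide. All names here are assumptions for illustration.

```python
import threading
import time

class TestAndSetWord:
    """Emulated memory word with an atomic test-and-set (assumed stand-in)."""
    def __init__(self):
        self._value = 0
        self._atomic = threading.Lock()   # emulates hardware atomicity only

    def test_and_set(self):
        """Atomically set the word to 1 and return its previous value."""
        with self._atomic:
            old = self._value
            self._value = 1
            return old

    def clear(self):
        with self._atomic:
            self._value = 0

class SpinLock:
    """User-level lock: spin until test-and-set reports the word was free."""
    def __init__(self):
        self._word = TestAndSetWord()

    def acquire(self):
        while self._word.test_and_set() == 1:
            time.sleep(0)                 # yield while another thread holds it

    def release(self):
        self._word.clear()

def spin_count(num_threads=2, increments=500):
    """Increment a shared counter under the spinlock and return the total."""
    shared = {"X": 0}
    lock = SpinLock()

    def worker():
        for _ in range(increments):
            lock.acquire()
            shared["X"] += 1              # critical section
            lock.release()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["X"]

print(spin_count())
```

Only the thread that sees the old value 0 enters the critical section; every other thread keeps retrying until the holder clears the word.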

For example, in the case of the two threads T1 and T2 accessing X, a
synchronization mechanism such as a lock or a semaphore can be used to ensure
that only one thread at a time accesses X. This ensures that the order of
execution of the threads is well-defined, and the program's output is deterministic and
predictable.
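
As a small illustration of guarding X with a lock, the sketch below uses Python's standard `threading.Lock`. The function name `locked_count` and the thread and iteration counts are assumptions made for this example.

```python
import threading

def locked_count(num_threads=4, increments=10_000):
    """Have several threads increment shared X under a lock; return the total."""
    shared = {"X": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(increments):
            with lock:                # only one thread at a time touches X
                shared["X"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["X"]

print(locked_count())
```

Because every read-modify-write of X happens while the lock is held, the final total is always the number of threads times the number of increments, regardless of how the threads are scheduled.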

In summary, avoiding data races is essential in concurrent programming to ensure
deterministic and predictable behavior. This is achieved through synchronization
mechanisms that rely on hardware instructions to ensure atomic and ordered access to
shared memory locations.

Shared Memory and Caches

Shared memory and cache are important concepts in multi-core and multi-processor systems, which
can significantly impact performance.

In a multi-core system, each core has its own cache, which is a small amount of fast memory that stores
frequently accessed data. When a thread or process accesses data in shared memory, a copy is stored in
the cache of the core that made the request. If another core requests the same data, many designs can
supply it from the cache of the first core, rather than accessing the shared memory directly. This can
greatly improve performance, as accessing data from a cache is much faster than accessing it from shared
memory.

However, this caching mechanism can also introduce issues with data consistency. If one core writes
to a shared memory location, the other cores may not immediately see the updated value in their
cache, as they may still be using an older cached value. This can lead to race conditions and other
synchronization issues. To avoid this, multi-core and multi-processor systems typically use hardware
mechanisms such as cache coherence protocols to ensure that all caches have consistent data.
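
The stale-value problem can be shown with a toy model. This is a minimal sketch under simplifying assumptions: each `Core` has a private dictionary as its cache, writes go straight through to memory, and "coherence" is modeled as simply invalidating the copies held by peer cores. The class and function names are hypothetical.

```python
class Core:
    """Toy core with a private cache in front of a shared memory dictionary."""
    def __init__(self, memory):
        self.memory = memory
        self.cache = {}

    def read(self, addr):
        if addr not in self.cache:                 # miss: fetch from memory
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]                    # hit: use the cached copy

    def write(self, addr, value, peers=()):
        self.cache[addr] = value
        self.memory[addr] = value                  # write-through (assumption)
        for peer in peers:                         # coherence: drop stale copies
            peer.cache.pop(addr, None)

def demo(coherent):
    """Return what core 1 reads after core 0 overwrites a value it had cached."""
    memory = {"X": 0}
    core0, core1 = Core(memory), Core(memory)
    core1.read("X")                                # core 1 caches X = 0
    core0.write("X", 42, peers=[core1] if coherent else [])
    return core1.read("X")

print("without coherence:", demo(False))   # stale value 0
print("with invalidation:", demo(True))    # fresh value 42
```

Without invalidation, core 1 keeps hitting on its old copy; with it, the next read misses and picks up the new value.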

In a multi-processor system, shared memory can also refer to a physically shared memory pool that
can be accessed by all processors. This type of system typically requires more sophisticated cache
coherence protocols to ensure that all processors have a consistent view of memory.

Overall, shared memory and cache are important concepts in multi-core and multi-processor systems
that can greatly impact performance. Caching can improve performance by reducing the need to
access shared memory, but it can also introduce issues with data consistency that must be carefully
managed through cache coherence protocols.

Q1 – How do they share data?

Multiprocessor systems share data through shared memory, which is a memory pool accessible by all
the processors. Each processor can read and write data to this shared memory pool, allowing for
interprocessor communication and synchronization.

Q2 – How do they coordinate?

Multiprocessor systems use various methods to coordinate their activities, including message passing
and synchronization mechanisms. Message passing involves passing messages between
processes or threads, typically using a communication channel or mailbox. Synchronization
mechanisms such as locks and semaphores, which are built on hardware atomic operations, can be used
to synchronize access to shared resources, ensuring that only one processor accesses a shared resource
at a time.
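
Message passing through a mailbox can be sketched with Python's standard `queue.Queue`, which is safe to share between threads. The function names and the `None` end-of-stream sentinel are assumptions made for this example.

```python
import queue
import threading

def producer(mailbox, items):
    """Send each item through the mailbox, then a sentinel to signal the end."""
    for item in items:
        mailbox.put(item)
    mailbox.put(None)                  # sentinel: no more messages (assumption)

def consumer(mailbox, received):
    """Receive messages until the sentinel arrives."""
    while True:
        item = mailbox.get()           # blocks until a message is available
        if item is None:
            break
        received.append(item)

def run_exchange(items):
    """Run one producer and one consumer; return what the consumer received."""
    mailbox = queue.Queue()
    received = []
    threads = [
        threading.Thread(target=producer, args=(mailbox, items)),
        threading.Thread(target=consumer, args=(mailbox, received)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received

print(run_exchange(["a", "b", "c"]))
```

The two threads never touch each other's data directly; the queue is the only shared object, and it handles its own internal locking.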

Q3 – How many processors can be supported?

The number of processors that can be supported in a multiprocessor system depends on several
factors, including the hardware architecture and the operating system used. In theory, modern
multiprocessor systems can support hundreds or even thousands of processors, but the practical
limits may be much lower due to factors such as memory bandwidth, cache coherence, and
synchronization overhead. The number of processors that can be supported also depends on the
application workload and the degree of parallelism that can be achieved. In general, the more
parallelizable the workload, the more processors can be effectively utilized.

Common Cache Coherency Protocol:

The MOESI protocol is a cache coherence protocol used in shared-memory multiprocessor systems.
It is an extension of the MESI protocol, which stands for Modified, Exclusive, Shared, and Invalid. The
MOESI protocol adds an additional state called Owned, which lets a cache supply modified data to other
caches without first writing it back to memory.

In the MOESI protocol, each cache line in a processor's cache can be in one of five states:

- Modified (M): Indicates that the cache line has been modified locally and is not consistent with the
memory. No other cache holds a valid copy.

- Owned (O): Indicates that the cache line has been modified locally and may not be consistent with the
memory. Other processors may still have the same cache line in the Shared state.

This means that other processors have an up-to-date copy of the same data in their caches, but memory
may be stale. The cache in the Owned state is responsible for supplying the data to requesters and for
writing it back to memory when the line is evicted.

- Exclusive (E): Indicates that the cache line is clean and only exists in the local cache. Other
processors do not have a copy of the cache line.

- Shared (S): Indicates that the cache line is clean and may exist in other caches as well. Multiple
processors may have the same cache line in the Shared state.

- Invalid (I): Indicates that the cache line is not valid and cannot be used.

Overall, the MOESI protocol is a cache coherence protocol that enables multiple processors to access
and modify shared memory in a coordinated manner by maintaining a consistent view of memory
across all caches in the system.
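
The five states can be summarized as a small lookup table. This is a sketch of MOESI as it is commonly described, not of any specific processor; the field names `valid`, `must_write_back`, and `others_may_hold` are assumptions introduced for illustration.

```python
from enum import Enum

class State(Enum):
    MODIFIED = "M"
    OWNED = "O"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"

# Per-state properties (field names are illustrative assumptions):
#   valid            the cached data may be used
#   must_write_back  this cache is responsible for updating memory on eviction
#   others_may_hold  other caches may hold a valid copy of the same line
PROPERTIES = {
    State.MODIFIED:  dict(valid=True,  must_write_back=True,  others_may_hold=False),
    State.OWNED:     dict(valid=True,  must_write_back=True,  others_may_hold=True),
    State.EXCLUSIVE: dict(valid=True,  must_write_back=False, others_may_hold=False),
    State.SHARED:    dict(valid=True,  must_write_back=False, others_may_hold=True),
    State.INVALID:   dict(valid=False, must_write_back=False, others_may_hold=True),
}

def can_write_without_bus_request(state):
    """A write needs no coherence message only if no other copy can exist."""
    props = PROPERTIES[state]
    return props["valid"] and not props["others_may_hold"]

for state in State:
    print(state.value, PROPERTIES[state], can_write_without_bus_request(state))
```

Only Modified and Exclusive lines can be written silently; a write to an Owned or Shared line must first invalidate the other copies.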

Let's say that Cache A has a copy of a cache line that contains some data, and caches B, C, and D also
have copies of the same cache line. At this point, all caches have the cache line in the Shared (S)
state.

If Cache A modifies the data in its copy of the cache line, it first broadcasts an invalidation request.
Caches B, C, and D invalidate their copies, which transition to the Invalid (I) state, so that none of
them keeps an outdated copy. Cache A then updates the cache line to the Modified (M) state, indicating
that the cache line has been modified locally and is not consistent with the memory.

If Cache B, C, or D then reads the same cache line while Cache A holds it in the Modified state, the
requesting cache misses because its own copy is Invalid. Cache A sends the most up-to-date version of
the cache line to the requesting cache, which loads it in the Shared state. Cache A transitions to the
Owned (O) state: it keeps the modified data and does not have to write it back to memory yet.

If Cache B, C, or D reads the cache line while Cache A holds it in the Exclusive (E) state, the line is
clean and Cache A is the only holder. The requesting cache loads the line in the Shared state, and
Cache A transitions from Exclusive to Shared.

If Cache B, C, or D reads the cache line while Cache A holds it in the Owned (O) state, Cache A sends
the cache line to the requesting cache, which loads it in the Shared state. Cache A remains in the
Owned state.

If any cache writes to the line while other caches hold copies, those copies are invalidated and the
writing cache moves to the Modified state.

This process ensures that all caches in the system have a consistent view of the shared memory by
using messages to maintain coherence between caches. The MOESI protocol enables multiple
processors to access and modify shared memory in a coordinated manner while maintaining a
consistent view of memory across all caches in the system.
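
The state changes for caches A to D can be traced with a small simulator. This is a minimal sketch of the commonly described MOESI read and write transitions for a single cache line; it ignores evictions, write-backs, and the actual data, and the class name `Bus` and its methods are assumptions made for this example.

```python
class Bus:
    """Tracks the MOESI state of ONE cache line in each named cache."""
    def __init__(self, names):
        self.state = {name: "I" for name in names}

    def read(self, who):
        if self.state[who] != "I":
            return                          # read hit: M, O, E, S all suffice
        others = [n for n in self.state if n != who and self.state[n] != "I"]
        if not others:
            self.state[who] = "E"           # no other copy: load as Exclusive
            return
        for n in others:
            if self.state[n] == "M":
                self.state[n] = "O"         # dirty holder supplies data, becomes owner
            elif self.state[n] == "E":
                self.state[n] = "S"         # clean sole holder downgrades to Shared
            # holders in O or S keep their state
        self.state[who] = "S"

    def write(self, who):
        for n in self.state:
            if n != who:
                self.state[n] = "I"         # invalidate every other copy
        self.state[who] = "M"

bus = Bus("ABCD")
for name in "ABCD":
    bus.read(name)
all_shared = dict(bus.state)                # every cache holds the line in S

bus.write("A")
after_write = dict(bus.state)               # A is M, the others are I

bus.read("B")
after_read = dict(bus.state)                # A is O, B is S, C and D stay I

print(all_shared, after_write, after_read, sep="\n")
```

The trace shows the key benefit of the Owned state: after B's read, A keeps the dirty line and serves it to B without a write-back to memory.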

Cache coherence is the problem of maintaining consistency between multiple
copies of the same memory block stored in different
caches.

False sharing is a common effect of cache coherence that occurs when multiple processors access
different variables that are located in the same cache block, causing the block to be transferred
back and forth between the caches unnecessarily.

For example, let's consider three processors, P0, P1, and P2, with a shared memory system. P0 is
writing to variable X, located at memory address 4000, and P1 is writing to variable Y, located at
memory address 4012. Suppose the block size is 32 bytes and the cache line containing X also
contains Y.

Initially, P1 and P2 read the cache line containing X and Y from memory into their respective caches.
Meanwhile, P0 writes a new value to X in its cache and invalidates all other copies of the cache line in
other caches.

When P1 tries to access Y, it discovers that the cache line containing Y is invalid due to the
invalidation caused by P0's write to X. Therefore, P1 has to fetch the cache line containing Y
again (from P0's cache or from memory), even though it didn't need to access X at all. P1's write to Y
in turn invalidates P0's copy, so P0's next write to X misses as well, even though P0 didn't need to
access Y at all. This constant transfer of the same cache block between the caches, caused by accesses
to different variables, is called false sharing. If P2 later reads X, it also has to fetch the line
again, but that miss is true sharing, because P2 needs the value that P0 wrote.
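
The ping-pong effect can be counted with a very small model. This is a sketch under simplifying assumptions: it tracks only which processor last wrote each block, counts a miss whenever a different processor writes the same block, and ignores the first (compulsory) access. The function name and the alternative address 4032 are assumptions for illustration.

```python
def coherence_misses(block_size, writes):
    """Count writes that miss because another processor wrote the block last.

    writes is a list of (processor, address) pairs in program order.
    """
    last_writer = {}                         # block number -> last writing processor
    misses = 0
    for proc, addr in writes:
        block = addr // block_size
        if block in last_writer and last_writer[block] != proc:
            misses += 1                      # our copy was invalidated by the other write
        last_writer[block] = proc
    return misses

X, Y = 4000, 4012                            # addresses from the example above
Y_OTHER_BLOCK = 4032                         # hypothetical placement in the next block

same_block = [("P0", X), ("P1", Y)] * 4      # P0 and P1 alternate writes
separate = [("P0", X), ("P1", Y_OTHER_BLOCK)] * 4

print(coherence_misses(32, same_block))      # 7: every write after the first misses
print(coherence_misses(32, separate))        # 0: the writes never interfere
```

P0 and P1 never touch the same variable, yet with X and Y in one 32-byte block every write after the first one misses.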

To reason about false sharing, we can start from the 3Cs model of cache misses. False sharing misses
are coherence misses, sometimes called the fourth C, and the 3Cs remedies affect them differently:

1. Compulsory: Increase the block size so that each cache line can hold more variables. This reduces
compulsory misses but increases the miss penalty and makes false sharing more likely, because more
unrelated variables end up in the same line.

2. Capacity: Increase the cache size to reduce the number of cache misses. This may increase the
access time.

3. Conflict: Increase the associativity or improve the replacement policy to reduce the likelihood of
multiple memory locations mapping to the same cache location. This may increase the access time as
well.

For example, if we increase the cache line size to 64 bytes, X and Y are still in the same cache line
(bytes 3968 to 4031), so false sharing still occurs. To prevent it, X and Y must be placed in separate
cache lines, for example by padding or aligning Y to the next 32-byte boundary (address 4032), or by
using a smaller block size such as 8 bytes. Increasing the cache size or the associativity reduces
capacity and conflict misses but does not remove false sharing, and may increase the access time, so
it's important to strike a balance between block size, cache size, associativity, and access time.
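
Whether X and Y share a cache line for a given block size is simple integer arithmetic, which the sketch below checks for addresses 4000 and 4012. The helper names and the padded address 4032 are assumptions made for this example.

```python
def block_range(addr, block_size):
    """Return the first and last byte address of the block containing addr."""
    start = (addr // block_size) * block_size
    return start, start + block_size - 1

def share_block(a, b, block_size):
    """True if both addresses fall in the same cache block."""
    return a // block_size == b // block_size

X, Y = 4000, 4012
for size in (8, 32, 64):
    print(size, block_range(X, size), block_range(Y, size), share_block(X, Y, size))

# Hypothetical fix: pad or align Y so that it starts at the next 32-byte boundary.
Y_PADDED = 4032
print(share_block(X, Y_PADDED, 32))
```

With 32-byte and 64-byte blocks the two variables land in the same block; with 8-byte blocks, or with Y moved to address 4032, they do not.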
