
OpenMP Shared-Memory Programming Guide

Uploaded by

mallikagu506


Module 4

Shared-memory programming with OpenMP

Shared-Memory Programming:
● In shared-memory programming, multiple threads (lightweight processes) run concurrently and share a common
address space.
● All threads can access global variables and memory, unlike distributed-memory programming (e.g., MPI) where each
process has its own memory.
● Synchronization is needed to avoid race conditions when threads access shared data simultaneously.

OpenMP:
● OpenMP (Open Multi-Processing) is an API (Application Programming Interface) that supports parallel
programming on shared-memory systems.
● It allows programmers to add directives (pragmas) in C, C++, and Fortran code to enable parallel execution.
Features of OpenMP

● Compiler directives (#pragma omp ... in C/C++).

● Runtime library routines (functions for thread management, timing, etc.).

● Environment variables (to control execution, e.g., number of threads).

OpenMP Execution Model

● Fork-Join Model:

○ The program starts with a single thread (master thread).

○ At a parallel region, additional worker threads are created ("fork").

○ All threads execute the parallel code.

○ At the end of the region, threads synchronize and only the master thread continues ("join").
Example OpenMP Program
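
A minimal sketch of such a program, consistent with the "Hello from thread N" output shown in this section. The function name hello_from_threads and the #ifdef _OPENMP guard are assumptions added here so that the file also builds serially when -fopenmp is omitted.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Prints one greeting per thread and returns how many threads greeted. */
int hello_from_threads(void) {
    int greeted = 0;

    #pragma omp parallel
    {
        int id = 0;                     /* private: declared inside the region */
#ifdef _OPENMP
        id = omp_get_thread_num();      /* ID of the calling thread, starting at 0 */
#endif
        printf("Hello from thread %d\n", id);

        #pragma omp atomic              /* protect the shared counter */
        greeted++;
    }
    return greeted;
}
```

Calling hello_from_threads() from main() gives one line per thread; the order of the lines is not fixed.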

Compiling and running OpenMP programs

$ gcc -fopenmp my_program.c -o my_program

Run the Executable

$ ./my_program

Output:

Hello from thread 0

Hello from thread 1

Hello from thread 2

Hello from thread 3

● #pragma omp parallel tells the compiler to create a team of threads.
● Each thread executes the code inside the block.
● omp_get_thread_num() gives the thread ID.
● The order of the output lines can change from run to run because the threads execute concurrently.
The trapezoidal rule in OpenMP

● Assignment of trapezoids to threads.
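
One way to assign trapezoids to threads can be sketched as follows: each thread sums a contiguous block of trapezoids and then adds its partial result to a shared total. The integrand f(x) = x*x, the function name trap, and the serial fallback under #ifdef _OPENMP are assumptions for illustration.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical integrand used only for this sketch. */
static double f(double x) { return x * x; }

/* Approximates the integral of f over [a, b] using n trapezoids. */
double trap(double a, double b, int n) {
    double h = (b - a) / n;             /* width of one trapezoid */
    double total = 0.0;                 /* shared result */

    #pragma omp parallel
    {
        int rank = 0, nthreads = 1;
#ifdef _OPENMP
        rank = omp_get_thread_num();
        nthreads = omp_get_num_threads();
#endif
        /* This thread handles trapezoids first .. last-1. */
        int first = (int)((long long)n * rank / nthreads);
        int last  = (int)((long long)n * (rank + 1) / nthreads);

        double local = 0.0;
        for (int i = first; i < last; i++) {
            double x0 = a + i * h;
            local += 0.5 * h * (f(x0) + f(x0 + h));
        }

        #pragma omp critical            /* one thread at a time updates total */
        total += local;
    }
    return total;
}
```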

● A critical section is a block of code that must be executed by only one thread at a time.

● OpenMP provides the directive: #pragma omp critical

● When a thread enters the critical section, other threads wait until it is free.

● It prevents race conditions (when multiple threads try to update a shared variable at the same time).
Example Without Critical Section

● Multiple threads may update sum simultaneously → wrong result.
Example Using #pragma omp critical

● Now, only one thread at a time modifies sum, ensuring correctness.
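
A minimal sketch of the protected version, assuming a shared variable sum that accumulates the integers 1 to n (the function name sum_with_critical is hypothetical). Removing the critical directive turns this into the racy version described above.

```c
/* Adds 1 + 2 + ... + n, with every update to sum inside a critical section. */
long sum_with_critical(int n) {
    long sum = 0;                       /* shared by all threads */

    #pragma omp parallel for
    for (int i = 1; i <= n; i++) {
        #pragma omp critical            /* only one thread may execute this update at a time */
        sum += i;
    }
    return sum;
}
```

The result is correct, but the threads take turns on every iteration, so this pattern gives little or no speedup for such a small loop body.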

When to Use critical

● When multiple threads update a shared resource (e.g., global variable, file, array element).

● Avoids race conditions.

Alternative to critical

● reduction(+:variable) → faster for sums/products.

● #pragma omp atomic → lighter weight than critical for single-variable updates.
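
As an illustration of the atomic alternative, the sketch below counts the even values in an array. The function name count_even and the choice of problem are assumptions; the point is that the protected operation is a single update of one variable.

```c
/* Counts the even elements of a[0..n-1]. */
int count_even(const int *a, int n) {
    int count = 0;                      /* shared counter */

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        if (a[i] % 2 == 0) {
            #pragma omp atomic          /* protects just this one increment */
            count++;
        }
    }
    return count;
}
```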
Scope of Variables in OpenMP

When you parallelize code with OpenMP, variables can be shared among threads or private to each thread.

1. Shared Variables

● Default for variables declared outside the parallel region (unless specified otherwise).

● All threads see the same memory location.
● Updates by one thread are visible to all others → can cause race conditions.
● If threads write to a shared variable concurrently without synchronization, the result is unpredictable.

2. Private Variables

● Each thread gets its own copy of the variable.

● Copies are uninitialized unless firstprivate is specified.
● Changes are local to the thread and discarded after the parallel region.
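
The difference between shared and private variables can be sketched as follows. The function square_all and the scratch variable tmp are hypothetical; default(none) is used here only to make every sharing decision explicit.

```c
/* Writes in[i] * in[i] to out[i] for every i in 0..n-1. */
void square_all(const int *in, int *out, int n) {
    int tmp;    /* declared outside the region, so it would be shared by default */

    /* in, out and n are shared: all threads use the same arrays and bound.
       tmp is private: each thread works on its own uninitialized copy. */
    #pragma omp parallel for default(none) shared(in, out, n) private(tmp)
    for (int i = 0; i < n; i++) {
        tmp = in[i] * in[i];            /* no race: tmp belongs to this thread */
        out[i] = tmp;                   /* each iteration writes a different element */
    }
}
```

If tmp were left shared, two threads could overwrite each other's value between the two statements in the loop body.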
The reduction clause

● The reduction clause lets threads safely update a shared variable (like sum, product, min, max) without race conditions.

● Each thread gets a private copy of the variable, performs updates locally, and at the end, OpenMP combines (reduces) results into the shared variable using the specified operator.

General Syntax: reduction(op : variable)

● op = operator (e.g., +, *, -, &, |, max, min)

● variable = the shared variable that is reduced

Example – Summation

Correct result = 55

● Without reduction, this would cause a race condition because multiple threads update sum simultaneously.
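
A minimal sketch of a summation with the reduction clause, assuming the example adds the integers 1 to 10, which is consistent with the stated result of 55. The function name sum_1_to_n is hypothetical.

```c
/* Adds 1 + 2 + ... + n using a reduction on sum. */
int sum_1_to_n(int n) {
    int sum = 0;

    /* Each thread accumulates into its own private copy of sum (initialized to 0);
       the copies are added into the original sum when the loop ends. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= n; i++) {
        sum += i;
    }
    return sum;
}
```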
OpenMP Directives

1. Parallel Region
2. Work-Sharing Constructs
3. Synchronization Directives
4. Data Scope Directives
5. Reduction
6. Tasking
1. Parallel Region:
● Marks a block of code to be executed by multiple threads.
Syntax: #pragma omp parallel

2. Work-Sharing Constructs:
● Divide work among threads.

a) for (C/C++) / do (Fortran)

● Parallelize loop iterations.
Syntax: #pragma omp for
b) sections
● Different threads execute different code sections.
Syntax: #pragma omp sections (each block inside is marked with #pragma omp section)

c) single
● Only one thread executes the block (others skip).
Syntax: #pragma omp single

d) master
● Executed only by the master thread (thread 0).
Syntax: #pragma omp master
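
As a sketch of the sections construct, the hypothetical function below computes the minimum and the maximum of an array in two independent sections, so two different threads can work on them at the same time. It assumes n is at least 1.

```c
/* Stores the smallest and largest element of a[0..n-1]; assumes n >= 1. */
void min_and_max(const int *a, int n, int *min_out, int *max_out) {
    #pragma omp parallel sections
    {
        #pragma omp section             /* executed by one thread */
        {
            int lo = a[0];
            for (int i = 1; i < n; i++)
                if (a[i] < lo) lo = a[i];
            *min_out = lo;
        }

        #pragma omp section             /* may be executed by another thread */
        {
            int hi = a[0];
            for (int i = 1; i < n; i++)
                if (a[i] > hi) hi = a[i];
            *max_out = hi;
        }
    }
}
```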
3. Synchronization Directives: Prevent race conditions.

a) critical
● Only one thread executes this block at a time.
Syntax: #pragma omp critical

b) atomic
● Lightweight protection for simple updates.
Syntax: #pragma omp atomic

c) barrier
● All threads wait until everyone reaches this point.
Syntax: #pragma omp barrier

d) ordered
● Forces loop iterations to execute in order.
Syntax: #pragma omp ordered
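
A barrier is needed when a second phase reads data that other threads produce in a first phase. The sketch below is one hypothetical example: the first loop drops its implicit barrier with nowait, so the explicit barrier is what guarantees that a is complete before b is computed.

```c
/* Phase 1 fills a[i] = i; phase 2 stores a in reverse order into b. */
void fill_then_reverse(int *a, int *b, int n) {
    #pragma omp parallel
    {
        #pragma omp for nowait          /* threads do not wait at the end of this loop */
        for (int i = 0; i < n; i++)
            a[i] = i;

        #pragma omp barrier             /* wait until every thread has finished phase 1 */

        #pragma omp for
        for (int i = 0; i < n; i++)
            b[i] = a[n - 1 - i];        /* may read an element written by another thread */
    }
}
```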
4. Data Scope Directives

- Control whether variables are shared or private.

● private(var) → Each thread gets its own copy.

● firstprivate(var) → Private, but initialized with master’s value.

● lastprivate(var) → Private, but last iteration’s value copied back.

● shared(var) → Variable is shared by all threads.

● default(shared|none) → Defines default sharing policy.
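
The firstprivate and lastprivate clauses can be sketched together. In this hypothetical function, offset is copied into each thread with its original value, and last receives the value computed in the final iteration (i = n - 1). The sketch assumes n is at least 1.

```c
/* Returns (n-1)*(n-1) + offset, i.e. the value computed in the last iteration. */
int last_square(int n, int offset) {
    int last = -1;

    /* firstprivate: each thread's copy of offset starts with the value passed in.
       lastprivate:  after the loop, last holds the value from iteration n-1. */
    #pragma omp parallel for firstprivate(offset) lastprivate(last)
    for (int i = 0; i < n; i++) {
        last = i * i + offset;
    }
    return last;
}
```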

5. Reduction
● Performs safe accumulation.

6. Tasking
● Create independent tasks (not tied to loop iterations).

● single → restricts creation of tasks to one thread.

● task → allows execution of those tasks by any available thread in the parallel team.
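
The single/task combination can be sketched as follows: one thread creates a task per array element, and any thread in the team may execute those tasks. The function scale_with_tasks is a hypothetical example; real task bodies would normally do more work than one multiplication.

```c
/* Multiplies every element of v[0..n-1] by factor, one task per element. */
void scale_with_tasks(double *v, int n, double factor) {
    #pragma omp parallel
    #pragma omp single                  /* only one thread creates the tasks */
    {
        for (int i = 0; i < n; i++) {
            /* Each task captures its own value of i; v is shared. */
            #pragma omp task firstprivate(i) shared(v)
            v[i] *= factor;
        }
    }   /* all tasks are finished once the team passes the barrier here */
}
```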
Data Dependence

● Data dependence occurs when one statement or loop iteration uses data that is produced, modified, or needed by
another statement or iteration.
● In parallel programming (like OpenMP), data dependence determines whether statements or loop iterations can be
executed independently and in parallel or must be executed sequentially.

2 types:

1. No Dependency (Parallelizable)

2. Loop-Carried Dependency (Not Parallelizable Directly)

1. No Dependency (Parallelizable)

● All iterations are independent.

2. Loop-Carried Dependency (Not Parallelizable Directly)

● One iteration depends on results from previous iterations.

● Cannot be parallelized directly because each step depends on the previous step.
Finding loop-carried dependencies

● A loop-carried dependence exists if an iteration of a loop depends on data from a previous iteration.

A loop has a dependence if there exist iterations i and j (i ≠ j) such that:

● Both iterations access the same memory location.

● At least one access is a write.

● The order of execution matters (i < j).

Steps to Find Loop-Carried Dependence

1. Identify memory accesses inside the loop (reads and writes).

2. Check if the same memory location is accessed in different iterations.

3. Check if one of those accesses is a write.

4. If yes → there’s a loop-carried dependence.

Example 1: Independent (No Dependence)

● Iteration i writes to A[i] only.
● Each iteration touches a different memory location.
● No loop-carried dependence → parallelizable.

Example 2: Loop-Carried Dependence

● Iteration i reads A[i-1] (produced in the previous iteration).
● Loop-carried dependence exists → not directly parallelizable.
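
The two cases can be sketched with a pair of hypothetical loops. The function and array names are assumptions; only the access patterns (A[i] alone versus A[i-1]) come from the examples above.

```c
/* Example 1: iteration i touches only A[i] and B[i], so the loop can run in parallel. */
void double_all(const int *B, int *A, int n) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        A[i] = 2 * B[i];
}

/* Example 2: iteration i reads A[i-1], which iteration i-1 writes.
   The loop is left serial; adding "parallel for" here would be a bug. */
void running_sum(const int *B, int *A, int n) {
    if (n <= 0) return;
    A[0] = B[0];
    for (int i = 1; i < n; i++)
        A[i] = A[i - 1] + B[i];
}
```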
Scheduling loops

● Scheduling loops refers to the method used by OpenMP to divide the iterations of a loop among multiple threads when

executing in parallel.

Types of Scheduling:

1. Static Scheduling

2. Dynamic Scheduling

3. Guided Scheduling

4. Runtime Scheduling

5. Auto Scheduling
1. Static Scheduling

● Iterations are divided before execution.

● Each thread gets a fixed chunk of iterations.

● Low overhead but can lead to load imbalance if work per iteration varies.

Example:

● Loop has 100 iterations, 4 threads, chunk=25 →

○ Thread 0: 0–24

○ Thread 1: 25–49

○ Thread 2: 50–74

○ Thread 3: 75–99
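
The assignment above can be checked with a small sketch that records which thread executed each iteration. The function name, the owner array, and the request for 4 threads are assumptions; with schedule(static, chunk), chunk number k goes to thread k modulo the team size.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Stores in owner[i] the ID of the thread that ran iteration i.
   Returns the number of threads that were actually used. */
int record_static_owners(int *owner, int n, int chunk) {
    int team = 1;

    #pragma omp parallel num_threads(4)
    {
        int id = 0;
#ifdef _OPENMP
        id = omp_get_thread_num();
        #pragma omp single              /* one thread records the team size */
        team = omp_get_num_threads();
#endif
        #pragma omp for schedule(static, chunk)
        for (int i = 0; i < n; i++)
            owner[i] = id;
    }
    return team;
}
```

With n = 100 and chunk = 25 on a team of 4 threads, owner[0..24] is 0, owner[25..49] is 1, and so on, matching the list above.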
2. Dynamic Scheduling

● Iterations are assigned dynamically at runtime.

● When a thread finishes its chunk, it grabs the next available chunk.

● Good for irregular workloads but slightly more overhead.

Example:

● Loop has 100 iterations, 4 threads, chunk=10 →

○ Thread 0 takes 0–9,

○ when done, it grabs 40–49 (if free).
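
A sketch of dynamic scheduling on an irregular workload, assuming a hypothetical prime-counting loop in which larger numbers take longer to test. Chunks of 10 iterations are handed out as threads become free.

```c
/* Trial-division primality test; the cost grows with k. */
static int is_prime(int k) {
    if (k < 2) return 0;
    for (int d = 2; d * d <= k; d++)
        if (k % d == 0) return 0;
    return 1;
}

/* Counts the primes in 0..limit-1. */
int count_primes(int limit) {
    int count = 0;

    /* A thread that finishes its 10 iterations early grabs the next free chunk. */
    #pragma omp parallel for schedule(dynamic, 10) reduction(+:count)
    for (int k = 0; k < limit; k++)
        count += is_prime(k);

    return count;
}
```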

3. Guided Scheduling

● Threads start with large chunks that shrink exponentially.

● Ensures fast early load distribution and balanced later work.

● Useful for highly irregular workloads.

Example:

● First thread may get 50 iterations,

● then 25,

● then 12, … until the minimum chunk size is reached.
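
The shrinking chunk sizes can be reproduced with plain arithmetic. The sketch below assumes each new chunk is the number of remaining iterations divided by the number of threads, which is a common approach; the exact formula is left to the OpenMP implementation. With 100 iterations and 2 threads (an assumption that reproduces the 50, 25, 12 sequence above) it yields 50, 25, 12, 6, 3, 2, 1, 1.

```c
/* Fills sizes[] with successive guided-style chunk sizes and returns how many were produced.
   This models the idea only; it does not call the OpenMP runtime. */
int guided_chunks(int n, int nthreads, int min_chunk, int *sizes, int max_sizes) {
    int remaining = n;
    int count = 0;

    while (remaining > 0 && count < max_sizes) {
        int chunk = remaining / nthreads;           /* proportional to the work left */
        if (chunk < min_chunk) chunk = min_chunk;   /* never below the minimum chunk size */
        if (chunk > remaining) chunk = remaining;   /* the last chunk takes what is left */
        sizes[count++] = chunk;
        remaining -= chunk;
    }
    return count;
}
```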

4. Runtime Scheduling

● Scheduling is decided at runtime using the OMP_SCHEDULE environment variable.

● If dynamic scheduling is selected, iterations are assigned unevenly but the load is balanced.

● If static scheduling is selected, threads get fixed blocks of iterations.
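
A minimal sketch of a loop that defers the choice of schedule to run time. The function name sum_of_squares is hypothetical; the schedule(runtime) clause and the OMP_SCHEDULE variable are standard OpenMP.

```c
/* Adds 1*1 + 2*2 + ... + n*n; the loop schedule is chosen when the program runs. */
long sum_of_squares(int n) {
    long sum = 0;

    #pragma omp parallel for schedule(runtime) reduction(+:sum)
    for (int i = 1; i <= n; i++)
        sum += (long)i * i;

    return sum;
}
```

The same executable can then be run with different schedules, for example:

$ OMP_SCHEDULE="dynamic,10" ./my_program

$ OMP_SCHEDULE="static" ./my_program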

5. Auto Scheduling

● The compiler/runtime system decides the best schedule.

● Leaves control to the OpenMP implementation.

Critical Sections and Locks
a) Locks
● When multiple threads run in parallel, they often need to access shared resources (like variables, arrays, files, or queues).
● More flexible than critical.
● Use OpenMP lock routines:

omp_lock_t mylock;
● Declares a lock variable.
● Think of it as a "door key" that threads use to control access.

omp_init_lock(&mylock);
● Initializes the lock.
● Must be done before any thread tries to use the lock.

omp_set_lock(&mylock);
● A thread tries to acquire the lock.
● If another thread already holds it, this thread waits (blocks) until it becomes free.
● Once acquired, the thread can enter the critical section safely.

// critical work here
● Place any code here that must be executed by only one thread at a time.
● Example: updating a shared counter, modifying a queue, writing to a file.

omp_unset_lock(&mylock);
● Releases the lock.
● Other threads waiting at omp_set_lock can now acquire it and run their critical work.

omp_destroy_lock(&mylock);
● Cleans up the lock when it’s no longer needed.
● Should be called once, usually after the parallel region ends.
Example Without Lock vs. With Lock:
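
A sketch of the locked version, assuming the shared resource is a simple counter (the function name count_with_lock is hypothetical). Without the omp_set_lock/omp_unset_lock pair, concurrent increments could be lost and the final count could be smaller than n. The serial stand-ins in the #else branch are an addition so that the file also builds without -fopenmp.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins, used only when the file is compiled without -fopenmp. */
typedef int omp_lock_t;
static void omp_init_lock(omp_lock_t *l)    { *l = 0; }
static void omp_set_lock(omp_lock_t *l)     { *l = 1; }
static void omp_unset_lock(omp_lock_t *l)   { *l = 0; }
static void omp_destroy_lock(omp_lock_t *l) { (void)l; }
#endif

/* Increments a shared counter n times, protecting each increment with a lock. */
int count_with_lock(int n) {
    int counter = 0;
    omp_lock_t mylock;

    omp_init_lock(&mylock);             /* before any thread uses the lock */

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        omp_set_lock(&mylock);          /* wait until the lock is free, then take it */
        counter++;                      /* critical work */
        omp_unset_lock(&mylock);        /* let a waiting thread proceed */
    }

    omp_destroy_lock(&mylock);          /* after the parallel region ends */
    return counter;
}
```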
