0% found this document useful (0 votes)
4 views31 pages

OpenMP Shared-Memory Programming Guide

Uploaded by

mallikagu506
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
4 views31 pages

OpenMP Shared-Memory Programming Guide

Uploaded by

mallikagu506
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Module 4

Shared-memory programming with OpenMP

Shared-Memory Programming:
● In shared-memory programming, multiple threads (lightweight processes) run concurrently and share a common
address space.
● All threads can access global variables and memory, unlike distributed-memory programming (e.g., MPI) where each
process has its own memory.
● Synchronization is needed to avoid race conditions when threads access shared data simultaneously.

OpenMP:
● OpenMP (Open Multi-Processing) is an API (Application Programming Interface) that supports parallel
programming on shared-memory systems.
● It allows programmers to add directives (pragmas) in C, C++, and Fortran code to enable parallel execution.
Features of OpenMP

● Compiler directives (#pragma omp ... in C/C++).

● Runtime library routines (functions for thread management, timing, etc.).

● Environment variables (to control execution, e.g., number of threads).

OpenMP Execution Model

● Fork-Join Model:

○ The program starts with a single thread (master thread).

○ At a parallel region, additional worker threads are created ("fork").

○ All threads execute the parallel code.

○ At the end of the region, threads synchronize and only the master thread continues ("join").
Example OpenMP Program

Compiling and running OpenMP programs

$ gcc -fopenmp my_program.c -o my_program

Run the Executable


./my_program.exe

Output:

Hello from thread 0

Hello from thread 1

Hello from thread 2 ● #pragma omp parallel tells compiler to create multiple
threads.
Hello from thread 3
● Each thread executes the code inside the block.
● omp_get_thread_num() gives the thread ID.
The trapezoidal rule of OpenMP

● Assignment of trapezoids to threads.


#pragma omp critical
● A critical section is a block of code that must be executed by only one thread at a time.

● OpenMP provides the directive:

● When a thread enters the critical section, other threads wait until it is free.

● It prevents race conditions (when multiple threads try to update a shared variable at the same time).
Example Without Critical Section

● Multiple threads may update sum simultaneously →


wrong result.
Example Using #pragma omp critical

● Now, only one thread at a time modifies sum, ensuring


correctness.

When to Use critical

● When multiple threads update a shared resource (e.g.,


global variable, file, array element).

● Avoids race conditions.

Alternative to critical

● reduction(+:variable) → faster for sums/products.

● #pragma omp atomic → lighter weight than critical for


single-variable updates.
Scope of Variables in OpenMP

When you parallelize code with OpenMP, variables can be


Example
shared among threads or private to each thread.

1. Shared Variables

● Default in OpenMP (unless specified otherwise).


● All threads see the same memory location.
● Updates by one thread are visible to all others → can
cause race conditions.
● Result is unpredictable due to concurrent writes
Example:
2. Private Variables

● Each thread gets its own copy of the variable.


● Copies are uninitialized unless specified.
● Changes are local to the thread and discarded after
the parallel region.

Output:
The reduction clause

● The reduction clause lets threads safely update a shared variable (like sum, product, min, max) without race conditions.

● Each thread gets a private copy of the variable, performs updates locally, and at the end, OpenMP combines (reduces)

results into the shared variable using the specified operator.

General Syntax

● op = operator (e.g., +, *, -, &, |, max, min)

● variable = the shared variable that is reduced


Example – Summation

Correct result = 55

● Without reduction, this would cause a race condition


because multiple threads update sum simultaneously.
OpenMP Directives

1. Parallel Region
2. Work-Sharing Constructs
3. Synchronization Directives
4. Data Scope Directives
5. Reduction
6. Tasking
1. Parallel Region:
● Marks a block of code to be executed by multiple threads.
Syntax:

2. Work-Sharing Constructs:
● Divide work among threads.

a) for (C/C++) / do (Fortran)


● Parallelize loop iterations.
Syntax:
b) sections
● Different threads execute different code sections.
Syntax:

d) master
● Executed only by the master thread (thread 0).
Syntax:

c) single
● Only one thread executes the block (others skip).
Syntax:
3. Synchronization Directives: Prevent race conditions.

c) barrier
a) critical
● All threads wait until everyone reaches this point.
● Only one thread executes this block at a time.
Syntax:
Syntax:

b) atomic d) ordered
● Lightweight protection for simple updates. ● Forces loop iterations to execute in order.
Syntax: Syntax:
4. Data Scope Directives

- Control whether variables are shared or private.

● private(var) → Each thread gets its own copy.

● firstprivate(var) → Private, but initialized with master’s value.

● lastprivate(var) → Private, but last iteration’s value copied back.

● shared(var) → Variable is shared by all threads.

● default(shared|none) → Defines default sharing policy.


5. Reduction 6. Tasking
● Performs safe accumulation. ● Create independent tasks (not tied to loop iterations).

● single → restricts creation of tasks to one thread.

● task → allows execution of those tasks by any available


thread in the parallel team.
Data Dependence

● Data dependence occurs when one statement or loop iteration uses data that is produced, modified, or needed by
another statement or iteration.
● In parallel programming (like OpenMP), data dependence determines whether statements or loop iterations can be
executed independently and in parallel or must be executed sequentially.

2 types:

1. No Dependency (Parallelizable)

2. Loop-Carried Dependency (Not Parallelizable Directly)


1. No Dependency (Parallelizable)

● All iterations are independent.

2. Loop-Carried Dependency (Not Parallelizable Directly)

● One iteration depends on results from previous iterations.


● Cannot be parallelized directly because each step depends on the previous step.
Finding loop-carried dependencies

● A loop-carried dependence exists if an iteration of a loop depends on data from a previous iteration.

A loop has a dependence if there exist iterations i and j (i ≠ j) such that:

● Both iterations access the same memory location.

● At least one access is a write.

● The order of execution matters (i < j).

Steps to Find Loop-Carried Dependence

1. Identify memory accesses inside the loop (reads and writes).

2. Check if the same memory location is accessed in different iterations.

3. Check if one of those accesses is a write.

4. If yes → there’s a loop-carried dependence.


Example 1: Independent (No Dependence) Example 2: Loop-Carried Dependence

● Iteration i writes to A[i] only. ● Iteration i reads A[i-1] (produced in the previous
iteration).
● Each iteration touches a different memory location.
● Loop-carried dependence exists → not directly
● No loop-carried dependence → parallelizable. parallelizable.
Scheduling loops

● Scheduling loops refers to the method used by OpenMP to divide the iterations of a loop among multiple threads when

executing in parallel.

Types of Scheduling:

1. Static Scheduling

2. Dynamic Scheduling

3. Guided Scheduling

4. Runtime Scheduling
1. Static Scheduling

● Iterations are divided before execution.

● Each thread gets a fixed chunk of iterations.

● Low overhead but can lead to load imbalance if work per iteration varies.

Example:

● Loop has 100 iterations, 4 threads, chunk=25 →

○ Thread 0: 0–24

○ Thread 1: 25–49

○ Thread 2: 50–74

○ Thread 3: 75–99
2. Dynamic Scheduling

● Iterations are assigned dynamically at runtime.

● When a thread finishes its chunk, it grabs the next available chunk.

● Good for irregular workloads but slightly more overhead.

Example:

● Loop has 100 iterations, 4 threads, chunk=10 →

○ Thread 0 takes 0–9,

○ when done, it grabs 40–49 (if free).


3. Guided Scheduling

● Threads start with large chunks that shrink exponentially.

● Ensures fast early load distribution and balanced later work.

● Useful for highly irregular workloads.

Example:

● First thread may get 50 iterations,

● then 25,

● then 12, … until chunk size is reached.


4. Runtime Scheduling

● Scheduling is decided at runtime using environment variable.

● With dynamic scheduling, it is uneven but balanced assignment.

● With static scheduling, threads get fixed blocks of iterations.


4. Auto Scheduling

● The compiler/runtime system decides the best schedule.

● Leaves control to the OpenMP implementation.


Critical Sections and Locks
a) Locks
● When multiple threads run in parallel, they often need to access shared resources (like variables, arrays, files, or queues).
● More flexible than critical.
● Use OpenMP lock routines:
omp_lock_t mylock; // critical work here

● Declares a lock variable. ● Place any code here that must be executed by only
● Think of it as a "door key" that threads use to control one thread at a time.
access. ● Example: updating a shared counter, modifying a
queue, writing to a file.
omp_init_lock(&mylock);
omp_unset_lock(&mylock);
● Initializes the lock.
● Must be done before any thread tries to use the lock. ● Releases the lock.
● Other threads waiting at omp_set_lock can now
omp_set_lock(&mylock);
acquire it and run their critical work.

● A thread tries to acquire the lock.


omp_destroy_lock(&mylock);
● If another thread already holds it, this thread waits
(blocks) until it becomes free. ● Cleans up the lock when it’s no longer needed.
● Once acquired, the thread can enter the critical section ● Should be called once, usually after the parallel region
safely. ends.
Example Without Lock: With Lock:

OUTPUT:

OUTPUT:

You might also like