OpenMP Shared-Memory Programming Guide

The document provides a comprehensive overview of OpenMP, an API for shared-memory parallel programming, detailing its programming model, key features, and applications. It covers essential concepts such as pragma directives, variable scoping, synchronization mechanisms, and performance optimization techniques. Additionally, it includes practical examples and best practices for implementing parallel algorithms effectively.

Uploaded by

Raghavendra gs

OpenMP Shared-Memory Programming

Slide 1: Title Slide

OpenMP Shared-Memory Programming
Parallel Programming with OpenMP
Learning Objectives:
• Understand OpenMP programming model and execution
• Master pragma directives and their applications
• Implement parallel algorithms effectively
• Handle variable scoping and data sharing
• Optimize performance through proper scheduling
• Manage synchronization and avoid race conditions
• Understand cache effects and memory hierarchy
• Implement task-based parallelism
Prerequisites: Basic C/C++, multi-threading concepts, computer architecture
Introduction to OpenMP
What is OpenMP? OpenMP (Open Multi-Processing) is an API for shared-
memory parallel programming using compiler directives, library routines, and
environment variables.
Key Features:
• Pragma-based: Uses compiler directives to parallelize code
• Fork-Join Model: Master thread creates worker threads as needed
• Shared Memory: All threads can access the same memory space
• Incremental Parallelization: Add parallelism gradually to existing code
• Portable: Supported by most modern compilers
Compilation:
# Compile with OpenMP support
gcc -fopenmp program.c -o program
icc -qopenmp program.c -o program
clang -fopenmp program.c -o program
# Set the number of threads, then run
export OMP_NUM_THREADS=4
./program
Applications: Scientific computing, image processing, financial modeling,
weather simulation, machine learning
OpenMP Pragmas and Directives
Pragma Syntax:
#pragma omp directive [clause [clause]...]
structured-block
Basic Example:
#include <omp.h>
#include <stdio.h>

int main(void) {
    printf("Before parallel region\n");

    #pragma omp parallel
    {
        int thread_id = omp_get_thread_num();
        int total_threads = omp_get_num_threads();
        printf("Hello from thread %d of %d\n", thread_id, total_threads);
    }

    printf("After parallel region\n");
    return 0;
}
Execution Model:
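1. Program starts with a single master thread (thread 0)
2. At a parallel region, the master forks a team of threads and becomes thread 0 of that team
3. All threads in the team execute the structured block in parallel
4. Threads synchronize at an implicit barrier at the end of the region; only the master continues (runtimes typically park the workers for reuse rather than destroying them)
5. Execution continues serially on the master thread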
Common Functions:
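int omp_get_num_threads(void);   // Threads in the current team (1 outside a parallel region)
int omp_get_thread_num(void);    // ID of the calling thread within its team (0 = master)
void omp_set_num_threads(int);   // Request a thread count for later parallel regions
double omp_get_wtime(void);      // Wall clock time in seconds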
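The following is a minimal sketch of how two of these routines interact; `observed_team_size` is a hypothetical helper name, not part of OpenMP. It requests a thread count and reports the team size actually seen inside a parallel region. The `_OPENMP` guard is an assumption made here so that the same file also builds without OpenMP support, in which case the helper reports 1.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: returns the team size observed inside a parallel
   region after requesting `requested` threads. */
int observed_team_size(int requested) {
    assert(requested > 0);
    int team = 1;                      /* serial fallback when built without OpenMP */
#ifdef _OPENMP
    omp_set_num_threads(requested);    /* a request; the runtime may grant fewer */
    #pragma omp parallel
    {
        #pragma omp single
        team = omp_get_num_threads();  /* same value for every thread in the team */
    }
#endif
    return team;
}
```

Calling `observed_team_size(4)` should return a value between 1 and 4, depending on what the runtime grants.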
Trapezoidal Rule - Theory
Numerical Integration Concept: The trapezoidal rule approximates definite
integrals by dividing the area under a curve into trapezoids.
Formula: ∫[a to b] f(x) dx ≈ (h/2)[f(a) + 2∑f(xi) + f(b)], where h = (b-a)/n, xi = a + i*h, and the sum runs over the interior points i = 1, ..., n-1
Sequential Implementation:
double f(double x) {
    return x * x;   // f(x) = x²
}

double sequential_trap(double a, double b, int n) {
    double h = (b - a) / n;
    double integral = (f(a) + f(b)) / 2.0;

    for (int i = 1; i <= n-1; i++) {
        double x_i = a + i * h;
        integral += f(x_i);
    }
    return integral * h;
}
Analysis:
• Time Complexity: O(n)
• Accuracy: Error decreases as O(h²)
• Parallelization Potential: Each f(x_i) can be evaluated independently; only the running sum is shared, which makes the loop a reduction
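As a quick check of the formula and the O(h²) error claim, the rule can be worked by hand for f(x) = x² on [0, 1] with n = 4 (values chosen here purely for illustration):

```latex
\begin{aligned}
h &= \tfrac{1-0}{4} = 0.25, \qquad x_i = 0.25\,i \\
T_4 &= h\left[\tfrac{f(0)+f(1)}{2} + f(0.25) + f(0.5) + f(0.75)\right] \\
    &= 0.25\,\bigl[\,0.5 + 0.0625 + 0.25 + 0.5625\,\bigr] = 0.34375 \\
\left|T_4 - \tfrac{1}{3}\right| &= \tfrac{1}{96} \approx 0.0104
    = \tfrac{(b-a)\,h^2}{12}\,\lvert f''\rvert \quad\text{with } f'' = 2
\end{aligned}
```

Doubling n to 8 halves h, so the same expression gives an error of 1/384, one quarter of the n = 4 error.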
Trapezoidal Rule - Parallel Implementation
Parallel Strategy: Distribute computation of trapezoids among threads, then
combine results.
Method 1: Using reduction
double parallel_trap_v1(double a, double b, int n) {
    double h = (b - a) / n;
    // Endpoint terms; the reduction adds the interior terms to this value
    double integral = (f(a) + f(b)) / 2.0;

    #pragma omp parallel for reduction(+:integral)
    for (int i = 1; i <= n-1; i++) {
        double x_i = a + i * h;   // Same nodes as sequential_trap
        integral += f(x_i);
    }
    return integral * h;
}
Method 2: Manual distribution
double parallel_trap_v2(double a, double b, int n) {
    double h = (b - a) / n;
    double global_integral = (f(a) + f(b)) / 2.0;   // Endpoint terms

    #pragma omp parallel
    {
        int thread_id = omp_get_thread_num();
        int num_threads = omp_get_num_threads();
        // Split the interior points 1..n-1 into contiguous blocks;
        // the last thread also takes the remainder
        int local_n = (n - 1) / num_threads;
        int start = 1 + thread_id * local_n;
        int end = (thread_id == num_threads - 1) ? n : start + local_n;
        double local_integral = 0.0;

        for (int i = start; i < end; i++) {
            double x_i = a + i * h;
            local_integral += f(x_i);
        }

        #pragma omp critical
        {
            global_integral += local_integral;
        }
    }
    return global_integral * h;
}
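Performance Tip: Prefer Method 1. The reduction clause lets the runtime create and combine the per-thread partial sums itself, so the code is shorter and the remainder iterations are distributed automatically. Method 2 enters the critical section only once per thread, so its extra cost is small, but it is easier to get wrong.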
Variable Scoping
Variable Scoping Rules: Determines how variables are shared or distributed
among threads.
Default Rules:
• Shared: Variables declared outside parallel region
• Private: Variables declared inside parallel region
• Private: Loop index variables in parallel for loops
Scoping Clauses:
Clause              Behavior                                                  Use Case
shared(var)         All threads access the same memory                        Data accessed by all threads
private(var)        Each thread gets its own uninitialized copy               Temporary variables
firstprivate(var)   Private copy initialized from the original variable       Variables needing initial values
lastprivate(var)    Private copy; value from the last iteration copied back   Variables whose final value is needed
Example:
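#include <omp.h>

int main(void) {
    int a = 10, b = 20, c = 30, d = 40;

    #pragma omp parallel for private(a) firstprivate(b) lastprivate(c) shared(d)
    for (int i = 0; i < 4; i++) {
        int thread_id = omp_get_thread_num();
        a = thread_id;         // private: starts uninitialized, so assign before reading
        b += thread_id;        // firstprivate: each thread's copy starts at 20
        c = thread_id * 100;   // lastprivate: the value from iteration i == 3 is copied back
        // d is shared by all threads; concurrent writes would need synchronization
    }
    // After the loop, c holds the value assigned in the last iteration (i == 3)
    return 0;
}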
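The effect of `firstprivate` and `lastprivate` is easier to see when the result is returned and checked. The sketch below uses a hypothetical function `last_value` (not from the slides): every thread starts with its own copy of `offset`, and the value written by the sequentially last iteration is what survives the loop.

```c
#include <assert.h>

/* Hypothetical example: returns (n-1)^2 + offset via lastprivate. */
int last_value(int n, int offset) {
    assert(n > 0);     /* lastprivate needs at least one iteration to assign a value */
    int last = -1;

    #pragma omp parallel for firstprivate(offset) lastprivate(last)
    for (int i = 0; i < n; i++) {
        /* offset: private copy initialized from the caller's value.
           last:   private copy; the one from i == n-1 is copied back. */
        last = i * i + offset;
    }
    return last;
}
```

For instance, `last_value(10, 5)` returns 86 regardless of which thread ran iteration 9, because "last" refers to loop order rather than completion time.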
Reduction Clause
What is Reduction? Common pattern where multiple threads contribute to
single result through commutative and associative operation.
How It Works:
1. Each thread gets private copy of reduction variable
2. Private copies initialized based on operation (0 for +, 1 for *)
3. Each thread operates on private copy
4. All private copies combined using specified operation
5. Final result stored in original variable
Supported Operations:
// Assumes array holds n doubles; DBL_MAX comes from <float.h>
// Arithmetic
double sum = 0.0;
double product = 1.0;
#pragma omp parallel for reduction(+:sum) reduction(*:product)
for (int i = 0; i < n; i++) {
    sum += array[i];
    product *= array[i];
}

// Min/Max (requires OpenMP 3.1 or later in C)
double min_val = DBL_MAX, max_val = -DBL_MAX;
#pragma omp parallel for reduction(min:min_val) reduction(max:max_val)
for (int i = 0; i < n; i++) {
    if (array[i] < min_val) min_val = array[i];
    if (array[i] > max_val) max_val = array[i];
}

// Logical
int all_positive = 1;
#pragma omp parallel for reduction(&&:all_positive)
for (int i = 0; i < n; i++) {
    all_positive = all_positive && (array[i] > 0);
}
Practical Examples:
• Dot product of vectors
• Counting elements satisfying condition
• Computing vector norm
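The three practical examples listed above can be sketched as follows. The function names are illustrative choices, and the norm is returned squared so that the example does not depend on the math library.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product: each thread accumulates a private partial sum. */
double dot(const double *x, const double *y, int n) {
    assert(x != NULL && y != NULL);
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}

/* Counting elements that satisfy a condition (here: strictly positive). */
int count_positive(const double *x, int n) {
    assert(x != NULL);
    int count = 0;
    #pragma omp parallel for reduction(+:count)
    for (int i = 0; i < n; i++) {
        if (x[i] > 0.0) count++;
    }
    return count;
}

/* Squared Euclidean norm, expressed through the dot product. */
double norm_squared(const double *x, int n) {
    return dot(x, x, n);
}
```

Because floating-point addition is not exactly associative, the parallel sum can differ from a serial sum in the last bits when the inputs are not exactly representable.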
Loop Carried Dependencies
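Definition: A loop-carried dependency exists when an iteration depends on the
result of an earlier iteration. Such loops cannot be parallelized directly.
Types of Dependencies:
• True Dependency (RAW): Read After Write; a later iteration reads a value written by an earlier one
• Anti Dependency (WAR): Write After Read; often removable by giving each thread a private copy
• Output Dependency (WAW): Write After Write; often removable by privatization or renaming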
Problematic Examples:
// Cannot parallelize - has dependencies
for (int i = 1; i < n; i++) {
    a[i] = a[i-1] + b[i];   // Depends on previous iteration
}

// Fibonacci - classic dependency
fib[0] = 0; fib[1] = 1;
for (int i = 2; i < n; i++) {
    fib[i] = fib[i-1] + fib[i-2];   // Cannot parallelize
}
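Some loop-carried dependencies disappear once the carried value is rewritten as a function of the loop index. The sketch below, with illustrative function names, shows this for a running coordinate `x += h`. It is only an example of the idea: it works when a closed form exists, and does not apply to general recurrences such as `a[i] = a[i-1] + b[i]`.

```c
#include <assert.h>

/* Serial version: x depends on the value left by the previous iteration. */
void fill_grid_serial(double *grid, int n, double a, double h) {
    double x = a;
    for (int i = 0; i < n; i++) {
        grid[i] = x;
        x += h;                 /* loop-carried dependency on x */
    }
}

/* Parallel version: each value is computed from i alone, so iterations
   are independent and can run in any order. */
void fill_grid_parallel(double *grid, int n, double a, double h) {
    assert(n >= 0);
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        grid[i] = a + i * h;
    }
}
```

The two versions can differ by rounding error for step sizes that are not exactly representable, since repeated addition accumulates error that the closed form does not.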
Safe Examples:
// Can parallelize - no dependencies
#pragma omp parallel for
for (int i = 0; i < n; i++) {
    c[i] = a[i] + b[i];   // Independent iterations
}

// Matrix multiplication
#pragma omp parallel for collapse(2)
for (int i = 0; i < rows; i++) {
    for (int j = 0; j < cols; j++) {
        double sum = 0.0;
        for (int k = 0; k < inner; k++) {
            sum += A[i][k] * B[k][j];
        }
        C[i][j] = sum;
    }
}
Scheduling
Purpose: Determines how loop iterations are distributed among threads.
Scheduling Types:
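• Static: Iterations are split into chunks and assigned to threads in round-robin order before the loop starts executing; lowest overhead
• Dynamic: Threads request the next chunk at runtime as they finish, which balances load at the cost of extra overhead
• Guided: Like dynamic, but chunk sizes start large and shrink as the loop progresses
• Auto: Compiler/runtime decides
• Runtime: Schedule is read from the OMP_SCHEDULE environment variable when the program runs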
Examples:
// Static scheduling
#pragma omp parallel for schedule(static, 100)
for (int i = 0; i < n; i++) {
    process_data(i);   // Chunks of 100 iterations; suits even workloads
}

// Dynamic scheduling
#pragma omp parallel for schedule(dynamic, 10)
for (int i = 0; i < n; i++) {
    complex_computation(i);   // Good for irregular workloads
}

// Guided scheduling
#pragma omp parallel for schedule(guided, 50)
for (int i = 0; i < n; i++) {
    variable_work(i);   // Chunk sizes shrink over time, but not below 50
}

// Runtime scheduling
#pragma omp parallel for schedule(runtime)
for (int i = 0; i < n; i++) {
    work(i);   // Uses OMP_SCHEDULE environment variable
}
When to Use:
• Static: Equal workload per iteration
• Dynamic: Highly variable workload
• Guided: Moderate load imbalance
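As an illustration of the "highly variable workload" case, the sketch below uses a triangular loop in which iteration i does i units of work. The function name, the chunk size of 16, and the counter standing in for real work are all assumptions made for the example.

```c
#include <assert.h>

/* Iteration i performs i units of work, so late iterations are the
   expensive ones. With a static block split, the thread holding the last
   block would do most of the work; dynamic chunks even this out. */
long triangular_work(int n) {
    assert(n >= 0);
    long total = 0;

    #pragma omp parallel for schedule(dynamic, 16) reduction(+:total)
    for (int i = 0; i < n; i++) {
        long units = 0;
        for (int j = 0; j < i; j++) {
            units += 1;         /* stand-in for real work */
        }
        total += units;
    }
    return total;               /* equals n * (n - 1) / 2 */
}
```

Swapping the clause for `schedule(runtime)` would allow static, dynamic, and guided to be compared through `OMP_SCHEDULE` without recompiling.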
Producers and Consumers
Producer-Consumer Pattern: Synchronization pattern where some threads
produce data while others consume it.
Implementation with Critical Sections:
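#define BUFFER_SIZE 100
int buffer[BUFFER_SIZE];
int count = 0, in = 0, out = 0;

void producer(int item) {
    int stored = 0;
    // Retry until there is room. The wait must happen outside the critical
    // section: spinning inside it would hold the lock and block every consumer.
    while (!stored) {
        #pragma omp critical(buffer_access)
        {
            if (count < BUFFER_SIZE) {
                buffer[in] = item;
                in = (in + 1) % BUFFER_SIZE;
                count++;
                stored = 1;
            }
        }
    }
}

int consumer(void) {
    int item = 0;
    int taken = 0;
    // Retry until an item is available, releasing the lock between attempts
    while (!taken) {
        #pragma omp critical(buffer_access)
        {
            if (count > 0) {
                item = buffer[out];
                out = (out + 1) % BUFFER_SIZE;
                count--;
                taken = 1;
            }
        }
    }
    return item;
}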
Using Atomic Operations:
int shared_counter = 0;

#pragma omp parallel
{
    #pragma omp for
    for (int i = 0; i < 1000; i++) {
        #pragma omp atomic
        shared_counter++;
    }
}
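A single shared counter is the simplest use of `atomic`; a histogram is a slightly more realistic one, because different iterations usually update different locations and only occasionally collide. The sketch below is one possible form, with `histogram` and its parameters chosen for illustration.

```c
#include <assert.h>

/* Counts how often each value in [0, nbins) occurs in data.
   Values outside that range are ignored. */
void histogram(const int *data, int n, int *bins, int nbins) {
    assert(nbins > 0);
    for (int b = 0; b < nbins; b++) {
        bins[b] = 0;
    }

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        int b = data[i];
        if (b >= 0 && b < nbins) {
            /* Protects only this one increment of one memory location */
            #pragma omp atomic
            bins[b]++;
        }
    }
}
```

When most updates hit the same few bins, per-thread private histograms combined at the end tend to scale better than atomics.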
Caches and Cache Coherence
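Cache Coherence: The hardware coherence protocol ensures all cores see a
consistent value for each shared memory location.
Memory Hierarchy Impact:
• L1 Cache: Fastest and smallest, private to each core
• L2 Cache: Larger and slower than L1; private to each core on many current CPUs, shared by a group of cores on others
• L3 Cache: Typically shared among all cores on a chip
• Main Memory: Slowest, shared by all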
Cache-Friendly Programming:
// Good: Sequential access pattern
#pragma omp parallel for
for (int i = 0; i < n; i++) {
    array[i] = compute(i);   // Good cache locality
}

// Poor: Random access pattern
#pragma omp parallel for
for (int i = 0; i < n; i++) {
    int idx = random_index[i];
    array[idx] = compute(idx);   // Poor cache locality
}
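Loop order over a two-dimensional array is another common source of good or poor locality. The sketch below assumes a matrix stored row-major in a flat array (an assumption of this example, matching how C lays out 2D arrays) and sums it in both orders. The results are identical, but the first version walks memory with stride 1 while the second jumps `cols` elements on every access.

```c
#include <assert.h>

/* Element (r, c) of the matrix is m[r * cols + c]. */
double sum_row_major(const double *m, int rows, int cols) {
    assert(rows >= 0 && cols >= 0);
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            total += m[r * cols + c];   /* consecutive addresses */
        }
    }
    return total;
}

double sum_col_major(const double *m, int rows, int cols) {
    assert(rows >= 0 && cols >= 0);
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (int c = 0; c < cols; c++) {
        for (int r = 0; r < rows; r++) {
            total += m[r * cols + c];   /* stride of cols elements */
        }
    }
    return total;
}
```

On large matrices the row-major version is usually noticeably faster; the size of the gap depends on the cache sizes and is best measured with `omp_get_wtime()`.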
Avoiding Cache Line Conflicts:
// Use padding so each counter sits in its own cache line
// (assumes 64-byte cache lines)
struct {
    int counter;
    char pad[64 - sizeof(int)];   // Pad the struct to 64 bytes
} thread_data[MAX_THREADS];

#pragma omp parallel
{
    int id = omp_get_thread_num();
    for (int i = 0; i < iterations; i++) {
        thread_data[id].counter++;
    }
}
False Sharing
Definition: Threads on different cores update different variables that happen
to share a cache line. Each write invalidates the line in the other cores'
caches, even though no data is logically shared.
Problem Example:
// BAD: False sharing occurs
int counters[MAX_THREADS];   // Adjacent in memory

#pragma omp parallel
{
    int id = omp_get_thread_num();
    for (int i = 0; i < iterations; i++) {
        counters[id]++;   // False sharing between threads
    }
}
Solutions:
1. Padding:
struct padded_counter {
    int count;
    char pad[64 - sizeof(int)];   // Cache line padding (assumes 64-byte lines)
};
struct padded_counter counters[MAX_THREADS];
2. Thread-local storage:
int global_counter = 0;

#pragma omp parallel
{
    int local_counter = 0;   // Private to each thread

    for (int i = 0; i < iterations; i++) {
        local_counter++;
    }

    #pragma omp critical
    {
        global_counter += local_counter;
    }
}
3. Use reduction:
int total = 0;
#pragma omp parallel for reduction(+:total)
for (int i = 0; i < n; i++) {
    total += work(i);
}
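Manual `char pad[...]` arrays depend on knowing the member sizes. A possible alternative, sketched below, is the C11 `_Alignas` specifier, which makes the compiler pad and align each slot. The 64-byte line size, the limit of 8 slots, and the names `padded_counter` and `count_even` are assumptions of this example; the `_OPENMP` guard only keeps the file buildable without OpenMP.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define CACHE_LINE 64   /* assumed line size; check the target CPU */
#define NUM_SLOTS  8    /* upper bound on the team size used below */

struct padded_counter {
    _Alignas(CACHE_LINE) long count;   /* alignment rounds sizeof up to CACHE_LINE */
};

/* Counts even values using one padded slot per thread. */
long count_even(const int *data, int n) {
    assert(n >= 0);
    struct padded_counter slots[NUM_SLOTS] = {{0}};

    #pragma omp parallel num_threads(NUM_SLOTS)
    {
        int id = 0;
#ifdef _OPENMP
        id = omp_get_thread_num();     /* always below NUM_SLOTS here */
#endif
        #pragma omp for
        for (int i = 0; i < n; i++) {
            if (data[i] % 2 == 0) {
                slots[id].count++;     /* each thread writes its own cache line */
            }
        }
    }

    long total = 0;
    for (int s = 0; s < NUM_SLOTS; s++) {
        total += slots[s].count;
    }
    return total;
}
```

In practice a reduction would be the simpler choice for this particular loop; padded slots are mainly useful when per-thread state must persist across several loops.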
OpenMP Tasking
Purpose: Tasks enable irregular parallelism and dynamic work creation.
Basic Task Creation:
#pragma omp parallel
{
    #pragma omp single
    {
        for (int i = 0; i < num_tasks; i++) {
            #pragma omp task
            {
                process_item(i);   // Each iteration becomes a task
            }
        }
    }
}
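Tasks are most useful where a `parallel for` does not fit, for example when the number of items is not known in advance. A minimal sketch with a linked list follows; the `node` type and both function names are hypothetical. One thread walks the list and creates a task per node, while the rest of the team executes them.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int value;
    struct node *next;
};

/* Work done for one list element. */
void square_node(struct node *p) {
    assert(p != NULL);
    p->value = p->value * p->value;
}

void square_all(struct node *head) {
    #pragma omp parallel
    {
        #pragma omp single
        {
            for (struct node *p = head; p != NULL; p = p->next) {
                /* firstprivate captures the current pointer value, so the
                   task keeps its node after the loop moves on */
                #pragma omp task firstprivate(p)
                {
                    square_node(p);
                }
            }
        }   /* implicit barrier: all tasks have completed here */
    }
}
```

The traversal itself stays serial, so this pays off only when the per-node work is much larger than following a pointer.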
Recursive Tasks - Fibonacci:
int fibonacci_task(int n) {
    if (n < 2) return n;

    int x, y;

    #pragma omp task shared(x)
    x = fibonacci_task(n - 1);

    #pragma omp task shared(y)
    y = fibonacci_task(n - 2);

    #pragma omp taskwait   // Wait for both tasks

    return x + y;
}

int main(void) {
    int result;
    #pragma omp parallel
    {
        #pragma omp single
        result = fibonacci_task(10);
    }
    return 0;
}
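Creating a task for every call makes the recursive version above spend most of its time on task management. A common remedy is a cutoff below which the recursion runs serially. The sketch below assumes a cutoff of 20, an arbitrary value that would need tuning, and uses illustrative function names.

```c
#include <assert.h>

#define CUTOFF 20   /* assumed threshold; tune by measurement */

long fib_serial(int n) {
    return n < 2 ? n : fib_serial(n - 1) + fib_serial(n - 2);
}

long fib_task(int n) {
    if (n < CUTOFF) {
        return fib_serial(n);          /* too small to be worth a task */
    }
    long x, y;
    #pragma omp task shared(x)
    x = fib_task(n - 1);
    #pragma omp task shared(y)
    y = fib_task(n - 2);
    #pragma omp taskwait               /* x and y are ready after this point */
    return x + y;
}

long fib(int n) {
    assert(n >= 0);
    long result = 0;
    #pragma omp parallel
    #pragma omp single
    result = fib_task(n);
    return result;
}
```

With the cutoff in place, only the top levels of the call tree create tasks, which keeps the number of tasks small relative to the work each one does.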
Task Dependencies:
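// a, b, and c must be shared, e.g. declared before the parallel region
#pragma omp parallel
{
    #pragma omp single
    {
        #pragma omp task depend(out:a)
        a = compute_a();

        #pragma omp task depend(out:b)
        b = compute_b();

        // Runs only after both tasks above have finished
        #pragma omp task depend(in:a,b) depend(out:c)
        c = combine(a, b);
    }
}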
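A self-contained version of the same dependency pattern is sketched below: two independent tasks compute a sum and a sum of squares, and a third task that depends on both combines them into a population variance. The function name and the choice of statistic are illustrative only.

```c
#include <assert.h>

/* Population variance computed as E[x^2] - (E[x])^2. */
double variance(const double *x, int n) {
    assert(n > 0);
    double sum = 0.0, sum_sq = 0.0, var = 0.0;

    #pragma omp parallel
    #pragma omp single
    {
        /* The first two tasks have no mutual dependence and may run concurrently */
        #pragma omp task depend(out: sum) shared(sum)
        for (int i = 0; i < n; i++) sum += x[i];

        #pragma omp task depend(out: sum_sq) shared(sum_sq)
        for (int i = 0; i < n; i++) sum_sq += x[i] * x[i];

        /* Starts only once both inputs have been produced */
        #pragma omp task depend(in: sum, sum_sq) depend(out: var) shared(sum, sum_sq, var)
        {
            double mean = sum / n;
            var = sum_sq / n - mean * mean;
        }
    }   /* the barrier ending the single construct guarantees all tasks are done */

    return var;
}
```

The variables are declared before the parallel region so that all three tasks refer to the same storage; variables declared inside the `single` block would be captured by value instead.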
Thread Safety
Thread Safety Mechanisms:
• Critical Sections: Mutual exclusion
• Atomic Operations: Indivisible operations
• Locks: Explicit synchronization
• Barriers: Synchronization points
Synchronization Examples:
// Critical section
#pragma omp critical
{
    shared_resource++;
}

// Named critical section
#pragma omp critical(update_max)
{
    if (value > global_max) global_max = value;
}

// Atomic operations
#pragma omp atomic
counter++;

#pragma omp atomic read
local_value = shared_value;

#pragma omp atomic write
shared_value = new_value;

// Barriers
#pragma omp parallel
{
    phase_1_work();
    #pragma omp barrier   // All threads wait here
    phase_2_work();
}
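The named critical section above is cheapest when each thread enters it once rather than once per element. A sketch of that pattern follows, using a hypothetical `max_value` function: threads first find a private maximum, then merge it under the lock.

```c
#include <assert.h>

int max_value(const int *data, int n) {
    assert(n > 0);
    int global_max = data[0];

    #pragma omp parallel
    {
        int local_max = data[0];       /* private running maximum */

        /* nowait: a thread may proceed to the merge as soon as its share is done */
        #pragma omp for nowait
        for (int i = 1; i < n; i++) {
            if (data[i] > local_max) local_max = data[i];
        }

        /* One short critical section per thread instead of one per element */
        #pragma omp critical(update_max)
        {
            if (local_max > global_max) global_max = local_max;
        }
    }
    return global_max;
}
```

Where the compiler supports OpenMP 3.1 or later, `reduction(max:...)` expresses the same computation more compactly.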
Explicit Locks:
#include <omp.h>
omp_lock_t my_lock;

int main(void) {
    omp_init_lock(&my_lock);

    #pragma omp parallel
    {
        omp_set_lock(&my_lock);
        shared_work();   // Critical section
        omp_unset_lock(&my_lock);
    }

    omp_destroy_lock(&my_lock);
    return 0;
}
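Performance Guideline: Atomic operations are usually the cheapest because they can map to a single hardware instruction, but they cover only simple updates of one memory location. Critical sections and locks cost roughly the same; all unnamed critical sections share one global lock, while explicit locks can protect individual data structures separately. Measure before choosing.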
Best Practices and Summary
Best Practices:
1. Start Simple: Begin with basic parallel for loops
2. Minimize Synchronization: Use reduction instead of critical sections
3. Consider Data Locality: Keep related data close in memory
4. Avoid False Sharing: Use padding or thread-local storage
5. Choose Right Scheduling: Static for balanced loads, dynamic for
irregular
6. Profile Your Code: Measure performance improvements
7. Handle Race Conditions: Use proper synchronization
Common Pitfalls:
• Assuming linear speedup
• Overusing critical sections
• Ignoring cache effects
• Not considering overhead of thread creation
• Parallelizing too small loops
Performance Considerations:
• Thread creation overhead
• Memory bandwidth limitations
• Cache coherence costs
• Load balancing issues
• Amdahl's Law limitations
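Amdahl's Law puts a number on the last point. In the usual statement, with f denoting the fraction of the run time that can be parallelized and p the number of threads (both symbols introduced here for illustration):

```latex
S(p) = \frac{1}{(1 - f) + \dfrac{f}{p}},
\qquad
\lim_{p \to \infty} S(p) = \frac{1}{1 - f}
```

For example, with f = 0.9 and p = 8 the predicted speedup is 1 / (0.1 + 0.1125) ≈ 4.7, and no thread count can push it past 10.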
Summary: OpenMP provides powerful tools for shared-memory
parallelization. Success requires understanding of:
• Proper variable scoping
• Effective synchronization
• Cache-aware programming
• Task-based parallelism for irregular problems
• Performance measurement and optimization
