0% found this document useful (0 votes)

4 views43 pages

Lecture 10

The document provides an introduction to distributed and parallel computing, focusing on OpenMP, an API for shared-memory parallel programming in C, C++, and Fortran. It explains key features, execution models, and practical examples of using OpenMP for parallelizing code, including directives, thread management, and scheduling policies. Additionally, it discusses the importance of avoiding race conditions when multiple threads access shared data.

Uploaded by

deep03rao

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

4 views43 pages

Lecture 10

Uploaded by

deep03rao

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

Introduction to Distributed and Parallel

Computing
CS-302

Dr. Sanjay Saxena

Assistant Professor, CSE, IIIT Vadodara
Post doc – University of Pennsylvania, USA
PhD – Indian Institute of Technology(BHU), Varanasi
1
Parallel Computing/Parallel Programming

• OpenMP → Shared memory (one machine, many cores)

• MPI → Distributed memory (many machines, connected via network)

• CUDA → GPU computing (thousands of lightweight cores)

2
What is OpenMP?
• OpenMP (Open Multi-Processing) is an API that supports multi-platform shared-memory parallel
programming in C, C++, and Fortran.

• It allows developers to parallelize code easily by adding simple compiler directives (like #pragma
omp).
Key Features
API → It’s not a new language; it’s a set of functions, compiler directives, and environment variables.
Directives → Written as #pragma omp ... in C/C++, Example: #pragma omp parallel
Supported Languages → C, C++, and Fortran (standard in scientific computing).
Execution Model →
1. Follows a fork-join model:
1. Master thread forks worker threads to do tasks in parallel.
2. They later join back at synchronization points.
Portability → Works across multiple platforms (Linux, Windows, macOS) with compilers like GCC, Clang. 3
Example
Why it is Popular
• Ease of use: Add pragmas to existing
#include <stdio.h>
sequential code.
#include <omp.h>
• Minimal code changes: Same program can run
serially or in parallel.
int main() {
• Great for shared-memory systems: Fits
#pragma omp parallel
laptops, desktops, and multi-core servers.
{
• Widely used in scientific & engineering
printf("Hello from thread %d out of %d\n",
applications: simulations, image processing,
omp_get_thread_num(), omp_get_num_threads());
machine learning preprocessing.
}
return 0;
}

With just one directive (#pragma omp parallel), the program runs in parallel.
4
OpenMP execution model
1. Master Thread
•The program always starts with one thread (the master).
•Master thread executes the code sequentially (like normal C/C++).
•When a parallel region is encountered (#pragma omp parallel), the
master forks new worker threads.

2. Worker Threads
•Created when the master enters a parallel region.
•Each worker executes the parallelized portion of the code.
•All threads run concurrently and may handle different parts of the data or tasks.

3. Fork-Join Model
[Link] → Master spawns a team of worker threads.
[Link] → Each thread does its assigned task.
[Link] → Workers finish and master continues execution.
5
Contd..

6
7
If we run a program on a machine with 8 CPU cores, will it
always create exactly 8 threads? Why or why not?

Why does the order of printed lines change every time we

run this program, even though the code is the same?

8
Example Execution Flow
#include <stdio.h>
#include <omp.h> What happens here?

int main() { • First line runs only by master thread.

printf("Program starts with Master thread only.\n"); • When #pragma omp parallel is reached →
threads fork.
#pragma omp parallel
• Each thread (master + workers) prints “Hello
{
from thread …”.
int tid = omp_get_thread_num();
• After parallel region, all workers join back →
int total = omp_get_num_threads();
only master continues.
printf("Hello from thread %d out of %d\n", tid, total);
}
printf("Back to Master thread only.\n");
return 0;
} 9
Syntax overview: #pragma omp ...
OpenMP in C/C++ uses compiler directives that begin with:
#pragma omp <directive> [clauses]
#pragma → tells compiler it’s a special instruction.
omp → specifies it’s an OpenMP directive.
<directive> → what kind of parallel action to take.
[clauses] → extra options to control behavior (optional).

General Structure
#pragma omp parallel [clauses]
{
// Code block to run in parallel
}
parallel → keyword meaning "run this block with multiple threads".
The block { ... } is executed by all threads.
Without clauses, the number of threads depends on system or OMP_NUM_THREADS env variable.

10
Parallel Region
#pragma omp parallel
{ Sections
#pragma omp parallel sections
printf("Thread ID: %d\n", omp_get_thread_num());
{
}
#pragma omp section
Creates multiple threads, each executes the block.
{ task1(); }

for / parallel for #pragma omp section

{ task2(); }
#pragma omp parallel for
}
for(int i=0; i<100; i++) {
a[i] = i*i;
}

11
Fibonacci, Factorial, and Prime – Using sections
#include <stdio.h> int main() {
#include <omp.h>
#pragma omp parallel sections
// Fibonacci {
int fibonacci(int n) { #pragma omp section
if (n <= 1) return n; {
return fibonacci(n-1) + fibonacci(n-2); printf("Thread %d: Fibonacci(10) = %d\n",
} omp_get_thread_num(), fibonacci(10));
}
// Factorial
int factorial(int n) { #pragma omp section
int fact = 1; {
for(int i = 1; i <= n; i++) printf("Thread %d: Factorial(5) = %d\n",
fact *= i; omp_get_thread_num(), factorial(5));
return fact; }
}
#pragma omp section
// Prime check {
int isPrime(int n) { printf("Thread %d: Prime check (29) = %s\n",
if(n <= 1) return 0; omp_get_thread_num(),
for(int i = 2; i*i <= n; i++) isPrime(29) ? "Prime" : "Not Prime");
if(n % i == 0) return 0; }
return 1; } 12
} return 0;
Sum of Array, Maximum Element, and Average (Independent Tasks)
#pragma omp section
#include <stdio.h> {
#include <omp.h> for(int i = 1; i < 10; i++)
if(a[i] > max)
int main() { max = a[i];
int a[10] = {2, 5, 8, 3, 10, 6, 4, 9, 1, 7}; printf("Thread %d computed Max = %d\n",
int sum = 0, max = a[0]; omp_get_thread_num(), max);
double avg; }

#pragma omp parallel sections #pragma omp section

{ {
#pragma omp section for(int i = 0; i < 10; i++)
{ avg += a[i];
for(int i = 0; i < 10; i++) avg = avg / 10;
sum += a[i]; printf("Thread %d computed Average = %.2f\n",
printf("Thread %d computed Sum = %d\n", omp_get_thread_num(), avg);
omp_get_thread_num(), sum); }
} }

return 0;
13
}
Each thread will execute single task(section)?

Answer - Not necessarily.

14
How OpenMP Handles parallel sections
•OpenMP creates a team of threads Case 2: Threads < Sections
•Each section is treated as an independent task Example:
3 sections, 2 threads
•Tasks are assigned to available threads Thread 0 → Section 1, then Section 3
So: Thread 1 → Section 2
•One section → exactly one thread Case 3: Threads = Sections
•One thread → may run one or multiple sections Example:
3 sections, 3 threads
Thread 0 → Section 1
Case 1: Threads ≥ Sections Thread 1 → Section 2
Example: Thread 2 → Section 3
3 sections, 4 threads

Thread 0 → Section 1
Thread 1 → Section 2
It is NOT fixed (not final).
Thread 2 → Section 3
It works dynamically based on thread availability.
Thread 3 → Idle

15
If four threads are created, will each thread receive the same set of
iterations?

16
Example Putting It Together
Useful Clauses #include <stdio.h>
• num_threads(n) → set number of threads. #include <omp.h>
• #pragma omp parallel num_threads(4) int main() {

• private(var) → each thread gets its own copy of int sum = 0;

variable. #pragma omp parallel for reduction(+:sum) num_threads(4)
• shared(var) → variable is shared across all threads. for(int i=1; i<=100; i++) {

• reduction(op:var) → safely combine results (like sum += i;

sum, min, max). }

#pragma omp parallel for reduction(+:sum) printf("Sum = %d\n", sum);

return 0;
reduction(+:sum)
}
reduction(*:prod)
reduction(max:maxval)
reduction(min:minval)

17
Explanation of output Possible Outputs
Run with 4 threads (export OMP_NUM_THREADS=4):
#include <stdio.h> Output (Run 1):
#include <omp.h> Hello from thread 0 out of 4 threads
Hello from thread 2 out of 4 threads
int main() { Hello from thread 1 out of 4 threads
#pragma omp parallel Hello from thread 3 out of 4 threads
Output (Run 2):
{
Hello from thread 1 out of 4 threads
int tid = omp_get_thread_num(); Hello from thread 3 out of 4 threads
int nthreads = omp_get_num_threads(); Hello from thread 0 out of 4 threads
Hello from thread 2 out of 4 threads
printf("Hello from thread %d out of %d
threads\n", tid, nthreads);
}
return 0;
}

18
Functions: omp_get_thread_num(), omp_get_num_threads()
1. omp_get_thread_num()
Definition: Returns the ID number of the thread within the parallel region.
Range: Always 0 to N-1 (where N = total number of threads).
Thread 0 is the master thread.
Usage: Identify which thread is doing the work.

2. omp_get_num_threads()
•Definition: Returns the total number of threads currently running in the parallel region.
•If called outside a parallel region → it will return 1 (only master thread is active).
•Usage: Know how many threads are participating.

19
Together
#include <stdio.h>
#include <omp.h>
int main() {
#pragma omp parallel
{
int tid = omp_get_thread_num(); // thread ID
int total = omp_get_num_threads(); // total threads
printf("Hello from thread %d out of %d\n", tid, total);
}
return 0;
}

20
for loop parallelization
Why Parallelize Loops?
• Many scientific/engineering programs spend most of their time in loops.
• If each iteration is independent, multiple threads can work on different iterations at the same time.
• OpenMP makes this very simple using:
#pragma omp parallel for
for (int i = 0; i < N; i++) {
// loop body (must be independent per iteration)
}

Each iteration of the loop is assigned to a thread.

Threads execute different parts of the loop concurrently.
At the end, there is an implicit barrier (all threads must finish before continuing).

21
Vector Addition
#include <stdio.h>
#include <omp.h>
int main() { Thread 0 computed c[0] = 0
int N = 10; Thread 1 computed c[1] = 3
int a[10], b[10], c[10]; Thread 3 computed c[3] = 9
// initialize arrays Thread 2 computed c[2] = 6
for (int i = 0; i < N; i++) { Thread 1 computed c[5] = 15
a[i] = i; Thread 0 computed c[4] = 12
b[i] = 2*i; ...
}
// parallel for loop
#pragma omp parallel for
for (int i = 0; i < N; i++) {
c[i] = a[i] + b[i];
printf("Thread %d computed c[%d] = %d\n",omp_get_thread_num(), i, c[i]);
}
return 0;
}

22
Scheduling policies: static, dynamic, guided
When you parallelize a loop using: #pragma omp parallel for
• OpenMP must decide which thread executes which iterations. This distribution is called a scheduling policy.
• OpenMP provides three main scheduling options: static, dynamic, and guided.

1. Static Scheduling
•How it works:
• The loop iterations are divided evenly into contiguous chunks before execution starts.
• Each thread gets a fixed set of iterations.
•Good for:
• Iterations that take equal time (uniform workload).
• Low scheduling overhead (assignments decided in advance).
•Directive: #pragma omp parallel for schedule(static, chunk_size)
If chunk_size is not specified → iterations are divided evenly among threads.
If specified → each thread gets chunks of that size.
23
Example
#pragma omp parallel for schedule(static, 2)
for (int i = 0; i < 10; i++) {
printf("Thread %d does iteration %d\n",
omp_get_thread_num(), i);
}

•With 4 threads and chunk_size=2:

• Thread 0 → iterations 0,1
• Thread 1 → iterations 2,3
• Thread 2 → iterations 4,5
• Thread 3 → iterations 6,7
• Remaining distributed similarly

24
2. Dynamic Scheduling
How it works:
•Iterations are given to threads in chunks at runtime.
•When a thread finishes its chunk, it requests the next available chunk.
Good for:
•Iterations that take different amounts of time.
•Helps balance workload across threads.
•Has higher overhead (threads ask for new work dynamically).
Directive: #pragma omp parallel for schedule(dynamic, chunk_size)
Default chunk size = 1 (each iteration assigned separately).
Example:
#pragma omp parallel for schedule(dynamic, 2)
for (int i = 0; i < 10; i++) {
printf("Thread %d does iteration %d\n",
omp_get_thread_num(), i);
}

Execution order will vary → whichever thread finishes first grabs the next chunk.

25
3. Guided Scheduling
How it works:
•Iterations are assigned in decreasing chunk sizes.
•Large chunks are assigned first, then smaller chunks later.
•This reduces overhead compared to dynamic (fewer assignments early).
Good for:
•Iterations with highly variable workload.
•Situations where early chunks can keep all threads busy, and later chunks are finer for balancing.
Directive: #pragma omp parallel for schedule(guided, chunk_size)
•Default chunk size = 1 (minimum chunk).
Example:
#pragma omp parallel for schedule(guided, 2)
for (int i = 0; i < 20; i++) {
printf("Thread %d does iteration %d\n", omp_get_thread_num(), i);
}
Distribution:
•First → large chunks (e.g., 10 iterations).
•Then → smaller chunks (e.g., 5, 3, 2…).
•Finally → chunk size not smaller than the specified minimum.
26
Comparison

Policy Assignment Overhead Best For

Static Pre-determined, equal blocks Very low Uniform workload
Dynamic Assigned on-the-fly Higher Irregular workload
Guided Large chunks first, smaller later Medium Highly variable workload

27
Race conditions (when multiple threads update shared data)
•A race condition occurs when two or more threads access and update shared data at the same time, and the final result
depends on the order of execution.
•Because threads run concurrently, the execution order is not guaranteed, so results can vary between runs.

#include <stdio.h> Final sum = 500500 // (1+2+...+1000)

#include <omp.h>

int main() {
int sum = 0;
Actual Results (varies):
#pragma omp parallel for Final sum = 471202
for (int i = 1; i <= 1000; i++) { Final sum = 498761
sum = sum + i; // shared variable! Final sum = 500500
}
printf("Final sum = %d\n", sum);
return 0;
}
28
Why Does This Happen?
•sum = sum + i; looks simple, but it’s not atomic.
•It expands into three steps:
• Load sum into register.
• Add i.
• Store result back to sum.
•If two threads execute these steps at the same time, they may overwrite each other’s updates.
•Result → lost updates → incorrect final sum.

29
Ways to Fix Race Conditions in OpenMP

• 1. Critical Section 2. Atomic Operation

#pragma omp parallel for
#pragma omp parallel for
for (int i = 1; i <= 1000; i++) {
for (int i = 1; i <= 1000; i++) {
#pragma omp atomic
#pragma omp critical
sum = sum + i;
{
}
sum = sum + i;
} Ensures the update to sum is performed atomically
} (without interruption).
Only one thread at a time can execute inside #pragma omp Faster than critical for simple updates.
critical.
Slower, but guarantees correctness.
30
3. Reduction Clause (Best for Summation)
int sum = 0;

#pragma omp parallel for reduction(+:sum)

for (int i = 1; i <= 1000; i++) {

sum = sum + i;

• Each thread keeps a private copy of sum.

• At the end, OpenMP automatically reduces (adds) all partial sums into the global sum.

• Efficient and clean.

31
Sections construct (different tasks in parallel)
•Used when threads should perform different tasks (work-sharing).
•Unlike parallel for, which splits iterations of one loop, sections assigns different code blocks to different threads.
•Each section is executed by only one thread.
•All sections inside a sections block are executed in parallel.

#pragma omp parallel sections

{
#pragma omp section
{
// Task 1
}
#pragma omp section
{
// Task 2
}
#pragma omp section
{
// Task 3
}
} 32
#include <stdio.h>
#include <omp.h>
Contd.. int main() {
#pragma omp parallel sections
{
#pragma omp section
{
printf("Task 1 executed by thread %d\n", omp_get_thread_num());
}

#pragma omp section

{
printf("Task 2 executed by thread %d\n", omp_get_thread_num());
}

#pragma omp section

{
printf("Task 3 executed by thread %d\n", omp_get_thread_num());
}
}
return 0;
} 33
Nested parallelism (parallel inside parallel)
In many OpenMP implementations, nested parallelism is disabled by default (inner parallel regions collapse into
serial execution) #include <stdio.h>
#include <omp.h>
omp_set_nested(1); // C function int main() {
omp_set_nested(1); // enable nested parallelism
Nested Parallel Regions
#pragma omp parallel num_threads(2)
{
int outer_tid = omp_get_thread_num();
printf("Outer thread %d\n", outer_tid);
Outer thread 0
Inner thread 0 (inside outer thread 0) #pragma omp parallel num_threads(3)
Inner thread 1 (inside outer thread 0) {
Inner thread 2 (inside outer thread 0) int inner_tid = omp_get_thread_num();
Outer thread 1 printf(" Inner thread %d (inside outer thread %d)\n",
Inner thread 0 (inside outer thread 1) inner_tid, outer_tid);
Inner thread 1 (inside outer thread 1) }
Inner thread 2 (inside outer thread 1) }
return 0; } 34
Collapse clause (nested loops parallelized)
• Used with nested loops inside a parallel for.
• It merges multiple loops into a single large iteration space.
• OpenMP then distributes these merged iterations across threads.
• This often improves load balancing and parallel efficiency.

#pragma omp parallel for collapse(n)

for (int i = 0; i < N; i++) {
for (int j = 0; j < M; j++) {
// loop body
}
}

Here collapse(2) tells OpenMP:

•Combine both loops (i and j) into one big loop with N*M iterations.
•Distribute those iterations among threads.

35
Example 1: Without Collapse Example 2: With Collapse
#pragma omp parallel for #pragma omp parallel for collapse(2)

for (int i = 0; i < 4; i++) { for (int i = 0; i < 4; i++) {

for (int j = 0; j < 4; j++) { for (int j = 0; j < 4; j++) {

printf("Thread %d does iteration (%d,%d)\n", printf("Thread %d does iteration (%d,%d)\n",

omp_get_thread_num(), i, j); omp_get_thread_num(), i, j);

} }

Now both loops are collapsed into 16 iterations total.

OpenMP parallelizes only the outer loop (i).
OpenMP distributes (i,j) pairs across all threads.
With 4 threads → each thread gets 1 full row of j
Much better load balancing since threads get mixed work.
iterations.
If workload inside inner loop varies, some threads may
finish earlier
36
Output Difference (with 4 threads)

Without Collapse: With Collapse:

Thread 0: (0,0) (0,1) (0,2) (0,3) Thread 0: (0,0) (0,1) (1,0) (1,1)
Thread 1: (1,0) (1,1) (1,2) (1,3) Thread 1: (0,2) (0,3) (1,2) (1,3)
Thread 2: (2,0) (2,1) (2,2) (2,3) Thread 2: (2,0) (2,1) (3,0) (3,1)
Thread 3: (3,0) (3,1) (3,2) (3,3) Thread 3: (2,2) (2,3) (3,2) (3,3)

37
Barrier, Master, Single directives (synchronization control)
Synchronization is the backbone of safe parallel programming.

#pragma omp parallel

{ What it does:
int tid = omp_get_thread_num(); •All threads stop here and wait.
printf("Thread %d reached step 1\n", tid);
•Execution resumes only when all threads reach the barrier.
#pragma omp barrier // all threads must arrive here Use case: To ensure all threads finish a phase before moving to

printf("Thread %d reached step 2\n", tid); the next.

}

38
Master

What it does:
Only the master thread executes this block.
Other threads skip it.
No implicit barrier at the end (different from single).
Use case: To perform tasks only once (like I/O, printing, initializing).

#pragma omp parallel

{
#pragma omp master
{
printf("This is executed only by master thread\n");
}
}

39
Single Directive
#pragma omp parallel
{
#pragma omp single
{
printf("This is executed by one thread only\n");
}
}
• What it does:
• Exactly one thread executes this block.
• Other threads skip it but wait at the implicit barrier (unless nowait is used).
• Use case: To ensure only one thread performs some task, but not necessarily the master.

40
Serial vs Parallel execution
Parallel Execution
Serial Execution
•Program runs using multiple threads/cores simultaneously.
•Program runs on a single thread (CPU core).
•Work is divided among threads, which execute concurrently.
•Each task is executed step by step, sequentially.
•Aim: reduce execution time and process larger datasets.
•Limitation: Only one task at a time → slow for large problems.
sum = 0; sum = 0;
for (i = 0; i < N; i++) { #pragma omp parallel for reduction(+:sum)
sum += a[i]; for (i = 0; i < N; i++) {
} sum += a[i];
}

Definition:
Speedup = (Execution time of Serial) ÷ (Execution time of Parallel)
Ideal Speedup:
•With P processors → maximum theoretical speedup = P.
•Example: If 4 processors → we hope to get ~4× faster.
Reality (Amdahl’s Law):
•Some part of the program is inherently serial (can’t be parallelized).
•Synchronization, communication, and overhead reduce actual speedup.
•Thus, speedup curve flattens as processors increase.
41
Assignment

• Parallelize sum of array

• Parallelize sorting
• Compare execution with different threads

42
Thank
You

Small aim is a crime; have great aim.

Bharat-Ratan A. P. J. Abdul Kalam 43

Open - MP Lecture 1
No ratings yet
Open - MP Lecture 1
28 pages
OpenMP Programming Basics Guide
No ratings yet
OpenMP Programming Basics Guide
20 pages
OpenMP Shared-Memory Programming Guide
No ratings yet
OpenMP Shared-Memory Programming Guide
37 pages
OpenMP Basics: Multi-Threading in C/C++
No ratings yet
OpenMP Basics: Multi-Threading in C/C++
13 pages
OpenMP Basics for Parallel Programming
No ratings yet
OpenMP Basics for Parallel Programming
11 pages
OpenMP for High-Performance Computing
No ratings yet
OpenMP for High-Performance Computing
35 pages
OpenMP Shared-Memory Programming Guide
No ratings yet
OpenMP Shared-Memory Programming Guide
30 pages
OpenMP Parallel Programming Examples
No ratings yet
OpenMP Parallel Programming Examples
19 pages
Lec4-Parallel Programming Models3
No ratings yet
Lec4-Parallel Programming Models3
22 pages
OpenMP Shared Memory Programming Guide
No ratings yet
OpenMP Shared Memory Programming Guide
34 pages
OpenMP Basics: Multi-Threading Explained
No ratings yet
OpenMP Basics: Multi-Threading Explained
43 pages
OpenMP Sections for Parallel Programming
No ratings yet
OpenMP Sections for Parallel Programming
153 pages
OpenMP: A Hands-on Tutorial
No ratings yet
OpenMP: A Hands-on Tutorial
153 pages
OpenMP: Shared Memory Parallelism Guide
No ratings yet
OpenMP: Shared Memory Parallelism Guide
76 pages
OpenMP Programming Overview by Nakano
No ratings yet
OpenMP Programming Overview by Nakano
10 pages
OpenMP Programming Model Overview
No ratings yet
OpenMP Programming Model Overview
89 pages
OpenMP: Parallel Programming Guide
No ratings yet
OpenMP: Parallel Programming Guide
40 pages
6CS005 Lecture6
No ratings yet
6CS005 Lecture6
38 pages
OpenMP Parallel Programming Basics
No ratings yet
OpenMP Parallel Programming Basics
10 pages
OpenMP for Shared Memory Programming
No ratings yet
OpenMP for Shared Memory Programming
35 pages
OpenMP: A Guide to Parallel Programming
No ratings yet
OpenMP: A Guide to Parallel Programming
54 pages
OpenMP Cheat Sheet for Parallel Programming
No ratings yet
OpenMP Cheat Sheet for Parallel Programming
61 pages
Open MP Programming
No ratings yet
Open MP Programming
15 pages
OpenMP Multithreading Tutorial
100% (1)
OpenMP Multithreading Tutorial
82 pages
Cornel
No ratings yet
Cornel
87 pages
OpenMP Parallel Programming Guide
No ratings yet
OpenMP Parallel Programming Guide
41 pages
Introduction to OpenMP Programming
No ratings yet
Introduction to OpenMP Programming
35 pages
OpenMP Parallel Programming Lab Guide
No ratings yet
OpenMP Parallel Programming Lab Guide
11 pages
OpenMP Programming Assignment Report
No ratings yet
OpenMP Programming Assignment Report
19 pages
Parallel Programming Essentials Guide
No ratings yet
Parallel Programming Essentials Guide
50 pages
VTU Parallel Computing Lab Manual
No ratings yet
VTU Parallel Computing Lab Manual
20 pages
Introduction to OpenMP Basics
No ratings yet
Introduction to OpenMP Basics
152 pages
Open MP 2022
No ratings yet
Open MP 2022
76 pages
Assighment Lab Parallel
No ratings yet
Assighment Lab Parallel
24 pages
Open MP
No ratings yet
Open MP
13 pages
OpenMP Shared Memory Programming Guide
No ratings yet
OpenMP Shared Memory Programming Guide
65 pages
OpenMP Shared-Memory Programming Guide
No ratings yet
OpenMP Shared-Memory Programming Guide
27 pages
Openmp Trepozioidal
No ratings yet
Openmp Trepozioidal
82 pages
OpenMP Parallel Programming Guide
No ratings yet
OpenMP Parallel Programming Guide
4 pages
OpenMP: Multi-Processing API Guide
No ratings yet
OpenMP: Multi-Processing API Guide
13 pages
OpenMP Reduction Clause Benefits
No ratings yet
OpenMP Reduction Clause Benefits
71 pages
OpenMP Basics: Parallel Programming Guide
No ratings yet
OpenMP Basics: Parallel Programming Guide
9 pages
OpenMP Lab Manual for C/C++ Programming
No ratings yet
OpenMP Lab Manual for C/C++ Programming
9 pages
OpenMP Shared-Memory Programming Guide
No ratings yet
OpenMP Shared-Memory Programming Guide
112 pages
OpenMP Shared Memory Programming Guide
No ratings yet
OpenMP Shared Memory Programming Guide
34 pages
OpenMP for Parallel Programming Guide
No ratings yet
OpenMP for Parallel Programming Guide
10 pages
OpenMP Fork-Join Model Explained
No ratings yet
OpenMP Fork-Join Model Explained
24 pages
OpenMP Basics for Shared-Memory Programming
No ratings yet
OpenMP Basics for Shared-Memory Programming
48 pages
Lecture 10-11-12
No ratings yet
Lecture 10-11-12
159 pages
Introduction to High Performance Computing
No ratings yet
Introduction to High Performance Computing
130 pages
Association Rules in Itemset Analysis
No ratings yet
Association Rules in Itemset Analysis
2 pages
Advanced Database Management Syllabus
100% (1)
Advanced Database Management Syllabus
1 page
Beamer Presentation on Graphs
No ratings yet
Beamer Presentation on Graphs
14 pages
Class 10 AI Notes & Important Q&A
No ratings yet
Class 10 AI Notes & Important Q&A
8 pages
Securing Public Cloud with SSA
No ratings yet
Securing Public Cloud with SSA
6 pages
DBMS Advantages and Disadvantages
100% (2)
DBMS Advantages and Disadvantages
2 pages
Backbone.js Integration with Drupal
No ratings yet
Backbone.js Integration with Drupal
58 pages
Reinforcement Learning in Finance
No ratings yet
Reinforcement Learning in Finance
47 pages
SM-T561 Level 3 Repair Guide
No ratings yet
SM-T561 Level 3 Repair Guide
47 pages
Dynamic Programming for Unit Commitment
No ratings yet
Dynamic Programming for Unit Commitment
14 pages
AIFZ 2024: Call for Papers
No ratings yet
AIFZ 2024: Call for Papers
2 pages
ITI Consolidated Marksheet 2019
No ratings yet
ITI Consolidated Marksheet 2019
1 page
Hospital Management Information Systems in Health Sector and Development in Turkey
No ratings yet
Hospital Management Information Systems in Health Sector and Development in Turkey
15 pages
Array ADT Implementation in C
No ratings yet
Array ADT Implementation in C
8 pages
1-2-3 Numbering System for Defensive Linemen
No ratings yet
1-2-3 Numbering System for Defensive Linemen
6 pages
R Vectors, Lists, Matrices, and Arrays Guide
No ratings yet
R Vectors, Lists, Matrices, and Arrays Guide
4 pages
UOttawa Electrical Engineering Curriculum
No ratings yet
UOttawa Electrical Engineering Curriculum
4 pages
Sleep vs. Wait in Java Threads
No ratings yet
Sleep vs. Wait in Java Threads
10 pages
32-Bit Prefix Adder Design Assignment
No ratings yet
32-Bit Prefix Adder Design Assignment
9 pages
Wiegand To RS232 Converters
No ratings yet
Wiegand To RS232 Converters
4 pages
Civil Engineering Programming Course Overview
No ratings yet
Civil Engineering Programming Course Overview
15 pages
DFA Minimization and Lexical Analyzers
No ratings yet
DFA Minimization and Lexical Analyzers
26 pages
Microcontroller Timer Delay Techniques
No ratings yet
Microcontroller Timer Delay Techniques
18 pages
Agile Core Banking Implementation With Oracle FlexCube
No ratings yet
Agile Core Banking Implementation With Oracle FlexCube
2 pages
Hasil Uji Univariat Statistik
No ratings yet
Hasil Uji Univariat Statistik
3 pages
Graphics Functions for Shape Drawing
No ratings yet
Graphics Functions for Shape Drawing
19 pages
Bec 23 CRT Modules GCP
No ratings yet
Bec 23 CRT Modules GCP
7 pages
20762C ENU TrainerHandbook PDF
100% (1)
20762C ENU TrainerHandbook PDF
636 pages
Vdocuments - MX Wiley Computer Science Information Technology 2013 Catalog
No ratings yet
Vdocuments - MX Wiley Computer Science Information Technology 2013 Catalog
44 pages
Application of A Simplex Method To Find The Optimal Solution
100% (1)
Application of A Simplex Method To Find The Optimal Solution
4 pages

Lecture 10

Uploaded by

Lecture 10

Uploaded by

Introduction to Distributed and Parallel

Dr. Sanjay Saxena

• OpenMP → Shared memory (one machine, many cores)

• MPI → Distributed memory (many machines, connected via network)

• CUDA → GPU computing (thousands of lightweight cores)

Why does the order of printed lines change every time we

int main() { • First line runs only by master thread.

for / parallel for #pragma omp section

#pragma omp parallel sections #pragma omp section

Answer - Not necessarily.

• private(var) → each thread gets its own copy of int sum = 0;

• reduction(op:var) → safely combine results (like sum += i;

#pragma omp parallel for reduction(+:sum) printf("Sum = %d\n", sum);

Each iteration of the loop is assigned to a thread.

•With 4 threads and chunk_size=2:

Policy Assignment Overhead Best For

#include <stdio.h> Final sum = 500500 // (1+2+...+1000)

• 1. Critical Section 2. Atomic Operation

#pragma omp parallel for reduction(+:sum)

for (int i = 1; i <= 1000; i++) {

• Each thread keeps a private copy of sum.

• Efficient and clean.

#pragma omp parallel sections

#pragma omp section

#pragma omp section

#pragma omp parallel for collapse(n)

Here collapse(2) tells OpenMP:

for (int i = 0; i < 4; i++) { for (int i = 0; i < 4; i++) {

for (int j = 0; j < 4; j++) { for (int j = 0; j < 4; j++) {

printf("Thread %d does iteration (%d,%d)\n", printf("Thread %d does iteration (%d,%d)\n",

omp_get_thread_num(), i, j); omp_get_thread_num(), i, j);

Now both loops are collapsed into 16 iterations total.

Without Collapse: With Collapse:

#pragma omp parallel

printf("Thread %d reached step 2\n", tid); the next.

#pragma omp parallel

• Parallelize sum of array

Small aim is a crime; have great aim.

You might also like