0% found this document useful (0 votes)
12 views67 pages

Structured Patterns in Parallel Programming

The document discusses structured parallel programming patterns, which are codifications of best practices that enable reusable solutions for specific problems in parallel algorithm design. It covers various control flow patterns, parallel control patterns, data management patterns, and advanced patterns, emphasizing their applications and benefits in creating scalable and maintainable systems. The conclusion highlights the importance of exploring real-world applications and experimenting with frameworks like OpenMP and TBB to enhance parallel programming skills.

Uploaded by

winnerdam47
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
12 views67 pages

Structured Patterns in Parallel Programming

The document discusses structured parallel programming patterns, which are codifications of best practices that enable reusable solutions for specific problems in parallel algorithm design. It covers various control flow patterns, parallel control patterns, data management patterns, and advanced patterns, emphasizing their applications and benefits in creating scalable and maintainable systems. The conclusion highlights the importance of exploring real-world applications and experimenting with frameworks like OpenMP and TBB to enhance parallel programming skills.

Uploaded by

winnerdam47
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Patterns

Based on Structured Parallel Programming

Dr.-Ing Frezewd Lemma


Goal

• Covers Patterns

1
Introduction to Patterns
What Are Patterns? i

• Codification of best practices in software engineering.


• Enable reusable solutions for specific problems.
• In parallel programming: Task distribution and data access
combinations that solves a specific problem in parallel
algorithm design.
• We give each pattern a specific name, which makes it much
easier to succinctly describe, discuss, and compare various
parallel algorithms.
• Algorithms are typically designed by composing patterns
• Provides you a high-level vocabulary for designing your own
algorithms and for understanding other people’s algorithms

2
Why Use Patterns? i

• Universality: Not tied to hardware or language.


• Simplicity: Provide a clear structure for designing algorithms.
• Determinism: Ensures repeatable results for easier debugging
and testing.

3
Nesting Patterns
Nesting Patterns i

• Fundamental compositional pattern in serial and parallel


programming.
• Enables hierarchical and recursive composition of patterns.
• Supports modular and scalable code structures.
• Most programming models, at present support only a fixed
number of nesting levels and may even map these levels
directly onto hardware components.
• Mapping the hierarchy of the program directly onto the
hardware hierarchy also makes code less future-proofed.
• Cilk Plus and TBB support arbitrary nesting while efficiently
mapping parallelism to physical hardware.

4
Nesting Patterns ii

• Examples of systems with limited nesting include:


◦ OpenCL, CUDA, C++ AMP, OpenMP

5
Nesting Example

Figure 1: Example of Nesting Patterns

6
Structured Serial Control Flow
Patterns
Control Flow Patterns: Sequence i

• Tasks execute in a specified order.


• Enforces ordered side effects (e.g., output).
• Example: ‘T = f(A); S = g(T); B = h(S);‘
• restrict the order to be the same as the texture order.
• T = f(A); S = g(A); B = h(S,T);
• the sequence pattern would still require executing g after f
• Superscalar Sequence : a parallel generalization of sequence

7
Control Flow Patterns: Selection i

• Executes one of two branches based on a condition.


• Exactly one branch is executed, ensuring mutual exclusivity.
• Example:
if (c) { a; } else { b; }

• There is a parallel generalization of selection, the speculative


selection pattern, which execute all of a, b, and c in parallel,

8
Control Flow Patterns: Iteration i

• Executes a task repeatedly while a condition holds true.


• The loop body typically modifies a state that the condition
uses for termination testing.
• Iteration assumes data dependencies between loop iterations
are manageable, although parallelization may be complicated
by these dependencies.
• Can use ‘for‘ or ‘while‘ loops.
• Example: ‘while (c) a; ‘

9
Control Flow Patterns: Recursion i

• A function calls itself, directly or indirectly, , enabling nested


tasks.
• It is usually associated with stack-based memory allocation or,
if higher-order functions are supported, closures (see Section
3.4.4) which are objects allocated on the heap.
• Tail recursion is a special form of recursion that can be
converted into iteration
• Examples:
◦ Factorial calculation
◦ Fibonacci sequence
◦ Divide-and-conquer algorithms (e.g., quicksort,
mergesort)

10
Control Flow Patterns: Recursion ii

• Dynamic form of nesting often implemented with stack-based


memory.

11
Parallel Control Patterns
Parallel Control Patterns: Introduction i

• Parallel control patterns extend the serial control patterns


• Parallel control patterns relaxes the assumptions of the serial
control patterns in various ways

12
Fork–Join i

• Splits control flow into parallel tasks and later joins them.
• Applications
◦ Used in frameworks like OpenMP and Cilk Plus.
◦ Divide and Conquer Algorithms: QuickSort: Divide the
array into parts, sort them in parallel, and join the
results. MergeSort: Recursively divide the array, sort the
halves, and merge them in parallel.
◦ Graph algorithms, Dynamic programming etc

13
Map Pattern i

• Applies a function to each element of a collection


independently.

Figure 2: The Map pattern

• Produces a new collection of the same shape.


• Embarrassingly parallel (no dependencies between operations).
• No communication or synchronization needed between tasks.

14
Map Pattern ii

• Ideal for SIMD (Single Instruction, Multiple Data)


architectures.
• Coarse-grained parallelism.
• Scales well with the number of processing units.
• Appllications
◦ Image Processing:
∗ Gamma correction.
∗ Thresholding.
∗ Color space conversions.
◦ Monte Carlo Sampling:
∗ Generating random samples for simulations.
◦ Ray Tracing:

15
Map Pattern iii

∗ Calculating pixel colors in rendering algorithms.


◦ Image filtering, array transformations, or functional
programming.

16
Stencil Pattern i

• Generalization of map pattern with neighborhood access.


• Applications:
◦ Image Filtering:
∗ Convolution.
∗ Median filtering.
◦ Video Processing:
∗ Motion estimation in video encoding.
◦ Scientific Simulations:
∗ Fluid flow simulations.
∗ Electromagnetic partial differential equation (PDE)
solvers.

17
Stencil Pattern ii

∗ Lattice quantum chromodynamics (QCD).


∗ Cellular automata (e.g., lattice Boltzmann methods).
◦ Noise Reduction:
∗ Isotropic diffusion.

18
Reduction i

• Combines all elements of a collection into a single value.


• Uses an associative combiner function (e.g., addition,
multiplication).
• Applications
◦ Numerical Applications:
∗ Summing elements of a collection.
∗ Dot products in matrix multiplication.
∗ Averaging Monte Carlo samples for integration.
◦ Convergence Testing:
∗ Used in iterative solutions of linear equations (e.g.,
conjugate gradient methods).

19
Reduction ii

◦ Image Processing:
∗ Image comparison metrics (e.g., in video encoding).
◦ Other Mathematical Operations:
∗ Maximum, minimum, multiplication, Boolean
AND/OR/XOR, etc.

Pseudocode Example
function reduce(array):
if [Link] == 1: return array[0]
else:
left = reduce(array[:mid])
right = reduce(array[mid:])
return combine(left, right)

20
Scan i

• Computes all partial reductions of a collection.


• Produces an array where each element is the reduction of all
previous elements.
• Applications:
◦ Integration:
∗ Computing cumulative sums or other cumulative
operations.
◦ Financial Modeling:
∗ Sequential decision simulations in option pricing.
◦ Random Number Generation:

21
Scan ii

∗ Generating sequences of random numbers (if


traditional pseudorandom generators are
parallelized).
◦ Data Packing:
∗ Combined with scatter for efficient memory
management.
• Associativity of the operation is crucial for parallelization.

22
Recurrence i

• Generalization of iteration where loop iterations depend on


previous computations.
• Causality: There must be a serial ordering for computation.
• Parallelization:
◦ Associative operations (e.g., prefix sums) can be
parallelized.
◦ Multi-dimensional recurrences are often parallelizable
using sweeps.
• Applications:
◦ Matrix Factorization:
∗ Numerical linear algebra.

23
Recurrence ii

◦ Image Processing:
∗ Advanced filtering operations.
◦ PDE Solvers:
∗ Solving partial differential equations in scientific
computations.
◦ Sequence Alignment:
∗ Used in bioinformatics and computational biology.

24
Serial Data Management Patterns
Random Read and Write i

• Uses pointers or indices for direct memory access.


• Challenges: Aliasing can complicate parallelization.

25
Heap and Stack Allocation i

• Stack: LIFO memory allocation (fast and locality-preserving).


• Heap: Dynamic allocation (slower, can fragment memory).

26
Closures and Objects i

• Closures
◦ Closures are function objects that can be constructed and
managed like data. Lambda functions
◦ When closures are built, they can often be used to
“capture” the state of non-local variables that they
reference
• Objects
◦ Objects are a language construct that associate data with
the code to act on and manage that data.
◦ Multiple functions may be associated with an object and
these functions are called the methods or member
functions of that object.

27
PARALLEL DATA MANAGEMENT
PATTERNS
Pack i

• Eliminates unused or irrelevant elements in a collection.


• Input elements marked with Boolean values:
◦ True: Element is retained.
◦ False: Element is discarded.
• Output: Contiguous sequence of selected elements in the
same order.
• Common Applications:
◦ Filtering results in image processing.
◦ Collision detection in simulations.

28
Pipeline i

• Connects tasks in a producer-consumer relationship.


• Each stage processes data and passes results to the next.
• All stages can run simultaneously, maintaining independent
states.
• Common Applications:
◦ Video encoding and decoding.
◦ Real-time data processing (e.g., audio streams).

29
Geometric Decomposition i

• Divides data into subcollections (overlapping or


non-overlapping).
• Example: Partitioning domains for stencil computations.

30
Partition Pattern i

• Special case of geometric decomposition with non-overlapping


regions.
• Common in matrix operations and image processing.

31
Gather i

• Collects elements from an input collection using an index


array.
• The shape of the output collection matches the index array.
• Optimizations:
◦ Coherent patterns (e.g., shifting or stencil) are more
efficient.
• Common Applications:
◦ Sparse matrix operations.
◦ Ray tracing and volume rendering.

32
Scatter i

• Writes input data to specific locations in a collection using an


index array.
• The inverse of the Gather pattern.
• Challenges:
◦ Handling collisions when multiple writes target the same
index.
◦ Requires a deterministic resolution strategy.
• Common Applications:
◦ Writing results of simulations.
◦ Updating sparse matrices.

33
Scatter ii

34
Advanced Patterns
Superscalar Sequences i

• Allows tasks to execute in parallel as long as data


dependencies are met.
• Example: Out-of-order execution in modern processors.

35
Futures and Speculative Selection i

• Futures: Non-hierarchical task management.


• Speculative Selection: Parallel execution of branches with
cancellation.

36
Workpile i

• Generalization of the Map pattern where tasks can


dynamically generate more tasks.
• Tasks are added to a “workpile” and processed as they arrive.
• Commonly used in:
◦ Recursive algorithms like tree searches.
◦ Dynamic workloads where task count is not predefined.
• Challenges:
◦ Efficient task scheduling.
◦ Avoiding bottlenecks in task generation.

37
Workpile ii

38
Search i

• Finds elements in a collection that meet specific criteria.


• Criteria can vary from simple key matches to complex
logical/arithmetic constraints.
• Applications:
◦ Databases: Relational queries (e.g., SQL).
◦ AI: Pathfinding and heuristic searches.
• Optimization:
◦ Sorting data for faster searches.
◦ Using parallel algorithms for large datasets.

39
Segmentation i

• Divides a collection into multiple segments for independent


processing.
• Each segment can be treated as a separate collection.
• Example:
◦ Summing elements in segments of an array.
• Applications:
◦ Multi-threaded data processing.
◦ Data-parallel algorithms in distributed systems.

40
Expand i

• Creates new data elements or collections based on an existing


one.
• The number of outputs is not predetermined and can vary
based on input.
• Common Applications:
◦ Generating neighboring states in simulations.
◦ Recursive enumeration (e.g., generating permutations or
combinations).
• Challenges:
◦ Managing memory for dynamically generated outputs.
◦ Ensuring scalability with irregular data generation.

41
Expand ii

42
Category Reduction i

• Groups data into categories and applies a reduction operation


within each category.
• Generalization of the Reduction pattern.
• Common Applications:
◦ Grouping sales data by region and computing total
revenue.
◦ Histograms and frequency counts.
• Challenges:
◦ Handling a large number of categories efficiently.
◦ Balancing load in parallel computations.

43
Category Reduction ii

44
PROGRAMMING MODEL
SUPPORT FOR PATTERNS
Programming Model Supports for Patterns i

• TBB: Nesting, Recursion, Fork-Join, Map, Workpile,


Reduction, Scan, Pipeline, Speculative Selection, Branch and
Bound
• OpenMP: Map, Workpile, Reduction, Fork-Join, Stencil,
Gemotric Decompostion, Gather, Scatter
• OpenCL: Map, Gather, Scatter, Reduction, Scan, Pack,
Expand, Stencil, Workpile, Superacalar Sequences, Geometric
Decompostion, Closures

45
Conclusion
Summary i

• Patterns provide reusable solutions for parallel programming.


• Enable scalable, maintainable, and deterministic designs.
• Foundation for developing high-performance systems.

46
Next Steps i

• Explore applications of patterns in real-world systems.


• Experiment with frameworks like OpenMP and TBB.
• Practice composing patterns for complex parallel algorithms.

47
The Map Pattern Implementation i

Definition:

• Applies a function to every element of a data collection in


parallel.
• Executes function invocations independently with no side
effects.

Key Characteristics:

• Foundation for parallelism with no interdependencies.


• Enables maximum concurrency with flexible scheduling.
• Basis of many efficient applications and patterns.

Applications:

48
The Map Pattern Implementation ii

• Embarrassingly parallel tasks and unrelated problem-solving.


• Combined with other patterns like gather, reduction, and scan.

Implementation Considerations:

• Avoids using separate threads for lightweight tasks.


• Parallelizes synchronization overhead efficiently.
• Supports vectorization and serial control flow nesting.

Examples:

• Simplified implementations using TBB, Cilk Plus, OpenMP,


etc.
• Map-Reduce as a practical extension (e.g., Google’s systems).

49
Implementing the Map Pattern Using Different Programming
Models i

• Example Problem: SCALED VECTOR ADDITION - SAXPY


◦ Definition: SAXPY stands for ”Single-Precision A·X
Plus Y.” It performs a scaled vector addition where a
vector x is scaled by a scalar a and added to another
vector y , elementwise.

y ← ax + y

◦ Components:
∗ Inputs: Vector x, scalar a, and vector y .
∗ Output: Updated vector y , with old values
overwritten.

50
Implementing the Map Pattern Using Different Programming
Models ii

◦ Use Cases: Common in linear algebra operations, such


as row cancellation in Gaussian elimination.
◦ Uniform and Varying Parameters:
∗ Uniform Parameters: Scalar a, which remains
constant across operations.
∗ Varying Parameters: Elements of vectors x and y ,
which vary across operations.
◦ Elemental Function Representation: The operation
for each element can be described as:

f (t, p, q) = tp + q

where t = a, p = xi , and q = yi .
51
Implementing the Map Pattern Using Different Programming
Models iii

◦ Parallelization:
∗ Suitable for parallel implementation using the map
pattern, as each iteration is independent.
∗ Arithmetic intensity is low, which limits scalability,
making SAXPY less suitable for highly parallel
systems compared to more complex operations.
◦ Scalability: As a Level 1 BLAS routine, SAXPY does
not scale well due to low arithmetic intensity.
Higher-level BLAS routines (Level 2 and 3) with more
work per unit of parallelism scale better.

52
Implementing the Map Pattern Using Different Programming
Models iv

◦ Implementation Examples: Demonstrated in multiple


parallel programming models, including TBB, OpenMP,
and OpenCL.
◦ Serial Implementation: Provides a baseline, where each
element is processed sequentially in a loop, with
independent iterations.
• Porgramming Models
◦ Serial
◦ TBB
◦ OpenMp
◦ OpenCl

53
Serial Implementation

Figure 3: SAXPY Serial Implementation


54
Map With TBB

Figure 4: SAXPY TBB Implementation 55


Map With OpenMP

Figure 5: SAXPY OpenMP Implementation

56
Map With OpenCl

Figure 6: SAXPY OpenCL Implementation

57

You might also like