Parallel Processors
1. Taxonomy and Topology
Taxonomy means classification — in parallel computing, systems are classified
based on how they handle instructions and data.
The most famous classification is Flynn’s Taxonomy:
1. Shared Memory Multiprocessors
In a shared memory system, all processors access a common global memory.
Characteristics:
• Memory is shared, accessible by all processors.
• Communication is done through reading/writing shared variables.
• Suitable for fine-grained parallelism (tight synchronization).
Types:
1. Uniform Memory Access (UMA):
o All processors have equal access time to memory.
o Example: symmetric multiprocessors (SMPs).
2. Non-Uniform Memory Access (NUMA):
o Access time varies based on memory location (local vs remote).
o Example: modern multicore systems like AMD EPYC, Intel Xeon.
2. Distributed Memory Networks
In distributed memory systems, each processor has its own local memory and
communicates via message passing.
Characteristics:
• No shared memory.
• Communication via message passing (MPI).
• Highly scalable and used in clusters or supercomputers.
Examples:
• Computer clusters
• Cloud computing servers
• GPU networks
Advantages:
• Scalable for large systems.
• Each node can operate independently.
Challenges:
• Harder to program due to explicit communication.
• Synchronization and data consistency must be manually handled.
Processor Organization
Defines how processors and memory units are structured and connected.
• Centralized organization: One main memory shared by all processors.
• Distributed organization: Each processor has its own memory and
communicates through a network.
• Hybrid organization: Combines shared and distributed memory models.
Static Interconnection Network
Static interconnection networks are fixed. In a unidirectional static interconnection
network connections between nodes allow communication to occur in only one
direction. So the data can be transmitted from one node to another node but not in the
reverse direction. However, in a bidirectional static interconnection network, the
connection between nodes allows communication to occur in both directions. The
choice between both connections depends on the specific requirements of the parallel
computing system.
There are also two type of Static Interconnection Network
Embeddings and Simulations
1. Embedding in Parallel Processing
Definition
In parallel computing, embedding means mapping one network (or topology) into another
so that a parallel algorithm designed for one architecture can run efficiently on another.
In simpler terms:
“Embedding is the process of placing (mapping) the nodes and links of one parallel
computer network (called the guest) onto another network (called the host) in an efficient
way.”
The Embedding is a function that maps the nodes and edges of the source graph (G) onto
the nodes and paths of the host graph (H).
Embedding: G->H
Why Embedding is Needed
• Different parallel computers have different topologies (like mesh, ring, hypercube,
etc.).
• A parallel algorithm may be optimized for one topology, but your hardware might
use another.
• Instead of redesigning the algorithm, we embed one topology into another.
Example:
If an algorithm was designed for a ring network, but your computer uses a mesh topology,
you embed the ring into the mesh and execute it there.
Applications of Embedding
• To simulate algorithms from one architecture on another.
• To design universal architectures that can support multiple algorithm types.
• To measure efficiency of network topologies for various applications.
2. Simulation in Parallel Processing
Definition
Simulation is the process of using one parallel architecture (the host) to imitate or reproduce
the behavior of another (the guest).
In short:
Simulation allows one type of parallel computer to behave as if it were another type.
Why Simulation is Needed
• Algorithms are often designed for specific architectures (e.g., hypercube, mesh,
shared memory).
• Real hardware may have a different structure.
• By simulating, we can execute algorithms meant for one model on another without
rewriting them.
Types of Simulation
1. Network Simulation – Simulating one network topology on another (e.g., hypercube
on mesh).
2. Memory Simulation – Simulating shared memory using distributed memory (or vice
versa).
3. Algorithmic Simulation – Running an algorithm designed for one model on another.
Example
Suppose we want to simulate a shared memory system on a distributed memory machine.
We can:
• Use message passing to emulate shared variables.
• Create a memory consistency protocol so that updates appear synchronized.
• Each processor acts as if it has access to all memory, even though physically, data is
exchanged by messages.
This is exactly how MPI + OpenMP hybrid systems work in practice.
In short
• Embedding = Mapping (structural relation)
• Simulation = Behavior imitation (functional execution)
• A good embedding leads to an efficient simulation.
•
1. Shared Memory Programming
Concept
In shared memory programming, multiple processors or threads share a single, common
memory space.
• Every processor can directly read and write to the same memory.
• Communication happens implicitly through memory — no explicit messages.
• Used in multicore processors, SMP (Symmetric Multiprocessing) systems.
Architecture
+---------+
| Memory |
+---------+
/ | \
/ | \
CPU1 CPU2 CPU3
->All CPUs access the same global memory via a bus or interconnection network.
-> Each CPU may also have local cache for speed.
Advantages
Simple to program (shared variables).
Fast communication (no network overhead).
Good for fine-grained parallel tasks.
Disadvantages
Limited scalability (can’t grow beyond one machine easily).
Synchronization overhead.
Debugging race conditions is difficult.
2. Distributed Memory Programming
Concept
In distributed memory programming, each processor has its own private local memory.
Processors communicate by explicitly sending and receiving messages.
• No global address space.
• Used in clusters, supercomputers, cloud computing, HPC (High-Performance
Computing.
Architecture
Diagram concept:
+---------+ +---------+ +---------+
| CPU1 |<--->| CPU2 |<--->| CPU3 |
| Memory1 | | Memory2 | | Memory3 |
+---------+ +---------+ +---------+
Each node has:
• Its own CPU(s)
• Its own memory
• A network interface to communicate with other nodes
Features
• Each process has its own memory space.
• Communication done via message passing.
• You must partition data and explicitly coordinate between nodes.
Advantages
Highly scalable (can run on thousands of nodes).
Each node has its own memory — no contention.
Fault tolerance easier (failed node can be restarted).
Disadvantages
Programmer must handle communication manually.
Higher latency (network delay).
Harder to debug and maintain.
Object-Oriented Programming (OOP)
What it is:
OOP is a way of programming where we think in terms of objects rather than just
instructions.
OOP is a style of programming where everything is represented as objects. Each object has:
• Data (called attributes or properties)
• Behavior (called methods or functions)
Main Concepts of OOP:
1. Class:
o A blueprint or template for creating objects.
o Example: class Car { int speed; void drive() {} }
2. Object:
o A real instance of a class.
o Example: Car myCar = new Car();
3. Encapsulation:
o Keeping data (attributes) safe inside the object.
o Only allow access through methods.
4. Inheritance:
o A class can use features (properties & methods) of another class.
o Example: class ElectricCar extends Car {}
5. Polymorphism:
o Objects can behave in multiple ways.
o Example: Method drive() works differently for Car and Bike.
6. Abstraction:
o Show only important details, hide unnecessary details.
o Example: Using a remote without knowing how it works internally.
Example in Real Life:
• Class: Car
• Object: Your red Honda
• Attributes: Color, speed
• Methods: Start(), Stop(), Honk()
SYNTAX ->class Car {
public:
string color; // Attribute
int speed; // Attribute
// Method (Behavior)
void drive() {
cout << "Car is driving at speed " << speed << endl;
}
};
Advantages of OOP:
• Makes code reusable, modular, and easy to maintain.
• Models real-world objects and protects data.
Disadvantages of OOP:
• Can be complex, use more memory, and slower for small programs.
1. Data Parallel Programming
Definition:
• A type of parallel programming where the same operation is performed
simultaneously on multiple data elements.
Key Points:
• Focuses on data, not tasks.
• Each processor works on a portion of the data.
• Common in scientific computing, image processing, and simulations.
Example:
• Adding two arrays element by element in parallel.
2. Functional Programming
Definition:
• A programming style where programs are constructed using pure functions and
immutable data.
• Functions do not change state or have side effects.
Key Points:
• Emphasizes declarative style (what to do, not how).
• Functions can be passed as arguments, returned from other functions.
• Examples: Haskell, Scala, parts of Python, JavaScript.
3. Data Flow Programming
Definition: A programming paradigm where the program is represented as a graph of data
flowing between operations.
• Execution depends on availability of input data, not the order of instructions.
Key Points:
• Good for parallelism and reactive systems.
• Each node performs an operation and sends output to other nodes.
• Examples: LabVIEW, Apache NiFi.
Example:
• A graph where sensor data flows → filtered → analyzed → stored.
In short:
• Data Parallel: Same operation on many data items simultaneously.
• Functional: Uses pure functions, avoids changing data.
• Data Flow: Computation happens as data moves through a network of operations.
1. Scheduling in Parallel Programs
Definition:
Scheduling in parallel programming is the process of deciding the order in which tasks (or
threads) are executed on processors to improve efficiency and reduce execution time.
Goals:
• Minimize total execution time.
• Maximize processor utilization.
• Balance workload among processors.
Types of Scheduling:
1. Static Scheduling:
o Task assignments are made before execution.
o Works well when task sizes and execution times are predictable.
o Pros: Simple, less overhead.
o Cons: Not flexible if task times vary.
2. Dynamic Scheduling:
o Tasks are assigned during execution as processors become free.
o Handles unpredictable workloads efficiently.
o Pros: Flexible, better load balancing.
3.
o Cons: More overhead due to runtime decision-making.
2. Loop Scheduling in Parallel Programs
Definition:
Loop scheduling is a technique in parallel programming where iterations of a
loop are divided among multiple processors. This is important because loops
often dominate computation in scientific and data-intensive programs.
Common Loop Scheduling Methods:
1. Static (Block) Scheduling:
o Divide loop iterations into equal blocks and assign each block to a processor.
o Example: 16 iterations, 4 processors → each gets 4 iterations.
2. Cyclic (Round-Robin) Scheduling:
o Assign iterations one by one in a round-robin fashion.
o Example: Processor 1 gets iterations 1,5,9… Processor 2 gets 2,6,10…
3. Dynamic (Chunked) Scheduling:
o Divide loop into chunks and assign them to processors as they become free.
o Good for loops with varying iteration times.
Advantages of Loop Scheduling:
• Balances workload among processors.
• Reduces idle time and improves performance.
Parallelization of Sequential Programs
Definition:
Parallelization is the process of converting a sequential (one step at a time) program into a
parallel program so that multiple tasks can execute simultaneously on multiple processors.
Why Parallelize:
• To reduce execution time.
• To utilize multiple processors efficiently.
• To handle large data or complex computations faster.
Steps to Parallelize a Sequential Program:
1. Identify Independent Tasks:
o Find parts of the program that can run simultaneously without depending on
each other.
o Example: In a loop, iterations that do not affect each other.
2. Data Decomposition:
o Divide data into chunks to be processed by different processors.
o Example: Splitting an array into parts for addition.
3. Task Decomposition:
o Divide the program into independent tasks or functions that can run in
parallel.
4. Synchronization:
o Ensure proper coordination if tasks share data to avoid conflicts.
5. Implement Parallel Code:
o Use parallel programming constructs like threads, OpenMP, MPI, or GPU
kernels.
Benefits:
• Faster execution.
• Better resource utilization.
Challenges:
• Data dependencies may prevent full parallelization.
• Synchronization overhead.
How Parallel Programming Environment Supports Programs
A parallel programming environment provides the tools, libraries, and systems needed to
make writing and running parallel programs easier and efficient. It helps in the following
ways:
1. Programming Support:
o Provides languages or extensions that let programmers write parallel code
easily.
o Example: OpenMP, MPI, CUDA.
2. Task Management & Scheduling:
o Assigns tasks to processors efficiently.
o Ensures tasks run in parallel without conflicts.
3. Communication & Synchronization:
o Provides mechanisms to share data safely among tasks or processors.
o Examples: Message passing, locks, semaphores, barriers.
4. Runtime Support:
o Handles execution of parallel tasks, monitors progress, balances workload.
5. Debugging & Profiling Tools:
o Detects errors like race conditions and improves performance.
6. Hardware Abstraction:
o Hides hardware complexity (multi-core CPUs, clusters, GPUs) so programmers
can focus on parallel logic.