Parallel Computing and Supercomputers

The document discusses the shift from single processor performance to parallel computing due to limitations like the Power Wall, Memory Wall, and ILP Wall. It explains Amdahl's Law, which illustrates the speedup limitations in parallel processing based on the fraction of serial code, and categorizes parallel architectures using Flynn's Taxonomy. Additionally, it covers memory organization in parallel systems, cache coherence issues, and modern trends in interconnection networks for high-performance computing.

Uploaded by

عبد الوهاب الحمداني

The quest for performance has shifted from making a single processor faster to using multiple processors together. This is due to physical and economic limits:
• Power Wall: Increasing clock speed dramatically increases power consumption and heat.
• Memory Wall: The speed of memory has not kept up with processor speed.
• ILP Wall: It is becoming increasingly difficult to find more instructions to execute simultaneously in a single stream of code.
Solution: Divide a large problem into smaller sub-problems and solve them concurrently using multiple processing elements.
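
The divide-and-combine idea can be sketched in a few lines of Python. This is only an illustration of the decomposition pattern: the helper names `split_into_chunks` and `parallel_sum` are hypothetical, and because CPython threads share one interpreter lock, this sketch shows the structure of the solution rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, n_chunks):
    """Split a list into n_chunks contiguous pieces of near-equal size."""
    size, extra = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_sum(data, n_workers=4):
    """Sum each chunk in its own worker, then combine the partial sums."""
    chunks = split_into_chunks(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    # The final combine step runs serially.
    return sum(partial_sums)


if __name__ == "__main__":
    print(parallel_sum(list(range(1, 101))))
```

Note that the final combine step is serial, which is exactly the kind of work that limits speedup in practice.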

What is Parallel Architecture?

A parallel computer is a collection of processing elements that cooperate to solve large problems fast. The key components are:
• Processors: The units that perform computations.
• Memory: Where data and instructions are stored.
• Interconnection Network: The system that allows processors to communicate with each other and with memory.
The Fundamental Goal: To achieve a speedup, where the time on a parallel system is less than the time on a single (sequential) processor.
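
As a minimal sketch, speedup can be computed as the ratio of sequential time to parallel time. The efficiency measure (speedup divided by the number of processors) is a standard companion metric that the notes do not mention, and the timings in the example are made-up values used only for illustration.

```python
def speedup(t_sequential, t_parallel):
    """Ratio of sequential run time to parallel run time."""
    return t_sequential / t_parallel


def efficiency(t_sequential, t_parallel, n_processors):
    """Speedup per processor; 1.0 would mean perfect scaling."""
    return speedup(t_sequential, t_parallel) / n_processors


if __name__ == "__main__":
    # Hypothetical timings: 120 s sequentially, 40 s on 4 processors.
    print(speedup(120.0, 40.0))
    print(efficiency(120.0, 40.0, 4))
```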

Amdahl's Law Example (S = 0.3)

This document explains how Amdahl's Law limits the speedup of a program when 30% of
the code is serial (S = 0.3) and 70% is parallel (1-S = 0.7).

Formula
Amdahl’s Law states:

Speedup(P) = 1 / (S + (1-S)/P)

where:
• S = fraction of code that is serial
• (1-S) = fraction of code that is parallel
• P = number of processors
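
The formula translates directly into a small function. The sketch below assumes the name `amdahl_speedup` (not from the notes) and evaluates it for P = 1, 4, and 100 with S = 0.3.

```python
def amdahl_speedup(serial_fraction, n_processors):
    """Speedup(P) = 1 / (S + (1 - S) / P)."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


if __name__ == "__main__":
    for p in (1, 4, 100):
        print(p, round(amdahl_speedup(0.3, p), 2))
```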

1. One Processor (P = 1)
Speedup(1) = 1 / (0.3 + 0.7/1)
= 1 / (0.3 + 0.7)
= 1 / 1
= 1

As expected, with only one processor there is no speedup.

2. Four Processors (P = 4)
Speedup(4) = 1 / (0.3 + 0.7/4)
= 1 / (0.3 + 0.175)
= 1 / 0.475
≈ 2.11

This is less than the ideal 4× speedup, because 30% of the program is still sequential.

3. One Hundred Processors (P = 100)

Speedup(100) = 1 / (0.3 + 0.7/100)
= 1 / (0.3 + 0.007)
= 1 / 0.307
≈ 3.26

Even with 100 processors, the speedup is only about 3.26×.

Maximum Speedup
As P → ∞ (infinite processors):

Speedup_max = 1 / S

If S = 0.3:
Speedup_max = 1 / 0.3 ≈ 3.33

Even with a million processors, you cannot exceed a 3.33× speedup when 30% of the
code is serial.

Conclusion
Amdahl’s Law shows that the serial portion (S) is the bottleneck in parallel computing.
Adding more processors improves performance only up to a limit. Reducing S is critical for
achieving higher speedup.
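
To see why reducing S matters more than adding processors, the sketch below compares a few serial fractions. The function names and the choice of P = 1024 are assumptions made for illustration; the values S = 0.1 and S = 0.01 are not from the notes.

```python
def amdahl_speedup(serial_fraction, n_processors):
    """Speedup(P) = 1 / (S + (1 - S) / P)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_processors)


def max_speedup(serial_fraction):
    """Upper bound 1 / S, approached as P grows without limit."""
    return 1.0 / serial_fraction


if __name__ == "__main__":
    for s in (0.3, 0.1, 0.01):
        at_1024 = amdahl_speedup(s, 1024)
        print(f"S = {s}: speedup at P = 1024 is {at_1024:.2f}, "
              f"limit is {max_speedup(s):.2f}")
```

For every S the speedup at any finite P stays below 1 / S, so lowering S raises the ceiling itself.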

Parallel Architecture: Flynn's Taxonomy & The Memory Hierarchy

1- Classifying Parallel Systems: Flynn's Taxonomy
This is the classic model for categorizing computer architectures based on the number of
instruction and data streams.

Category   Instruction Streams   Data Streams   Example
SISD       Single                Single         Traditional Uniprocessor
SIMD       Single                Multiple       GPUs, Vector Processors
MISD       Multiple              Single         Rarely Used (Theoretical)
MIMD       Multiple              Multiple       Modern Multicore CPUs, Clusters

• SIMD (Single Instruction, Multiple Data): All processors execute the same instruction simultaneously, but on different data elements. Excellent for data-parallel tasks (e.g., image processing, scientific simulations).

• MIMD (Multiple Instruction, Multiple Data): Each processor executes its own instruction stream on its own data. This is the most common and flexible model, encompassing modern multicore CPUs and supercomputers.
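
The classification rule in the table depends only on whether each stream count is one or more than one. A minimal sketch, with the hypothetical function name `flynn_category`:

```python
def flynn_category(instruction_streams, data_streams):
    """Return the Flynn category for the given numbers of streams."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    # "S" for a single stream, "M" for multiple streams.
    instruction_part = "S" if instruction_streams == 1 else "M"
    data_part = "S" if data_streams == 1 else "M"
    return f"{instruction_part}I{data_part}D"


if __name__ == "__main__":
    print(flynn_category(1, 1))    # uniprocessor
    print(flynn_category(1, 32))   # one instruction applied to 32 data lanes
    print(flynn_category(8, 8))    # eight independent cores
```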

2- Memory Organization

How do the multiple processors in a system share and access memory? This leads to the two primary architectural models.

• Shared Memory (Tightly Coupled):
  o All processors share a single, unified address space.
  o Communication between processors is implicit; they simply read from and write to the same memory locations.
  o Advantage: Ease of programming.
  o Challenge: Requires cache coherency.

• Distributed Memory (Loosely Coupled):
  o Each processor has its own private memory.
  o There is no global address space.
  o Communication between processors is explicit, via passing messages over a network (e.g., using MPI - Message Passing Interface).
  o Advantage: Scalable to a very large number of processors.
  o Challenge: More complex programming.
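
The difference between implicit and explicit communication can be imitated with Python threads. This is only an analogy: threads in one process always share memory, so the second function merely mimics the message-passing style (real distributed-memory programs would use something like MPI). Function names are hypothetical.

```python
import queue
import threading


def shared_memory_sum(values, n_workers=4):
    """Workers update one shared location; a lock keeps the updates safe."""
    total = {"value": 0}
    lock = threading.Lock()

    def worker(chunk):
        local = sum(chunk)
        with lock:  # implicit communication through shared state
            total["value"] += local

    threads = [threading.Thread(target=worker, args=(values[i::n_workers],))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total["value"]


def message_passing_sum(values, n_workers=4):
    """Workers keep private data and send partial sums as messages."""
    inbox = queue.Queue()

    def worker(chunk):
        inbox.put(sum(chunk))  # explicit "send"

    threads = [threading.Thread(target=worker, args=(values[i::n_workers],))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Explicit "receive": one message per worker.
    return sum(inbox.get() for _ in range(n_workers))
```

The shared-memory version is shorter to write but needs the lock to stay correct, which mirrors the ease-of-programming versus coherence trade-off listed above.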

Shared Memory Multiprocessors

Further divided based on how memory is physically connected:

• UMA (Uniform Memory Access):
  o Access time to any memory location is the same for all processors.
  o Often connected via a bus or a crossbar.
  o Also known as SMP (Symmetric Multiprocessor).
  o Limitation: The shared bus becomes a bottleneck as more processors are added.

• NUMA (Non-Uniform Memory Access):
  o Access time depends on the memory location relative to the processor.
  o Physically distributed, but logically shared memory.
  o Example: A multi-socket server where each CPU socket has its own local memory. Accessing local memory is fast; accessing another socket's memory ("remote" memory) is slower.
  o Advantage: More scalable than UMA.
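
A toy model makes the NUMA effect concrete: the average access time is a weighted mix of local and remote latency. The latencies used below (80 ns local, 140 ns remote) are invented for illustration and do not describe any particular machine; `average_access_time` is a hypothetical helper.

```python
def average_access_time(local_fraction, local_ns, remote_ns):
    """Weighted average of local and remote memory latency, in ns."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    remote_fraction = 1.0 - local_fraction
    return local_fraction * local_ns + remote_fraction * remote_ns


if __name__ == "__main__":
    # Hypothetical latencies, chosen only to show the trend.
    for fraction in (1.0, 0.5, 0.0):
        print(fraction, average_access_time(fraction, 80.0, 140.0))
```

Under these assumptions, keeping a thread's data on its own socket is what keeps the average close to the local latency.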

The Cache Coherence Problem:
In shared memory systems, each processor has a cache. If Processor A writes to address X, how
does Processor B know its cached copy of X is now stale? This is solved by a cache coherence
protocol, most commonly MESI (Modified, Exclusive, Shared, Invalid).
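
The state changes behind MESI can be sketched for a single memory line. This is a deliberately simplified model, assuming one line, no bus timing, and no explicit write-back traffic; the class name `MesiLine` is hypothetical. Cache 0 plays Processor A and cache 1 plays Processor B.

```python
class MesiLine:
    """Tracks the MESI state of one memory line in each processor's cache."""

    def __init__(self, n_caches):
        self.states = ["I"] * n_caches  # every cache starts Invalid

    def read(self, cache_id):
        if self.states[cache_id] != "I":
            return  # read hit: an M, E, or S copy is already valid
        others_hold_copy = False
        for other, state in enumerate(self.states):
            if other != cache_id and state != "I":
                others_hold_copy = True
                # A Modified or Exclusive holder downgrades to Shared
                # (a Modified holder would also write the data back).
                self.states[other] = "S"
        self.states[cache_id] = "S" if others_hold_copy else "E"

    def write(self, cache_id):
        # All other copies become stale, so they are invalidated.
        for other in range(len(self.states)):
            if other != cache_id:
                self.states[other] = "I"
        self.states[cache_id] = "M"


if __name__ == "__main__":
    x = MesiLine(2)
    x.read(0)
    x.read(1)
    x.write(0)        # Processor A writes X
    print(x.states)   # Processor B's copy is now Invalid
```

When B next reads X it misses, which forces it to fetch the up-to-date value instead of using the stale copy.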

Distributed Memory Systems (Clusters)

• A cluster is a group of independent computers (nodes) connected by a high-speed network (e.g., InfiniBand, Ethernet).
• Each node runs its own operating system.
• This is the architecture of most of the world's Top500 supercomputers.
• Programming is typically done using the Message Passing Model.

The Hybrid Model

Modern high-performance computing (HPC) often uses a hybrid approach:

• Distributed Memory across Nodes: A cluster of many nodes.
• Shared Memory within a Node: Each node is a multicore, shared-memory computer (e.g., a dual-socket server with 64 cores per socket).
• Programming: Use MPI for communication between nodes and OpenMP (a shared-memory API) for parallelism within a node.

The Interconnection Network & Modern Trends

Interconnection Network:
When we have a large computing system such as a supercomputer or a cluster, the
processors or nodes need to communicate with each other.

This communication is done through special high-speed networks that connect the
processors, memory, and nodes.

Examples: Common interconnection network topologies include bus, ring, mesh, torus, hypercube, and fat-tree.

The goal: Transfer data quickly and efficiently between the system’s components so that it
works as if it were one giant computer.
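
How far apart two nodes are depends on the topology. The sketch below counts hops for two of the topologies listed above, assuming a bidirectional ring and a hypercube whose nodes are numbered so that neighbors differ in exactly one bit; the function names are hypothetical.

```python
def ring_hops(a, b, n_nodes):
    """Fewest hops between nodes a and b on a bidirectional ring."""
    distance = abs(a - b) % n_nodes
    return min(distance, n_nodes - distance)


def hypercube_neighbors(node, dimensions):
    """Neighbors of a node: flip each of its address bits in turn."""
    return [node ^ (1 << bit) for bit in range(dimensions)]


def hypercube_hops(a, b):
    """Fewest hops equals the number of differing address bits."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    # Worst case for 8 nodes: ring versus 3-dimensional hypercube.
    print(max(ring_hops(0, n, 8) for n in range(8)))
    print(max(hypercube_hops(0, n) for n in range(8)))
```

The hypercube reaches any node in fewer hops, at the cost of more links per node.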

Modern Trends:
Recent advances in the design of interconnection networks include:
• Very high speeds (such as InfiniBand, NVLink), which are much faster than regular Ethernet.
• Reduced latency, so that small messages can be delivered on the order of microseconds.
• Scalability, meaning the network can support thousands or even millions of processors without performance collapse.
• Specialized networks for AI and GPU clusters, such as NVIDIA NVSwitch and Google TPU interconnect.
