Module III

The document discusses parallel processing and multicore architecture, detailing various processor organizations such as SISD, SIMD, MISD, and MIMD, along with memory architectures like UMA and NUMA. It highlights the advantages of multicore systems, including scalability and performance, while also addressing hardware and software performance issues. A case study on Intel Core processors illustrates the differences in performance and features among multicore options.

Uploaded by

niranjaniyerofi

MODULE III: Parallel Processing and Multicore Architecture

1303102-4: Explore multiple processor organizations (SISD, SIMD, MISD, MIMD) and memory
architectures
Contents
Multiple Processor Organization: SISD, SIMD, MISD, MIMD. Uniform memory access (UMA),
nonuniform memory access (NUMA), CC-NUMA.

Multicore: Hardware and software performance issues, need of multicore. Multicore organization,
heterogeneous multicore organization: CPU and GPU. Case study: Intel Core i7 5960X
Multiple Processor Organization
• Types of parallel processor systems: Flynn first introduced a taxonomy that proposes the
following categories of parallel computer systems.
• Single instruction, single data (SISD) stream: A single processor executes a single
instruction stream to operate on data stored in a single memory. Uniprocessors fall into this
category.
• Single instruction, multiple data (SIMD) stream: A single machine instruction controls the
simultaneous execution of a number of processing elements on a lockstep basis. Each
processing element has an associated data memory, so that instructions are executed on
different sets of data by different processors. Vector and array processors fall into this
category.
• Multiple instruction, single data (MISD) stream: A sequence of data is transmitted to a set
of processors, each of which executes a different instruction sequence. This structure is not
commercially implemented.
• Multiple instruction, multiple data (MIMD) stream: A set of processors simultaneously
execute different instruction sequences on different data sets. SMPs, clusters, and NUMA
systems fit into this category.
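As a rough illustration of the SIMD and MIMD categories above, the sketch below models processing elements in plain Python. The function names `simd_step` and `mimd_run` are hypothetical, and Python threads only stand in for independent processors; this is a conceptual model under those assumptions, not a performance demonstration.

```python
from concurrent.futures import ThreadPoolExecutor


def simd_step(instruction, lanes):
    """SIMD model: one instruction is applied to every lane's own data item."""
    return [instruction(x) for x in lanes]


def mimd_run(programs, datasets):
    """MIMD model: each worker runs its own program on its own data set."""
    with ThreadPoolExecutor(max_workers=len(programs)) as pool:
        futures = [pool.submit(p, d) for p, d in zip(programs, datasets)]
        # Results are collected in submission order, one per "processor".
        return [f.result() for f in futures]


# One instruction (double the value), four data items.
simd_result = simd_step(lambda x: x * 2, [1, 2, 3, 4])

# Three different instruction sequences, three different data sets.
mimd_result = mimd_run([sum, max, sorted], [[1, 2, 3], [4, 9, 2], [3, 1, 2]])

print(simd_result)
print(mimd_result)
```

An SISD machine corresponds to the ordinary case of a single loop applying one operation to one data item at a time.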
• MIMDs can be further subdivided by the means by which the processors communicate (Figure 20.1).

• If the processors share a common memory, then each processor accesses programs and data stored in the shared memory, and processors communicate with each other via that memory. The most common form of such a system is known as a symmetric multiprocessor (SMP).

• Uniform memory access (UMA): All processors have access to all parts of main memory using loads and stores. The memory access time of a processor to all regions of memory is the same, and the access times experienced by different processors are the same. An SMP is an example of UMA; a cluster is not, because each cluster node has its own private main memory.

• Nonuniform memory access (NUMA): All processors have access to all parts of main memory using loads and stores. The memory access time of a processor differs depending on which region of main memory is accessed. This is true for all processors; however, which memory regions are slower and which are faster differs from one processor to another.

• Cache-coherent NUMA (CC-NUMA): A NUMA system in which cache coherence is maintained among the caches of the various processors.
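The difference between the UMA and NUMA definitions above can be sketched as a small latency model. The latency values and function names here are hypothetical, chosen only to make the contrast visible; real access times depend on the machine.

```python
# Hypothetical latencies in nanoseconds (assumed for illustration only).
LOCAL_NS = 100
REMOTE_NS = 300


def uma_access_time(cpu, region):
    """UMA: every processor sees the same latency to every memory region."""
    return LOCAL_NS


def numa_access_time(cpu_node, region_node):
    """NUMA: latency depends on whether the region belongs to the CPU's node."""
    return LOCAL_NS if cpu_node == region_node else REMOTE_NS


nodes = [0, 1]
# Rows are processors, columns are memory regions.
uma_table = [[uma_access_time(c, r) for r in nodes] for c in nodes]
numa_table = [[numa_access_time(c, r) for r in nodes] for c in nodes]

print("UMA :", uma_table)
print("NUMA:", numa_table)
```

In the NUMA table each row has both a fast and a slow entry, but the fast column differs per row, which matches the statement that the slower and faster regions differ between processors.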
SMP: symmetric multiprocessor
1. There are two or more similar processors of comparable capability.
2. These processors share the same main memory and I/O facilities and are interconnected by a bus or
other internal connection scheme, such that memory access time is approximately the same for each
processor.
3. All processors share access to I/O devices, either through the same channels or through different
channels that provide paths to the same device.
4. All processors can perform the same functions (hence the term symmetric).
5. The system is controlled by an integrated operating system that provides interaction between
processors and their programs at the job, task, file, and data element levels.
Cluster
An important and relatively recent development in computer system design is clustering. Clustering
is an alternative to symmetric multiprocessing as an approach to providing high performance and
high availability, and is particularly attractive for server applications.
Advantages
• Absolute scalability: It is possible to create large clusters that far surpass the power of even the
largest standalone machines. A cluster can have tens, hundreds, or even thousands of machines,
each of which is a multiprocessor.
• Incremental scalability: A cluster is configured in such a way that it is possible to add new
systems to the cluster in small increments. Thus, a user can start out with a modest system and
expand it as needs grow, without having to go through a major upgrade in which an existing small
system is replaced with a larger system.
• High availability: Because each node in a cluster is a standalone computer, the failure of one node
does not mean loss of service. In many products, fault tolerance is handled automatically in
software.
• Superior price/performance: By using commodity building blocks, it is possible to put together a
cluster with equal or greater computing power than a single large machine, at much lower cost.
CC-NUMA
• A NUMA system without cache coherence is more or less equivalent to a cluster. The commercial
products that have received much attention recently are CC-NUMA systems, which are quite
distinct from both SMPs and clusters.
Motivation
• The processor limit in an SMP is one of the driving motivations behind the development of cluster
systems. However, with a cluster, each node has its own private main memory; applications do not
see a large global memory. In effect, coherence is maintained in software rather than hardware.
This memory granularity affects performance and, to achieve maximum performance, software
must be tailored to this environment. One approach to achieving large-scale multiprocessing while
retaining the flavor of SMP is NUMA.
CC-NUMA

Figure 20.12 depicts a typical CC-NUMA organization. There are multiple independent nodes, each of
which is, in effect, an SMP organization. Thus, each node contains multiple processors, each with
its own L1 and L2 caches, plus main memory. The node is the basic building block of the overall
CC-NUMA organization.

Each node in the CC-NUMA system includes some main memory. From the point of view of the
processors, however, there is only a single addressable memory, with each location having a unique
system-wide address. When a processor initiates a memory access, if the requested memory location is
not in that processor’s cache, then the L2 cache initiates a fetch operation. If the desired line is in the
local portion of the main memory, the line is fetched across the local bus. If the desired line is in a remote
portion of the main memory, then an automatic request is sent out to fetch that line across the
interconnection network, deliver it to the local bus, and then deliver it to the requesting cache on that bus.
All of this activity is automatic and transparent to the processor and its cache.
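The lookup order described above (cache, then local main memory, then remote memory over the interconnection network) can be sketched as follows. The function `fetch_line` and the dictionary-based memories are hypothetical stand-ins, and the sketch deliberately leaves out the cache coherence protocol itself.

```python
def fetch_line(address, cache, local_memory, remote_memories):
    """Return (value, source) using the lookup order described in the text."""
    if address in cache:
        return cache[address], "cache"

    if address in local_memory:
        # The line is in this node's portion of main memory (local bus fetch).
        value, source = local_memory[address], "local memory"
    else:
        # The line lives on another node: fetch it across the interconnect.
        for node_id, memory in remote_memories.items():
            if address in memory:
                value, source = memory[address], f"remote node {node_id}"
                break
        else:
            raise KeyError(f"address {address:#x} is not mapped on any node")

    cache[address] = value  # deliver the line to the requesting cache
    return value, source


cache = {}
local_memory = {0x10: "a"}
remote_memories = {1: {0x20: "b"}}

first = fetch_line(0x20, cache, local_memory, remote_memories)
second = fetch_line(0x20, cache, local_memory, remote_memories)
local = fetch_line(0x10, cache, local_memory, remote_memories)
print(first, second, local)
```

The caller uses one address space throughout and never names a node, which is the sense in which the remote fetch is transparent to the processor.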
Disadvantages of CC-NUMA
Multicore
• A multicore processor, also known as a chip multiprocessor, combines two or more processor units
(called cores) on a single piece of silicon (called a die).
• Typically, each core consists of all of the components of an independent processor, such as
registers, ALU, pipeline hardware, and control unit, plus L1 instruction and data caches. In addition
to the multiple cores, contemporary multicore chips also include L2 cache and, increasingly, L3
cache. The most highly integrated multicore processors, known as systems on chip (SoCs), also
include memory and peripheral controllers.
Hardware Performance Issues
Increase in Parallelism and Complexity
• Pipelining: Individual instructions are executed through a pipeline of stages so that while one
instruction is executing in one stage of the pipeline, another instruction is executing in another
stage of the pipeline.
• Superscalar: Multiple pipelines are constructed by replicating execution resources. This enables
parallel execution of instructions in parallel pipelines, so long as hazards are avoided.
• Simultaneous multithreading (SMT): Register banks are expanded so that multiple threads can
share the use of pipeline resources.
With each of these innovations, designers have over the years attempted to increase the performance
of the system by adding complexity. There is a practical limit to how far this trend can be taken,
because with more stages, there is the need for more logic, more interconnections, and more control
signals.
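To make the benefit of pipelining concrete, the sketch below uses the standard idealized model, which is an assumption not stated in the text: every stage takes one cycle and there are no hazards or stalls. Under that model a k-stage pipeline finishes n instructions in k + (n - 1) cycles instead of n * k. The function names are hypothetical.

```python
def unpipelined_cycles(n_instructions, k_stages):
    """Each instruction passes through all k stages before the next starts."""
    return n_instructions * k_stages


def pipelined_cycles(n_instructions, k_stages):
    """Ideal pipeline: k cycles to fill, then one instruction completes per cycle."""
    return k_stages + (n_instructions - 1)


def pipeline_speedup(n_instructions, k_stages):
    """Ratio of unpipelined to pipelined cycle counts under the ideal model."""
    return unpipelined_cycles(n_instructions, k_stages) / pipelined_cycles(
        n_instructions, k_stages
    )


for k in (5, 10, 20):
    print(k, pipelined_cycles(100, k), round(pipeline_speedup(100, k), 2))
```

In this model the speedup approaches k only for long instruction streams, and real pipelines fall short of it because of hazards, which is one reason adding stages gives diminishing returns.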
Power Consumption
To maintain the trend of higher performance, power requirements have grown exponentially as chip
density and clock frequency have risen. Power considerations provide another motive for moving
toward a multicore organization.
Pollack’s Rule
“Performance is roughly proportional to the square root of the increase in complexity.”
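A short numerical sketch of Pollack's rule follows. The function name is hypothetical, and the two-core figure assumes a perfectly parallel workload, which is an idealization introduced here for comparison.

```python
import math


def pollack_performance(complexity_ratio):
    """Relative performance of one core whose complexity grows by the given ratio."""
    return math.sqrt(complexity_ratio)


# Spend a doubled complexity budget on one larger core.
single_big_core = pollack_performance(2.0)

# Spend the same budget on two cores of the original complexity
# (upper bound, assuming the workload is perfectly parallel).
two_small_cores_ideal = 2 * pollack_performance(1.0)

print(round(single_big_core, 2), two_small_cores_ideal)
```

Under these assumptions, doubling the complexity of one core yields roughly a 1.4x gain, while two simpler cores can approach 2x on parallel work.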

Before 2005: Performance improved mainly by increasing frequency and transistor count. Power consumption rose with
frequency and transistor count until power density issues forced designers to hold power levels steady.
After 2005: Frequency and power hit physical limits. Designers turned to multicore architecture (increasing the number of
cores instead of frequency). Multicore takes advantage of chip density while avoiding high power density.
Software Performance Issues
• Even small amounts of serial code limit performance.
• According to Amdahl’s law:
Speedup = (time to execute program on a single processor) / (time to execute program on N parallel
processors)
= 1 / ((1 - f) + f/N)
• where f is the fraction of code that is infinitely parallelizable with no scheduling overhead, and N is the number of processors.
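Amdahl's law as stated above is easy to evaluate directly. The sketch below is a minimal implementation of that formula; the function name `amdahl_speedup` and the choice of f = 0.9 are assumptions made for illustration.

```python
def amdahl_speedup(f, n):
    """Speedup = 1 / ((1 - f) + f / n) for parallel fraction f on n processors."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - f) + f / n)


# With 90% parallel code, adding processors gives diminishing returns.
for n in (2, 4, 8, 16, 1024):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

As N grows, the speedup approaches 1 / (1 - f), so with f = 0.9 it can never exceed 10 no matter how many cores are added.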
Multicore organization
Main variables in a multicore organization:
• Number of core processors on chip
• Number of levels of cache on chip
• Amount of shared cache
Case study
• Intel Core i3, i5, i7, and i9 multicore processors
• The main differences are in:
• Performance/heat
• Number of cores
• Maximum main memory/cache capacity
• Hyperthreading (yes/no)
• Turbo Boost (possibility of increasing frequency and heat)
• Built-in graphics processor (yes/no)
• Price
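To see how many processors the operating system exposes on a given machine, a minimal sketch using Python's standard library follows. Note that `os.cpu_count()` reports logical CPUs, so on a processor with hyperthreading enabled it typically counts hardware threads, not physical cores; the variable name is an illustrative choice.

```python
import os

# Number of logical CPUs visible to the OS; None if it cannot be determined.
logical_cpus = os.cpu_count()

if logical_cpus is None:
    print("Could not determine the number of logical processors.")
else:
    print(f"Logical processors visible to the OS: {logical_cpus}")
```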
