Module III

The document discusses parallel processing and multicore architecture, detailing various processor organizations such as SISD, SIMD, MISD, and MIMD, along with memory architectures like UMA and NUMA. It highlights the advantages of multicore systems, including scalability and performance, while also addressing hardware and software performance issues. A case study on Intel Core processors illustrates the differences in performance and features among multicore options.

Uploaded by

niranjaniyerofi
Copyright
© All Rights Reserved

MODULE III: Parallel Processing and Multicore Architecture

1303102-4: Explore multiple processor organizations (SISD, SIMD, MISD, MIMD) and memory
architectures
Contents
Multiple Processor Organization: SISD, SIMD, MISD, MIMD. Uniform memory access (UMA), non-uniform
memory access (NUMA), CC-NUMA.

Multicore: Hardware and software performance issues, need for multicore. Multicore organization,
heterogeneous multicore organization: CPU and GPU. Case study: Intel Core i7 5960X.
Multiple Processor Organization
• Types of Parallel Processor Systems: In the taxonomy first introduced by Flynn, the following
categories of parallel computer systems are proposed.
• Single instruction, single data (SISD) stream: A single processor executes a single
instruction stream to operate on data stored in a single memory. Uniprocessors fall into this
category.
• Single instruction, multiple data (SIMD) stream: A single machine instruction controls the
simultaneous execution of a number of processing elements on a lockstep basis. Each
processing element has an associated data memory, so that instructions are executed on
different sets of data by different processors. Vector and array processors fall into this
category.
• Multiple instruction, single data (MISD) stream: A sequence of data is transmitted to a set
of processors, each of which executes a different instruction sequence. This structure is not
commercially implemented.
• Multiple instruction, multiple data (MIMD) stream: A set of processors simultaneously
execute different instruction sequences on different data sets. SMPs, clusters, and NUMA
systems fit into this category.
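The four categories differ only in how many instruction streams and how many data streams are active at once. A minimal Python sketch of that classification follows; the function name `flynn_category` and its parameters are illustrative assumptions, not part of Flynn's taxonomy itself.

```python
def flynn_category(instruction_streams: int, data_streams: int) -> str:
    """Return the Flynn category for the given stream counts.

    Hypothetical helper: "single" means exactly one stream,
    "multiple" means more than one.
    """
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"


# A uniprocessor: one instruction stream, one data stream.
print(flynn_category(1, 1))
# A vector unit: one instruction applied to many data elements in lockstep.
print(flynn_category(1, 8))
# An SMP with four processors, each running its own program on its own data.
print(flynn_category(4, 4))
```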
• MIMDs can be further subdivided by the means in which the processors communicate (Figure 20.1).

• If the processors share a common memory, then each processor accesses programs and data stored in the shared memory, and processors communicate with each other via that memory. The most common form of such systems is known as a symmetric
multiprocessor (SMP).

• Uniform memory access (UMA): All processors have access to all parts of main memory using loads and stores. The memory access time of a processor to all regions of memory is the
same. The access times experienced by different processors are the same. The SMP is an example of UMA; a cluster is not, because each of its nodes has its own private main memory.

• Nonuniform memory access (NUMA): All processors have access to all parts of main memory using loads and stores. The memory access time of a processor differs depending on which
region of main memory is accessed. The last statement is true for all processors; however, for different processors, which memory regions are slower and which are faster differs.

• Cache-coherent NUMA (CC-NUMA): A NUMA system in which cache coherence is maintained among the caches of the various processors.
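The difference between UMA and NUMA can be illustrated with a toy access-time model. This is only a sketch: the two latency constants are made-up values, and the assumption that processor p's local memory is region p is introduced here purely for illustration.

```python
# Hypothetical latencies in nanoseconds, chosen only for illustration.
LOCAL_NS = 100
REMOTE_NS = 300


def uma_access_time(processor: int, region: int) -> int:
    """UMA: every processor sees the same access time to every region."""
    return LOCAL_NS


def numa_access_time(processor: int, region: int) -> int:
    """NUMA: access time depends on which region is accessed.

    Assumption: region p is local to processor p, all others are remote.
    """
    return LOCAL_NS if processor == region else REMOTE_NS


for p in range(2):
    for r in range(2):
        print(f"processor {p}, region {r}: "
              f"UMA {uma_access_time(p, r)} ns, NUMA {numa_access_time(p, r)} ns")
```

Note that in the NUMA model the fast region for processor 0 is the slow region for processor 1, which matches the statement that the fast and slow regions differ from one processor to another.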
SMP: symmetric multiprocessor
1. There are two or more similar processors of comparable capability.
2. These processors share the same main memory and I/O facilities and are interconnected by a bus or
other internal connection scheme, such that memory access time is approximately the same for each
processor.
3. All processors share access to I/O devices, either through the same channels or through different
channels that provide paths to the same device.
4. All processors can perform the same functions (hence the term symmetric).
5. The system is controlled by an integrated operating system that provides interaction between
processors and their programs at the job, task, file, and data element levels.
Cluster

An important and relatively recent development in computer system design is clustering. Clustering is
an alternative to symmetric multiprocessing as an approach to providing high performance and
high availability, and is particularly attractive for server applications.
Advantages
• Absolute scalability: It is possible to create large clusters that far surpass the power of even the
largest standalone machines. A cluster can have tens, hundreds, or even thousands of machines,
each of which is a multiprocessor.
• Incremental scalability: A cluster is configured in such a way that it is possible to add new
systems to the cluster in small increments. Thus, a user can start out with a modest system and
expand it as needs grow, without having to go through a major upgrade in which an existing small
system is replaced with a larger system.
• High availability: Because each node in a cluster is a standalone computer, the failure of one node
does not mean loss of service. In many products, fault tolerance is handled automatically in
software.
• Superior price/performance: By using commodity building blocks, it is possible to put together a
cluster with equal or greater computing power than a single large machine, at much lower cost.
CC-NUMA
• A NUMA system without cache coherence is more or less equivalent to a cluster. The commercial
products that have received much attention recently are CC-NUMA systems, which are quite
distinct from both SMPs and clusters.
Motivation
• The processor limit in an SMP is one of the driving motivations behind the development of cluster
systems. However, with a cluster, each node has its own private main memory; applications do not
see a large global memory. In effect, coherence is maintained in software rather than hardware.
This memory granularity affects performance and, to achieve maximum performance, software
must be tailored to this environment. One approach to achieving large-scale multiprocessing while
retaining the flavor of SMP is NUMA.
CC-NUMA

Figure 20.12 depicts a typical CC-NUMA organization. There are multiple independent nodes, each of
which is, in effect, an SMP organization. Thus, each node contains multiple processors, each with its
own L1 and L2 caches, plus main memory. The node is the basic building block of the overall
CC-NUMA organization.
Each node in the CC-NUMA system includes some main memory. From the point of view of the
processors, however, there is only a single addressable memory, with each location having a unique
system-wide address. When a processor initiates a memory access, if the requested memory location is
not in that processor’s cache, then the L2 cache initiates a fetch operation. If the desired line is in the
local portion of the main memory, the line is fetched across the local bus. If the desired line is in a remote
portion of the main memory, then an automatic request is sent out to fetch that line across the
interconnection network, deliver it to the local bus, and then deliver it to the requesting cache on that bus.
All of this activity is automatic and transparent to the processor and its cache.
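The fetch sequence described above (cache, then local memory, then remote memory over the interconnect) can be sketched as a small Python model. The `Node` class and `read` function are hypothetical names introduced for illustration, and the sketch deliberately leaves out the cache coherence protocol itself.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One CC-NUMA node: a local portion of main memory plus a cache."""
    node_id: int
    memory: dict = field(default_factory=dict)  # address -> value (local portion)
    cache: dict = field(default_factory=dict)   # address -> cached value


def read(nodes: list, requester_id: int, address: int):
    """Return (value, path) for a read issued by a processor on one node."""
    node = nodes[requester_id]
    if address in node.cache:
        return node.cache[address], "cache hit"
    if address in node.memory:
        # The line lives in the local portion of main memory.
        value, path = node.memory[address], "local bus"
    else:
        # The line lives on a remote node: fetch it across the interconnect.
        home = next((n for n in nodes if address in n.memory), None)
        if home is None:
            raise KeyError(f"address {address} is not mapped on any node")
        value, path = home.memory[address], "interconnect"
    node.cache[address] = value  # deliver the line to the requesting cache
    return value, path


nodes = [Node(0, {0: "a"}), Node(1, {100: "b"})]
print(read(nodes, 0, 0))    # served from node 0's own memory
print(read(nodes, 0, 100))  # served from node 1 over the interconnect
print(read(nodes, 0, 100))  # now served from node 0's cache
```

Every address is unique system-wide in this model, which mirrors the single addressable memory seen by the processors.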
Disadvantages of CC-NUMA
Multicore
• A multicore processor, also known as a chip multiprocessor, combines two or more processor units
(called cores) on a single piece of silicon (called a die).
• Typically, each core consists of all of the components of an independent processor, such as
registers, ALU, pipeline hardware, and control unit, plus L1 instruction and data caches. In addition
to the multiple cores, contemporary multicore chips also include L2 cache and, increasingly, L3
cache. The most highly integrated multicore processors, known as systems on chip (SoCs), also
include memory and peripheral controllers.
Hardware Performance Issues
Increase in Parallelism and Complexity
• Pipelining: Individual instructions are executed through a pipeline of stages so that while one
instruction is executing in one stage of the pipeline, another instruction is executing in another
stage of the pipeline.
• Superscalar: Multiple pipelines are constructed by replicating execution resources. This enables
parallel execution of instructions in parallel pipelines, so long as hazards are avoided.
• Simultaneous multithreading (SMT): Register banks are expanded so that multiple threads can
share the use of pipeline resources.
With each of these innovations, designers have over the years attempted to increase the performance
of the system by adding complexity. There is a practical limit to how far this trend can be taken,
because with more stages, there is the need for more logic, more interconnections, and more control
signals.
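To make the benefit of pipelining concrete, the following sketch compares cycle counts under idealised assumptions: every stage takes one cycle, there are no hazards or stalls, and a new instruction enters the pipeline each cycle. The function names are illustrative only.

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Each instruction passes through all stages before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """The first instruction needs n_stages cycles; each later one adds one cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def pipeline_speedup(n_instructions: int, n_stages: int) -> float:
    """Ideal speedup of the pipelined design over the unpipelined one."""
    return (unpipelined_cycles(n_instructions, n_stages)
            / pipelined_cycles(n_instructions, n_stages))


print(unpipelined_cycles(100, 5))  # 500
print(pipelined_cycles(100, 5))    # 104
print(pipeline_speedup(100, 5))
```

For long instruction sequences the ideal speedup approaches the number of stages, but real pipelines fall short of this because of hazards and the extra logic each stage requires.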
Power Consumption
To maintain the trend of higher performance, power requirements have grown exponentially as
chip density and clock frequency have risen. Power considerations provide another motive for
moving toward a multicore organization.
Pollack’s Rule
“performance is roughly proportional to square
root of increase in complexity”
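Pollack's rule can be turned into a quick back-of-the-envelope comparison. The sketch below is illustrative: the function names are hypothetical, and the two-core figure assumes a perfectly parallel workload with no overhead, which is an idealisation.

```python
import math


def pollack_performance(complexity_ratio: float) -> float:
    """Relative single-core performance when core complexity grows by the given ratio."""
    if complexity_ratio <= 0:
        raise ValueError("complexity_ratio must be positive")
    return math.sqrt(complexity_ratio)


def ideal_multicore_performance(n_cores: int) -> float:
    """Upper bound when the same budget is spent on n_cores copies of the original core.

    Assumes perfectly parallel work and no communication overhead.
    """
    return float(n_cores)


# Doubling the complexity of one core versus using two cores of the original design.
print(pollack_performance(2.0))        # roughly 1.41
print(ideal_multicore_performance(2))  # 2.0
```

Under these assumptions, spending the extra logic on a second core yields more performance than making a single core twice as complex, which is one argument for multicore designs.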

Before 2005: Performance improved mainly by increasing frequency and transistor count. Power consumption increased
with frequency and transistor count until the early 2000s. After that, power density issues forced designers to hold power levels
steady.
After 2005: Frequency and power hit physical limits. Designers turned to multicore architectures (increasing the number of cores instead of
the frequency). Multicore takes advantage of chip density while avoiding high power density.
Software Performance Issues
• Even small amounts of serial code have a noticeable impact on performance.
• According to Amdahl's law:
Speedup = (time to execute program on a single processor) / (time to execute program on N parallel
processors)
= 1 / ((1 - f) + f/N)
• where f is the fraction of code that is infinitely parallelizable with no scheduling overhead, and (1 - f) is the fraction that is inherently serial.
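Amdahl's law is easy to evaluate numerically. The sketch below implements the formula as stated, with f as the parallelizable fraction and N as the number of processors; the function name `amdahl_speedup` is an illustrative choice.

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Speedup = 1 / ((1 - f) + f / n) for parallel fraction f on n processors."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - f) + f / n)


# With 90% of the code parallelizable, adding cores gives diminishing returns.
for n in (1, 2, 4, 8, 16, 1024):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

As N grows, the speedup approaches 1 / (1 - f), so with f = 0.9 it can never exceed 10 no matter how many cores are added.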
Multicore organization
Main variables in a multicore organization:
• Number of core processors on chip
• Number of levels of cache on chip
• Amount of shared cache
Case study
• Intel Core i3, i5, i7, and i9 multicore processors
• Main differences refer to:
• Performance/Heat
• Number of cores
• Maximum Main Memory/Cache Capacity
• Hyperthreading (yes/no)
• Turbo Boost (possibility of increasing frequency and heat)
• Built-in graphics processor (yes/no)
• Price
