Vector and SIMD Processing Overview

Vector processors apply operations simultaneously to vectors of data rather than scalars. They use vector instructions like vector-vector, vector-scalar, and vector-memory. Vectorization improves performance by reducing software overhead. Memory is organized for concurrent access to maximize throughput. Vector supercomputers balance vector and scalar performance through architectural design goals like scalability and high I/O performance. SIMD computers apply the same instruction to multiple data elements using processing elements with local interconnects.

Uploaded by

K S Sanath Kashyap

Chapter 8

Multivector and SIMD computers

-by
Prajwala T R
Dept of CSE
PESIT
Vector processing principles
• A vector is an ordered collection of scalar items of the same type.
• Successive elements are addressed with a fixed increment, the stride.
• A vector processor is an ensemble of vector registers, functional pipelines, registers, and a vectorizer.
• Vector processing: arithmetic and logical operators are applied to whole vectors.
• Vectorization is the conversion of scalar code to vector code.
• Vector processing is faster and more efficient than scalar processing.
• It reduces software overhead.
Vector instruction types
• Vector-vector instructions
• Vector-scalar instructions
• Vector-memory instructions
• Gather and scatter instructions
– Gather: M -> V1 × V0
– Scatter: V1 × V0 -> M
• Masking instructions: compress or expand the vector
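The gather, scatter, and masking instructions above can be sketched in a few lines. This is only an illustration, assuming that V0 holds the indices and V1 holds the data; plain Python lists stand in for memory and vector registers, and the function names are hypothetical.

```python
# Illustrative sketch only: lists stand in for memory and vector registers.

def gather(memory, base, v0):
    """Gather (M -> V1 x V0): read memory[base + v0[i]] into a data vector V1."""
    return [memory[base + offset] for offset in v0]

def scatter(memory, base, v0, v1):
    """Scatter (V1 x V0 -> M): write v1[i] to memory[base + v0[i]]."""
    for offset, value in zip(v0, v1):
        memory[base + offset] = value

def compress(vector, mask):
    """Keep only the elements whose mask bit is set."""
    return [value for value, keep in zip(vector, mask) if keep]

def expand(packed, mask, fill=0):
    """Place packed elements at the set mask positions, fill elsewhere."""
    values = iter(packed)
    return [next(values) if keep else fill for keep in mask]

memory = [0, 0, 10, 0, 20, 0, 0, 30]   # a sparse vector stored in memory
v0 = [2, 4, 7]                         # index vector: positions of the nonzeros
v1 = gather(memory, 0, v0)             # dense data vector
scatter(memory, 0, v0, [2 * value for value in v1])  # write doubled values back
```

Gather and scatter let a dense pipeline work on sparse data: only the indexed elements travel through the vector registers.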
Vector instructions in Cray-like computers
Vector access memory schemes
• Vector operand specification
– Base address
– Stride
– Length
– Access rate should match pipeline rate
C-access (concurrent-access) memory organization
• m-way low-order interleaved memory structure
• If the stride is 1, successive addresses are accessed one minor cycle apart.
• If the stride is 2, successive accesses must be separated by two minor cycles.
• Maximum throughput is m words per memory cycle.
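The effect of stride on an interleaved memory can be checked with a small sketch. It assumes low-order interleaving, where the module number is the address modulo m; the function names and the choice m = 8 are illustrative, not taken from the slides.

```python
from math import gcd

def module_of(address, m):
    """Low-order interleaving: the low-order part of the address selects the module."""
    return address % m

def modules_touched(base, stride, length, m):
    """Module visited by each element of a vector with the given base and stride."""
    return [module_of(base + k * stride, m) for k in range(length)]

def distinct_modules(stride, m):
    """Number of different modules a strided access stream ever uses."""
    return m // gcd(stride, m)

m = 8  # assumed number of memory modules
stride1 = modules_touched(0, 1, 8, m)  # every module used once
stride2 = modules_touched(0, 2, 8, m)  # only the even modules, each used twice
```

With stride 2 only half the modules are used, so each module is revisited twice as often. That is why the accesses have to be spaced further apart to avoid conflicts.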
Low order interleaving
S-access (simultaneous-access) memory organization
C/S-access memory organization
• n buses, with m memory modules on each bus
• The n buses operate in parallel (S-access).
• The m modules on each bus are interleaved to allow C-access.
• Most popular memory access scheme in vector computers
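NEC SX vector supercomputer
Relative vector/scalar performance
• Amdahl's law restated for vector processing
• P = 1 / ((1 - f) + f/r), where f is the vectorization ratio (the fraction of the code that is vectorized) and r is the vector/scalar speed ratio
• P indicates the speedup of vector processing over scalar processing.
• The hardware speed ratio r is the designer's choice.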
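A quick numeric check of the relative performance formula may help. The sketch below assumes f is the fraction of code that is vectorized and r is the vector/scalar speed ratio; the sample values of f and r are arbitrary.

```python
def relative_performance(f, r):
    """P = 1 / ((1 - f) + f / r) for vectorization ratio f and speed ratio r."""
    return 1.0 / ((1.0 - f) + f / r)

# Arbitrary sample points: even with r = 10, P stays low until f is close to 1.
samples = {f: relative_performance(f, 10.0) for f in (0.0, 0.5, 0.9, 1.0)}
```

The scalar fraction (1 - f) dominates quickly, which is why the slides stress a good vector/scalar balance rather than a large r alone.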
Performance-directed design goals
• Architectural design goals
– Maintaining a good vector/scalar performance balance
– Supporting scalability
– Increasing memory system capacity and performance
– Providing high-performance I/O and easy access to networks
Balanced vector/scalar ratio
• Scalar processing is an indispensable part of a general-purpose architecture.
• Vector balance point: the percentage of vector code in a program required to achieve equal utilization of the vector and scalar hardware
• Vector performance
– 9 MFLOPS in vector mode
– 1 MFLOPS in scalar mode
• I/O and networking performance
– As supercomputer speed increases, problem sizes grow, and so does the I/O bandwidth requirement.
– I/O rate
– Cray systems
– 100 GBPS transfer rate
• Memory demand
– Latency and bandwidth
– Effective memory hierarchy
– Memory size available on chip is rapidly increasing.
– Relative speed mismatch
• Scalability
– Support for shared memory with an increasing number of processors and memory ports
– Constraints
• Latency
• Communication overhead
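Returning to the vector balance point and the 9 MFLOPS / 1 MFLOPS figures above: the sketch below assumes the balance point is the fraction of work that must be vector code for the vector and scalar hardware to be busy for equal time. The function name is hypothetical.

```python
def vector_balance_point(vector_mflops, scalar_mflops):
    """Fraction of work in vector mode such that both modes take equal time.

    Equal time means w_v / vector_mflops == w_s / scalar_mflops with
    w_v + w_s = 1, which gives w_v = vector / (vector + scalar).
    """
    return vector_mflops / (vector_mflops + scalar_mflops)

balance = vector_balance_point(9.0, 1.0)  # the figures quoted on the slide
```

Under this assumption the slide's machine is balanced when 90% of the code is vector and 10% is scalar.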
Table of comparison
Cray Y-MP 816 system organization
C-90 and clusters
Cray MPP systems
• Off-the-shelf components are not suitable.
• A balance of speed among processor, memory, and I/O is required.
• RISC processors lack efficient memory operations such as synchronization and communication.
• All of these led to the introduction of MPP systems.
• T3D
– 150 MHz clock; can be partitioned to emulate SIMD or MIMD operation dynamically
– Distributed memory
– Mach-based microkernel operating system
– Program debugging and performance tools
Development phases
Fujitsu VP2000
Fujitsu 5000
Mainframe computers
LINPACK results
Compound vector processing
• A compound vector function (CVF) is a composite function of vector operations converted from a looping structure of linked scalar operations.
• Example:
Do I = 1, N
Load R1, X(I)
Load R2, Y(I)
Mul R1, S
Add R2, R1
Store Y(I), R2
Continue
After vectorization:
M(x : x+N-1) -> V1
M(y : y+N-1) -> V2
S × V1 -> V1
V2 + V1 -> V2
V2 -> M(y : y+N-1)
• CVF:
Y(I) = S × X(I) + Y(I)
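The same compound vector function can be written both ways in a runnable form. This is a sketch only: Python lists stand in for memory and vector registers, and the function names are illustrative.

```python
def cvf_scalar_loop(s, x, y):
    """Scalar version: one load/multiply/add/store sequence per element."""
    y = list(y)
    for i in range(len(x)):
        r1 = x[i]        # Load  R1, X(I)
        r2 = y[i]        # Load  R2, Y(I)
        r1 = r1 * s      # Mul   R1, S
        r2 = r2 + r1     # Add   R2, R1
        y[i] = r2        # Store Y(I), R2
    return y

def cvf_vectorized(s, x, y):
    """Vector version: each step operates on the whole vector at once."""
    v1 = list(x)                           # M(x : x+N-1) -> V1
    v2 = list(y)                           # M(y : y+N-1) -> V2
    v1 = [s * e for e in v1]               # S x V1 -> V1
    v2 = [a + b for a, b in zip(v2, v1)]   # V2 + V1 -> V2
    return v2                              # V2 -> M(y : y+N-1)

result = cvf_vectorized(2, [1, 2, 3], [10, 20, 30])
```

Both functions compute Y(I) = S × X(I) + Y(I). The vectorized form issues five vector instructions in total instead of five scalar instructions per element.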
Compound vector functions
• Vector loops and chains
– The loop count is determined at compile time or at run time.
• Strip mining: used when a vector is longer than the vector registers; the vector is processed in segments that fit the register length.
– Vector registers are not allocated to any other operation until all segments of the current vector are handled.
• Functional unit independence
– Vector registers act as interfaces between pipeline stages.
– Vector registers and functional units must be reserved before a vector chain is established.
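Strip mining can be sketched as an outer loop over register-sized segments. The register length of 64 elements is an assumption for illustration, as is the function name.

```python
def strip_mined_cvf(s, x, y, register_length=64):
    """Compute y = s * x + y in segments no longer than one vector register."""
    n = len(x)
    out = list(y)
    for start in range(0, n, register_length):
        end = min(start + register_length, n)   # last strip may be shorter
        v1 = x[start:end]                       # load one segment of x
        v2 = out[start:end]                     # load one segment of y
        out[start:end] = [s * a + b for a, b in zip(v1, v2)]
    return out

x = list(range(150))          # 150 elements: strips of 64, 64, and 22
y = [1] * 150
strip_result = strip_mined_cvf(3, x, y)
```

The outer loop runs three times here, and only the final strip needs a shorter vector length.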
example
Timing diagram
Chaining limitations
• Number of vector operations
• Number of functional pipeline units
• Number of interfaces between adjacent pipeline stages
• The degree of chaining depends on how many unary and binary operators are involved.
• It also depends on how many scalar and vector operations are involved.
• Vector recurrence
– The output of a functional pipeline is fed back into one of its own source registers.
– Example: component counter
What is Systolic Computing?
A set of simple processing elements with local connections that takes external inputs and processes them in a predetermined manner in a pipelined fashion.
Host Station in Systolic Architecture

• As a result of the local-communication scheme, a systolic network is easily extended without adding any burden to the I/O.
• Systolic array
[Figure: a row of control units, each with its processing units, connected by a local interconnection network]
• Systolic arrays usually pipe data from an outside host and also pipe the results back to the host.
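To make the idea of local, pipelined data flow concrete, here is a small simulation of a linear systolic array computing a matrix-vector product. The design is one common textbook arrangement chosen for illustration, not one taken from these slides: PE j holds x[j], and the partial sum for each row moves one PE to the right per time step.

```python
def systolic_matvec(a, x):
    """Simulate a linear systolic array computing y = a * x.

    PE j keeps x[j]. At each step it takes the partial sum handed over by
    its left neighbour, adds a[i][j] * x[j], and passes it to the right.
    """
    rows, n = len(a), len(x)
    pipe = [None] * n          # pipe[j]: (row index, partial sum) leaving PE j
    y = [0] * rows
    for t in range(rows + n - 1):
        new_pipe = [None] * n
        for j in range(n):
            if j == 0:
                # The host injects a fresh, empty partial sum for row t.
                incoming = (t, 0) if t < rows else None
            else:
                incoming = pipe[j - 1]      # local link from the left neighbour
            if incoming is not None:
                i, acc = incoming
                new_pipe[j] = (i, acc + a[i][j] * x[j])
        pipe = new_pipe
        if pipe[n - 1] is not None:         # finished row returns to the host
            i, acc = pipe[n - 1]
            y[i] = acc
    return y

y = systolic_matvec([[1, 2], [3, 4], [5, 6]], [1, 1])
```

Each PE talks only to its neighbour, and the host touches only the two ends of the array, which is why adding PEs adds no I/O burden.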
Multipipeline networking
• Pipeline net: constructed by interconnecting multiple functional pipelines through buffered crossbar networks (BCNs)
• Two-level pipelining architecture
Program graph transformation

• Rule 1: adding k delays to any node in the systolic graph and then subtracting k delays from all incoming edges
• Rule 2: multiplying all edges by a scaling constant
• A 0-graph is called a systolic program graph.
SIMD computer organization
• Distributed memory model
– Local memory in each processing element
– Scalar and vector control unit
– All processing elements are interconnected by a routing network.
– Masking logic to enable or disable individual processing elements
Shared memory model
• Alignment network
• It must be properly set to avoid access conflicts.
• SIMD instructions
– All instructions must use vector operands of equal length n.
– Data routing functions
• Host and I/O
– Control memory
– Mass storage and graphics display of results
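The lockstep execution, masking, and data routing described above can be sketched as follows. The sketch assumes one vector element per processing element (PE), so the operand length n equals the number of PEs. The function names are hypothetical, and the wrap-around shift is just one simple example of a routing function.

```python
import operator

def simd_execute(op, a, b, mask):
    """One broadcast instruction: every enabled PE applies op to its local operands.

    Disabled PEs (mask bit 0) keep their old value of a.
    """
    return [op(x, y) if enabled else x for x, y, enabled in zip(a, b, mask)]

def route_shift(values, distance):
    """Routing sketch: PE i receives the value held by PE (i - distance), with wrap-around."""
    n = len(values)
    return [values[(i - distance) % n] for i in range(n)]

local_a = [1, 2, 3, 4]            # one operand per PE
local_b = [10, 10, 10, 10]
mask = [1, 0, 1, 0]               # only PEs 0 and 2 take part
summed = simd_execute(operator.add, local_a, local_b, mask)
shifted = route_shift(summed, 1)  # each PE passes its result to the right
```

The control unit issues one instruction for all PEs. The mask decides which PEs act on it, and the routing step moves results between PEs.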
CM-2 architecture
• Front end
• Sequencer
• Modes of communication
– Broadcasting
– Global combining
– Scalar memory bus
• Processing nodes
– 32 bit-slice processors
– Floating-point accelerator
– Bit-slice ALU
• Hypercube routers
• Applications
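As background for the hypercube routers, the sketch below shows dimension-order (e-cube) routing on a generic hypercube. It is a standard textbook scheme and is not claimed to be the routing algorithm the CM-2 actually uses; the function name is illustrative.

```python
def ecube_route(src, dst, dims):
    """Return the node sequence from src to dst in a dims-dimensional hypercube.

    Node addresses are dims-bit integers, and neighbours differ in exactly
    one bit. The route corrects differing bits from lowest to highest.
    """
    path = [src]
    node = src
    for d in range(dims):
        bit = 1 << d
        if (node ^ dst) & bit:   # addresses differ in dimension d
            node ^= bit          # hop across that dimension
            path.append(node)
    return path

route = ecube_route(0b000, 0b101, 3)
```

The number of hops equals the number of bit positions in which the two addresses differ, so no route in a d-dimensional hypercube is longer than d hops.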
MasPar MP architecture
• Array control unit
• Scalar RISC processor
• Uses demand paging
• Fetches and decodes the instructions
• PE array
• 1024 PEs
• 64 PE clusters, 16 PEs per cluster
• Multistage crossbar interconnection network
• Parallel disk arrays
