
Lecture 46 - Vector Processing

The document discusses Vector (SIMD) Processing, which allows parallel operations on multiple data elements using single-instruction multiple-data (SIMD) instructions. It highlights the importance of data parallelism and vector registers in enhancing processor performance, as well as the role of vectorizing compilers in optimizing loops for vector instructions. An example illustrates how conventional assembly instructions can be replaced with vector instructions to improve efficiency in processing arrays.

Uploaded by

24f3002835

Vector (SIMD) Processing

Carl Hamacher, Zvonko Vranesic and Safwat Zaky, Computer Organization and
Embedded Systems, (6e), McGraw Hill Publication, 2017.
Ch 12: 12.2

Vector (SIMD) Processing
• Many computationally demanding applications involve programs that use loops to perform operations on vectors of data, where a vector is an array of elements such as integers or floating-point numbers.
• When a processor executes the instructions in such a loop, the operations are performed one at a time on individual vector elements, so many instructions must be executed to process all vector elements.
• A processor can be enhanced with multiple ALUs so that a single instruction can operate on multiple data elements in parallel.
• Such instructions are called single-instruction multiple-data (SIMD) instructions. They are also called vector instructions.
• These instructions can be used only when the operations performed in parallel are independent. This is known as data parallelism.
• The data for vector instructions are held in vector registers, each of which can hold several data elements. The number of elements, L, in each vector register is called the vector length.
• The vector length determines the number of operations that can be performed in parallel on multiple ALUs.
Vector (SIMD) Processing
• The vector instruction
      VectorAdd.S Vi, Vj, Vk
  computes L sums using the elements in vector registers Vj and Vk, and places the resulting sums in vector register Vi.
• The suffix S denotes the size of each data element.
• Special instructions are needed to transfer multiple data elements between a vector register and the memory. The instruction
      VectorLoad.S Vi, X(Rj)
  causes L consecutive elements beginning at memory location X + [Rj] to be loaded into vector register Vi. Similarly, the instruction
      VectorStore.S Vi, X(Rj)
  causes the contents of vector register Vi to be stored in L consecutive memory locations beginning at X + [Rj].
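
As a rough illustration of what these three instructions do, the C sketch below models a vector register as a small array and each instruction as a function. The names vreg_t, vector_load, vector_add, and vector_store, the choice of VLEN = 4 for the vector length L, and the 32-bit element size are all assumptions made for this sketch and do not belong to any real instruction set. On real hardware the L element operations would proceed in parallel on multiple ALUs rather than in a software loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed vector length (the text's L); real hardware fixes this value. */
enum { VLEN = 4 };

/* Hypothetical software stand-in for one vector register of VLEN 32-bit elements. */
typedef struct {
    int32_t elem[VLEN];
} vreg_t;

/* Models VectorLoad.S Vi, X(Rj): copy VLEN consecutive elements from memory at addr. */
void vector_load(vreg_t *vi, const int32_t *addr)
{
    for (size_t k = 0; k < VLEN; k++)
        vi->elem[k] = addr[k];
}

/* Models VectorAdd.S Vi, Vj, Vk: VLEN independent sums, one per element position. */
void vector_add(vreg_t *vi, const vreg_t *vj, const vreg_t *vk)
{
    for (size_t k = 0; k < VLEN; k++)
        vi->elem[k] = vj->elem[k] + vk->elem[k];
}

/* Models VectorStore.S Vi, X(Rj): write VLEN elements to consecutive memory at addr. */
void vector_store(const vreg_t *vi, int32_t *addr)
{
    for (size_t k = 0; k < VLEN; k++)
        addr[k] = vi->elem[k];
}
```

Each sum in vector_add depends only on the two elements at the same position, which is the independence that data parallelism requires.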
Vectorization
• In a source program written in a high-level language, loops that operate on arrays of integers or floating-point numbers are vectorizable if the operations performed in each pass are independent of the other passes.
• Using vector instructions reduces the number of instructions that need to be executed and enables the operations to be performed in parallel on multiple ALUs.
• A vectorizing compiler can recognize such loops, if they are not too complex, and generate vector instructions.
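
To make the independence condition concrete, the C sketch below contrasts two loops. The function names add_arrays and running_sum are hypothetical and chosen only for this illustration. In the first loop, each pass reads and writes only elements at index i, so the passes are independent. In the second, each pass needs the value written by the previous pass, so the loop cannot be vectorized simply by performing L passes at once.

```c
#include <stddef.h>

/* Independent passes: pass i touches only a[i], b[i], and c[i].
 * A loop of this shape is a candidate for vector instructions. */
void add_arrays(int *a, const int *b, const int *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

/* Dependent passes: pass i reads a[i - 1], which pass i - 1 has just written.
 * Executing several passes at the same time would use stale values. */
void running_sum(int *a, size_t n)
{
    for (size_t i = 1; i < n; i++)
        a[i] = a[i] + a[i - 1];
}
```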

Vectorization Example
• Consider vectorization of the loop given below.

• Assume that the starting locations in memory for arrays A, B, and C are in registers R2, R3, and R4. Using conventional assembly-language instructions, the compiler may generate a loop that processes one array element in each pass.
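
The loop and its assembly-language listing are not reproduced in this text. As a stand-in, the C sketch below assumes the loop adds arrays B and C element by element into A; the text names the arrays and the registers R2 to R5, but the exact operation, the function names, and the temporaries r6 and r7 are assumptions of this sketch. The second function walks the arrays with pointers named after the registers to mirror how a conventional instruction sequence would proceed, one element per pass.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed source-level loop: A[i] = B[i] + C[i] for N elements. */
void add_loop(int32_t *a, const int32_t *b, const int32_t *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

/* The same work written in the style of conventional (scalar) instructions.
 * r2, r3, r4 play the role of the address registers for A, B, and C;
 * r5 is the remaining element count; r6 and r7 are hypothetical temporaries. */
void add_loop_scalar(int32_t *r2, const int32_t *r3, const int32_t *r4, size_t r5)
{
    while (r5 > 0) {
        int32_t r6 = *r3;   /* load one element of B */
        int32_t r7 = *r4;   /* load one element of C */
        r6 = r6 + r7;       /* add them */
        *r2 = r6;           /* store the sum into A */
        r2++;               /* advance each pointer by one 4-byte element */
        r3++;
        r4++;
        r5--;               /* one element done */
    }
}
```

Both functions execute N passes, and each pass handles a single element.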

Vectorization Example Contd.
• The Load, Add, and Store instructions at the beginning of the loop are replaced by corresponding vector instructions that operate on L elements at a time.
• The vectorized loop requires only N/L passes to process all of the data in the arrays.
• With L elements processed in each pass through the loop, the address pointers in registers R2, R3, and R4 are incremented by 4L (L elements of 4 bytes each), and the count in register R5 is decremented by L.

Vectorization Example Contd.
• Vectorized form of the loop
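
The vectorized listing is not reproduced in this text. As a rough stand-in, the C sketch below models the vectorized loop in software, assuming the loop adds arrays B and C into A, that elements are 32 bits wide, that the vector length L is VLEN = 4, and that N is a multiple of L. The function name add_loop_vectorized and the temporaries v0 and v1 are hypothetical. Each pass handles L elements, advances the three pointers by L elements (4L bytes), and decrements the count by L.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed vector length (the text's L). */
enum { VLEN = 4 };

/* Software model of the vectorized loop. r2, r3, r4 stand for the address
 * registers of A, B, and C; r5 is the remaining element count.
 * Returns the number of passes executed, or 0 without touching the arrays
 * if the count is not a multiple of VLEN (an assumption of this sketch). */
size_t add_loop_vectorized(int32_t *r2, const int32_t *r3, const int32_t *r4, size_t r5)
{
    size_t passes = 0;

    if (r5 % VLEN != 0)
        return 0;

    while (r5 > 0) {
        int32_t v0[VLEN], v1[VLEN];  /* hypothetical vector registers */

        for (size_t k = 0; k < VLEN; k++) v0[k] = r3[k];          /* vector load from B */
        for (size_t k = 0; k < VLEN; k++) v1[k] = r4[k];          /* vector load from C */
        for (size_t k = 0; k < VLEN; k++) v0[k] = v0[k] + v1[k];  /* vector add */
        for (size_t k = 0; k < VLEN; k++) r2[k] = v0[k];          /* vector store to A */

        r2 += VLEN;   /* pointers advance by L elements, i.e. 4L bytes */
        r3 += VLEN;
        r4 += VLEN;
        r5 -= VLEN;   /* count decreases by L */
        passes++;
    }
    return passes;
}
```

With N = 8 and L = 4, the function makes N/L = 2 passes, compared with 8 passes for a loop that handles one element at a time.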
