Chapter 5
A Closer Look at
Instruction Set
Architectures
5.2 Instruction Formats
(Design Considerations)
In designing an instruction set, consideration is given to:
1. Instruction length.
Whether short, long, or variable.
2. Number of operands.
Zero, one, two, or three
3. Number of addressable registers.
More registers, less memory access, more bits to identify
4. Addressing modes.
Choose any or all: direct, indirect or indexed.
5. Memory organization.
Whether byte- or word- addressable.
2
5.2 Instruction Formats
(Byte Ordering Introduction)
• Byte ordering, ( endianness), is another major
architectural consideration.
• If we have a two-byte integer, the integer may be
stored so that the least significant byte is followed
by the most significant byte or vice versa.
3
Big endian
– Big endian machines store the most significant byte
first (at the lower address).
4
Little endian
In little endian machines, the least significant byte is followed by
the most significant byte.
Little Endian
5
Endianness Example
• Show the big endian and small endian
arrangements of the bytes in 1234567816
6
Internal Storage in the CPU
Architectural Choices
• Stack architecture
– Instructions and operands are implicitly taken from the stack.
– A stack cannot be accessed randomly.
• Accumulator architecture
– One operand of a binary operation is implicitly in the accumulator.
– The second operand is in memory, creating lots of bus traffic.
– Used by MARIE
• General purpose register (GPR) architecture,
– Registers can be used instead of memory.
– Faster than accumulator architecture.
– Efficient implementation for compilers.
– Results in longer instructions.(To address multiple registers)
In choosing one over the other, the tradeoffs are simplicity and cost of hardware design with
execution speed and ease of use.
7
5.2 Instruction Formats
GPR Architecture Classifications
• Most systems today are GPR systems.
• There are three types:
– Memory-Memory
– where two or three operands may be in memory.
– A single instruction both sources destination are memory addresses.
– Register-Memory
– where at least one operand must be in a register
– ADD Register1, Memory_Location
– Load-Store
– where no operands may be in memory. All in registers.
– Only Load and Store access memory
– RISC architectures (like MIPS and ARM)
• The number of operands and the number of available registers has a
direct effect on instruction length.
8
Stack architecture
• Mechanism:
• Stack machines rely on two primary types of instructions: one-
operand (PUSH, POP) and zero-operand (ADD, MUL).
• Data Movement:
• LOAD and STORE instructions require a single memory address
operand to move data between main memory and the stack.
• PUSH and POP operations fundamentally interact only with the
stack's Top Element (TOS).
• Computation:
• Binary instructions (like ADD or MULT) are zero-operand. They
implicitly use the top two items on the stack, consume them, and
push the single result back onto the stack.
9
Memory Stack (Diagram)
10
Memory Stack (Description)
• The implementation of a stack in the CPU is done by assigning a
portion of memory to a stack operation and using a processor
register as a stack pointer (SP).
• The figure shows a portion of computer memory partitioned into three
segments: program, data, and stack.
• The Program Counter (PC) points at the address of the next
instruction in the program which is used during the fetch phase to read
an instruction.
• The Address Registers (AR) points at an array of data which is used
during the execute phase to read an operand.
• The Stack Pointer (SP) points at the top of the stack which is used to
push or pop items into or from the stack.
• We assume that the items in the stack communicate with a Data
Register (DR).
11
Reverse Polish Notation (RPN)
• The common mathematical method of writing arithmetic expressions
imposes difficulties when evaluated by a computer.
• The Polish mathematician Lukasiewicz showed that arithmetic
expressions can be represented in prefix notation as well as postfix
notation.
12
Evaluation of Arithmetic Expression
1. 3, 4: PUSH 3, PUSH 4.
2. *: POP 4, POP 3, multiply, PUSH 12.
3. 5, 6: PUSH 5, PUSH 6. (Stack now holds: 12, 5, 6).
4. *: POP 6, POP 5, multiply, PUSH 30. (Stack now holds: 12, 30).
5. +: POP 30, POP 12, add, PUSH 42.
13
Instruction Formats
Instructions are categorized into different formats with respect to the
operand fields in the instructions:
1. Three Address Instructions (shortest program, longest instruction)
2. Two Address Instruction (most common)
3. One Address Instruction (uses an implicit accumulator)
4. Zero Address Instruction (stack-based, longest program, shortest
instruction)
5. RISC Instructions (Load-Store model)
• We will look at how the expression X = (A+B) * (C+D) is handled by each
type of architecture
14
Zero Address Instruction
• A stack-organized computer does not use an address field for the
instructions ADD and MUL.
• The PUSH and POP instructions, however, need an address field to
specify the operand that communicates with the stack.
• To evaluate arithmetic expressions in a stack computer, it is necessary
to convert the expression into reverse polish notation
• The program to evaluate X=(A+B)×(C+D) will be written for a stack-
organized computer.
PUSH A TOS ← M[A]
PUSH B TOS ← M[B]
ADD TOS ← (A+B)
PUSH C TOS ← M[C]
PUSH D TOS ← M[D]
ADD TOS ← (C+D)
MUL TOS ← (C+D) * (A+B)
POP X M[X] ← TOS
15
One Address Instruction
• One address instructions use an implied accumulator (AC)
register for all data manipulation.
• For multiplication and division there is a need for a second
register.
• However, here we will neglect the second register and assume
that the AC contains the result of all operations.
• The program to evaluate X=(A+B)×(C+D) is:
LOAD A AC ← M[A]
ADD B AC ← AC + M[B]
STORE T M[T] ← AC
LOAD C AC ← M[C]
ADD D AC ← AC + M[D]
MUL T AC ← AC * M[T]
STORE X M[X] ← AC
16
Two Address Instruction
• Two address instructions are the most common in commercial
computers.
• Here again each address field can specify either a processor register
or a memory word.
• The program to evaluate X=(A+B)×(C+D) is as follows:
MOV R1, A R1 ← M[A]
ADD R1, B R1 ← R1 + M[B]
MOV R2, C R2 ← M[C]
ADD R2, D R2 ← R2 + M[D]
MUL R1, R2 R1 ← R1 * R2
MOV X, R1 M[X] ← R1
17
Three Address Instruction
• Computers with three-address instruction formats can use each
address field to specify either a processor register or a memory
operand.
• The program in assembly language that evaluates X=(A+B)×(C+D) is
shown below:
ADD R1, A, B R1 ← M[A] + M[B]
ADD R2, C, D R2 ← M[C] + M[D]
MUL X, R1, R2 M[X] ← R1 * R2
• The advantage of three-address format is that it results in short
programs when evaluating arithmetic expressions.
• The disadvantage is that the binary-coded instructions require too
many bits to specify three addresses.
18
RISC Instruction
• The instruction set of a typical RISC processor is restricted to the use of
load and store instructions when communicating between memory and
CPU.
• All other instructions are executed within the registers of the CPU
without referring to memory.
• A program for a RISC type CPU consists of LOAD and STORE
instructions that have one memory and one register address, and
computational-type instructions that have three addresses with all
three specifying processor registers.
• The following is a program to evaluate X=(A+B)×(C+D):
LOAD R1, A R1 ← M[A]
LOAD R2, B R2 ← M[B]
LOAD R3, C R3 ← M[C]
LOAD R4, D R4 ← M[D]
ADD R1, R1, R2 R1 ← R1 + R2
ADD R3, R3, R4 R3 ← R3 + R4
MUL R1, R1, R3 R1 ← R1 * R3
STORE X, R1 M[X] ← R1
19
5.2 Instruction Formats
Expanding Opcodes Introduction
• We have seen how instruction length is affected by the
number of operands supported by the ISA.
• In any instruction set, not all instructions require the same
number of operands.
• Operations that require no operands, such as HALT,
necessarily waste some space when fixed-length instructions
are used.
• One way to recover some of this space is to use expanding
opcodes.
20
5.2 Instruction Formats
Encoding Options
❑ A system has 16 registers and 4K of memory.
❑ We need 4 bits to access one of the registers.
❑ We also need 12 bits for a memory address.
❑If the system is to have 16-bit instructions, we
have two choices for our instructions:
❑ Option 1: space to
3 register addresses
(3 x 4) = 12 bits. 4-bit
Opcode.
❑ Option 2: Space for
1 memory address
12-bit. 4-bit Opcode
21
5.2 Instruction Formats
Encoding with Expanding Opcodes
• If we allow the length of the opcode to vary, we
could create a very rich instruction set:
16-bit #inst #addresses
Is there something missing from this instruction set?
22
EXAMPLE 5.11
Given 8-bit instructions, is it possible to use
expanding opcodes to allow the following to
be encoded? If so, show the encoding.
• Three instructions with two 3-bit operands
• Two instructions with one 4-bit operand
• Four instructions with one 3-bit operand
23
EXAMPLE 5.11 – The Encoding
Group 1:
2 operands
Group 2:
1 operand
Group 3:
1 operand
24
5.3 Instruction types
Instructions fall into several broad categories that you should be familiar with:
1. Data Movement: These are the most frequently used instructions. They move
data between registers, or between registers and memory (e.g., LOAD,
STORE, MOVE).
2. Arithmetic: Instructions perform standard arithmetic functions, such as
addition, subtraction, multiplication, and division.
3. Boolean: These perform logical operations (like AND, OR, XOR, NOT).
4. Bit Manipulation: This includes SHIFT (arithmetic and logical) and ROTATE
instructions, which are used for fast multiplication/division by powers of 2, or
isolating specific bits.
5. I/O (Input/Output): Instructions manage the transfer of data between the CPU
and external devices.
6. Control Transfer: These instructions alter the normal sequential flow of
program execution, including conditional and unconditional branches (JUMP),
skips, and procedure calls (subroutines).
25
5.4 Addressing
• Addressing modes specify where an operand is
located.
• They can specify a
– Constant
– Register
– Memory location.
• The actual location of an operand is its Effective
Address (E.A.)
• Certain addressing modes allow u s to determine
the address of an operand dynamically.
26
5.4 Addressing (Basic Modes)
1. Immediate addressing
– Is where the data is part of the instruction.
– Fastest, no memory or register access
– Example: LOAD 500 - The value 500 is the operand.
2. Direct addressing
– Is where the address of the data is given in the instruction.
– Example: LOAD 500 - The value at memory address 500 is the
operand.
3. Register addressing
– Is where the data is located in a register.
– Example: LOAD R1 - The value in Register R1 is the operand.
4. Indirect addressing
– Gives the address of the address of the data in the instruction.
5. Register indirect addressing
– Uses a register to store the address of the operand.
– Faster than indirect addressing. 27
5.4 Addressing (Advanced Modes)
6. Indexed addressing
– ( E.A = R + address in operand)
– Uses a register as an offset, which is added to the address in the operand to
determine the effective address of the data.
7. Based addressing
– Is similar to indexed addressing, except that a dedicated base register is
used instead of an index register.
• The difference between these two is that an
– Index register holds an offset relative to the address given in the instruction.
– A base register holds a base address where the address field represents
a displacement from this base.
– Used for accessing arrays and structures.
8. Stack addressing
– The operand is assumed (implicitly) to be on top of the stack.
– No address field in the instruction.
28
5.4 Addressing
Calculation Example (Setup)
• For the instruction shown, what value is loaded into
the accumulator for each addressing mode?
29
5.4 Addressing
Calculation Example (Solution)
• These are the values loaded into the accumulator
for each addressing mode.
30
5.5 Instruction-Level Pipelining
Introduction
• Some CPUs divide the fetch-decode-execute cycle into
smaller steps, known as pipeline stages.
• These smaller steps can often be executed in parallel to
increase throughput.
• Goal: To keep the processor's functional units utilized
to complete instructions much faster than one full
cycle at a time.
• Such parallel execution is called Instruction-Level
Pipelining (IPL)
31
5.5 Instruction-Level Pipelining
Breaking Down the Cycle
• Suppose a fetch-decode-execute cycle were broken
into the following smaller steps (Six-Stage Model):
1. S1: Fetch instruction. 4. S4: Fetch operands.
2. S2: Decode opcode. 5. S5: Execute instruction.
3. S3: Calculate effective 6. S6: Store result.
address of operands.
• Six-Stage Example: Suppose we have a six-stage
pipeline.S1 fetches the instruction, S2 decodes it,
S3 determines the address of the operands, S4
fetches them, S5 executes the instruction, and S6
stores the result.
32
5.5 Instruction-Level Pipelining
Overlapping Execution
• For every clock cycle, one small step is carried out,
and the stages are overlapped.
S1. Fetch instruction. S4. Fetch operands.
S2. Decode opcode. S5. Execute.
S3. Calculate effective S6. Store result.
address of operands.
33
Theoretical Speedup of a Pipeline
(Part 1)
• To calculate speedup, we need to define our variables:
K = # of stages/ instruction
t p = time / stage.
n = # of tasks (instructions)
• Each instruction represents a task in the pipeline.
• Time for the First Task: 1st task (instruction) requires k t p , to
complete in a k-stage pipeline.
• The remaining (n - 1) tasks, one task will complete every clock cycle.
• Total time to complete the remaining tasks = (n - 1)tp.
• Total pipelines time: to complete n tasks using a k-stage pipeline
requires
(k tp) + (n - 1)tp = (k + n - 1)tp.
34
Theoretical Speedup of a Pipeline
(Part 2)
• Total Non-Pipelined Time: Without a pipeline, the total time for n
tasks is (n tn) where tn is the time to complete one instruction (k
tp).
• Speedup: is the ratio of non-pipelined time to pipelined time:
• Maximum Theoretical Speedup: As the number of task
n approaches infinity, the term (k + n - 1) approaches n,
which results in a theoretical speedup of:
The maximum theoretical speedup is simply equal to the number of
stages, K. 35
5.5 Instruction-Level Pipelining
Pipelining Hazards
• Our equations take a number of things for granted
1. we have to assume that the architecture supports fetching
instructions and data in parallel.
2. we assume that the pipeline can be kept filled at all times. This is not
always the case.
• Pipeline hazards arise that cause pipeline conflicts and stalls
1. Resource conflicts (Structural Hazards): Two instructions need the
same hardware resource at the same time.
2. Data dependencies (Data Hazards): An instruction needs the result
of a previous instruction that has not yet completed execution and
storage.
3. Conditional branching (Control Hazards): An instruction alters the
program flow, and the instructions already in the pipeline must be
flushed out and discarded if the branch is taken.
36
Pipelining Example Calculation
A nonpipelined system takes 200ns to process a task. The
same task can be processed in a 5-stages pipeline with a clock
cycle of 40ns. Determine the speedup ratio of the pipeline for
200 tasks. What is the maximum speedup that could be
achieved with the pipeline unit over the nonpipelined unit?
Home exercise (1)
Q1.1) A nonpipelined system takes 40ns to execute
instructions. The instructions can be processed by
a 4-segment pipeline with a clock cycle of
10ns. Determine the speedup of the pipeline for 22
instructions.
Q1.2) What is the maximum speedup that could be
achieved with this pipeline
38
Home exercise (2)
Q2.1) A nonpipelined system takes 100ns to
execute instructions. The instructions can be
processed by a 5-segment pipeline with a clock
cycle of 20ns. Determine the speedup of the
pipeline for 31 instructions.
Q2.2) What is the maximum speedup that could be
achieved with this pipeline?
39
Complex Instruction Set Computers (CISC)
• The primary goal of CISC architecture is to
complete a task in as few lines of assembly as
possible.
• This is achieved by designing complex
instructions by building processor hardware that
is capable of understanding and executing a series
of operations.
40
CISC (MULT Example)
• For this particular task, a CISC processor would come
prepared with a specific instruction (we'll call it
"MULT").
• When executed, (MULT 300, 200) instruction
1. Loads the two values into separate registers
2. Multiplies the operands in the execution unit
3. Then stores the product in the appropriate location or
register.
– Thus, the entire task of multiplying two numbers can be
completed with one complex instruction, including loading,
calculating and storing.
41
Characteristics of CISC
• A larger number of instructions – typically from
100 to 250 instructions, including highly specialized
instructions.
• A large variety of addressing modes – typically
from 5 to 20 different modes, offering flexible ways
to access data
• Variable-length instruction formats to ensure high
code density and minimize memory usage.
• Have instructions that manipulate operands in
memory, reducing the need for explicit load/store
operations.
42
CISC (Advantages)
• It operates directly on the computer's memory banks and
does not require the programmer to explicitly call any
loading or storing functions.
• The compiler has to do very little work to translate a
high- level language statement into assembly.
• Because the length of the code is relatively short, very
little RAM is required to store instructions. The emphasis
is put on building complex instructions directly into the
hardware.
43
Reduced Instruction Set Computers (RISC)
• RISC processors only use simple instructions
that can be executed within one clock cycle.
• Thus, the "MULT" command described for CISC
could be divided into three separate commands:
– LOAD, which moves data from the memory bank to a
register,
– MULT, which finds the product of two operands located
within the registers,
– STORE, which moves data from a register to the
memory banks.
44
RISC (Example Code)
• In order to perform the exact series of steps
described in the CISC approach, a programmer
would need to code four lines of assembly:
LOAD R1, 200
LOAD R2, 300
MULT R2, R1
STORE 200, R2
45
RISC (Trade-offs)
• At first, this may seem like a much less efficient
way of completing the operation.
– Because there are more lines of code
– More RAM is needed to store the assembly
level instructions.
– The compiler must also perform more work to convert
a high-level language statement into code of this form.
46
RISC (Advantages)
• However, the RISC strategy brings some very
important advantages.
– Execution Speed: Because each instruction requires only one
clock cycle to execute, the entire program will execute in
approximately the same amount of time as the multi- cycle
"MULT" command.
– Pipelining: Because all of the instructions execute in a
uniform amount of time (i.e. one clock), pipelining is
possible, significantly boosting throughput.
– Hardware Simplification: Reduced instructions require less
transistors of hardware space than the complex instructions,
free HW space.
– Register Count: The free space leaves more room for general
purpose registers, further reducing slow memory accesses. 47
Characteristics of RISC
• Instruction Set: Relatively few instructions and few addressing
modes.
• Memory Access: Strictly limited to LOAD and STORE
instructions.
• Operation Location: All operations are done exclusively within
the CPU's registers.
• Format: Fixed-length, easily decoded instruction format.
• Execution: Single-cycle instruction execution.
• Control: Hardwired control logic is favored over
microprogrammed control (which would slow down the cycle).
• Registers/Pipelining: Features a large number of registers and
an efficient instruction pipeline.
• Compiler Dependency: Relies heavily on compiler support for
the efficient translation of complex high-level programs into the
optimal sequence of simple machine language programs.
48