Chapiter 3
Chapiter 3
PROCESSOR
(The MIPS R3000 microprocessor)
MICROPROCESSORS
A program stored in the memory provides instructions to the CPU to perform a specific
action. This action can be a simple addition. It is function of the CPU to fetch the program
instructions from the memory and execute them.
The CPU contains a number of registers to store information inside the CPU temporarily.
Registers inside the CPU can be 8-bit, 16-bit, 32-bit or even 64-bit depending on the
CPU.
The CPU also contains Arithmetic and Logic Unit (ALU). The ALU performs arithmetic
(add, subtract, multiply, divide) and logic (AND, OR, NOT) functions.
The CPU contains a program counter also known as the Instruction Pointer to
point the address of the next instruction to be executed.
Instruction Decoder is a kind of dictionary which is used to interpret the meaning of the
instruction fetched into the CPU. Appropriate control signals are generated according to the
meaning of the instruction. 2
1
MICROPROCESSORS
Inside the CPU:
Address Bus
Instruction Pointer
Instruction Register
Instruction Decoder Timing Control Bus
and control signals are
Flags
ALU generated
Data Bus
Internal Register A
Busses Register B
Register C
Register D
CPU Clocking
Clock period
Clock (cycles)
Data transfer
and computation
Update state
2
CPU Time
Instructio ns Clock cycles Seconds
CPU Time
Program Instructio n Clock cycle
Performance depends on
Algorithm: affects IC, possibly CPI
Programming language: affects IC, CPI
Compiler: affects IC, CPI
Instruction set architecture: affects IC, CPI, Tc
3
Instruction Set
The words of a computer’s language are called instructions and
the vocabulary of commands understood by a given architecture is
called an instruction set. Common groups of instructions are:
• Arithmetic instructions
• Logical instructions
• Data transfer instructions
• Conditional branch instructions
• Unconditional jump instructions
7
4
RISC Principles
10
5
MIPS Microprocessor
MIPS (Microprocessor without Interlocked Pipelined Stages) is a
family of reduced instruction set computer (RISC) instruction set
architectures (ISA) ; developed by MIPS Computer Systems, now MIPS
Technologies, based in the United States.
11
12
6
Overview of the MIPS Architecture
...
4 bytes per word Memory
Up to 232 bytes = 230 words
...
13
7
Special-Purpose Registers
-PC (Program Counter), points to the next instruction to be executed
-Hi :High result of multiplication and division operations
-Lo :Low result of multiplication and division operations
-SR (status): Status Register, Contains the interrupt mask and enable bits
-CAUSE : specifies what kind of interrupt or exception just happened.
-EPC : Exception PC, Contains the address of the instruction when the
exception occurred.
-Vaddr: Bad Address Register, Contains the invalid memory address caused by
load, store, or fetch.
15
Instruction Formats
All instructions are 32-bit wide, Three instruction formats:
Register (R-Type)
Register-to-register instructions
Op: operation code specifies the format of the instruction
Op6 Rs5 Rt5 Rd5 sa5 funct6
Immediate (I-Type)
16-bit immediate constant is part in the instruction
Op6 Rs5 Rt5 immediate16
Jump (J-Type)
Used by jump instructions
16
Op6 immediate26
8
R-Type Instruction Format
Op6 Rs5 Rt5 Rd5 shamt5 funct6
18
9
Encoding MIPS Instructions
Field func when OPCOD = SPECIAL
19
20
10
Encoding MIPS Instructions
Examples:
21
The op and funct fields in combination (0 and 32 in this case) tell that this
instruction performs addition (add).
The rs and rt fields, registers $s1 (17) and $s2 (18), are the source operands,
and the rd field, register $t0 (8), is the destination operand.
11
From Assembly to Machine Code
Thus, the decimal representation of instruction add $t0, $s1, $s2 is:
• op = 000000 (special)
• rs = 17 ($s1)
• rt = 18 ($s2)
• rd = 8 ($t0)
• shamt = 0 (not used)
• funct = 100000(add)
23
Pseudo-Instructions
Most assembler instructions represent machine instructions one-to-one. The
assembler can also treat common variations of machine instructions as if they
were instructions in their own right. Such instructions are called pseudo-
instructions.
The hardware need not implement the pseudo-instructions, but their appearance
in assembly language simplifies programming. Register $at (assembler
temporary) is reserved for this purpose.
12
Addressing Modes
Addressing Modes
26
13
Addressing Modes
27
Addressing Modes
Immediate
op rs rt Immediate
addi $t0, $t1, 5
28
14
Addressing Modes
Register
op rs rt rd funct
Register
add $t0, $t1, $t2
29
Addressing Modes
op rs rt Address
± Register
15
Addressing Modes
PC-relative (e.g., conditional branches, need an offset)
op rs rt Address (offset)
Memory
adding a 16-bit address shifted left 2 bits to thePC
31
Addressing Modes
Pseudodirect
op Address
concat PC
j for
Memory
Concatenating a 26-bit address shifted left 2 bits with the 4 upper bits of the PC.
32
16
Addressing Modes
Examples:
33
Byte-‐Addressable Memory
🞍 Each data byte has a unique address
🞍 Load/store words or single bytes: load byte (lb) and store byte
(sb)
0000000C 4 0 F 3 0 7 8 8 Word 3
00000008 0 1 EE 2 8 4 2 Word 2
00000004 F 2 F 1 A C 0 7 Word 1
00000000 AB CD E F 7 8 Word 0
34
width = 4 bytes
17
Address Space
The MIPS address space is divided in four segments:
• Text, which contains the program code
• Data, which contains constants and global variables
• Heap, which contains memory dynamically allocated during runtime
• Stack, which contains temporary data for handling procedure calls
The heap and stack segments grow toward each other, thereby allowing the
efficient use of memory as the two segments expand and shrink.
35
Address Space
36
18
Example Program: Executable
Executable file header Text Size Data Size
0x34 (52 bytes) 0xC (12 bytes)
Reserved
0x7FFFFFFC
Stack $sp = 0x7FFFFFFC
0x10010000 Heap
$gp = 0x10008000
y
g
0x10000000 f
0x03E00008
0x00851020
0x03E00008
0x23BD0004
0x8FBF0000
0xAF828008
0x0C10000B
0xAF858004
0x20050003
0xAF848000
0x20040002
0xAFBF0000
0x00400000 0x23BDFFFC PC = 0x00400000
Reserved
38
19
Pipelining
39
Idea:
Divide the instruction processing cycle into distinct “stages” of processing
Ensure there are enough hardware resources to process one instruction in each stage
Process a different instruction in each stage
Instructions consecutive in program order are processed in consecutive stages
40
20
Example: Execution of Four Independent ADDs
Multi-cycle: 4 cycles per instruction
F D E W
F D E W
F D E W
F D E W
Time
Pipelined: 4 cycles per 4 instructions
1 instruction completed per cycle
F D E W
F D E W
F D E W
F D E W
Time
41
D 42
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
21
Pipelining Multiple Loads of Laundry
Time
6 PM 7 8 9 10 11 12 1 2 AM
Task
order
6 PM 7 8 9 10 11 12 1 2 AM
TimeA
Task
B
order
A
C
D
B
6 PM 7 8 9 10 11 12 1 2 AM
Time
Task
order 6 PM 7 8 9 10 11 12 1 2 AM
Time
A
- 4 loads of laundry in parallel
Task
order B
- no additional resources
A
C
- throughput increased by 4
B
D
- latency per load is the same
C
43
D
Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
44
22
Pipelining and ISA Design
MIPS ISA designed for pipelining
All instructions are 32-bits
Easier to fetch in one cycle
Load/store addressing
Can calculate address in 3rd stage, access memory in 4th stage
46
23
Pipelining Abstraction
1 2 3 4 5 6 7 8 9 10
Time (cycles)
$0
lw $s2
lw $s2, 40($0) IM RF 40 + DM RF
$t1
add $s3
add $s3, $t1, $t2 IM RF $t2 + DM RF
$s1
sub $s4
sub $s4, $s1, $s5 IM RF $s5 - DM RF
$t5
and $s5
and $s5, $t5, $t6 IM RF $t6 & DM RF
$s1
sw $s6
sw $s6, 20($s1) IM RF 20 + DM RF
$t3
or $s7
or $s7, $t3, $t4 IM RF $t4 | DM RF
47
t0 t1 t2 t3 t4 t5
Inst0 IF ID EX MEM WB
Inst1 IF ID EX MEM WB
Inst2 IF ID EX MEM WB
Inst3 IF ID EX MEM WB
Inst4 IF ID EX MEM
IF ID EX
IF ID
steady state
(full pipeline) IF
48
24
Illustrating Pipeline Operation: Resource View
t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10
IF I0 I1 I2 I3 I4 I5 I6 I7 I8 I9 I10
ID I0 I1 I2 I3 I4 I5 I6 I7 I8 I9
EX I0 I1 I2 I3 I4 I5 I6 I7 I8
MEM I0 I1 I2 I3 I4 I5 I6 I7
WB I0 I1 I2 I3 I4 I5 I6
49
25
Pipelining Hazards
51
Structural hazard
Caused by limitations in hardware that don’t allow
concurrent execution of different instructions
Examples
Bus
Single ALU
Single Memory for instructions and data
Single IR
Remedy is to add additional elements to datapath to
eliminate hazard
52
26
Data Hazards Types
An instruction depends on completion of data access by a
previous instruction
r3 r1 op r2 Read-after-Write
r5 r3 op r4 (RAW)
r3 r1 op r2 Write-after-Read
r1 r4 op r5 (WAR)
r3 r1 op r2 Write-after-Write
r5 r3 op r4 (WAW)
53
r3 r6 op r7
Control Hazard
Special case of data dependence: dependence on PC
beq:
branch is not determined until the fourth stage of the pipeline
Instructions after the branch are fetched before branch is
resolved
Always predict that the next sequential instruction is fetched
Called “Always not taken” prediction
These instructions must be flushed if the branch is taken
Branch misprediction penalty
number of instructions flushed when branch is taken
May be reduced by determining branch earlier
54
27
Control Hazard
1 2 3 4 5 6 7 8 9
Time (cycles)
$t1
lw DM
20 beq $t1, $t2, 40 IM RF $t2 - RF
$s0
and DM
24 and $t0, $s0, $s1 IM RF $s1 & RF
Flush
$s4 these
or DM instructions
28 or $t1, $s4, $s0 IM RF $s0 | RF
$s0
sub DM
2C sub $t2, $s0, $s5 IM RF $s5 - RF
30 ...
...
$s2
slt DM $t3
slt
64 slt $t3, $s2, $s3 IM RF $s3 RF
55
Resource contention
56
28
How Can You Handle Data Hazards?
■ Insert “NOP”s (No OPeration) in code at compile time
57
Data Forwarding/Bypassing
Problem: A consumer (dependent) instruction has to wait in decode
stage until the producer instruction writes its value in the register file
Goal: We do not want to stall the pipeline unnecessarily
Observation: The data value needed by the consumer instruction can
be supplied directly from a later stage in the pipeline (instead of only
from the register file)
Idea: Add additional dependence check logic and data forwarding
paths (buses) to supply the producer’s value to the consumer right
after the value is available
Benefit: Consumer can move in the pipeline until the point the value
can be supplied less stalling
58
29
Data Forwarding/Bypassing
Use result when it is computed
Don’t wait for it to be stored in a register
Requires extra connections in the datapath
No bubble!
4-3=1, no stall!
59
Data Forwarding/Bypassing
Can’t always avoid stalls by forwarding
If value not computed when needed
Can’t forward backward in time!
60
30
Code Scheduling to Avoid Stalls
Reorder code to avoid use of load result in the next
instruction
C code for A = B + E; C = B + F;
Stall on Branch
In MIPS pipeline
Need to compare registers and compute target earlier in the pipeline
Add extra hardware to do it in ID stage (earliest ? )
62
31