Computer Architecture Overview
Virtually all contemporary computer designs are based on concepts developed by John von Neumann.
Such a design is referred to as the von Neumann architecture.
It is based on three key concepts:
■■ Data and instructions are stored in a single read–write memory.
■■ The contents of this memory are addressable by location, without regard to
the type of data contained there.
■■ Execution occurs in a sequential fashion (unless explicitly modified) from
one instruction to the next.
Computer Components
Program → The process of connecting the various components in the desired configuration (a hardwired program).
Software → A sequence of codes or instructions. Each code is, in effect, an instruction.
The instruction interpreter, which is part of the hardware, interprets each instruction and generates control signals.
At each step, some arithmetic or logical operation is performed on some data. For each step, a new set of control signals is needed.
Hardware and Software Approaches (Fig 3.1), panel (b) Programming in software: instruction codes enter the instruction interpreter, which sends control signals to the general-purpose arithmetic and logic functions; data go in and results come out.
Computer Components
PC = Program counter
IR = Instruction register
MAR = Memory address register
MBR = Memory buffer register
I/O AR = Input/output address register
I/O BR = Input/output buffer register (used for the exchange of data between an I/O module and the CPU)
Figure 3.2 Computer Components: Top-Level View
Computer Function - Instruction Cycle
Instruction processing consists of two steps → (1) fetch and (2) execute.
The processing (fetch and execute) required for a single instruction is called an ‘instruction cycle’.
Figure 3.3 Basic Instruction Cycle (fetch cycle, execute cycle)
Program execution halts only if
• the machine is turned off,
• some sort of unrecoverable error occurs, or
• a program instruction that halts the computer is encountered.
Computer Function - Instruction Cycle
1. At the beginning of each instruction cycle the processor fetches an instruction from memory.
2. The program counter (PC) holds the address of the instruction to be fetched next.
3. Unless told otherwise, the processor increments the PC after each instruction fetch, so that it fetches the next instruction in sequence.
The fetched instruction tells the processor which action to take. The actions fall into four categories:
■■ Processor-memory: Data may be transferred
from processor to memory or from memory to
processor.
■■ Processor-I/O: Data may be transferred to or from
a peripheral device by transferring between the
processor and an I/O module.
■■ Data processing: The processor may perform
some arithmetic or logic operation on data.
■■ Control: An instruction may specify that the
sequence of execution be altered.
An instruction’s execution may involve one of these actions or a combination of them.
The PC contains 300, the address of the first instruction. Instructions and data are 16 bits long (four hexadecimal digits).
Figure 3.4 Characteristics of a Hypothetical Machine
(c) Internal CPU registers: program counter (PC), instruction register (IR), accumulator (AC)
(d) Partial list of opcodes:
0001 = Load AC from Memory
0010 = Store AC to Memory
0101 = Add to AC from Memory
Note: The processor contains a single data register, called an accumulator (AC).
Figure 3.5 Example of Program Execution (contents of memory and registers in hexadecimal), Steps 5 and 6
Location      Step 5   Step 6
Memory 300    1940     1940
Memory 301    5941     5941
Memory 302    2941     2941
Memory 940    0003     0003
Memory 941    0002     0005
PC            302      303
AC            0005     0005
IR            2941     2941
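The program execution example of Figure 3.5 can be reproduced with a small simulator. This is a minimal sketch, not a definitive model: it assumes a 16-bit instruction made of a 4-bit opcode followed by a 12-bit address, which is consistent with the hexadecimal values in the figure. The function name `run` and the dictionary used as memory are illustrative choices.

```python
# Opcodes from the partial list in Figure 3.4 (d).
LOAD, STORE, ADD = 0x1, 0x2, 0x5


def run(memory, pc, steps):
    """Run `steps` instruction cycles and return the final (PC, AC, IR)."""
    ac = 0  # accumulator, the single data register
    ir = 0  # instruction register
    for _ in range(steps):
        # Fetch cycle: read the instruction at PC, then increment PC.
        ir = memory[pc]
        pc += 1
        # Assumed format: top 4 bits are the opcode, low 12 bits the address.
        opcode, address = ir >> 12, ir & 0x0FFF
        # Execute cycle.
        if opcode == LOAD:
            ac = memory[address]
        elif opcode == ADD:
            ac = (ac + memory[address]) & 0xFFFF
        elif opcode == STORE:
            memory[address] = ac
        else:
            raise ValueError(f"unknown opcode {opcode:04b}")
    return pc, ac, ir


# Memory contents before the first fetch, as in Figure 3.5.
memory = {0x300: 0x1940, 0x301: 0x5941, 0x302: 0x2941,
          0x940: 0x0003, 0x941: 0x0002}
pc, ac, ir = run(memory, pc=0x300, steps=3)
print(f"PC={pc:03X} AC={ac:04X} IR={ir:04X} M[941]={memory[0x941]:04X}")
```

After three instruction cycles the state matches Step 6 of the figure: location 941 holds 0005 and the PC points to 303.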
Computer Function - Instruction Cycle
Instruction address calculation (iac): Determine the address of the next instruction to be executed.
Instruction fetch (if): Read instruction from its memory location into the processor.
Instruction operation decoding (iod): Analyze instruction to determine type of operation to be performed and operand(s) to be used.
Operand address calculation (oac): If the operation involves reference to an operand in memory or available via I/O, then determine the address of the operand.
Operand fetch (of): Fetch the operand from memory or read it in from I/O.
Data operation (do): Perform the operation indicated in the instruction.
Operand store (os): Write the result into memory or out to I/O.
Figure 3.6 Instruction Cycle State Diagram
States in the upper part of Figure 3.6 (instruction fetch, operand fetch, operand store) involve an exchange between the processor and either memory or an I/O module.
States in the lower part of the diagram involve only internal processor operations.
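The state sequence of Figure 3.6 can be sketched as a function of how many operands an instruction reads and how many results it writes. This is an illustrative sketch only; the function name `instruction_states` and the set `BUS_STATES` are hypothetical names, and string or vector instructions that loop back through the diagram are not modelled.

```python
# States in the upper part of Figure 3.6: they exchange data with
# memory or an I/O module. The remaining states are internal.
BUS_STATES = {"if", "of", "os"}


def instruction_states(n_operands, n_results):
    """List the states visited by one instruction, in order."""
    states = ["iac", "if", "iod"]
    for _ in range(n_operands):   # "multiple operands" loop
        states += ["oac", "of"]
    states.append("do")
    for _ in range(n_results):    # "multiple results" loop
        states += ["oac", "os"]
    return states


# An instruction that reads two memory operands and writes one result.
sequence = instruction_states(n_operands=2, n_results=1)
print(" -> ".join(sequence))
print("bus accesses:", sum(state in BUS_STATES for state in sequence))
```

For this example the instruction needs four exchanges with memory: one instruction fetch, two operand fetches, and one operand store.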
Computer Function - Interrupts
The processing of the processor can be interrupted by several factors (Table 3.1):
Program → Generated by some condition that occurs as a result of an instruction execution, such as arithmetic overflow, division by zero, attempt to execute an illegal machine instruction, or reference outside a user's allowed memory space.
Timer → Generated by a timer within the processor. This allows the operating system to perform certain functions on a regular basis.
I/O → Generated by an I/O controller, to signal normal completion of an operation, request service from the processor, or to signal a variety of error conditions.
Hardware failure → Generated by a failure such as power failure or memory parity error.
Since I/O operations are slow, they are designed to work together with the interrupt system.
For example:
• Data is sent to the printer.
• The processor goes back to doing other tasks.
• When the printer finishes its job, it sends an interrupt: "printing is done."
• Then the processor says, "okay, now I can send more data."
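The printer example can be sketched as an instruction loop that checks for a pending interrupt at the end of every instruction cycle. This is a simplified illustration: the function `run`, its parameters, and the assumption that the printer needs a fixed number of instruction cycles per block are all hypothetical, and saving and restoring the processor context is omitted.

```python
def run(user_instructions, print_time, blocks):
    """Interleave a user program with interrupt-driven printing.

    print_time: instruction cycles the printer needs per block (assumed).
    blocks: number of data blocks to print.
    """
    trace = ["send block to printer"]
    remaining = blocks - 1
    busy = print_time          # cycles until the printer finishes
    interrupt_pending = False
    for i in range(user_instructions):
        # Fetch and execute one user instruction while the printer works.
        trace.append(f"user instruction {i}")
        if busy > 0:
            busy -= 1
            if busy == 0:
                interrupt_pending = True   # printer raises an interrupt
        # Interrupt check at the end of the instruction cycle.
        if interrupt_pending:
            interrupt_pending = False
            trace.append("interrupt: printing is done")
            if remaining > 0:
                trace.append("send block to printer")
                remaining -= 1
                busy = print_time
    return trace


trace = run(user_instructions=5, print_time=2, blocks=2)
for line in trace:
    print(line)
```

The processor never waits for the printer: user instructions continue between "send block to printer" and the matching interrupt.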
Computer Function - Interrupts
When an interrupt occurs between instructions i and i+1 of the user program, the processor's current activity is saved and control passes to the interrupt handler.
Figure 3.3 Basic Instruction Cycle (START → fetch next instruction → execute instruction → HALT)
Figure 3.10 Program Timing: Short I/O Wait
Figure 3.11 Program Timing: Long I/O Wait
Without interrupts, the processor waits during each I/O operation; with interrupts, the I/O operation is concurrent with the processor executing.
Figure 3.6 Instruction Cycle State Diagram
Computer Function – Interrupts – Multiple interrupts
Disabled interrupts → A disabled interrupt simply means that the processor can and will ignore that interrupt request signal.
(b) Nested interrupt processing → Interrupts are prioritized, so they are handled according to their relative priority. This method is efficient.
Figure 3.13 Transfer of Control with Multiple Interrupts (user program, interrupt handler X, interrupt handler Y)
Computer Function – Interrupts – Multiple interrupts
An interrupt service routine (ISR) is a software routine that the hardware invokes in response to an interrupt.
Priority: Communication > Disk > Printer
1. User program starts (t = 0)
2. User program is interrupted by the printer (t = 10)
3. Printer ISR starts
4. Printer ISR is interrupted by communication (t = 15)
5. Communication ISR completed (t = 25)
6. Disk ISR completed (t = 35)
7. Printer ISR completed (t = 40)
8. User program completed
Figure 3.14 Example Time Sequence of Multiple Interrupts
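The time sequence of Figure 3.14 can be approximated with a small priority-based simulation. This is a sketch under stated assumptions: the printer and communication interrupts arrive at t = 10 and t = 15, the disk interrupt is assumed to arrive at t = 20, each ISR is assumed to need 10 time units, and the numeric priority values are illustrative (only their order, Communication > Disk > Printer, comes from the slide).

```python
def simulate(arrivals, durations, priority):
    """Nested interrupt processing in steps of one time unit.

    Returns a log of (time, event) tuples.
    """
    pending = []                 # requests waiting to be serviced
    stack = []                   # nested ISRs; the top one is running
    remaining = dict(durations)
    done = set()
    log = []
    t = 0
    while len(done) < len(arrivals):
        for name, arrival in arrivals.items():
            if arrival == t:
                pending.append(name)
        if pending:
            best = max(pending, key=priority.get)
            # Start only if nothing runs or the request outranks the running ISR.
            if not stack or priority[best] > priority[stack[-1]]:
                pending.remove(best)
                stack.append(best)
                log.append((t, f"{best} ISR starts"))
        if stack:
            remaining[stack[-1]] -= 1
            if remaining[stack[-1]] == 0:
                finished = stack.pop()
                done.add(finished)
                log.append((t + 1, f"{finished} ISR completes"))
        t += 1
    return log


# Assumed arrival times, durations, and priority values (see lead-in).
arrivals = {"printer": 10, "communication": 15, "disk": 20}
durations = {"printer": 10, "communication": 10, "disk": 10}
priority = {"printer": 2, "disk": 4, "communication": 5}

log = simulate(arrivals, durations, priority)
for time, event in log:
    print(f"t = {time:2d}: {event}")
```

The disk request has to wait until the communication ISR completes, and the preempted printer ISR resumes only after the disk ISR completes.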
Interconnection Structures
The collection of paths connecting the various modules is called the interconnection structure.
The interconnection structure must support the following types of transfers:
➢ Memory to processor
➢ Processor to memory
➢ I/O to processor
➢ Processor to I/O
➢ I/O to or from memory
Interconnection structures:
(1) Bus interconnection
(2) Point-to-point interconnection
Figure 3.15 Computer Modules (memory of N words, I/O module with M ports, and CPU, with their read/write, address, data, control, and interrupt signals)
Interconnection Structures – Bus interconnection
A bus is a communication pathway connecting two or more devices.
Typically, a bus consists of multiple communication pathways (lines).
Each line is capable of transmitting signals (binary digits) representing binary 1 and binary 0.
Illustration of a bus (of 16 lines) on a motherboard (Fig 14.2 & 14.3 in Comer book)
Interconnection Structures – Bus interconnection
A bus that connects major computer components (processor, memory, I/O) is called a system bus.
The most common computer interconnection structures are based on the use of one or more system buses.
The data lines provide a path for moving data among system modules.
The address lines are used to designate the source or destination of the data on the data bus.
The control lines are used to control the access to and the use of the data and address lines.
Data lines
The number of lines (the width of the data bus) determines how many bits can be transferred at a time.
Width is a key factor in the overall performance of the system.
Address lines
The width of the address bus determines the maximum possible addressable memory capacity of the system.
8-bit width → 2⁸ = 256 addresses (i.e., 256 memory cells)
16-bit width → 2¹⁶ = 65,536 addresses
The address bus is used not only for the memory module but also for I/O ports. The first bit defines where to go: memory or an I/O module.
Control lines
Control signals transmit both command and timing information among system modules.
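The relation between address bus width and addressable capacity, and the use of one address bit to select memory or I/O, can be sketched as follows. This is an illustration under assumptions: the selecting bit is taken to be the highest-order bit, with 0 meaning memory and 1 meaning an I/O module, and the function names are hypothetical.

```python
def addressable_locations(width_bits):
    """Number of distinct addresses on a bus with `width_bits` address lines."""
    return 2 ** width_bits


def decode(address, width_bits=8):
    """Split an address into (target module, offset within the module).

    Assumption: the highest-order bit selects the module
    (0 = memory, 1 = I/O); the remaining bits select the location or port.
    """
    if not 0 <= address < addressable_locations(width_bits):
        raise ValueError("address does not fit on the bus")
    select_bit = address >> (width_bits - 1)
    offset = address & ((1 << (width_bits - 1)) - 1)
    return ("I/O" if select_bit else "memory", offset)


print(addressable_locations(8))    # 256
print(addressable_locations(16))   # 65536
print(decode(0b01111111))          # ('memory', 127)
print(decode(0b10000000))          # ('I/O', 0)
```

With this split, an 8-bit address bus reaches 128 memory locations and 128 I/O ports.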
Interconnection Structures – Point-to-point interconnection
Problems with bus interconnection:
• Electrical constraints encountered with increasing the frequency of wide synchronous buses.
• Difficult to perform the synchronization and arbitration functions in a timely fashion at higher data rates.
• Difficulties of increasing bus data rate and reducing bus latency to keep up with the processors when using multicore chips.
The point-to-point interconnect has lower latency, higher data rate, and better scalability.
QPI links provide direct connections between pairs of core processors.
QPI is used to connect to an I/O module, called an I/O hub (IOH).
The link from the IOH to the I/O device controller uses an interconnect technology called PCI Express (PCIe).
Figure 3.17 Multicore Configuration Using QPI (cores A to D with DRAM, I/O hubs, and I/O devices, linked by QPI, PCI Express, and memory buses)
QPI layers:
■ Physical layer → Consists of the actual wires carrying the signals. The unit of transfer is the phit (physical unit), 20 bits each.
■ Link layer → Responsible for reliable transmission and flow control. The unit of transfer is the flit (flow control unit), 80 bits each.
■ Routing layer → Directs packets through the fabric.
■ Protocol layer → The high-level set of rules for exchanging packets of data between devices. A packet consists of 1 or more flits.
Intel QPI Layers (Fig. 3.18)
Each data path consists of a pair of wires that transmits data one bit at a time; the pair is referred to as a lane.
Intel QuickPath Interconnect port (Component A): transmission lanes with a forwarded clock (Fwd Clk) and reception lanes with a received clock (Rcv Clk).
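The sizes quoted for QPI (80-bit flits, 20-bit phits) imply that each flit crosses the link as four phits. The sketch below only illustrates that arithmetic; the order in which the phits are sent (least significant bits first) is an assumption, and the function names are hypothetical.

```python
FLIT_BITS = 80   # link layer unit
PHIT_BITS = 20   # physical layer unit
PHITS_PER_FLIT = FLIT_BITS // PHIT_BITS


def flit_to_phits(flit):
    """Split an 80-bit flit into 20-bit phits (assumed order: low bits first)."""
    if not 0 <= flit < (1 << FLIT_BITS):
        raise ValueError("a flit must fit in 80 bits")
    mask = (1 << PHIT_BITS) - 1
    return [(flit >> (PHIT_BITS * i)) & mask for i in range(PHITS_PER_FLIT)]


def phits_to_flit(phits):
    """Reassemble a flit from its phits, using the same assumed order."""
    flit = 0
    for i, phit in enumerate(phits):
        flit |= phit << (PHIT_BITS * i)
    return flit


example_flit = 0x0123456789ABCDEF0123
example_phits = flit_to_phits(example_flit)
print(PHITS_PER_FLIT)
print([f"{phit:05X}" for phit in example_phits])
```

Reassembling the phits with `phits_to_flit` gives back the original flit.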
With PCIe:
• Data flow follows a priority scheme: data with higher priority is processed first. This is important for many applications, such as real-time data.
• To determine the processing order, data is tagged with properties such as its priority, type, and sensitivity to delay.
For PCIe:
• A key requirement is high capacity to support the needs of higher data rate I/O devices, such as Gigabit Ethernet.
• Another requirement deals with the need to support time-dependent data streams.
PCI Express (PCIe)
A root complex device is also referred to as a chipset or a host bridge.
The root complex acts as a buffering device, to deal with differences in data rates between I/O controllers and memory and processor components.
Figure: PCIe configuration (cores, chipset, memory, Gigabit Ethernet, and a PCIe–PCI bridge, connected by PCIe links)
Data packets generated and consumed by the data link layer (DLL) are called Data Link Layer Packets (DLLPs).
Data packets generated and consumed by the transaction layer (TL) are called Transaction Layer Packets (TLPs).
NOTE:
Intel QPI is designed for high-speed, low-latency communication between processors.
PCIe, on the other hand, is developed for flexible, scalable, and universal communication with peripheral
(I/O) devices.
PCI Express (PCIe)
PCIe transactions are conveyed using transaction layer packets (TLPs).
TLPs define high-level operations such as:
• Memory read/write
• I/O read/write
• Message
• Configuration access
When a transfer to an I/O device is needed:
1. A TLP is generated by the transaction layer (TL).
2. The TLP is transferred to the data link layer (DL).
3. The DL adds a Link CRC (LCRC) and may later generate DLLPs (such as ACK/NACK) to ensure reliable delivery.
4. TLPs and DLLPs are sent independently over the Physical Layer.
5. The Physical Layer transmits them serially to the receiver.
A TLP consists of the following fields:
■■ Header: The header describes the type of packet and information needed by the receiver to process the packet, including any needed routing information.
■■ Data: A data field of up to 4096 bytes may be included in the TLP (i.e., the data itself for a write operation). Some TLPs do not contain a data field.
■■ ECRC: An optional end-to-end CRC field enables the destination TL to check for errors in the header and data portions of the TLP.
A TLP originates in the TL of the sending device and terminates at the TL of the receiving device.
A DLLP originates in the DL of the sending device and terminates at the DL of the receiving device.
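The layering of a TLP can be sketched in a few lines. This is an illustration only and not the PCIe wire format: the header bytes are placeholders, the 2-byte sequence number is an addition not stated on the slide, and `zlib.crc32` stands in for the ECRC and LCRC calculations. The function names `build_tlp`, `dl_wrap`, and `dl_check` are hypothetical.

```python
import struct
import zlib

MAX_DATA_BYTES = 4096  # largest data field allowed in a TLP


def build_tlp(header, data=b"", with_ecrc=True):
    """Transaction layer: header + optional data + optional ECRC."""
    if len(data) > MAX_DATA_BYTES:
        raise ValueError("TLP data field is limited to 4096 bytes")
    tlp = header + data
    if with_ecrc:
        # Stand-in CRC covering the header and data portions.
        tlp += struct.pack(">I", zlib.crc32(tlp))
    return tlp


def dl_wrap(tlp, sequence_number):
    """Data link layer: prepend a sequence number (assumed), append the LCRC."""
    framed = struct.pack(">H", sequence_number & 0x0FFF) + tlp
    return framed + struct.pack(">I", zlib.crc32(framed))


def dl_check(frame):
    """Receiver side: accept the frame (ACK) only if the LCRC matches."""
    body, lcrc = frame[:-4], frame[-4:]
    return struct.pack(">I", zlib.crc32(body)) == lcrc


header = bytes(12)                      # placeholder header bytes
tlp = build_tlp(header, b"write me")    # e.g. a memory write
frame = dl_wrap(tlp, sequence_number=1)
corrupted = frame[:5] + bytes([frame[5] ^ 0xFF]) + frame[6:]
print(dl_check(frame), dl_check(corrupted))
```

A frame that fails the check would be answered with a NACK DLLP so that the sender retransmits it.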
Solve the problem (Fig. 3.4, Fig. 3.5)