Computer Functional Units Explained
A computer is a fast electronic calculating machine that accepts digitized input information,
processes it according to a list of internally stored instructions, and produces the resulting output
information.
HARDWARE
A computer consists of five functionally independent main parts:
1. Input Unit
2. Memory Unit
3. Arithmetic and Logic Unit
4. Output Unit
5. Control Unit
The Input Unit accepts coded information from human operators using devices such as
keyboards or from other computers over digital communication lines.
The information received is stored in the computer’s memory, either for later use or to be
processed immediately by the Arithmetic and Logic Unit.
The processing steps are specified by a program that is also stored in the Memory.
The results are sent back to the outside world through the Output Unit.
All of these actions are coordinated by the Control Unit.
An Interconnection Network provides the means for the functional units to exchange information
and coordinate their actions.
Input and output equipment is referred to as the Input-Output (I/O) Unit.
1. INPUT UNIT
Computers accept coded information through input units. The most common input device is
the keyboard.
Many other kinds of input devices for human-computer interaction are available, including
the touchpad, mouse, joystick, and trackball. These are often used as graphic input devices
in conjunction with displays.
Microphones can be used to capture audio input which is then sampled and converted into
digital codes for storage and processing.
Cameras can be used to capture video input.
Digital communication facilities, such as the Internet, can also provide input to a computer
from other computers and database servers.
2. MEMORY UNIT
The function of the memory unit is to store programs and data.
There are three classes of storage, called Primary, Secondary and Cache.
Primary Memory
Primary memory, also called main memory, is a fast memory that operates at electronic
speeds. Programs must be stored in this memory while they are being executed.
The main memory consists of a large number of semiconductor storage cells, each capable
of storing one bit of information.
They are handled in groups of fixed size called words. The number of bits in each word is
referred to as the word length of the computer, typically 16, 32, or 64 bits.
Instructions and data can be written into or read from the memory under the control
of the processor.
A memory in which any location can be accessed in a short and fixed amount of time
after specifying its address is called a random-access memory (RAM).
The time required to access one word is called the memory access time.
This time is independent of the location of the word being accessed. It typically ranges
from a few nanoseconds (ns) to about 100 ns for current RAM units.
Secondary Memory
Secondary memory is used when large amounts of data and many programs have to be
stored, particularly for information that is accessed infrequently.
Access times for secondary storage are longer than for primary memory.
Example: Magnetic disks, Optical disks (DVD and CD), and Flash memory devices.
Cache Memory
Along with the main memory, a smaller, faster RAM unit, called a cache, is used to hold
sections of a program that are currently being executed, along with any associated data.
The cache is tightly coupled with the processor and is usually contained on the same
integrated-circuit chip.
The purpose of the cache is to facilitate high instruction execution rates.
Memory Hierarchy
5. CONTROL UNIT
The control unit is used to co-ordinate and control all the activities among the
functional units.
The control unit is used to send control and timing signals to other units.
Control circuits are responsible for generating the timing signals that govern the
transfers and determine when a given action is to take place.
Data transfers between the processor and the memory are also managed by the control
unit through timing signals.
SOFTWARE
Computer software is divided into two broad categories:
o System Software
o Application Software
INSTRUCTIONS
These instructions are referred to as machine instructions or computer
instructions.
The collection of different instructions is referred to as the instruction set of the CPU.
Each instruction must contain the information required by the CPU for
execution.
ELEMENTS OF AN INSTRUCTION
o Operation Code: Specifies the operation to be performed (e.g., add, move)
The operation is specified by a binary code known as the operation code or
opcode.
o Source / Destination operand reference: The operation may involve one or more
source operands but only one destination operand.
o Result operand reference: The operation may produce a result. The result is stored
in a destination operand.
o Next instruction reference: This tells the CPU where to fetch the next
instruction after the execution of this instruction is complete.
INSTRUCTION REPRESENTATION
Opcodes
Operands
o Operands are also represented symbolically.
o Example :
ADD R, Y
An instruction format defines the layout of the bits of an instruction, in terms of its
constituent parts.
An instruction format must include an opcode and, implicitly or explicitly, zero or more
operands.
The instruction formats are generally classified into:
Three-address instruction
Two-address instruction
One-address instruction
Zero-address instruction
1. THREE-ADDRESS INSTRUCTIONS
The three-address instruction contains the memory addresses of the three operands— A, B,
and C.
A general instruction of three-address type has the format:
Operation Destination, Source1, Source2
Example
The three-address instruction can be represented symbolically as
Add A, B, C
Operands B and C are called the source operands
A is called the destination operand
Add is the operation to be performed on the operands.
2. TWO-ADDRESS INSTRUCTIONS
A two-address instruction specifies only two operands. A
general instruction of two-address type has the format:
Operation Destination/Source1, Source2
Example
An Add instruction of this type is Add A, B, which performs the operation A ← A + B.
This means that operand A is both a source and a destination.
Example (one-address instruction)
Consider the instruction : Add A.
It adds the contents of memory location A to the contents of the accumulator register and
places the sum back into the accumulator.
4. ZERO-ADDRESS INSTRUCTIONS
It is also possible to use instructions in which the locations of all operands are defined
implicitly. Such instructions are found in machines that store operands in a structure called a
pushdown stack. These instructions are called zero-address instructions.
Example
Stack operation – PUSH A , POP B and PEEK
EXAMPLE
Write a program to evaluate the arithmetic statement Y = (A+B)*(C+D) using three-address,
two-address, one-address and zero-address instructions.
Solution:
Using Three-address instructions:
ADD R1, A, B
ADD R2, C, D
MUL Y, R1, R2
Using Two-address instructions:
MOV R1, A
ADD R1, B
MOV R2, C
ADD R2, D
MUL R1, R2
MOV Y, R1
Using One-address instructions:
LOAD A
ADD B
STORE T
LOAD C
ADD D
MUL T
STORE Y
Using Zero-address instructions:
PUSH A
PUSH B
ADD
PUSH C
PUSH D
ADD
MUL
POP Y
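The zero-address version of the solution can be checked with a small stack-machine simulation. This is only a sketch: the function name run_zero_address, the dictionary used as memory, and the sample values of A, B, C and D are assumptions made for illustration.

```python
def run_zero_address(program, memory):
    """Execute a zero-address (stack) program against a dict-based memory."""
    stack = []
    for instruction in program:
        parts = instruction.split()
        opcode = parts[0]
        if opcode == "PUSH":
            # Push the contents of the named memory location.
            stack.append(memory[parts[1]])
        elif opcode == "POP":
            # Pop the top of the stack into the named memory location.
            memory[parts[1]] = stack.pop()
        elif opcode == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return memory


program = ["PUSH A", "PUSH B", "ADD", "PUSH C", "PUSH D", "ADD", "MUL", "POP Y"]
memory = run_zero_address(program, {"A": 2, "B": 3, "C": 4, "D": 5})
print(memory["Y"])  # (2 + 3) * (4 + 5) = 45
```

ADD and MUL name no operands at all; both operands are implicitly the top two stack entries, which is what makes these zero-address instructions.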
Based on the operations performed by the computer, the instructions are classified into the
following types:
3. CONTROL INSTRUCTIONS
o Branch operations (Conditional) are used to branch to a different set of
instructions depending on the decision made.
o Jump operations (Unconditional) are used to branch to a different set of
instructions without any conditions.
OPERATIONS
Machine instructions operate on data. The most important general categories of data are :
Addresses
Numbers
Characters
Logical data
ADDRESSES
The addresses are in fact a form of data.
Some calculations must be performed on the operand reference in an instruction to
determine the physical address.
Addresses can be considered as unsigned integer operands.
UNIT II
Data representation (L2) - Signed number representation (L1), fixed and floating-point representation
(L1), and character Representation (L2) -Addition and subtraction of signed numbers (L1), design of
adders (L1)-Multiplication of Positive numbers -Booth’s Algorithm (L3) -Floating Point Arithmetic
(L2), Division (L2).
CHARACTERS
A common form of data is text or character strings.
A number of codes have been devised by which characters are represented by a
sequence of bits.
The earliest common example of this is the Morse code.
Today, the most commonly used character codes are
American Standard Code for Information Interchange (ASCII) - 7 bit code
Extended Binary Coded Decimal Interchange Code (EBCDIC) – 8 bit code
LOGICAL DATA
Each word or other addressable unit is treated as a single unit of data.
It is sometimes useful to consider an n-bit unit as consisting of n 1-bit items of data, each item having
the value 0 or 1.
Computer arithmetic is commonly performed on two very different types of numbers:
(1) Integer
(2) Floating Point
Sign-Magnitude Form
The left most bit is the sign bit.
0 means positive
1 means negative.
Example:
+18 = 00010010
-18 = 10010010
2’s Complement Representation For Negative Numbers
To represent a negative number using the “two’s complement” technique:
1. First decide how many bits are used for representation
2. Then write the modulus (absolute value) of the negative number (in pure binary)
3. Then, change each 0 to 1 and each 1 to 0 (Boolean Complement or “one’s
complement”)
4. Finally, add 1 (treating the result of Step 3 as a pure binary number)
Examples:
To Represent -3 with 4 bits:
Start from +3 = 0011
Boolean complement gives 1100
Adding 1 to the LSB gives -3 = 1101
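The sign-magnitude form and the two's complement steps can be reproduced with a short sketch. The function names sign_magnitude and twos_complement are chosen here for illustration, and the range checks are an added assumption about how out-of-range values should be treated.

```python
def sign_magnitude(value, bits):
    """Return the sign-magnitude bit string: one sign bit, then the magnitude."""
    magnitude = abs(value)
    if magnitude >= 1 << (bits - 1):
        raise ValueError("magnitude does not fit in the available bits")
    sign = "1" if value < 0 else "0"
    return sign + format(magnitude, f"0{bits - 1}b")


def twos_complement(value, bits):
    """Return the two's complement bit string of a negative integer."""
    if value >= 0:
        raise ValueError("this sketch only handles negative numbers")
    if -value > 1 << (bits - 1):
        raise ValueError("number does not fit in the available bits")
    magnitude = format(-value, f"0{bits}b")                 # Step 2: magnitude in pure binary
    inverted = "".join("1" if bit == "0" else "0" for bit in magnitude)  # Step 3: complement
    result = (int(inverted, 2) + 1) % (1 << bits)           # Step 4: add 1
    return format(result, f"0{bits}b")


print(sign_magnitude(18, 8))    # 00010010
print(sign_magnitude(-18, 8))   # 10010010
print(twos_complement(-3, 4))   # 1101
```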
Addition and subtraction are the two most commonly used arithmetic operations, as the other
two, namely multiplication and division, are respectively the processes of repeated addition and
repeated subtraction.
The basic building blocks that form the basis of all hardware used to perform the arithmetic
operations on binary numbers are Half adder, Full adder, Half subtractor, Full subtractor,
Binary Adder (Parallel Adder), Look Ahead Carry Adder, Binary Subtractor (Parallel
Subtractor), Parallel Adder/Subtractor.
CARRY LOOK-AHEAD ADDERS (OR) FAST ADDERS
In a parallel adder, all the bits of the augend and the addend are available for computation at the
same time. The carry output of each full-adder stage is connected to the carry input of the next
high-order stage. Since each bit of the sum output depends on the value of the input carry, a time
delay occurs in the addition process. This time delay is called the carry propagation delay.
For example, addition of two numbers (0011 + 0101) gives the result 1000. Addition of the
LSB position produces a carry into the second position. This carry, when added to the bits of the
second position, produces a carry into the third position. This carry, when added to the bits of the third
position, produces a carry into the last position. The sum bit generated in the last position (MSB) depends
on the carry that was generated by the addition in the previous position, i.e., the adder will not produce
the correct result until the LSB carry has propagated through the intermediate full-adders. This represents a time
delay that depends on the propagation delay produced in each full-adder.
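The carry chain in this example can be traced with a small simulation of full-adder stages connected carry-out to carry-in. This is a sketch of the parallel (ripple-carry) adder described above, not of the look-ahead circuit; the function names are assumptions.

```python
def full_adder(a, b, carry_in):
    """One full-adder stage: returns (sum bit, carry out)."""
    total = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return total, carry_out


def ripple_carry_add(augend, addend):
    """Add two equal-length bit strings; return (sum bits, carry out of each stage, LSB first)."""
    carry = 0
    sum_bits = []
    carries = []
    # Process the LSB stage first; each stage waits for the previous carry.
    for a_bit, b_bit in zip(reversed(augend), reversed(addend)):
        s, carry = full_adder(int(a_bit), int(b_bit), carry)
        sum_bits.append(str(s))
        carries.append(carry)
    return "".join(reversed(sum_bits)), carries


total, carries = ripple_carry_add("0011", "0101")
print(total)    # 1000
print(carries)  # [1, 1, 1, 0]
```

The three leading 1s in the carry list are the carries that ripple from the LSB stage through the second and third stages before the MSB sum bit is valid.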
3. MULTIPLICATION
2. The partial products are easily defined. When the multiplier bit is 0, the partial
product is 0. When the multiplier bit is 1, the partial product is the multiplicand.
3. The total product is produced by summing the partial products. For this operation,
each successive partial product is shifted one position to the left relative to the
preceding partial product.
o The multiplier and multiplicand are loaded into two registers (Q and M).
o A third register, the A register, is also needed and is initially set to 0.
o A 1-bit C register, initialized to 0, is also used; it holds a potential carry bit resulting from addition.
o Control logic reads the bits of the multiplier one at a time.
If Q0 is 1, then the multiplicand is added to the A register and the result
is stored in the A register, with the C bit used for overflow.
Then all of the bits of the C, A, and Q registers are shifted to the right
one bit, so that the C bit goes into An-1, A0 goes into Qn-1 and Q0 is lost.
If Q0 is 0, then no addition is performed, just the shift.
This process is repeated for each bit of the original multiplier.
The resulting 2n-bit product is contained in the A and Q registers
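The register-level procedure above can be sketched as follows. The registers C, A, Q and M are modelled as plain Python integers, and the function name shift_add_multiply and its parameter n (the register width) are assumptions made for this illustration.

```python
def shift_add_multiply(multiplicand, multiplier, n):
    """Multiply two unsigned n-bit numbers using the C, A, Q and M registers."""
    mask = (1 << n) - 1
    m = multiplicand & mask   # M register
    q = multiplier & mask     # Q register
    a = 0                     # A register, initially 0
    c = 0                     # 1-bit carry register, initially 0
    for _ in range(n):
        if q & 1:
            # Q0 is 1: add the multiplicand to A, keeping the carry in C.
            total = a + m
            c = total >> n
            a = total & mask
        # Shift C, A, Q right one bit: C -> A(n-1), A0 -> Q(n-1), Q0 is lost.
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (c << (n - 1))
        c = 0
    # The 2n-bit product is held in A (high half) and Q (low half).
    return (a << n) | q


print(shift_add_multiply(13, 11, 4))  # 143
```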
EXAMPLE
4. After recoding the multiplier, the multiplicand and the multiplier can be
multiplied to generate the 2n products.
RECODING OF MULTIPLIERS
TWO’S COMPLEMENT MULTIPLICATION
FLOWCHART - BOOTH’S ALGORITHM
HARDWARE IMPLEMENTATION
o It consists of an (n+1)-bit binary adder, shift, add and subtract control logic and
registers A, B (or M) and Q.
o The divisor is loaded into B (or M) and the dividend is loaded into Q.
o Register A is initially set to zero. The division operation is then carried out.
o After completion of division, the n-bit quotient is in register Q and the remainder
is in register A.
TYPES OF DIVISION
The division of unsigned binary numbers can be performed in two ways. They are:
1. Restoring Division
2. Non-restoring Division
Working Steps
4.2 - NON-RESTORING DIVISION ALGORITHM
o In the Non-Restoring Algorithm, the divisor is not added back after an unsuccessful
subtraction; instead, the sign bit of A decides whether the next step subtracts or adds the divisor.
o The steps involved in non-restoring division are:
a) If the sign bit of A is 0, shift left A and Q one binary position and subtract the divisor from A.
b) Otherwise, shift left A and Q one binary position and add the divisor to A.
c) If the sign bit of A is now 0, set Q0 = 1, otherwise set
Q0 = 0.
d) Repeat steps (a) to (c) n times.
e) Finally, if the sign of A is 1, add the divisor to A to obtain the correct remainder.
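A minimal sketch of non-restoring division in its standard form: in each of the n cycles the sign of A selects subtract or add after the left shift, Q0 is set from the new sign of A, and a single corrective addition is made at the end if A is negative. The function name is an assumption, and a signed Python integer stands in for the (n+1)-bit register A.

```python
def non_restoring_divide(dividend, divisor, n):
    """Divide two unsigned n-bit numbers; return (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    a = 0                      # register A (a signed int models its sign bit)
    q = dividend               # register Q holds the dividend, then the quotient
    q_mask = (1 << n) - 1
    for _ in range(n):
        sign_was_zero = a >= 0
        # Shift A and Q left one position; the MSB of Q moves into A.
        msb_of_q = (q >> (n - 1)) & 1
        a = 2 * a + msb_of_q
        q = (q << 1) & q_mask
        # Subtract if A was non-negative before the shift, otherwise add.
        if sign_was_zero:
            a -= divisor
        else:
            a += divisor
        # Set Q0 from the sign of the new A.
        if a >= 0:
            q |= 1
    # Final correction so that the remainder is non-negative.
    if a < 0:
        a += divisor
    return q, a


print(non_restoring_divide(8, 3, 4))  # (2, 2)
```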
Example:
o 3.14159265…
o 2.71828…
o 0.000000001 or 1.0 × 10^-9
o 3,155,760,000 or 3.15576 × 10^9
SCIENTIFIC NOTATION
Example:
Scientific Notation
A notation that renders numbers with a single digit to the left of the decimal point.
Normalized Notation
A number in floating-point notation that has no leading 0s.
Phase 4: Normalization.
The final phase normalizes the result.
Normalization consists of shifting significand digits left until the most
significant digit (bit, or 4 bits for base-16 exponent) is nonzero.
Each shift causes a decrement of the exponent and thus could cause an
exponent underflow.
Finally, the result must be rounded off and then reported
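The normalization phase can be sketched for a base-2 significand held in a fixed number of bits. The function name normalize, the width parameter and the min_exponent limit used to signal underflow are assumptions for illustration; rounding is left out.

```python
def normalize(significand, exponent, width, min_exponent=-126):
    """Shift a width-bit significand left until its most significant bit is 1."""
    if significand == 0:
        return 0, exponent          # zero cannot be normalized; returned unchanged
    top_bit = 1 << (width - 1)
    while not significand & top_bit:
        significand <<= 1           # shift the significand left one bit
        exponent -= 1               # each shift decrements the exponent
        if exponent < min_exponent:
            raise ArithmeticError("exponent underflow")
    return significand, exponent


value, exp = normalize(0b00010110, 5, 8)
print(format(value, "08b"), exp)    # 10110000 2
```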
UNIT - III
Building a Data path – Control Implementation Scheme – Pipelining – Pipelined Datapath and
Control – Handling Data Hazards & Control Hazards – Exceptions.
PIPELINING AND HAZARDS
Datapath design begins by examining the major components required to execute each class of
MIPS instructions.
The major components required to execute each class of MIPS instruction are called
datapath elements.
A datapath element is a unit used to operate on or hold data within a processor.
In the MIPS implementation, the datapath elements include
Instruction Memory
Data Memory
Register File
ALU
Adders
Building a MIPS datapath consists of
1. Datapath for fetching the instruction and incrementing the PC
2. Datapath for executing arithmetic and logic instructions
3. Datapath for executing a memory-reference instruction
4. Datapath for executing a branch instruction
Show how to build a datapath for the operational portion of the memory reference
and arithmetic-logical instructions that uses a single register file and a single ALU
to handle both types of instructions, adding any necessary multiplexors.
We can combine the datapath components needed for the individual instruction classes, into a single
datapath and add the control to complete the implementation.
This simplest datapath will execute all instructions in one clock cycle. To share a datapath element
between two different instruction classes, we may need to allow multiple
connections to the input of an element, using a multiplexor and control signal to select among the
multiple inputs.
Step 1
To create a datapath with only a single register file and a single ALU, we must have two different
sources for the second ALU input, as well as two different sources for the data stored into the
register file. Thus, one multiplexor is placed at the ALU input and another at the data input to the
register file.
Step 2
Combine all the pieces to make a simple datapath for the MIPS architecture by adding the
Datapath for Instruction fetch
Datapath for Arithmetic-Logical instructions
Datapath for Memory instructions
Datapath for Branch instruction
(NOTE : - Write the basic concepts of each datapath here)
Step 3
In the datapath obtained by composing separate pieces,
The branch instruction uses the main ALU for comparison of the register operands, so
we must keep the adder for computing the branch target address.
An additional multiplexor is required to select either the sequentially following
instruction address (PC + 4) or the branch target address to be written into the PC.
Step 4
The control unit must be able to take inputs and generate a write signal for each state element, the
selector control for each multiplexor, and the ALU control.
3. OPERATIONS OF A DATAPATH
Depending on the instruction class, the ALU will need to perform one of five
functions.
1. For load word and store word instructions - the ALU needs to compute the
memory address by an add operation.
2. For the R-type instructions - the ALU needs to perform one of the five actions
- AND , OR, subtract, add, (or) set on less than.
3. For branch equal - the ALU must perform a subtraction.
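The choice of ALU function per instruction class can be sketched as a small lookup. The funct-field values in the table are the standard MIPS encodings for add, sub, and, or and slt, which are not given in these notes; the function names and the unbounded Python integers (no 32-bit wrap-around) are simplifying assumptions.

```python
# Standard MIPS funct-field values for the R-type instructions considered here.
FUNCT_TO_OPERATION = {
    0x20: "add",
    0x22: "subtract",
    0x24: "AND",
    0x25: "OR",
    0x2A: "set on less than",
}


def select_alu_operation(instruction_class, funct=None):
    """Pick the ALU function from the instruction class (and funct for R-type)."""
    if instruction_class in ("lw", "sw"):
        return "add"                      # memory address = base + offset
    if instruction_class == "beq":
        return "subtract"                 # equal operands give a zero result
    if instruction_class == "R-type":
        return FUNCT_TO_OPERATION[funct]
    raise ValueError(f"unsupported instruction class: {instruction_class}")


def alu(operation, a, b):
    """Perform one of the five ALU functions on two integer operands."""
    if operation == "AND":
        return a & b
    if operation == "OR":
        return a | b
    if operation == "add":
        return a + b
    if operation == "subtract":
        return a - b
    if operation == "set on less than":
        return 1 if a < b else 0
    raise ValueError(f"unknown ALU operation: {operation}")


print(hex(alu(select_alu_operation("lw"), 0x1000, 8)))   # 0x1008
print(alu(select_alu_operation("beq"), 7, 7) == 0)       # True
```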
5. PIPELINING
• Instruction Fetch - The CPU reads instructions from the address in the memory
whose value is present in the program counter.
• Instruction Decode - Instruction is decoded and the register file is accessed to get the
values from the registers used in the instruction.
• Execute - ALU operations are performed.
• Memory Access - Memory operands are read from or written to the memory location
whose address is specified by the instruction.
• Write Back – Computed value is written back to the register
6. FIVE-STAGE PIPELINED EXECUTION
In a five-stage pipeline, up to five instructions will be in execution during any clock cycle.
The stages are:
1. IF : Instruction Fetch
2. ID : Instruction Decode and Register file read
3. EX : Execute (or) Address Calculation of Operands
4. MEM : Data Memory Access
5. WB : Write Back
The pipeline registers are named for the two stages that they separate.
The pipeline register between the IF and ID stages is called IF/ID.
The pipeline register between the ID and EX stages is called ID/EX.
The pipeline register between the EX and MEM stages is called EX/MEM.
The pipeline register between the MEM and WB stages is called MEM/WB.
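The overlap of the five stages and the naming of the pipeline registers can be sketched as follows. The sketch assumes an ideal pipeline with no stalls, and the function names are chosen for illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_register_names():
    """Name each pipeline register after the two stages it separates."""
    return [f"{left}/{right}" for left, right in zip(STAGES, STAGES[1:])]


def pipeline_schedule(num_instructions):
    """Return one row per instruction giving the stage occupied in each clock cycle."""
    total_cycles = num_instructions + len(STAGES) - 1
    schedule = []
    for index in range(num_instructions):
        row = [""] * total_cycles
        # Instruction i enters IF in cycle i and advances one stage per cycle.
        for offset, stage in enumerate(STAGES):
            row[index + offset] = stage
        schedule.append(row)
    return schedule


print(pipeline_register_names())
for row in pipeline_schedule(5):
    print(" ".join(f"{cell:>3}" for cell in row))
```

With five instructions, the fifth clock cycle is the first in which all five stages are busy at once.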
1. Instruction Fetch
The instruction is being read from memory using the address in the PC and then placed in the
IF/ID pipeline register.
The IF/ID pipeline register is similar to the Instruction register.
The PC address is incremented by 4 and then written back into the PC to be ready for the next
clock cycle.
This incremented address is also saved in the IF/ID pipeline register in case it is needed later for
an instruction, such as beq.
4. Memory Access
The top portion of Figure shows the load instruction reading the data memory using the address
from the EX/MEM pipeline register and loading the data into the MEM/WB pipeline register.
5. Write Back
The bottom portion of Figure shows the final step: reading the data from the MEM/WB pipeline
register and writing it into the register file in the middle of the figure.
The first step is to label the control lines on the existing datapath.
To specify control for the pipeline, we need only set the control values during each pipeline
stage.
The control lines are divided into five groups according to the pipeline stage.
1. Instruction fetch - No Controls.
2. Instruction decode/register file read - No Controls.
3. Execution/address calculation – RegDst, ALUOp & ALUSrc.
4. Memory access – PCSrc (Branch), MemRead & MemWrite.
5. Write-back – MemtoReg & RegWrite
The control signals are then used in the appropriate pipeline stage as the instruction moves
down the pipeline.
In the control lines for the stages,
o Four of the nine control lines are used in the EX stage.
o The remaining five control lines are passed on to the EX/MEM pipeline register,
which is extended to hold the control lines.
o Three are used during the MEM stage.
o The last two are passed to MEM/WB for use in the WB stage.
7. PIPELINE HAZARDS
INTRODUCTION
1. Structural Hazard – The situation when two instructions require the use of a
given hardware resource at the same time.
2. Data Hazard – Any condition in which either the source or the destination
operands of an instruction are not available at the time expected in the pipeline.
So some operation has to be delayed, and the pipeline stalls.
• A structural hazard occurs when two or more instructions that are already in pipeline
need the same resource.
• These hazards are because of conflicts due to insufficient resources.
• The result is that the instructions must be executed in series rather than parallel for a
portion of pipeline.
• Structural hazards are sometimes referred to as resource hazards.
• Example:
A situation in which multiple instructions are ready to enter the execute instruction
phase and there is a single ALU (Arithmetic Logic Unit).
One solution to such a resource hazard is to increase available resources, such as having
multiple ALUs.
DATA HAZARD
A data hazard occurs when there is a conflict in the access of an operand location. There are
three types of data hazards. They are
Read After Write (RAW) or True Dependency:
• An instruction modifies a register or memory location and a succeeding instruction
reads the data in that memory or register location.
• A RAW hazard occurs if the read takes place before the write operation is complete.
• Example
I1: R2←R5 + R3
I2: R4←R2 + R3
Write After Read (WAR) or Anti Dependency:
• An instruction reads a register or memory location and a succeeding instruction writes
to the location.
• A WAR hazard occurs if the write operation completes before the read operation takes
place.
• Example
I1: R4←R1 + R5
I2: R5←R1 + R2
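The register dependencies in these examples can be found mechanically by comparing the destination and source registers of two successive instructions. This is only a sketch: the tuple representation and the function name are assumptions, and the WAW (write after write) case is included as the usual third type even though it is not described above. Whether a dependency actually becomes a hazard depends on the timing of the pipeline.

```python
def classify_dependencies(first, second):
    """Each instruction is (destination, sources); second follows first in program order."""
    dest1, sources1 = first
    dest2, sources2 = second
    found = []
    if dest1 in sources2:
        found.append("RAW")   # second reads what first writes
    if dest2 in sources1:
        found.append("WAR")   # second writes what first reads
    if dest1 == dest2:
        found.append("WAW")   # both write the same location
    return found


# I1: R2 <- R5 + R3 ; I2: R4 <- R2 + R3
print(classify_dependencies(("R2", ("R5", "R3")), ("R4", ("R2", "R3"))))  # ['RAW']
# I1: R4 <- R1 + R5 ; I2: R5 <- R1 + R2
print(classify_dependencies(("R4", ("R1", "R5")), ("R5", ("R1", "R2"))))  # ['WAR']
```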
NOTE : - Write about Data Hazards from Page No: 20 and then continue this.
The destination register R2 for the Add instruction is a source register for the
Subtract instruction.
There is a data dependency between these two instructions, because register R2 carries
data from the first instruction to the second.
There are two techniques for handling data hazards.
They are
(1) Using Operand Forwarding (2) Using Software
When the compiler identifies a data dependency between two successive instructions Ij and
Ij+1, it can insert three explicit NOP (No-operation) instructions between them.
The NOP’s introduce the necessary delay to enable instruction Ij+1 to read the new value
from the register file after it is written.
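The software technique can be sketched as a pass over the instruction list that inserts NOPs wherever an instruction reads the register written by the instruction immediately before it. The tuple format, the function name insert_nops, and the operands other than R2 in the sample program are assumptions for illustration; only successive instructions Ij and Ij+1 are checked, as described above.

```python
def insert_nops(program, nops_per_dependency=3):
    """program is a list of (text, destination, sources); returns instruction text with NOPs."""
    result = []
    previous_destination = None
    for text, destination, sources in program:
        # A dependency exists if this instruction reads the previous destination.
        if previous_destination is not None and previous_destination in sources:
            result.extend(["NOP"] * nops_per_dependency)
        result.append(text)
        previous_destination = destination
    return result


program = [
    ("Add R2, R3, #100", "R2", ("R3",)),      # writes R2
    ("Subtract R9, R2, #30", "R9", ("R2",)),  # reads R2
]
for line in insert_nops(program):
    print(line)
```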
NOTE : - Write about Instruction Hazards from Page No :21 and then continue
this
A variety of approaches have been taken for dealing with Instruction/Control/Branch Hazards
(conditional branches):
1) Multiple Streams
2) Prefetch Branch Target
3) Loop Buffer
4) Branch Prediction
5) Delayed Branch
1) MULTIPLE STREAMS
o The approach is to replicate the initial portions of the pipeline and allow
the pipeline to fetch both instructions, making use of multiple streams.
o There are two problems with this approach:
1. Contention delays for access to the registers and to memory.
2. Additional branch instructions may enter the pipeline before the
original branch decision is resolved.
2) PREFETCH BRANCH TARGET
o When a conditional branch is recognized, the target of the branch is prefetched,
in addition to the instruction following the branch.
o This target is then saved until the branch instruction is executed.
o If the branch is taken, the target has already been prefetched.
3) LOOP BUFFER
o A loop buffer is a small, very-high-speed memory maintained by the instruction
fetch stage of the pipeline and containing the ‘n’ most recently fetched
instructions, in sequence.
o If a branch is to be taken, the hardware first checks whether the branch target is within
the buffer. If so, the next instruction is fetched from the buffer.
4) BRANCH PREDICTION
o To reduce the branch penalty, the processor needs to anticipate that an instruction
being fetched is a branch instruction and predict its outcome to determine which
instruction should be fetched.
o It is generally of two types:
Static Branch Prediction
Dynamic Branch Prediction
o Static Branch Prediction - Assumes that the branch will not be taken and fetches the
next instruction in sequential address order.
o Dynamic Branch Prediction - Uses the recent branch history, to see if a branch was
taken the last time this instruction was executed.
One implementation of that approach is a branch prediction buffer or branch history
table.
A branch prediction buffer is a small memory indexed by the lower portion of the
address of the branch instruction. The memory contains a bit that says whether the branch
was recently taken or not.
A branch predictor tells us whether or not a branch is taken, but the branch target address
must still be calculated.
A cache, called the branch target buffer, can be used to hold the branch target address.
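A one-bit branch prediction buffer of the kind described above can be sketched as a small table indexed by the low-order bits of the branch address. The class name, the table size, the initial "not taken" state, and the dropping of the two lowest address bits (assuming 4-byte instructions) are all assumptions for illustration.

```python
class BranchPredictionBuffer:
    """One prediction bit per entry, indexed by the lower portion of the branch address."""

    def __init__(self, index_bits=4):
        self.index_mask = (1 << index_bits) - 1
        self.taken_bits = [False] * (1 << index_bits)   # start as "not taken"

    def _index(self, branch_address):
        # Assumes 4-byte instructions, so the two lowest address bits are dropped.
        return (branch_address >> 2) & self.index_mask

    def predict(self, branch_address):
        return self.taken_bits[self._index(branch_address)]

    def update(self, branch_address, taken):
        self.taken_bits[self._index(branch_address)] = taken


# A loop branch that is taken nine times and then falls through.
buffer = BranchPredictionBuffer()
outcomes = [True] * 9 + [False]
mispredictions = 0
for taken in outcomes:
    if buffer.predict(0x400100) != taken:
        mispredictions += 1
    buffer.update(0x400100, taken)
print(mispredictions)  # 2
```

The two mispredictions are the first iteration (the buffer starts as "not taken") and the loop exit, which is the typical behaviour of a single history bit.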
Peripheral Devices
A computer peripheral, technically speaking, is any device that connects to the computing unit
but is not part of the core architecture of the computing unit. The core computing unit consists of the central processing
unit (CPU), motherboard, and power supply. The case that surrounds these elements is also considered part of the core
computing unit. So anything that is connected to these elements is considered a peripheral.
o The input-output subsystem of a computer, referred to as I/O, provides an efficient
mode of communication between the central system and the outside environment.
o Programs and data must be entered into computer memory for processing and results
obtained from computations must be recorded or displayed for the user.
I/O INTERFACES
Input-Output interface provides a method for transferring information between
internal storage and external I/O devices.
The I/O bus from the processor is attached to all peripheral interfaces.
To communicate with a particular device, the processor places a device address on the
address lines.
The I/O bus consists of data lines, address lines, and control lines.
The I/O Interface consists of an address decoder, control circuits, a data register and
a status register to coordinate the I/O transfers.
The address decoder enables the device to recognize its address when this address
appears on the address lines.
The data register holds the data being transferred. A data command causes the interface to respond by
transferring data from the bus into one of its registers.
The status register contains status information. A status command is used to test various
status conditions in the interface and the peripheral.
A control command is issued to activate the peripheral and to inform it what to do.
I/O INTERFACING TECHNIQUES
o I/O devices can be interfaced to a computer system in two ways, which are called
interfacing techniques.
o They are
Memory mapped I/O
I/O mapped I/O (Isolated I/O)
o It involves a sequence of events to execute I/O operations and then store the results into
the memory.
CPU - IOP COMMUNICATION
MODES OF I/O DATA TRANSFER
Data transfer to and from I/O devices may be handled in one of three possible modes:
1. Programmed I/O
2. Interrupt-initiated I/O
3. Direct memory access (DMA)
Programmed I/O :
o When the processor is executing a program and encounters an
instruction relating to I/O, it executes that instruction by
issuing a command to the appropriate I/O module.
o The I/O module performs the requested action and takes no
action to alert the processor and it does not interrupt the
processor.
o The processor periodically checks the status of the I/O module
until it finds that the operation is complete.
o The processor is responsible for extracting data from main
memory for output and storing data in main memory for input.
o Thus, the instruction set includes I/O instructions in the
following categories:
Control : Used to activate an external device and tell it
what to do.
Status : Used to test various status conditions associated
with an I/O module and its peripherals.
Transfer : Used to read and/or write data between
processor registers and external devices.
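The control, status and transfer steps of programmed I/O can be sketched with a simulated I/O module that the processor polls. The class SimulatedIOModule, its method names and the fixed number of status checks before the device becomes ready are hypothetical, chosen only to show the busy-wait loop.

```python
class SimulatedIOModule:
    """Hypothetical device that becomes ready a fixed number of status checks after a command."""

    def __init__(self, data, checks_until_ready=3):
        self._data = data
        self._delay = checks_until_ready
        self._remaining = None          # None means no operation in progress

    def control(self, command):
        # Control: activate the device and tell it what to do.
        self._remaining = self._delay

    def status(self):
        # Status: report the current condition; the module never interrupts.
        if self._remaining is None:
            return "idle"
        if self._remaining > 0:
            self._remaining -= 1
            return "busy"
        return "ready"

    def transfer(self):
        # Transfer: move the data word to the processor.
        self._remaining = None
        return self._data


def programmed_read(module):
    """Issue a READ, then poll the status until the operation is complete."""
    module.control("READ")
    polls = 0
    while module.status() != "ready":
        polls += 1                      # the processor does no useful work here
    return module.transfer(), polls


value, polls = programmed_read(SimulatedIOModule(0x41))
print(hex(value), polls)  # 0x41 3
```

The polls counter makes the cost of this mode visible: every pass through the loop is processor time spent waiting.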
Interrupt-Driven I/O :
o An alternative to Programmed I/O is for the processor to issue an
I/O command to a module and then go on to do some other useful
work.
o The I/O module will then interrupt the processor to request service
when it is ready to exchange data with the processor.
o The processor then executes the data transfer and then resumes its
former processing.
o The processor issues a READ command. The I/O module receives a
READ command from the processor and then proceeds to read data
in from the device.
o Once the data are in the I/O module’s data register the module
signals an interrupt to the processor over a control line.
o When the interrupt from the I/O module occurs, the processor saves
the context of the program it is currently executing and begins to
execute an interrupt-handling program that processes the interrupt.
o Interrupt-driven I/O is more efficient than programmed I/O because
it eliminates needless waiting.
Direct Memory Access :
o When large volumes of data are to be moved, a more efficient technique is required:
direct memory access (DMA).
o The DMA function can be performed by a separate
module on the system bus or it can be incorporated into an
I/O module.
o When the processor wishes to read or write a block of
data, it issues a command to the DMA module, by sending
to the DMA module the following information:
• Whether a read or write is requested
• The address of the I/O device involved
• The starting location in memory to read data from
or write data to
• The number of words to be read or written
o The processor then continues with other work. It has delegated this I/O operation to
the DMA module, and that module will take care of it.
o The DMA module transfers the entire block of data, one word at a time, directly to or
from memory without going through the processor. When the transfer is complete, the
DMA module sends an interrupt signal to the processor.
o Thus the processor is involved only at the beginning and end of the transfer.
COMPARISON BETWEEN PROGRAMMED I/O AND INTERRUPT DRIVEN I/O
7. INTERRUPTS
CLASSES OF INTERRUPTS
TYPES OF INTERRUPTS
There are two types of interrupts:
1. Hardware interrupts
2. Software interrupts
Hardware Interrupts :
o Used by devices to communicate that they require attention from the operating
system.
o For example, pressing a key on the keyboard (or) moving the mouse triggers
hardware interrupts that cause the processor to read the keystroke or mouse
position.
Software Interrupts :
o Caused either by an exceptional condition in the processor itself, or a
special instruction in the instruction set which causes an interrupt when it is
executed.
o Example : Divide-by-zero exception
STEPS IN INTERRUPT PROCESSING
1. The device issues an interrupt signal to the processor.
2. The processor finishes execution of the current instruction
before responding to the interrupt.
3. The processor tests for an interrupt, determines
that there is one, and sends an
acknowledgment signal to the device that
issued the interrupt. The acknowledgment
allows the device to remove its interrupt signal.
4. The processor needs to prepare to transfer
control to the interrupt routine. To do so, it
saves the PSW and program counter of the
current program on the stack.
5. The processor now loads the program counter
with the entry location of the interrupt-
handling program that will respond to this
interrupt.
6. Once the program counter has been loaded, the
processor proceeds to the next instruction
cycle, which begins with an instruction fetch.
The contents of the processor registers need to
be saved, because these registers may be used
by the interrupt handler. So all of these values,
plus any other state information, need to be saved.
7. The interrupt handler next processes the interrupt.
8. When interrupt processing is complete, the saved
register values are retrieved from the stack and restored to the registers.
9. The final act is to restore the PSW and program counter values from the stack.
10. As a result, the next instruction to be executed will be from the previously interrupted
program.
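The save-and-restore order in the steps above can be sketched with a toy model. This is only an illustration: the class name ToyCPU, the register names, and the addresses are hypothetical, and saving the PSW and program counter before the jump is assumed here because step 9 restores them from the stack.

```python
class ToyCPU:
    """Toy processor model; all names and values here are illustrative."""

    def __init__(self):
        self.pc = 100                    # program counter of the running program
        self.psw = 0b0001                # program status word (arbitrary flags)
        self.regs = {"r0": 7, "r1": 9}   # general-purpose registers
        self.stack = []                  # system control stack
        self.log = []

    def handle_interrupt(self, handler_entry):
        # Steps 1-3: the current instruction has finished and the
        # interrupt has been acknowledged.
        self.log.append("ack")
        # Step 4: save the PSW and program counter on the stack.
        self.stack.append(self.psw)
        self.stack.append(self.pc)
        # Step 5: load the program counter with the handler entry location.
        self.pc = handler_entry
        # Step 6: the handler saves the registers it may overwrite.
        self.stack.append(dict(self.regs))
        # Step 7: the handler does its work (here it just clobbers registers).
        self.regs = {"r0": 0, "r1": 0}
        self.log.append("handled")
        # Step 8: restore the saved register values from the stack.
        self.regs = self.stack.pop()
        # Step 9: restore the program counter and PSW from the stack.
        self.pc = self.stack.pop()
        self.psw = self.stack.pop()
        # Step 10: execution resumes in the interrupted program.
        self.log.append("resumed")


cpu = ToyCPU()
cpu.handle_interrupt(handler_entry=4000)
print(cpu.pc, cpu.psw, cpu.regs, cpu.log)
```

Because values are popped in the reverse order of the pushes, the interrupted program sees exactly the state it had before the interrupt.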
Bus Request :
o It is used by the DMA controller to request that the CPU relinquish (release) control
of the buses.
Bus Grant :
o It is activated by the CPU to inform the external DMA controller that the buses are
in the high-impedance state and that the requesting DMA controller can take control of them.
o Once the DMA controller has taken control of the buses, it transfers the data.
STEPS IN DMA TRANSFER
o DMA transfer is controlled by the DMA controller.
o The DMA Controller requests the control of the buses from the CPU.
o After gaining control, the DMA controller performs read and write operations directly
between devices and memory.
o The DMA controller requires the CPU to provide two additional bus signals:
The Hold (HLD) signal is an input to the CPU through which the DMA
controller asks for ownership of the bus.
The Hold Acknowledge (HLDA) signal indicates that the bus has been granted.
o The CPU will finish all pending bus operations before granting control of the bus to
the DMA controller.
o Once the DMA controller gets the control of the buses, it can perform any transaction
(reads and writes) using the same bus.
o After the transaction is finished, the DMA controller returns the bus to the CPU.
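The hold / hold-acknowledge handshake described above can be modeled in a few lines. This is a rough sketch, not a description of any real controller: the classes Bus, CPU and DMAController, the method hold_request, and the count of pending bus operations are all assumptions made for illustration.

```python
class Bus:
    """Shared bus with a single owner at a time (illustrative model)."""

    def __init__(self):
        self.owner = "CPU"


class CPU:
    def __init__(self):
        self.pending_bus_ops = 2   # hypothetical operations still in flight

    def hold_request(self, bus):
        # The CPU finishes all pending bus operations before granting the bus.
        while self.pending_bus_ops > 0:
            self.pending_bus_ops -= 1
        bus.owner = "DMA"          # HLDA: the bus has been granted
        return True


class DMAController:
    def __init__(self, bus, memory):
        self.bus = bus
        self.memory = memory

    def transfer(self, cpu, device_data, start_addr):
        # Assert HLD and wait for the CPU to answer with HLDA.
        if not cpu.hold_request(self.bus):
            return False
        # The controller now owns the bus and writes directly to memory.
        for offset, word in enumerate(device_data):
            self.memory[start_addr + offset] = word
        # Return the bus to the CPU once the transaction is finished.
        self.bus.owner = "CPU"
        return True


memory = [0] * 8
bus = Bus()
cpu = CPU()
dma = DMAController(bus, memory)
ok = dma.transfer(cpu, [11, 22, 33], start_addr=2)
print(ok, memory, bus.owner)
```

The processor takes no part in the word-by-word copy; it only grants the bus at the start and receives it back at the end.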
Burst DMA Transfer : In this mode, the DMA controller hands the buses back to the CPU only
after the whole data transfer is complete.
Transparent (Hidden) DMA Transfer : Here, the DMA controller transfers data only when the
CPU is executing an instruction that does not require the use of the buses.
(a) Byte (or) Cycle stealing DMA transfer Mode
9. BUS STRUCTURE
o There are many ways to connect the parts of a computer, such as the processor,
memory and I/O devices.
o The simplest and most common way of interconnecting various parts of the computer
is a bus.
o A group of lines that serves as a connecting path for several devices is called a bus.
o A bus must have separate lines for Address, Data and Control.
o A bus that connects major computer components/modules (CPU, memory, I/O) is
called a System Bus.
SYSTEM BUS
The system bus is a set of conductors that connects the CPU, memory and I/O modules.
The system bus is separated into three functional groups:
Data Bus
Address Bus
Control Bus
Data Bus
The data bus consists of 8, 16, 32 or more parallel signal lines.
The data bus lines are bidirectional.
This means that the CPU can read data on these lines from memory or from a port, as well
as send data out on these lines to a memory location or a port.
The data bus is connected in parallel to all peripherals.
The communication between peripherals and CPU is activated by giving output enable
pulse to the peripheral.
Address Bus
It is a unidirectional bus.
The address bus consists of 16, 20, 24 or more parallel signal lines.
On these lines the CPU sends out the address of the memory location or I/O port that
is to be written to or read from.
Control Bus
The control lines regulate the activity on the bus.
The CPU sends signals on the control bus to enable the outputs of addressed memory
devices or port devices.
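As a small worked example of what the address bus widths listed above imply: an address bus with n lines can select 2^n distinct locations. The function name below is made up for illustration, and the size in K simply divides the count by 1024.

```python
def addressable_locations(address_lines):
    """Number of distinct addresses an n-line address bus can select: 2**n."""
    return 2 ** address_lines


for lines in (16, 20, 24, 32):
    count = addressable_locations(lines)
    print(f"{lines} address lines -> {count} locations ({count // 1024} K)")
```

If each location holds one byte, 16 lines reach 64 KB, 20 lines reach 1 MB and 24 lines reach 16 MB.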
o The single bus can be used for only one transfer at a time; only two units can actively
use the bus at any given time.
o Bus control lines are used to arbitrate multiple requests for use of one bus.
o The devices connected to a bus vary widely in their speed of operation.
o Some devices are relatively slow, such as printers and keyboards.
o Some devices are considerably faster, such as optical disks.
o Memory and processor units are the fastest parts of a computer.
Advantages:
Low cost.
It is very flexible for attaching many peripherals.
o The Universal Serial Bus (USB) is the most widely used interconnection standard.
o USB gives fast and flexible interface for connecting all kinds of peripherals.
o USB was released in 1996 and is currently maintained by the USB Implementers
Forum (USB-IF).
o A Universal Serial Bus (USB) is a common interface that enables communication
between devices and a host controller such as a personal computer (PC).
o A large variety of devices are available with a USB connector, including mice,
memory keys, disk drives, printers, cameras, and many more.
o Because of its wide variety of uses, the USB has replaced a wide range of interfaces
like the parallel and serial port.
o USB is intended to enhance plug-and-play and allow hot swapping.
Plug-and-Play enables the operating system (OS) to automatically discover
and configure a new peripheral device without having to restart the computer.
Hot Swapping allows removal and replacement of a peripheral without
having to reboot.
USB FEATURES
1. Simple Connectivity
2. Simple Cables
3. One interface for many devices
4. Automatic Configuration
5. No user setting
6. Frees hardware resources for other devices
7. Hot pluggable (Plug-and-Play)
8. Data transfer rates
9. Co-existence with IEEE standard
10. Reliability
11. Low cost
12. Low power consumption
13. Flexibility
14. Operating system support
USB VERSIONS
There have been several major USB standards, USB4 being the newest.
Most USB devices and cables today adhere to USB 2.0, and a growing number to USB 3.0.
Version    Also called        Transmission rate
USB 4.0    --                 40 Gbps
USB 3.2    SuperSpeed+ USB    20 Gbps
USB 3.1    SuperSpeed+ USB    10 Gbps
USB 3.0    SuperSpeed USB     5 Gbps
USB 2.0    High-Speed USB     480 Mbps
USB 1.1    Full Speed USB     12 Mbps
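To get a feel for these rates, the following sketch computes the ideal time to move a file at each nominal rate from the table. It is a lower bound under stated assumptions: protocol overhead, encoding and device speed are ignored, so real transfers take longer. The dictionary and function names are illustrative only.

```python
# Nominal signalling rates from the table, in bits per second.
USB_RATES_BPS = {
    "USB 1.1": 12e6,
    "USB 2.0": 480e6,
    "USB 3.0": 5e9,
    "USB 3.1": 10e9,
    "USB 3.2": 20e9,
    "USB 4.0": 40e9,
}


def ideal_transfer_seconds(size_bytes, version):
    """Time to send size_bytes at the nominal rate, with zero overhead."""
    return size_bytes * 8 / USB_RATES_BPS[version]


file_size = 10**9  # 1 GB (decimal)
for version in USB_RATES_BPS:
    seconds = ideal_transfer_seconds(file_size, version)
    print(f"{version}: {seconds:.2f} s")
```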
USB CONNECTOR
There are two types of USB Connectors.
In both types, there are four signals.
The 5.0 V and the Ground signals are used to power the device connected.
The data signals are biphase signals.
Data + represents 5.0V
Data – represents 0 V
UNIT - V
SPMD vs SIMD
1. In SPMD, multiple autonomous processors simultaneously execute the same
program at independent points, rather than in the lockstep that SIMD imposes
on different data. With SPMD, tasks can be executed on general-purpose CPUs.
2. SIMD requires vector processors to manipulate data streams. Note that the
two are not mutually exclusive.
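The contrast can be illustrated with a small sketch. It is only an analogy: the list comprehension stands in for a single operation applied to all elements in lockstep, and Python threads stand in for autonomous processors. The function names simd_add and spmd_program are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def simd_add(a, b):
    """SIMD-style: one 'add' operation applied to every element pair."""
    return [x + y for x, y in zip(a, b)]


def spmd_program(rank, data):
    """SPMD-style: every worker runs this same program on its own data,
    and may take a different branch depending on its rank."""
    if rank % 2 == 0:
        return sum(data)
    return max(data)


chunks = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
    spmd_results = list(pool.map(spmd_program, range(len(chunks)), chunks))

print(simd_add([1, 2, 3], [10, 20, 30]))
print(spmd_results)
```

In the SPMD part, even-ranked workers compute a sum while odd-ranked workers compute a maximum, which shows the same program reaching independent points.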
6. VECTOR PROCESSORS
o Vector processing techniques are used in many modern computers and central
processing units.
o A vector processor is a central processing unit that can work on an entire vector
(array) in one instruction.
o The operand of such an instruction is a complete vector rather than a single
element.
o Vector processors are used because they reduce the instruction fetch and decode
bandwidth required, owing to the fact that fewer instructions must be fetched.
o A vector processor is also known as an array processor.
o They exploit data parallelism in large scientific and multimedia applications.
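The saving in fetched instructions can be shown with a toy count. This is a sketch under simple assumptions: each scalar add counts as one fetched instruction, each vector add handles up to vector_length elements, and vector_length=4 is an arbitrary illustrative value rather than a property of any particular machine.

```python
def scalar_add(a, b):
    """Add element by element; one instruction fetched per element."""
    result, fetched = [], 0
    for x, y in zip(a, b):
        result.append(x + y)
        fetched += 1
    return result, fetched


def vector_add(a, b, vector_length=4):
    """Add in chunks; one vector instruction fetched per chunk."""
    result, fetched = [], 0
    for start in range(0, len(a), vector_length):
        chunk_a = a[start:start + vector_length]
        chunk_b = b[start:start + vector_length]
        result.extend(x + y for x, y in zip(chunk_a, chunk_b))
        fetched += 1
    return result, fetched


a = list(range(8))
b = [10] * 8
scalar_result, scalar_fetched = scalar_add(a, b)
vector_result, vector_fetched = vector_add(a, b)
print(scalar_fetched, vector_fetched)
```

Both versions produce the same sums, but the vector version fetches one instruction per chunk instead of one per element.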
5. HARDWARE MULTITHREADING
o Multithreading is an approach that allows a high degree of instruction-level
parallelism without increasing circuit complexity or power consumption.
o The instruction stream is divided into several smaller streams, known as threads,
such that the threads can be executed in parallel.
o Hardware multithreading is a well-known technique to increase the utilization of
processor resources. The idea is to start executing a different thread when the
current thread is stalled.
o Hardware Multithreading allows multiple threads to share the functional units
of a single processor in an overlapping fashion.
o To permit this sharing, the processor must duplicate the independent state of each
thread.
o There are three main approaches to hardware multithreading:
– Fine-Grained Multithreading
– Coarse-Grained Multithreading
– Simultaneous Multithreading
FINE-GRAINED MULTITHREADING
o Fine-grained multithreading switches between threads on
each instruction, resulting in interleaved execution of
multiple threads.
o Also called interleaving.
o This interleaving is often done in a round robin fashion,
skipping any threads that are stalled at that time.
o To make fine-grained multithreading practical, the processor
must be able to switch threads on every clock cycle.
o If there is a sufficient number of threads, it is likely that at
least one is active (not stalled), and the CPU can be kept
running.
o With fine-grained multithreading in a pipelined architecture, if:
– the pipeline has k stages,
– there are at least k threads to be executed, and
– the CPU can execute a thread switch at each clock cycle,
then there can never be more than a single instruction per thread in the pipeline at
any instant, so there cannot be hazards due to dependencies, and the pipeline
never stalls.
o Advantages :
Potential to avoid wasted machine time due to stalls.
It can hide throughput losses that arise from both short and long stalls.
o Disadvantages :
A thread that is ready to execute a long sequence of instructions may have
to wait before every instruction, because other threads are interleaved with it.
It slows down execution of individual threads.
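The round-robin interleaving with skipping of stalled threads can be sketched as a small scheduler. The function name, the thread names and the stall pattern are all made up for illustration, and one instruction issue per cycle is assumed.

```python
def fine_grained_schedule(threads, stalled, cycles):
    """Pick one thread per cycle in round-robin order, skipping stalled ones.

    threads: list of thread names.
    stalled: dict mapping a cycle number to the set of threads stalled then.
    Returns the thread issued in each cycle (None if every thread is stalled).
    """
    issued = []
    next_index = 0
    for cycle in range(cycles):
        blocked = stalled.get(cycle, set())
        choice = None
        for step in range(len(threads)):
            candidate = threads[(next_index + step) % len(threads)]
            if candidate not in blocked:
                choice = candidate
                # Continue the rotation after the thread that just issued.
                next_index = (next_index + step + 1) % len(threads)
                break
        issued.append(choice)
    return issued


schedule = fine_grained_schedule(
    ["A", "B", "C"],
    stalled={1: {"B"}, 3: {"A", "B", "C"}},
    cycles=6,
)
print(schedule)
```

In cycle 1 thread B is stalled, so C issues in its place; only in cycle 3, when every thread is stalled, does the issue slot go unused.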
COARSE-GRAINED MULTITHREADING
o Coarse-grained multithreading was invented as an alternative
to fine-grained multithreading.
o Coarse-grained multithreading switches threads only on
stalls, waiting for a time-consuming operation to complete.
o Also called blocking.
o A switch is made to another thread. When this thread in turn
causes a stall, a third thread is scheduled and so on.
o Advantage :
Switching threads doesn’t need to be nearly
instantaneous.
o Disadvantages :
The processor can be idled on shorter stalls, and thread
switching will also cause delays.
Coarse-grained multithreading suffers from major
throughput losses, especially from shorter stalls.
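The switch-on-stall behaviour, including the delay that each switch costs, can be sketched as follows. This is a simplified model under several assumptions: each thread's program is a repeating list of 'op' and 'stall' entries, a stalled thread is ready again by the time it is next selected, and switch_penalty idle cycles are charged per switch. All names are hypothetical.

```python
def coarse_grained_schedule(programs, cycles, switch_penalty=1):
    """Run one thread until it hits a long stall, then switch to the next.

    programs: dict mapping thread name to a list of 'op' / 'stall' entries,
              repeated cyclically (each list should contain at least one 'op').
    Returns a per-cycle trace: the thread name when an instruction issues,
    or None for an idle cycle spent switching threads.
    """
    names = list(programs)
    position = {name: 0 for name in names}
    current = 0
    trace = []
    while len(trace) < cycles:
        name = names[current]
        program = programs[name]
        entry = program[position[name] % len(program)]
        position[name] += 1
        if entry == "stall":
            # Long stall: pay the switch penalty, then run the next thread.
            trace.extend([None] * switch_penalty)
            current = (current + 1) % len(names)
        else:
            trace.append(name)
    return trace[:cycles]


trace = coarse_grained_schedule(
    {"A": ["op", "op", "stall"], "B": ["op", "stall"]},
    cycles=8,
)
print(trace)
```

The None entries in the trace are the cycles lost to thread switching, which is the throughput cost noted in the disadvantages above.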
o The figure given below illustrates the possible architectures that involve
multithreading and contrasts these with approaches that do not use multithreading.
o Each horizontal row represents the issue slot or slots for a single execution cycle;
that is, the width of each row corresponds to the maximum number of instructions
that can be issued in a single clock cycle.
o The vertical dimension represents the time sequence of clock cycles.
o An empty slot represents an unused execution slot in one pipeline.
6. MULTICORE PROCESSORS
o A multicore computer, also known as a chip multiprocessor, combines
two or more processors (called cores) on a single IC.
o Each core consists of all of the components of an independent processor,
such as registers, ALU, pipeline hardware, and control unit, plus L1
instruction and data caches.
o In addition to the multiple cores, contemporary multicore chips also include
L2 cache and L3 cache.
Hardware Performance Issues      Software Performance Issues
Increase in Parallelism          Multi-threaded applications
Increase in Complexity           Multi-process applications
Increase in Power Consumption    Multi-instance applications
MULTICORE ORGANIZATION
The main variables in a multicore organization are as follows:
o The number of core processors on the chip
o The number of levels of cache memory
o The amount of cache memory that is shared
In many designs, each core has its own L1 and L2 cache.
L1 and L2 are the fastest cache levels that a core can access.
L1 is always dedicated per core, whereas L2 can be shared.
If the processor can find the instructions or data for its next operation in the
L1 and L2 cache, then it does not need to access the slower L3 cache.
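The lookup order described above can be sketched as a walk through the cache levels. The latency figures in the code are hypothetical cycle counts chosen only to show the trend, and each cache is modeled as a plain set of addresses rather than a real cache structure.

```python
def lookup(address, l1, l2, l3, latency=None):
    """Return (level_found, total_cycles) for a read of address.

    Each level is checked in turn; a miss adds that level's latency and
    moves on to the next, slower level.
    """
    # Hypothetical access latencies in clock cycles.
    latency = latency or {"L1": 4, "L2": 12, "L3": 40, "MEM": 200}
    cycles = 0
    for name, contents in (("L1", l1), ("L2", l2), ("L3", l3)):
        cycles += latency[name]
        if address in contents:
            return name, cycles
    return "MEM", cycles + latency["MEM"]


l1 = {0x10}
l2 = {0x10, 0x20}
l3 = {0x10, 0x20, 0x30}
for addr in (0x10, 0x20, 0x30, 0x40):
    print(hex(addr), lookup(addr, l1, l2, l3))
```

A hit in L1 or L2 returns before the slower L3 cache is ever consulted, which is the point made in the text.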
The four general organizations for multicore systems are :
Dedicated L1 Cache
Dedicated L2 Cache
Shared L2 Cache
Shared L3 Cache
DEDICATED L1 CACHE
o In this organization, the only on-chip cache is L1 cache,
with each core having its own dedicated L1 cache.
o The L1 cache is divided into instruction and data
caches.
o L1 (Level 1) cache is the fastest memory that is present
in a computer system.
o L1 cache has the data the CPU is most likely to need
while completing a certain task.
o Its size typically ranges from 256 KB to 1 MB.
o Example : ARM11 MPCore
DEDICATED L2 CACHE
o In this organization, there is no on-chip cache sharing.
o L2 cache is slower than L1 cache, but bigger in size.
o Its size typically ranges from 256 KB to 8 MB.
o A potential advantage to having only dedicated L2
caches on the chip is that each core enjoys more rapid
access to its private L2 cache.
o L2 cache holds data that is likely to be accessed by the
CPU next.
o Example : AMD Opteron.
SHARED L2 CACHE
o This organization has a similar allocation of chip space to memory, but
uses a shared L2 cache.
o As the amount of cache memory available on the chip
continues to grow, performance considerations dictate
splitting off a separate, shared L3 cache, with dedicated
L1 and L2 caches for each core processor.
o The use of a shared L2 cache confines the cache
coherency problem to the L1 cache level, which may
provide some additional performance advantage.
o Example : Intel Core Duo.
SHARED L3 CACHE
o L3 (Level 3) cache is the largest cache memory unit,
and also the slowest one.
o Its size typically ranges from 4 MB to 50 MB.
o As both the amount of memory available and the
number of cores grow, the use of a shared L3 cache
combined with either a shared L2 cache or dedicated
per core L2 caches seems likely to provide better
performance than simply a massive shared L2 cache.
o Example : AMD K10.