
Computer Functional Units Explained

The document outlines the functional units of a computer, detailing the roles of hardware components such as the Input Unit, Memory Unit, Arithmetic and Logic Unit, Output Unit, and Control Unit. It also explains the instruction set architecture of a CPU, including instruction execution cycles and addressing modes. Additionally, it covers data representation, including signed number representation and character encoding, as well as basic arithmetic operations.

UNIT - I

FUNCTIONAL UNITS OF COMPUTER


Functional blocks of a computer(L2) – CPU (L1), memory(L1), Input-output subsystems(L1), Control unit (L1) -
Instruction set architecture of a CPU(L2) – Registers(L1) - Instruction execution cycle(L2) - RTL interpretation
of instructions(L2) - Addressing modes(L2) - Instruction set(L1)

1. FUNCTIONAL BLOCKS OF A COMPUTER

A computer is a fast electronic calculating machine that accepts digitized input information,
processes it according to a list of internally stored instructions, and produces the resulting
output information.

The components of a computer system fall into two groups: Hardware and Software.

HARDWARE
A computer consists of five functionally independent main parts:

1. Input Unit
2. Memory Unit
3. Arithmetic and Logic Unit
4. Output Unit
5. Control Unit

 The Input Unit accepts coded information from human operators using devices such as
keyboards or from other computers over digital communication lines.
 The information received is stored in the computer’s memory, either for later use or to be
processed immediately by the Arithmetic and Logic Unit.
 The processing steps are specified by a program that is also stored in the Memory.
 The results are sent back to the outside world through the Output Unit.
 All of these actions are coordinated by the Control Unit.
 An Interconnection Network provides the means for the functional units to exchange information
and coordinate their actions.
 Input and output equipment is referred to as the Input-Output (I/O) Unit.
1. INPUT UNIT
 Computers accept coded information through input units. The most common input device is
the keyboard.
 Many other kinds of input devices for human-computer interaction are available, including
the touchpad, mouse, joystick, and trackball. These are often used as graphic input devices
in conjunction with displays.
 Microphones can be used to capture audio input which is then sampled and converted into
digital codes for storage and processing.
 Cameras can be used to capture video input.
 Digital communication facilities, such as the Internet, can also provide input to a computer
from other computers and database servers.

2. MEMORY UNIT
 The function of the memory unit is to store programs and data.
 There are three classes of storage, called Primary, Secondary and Cache.

Primary Memory
 Primary memory, also called main memory, is a fast memory that operates at electronic
speeds. Programs must be stored in this memory while they are being executed.
 The main memory consists of a large number of semiconductor storage cells, each capable
of storing one bit of information.
 They are handled in groups of fixed size called words. The number of bits in each word is
referred to as the word length of the computer, typically 16, 32, or 64 bits.
 Instructions and data can be written into or read from the memory under the control
of the processor.
 A memory in which any location can be accessed in a short and fixed amount of time
after specifying its address is called a random-access memory (RAM).
 The time required to access one word is called the memory access time.
 This time is independent of the location of the word being accessed. It typically ranges
from a few nanoseconds (ns) to about 100 ns for current RAM units.

Secondary Memory
 Secondary memory is used when large amounts of data and many programs have to be
stored, particularly for information that is accessed infrequently.
 Access times for secondary storage are longer than for primary memory.
Example: Magnetic disks, Optical disks (DVD and CD), and Flash memory devices.

Cache Memory
 Along with the main memory, a smaller, faster RAM unit, called a cache, is used to hold
sections of a program that are currently being executed, along with any associated data.
 The cache is tightly coupled with the processor and is usually contained on the same
integrated-circuit chip.
 The purpose of the cache is to facilitate high instruction execution rates.

Memory Hierarchy

3. ARITHMETIC AND LOGIC UNIT


 The computer operations are executed in the arithmetic and logic unit (ALU) of the
processor.
 Any arithmetic or logic operation, such as addition, subtraction, multiplication,
division, or comparison of numbers is performed by the ALU.
 The operands are brought from the main memory into the processor registers.
 After the operation is performed, the result is stored either in a register or in a memory
location.
4. OUTPUT UNIT
 The function of the output unit is to send processed results to the user or the outside world.
Examples: Printers, Plotters and Graphic Displays

5. CONTROL UNIT
 The control unit is used to co-ordinate and control all the activities among the
functional units.
 The control unit is used to send control and timing signals to other units.
 Control circuits are responsible for generating the timing signals that govern the
transfers and determine when a given action is to take place.
 Data transfers between the processor and the memory are also managed by the control
unit through timing signals.

SOFTWARE
Computer software is divided into two broad categories:
o System Software

o Application Software

3. BASIC OPERATIONAL CONCEPTS

 To perform a given task, an appropriate program is stored in the memory.


 Individual instructions are brought from the memory into the processor, which
executes the specified operations.
 Data to be used as operands are also stored in the memory.

Fig : Connections between the Processor and the Memory


Instruction Register (IR)
o IR holds the instruction that is currently being executed.
o Its output is used by the control unit.
Program Counter (PC)
o This is another specialized register that keeps track of execution of a program.
o It contains the memory address of the next instruction to be fetched and executed.
Besides IR and PC, there are n general-purpose registers, R0 through Rn-1.
Memory Address Register (MAR) and Memory Data (Buffer) Register(MDR)
o These registers are used to handle the data transfer between the main memory and the
processor.
o The MAR holds the address of the memory location to be accessed.
o The MDR contains the data to be written into or read out of the memory.
Operating Steps
 use interrupt may alter the state of the processor.
 When ISR is completed, the state of the processor is restored and

 The operation of a CPU is determined by the instructions it executes.

INSTRUCTIONS
 These instructions are referred to as machine instructions or computer
instructions.
 The collection of different instructions is referred to as the instruction set of the CPU.
 Each instruction must contain the information required by the CPU for
execution.

ELEMENTS OF AN INSTRUCTION
o Operation Code: Specifies the operation to be performed (e.g., add, move).
The operation is specified by a binary code known as the operation code or
opcode.
o Source / Destination operand reference: The operation may involve one or more
source operands but only one destination operand.
o Result operand reference: The operation may produce a result. The result is stored
in a destination operand.
o Next instruction reference: This tells the CPU where to fetch the next
instruction after the execution of this instruction is complete.

o DR contains the data to be written into or read out of the memory


LOCATION FOR SOURCE AND DESTINATION OPERANDS
Source and result operands can be in one of the following areas:
▪ Main Memory
▪ Virtual Memory
▪ CPU Register
▪ I/O device.

NOTATIONS USED IN INSTRUCTION

Two basic types of notation are used for instructions. They are:


1. Register Transfer Notation
2. Assembly Language Notation
Register Transfer Notation
Register Transfer Notation describes the transfer of information from one location in the
computer to another, such as a transfer between memory locations, processor registers, or
registers in the I/O subsystem.
Example : R3 ← R1 + R2
Assembly Language Notation
This type of notation represents machine instructions and programs in an assembly language
format.
Example : ADD R1, R2

INSTRUCTION REPRESENTATION

o Each instruction is represented by a sequence of bits.


o The instruction is divided into fields, corresponding to the constituent elements of the
instruction.

Opcodes

o Opcodes are represented by abbreviations, called mnemonics, that indicate the
operation.
o Common examples include :
▪ ADD Add
▪ SUB Subtract
▪ MUL Multiply
▪ DIV Divide
▪ LOAD Load data from memory
▪ STORE Store data to memory

Operands
o Operands are also represented symbolically.
o Example :
ADD R, Y

This instruction means: add the value contained in location Y to the contents of register R.

INSTRUCTION FORMAT (Based on Number of Addresses)

An instruction format defines the layout of the bits of an instruction, in terms of its
constituent parts.
An instruction format must include an opcode and, implicitly or explicitly, zero or more
operands.
The instruction formats are generally classified into:
 Three-address instruction
 Two-address instruction
 One-address instruction
 Zero-address instruction

1. THREE-ADDRESS INSTRUCTIONS
The three-address instruction contains the memory addresses of the three operands— A, B,
and C.
A general instruction of three-address type has the format:
Operation Destination, Source1, Source2

Example
The three-address instruction can be represented symbolically as
Add A, B, C
 Operands B and C are called the source operands
 A is called the destination operand
 Add is the operation to be performed on the operands.

2. TWO-ADDRESS INSTRUCTIONS
In a two-address instruction, only two operands are specified, and one of them serves as both
a source and the destination. A general instruction of two-address type has the format:
Operation Destination/Source1, Source2
Example
An Add instruction of this type is Add A, B, which performs the operation A ← A + B.
This means that operand A is both a source and a destination.

3. ONE-ADDRESS INSTRUCTIONS


A machine instruction that specifies only one memory operand is called a one-address
instruction.
When a second operand is needed, it is understood implicitly to be in a unique location.
A processor register, usually called the accumulator, may be used for this purpose. The
access to data in these registers is much faster than to data stored in memory locations
because registers are inside the processor.
Operation Source1

Example
Consider the instruction : Add A.
It adds the contents of memory location A to the contents of the accumulator register and
places the sum back into the accumulator.

4. ZERO-ADDRESS INSTRUCTIONS
It is also possible to use instructions in which the locations of all operands are defined
implicitly. Such instructions are found in machines that store operands in a structure called a
pushdown stack. These instructions are called zero-address instructions.

Example
Stack operation – PUSH A , POP B and PEEK

EXAMPLE
Write a program to evaluate the arithmetic statement Y = (A+B)*(C+D) using three-address,
two-address, one-address and zero-address instructions.

Solution:
Using Three-address instructions:
ADD R1, A, B
ADD R2, C, D
MUL Y, R1, R2

Using Two-address instructions:
MOV R1, A
ADD R1, B
MOV R2, C
ADD R2, D
MUL R1, R2
MOV Y, R1

Using One-address instructions:
LOAD A
ADD B
STORE T
LOAD C
ADD D
MUL T
STORE Y

Using Zero-address instructions:
PUSH A
PUSH B
ADD
PUSH C
PUSH D
ADD
MUL
POP Y
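
The one-address and zero-address programs can be checked with a small interpreter. The following is only a minimal sketch in Python: the function names `run_one_address` and `run_zero_address`, the dictionary used as memory, and the sample values for A, B, C and D are assumptions made for illustration, not part of any real instruction set.

```python
# Toy interpreters for the one-address and zero-address listings.
# Memory is modelled as a dict from symbolic names to integers (an assumption).

def run_one_address(program, memory):
    """Run a one-address program; the second operand is an implicit accumulator."""
    acc = 0
    for line in program:
        op, addr = line.split()
        if op == "LOAD":
            acc = memory[addr]
        elif op == "STORE":
            memory[addr] = acc
        elif op == "ADD":
            acc += memory[addr]
        elif op == "MUL":
            acc *= memory[addr]
        else:
            raise ValueError(f"unknown opcode {op}")
    return memory


def run_zero_address(program, memory):
    """Run a zero-address program; all arithmetic operands are on a pushdown stack."""
    stack = []
    for line in program:
        parts = line.split()
        op = parts[0]
        if op == "PUSH":
            stack.append(memory[parts[1]])
        elif op == "POP":
            memory[parts[1]] = stack.pop()
        elif op == "ADD":
            stack.append(stack.pop() + stack.pop())
        elif op == "MUL":
            stack.append(stack.pop() * stack.pop())
        else:
            raise ValueError(f"unknown opcode {op}")
    return memory


ONE_ADDRESS = ["LOAD A", "ADD B", "STORE T", "LOAD C", "ADD D", "MUL T", "STORE Y"]
ZERO_ADDRESS = ["PUSH A", "PUSH B", "ADD", "PUSH C", "PUSH D", "ADD", "MUL", "POP Y"]

values = {"A": 2, "B": 3, "C": 4, "D": 5}   # sample data, chosen arbitrarily
y_one = run_one_address(ONE_ADDRESS, dict(values))["Y"]
y_zero = run_zero_address(ZERO_ADDRESS, dict(values))["Y"]
print(y_one, y_zero)   # both equal (2 + 3) * (4 + 5)
```

Both programs compute the same Y; the difference lies only in where the operands are found (an implicit accumulator versus the top of the stack).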

INSTRUCTION TYPES (Based on Operations)

Based on the operations performed by the computer, instructions are classified into the
following types:

1. Data Processing Instructions - Arithmetic and Logical Operations
2. Data Transfer Operations - Load and Store Operations
3. Control Instructions - Conditional Branch & Unconditional Jump Operations

1. ARITHMETIC & LOGICAL OPERATIONS


o Arithmetic instructions provide computational capabilities for processing numeric
data.
o Logic (Boolean) instructions operate on the bits of a word as bits rather than as
numbers.
o These operations are performed primarily on data in processor registers. Therefore,
there must be memory instructions for moving data between memory and the
registers.

2. DATA TRANSFER OPERATIONS


o These instructions are needed to transfer programs and data into memory and the
results of computations back out to the user.

3. CONTROL INSTRUCTIONS
o Branch operations (Conditional) are used to branch to a different set of
instructions depending on the decision made.
o Jump operations (Unconditional) are used to branch to a different set of
instructions without any conditions.
OPERATIONS

Machine instructions operate on data. The most important general categories of data are :
 Addresses
 Numbers
 Characters
 Logical data

ADDRESSES
 The addresses are in fact a form of data.
 Some calculations must be performed on the operand reference in an instruction to
determine physical address.
 Addresses can be considered as unsigned integer operands.
UNIT II

DATA REPRESENTATION AND COMPUTER ARITHMETIC

Data representation (L2) - Signed number representation (L1), fixed and floating-point representation
(L1), and character Representation (L2) -Addition and subtraction of signed numbers (L1), design of
adders (L1)-Multiplication of Positive numbers -Booth’s Algorithm (L3) -Floating Point Arithmetic
(L2), Division (L2).

CHARACTERS
 A common form of data is text or character strings.
 A number of codes have been devised by which characters are represented by a
sequence of bits.
 The earliest common example of this is the Morse code.
 Today, the most commonly used character codes are
American Standard Code for Information Interchange (ASCII) - 7 bit code
Extended Binary Coded Decimal Interchange Code (EBCDIC) - 8 bit code
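
As a small illustration, the sketch below prints the bit patterns of one character in both codes. It assumes Python's built-in `ascii` codec and the `cp037` codec, which is one of several EBCDIC code pages; other EBCDIC variants may assign different codes to some characters.

```python
# Encode the same character in ASCII (7-bit) and in one EBCDIC code page (8-bit).
text = "A"
ascii_code = text.encode("ascii")[0]     # ASCII code of 'A'
ebcdic_code = text.encode("cp037")[0]    # cp037 is an EBCDIC code page (assumption: US/Canada variant)

ascii_bits = format(ascii_code, "07b")   # 7-bit pattern
ebcdic_bits = format(ebcdic_code, "08b") # 8-bit pattern
print(ascii_code, ascii_bits)            # 65 1000001
print(ebcdic_code, ebcdic_bits)          # 193 11000001
```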
LOGICAL DATA
 Normally, each word or other addressable unit is treated as a single unit of data.
 It is sometimes useful, however, to consider an n-bit unit as consisting of n 1-bit items of data,
each item having the value 0 or 1.
Computer arithmetic is commonly performed on two very different types of numbers:
(1) Integer
(2) Floating Point

INTEGER REPRESENTATION (FIXED POINT REPRESENTATION)


 Integer representation is also called fixed point representation because the radix
point (binary point) is fixed and assumed to be to the right of the rightmost digit.
 In integer representation, only the digits 0 and 1 are available to represent everything.
 Positive numbers are stored in binary as follows:
o Example : 41 = 00101001
 There is no minus sign.
 The negative integer representation can be done by :
o Sign-Magnitude form
o Two’s complement form

Sign-Magnitude Form
 The left most bit is the sign bit.
 0 means positive
 1 means negative.
Example:
+18 = 00010010
-18 = 10010010
2’s Complement Representation For Negative Numbers
 To represent a negative number using the “two’s complement” technique:
1. First decide how many bits are used for representation
2. Then write the magnitude (absolute value) of the negative number in pure binary
3. Then, change each 0 to 1 and each 1 to 0 (Boolean complement or “one’s
complement”)
4. Finally, add 1, treating the result of Step 3 as a pure binary number

Examples:
To Represent -3 with 4 bits:
 Start from +3 = 0011
 Boolean complement gives 1100
 Add 1 to LSB gives -3 = 1101

To Represent -20 with 8 bits:


 Start from +20 = 00010100
 Boolean complement gives 11101011
 Add 1 gives -20 = 11101100
 Negation works in the same way, e.g. negation of -3 is obtained by the “two’s
complement” of -3:
 Representation of -3 = 1101
 Boolean complement gives 0010
 Add 1 to LSB gives -(-3) = +3 = 0011
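
The sign-magnitude and two's complement procedures can be reproduced with a short Python sketch. The function names `sign_magnitude` and `twos_complement` are chosen here for illustration only; the two's complement function follows the complement-and-add-1 steps literally rather than using a shortcut.

```python
def sign_magnitude(value, bits):
    """Sign-magnitude form: leftmost bit is the sign, the rest is the magnitude."""
    limit = (1 << (bits - 1)) - 1
    if not -limit <= value <= limit:
        raise ValueError("value does not fit in the given number of bits")
    sign = "1" if value < 0 else "0"
    return sign + format(abs(value), f"0{bits - 1}b")


def twos_complement(value, bits):
    """Two's complement form, built with the complement-and-add-1 steps."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value does not fit in the given number of bits")
    if value >= 0:
        return format(value, f"0{bits}b")
    magnitude = format(-value, f"0{bits}b")                           # step 2
    inverted = "".join("1" if b == "0" else "0" for b in magnitude)   # step 3
    result = (int(inverted, 2) + 1) % (1 << bits)                     # step 4
    return format(result, f"0{bits}b")


print(sign_magnitude(-18, 8))    # 10010010
print(twos_complement(-3, 4))    # 1101
print(twos_complement(-20, 8))   # 11101100
```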

2. ADDITION AND SUBTRACTION

Addition and subtraction are the two most commonly used arithmetic operations, as the other
two, namely multiplication and division, are respectively the processes of repeated addition and
repeated subtraction.
The basic building blocks that form the basis of all hardware used to perform arithmetic
operations on binary numbers are the Half adder, Full adder, Half subtractor, Full subtractor,
Binary Adder (Parallel Adder), Look Ahead Carry Adder, Binary Subtractor (Parallel
Subtractor), and Parallel Adder/Subtractor.
CARRY LOOK-AHEAD ADDERS (OR) FAST ADDERS
In a parallel adder, all the bits of the augend and the addend are available for computation at the
same time. The carry output of each full-adder stage is connected to the carry input of the next
high-order stage. Since each bit of the sum output depends on the value of the input carry, a time
delay occurs in the addition process. This time delay is called the carry propagation delay.
For example, the addition of two numbers (0011 + 0101) gives the result 1000. Addition at the
LSB position produces a carry into the second position. This carry, when added to the bits of the
second position, produces a carry into the third position. This carry, when added to the bits of the third
position, produces a carry into the last position. The sum bit generated in the last position (MSB) depends
on the carry that was generated by the addition in the previous position, i.e., the adder will not produce the
correct result until the LSB carry has propagated through the intermediate full-adders. This represents a time
delay that depends on the propagation delay produced in each full-adder.
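
The carry chain in this example can be traced with a minimal Python sketch of a ripple-carry (parallel) adder. It models only the logic, not the timing, and the function names are illustrative. Note that it shows the slow carry propagation that a carry look-ahead adder is designed to avoid; it is not a look-ahead adder itself.

```python
def full_adder(a, b, carry_in):
    """One full-adder stage: returns (sum bit, carry out)."""
    total = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return total, carry_out


def ripple_carry_add(x, y):
    """Add two equal-length bit strings (MSB first).

    Returns the sum bits (MSB first) and the carry out of each stage,
    listed from the LSB stage to the MSB stage.
    """
    carry = 0
    sum_bits = []
    carries = []
    for a, b in zip(reversed(x), reversed(y)):   # start at the LSB
        s, carry = full_adder(int(a), int(b), carry)
        sum_bits.append(str(s))
        carries.append(carry)
    return "".join(reversed(sum_bits)), carries


result, carries = ripple_carry_add("0011", "0101")
print(result, carries)   # 1000 [1, 1, 1, 0]
```

The three 1s in the carry list are the carries that ripple from the LSB stage into the second, third and last positions before the MSB sum bit becomes valid.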

3. MULTIPLICATION

3.1 - MULTIPLICATION OF UNSIGNED INTEGERS


(SEQUENTIAL MULTIPLICATION)
1. Multiplication involves the generation of partial products, one for each digit in the
multiplier. These partial products are then summed to produce the final product.

2. The partial products are easily defined. When the multiplier bit is 0, the partial
product is 0. When the multiplier bit is 1, the partial product is the multiplicand.

3. The total product is produced by summing the partial products. For this operation,
each successive partial product is shifted one position to the left relative to the
preceding partial product.

4. The multiplication of two n-bit binary integers results in a product of up to 2n bits
in length (e.g., 11 * 13 = 143; in binary, 1011 * 1101 = 10001111).
HARDWARE IMPLEMENTATION

o The multiplier and multiplicand are loaded into two registers (Q and M).
o A third register, the A register, is also needed and is initially set to 0.
o A 1-bit C register, initialized to 0, holds a potential carry bit resulting from addition.
o Control logic reads the bits of the multiplier one at a time.
 If Q0 is 1, then the multiplicand is added to the A register and the result
is stored in the A register, with the C bit used for overflow.

Then all of the bits of the C, A, and Q registers are shifted to the right
one bit, so that the C bit goes into An-1, A0 goes into Qn-1 and Q0 is lost.
 If Q0 is 0, then no addition is performed, just the shift.

This process is repeated for each bit of the original multiplier.

The resulting 2n-bit product is contained in the A and Q registers
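
The register-level procedure can be sketched in Python as follows. This is an illustrative model, not a hardware description: the registers C, A, Q and M are plain integers, and the function name `shift_add_multiply` is an assumption.

```python
def shift_add_multiply(multiplicand, multiplier, n):
    """Unsigned n-bit multiplication using the C, A, Q and M registers."""
    mask = (1 << n) - 1
    m = multiplicand & mask   # M register
    q = multiplier & mask     # Q register
    a = 0                     # A register, initially 0
    c = 0                     # 1-bit carry register
    for _ in range(n):
        if q & 1:                        # Q0 = 1: A <- A + M, carry goes to C
            total = a + m
            c = total >> n
            a = total & mask
        # Shift C, A, Q right by one bit: C -> An-1, A0 -> Qn-1, Q0 is lost
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (c << (n - 1))
        c = 0
    return (a << n) | q                  # 2n-bit product held in A and Q


product = shift_add_multiply(0b1011, 0b1101, 4)
print(format(product, "08b"))   # 10001111
```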
EXAMPLE

3.2 - MULTIPLICATION OF SIGNED INTEGERS


(TWO’S COMPLEMENT MULTIPLICATION & BOOTH ALGORITHM)
Steps:
1. If the operands are decimal, convert them into binary.
2. If an operand is negative, represent it in 2’s complement form.
3. Recode the multiplier by comparing Q0 and Q-1.
o If (Q0, Q-1) are 00 or 11, then the corresponding bit will be set to 0.
o If (Q0, Q-1) are 10, then the corresponding bit will be set to ‘-1’
o If (Q0, Q-1) are 01, then the corresponding bit will be set to ‘+1’

4. After recoding the multiplier, the multiplicand and the recoded multiplier are
multiplied to generate the 2n-bit product.
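
The recoding rule can be tried out with a short Python sketch. It works at the arithmetic level (Python integers stand in for the registers), and the function names `booth_recode` and `booth_multiply` are assumptions made for illustration.

```python
def booth_recode(multiplier, n):
    """Booth digits (+1, 0, -1), MSB first, for an n-bit two's complement multiplier."""
    bits = [(multiplier >> i) & 1 for i in range(n)]   # bits[0] is the LSB
    digits = []
    previous = 0                  # implied Q-1 = 0 to the right of the LSB
    for current in bits:
        # pair (current, previous): 00 or 11 -> 0, 10 -> -1, 01 -> +1
        digits.append(previous - current)
        previous = current
    return digits[::-1]


def booth_multiply(multiplicand, multiplier, n):
    """Multiply using the recoded multiplier (signed operands allowed)."""
    product = 0
    for digit in booth_recode(multiplier, n):   # MSB first
        product = product * 2 + digit * multiplicand
    return product


print(booth_recode(7, 4))        # [1, 0, 0, -1], i.e. 8 - 1
print(booth_multiply(7, -3, 4))  # -21
```

The recoded form of 0111 needs only one addition and one subtraction instead of three additions, which is the usual motivation for the recoding.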

RECODING OF MULTIPLIERS
TWO’S COMPLEMENT MULTIPLICATION
FLOWCHART - BOOTH’S ALGORITHM

EXAMPLE - BOOTH’S ALGORITHM


4. DIVISION
 Division is more complex than multiplication.
 The operands are the dividend and the divisor.
 The results of the operation are the Quotient and the Remainder.

HARDWARE IMPLEMENTATION
o It consists of an (n+1)-bit binary adder, shift, add and subtract control logic, and
registers A, B (or M) and Q.
o Divisor is loaded into B(or M) and dividend is loaded into Q
o Register A is initially set to zero. The division operation is then carried out.
o After completion of division, the n-bit quotient is in register Q and the remainder
is in register A.

TYPES OF DIVISION
The division of unsigned binary numbers can be performed by two ways. They are:
1. Restoring Division
2. Non-restoring Division

4.1 - RESTORING DIVISION ALGORITHM


Example: Restoring Division Algorithm

Working Steps
4.2 - NON-RESTORING DIVISION ALGORITHM
o In the Non-Restoring Algorithm, the action taken in each step depends on the sign bit of A,
and A is not restored after an unsuccessful subtraction.
o The steps involved in non-restoring division are:
a) If the sign bit of A is 0, shift A and Q left one binary position and subtract the divisor from A.
b) Otherwise, shift A and Q left one binary position and add the divisor to A.
c) If the sign bit of A is now 0, set Q0 = 1; otherwise set
Q0 = 0.
d) Repeat steps (a) to (c) n times.
e) Finally, if the sign of A is 1, add the divisor to A.
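
A minimal Python sketch of non-restoring division for unsigned operands is given below. A Python integer that may go negative stands in for the (n+1)-bit signed register A; the function name and this modelling choice are assumptions made for illustration.

```python
def non_restoring_divide(dividend, divisor, n):
    """Divide two unsigned n-bit integers; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    mask = (1 << n) - 1
    a = 0                      # register A (signed)
    q = dividend & mask        # register Q holds the dividend
    for _ in range(n):
        a_was_non_negative = a >= 0          # sign of A before the shift
        msb_q = (q >> (n - 1)) & 1
        a = 2 * a + msb_q                    # shift A and Q left one position
        q = (q << 1) & mask
        if a_was_non_negative:
            a -= divisor                     # sign 0: subtract the divisor
        else:
            a += divisor                     # sign 1: add the divisor
        if a >= 0:
            q |= 1                           # set Q0 = 1, otherwise Q0 stays 0
    if a < 0:
        a += divisor                         # final correction of the remainder
    return q, a


print(non_restoring_divide(8, 3, 4))   # (2, 2)
```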

Example: Non-Restoring Division Algorithm


Working Steps

COMPARISON BETWEEN RESTORING AND NON-RESTORING DIVISION ALGORITHM
Example :

Division using Restoring division algorithm Division using Non-restoring algorithm


5. FLOATING POINT OPERATIONS
Programming languages support numbers with fractions, which are called reals in
mathematics.

Example:
o 3.14159265…
o 2.71828…
o 0.000000001 or 1.0 × 10^-9
o 3,155,760,000 or 3.15576 × 10^9
SCIENTIFIC NOTATION
Example:

3,155,760,000 or 3.15576 × 10^9

 This number is not a small fraction, but it is too large to be
represented with a 32-bit signed integer.
 The alternative notation for the last two numbers is called scientific
notation, which has a single digit to the left of the decimal point.
 A number in scientific notation that has no leading 0s is called a
normalized number.

Examples for normalized scientific notation:


o 1.0 × 10^-9 (normalized scientific notation)
o 0.1 × 10^-8 (not a normalized scientific notation)
o 10.0 × 10^-10 (not a normalized scientific notation)

Scientific Notation
A notation that renders numbers with a single digit to the left of the decimal point.

Normalized Notation
A number in floating-point notation that has no leading 0s.
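
As a quick illustration, Python's `e` format specifier prints a number in normalized decimal scientific notation, with a single non-zero digit before the decimal point. The variable names below are chosen for illustration only.

```python
small = 0.000000001
large = 3155760000

small_text = f"{small:.5e}"
large_text = f"{large:.5e}"
print(small_text)   # 1.00000e-09
print(large_text)   # 3.15576e+09

# The second value exceeds the largest 32-bit signed integer, 2**31 - 1.
too_big_for_int32 = large > 2**31 - 1
print(too_big_for_int32)   # True
```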

FLOATING POINT REPRESENTATION


This number can be stored in a binary word with three fields:
o Sign : Plus (or) Minus (1 means Negative, 0 means Positive)
o S : Significand (or) Mantissa
o E : Ex
6. FLOATING POINT ADDITION AND SUBTRACTION
o In floating-point arithmetic, addition and subtraction are more
complex than multiplication and division.
o There are four basic phases of the algorithm for addition and subtraction:
1. Changing sign of B for Subtraction and Check for zeros.
2. Align the mantissa
3. Perform Addition
4. Normalize the result.

Phase 1: Changing sign of B for Subtraction and Check for zeros.


 The process begins by changing the sign of the subtrahend if it is a
subtract operation. Next, if either operand is 0, the other is reported
as the result.
Phase 2: Align the mantissa.
 The next phase is to manipulate the numbers so that the two exponents are equal.
 Alignment may be achieved by shifting the smaller number to the right
until the exponents are equal.

Phase 3: Perform Addition.


 Next, the two mantissas are added together along with the signs.
Because the signs may differ, the result may be 0.
 There is also the possibility of significand overflow by 1 digit. If
so, the significand of the result is shifted right and the exponent is
incremented.
 An exponent overflow could occur as a result; this would be reported
and the operation halted.

Phase 4: Normalization.
 The final phase normalizes the result.
 Normalization consists of shifting significand digits left until the most
significant digit (bit, or 4 bits for base-16 exponent) is nonzero.
 Each shift causes a decrement of the exponent and thus could cause an
exponent underflow.
 Finally, the result must be rounded off and then reported

UNIT - III

Building a Data path – Control Implementation Scheme – Pipelining – Pipelined Datapath and
Control – Handling Data Hazards & Control Hazards – Exceptions.
PIPELINING AND HAZARDS

 Datapath design begins by examining the major components required to execute each class of
MIPS instructions.
 These major components are called
datapath elements.
 A datapath element is a unit used to operate on or hold data within a processor.
 In the MIPS implementation, the datapath elements include
 Instruction Memory
 Data Memory
 Register File
 ALU
 Adders

 Building a MIPS datapath consists of
1. DataPath for Fetching the instruction and incrementing the PC
2. DataPath for Executing arithmetic and logic instructions
3. Datapath for Executing a memory-reference instruction
4. DataPath for Executing a branch instruction

1. DATAPATH FOR FETCHING THE INSTRUCTION AND INCREMENTING THE PC


 A memory unit to store the instructions of a program and supply instructions given an address.
 The program counter is used to hold the address of the current instruction.
 An adder to increment the PC to the address of the next instruction.
 To execute any instruction, fetch the instruction from memory.
 To fetch the next instruction, increment the program counter so that it points at the next
instruction, 4 bytes later.
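
The fetch step can be sketched in Python. The instruction memory is modelled as a dictionary from byte addresses to instruction text; this representation, the sample instructions and the function name `fetch` are assumptions made for illustration.

```python
# Instruction memory: byte address -> instruction (4 bytes per instruction)
instruction_memory = {
    0: "add $t1,$t2,$t3",
    4: "lw $t1,8($t2)",
    8: "sw $t1,12($t2)",
}


def fetch(pc):
    """Read the instruction at the PC and return it with the incremented PC."""
    instruction = instruction_memory[pc]
    return instruction, pc + 4      # the adder computes PC + 4


pc = 0
fetched = []
while pc in instruction_memory:
    instruction, pc = fetch(pc)
    fetched.append(instruction)

print(fetched)
print(pc)   # 12
```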

Combined all three elements into single stage


2. DATAPATH FOR EXECUTING ARITHMETIC AND LOGIC INSTRUCTIONS (R-Type)
 The processor’s 32 general-purpose registers are stored in a structure called a register file.
 A register file is a collection of registers in which any register can be read or written by
specifying the number of the register in the file.
 An ALU is used to operate on the values read from the registers.
 It reads two registers, performs an ALU operation on the contents of the registers, and writes the
result to a register.
 These instructions are called either R-type instructions or arithmetic-logical
instructions.
 This instruction class includes add, sub, AND, OR, and slt.
 R-format Instruction Operations :
1. Read the two register operands
2. Perform the arithmetic/logical operation
3. Write the register result
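
The three R-format steps can be sketched in Python. The register file is modelled as a list of 32 integers and the ALU as a table of functions; these modelling choices and the function names are assumptions. The register numbers 9, 10 and 11 correspond to $t1, $t2 and $t3 in the usual MIPS register convention.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value):
    """Interpret a 32-bit pattern as a signed integer."""
    return value - (1 << 32) if value & 0x80000000 else value


ALU_OPS = {
    "add": lambda a, b: (a + b) & MASK32,
    "sub": lambda a, b: (a - b) & MASK32,
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "slt": lambda a, b: int(to_signed(a) < to_signed(b)),
}


def execute_r_type(registers, op, rd, rs, rt):
    a, b = registers[rs], registers[rt]   # 1. read the two register operands
    result = ALU_OPS[op](a, b)            # 2. perform the ALU operation
    if rd != 0:                           # register 0 ($zero) always reads as 0
        registers[rd] = result            # 3. write the register result


registers = [0] * 32
registers[10] = 5                          # $t2
registers[11] = 7                          # $t3
execute_r_type(registers, "add", 9, 10, 11)   # add $t1,$t2,$t3
print(registers[9])   # 12
```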

Combined two elements into single stage


3. DATAPATH FOR EXECUTING A MEMORY-REFERENCE INSTRUCTION
 The MIPS load word and store word instructions have the general form
(i) lw $t1,offset($t2)
(ii) sw $t1,offset ($t2).
 These instructions compute a memory address by adding the base register, which is $t2, to the
16-bit signed offset field contained in the instruction.
 If the instruction is a load, the value read from memory must be written into the register file in the
specified register, which is $t1. Thus, we need both the register file and the ALU.
 If the instruction is a store, the value to be stored must also be read from the register file where it
resides in $t1.
 In addition, we need a unit to sign-extend the 16-bit offset field in the instruction to a 32-bit
signed value, and a data memory unit to read from or write to.
 The data memory must be written on store instructions; hence, it has both read and write control
signals, an address input, as well as an input for the data to be written into memory.
 Load/Store Instructions Operations :
1. Read register operands
2. Calculate the memory address using 16-bit offset
- Use the ALU with the sign-extended offset (a byte offset, so it is not shifted)
3. Load: Read memory and update register ($t1)
4. Store: Write the register value ($t1) to memory at address $t2 + offset
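
The address calculation for lw and sw can be sketched in Python. Data memory is modelled as a dictionary from byte addresses to word values, and the function names are illustrative assumptions. For loads and stores the sign-extended offset is added to the base register directly, with no shift.

```python
def sign_extend_16(value):
    """Sign-extend a 16-bit field to a Python integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(base, offset16):
    """ALU step: base register value plus the sign-extended 16-bit offset."""
    return (base + sign_extend_16(offset16)) & 0xFFFFFFFF


def load_word(registers, memory, rt, base, offset16):
    # lw rt, offset(base): read memory and write the value into the register file
    registers[rt] = memory[effective_address(registers[base], offset16)]


def store_word(registers, memory, rt, base, offset16):
    # sw rt, offset(base): read rt from the register file and write it to memory
    memory[effective_address(registers[base], offset16)] = registers[rt]


registers = [0] * 32
registers[10] = 100            # $t2 holds the base address
memory = {96: 42}              # word stored at byte address 96

load_word(registers, memory, 9, 10, 0xFFFC)   # lw $t1,-4($t2)
store_word(registers, memory, 9, 10, 8)       # sw $t1,8($t2)
print(registers[9], memory[108])              # 42 42
```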

4. DATAPATH FOR EXECUTING A BRANCH INSTRUCTION


 The general form of a Branch Instruction is
beq $t1,$t2,offset.
 The branch datapath does two operations:
(i) Compute the branch target address and
(ii) Compare the register contents.
 Branch Target Address = PC + 4 + (sign-extended offset shifted left by 2 bits)
 When the condition is true (operands are equal), the branch target address becomes the
new PC, and we say that the branch is taken.
 When the condition is false(operands are not equal), the incremented PC should replace
the current PC; we say that the branch is not taken.
 Branch Instruction Operations :
1. Read register operands
2. Compare operands
• Use ALU - Subtract the two operands and
Check for Zero output
3. Calculate target address
• Sign-extend the offset value
• Shift left by 2 bits
• Add to PC + 4
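
The branch steps can be sketched in Python. The function names `branch_target` and `next_pc_beq` are assumptions made for illustration, and the sample PC and offset values are arbitrary.

```python
def sign_extend_16(value):
    """Sign-extend a 16-bit field to a Python integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc, offset16):
    """PC + 4 plus the sign-extended offset shifted left by 2 bits."""
    return (pc + 4 + (sign_extend_16(offset16) << 2)) & 0xFFFFFFFF


def next_pc_beq(pc, rs_value, rt_value, offset16):
    """beq: the ALU subtracts the operands; a Zero result means the branch is taken."""
    zero = ((rs_value - rt_value) & 0xFFFFFFFF) == 0
    return branch_target(pc, offset16) if zero else pc + 4


print(next_pc_beq(100, 5, 5, 3))        # 116 (taken: 100 + 4 + 3*4)
print(next_pc_beq(100, 5, 6, 3))        # 104 (not taken)
print(next_pc_beq(100, 5, 5, 0xFFFE))   # 96  (taken, offset -2)
```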

CREATING A SINGLE (or) COMBINED DATAPATH

Show how to build a datapath for the operational portion of the memory reference
and arithmetic-logical instructions that uses a single register file and a single ALU
to handle both types of instructions, adding any necessary multiplexors.
We can combine the datapath components needed for the individual instruction classes, into a single
datapath and add the control to complete the implementation.

This simplest datapath will execute all instructions in one clock cycle. To share a datapath element
between two different instruction classes, we may need to allow multiple
connections to the input of an element, using a multiplexor and control signal to select among the
multiple inputs.

Step 1
To create a datapath with only a single register file and a single ALU, we must have two different
sources for the second ALU input, as well as two different sources for the data stored into the
register file. Thus, one multiplexor is placed at the ALU input and another at the data input to the
register file.

Step 2
Combine all the pieces to make a simple datapath for the MIPS architecture by adding the
 Datapath for Instruction fetch
 Datapath for Arithmetic-Logical instructions
 Datapath for Memory instructions
 Datapath for Branch instruction
(NOTE : - Write the basic concepts of each datapath here)
Step 3
In the datapath obtained by composing separate pieces,
 The branch instruction uses the main ALU for comparison of the register operands, so
we must keep the adder for computing the branch target address.
 An additional multiplexor is required to select either the sequentially following
instruction address (PC + 4) or the branch target address to be written into the PC.

Step 4
The control unit must be able to take inputs and generate a write signal for each state element, the
selector control for each multiplexor, and the ALU control.

NOTE : DRAW THE DIAGRAM GIVEN IN PAGE NO : 12

3. OPERATIONS OF A DATAPATH

I. Operation of a Datapath for an ALU (R-Type) instruction


II. Operation of a Datapath for a LOAD /STORE instruction
III. Operation of a Datapath for a BRANCH instruction
IV. Operation of a Datapath for a JUMP instruction
NOTE : Write the basic concepts of each datapath and then continue its operation steps

I. Operation of a Datapath for an ALU (R-Type) instruction
 Consider the example,
add $t1,$t2,$t3
 The steps are as follows :
1. The instruction is fetched from the instruction memory and the PC is incremented.
2. Two registers, $t2 and $t3, are read from the register file, and the main control unit
computes the setting of the control lines during this step.
3. The ALU operates on the data read from the register file, using the function code to
generate the ALU function.
4. The result from the ALU is written into the register file using the instruction to
select the destination register ($t1).
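The four steps above can be sketched in Python. This is only an illustration of the flow, not the hardware: the names decode_r_type and execute_add are hypothetical, and the instruction word 0x014B4820 assumes the standard MIPS register numbers ($t1 = 9, $t2 = 10, $t3 = 11) and function code 0x20 for add.

```python
def decode_r_type(word):
    """Split a 32-bit MIPS R-type instruction word into its fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


def execute_add(word, regs):
    """Carry out steps 2 to 4 for an add instruction on a list of 32 registers."""
    fields = decode_r_type(word)
    a = regs[fields["rs"]]             # step 2: read the two source registers
    b = regs[fields["rt"]]
    result = (a + b) & 0xFFFFFFFF      # step 3: ALU add, kept to 32 bits
    regs[fields["rd"]] = result        # step 4: write the destination register
    return result


regs = [0] * 32
regs[10] = 7                           # $t2
regs[11] = 5                           # $t3
execute_add(0x014B4820, regs)          # add $t1,$t2,$t3 (assumed encoding)
```

After the call, regs[9] ($t1) holds 12.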

II. Operation of a Datapath for a LOAD/STORE instruction


 Consider the examples,
(1) lw $t1, offset($t2)
(2) sw $t1, offset($t2)
 The steps are as follows for lw $t1, offset($t2)
1. The instruction is fetched from the instruction memory, and the PC is incremented.
2. A register ($t2) value is read from the register file.
3. The ALU computes the sum of the value read from the register file and the sign-
extended, lower 16 bits of the instruction (offset).
4. The sum from the ALU is used as the address for the data memory.
5. The data from the memory unit is written into the register file; the register
destination is given in the instruction ($t1).
For sw $t1, offset($t2), modify Step 5 as
5. The data from the register file is written into the memory unit; the source
register is given in the instruction ($t1).
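The address calculation shared by lw and sw can be sketched as follows. This is a minimal model under stated assumptions: memory is a Python dictionary keyed by address, registers are a list, and the helper names (sign_extend16, effective_address, lw, sw) are hypothetical.

```python
def sign_extend16(value):
    """Interpret the lower 16 bits of value as a signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(base, offset16):
    """Step 3: the ALU adds the base register and the sign-extended offset."""
    return (base + sign_extend16(offset16)) & 0xFFFFFFFF


def lw(regs, memory, rt, base_reg, offset16):
    """Steps 4 and 5 for a load: memory data is written into register rt."""
    regs[rt] = memory[effective_address(regs[base_reg], offset16)]


def sw(regs, memory, rt, base_reg, offset16):
    """Steps 4 and 5 for a store: register rt is written into memory."""
    memory[effective_address(regs[base_reg], offset16)] = regs[rt]


regs = [0] * 32
regs[10] = 0x1000                      # $t2 holds the base address
memory = {0x0FFC: 99}
lw(regs, memory, 9, 10, 0xFFFC)        # lw $t1, -4($t2)
sw(regs, memory, 9, 10, 8)             # sw $t1, 8($t2)
```

Note that the store takes its data from $t1 and uses $t2 only to form the address.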

III. Operation of a Datapath for a BRANCH instruction


 Consider the example,
beq $t1,$t2,offset
 The steps are as follows :
1. The instruction is fetched from the instruction memory and the PC is incremented.
2. Two registers, $t1 and $t2, are read from the register file.
3. The ALU performs a subtraction on the data values read from the register file.
4. The value of PC + 4 is added to the sign-extended, lower 16 bits of the instruction
(offset) shifted left by two; the result is the branch target address.
5. The Zero result from the ALU is used to decide which adder result to store into the
PC.
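Steps 3 to 5 can be sketched as a small function that returns the next PC value. The function name next_pc_beq and its parameters are hypothetical; the arithmetic follows the steps listed above.

```python
def sign_extend16(value):
    """Interpret the lower 16 bits of value as a signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def next_pc_beq(pc, rs_value, rt_value, offset16):
    """Return the value written into the PC after a beq instruction."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    # Step 4: branch target = (PC + 4) + (sign-extended offset shifted left by two)
    target = (pc_plus_4 + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF
    # Step 3: the ALU subtracts; its Zero output is 1 when the operands are equal
    zero = ((rs_value - rt_value) & 0xFFFFFFFF) == 0
    # Step 5: the Zero output selects which adder result goes into the PC
    return target if zero else pc_plus_4
```

For example, with the PC at 0x1000 and an offset of 3, equal operands give 0x1010 and unequal operands give 0x1004.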
IV. Operation of a Datapath for a JUMP instruction
 Consider the example, j offset
 The steps are as follows :
1. The jump target address is formed from the lower 26 bits of the jump
instruction.
2. The upper 4 bits of PC + 4 are concatenated as the high-order bits.
3. 00 is concatenated as the low-order bits (equivalent to shifting the 26-bit field
left by two), thus yielding a 32-bit address.
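The formation of the jump target address can be sketched as below. The function name jump_target is hypothetical, and the example instruction word 0x08000004 assumes the standard MIPS opcode 000010 for j with a 26-bit address field of 4.

```python
def jump_target(pc, instruction):
    """Form the 32-bit jump target address from the PC and a j instruction word."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    # Lower 26 bits of the instruction, with 00 appended as the low-order bits
    low_28 = (instruction & 0x03FFFFFF) << 2
    # Upper 4 bits of PC + 4 become the high-order bits
    return (pc_plus_4 & 0xF0000000) | low_28
```

With the PC at 0x40000000, jump_target(0x40000000, 0x08000004) gives 0x40000010.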
NOTE : DRAW THE DIAGRAM GIVEN IN PAGE NO : 12

4. CONTROL IMPLEMENTATION SCHEME

• The control unit is used to control the functions of the various units.


• A control unit is added to the simple datapath.
• A simple MIPS implementation covers
 load word (lw), store word (sw), branch equal (beq), and
 arithmetic-logical instructions (add, sub, and, or, set on less than)

ALU Operations ALU Control Lines

 Depending on the instruction class, the ALU will need to perform one of five
functions: AND, OR, add, subtract, or set on less than.

1. For load word and store word instructions - the ALU needs to compute the
memory address by an add operation.
2. For the R-type instructions - the ALU needs to perform one of the five actions
- AND, OR, subtract, add, (or) set on less than.
3. For branch equal - the ALU must perform a subtraction.
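The three rules above can be sketched as a lookup. The 4-bit ALU control values and the R-type function codes used here are the ones commonly given for MIPS; they are assumptions in this sketch, since the table is not reproduced in these notes. The names ALU_CONTROL, FUNCT_TO_OPERATION and alu_operation are hypothetical.

```python
# Assumed ALU control line values for the five functions
ALU_CONTROL = {
    "AND": 0b0000,
    "OR": 0b0001,
    "add": 0b0010,
    "subtract": 0b0110,
    "set on less than": 0b0111,
}

# Assumed MIPS function codes for the R-type instructions covered here
FUNCT_TO_OPERATION = {
    0x20: "add",
    0x22: "subtract",
    0x24: "AND",
    0x25: "OR",
    0x2A: "set on less than",
}


def alu_operation(instruction_class, funct=None):
    """Choose the ALU function from the instruction class (and funct for R-type)."""
    if instruction_class in ("lw", "sw"):
        return "add"                       # rule 1: memory address calculation
    if instruction_class == "R-type":
        return FUNCT_TO_OPERATION[funct]   # rule 2: decided by the function code
    if instruction_class == "beq":
        return "subtract"                  # rule 3: compare by subtraction
    raise ValueError("instruction class not covered by this simple implementation")
```

For instance, ALU_CONTROL[alu_operation("beq")] selects the subtract setting.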
5. PIPELINING

• Pipelining (or) Instruction Pipelining is an implementation technique in which
multiple instructions are overlapped in execution.
• Pipelining is a process of arrangement of hardware elements of the CPU such that its
overall performance is increased.
• The computer pipeline is divided in stages.
• The stages are connected to one another. Each stage completes a part of an instruction
in parallel.
• Pipelining is widely used in modern processors.
• Pipelining is a particularly effective way of organizing concurrent activity in a
computer system.
• It improves performance by organizing the hardware differently, rather than by using
faster circuit technology to build the processor and the main memory.
Advantages :
• Pipelining is a key to make processing fast.
• Pipelining improves system performance in terms of throughput.
• Pipelining makes the system reliable.
Disadvantages:
1. The design of a pipelined processor is complex and costly to manufacture.
2. The latency of an individual instruction increases.

DIFFERENCE BETWEEN SEQUENTIAL EXECUTION AND PIPELINED EXECUTION

SEQUENTIAL EXECUTION : The processor executes a program by fetching and executing
instructions, one after another.
PIPELINED EXECUTION : The processor executes a program by overlapping the
instructions.
PIPELINED EXECUTION / ORGANIZATION
2 - STAGE PIPELINED EXECUTION
 Execution of a program consists of a sequence of fetch and execute steps.
 Let Fi and Ei refer to the fetch and execute steps for instruction Ii.
 A computer has two separate hardware units.
 They are:

Instruction fetch unit

Instruction execution unit
 The instruction fetched by the fetch unit is stored in an intermediate storage buffer.
 This buffer is needed to enable the execution unit to execute the instruction while the fetch
unit is fetching the next instruction.
 The execution results are stored in the destination location specified by the instruction.
 The fetch and execute steps of any instruction can each be completed in one cycle.
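The benefit of overlapping can be estimated with a small calculation. This sketch assumes that every stage takes exactly one clock cycle and that the pipeline never stalls; the function names are hypothetical.

```python
def sequential_cycles(num_instructions, num_stages):
    """Each instruction finishes all of its steps before the next one starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions, num_stages):
    """The first instruction fills the pipeline; each later one completes one cycle after."""
    return num_stages + (num_instructions - 1)
```

For four instructions on the two-stage (fetch and execute) organization, sequential execution needs 8 cycles and pipelined execution needs 5.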

3 - STAGE PIPELINED EXECUTION

 The stages are:


F - Fetch: Read the instruction from the memory
D - Decode : Decode the instruction and fetch the source operand(s)
E - Execute : Perform the operation specified by the instruction

4 - STAGE PIPELINED EXECUTION

 The stages are:
F - Fetch: Read the instruction from the memory
D - Decode : Decode the instruction and fetch the source operand(s)
E - Execute : Perform the operation specified by the instruction
W - Write : Store the result in the destination location

5 - STAGE PIPELINED EXECUTION

• Instruction Fetch - The CPU reads instructions from the address in the memory
whose value is present in the program counter.
• Instruction Decode - Instruction is decoded and the register file is accessed to get the
values from the registers used in the instruction.
• Execute - ALU operations are performed.
• Memory Access - Memory operands are read from or written to the memory location
whose address is given by the instruction.
• Write Back – Computed value is written back to the register.
6 - STAGE PIPELINED EXECUTION

DESIGNING INSTRUCTION SETS FOR PIPELINING

6. PIPELINED DATAPATH AND CONTROL

 In a five-stage pipeline, up to five instructions will be in execution during any clock cycle.
 The stages are:
1. IF : Instruction Fetch
2. ID : Instruction Decode and Register file read
3. EX : Execute (or) Address Calculation of Operands
4. MEM : Data Memory Access
5. WB : Write Back
 The pipeline registers are named for the two stages that they separate.
 The pipeline register between the IF and ID stages is called IF/ID.
 The pipeline register between the ID and EX stages is called ID/EX.
 The pipeline register between the EX and MEM stages is called EX/MEM.
 The pipeline register between the MEM and WB stages is called MEM/WB.
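The naming rule and the stage occupancy can be sketched as follows. The sketch assumes an ideal pipeline with no stalls, in which one instruction enters per clock cycle; the names STAGES, pipeline_registers and occupancy are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_registers(stages):
    """Name each pipeline register after the two stages on either side of it."""
    return [f"{left}/{right}" for left, right in zip(stages, stages[1:])]


def occupancy(num_instructions, cycle):
    """Map each busy stage to the instruction (0-based) it holds in a 1-based cycle."""
    busy = {}
    for position, stage in enumerate(STAGES):
        instruction = cycle - 1 - position
        if 0 <= instruction < num_instructions:
            busy[stage] = instruction
    return busy
```

With five instructions, all five stages are busy in clock cycle 5, with the first instruction in WB and the fifth in IF.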

1. Instruction Fetch
 The instruction is being read from memory using the address in the PC and then placed in the
IF/ID pipeline register.
 The IF/ID pipeline register is similar to the Instruction register.
 The PC address is incremented by 4 and then written back into the PC to be ready for the next
clock cycle.
 This incremented address is also saved in the IF/ID pipeline register in case it is needed later for
an instruction, such as beq.

2. Instruction Decode and Register file Read


 The instruction portion of the IF/ID pipeline register supplies the 16-bit immediate field, which is
sign-extended to 32 bits, and the register numbers to read the two registers.
 All three values are stored in the ID/EX pipeline register, along with the incremented PC address.
 Transfer everything that might be needed by any instruction during a later clock cycle.
3. Execute (or) Address Calculation of Operands
 The load instruction reads the contents of register-1 and the sign-extended
immediate from the ID/EX pipeline register and adds them using the ALU.
 That sum is placed in the EX/MEM pipeline register.

4. Memory Access
 The top portion of Figure shows the load instruction reading the data memory using the address
from the EX/MEM pipeline register and loading the data into the MEM/WB pipeline register.

5. Write Back
 The bottom portion of Figure shows the final step: reading the data from the MEM/WB pipeline
register and writing it into the register file in the middle of the figure.

PIPELINED CONTROL SIGNALS

 The first step is to label the control lines on the existing datapath.
 To specify control for the pipeline, we need only set the control values during each pipeline
stage.
 The control lines are divided into five groups according to the pipeline stage.
1. Instruction fetch - No Controls.
2. Instruction decode/register file read - No Controls.
3. Execution/address calculation – RegDst, ALUOp & ALUSrc.
4. Memory access – PCSrc (Branch), MemRead & MemWrite.
5. Write-back – MemtoReg & RegWrite
 The control signals are then used in the appropriate pipeline stage as the instruction moves
down the pipeline.
 In the control lines for the stages,
o Four of the nine control lines are used in the EX stage.
o The remaining five control lines are passed on to the EX/MEM pipeline register,
which is extended to hold the control lines.
o Three are used during the MEM stage.
o The last two are passed to MEM/WB for use in the WB stage.
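The way the control lines thin out as an instruction moves down the pipeline can be sketched as below. The sketch assumes that ALUOp is counted as two lines (written here as ALUOp1 and ALUOp0) so that the total is nine; the names CONTROL_LINES and lines_held are hypothetical.

```python
# Control lines grouped by the stage in which they are used
CONTROL_LINES = {
    "EX": ["RegDst", "ALUOp1", "ALUOp0", "ALUSrc"],
    "MEM": ["Branch", "MemRead", "MemWrite"],
    "WB": ["MemtoReg", "RegWrite"],
}

# Stages that still lie ahead of each pipeline register
STAGES_AHEAD = {
    "ID/EX": ["EX", "MEM", "WB"],
    "EX/MEM": ["MEM", "WB"],
    "MEM/WB": ["WB"],
}


def lines_held(pipeline_register):
    """Return the control lines a pipeline register must carry for later stages."""
    return [line
            for stage in STAGES_AHEAD[pipeline_register]
            for line in CONTROL_LINES[stage]]
```

ID/EX carries all nine lines, EX/MEM carries five, and MEM/WB carries the last two.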

7. PIPELINE HAZARDS

INTRODUCTION

• Any condition that causes a pipeline to stall (delay) is called a hazard.

• Hazards are problems with the instruction pipeline in the CPU that arise when the next
instruction cannot execute in the following clock cycle.

• Hazards are categorized into three types:

1. Structural Hazard – The situation when two instructions require the use of a
given hardware resource at the same time.

2. Data Hazard – Any condition in which either the source or the destination
operands of an instruction are not available at the time expected in the pipeline.
So some operation has to be delayed, and the pipeline stalls.

3. Instruction Hazard – A delay in the availability of an instruction causes the
pipeline to stall. This type of hazard occurs when the pipeline makes the wrong
decision on a branch prediction and therefore brings instructions into the pipeline
that must subsequently be discarded.
STRUCTURAL HAZARD

• A structural hazard occurs when two or more instructions that are already in the pipeline
need the same resource.
• These hazards are because of conflicts due to insufficient resources.
• The result is that the instructions must be executed in series rather than in parallel for a
portion of the pipeline.
• Structural hazards are sometimes referred to as resource hazards.
• Example:

A situation in which multiple instructions are ready to enter the execute instruction
phase and there is a single ALU (Arithmetic Logic Unit).

One solution to such a resource hazard is to increase the available resources, such as having
multiple ALUs.

DATA HAZARD

A data hazard occurs when there is a conflict in the access of an operand location. There are
three types of data hazards. They are
Read After Write (RAW) or True Dependency:
• An instruction modifies a register or memory location and a succeeding instruction
reads the data in that memory or register location.
• A RAW hazard occurs if the read takes place before the write operation is complete.
• Example
I1: R2←R5 + R3
I2: R4←R2 + R3
Write After Read (WAR) or Anti Dependency:
• An instruction reads a register or memory location and a succeeding instruction writes
to the location.
• A WAR hazard occurs if the write operation completes before the read operation takes
place.
• Example
I1: R4←R1 + R5
I2: R5←R1 + R2

Write After Write (WAW) or Output Dependency:


• Two instructions both write to the same location.
• A WAW hazard occurs if the write operations take place in the reverse order of the
intended sequence.
• Example:
I1: R2←R4 + R7
I2: R2←R1 + R3
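The three definitions can be checked mechanically on a pair of instructions. In this sketch each instruction is described by its destination register and a tuple of its source registers; the function name classify_dependency is hypothetical. It reports the dependency only; whether it becomes an actual hazard depends on the timing of the pipeline.

```python
def classify_dependency(first, second):
    """Classify dependencies between two instructions given as (dest, sources)."""
    first_dest, first_sources = first
    second_dest, second_sources = second
    kinds = []
    if first_dest in second_sources:
        kinds.append("RAW")    # second reads what first writes
    if second_dest in first_sources:
        kinds.append("WAR")    # second writes what first reads
    if first_dest == second_dest:
        kinds.append("WAW")    # both write the same location
    return kinds


raw = classify_dependency(("R2", ("R5", "R3")), ("R4", ("R2", "R3")))
war = classify_dependency(("R4", ("R1", "R5")), ("R5", ("R1", "R2")))
waw = classify_dependency(("R2", ("R4", "R7")), ("R2", ("R1", "R3")))
```

The three calls use the three example pairs given above and yield RAW, WAR and WAW respectively.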
INSTRUCTION / CONTROL / BRANCH HAZARD
• An instruction (or) control (or) branch hazard, occurs when the pipeline makes the
wrong decision on a branch prediction and therefore brings instructions into the
pipeline that must subsequently be discarded.
• Whenever the stream of instructions supplied by the instruction fetch unit is
interrupted, the pipeline stalls.

8. HANDLING DATA HAZARDS (or) DATA DEPENDENCY

 NOTE : - Write about Data Hazards from Page No: 20 and then continue this.

 Consider the two instructions:



Add R2, R3, #100

Subtract R9, R2, #30

 The destination register R2 for the Add instruction is a source register for the
Subtract instruction.
 There is a data dependency between these two instructions, because register R2 carries
data from the first instruction to the second.

 There are two techniques using which we can handle data hazards.
 They are
(1) Using Operand Forwarding (2) Using Software

Handling Data Dependencies Using Operand Forwarding


 Pipeline stalls due to data dependencies can be reduced through the use of operand
forwarding.
 Rather than stalling the instruction, the hardware can forward the value from the result register to the
ALU input through multiplexers.
 The second instruction can get the data directly from the output of the ALU as soon as the previous
instruction has computed it, without waiting for it to be written into the register file.
 A special arrangement needs to be made to “forward” the output of the ALU to the input of the ALU.
Example :
I1: ADD R1,R2,R3
I2: SUB R4,R1,R5
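The effect of forwarding on this example can be sketched in Python. The multiplexer at the ALU input is modeled by the hypothetical function read_operand, and the register values are made up for illustration.

```python
def read_operand(register, regs, forwarded):
    """Model the ALU input multiplexer.

    forwarded is (destination register, ALU output) of the previous
    instruction, or None when nothing is being forwarded.
    """
    if forwarded is not None and forwarded[0] == register:
        return forwarded[1]            # take the value from the ALU output
    return regs[register]              # otherwise read the register file


regs = {"R1": 0, "R2": 10, "R3": 20, "R5": 4}

# I1: ADD R1,R2,R3 has computed its result, but R1 is not yet written
alu_output = regs["R2"] + regs["R3"]
forwarded = ("R1", alu_output)

# I2: SUB R4,R1,R5 picks up the new value of R1 through forwarding
sub_result = read_operand("R1", regs, forwarded) - read_operand("R5", regs, forwarded)
```

With forwarding, SUB sees R1 = 30 and produces 26; reading the stale register file value would have given -4.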

Handling Data Dependencies Using Software


 An alternative approach is to leave the task of detecting data dependencies and dealing with them to the software (the compiler).

 When the compiler identifies a data dependency between two successive instructions Ij and
Ij+1, it can insert three explicit NOP (No-operation) instructions between them.
 The NOPs introduce the necessary delay to enable instruction Ij+1 to read the new value
from the register file after it is written.
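The compiler's task can be sketched as a pass over the instruction list. Each instruction is represented here as a tuple (text, destination, sources); this representation and the function name insert_nops are hypothetical, and the count of three NOPs follows the text above.

```python
NOP = ("NOP", None, ())


def insert_nops(program, count=3):
    """Insert NOPs between two successive instructions with a data dependency."""
    result = []
    previous_dest = None
    for instruction in program:
        _, dest, sources = instruction
        # Ij+1 reads the register that Ij writes: delay it with NOPs
        if previous_dest is not None and previous_dest in sources:
            result.extend([NOP] * count)
        result.append(instruction)
        previous_dest = dest
    return result


program = [
    ("Add R2, R3, #100", "R2", ("R3",)),
    ("Subtract R9, R2, #30", "R9", ("R2",)),
]
padded = insert_nops(program)
```

For the Add/Subtract pair, the padded program contains five entries: Add, three NOPs, then Subtract.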

9. HANDLING INSTRUCTION HAZARDS (or) CONTROL HAZARDS

NOTE : - Write about Instruction Hazards from Page No :21 and then continue
this
A variety of approaches have been taken for dealing with instruction/control/branch hazards
(conditional branches):
1) Multiple Streams
2) Prefetch Branch Target
3) Loop Buffer
4) Branch Prediction
5) Delayed Branch

1) MULTIPLE STREAMS
o The approach is to replicate the initial portions of the pipeline and allow
the pipeline to fetch both instructions, making use of multiple streams.
o There are two problems with this approach:
1. Contention delays for access to the registers and to memory.
2. Additional branch instructions may enter the pipeline before the
original branch decision is resolved.
2) PREFETCH BRANCH TARGET
o When a conditional branch is recognized, the target of the branch is prefetched,
in addition to the instruction following the branch.
o This target is then saved until the branch instruction is executed.
o If the branch is taken, the target has already been prefetched.
3) LOOP BUFFER
o A loop buffer is a small, very-high-speed memory maintained by the instruction
fetch stage of the pipeline and containing the ‘n’ most recently fetched
instructions, in sequence.
o If a branch is to be taken, the hardware first checks whether the branch target is within
the buffer. If so, the next instruction is fetched from the buffer.
4) BRANCH PREDICTION
o To reduce the branch penalty, the processor needs to anticipate that an instruction
being fetched is a branch instruction and predict its outcome to determine which
instruction should be fetched.
o It is generally of two types:
 Static Branch Prediction
 Dynamic Branch Prediction
o Static Branch Prediction - Assumes that the branch will not be taken and fetches the
next instruction in sequential address order.
o Dynamic Branch Prediction - Uses the recent branch history, to see if a branch was
taken the last time this instruction was executed.

o Techniques for Branch Prediction


Various techniques can be used to predict whether a branch will be taken. The most
common are the following:
 Predict never taken
 Predict always taken
 Predict by opcode
 Taken/not taken switch
 Branch history table

o Branch Prediction Buffer (or) Branch History Table


One implementation of that approach is a branch prediction buffer or branch history
table.

A branch prediction buffer is a small memory indexed by the lower portion of the
address of the branch instruction. The memory contains a bit that says whether the branch
was recently taken or not.
 A branch predictor tells us whether or not a branch is taken.
 The branch target address must still be calculated.
 A cache, called the branch target buffer, can be used to hold the branch target address.
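A branch prediction buffer with one bit per entry can be sketched as below. The class name, the number of index bits, and the choice to drop the two low-order address bits (assuming word-aligned instructions) are assumptions made for this illustration.

```python
class BranchPredictionBuffer:
    """Small memory indexed by the lower portion of the branch address."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        self.taken_bit = [False] * (1 << index_bits)

    def _index(self, address):
        # Drop the two low-order bits, then keep the lower index bits
        return (address >> 2) & self.mask

    def predict(self, address):
        """Return True if the branch was taken the last time it executed."""
        return self.taken_bit[self._index(address)]

    def update(self, address, taken):
        """Record the actual outcome so that it guides the next prediction."""
        self.taken_bit[self._index(address)] = taken


def count_mispredictions(buffer, address, outcomes):
    """Run one branch through a sequence of actual outcomes."""
    wrong = 0
    for taken in outcomes:
        if buffer.predict(address) != taken:
            wrong += 1
        buffer.update(address, taken)
    return wrong


loop_outcomes = [True] * 9 + [False]       # loop branch: taken 9 times, then exits
misses = count_mispredictions(BranchPredictionBuffer(), 0x400, loop_outcomes)
```

For this loop branch the one-bit scheme mispredicts twice: once on the first iteration and once on the loop exit.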

o Branch Prediction Flowchart



 If the instruction is predicted as taken, fetching begins from the target as soon as the PC
is known; it can be as early as the ID stage.
 If the instruction is predicted as not taken, sequential fetching and executing continue.
 If the prediction turns out to be wrong, the prediction bits are changed.
UNIT IV
I/O SYSTEMS
Peripheral Devices and their Characteristics (L1), Input-Output Subsystems (L2), I/O Device
Interface (L2) – SCSI (L2), USB (L2) - I/O Transfers (L2) – Program Controlled (L2) -
Interrupt Driven and DMA (L2) – Privileged and Non-Privileged Instructions (L2) - Software
Interrupts and Exceptions (L2).

PERIPHERAL DEVICES
A computer peripheral, technically speaking, is any device that connects to the computing unit
but is not part of the core architecture of the computing unit. The core computing unit consists of the central processing
unit (CPU), motherboard, and power supply. The case that surrounds these elements is also considered part of the core
computing unit. So anything that is connected to these elements is considered a peripheral.
o The input-output subsystem of a computer, referred to as I/O, provides an efficient
mode of communication between the central system and the outside environment.
o Programs and data must be entered into computer memory for processing and results
obtained from computations must be recorded or displayed for the user.

I/O INTERFACES
 Input-Output interface provides a method for transferring information between
internal storage and external I/O devices.
 The I/O bus from the processor is attached to all peripheral interfaces.
 To communicate with a particular device, the processor places a device address on the
address lines.
 The I/O bus consists of data lines, address lines, and control lines.
 The I/O interface consists of an address decoder, control circuits, a data register and a
status register to coordinate the I/O transfers.

 The address decoder enables the device to recognize its address when this address
appears on the address lines.
 The data register holds the data. A data command causes the interface to respond by
transferring data from the bus into one of its registers.
 The status register contains information. A status command is used to test various
status conditions in the interface and the peripheral.
 A control command is issued to activate the peripheral and to inform it what to do.
I/O INTERFACING TECHNIQUES
o I/O devices can be interfaced to a computer system in two ways, which are called
interfacing techniques.
o They are
 Memory mapped I/O
 I/O mapped I/O (Isolated I/O)

Memory Mapped I/O


 Memory-mapped I/O uses the same address
space to address both memory and I/O devices.
 The memory and registers of the I/O devices
are mapped to address values.
 So when an address is accessed by the CPU, it
may refer to a portion of physical RAM, or it can
instead refer to memory of the I/O device.

I/O mapped I/O (Isolated I/O)


 I/O mapped I/O (also known as
port mapped I/O or isolated I/O)
uses a separate, dedicated address
space and is accessed via a
dedicated set of microprocessor
instructions.
I/O PROCESSORS (IOP)
o The Input Output Processor (IOP) is just like a CPU that handles the details of I/O
operations.
o The IOP can fetch and execute its own instructions that are specifically designed to
characterize I/O transfers.
o In addition to the I/O related tasks, it can perform other processing tasks like arithmetic,
logic, branching and code translation.
o The main memory unit takes the pivotal role. It communicates with the processor by
means of DMA.
o The IOP takes care of input and output tasks, relieving the CPU of the details of I/O transfers.
o IOP is a specialized processor which loads and stores data into memory along with the
execution of I/O instructions.
o IOP acts as an interface between system and devices.

o It involves a sequence of events to execute I/O operations and then store the results into
the memory.
CPU - IOP COMMUNICATION
MODES OF I/O DATA TRANSFER
Data transfer to and from I/O devices may be handled in one of three possible modes:
1. Programmed I/O
2. Interrupt-initiated I/O
3. Direct memory access (DMA)

Programmed I/O :
o When the processor is executing a program and encounters an
instruction relating to I/O, it executes that instruction by
issuing a command to the appropriate I/O module.
o The I/O module performs the requested action but takes no
further action to alert the processor; in particular, it does not
interrupt the processor.
o The processor periodically checks the status of the I/O module
until it finds that the operation is complete.
o The processor is responsible for extracting data from main
memory for output and storing data in main memory for input.
o Thus, the instruction set includes I/O instructions in the
following categories:
 Control : Used to activate an external device and tell it
what to do.
 Status : Used to test various status conditions associated
with an I/O module and its peripherals.
 Transfer : Used to read and/or write data between
processor registers and external devices.
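The polling behaviour of programmed I/O can be sketched with a simulated I/O module. The class IOModule, its three methods (one per instruction category: control, status, transfer) and the busy_polls parameter are all hypothetical and exist only for this illustration.

```python
class IOModule:
    """Simulated I/O module that becomes ready after a number of status checks."""

    def __init__(self, data, busy_polls):
        self._data = data
        self._remaining = busy_polls

    def command_read(self):
        """Control: tell the device to start a read."""
        pass

    def status_ready(self):
        """Status: report whether the operation is complete."""
        if self._remaining > 0:
            self._remaining -= 1
            return False
        return True

    def read_data(self):
        """Transfer: move the data into a processor register."""
        return self._data


def programmed_read(module):
    """The processor issues the command and then polls until the module is ready."""
    module.command_read()
    wasted_polls = 0
    while not module.status_ready():
        wasted_polls += 1              # the processor does no useful work here
    return module.read_data(), wasted_polls
```

The count of wasted polls shows the drawback of this mode: the processor is kept busy checking the status while the device works.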

Interrupt-Driven I/O :
o An alternative to Programmed I/O is for the processor to issue an
I/O command to a module and then go on to do some other useful
work.
o The I/O module will then interrupt the processor to request service
when it is ready to exchange data with the processor.
o The processor then executes the data transfer and then resumes its
former processing.
o The processor issues a READ command. The I/O module receives a
READ command from the processor and then proceeds to read data
in from the device.
o Once the data are in the I/O module’s data register the module
signals an interrupt to the processor over a control line.
o When the interrupt from the I/O module occurs, the processor saves
the context of the program it is currently executing and begins to
execute an interrupt-handling program that processes the interrupt.
o Interrupt-driven I/O is more efficient than programmed I/O because
it eliminates needless waiting.
Direct Memory Access :
o When large volumes of data are to be moved, a more efficient technique is required:
direct memory access (DMA).
o The DMA function can be performed by a separate
module on the system bus or it can be incorporated into an
I/O module.
o When the processor wishes to read or write a block of
data, it issues a command to the DMA module, by sending
to the DMA module the following information:
• Whether a read or write is requested
• The address of the I/O device involved
• The starting location in memory to read data from
or write data to
• The number of words to be read or written
o The processor then continues with other work. It has delegated this I/O operation to
the DMA module, and that module will take care of it.
o The DMA module transfers the entire block of data, one word at a time, directly to or
from memory without going through the processor. When the transfer is complete, the
DMA module sends an interrupt signal to the processor.
o Thus the processor is involved only at the beginning and end of the transfer.
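The command given to the DMA module and the resulting block transfer can be sketched as below. The class DMAModule and its parameter names are hypothetical; "read" is assumed here to mean a transfer from the device into memory, and the interrupt is modeled as a callback. In real hardware the processor continues with other work while the transfer proceeds, which this sequential sketch does not show.

```python
class DMAModule:
    """Simulated DMA module that moves a block one word at a time."""

    def __init__(self, memory, devices):
        self.memory = memory           # list of memory words
        self.devices = devices         # device address -> list of device words

    def start(self, is_read, device_address, memory_start, word_count, interrupt):
        """Carry out the transfer described by the four items of information."""
        device = self.devices[device_address]
        for i in range(word_count):
            if is_read:
                self.memory[memory_start + i] = device[i]    # device to memory
            else:
                device[i] = self.memory[memory_start + i]    # memory to device
        interrupt()                    # signal the processor when complete


memory = [0] * 8
interrupts = []
dma = DMAModule(memory, {5: [1, 2, 3]})
dma.start(True, 5, 2, 3, lambda: interrupts.append("transfer complete"))
```

After the call, memory locations 2 to 4 hold the three device words and one interrupt has been recorded.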
COMPARISON BETWEEN PROGRAMMED I/O AND INTERRUPT DRIVEN I/O
7. INTERRUPTS

o An interrupt is defined as a hardware- or software-generated event, external to the currently
executing process, that affects the normal flow of instruction execution.
o The processor responds by suspending its current activities, saving its state, and
executing a function called an interrupt handler (or an interrupt service routine, ISR)
to deal with the event.
o This interruption is temporary, and, after the interrupt handler finishes, the processor
resumes normal activities.

CLASSES OF INTERRUPTS

TYPES OF INTERRUPTS
There are two types of interrupts:
1. Hardware interrupts
2. Software interrupts
Hardware Interrupts :
o Used by devices to communicate that they require attention from the operating
system.
o For example, pressing a key on the keyboard (or) moving the mouse triggers
hardware interrupts that cause the processor to read the keystroke or mouse
position.
Software Interrupts :
o Caused either by an exceptional condition in the processor itself, or a
special instruction in the instruction set which causes an interrupt when it is
executed.
o Example : Divide-by-zero exception
STEPS IN INTERRUPT PROCESSING
1. The device issues an interrupt signal to the processor.
2. The processor finishes execution of the current instruction
before responding to the interrupt.
3. The processor tests for an interrupt, determines
that there is one, and sends an
acknowledgment signal to the device that
issued the interrupt. The acknowledgment
allows the device to remove its interrupt signal.
4. The processor prepares to transfer control to the interrupt routine
by saving the PSW and the program counter on the stack.
5. The processor now loads the program counter
with the entry location of the interrupt-
handling program that will respond to this
interrupt.
6. Once the program counter has been loaded, the
processor proceeds to the next instruction
cycle, which begins with an instruction fetch.
The contents of the processor registers need to
be saved, because these registers may be used
by the interrupt handler. So all of these values,
plus any other state information, need to be saved.
7. The interrupt handler next processes the interrupt.
8. When interrupt processing is complete, the saved
register values are retrieved from the stack and restored to the registers.
9. The final act is to restore the PSW and program counter values from the stack.
10. As a result, the next instruction to be executed will be from the previously interrupted
program.
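Steps 4 to 9 can be sketched with a dictionary standing in for the processor state and a list standing in for the stack. The function service_interrupt, the handler, and the entry address 0x80 are hypothetical.

```python
def service_interrupt(cpu, stack, handler_entry, handler):
    """Save state, run the interrupt handler, then restore state."""
    stack.append((cpu["psw"], cpu["pc"]))     # step 4: save PSW and PC
    cpu["pc"] = handler_entry                 # step 5: load PC with handler entry
    stack.append(list(cpu["regs"]))           # step 6: save processor registers
    handler(cpu)                              # step 7: process the interrupt
    cpu["regs"] = stack.pop()                 # step 8: restore the registers
    cpu["psw"], cpu["pc"] = stack.pop()       # step 9: restore PSW and PC


seen_pc = []


def keyboard_handler(cpu):
    """Example handler: it may overwrite registers freely."""
    seen_pc.append(cpu["pc"])
    cpu["regs"][0] = 999


cpu = {"pc": 100, "psw": 1, "regs": [1, 2, 3]}
stack = []
service_interrupt(cpu, stack, 0x80, keyboard_handler)
```

After the call the processor state is exactly as it was before the interrupt, so the interrupted program resumes at its next instruction.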

8. DIRECT MEMORY ACCESS (DMA)


o Direct Memory Access (DMA) is a mechanism that allows an input/output (I/O)
device to send or receive data directly to or from the memory without involving the
processor.
o DMA is implemented with a specialized controller called DMA controller.
o DMA Controller is a control unit that transfers blocks of data between an I/O device and
memory independent of the processor.
o DMA controller provides an interface between the bus and the input-output devices.
o More than one external device can be connected to the DMA controller.
o DMA controller contains an address unit, for generating addresses and selecting I/O
device for transfer.
o It also contains the control unit and data count for keeping counts of the number of blocks
transferred and indicating the direction of transfer of data.
o When the transfer is completed, DMA informs the processor by raising an interrupt.

CPU SIGNALS FOR DMA TRANSFER

Bus Request :
o It is used by the DMA controller to request the CPU to relinquish(release) the control
of the buses.
Bus Grant :
o It is activated by the CPU to inform the external DMA controller that the buses are
in high impedance state and the requesting DMA can take control of the buses.
o Once the DMA has taken the control of the buses, it transfers the data.
STEPS IN DMA TRANSFER
o DMA transfer is controlled by the DMA controller.
o The DMA Controller requests the control of the buses from the CPU.
o After gaining control, the DMA controller performs read and write operations directly
between devices and memory.

o DMA requires the CPU to provide two additional bus signals:
 The Hold (HLD) signal is an input to the CPU through which the DMA
controller asks for ownership of the bus.
 The Hold Acknowledge (HLDA) signal indicates that the buses have been granted.
o The CPU will finish all pending bus operations before granting control of the bus to
the DMA controller.
o Once the DMA controller gets the control of the buses, it can perform any transaction
(reads and writes) using the same bus.
o After the transaction is finished, the DMA controller returns the bus to the CPU.

MODES OF DMA OPERATION


There are three modes of DMA operation.
They are
Byte Transfer (or) Cycle Stealing DMA Transfer : In this mode, the DMA controller gives control
of the buses back to the CPU after the transfer of every byte.

Burst DMA Transfer : In this mode, the DMA controller hands the buses back to the CPU only after
completion of the whole data transfer.

Transparent DMA Transfer : Here, the DMA controller transfers data only when the CPU is executing an
instruction which does not require the use of the buses.
(a) Byte (or) Cycle stealing DMA transfer Mode

(b) Burst DMA Transfer Mode


(c) Transparent DMA transfer Mode

9. BUS STRUCTURE
o There are many ways to connect different parts inside a computer together such as
processor, memory, I/O devices.
o The simplest and most common way of interconnecting various parts of the computer
is a bus.
o A group of lines that serves as a connecting path for several devices is called a bus.
o In addition to the lines that carry data, a bus must have lines for address and control.
o A bus that connects major computer components/modules (CPU, memory, I/O) is
called a System Bus.

SYSTEM BUS
 The system bus is a set of conductors that connects the CPU, memory and I/O modules.
 The system bus is separated into three functional groups:
 Data Bus
 Address Bus
 Control Bus
Data Bus
 The data bus consists of 8, 16, 32 or more parallel signal lines.
 The data bus lines are bi - directional.
 It means that CPU can read data on these lines from memory or from a port as well as
send data out on these lines to a memory location or a port.
 The data bus is connected in parallel to all peripherals.
 The communication between peripherals and CPU is activated by giving output enable
pulse to the peripheral.
Address Bus
 It is a unidirectional bus.
 The address bus consists of 16, 20, 24 or more parallel signal lines.
 On these lines the CPU sends out the address of the memory location or I/O port that
is to be written or read from.
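The number of address lines fixes how many locations the CPU can address. A short calculation, using the hypothetical function addressable_locations, makes this concrete for the widths listed above.

```python
def addressable_locations(address_lines):
    """Each address line carries one bit, so n lines select 2**n locations."""
    return 2 ** address_lines
```

For example, 16 lines give 65,536 (64 K) locations, 20 lines give 1,048,576 (1 M) and 24 lines give 16,777,216 (16 M).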
Control Bus
 The control lines regulate the activity on the bus.
 The CPU sends signals on the control bus to enable the outputs of addressed memory
devices or port devices.

ELEMENTS OF BUS DESIGN

TYPES OF BUS STRUCTURES


The types of bus structure are:
(1) Single bus structure and (2) Multiple bus structure

Single Bus Structure


o The simplest way is to interconnect functional units to a single bus.

o The single bus can be used for only one transfer at a time; only two units can actively
use the bus at any given time.
o Bus control lines are used to arbitrate multiple requests for use of one bus.
o The devices connected to a bus vary widely in their speed of operation.
o Some devices are relatively slow, such as printers and keyboards.
o Some devices are considerably faster, such as optical disks.
o Memory and processor units are the fastest parts of a computer.
Advantage:
 Low cost.
 It is very flexible for attaching many peripherals.

Multiple Bus Structure


o To improve performance, a multiple-bus structure can be used.
o In a two-bus structure : One bus can be used to fetch instructions and the other bus
can be used to fetch data required for execution, thus improving the performance, but
the cost increases.
o In a three-bus structure : Two buses are used to transfer source operands to the inputs of the ALU,
and the result is transferred to the destination over the third bus.
o Different units in a system have different speeds. Some units are very fast.
o All these devices communicate with each other over the same bus.
o In order to communicate smoothly, buffer registers are included with the devices to
hold the data during transfer.

BUS STANDARDS / BUS INTERFACES


o A bus standard provides flexibility.
o It can connect multiple devices from different manufacturers.
o An I/O bus standard defines the connectors, signals and clock speeds.
o There are a number of existing I/O bus standards that are used.
o Most of these bus standards are parallel buses; USB is a serial bus.
o A parallel bus interfaces the system memory bus through a bridge or a switching circuit.
o The different types of bus standards are
1) ISA (Industry Standard Architecture)
2) MCA (Micro Channel Architecture)
3) EISA (Extended Industry Standard Architecture)
4) PCI (Peripheral Component Interconnect)
5) USB (Universal Serial Bus)

ISA - Industry Standard Architecture


o ISA is the oldest of the bus standards.
o Older PCs expose the ISA bus in the form of ISA slots on the
motherboard; current motherboards no longer include them.
o ISA is a standard bus architecture that is associated with the IBM AT motherboard.
o An ISA bus provides a basic route for peripheral devices that are attached to a
motherboard to communicate with different circuits or other devices that are also
attached to the same motherboard.
o For example, an ISA bus may be used to connect a sound card, video card, network
card, disk drives or an extra serial port.

MCA - Micro Channel Architecture


o MCA was introduced by IBM in 1987.
o MCA was a proprietary 16 or 32-bit parallel computer bus which was used until the
mid-1990s.
o MCA offers several significant improvements over ISA.
o The MCA bus was IBM's attempt to replace the ISA bus with something "bigger and
better".
o The primary downfall of the MCA bus was that it was a proprietary bus and required
licensing fees.
o Because of its proprietary format and competing standards, the MCA bus never
became widely used.

EISA - Extended Industry Standard Architecture


o EISA was introduced in 1988 and was designed by nine competitors to compete with
IBM's bus.
o These competitors were AST Research, Compaq, Epson, Hewlett Packard, NEC,
Olivetti, Tandy, WYSE, and Zenith Data Systems.
o When they developed EISA, they avoided the two key mistakes that IBM made.
 First, they made it compatible with the ISA bus.
 Second, they opened the design to all manufacturers instead of keeping it
proprietary
o EISA is a 32-bit bus and was found on Intel 80386, 80486 and early
Pentium computers.
o EISA was sometimes found in network file servers.
o The EISA bus is virtually non-existent on desktop systems for several reasons.
 First, EISA-based systems tend to be much more expensive than other types of
systems.
 Second, there are few EISA-based cards available.
 Finally, the performance of this bus is quite low compared to the popular local
buses.
o Although the EISA bus is backwards compatible and not a proprietary bus, it
never became widely used and is no longer found in computers today.

PCI - Peripheral Component Interconnect


o PCI is a local bus standard introduced by Intel in 1992, revised in 1993 to version
2.0, and revised again in 1995 to PCI 2.1.
o The PCI bus was the de facto standard bus for personal computers of its generation.
o It is a high-performance bus that is used to integrate chips, processor, memory
subsystems and expansion boards.
o Originally it was a 32-bit bus, but it later also supported 64-bit transfers. It was the most
commonly used bus in computers during the late 1990s and early
2000s.
o The PCI architecture was designed as a replacement for the ISA standard, with three
main goals:
 To get better performance when transferring data between the computer and its
peripherals
 To be as platform independent as possible
 To simplify adding and removing peripherals to the system.
o Of the parallel buses listed above, it is the only one that can carry 64 bits of data, which is useful for Pentium processors.
o PCI is three times faster than either EISA or MCA.
o It is well suited to high-speed data transfers in graphics applications and is popular in
embedded systems.
o Today's computers and motherboards have replaced PCI with PCI Express (PCIe)
slots.
USB - Universal Serial Bus
o The Universal Serial Bus (USB) is the most widely used interconnection standard.
o A large variety of devices are available with a USB connector, including mice,
memory keys, disk drives, printers, cameras, and many more.
o The commercial success of the USB is due to its simplicity and low cost.
o The original USB specification supports two speeds of operation, called low-speed
(1.5 Megabits/s) and full-speed (12 Megabits/s). Later, USB 2, called High-Speed
USB, was introduced.
o It enables data transfers at speeds up to 480 Megabits/s.
o As I/O devices continued to evolve toward even higher speeds, USB 3 (called
SuperSpeed) was developed.
11. UNIVERSAL SERIAL BUS (USB)

o The Universal Serial Bus (USB) is the most widely used interconnection standard.
o USB gives fast and flexible interface for connecting all kinds of peripherals.
o USB was released in 1996 and is currently maintained by the USB Implementers
Forum (USB-IF).
o USB is a common interface that enables communication
between devices and a host controller such as a personal computer (PC).
o A large variety of devices are available with a USB connector, including mice,
memory keys, disk drives, printers, cameras, and many more.
o Because of its wide variety of uses, USB has replaced a wide range of interfaces
such as the parallel and serial ports.
o USB is intended to enhance plug-and-play and allow hot swapping.
 Plug-and-Play enables the operating system (OS) to automatically discover
and configure a new peripheral device without having to restart the computer.
 Hot Swapping allows removal and replacement of a peripheral without
having to reboot.

USB FEATURES
1. Simple Connectivity
2. Simple Cables
3. One interface for many devices
4. Automatic Configuration
5. No user setting
6. Frees hardware resources for other devices
7. Hot pluggable (Plug-and-Play)
8. Data transfer rates
9. Co-existence with IEEE standard
10. Reliability
11. Low cost
12. Low power consumption
13. Flexibility
14. Operating system support

USB VERSIONS
There have been several major USB standards, USB4 being the newest.
Most USB devices and cables today adhere to USB 2.0, and a growing number to USB 3.0.

Version    Also called        Transmission rate
USB 4.0    --                 40 Gbps
USB 3.2    SuperSpeed+ USB    20 Gbps
USB 3.1    SuperSpeed+ USB    10 Gbps
USB 3.0    SuperSpeed USB     5 Gbps
USB 2.0    High-Speed USB     480 Mbps
USB 1.1    Full Speed USB     12 Mbps
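
To put the transmission rates in the table above into perspective, the following minimal sketch computes an ideal lower bound on the time needed to move a file at each nominal rate. The function name and the 1 GB file size are assumptions made for illustration; real transfers are slower because of protocol overhead and device limits.

```python
# Nominal signalling rates taken from the table above, in bits per second.
USB_RATES_BPS = {
    "USB 1.1": 12e6,
    "USB 2.0": 480e6,
    "USB 3.0": 5e9,
    "USB 3.1": 10e9,
    "USB 3.2": 20e9,
    "USB 4.0": 40e9,
}


def ideal_transfer_seconds(size_bytes, version):
    """Lower bound on transfer time: payload bits divided by the nominal rate."""
    return size_bytes * 8 / USB_RATES_BPS[version]


# Hypothetical 1 GB (10**9 bytes) file, chosen only for illustration.
for version in USB_RATES_BPS:
    print(f"{version}: {ideal_transfer_seconds(10**9, version):.2f} s")
```

Under these assumptions the same file needs roughly 11 minutes at Full Speed but under 2 seconds at SuperSpeed.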

USB CONNECTOR
There are two types of USB connectors.
In both types, there are four signals.

 The 5.0 V and the Ground signals are used to power the device connected.
 The data signals are biphase signals.
 Data + represents 5.0V
 Data – represents 0 V

UNIT - V

MEMORY AND PARALLELISM

Memory System Design (L1) - Semiconductor Memory Technologies (L1) - Memory
organization (L3) - Instruction level Parallelism (L2) - Parallel processing challenges (L2) -
Flynn's classification (L2) - Hardware Multithreading (L2) - Multi-core processors (L2).

Memory System Design


1. Registers
Registers are small, high-speed memory units located in the CPU. They are used to store the most
frequently used data and instructions. Registers have the fastest access time and the smallest storage
capacity, typically ranging from 16 to 64 bits.
2. Cache Memory
Cache memory is a small, fast memory unit located close to the CPU. It stores frequently used data
and instructions that have been recently accessed from the main memory. Cache memory is designed
to minimize the time it takes to access data by providing the CPU with quick access to frequently
used data.
3. Main Memory
Main memory, also known as RAM (Random Access Memory), is the primary memory of a
computer system. It has a larger storage capacity than cache memory, but it is slower. Main memory
is used to store data and instructions that are currently in use by the CPU.
Types of Main Memory
 Static RAM: Static RAM stores the binary information in flip-flops, and the information remains
valid as long as power is supplied. It has a faster access time and is used in implementing cache
memory.
 Dynamic RAM: It stores the binary information as charge on a capacitor. It requires
refresh circuitry to restore the charge on the capacitors every few milliseconds. It contains
more memory cells per unit area than SRAM.
4. Secondary Storage
Secondary storage, such as hard disk drives (HDD) and solid-state drives (SSD), is non-volatile
memory that has a larger storage capacity than main memory. It is used to store data and
instructions that are not currently in use by the CPU. Secondary storage has the slowest access time
and is typically the least expensive type of memory in the memory hierarchy.
5. Magnetic Disk
Magnetic disks are circular plates made of metal or plastic and coated with a
magnetizable material. The disks rotate at high speed inside the computer and are
frequently used.
6. Magnetic Tape
Magnetic tape is a magnetic recording medium made of a plastic film coated with magnetizable
material. It is generally used for the backup of data. Access to magnetic tape is slower than access
to disk, because the tape must be wound to the required position before the data can be read.
Characteristics of Memory Hierarchy
 Capacity: It is the total volume of information the memory can store. As we move from top to
bottom in the hierarchy, the capacity increases.
 Access Time: It is the time interval between the read/write request and the availability of the
data. As we move from top to bottom in the hierarchy, the access time increases.
 Performance: When computer systems were designed without a memory hierarchy,
the speed gap between the CPU registers and main memory grew because of the large
difference in access time. This lowered the performance of the system, so an enhancement
was required. The enhancement took the form of the memory hierarchy design, which
increases the performance of the system. One of the most significant ways to increase
system performance is to minimize how far down the memory hierarchy one has to go to
manipulate data.
 Cost Per Bit: As we move from bottom to top in the hierarchy, the cost per bit increases, i.e.
internal memory is costlier than external memory.
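
The effect of "how far down the hierarchy one has to go" can be estimated with the usual textbook average-access-time model, in which each level contributes its hit time plus its miss rate multiplied by the cost of going one level further down. The sketch below is illustrative only: the function name, hit times and miss rates are assumed values, not measurements from any real machine.

```python
def average_access_time(levels, memory_time_ns):
    """Average access time for a chain of cache levels backed by main memory.

    levels: list of (hit_time_ns, miss_rate) pairs, ordered from the level
            closest to the CPU outward.
    memory_time_ns: access time of main memory, reached when every level misses.
    """
    # Work from the last cache level back toward the CPU:
    # time(level) = hit_time + miss_rate * time(next level down).
    penalty = memory_time_ns
    for hit_time_ns, miss_rate in reversed(levels):
        penalty = hit_time_ns + miss_rate * penalty
    return penalty


# Assumed figures: L1 hits in 1 ns and misses 5% of the time,
# L2 hits in 10 ns and misses 20% of the time, main memory takes 100 ns.
with_caches = average_access_time([(1, 0.05), (10, 0.20)], 100)
without_caches = average_access_time([], 100)
print(with_caches, without_caches)
```

With these assumed numbers the average access time drops from 100 ns to about 2.5 ns, because most accesses are satisfied near the top of the hierarchy.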
Advantages of Memory Hierarchy
 It helps in managing the memory in a better way.
 It helps in distributing the data across the levels of the computer system.
 It saves the user both cost and time.

PARALLEL PROCESSING CHALLENGES


o The difficulty with parallelism is not the hardware.
o It is difficult to write software that uses multiple processors and the problem gets
worse as the number of processors increases.
o We must get better performance or better energy efficiency from a parallel
processing program.
o It is difficult to write parallel processing programs that are fast, as the number
of processors increases.
o In parallel programming, the other challenges include scheduling, load balancing,
time for synchronization and overhead for communication.
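
One common way to see why speed does not scale with the number of processors is Amdahl's law, which is not named in these notes but is the standard model for this effect. The sketch below assumes a program with a fixed serial fraction and adds a hypothetical per-processor overhead term to stand in for synchronization and communication costs; the function name and all numbers are assumptions for illustration.

```python
def speedup(parallel_fraction, processors, overhead_per_processor=0.0):
    """Speedup over one processor, with execution time normalized to 1.

    parallel_fraction: share of the work that can be spread over processors.
    overhead_per_processor: assumed extra time added for each processor
        (a crude stand-in for synchronization and communication cost).
    """
    serial = 1.0 - parallel_fraction
    time = (serial
            + parallel_fraction / processors
            + overhead_per_processor * processors)
    return 1.0 / time


# 90% of the work is parallel; compare ideal scaling with a small overhead.
for n in (1, 10, 100):
    print(n, round(speedup(0.9, n), 2), round(speedup(0.9, n, 0.001), 2))
```

In this sketch the speedup without overhead can never exceed 10, and with the assumed overhead 100 processors are slower than 10.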
FLYNN'S CLASSIFICATION

 M.J. Flynn proposed a classification for the organization of a computer system
by the number of instructions and data items that are manipulated
simultaneously.
 The sequence of instructions read from memory constitutes an instruction
stream.
 The operations performed on the data in the processor constitute a data stream.
 It is based on the multiplicity of instruction and data streams observed by the
CPU during program execution.

o Flynn proposed the following categories of computer systems:


o Single Instruction, Single Data stream (SISD)
o Single Instruction, multiple Data stream (SIMD)
o Multiple Instructions, single Data stream (MISD)
o Multiple Instructions, multiple Data stream (MIMD)

1. SINGLE INSTRUCTION SINGLE DATA STREAM (SISD)


o An SISD computing system is a uniprocessor machine which is capable of
executing a single instruction, operating on a single data stream.
o In SISD, machine instructions are processed in a sequential manner and computers
adopting this model are popularly called sequential computers.
o Most conventional computers have SISD architecture. All the instructions and data
to be processed have to be stored in primary memory.
o There is some sort of control unit that provides an instruction stream to a
processing unit
o The processing unit operates on a single data stream from a data memory unit.
o E.g. - IBM PC, IBM 704, VAX 11/780, CRAY-1, older mainframe computers

2. SINGLE INSTRUCTION MULTIPLE DATA STREAM (SIMD)


o An SIMD system is a multiprocessor machine capable of executing the same
instruction on all the CPUs but operating on different data streams.
o In SIMD, there is a single control unit, feeding a single instruction stream to
multiple Processing Units.
o Each Processing Unit may have its own dedicated memory unit.
o These machines are used in applications such as Digital signal processing, image
processing and multimedia applications (Audio and Video).
o Machines based on an SIMD model are well suited to scientific computing since
they involve lots of vector and matrix operations.
o Vector and array processors fall into this category.
o E.g. - ILLIAC-IV, MPP, CM-2, STARAN
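
The difference between SISD and SIMD can be sketched by counting how many add instructions are issued to add two arrays. This is a plain Python simulation, not real vector hardware: the `lanes` parameter (number of processing units fed by the single control unit) and both function names are assumptions for illustration.

```python
def sisd_add(a, b):
    """One add instruction per element pair: len(a) instructions issued."""
    result, issued = [], 0
    for x, y in zip(a, b):
        result.append(x + y)
        issued += 1
    return result, issued


def simd_add(a, b, lanes=4):
    """One add instruction per group of `lanes` element pairs."""
    result, issued = [], 0
    for i in range(0, len(a), lanes):
        # Conceptually every lane performs this same add in the same step,
        # each on its own data element.
        result.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        issued += 1
    return result, issued


a, b = list(range(8)), [10] * 8
print(sisd_add(a, b))  # same sums, 8 instructions issued
print(simd_add(a, b))  # same sums, 2 instructions issued
```

Both functions produce identical results; only the number of issued instructions differs, which is the point of the SIMD organization.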

3. MULTIPLE INSTRUCTION SINGLE DATA STREAM (MISD)


o An MISD computing system is a multiprocessor machine capable of executing
different instructions, all of them operating on the same data stream.
o This is an uncommon architecture which is generally used for fault tolerance.
o Machines built using the MISD model are not useful for most applications; a
few machines have been built, but none of them is available commercially.
o E.g. - Systolic array computers.

4. MULTIPLE INSTRUCTIONS MULTIPLE DATA STREAM (MIMD)


o MIMD machines are usually referred to as multiprocessors or multicomputers.
o An MIMD system is capable of executing multiple instructions on multiple data
streams.
o Each processor must include its own control unit; the processors can be assigned
parts of one task or separate tasks.
o Machines built using this model are suitable for any kind of application.
o An MIMD machine may be a shared-memory multiprocessor or a distributed-memory
multicomputer.
o E.g. - CRAY-XMP, IBM 370/168 M
5. SINGLE PROGRAM, MULTIPLE DATA (SPMD)
o SPMD (single program, multiple data) is a technique employed to
achieve parallelism; it is a subcategory of MIMD.
o Tasks are split up and run simultaneously on multiple processors with different
input in order to obtain results faster.
o SPMD is the most common style of parallel programming.

SPMD vs SIMD
1. In SPMD, multiple autonomous processors simultaneously execute the same
program at independent points, rather than in the lockstep that SIMD imposes on different data.
With SPMD, tasks can be executed on general-purpose CPUs.
2. SIMD requires vector processors to manipulate data streams. Note that the
two are not mutually exclusive.
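
A minimal sketch of the SPMD style follows: every worker runs the same function, and a rank number decides which part of the input it handles. The names `worker`, `rank` and `world_size` are assumptions borrowed from common parallel-programming usage. Python threads are used only to keep the example self-contained; they illustrate the program structure, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def worker(rank, world_size, data):
    """The single program: each worker squares and sums its own share."""
    chunk = data[rank::world_size]  # every world_size-th item, starting at rank
    return sum(x * x for x in chunk)


def spmd_sum_of_squares(data, world_size=4):
    """Run the same worker on world_size ranks and combine the partial sums."""
    with ThreadPoolExecutor(max_workers=world_size) as pool:
        partials = pool.map(
            worker,
            range(world_size),
            [world_size] * world_size,
            [data] * world_size,
        )
        return sum(partials)


print(spmd_sum_of_squares(list(range(10)), world_size=3))  # 285
```

Each worker proceeds at its own pace through its own chunk, which is the "independent points" behaviour described above, in contrast to SIMD lockstep.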

6. VECTOR PROCESSORS
o Vector processing is a technology used in modern computers and central
processing units.
o A vector processor is a central processing unit that can work on an entire vector
(array) in one instruction.
o The operand of an instruction is one complete vector instead of
a single element.
o Vector processors are used because they reduce the instruction fetch and decode
bandwidth, owing to the fact that fewer instructions must be fetched.
o A vector processor is also known as an array processor.
o They exploit data parallelism in large scientific and multimedia applications.

5. HARDWARE MULTITHREADING
o Multithreading is an approach that allows a high degree of instruction-level parallelism
without increasing the complexity or power consumption.
o The instruction stream is divided into several smaller streams, known as threads,
such that the threads can be executed in parallel.
o Hardware multithreading is a well-known technique to increase the utilization of
processor resources. The idea is to start executing a different thread when the
current thread is stalled.
o Hardware Multithreading allows multiple threads to share the functional units
of a single processor in an overlapping fashion.
o To permit this sharing, the processor must duplicate the independent state of each
thread.
o There are three main approaches to hardware multithreading.
o Fine-Grained Multithreading
o Coarse-Grained Multithreading
o Simultaneous Multithreading

FINE-GRAINED MULTITHREADING
o Fine-grained multithreading switches between threads on
each instruction, resulting in interleaved execution of
multiple threads.
o Also called as Interleaving.
o This interleaving is often done in a round robin fashion,
skipping any threads that are stalled at that time.
o To make fine-grained multithreading practical, the processor
must be able to switch threads on every clock cycle.
o If there is a sufficient number of threads, it is likely that at
least one is active (not stalled), and the CPU can be kept
running.
o With fine-grained multithreading in a pipelined Architecture,
if:
– the pipeline has k stages,
– there are at least k threads to be executed, and
– the CPU can execute a thread switch at each clock cycle
then
-there can never be more than a single instruction per thread in the pipeline at
any instant, so there cannot be hazards due to dependencies, and the pipeline
never stalls.

o Advantage :
 Potential to avoid wasted machine time due to stalls.
 It can hide throughput losses that arise from both short and long stalls.
o Disadvantages :
 A thread that is ready to execute a long sequence of instructions may have to
wait before executing every instruction.
 It slows down the execution of individual threads.
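
The round-robin interleaving described above can be sketched as a small cycle-by-cycle simulation. It assumes a single-issue processor and represents each instruction only by the number of stall cycles its thread suffers after issuing it; the function name and this representation are hypothetical simplifications.

```python
def fine_grained_schedule(threads):
    """Issue one instruction per cycle, rotating over threads that are ready.

    threads: dict mapping a thread name to a list of integers, one per
             instruction, giving the stall cycles incurred after that
             instruction issues (0 means no stall).
    Returns the issue trace; None marks a cycle in which no thread was ready.
    """
    names = list(threads)
    pending = {n: list(threads[n]) for n in names}
    ready_at = {n: 0 for n in names}
    trace, cycle, start = [], 0, 0
    while any(pending.values()):
        issued = None
        for k in range(len(names)):
            name = names[(start + k) % len(names)]
            # Skip threads that have finished or are stalled this cycle.
            if pending[name] and ready_at[name] <= cycle:
                stall = pending[name].pop(0)
                ready_at[name] = cycle + 1 + stall
                issued = name
                start = (start + k + 1) % len(names)  # next thread goes first
                break
        trace.append(issued)
        cycle += 1
    return trace


print(fine_grained_schedule({"A": [0, 2, 0], "B": [0, 0], "C": [0]}))
# ['A', 'B', 'C', 'A', 'B', None, 'A']
```

In this run the stall of thread A is mostly hidden by thread B; the single idle cycle appears only because too few threads remain to cover it.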
COARSE-GRAINED MULTITHREADING
o Coarse-grained multithreading was invented as an alternative
to fine-grained multithreading.
o Coarse-grained multithreading switches threads only on
costly stalls, such as waiting for a time-consuming operation to complete.
o Also called as Blocking.
o A switch is made to another thread. When this thread in turn
causes a stall, a third thread is scheduled and so on.
o Advantage :
 Switching threads doesn’t need to be nearly
instantaneous.
o Disadvantages :
 The processor can be idled on shorter stalls, and thread
switching will also cause delays.
 Coarse-grained multithreading suffers from major
throughput losses, especially from shorter stalls.
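
The behaviour of coarse-grained multithreading, including both disadvantages, can be sketched with a similar simulation. The model is a deliberate simplification: each instruction is represented by the stall it causes, a stall of at least `long_stall` cycles triggers a switch, and every switch costs `switch_penalty` idle cycles. These parameter names and values are assumptions, not figures from any real processor.

```python
def coarse_grained_schedule(threads, long_stall=3, switch_penalty=1):
    """Run one thread until it hits a long stall (or finishes), then switch.

    threads: dict of thread name -> list of stall cycles after each instruction.
    Returns the issue trace; None marks an idle cycle.
    """
    names = list(threads)
    pending = {n: list(threads[n]) for n in names}
    ready_at = {n: 0 for n in names}
    trace, cycle, current = [], 0, 0

    def next_with_work(after):
        # Next thread, in round-robin order, that still has instructions.
        for k in range(1, len(names) + 1):
            idx = (after + k) % len(names)
            if pending[names[idx]]:
                return idx
        return after

    if names and not pending[names[current]]:
        current = next_with_work(current)
    while any(pending.values()):
        name = names[current]
        if ready_at[name] > cycle:
            trace.append(None)  # short stall: the processor simply idles
            cycle += 1
            continue
        stall = pending[name].pop(0)
        ready_at[name] = cycle + 1 + stall
        trace.append(name)
        cycle += 1
        if stall >= long_stall or not pending[name]:
            target = next_with_work(current)
            if target != current:
                trace.extend([None] * switch_penalty)  # pipeline refill cost
                cycle += switch_penalty
                current = target
    return trace


print(coarse_grained_schedule({"A": [0, 3, 0], "B": [1, 0]}))
# ['A', 'A', None, 'B', None, 'B', None, 'A']
```

The trace shows three idle cycles: two caused by thread switches and one by a short stall of thread B that was not worth switching on.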

SIMULTANEOUS MULTITHREADING (SMT)


o Simultaneous multithreading (SMT) is a variation on hardware multithreading to
exploit instruction-level parallelism (ILP) and thread-level parallelism(TLP) at the
same time.
Simultaneous Multi-Threading (SMT) = ILP + TLP
o Simultaneous multithreading (SMT) is a technique for improving the overall
efficiency of superscalar CPUs with hardware multithreading.
o SMT permits multiple independent threads of execution to
better utilize the resources provided by modern processor
architectures.
o In SMT, instructions belonging to different threads are
(almost certainly) independent, and by issuing them
concurrently, the utilization of CPU resources rises.
o An SMT processor is a multiple-issue processor that uses the resources of
a multiple-issue, dynamically scheduled processor.
o SMT is convenient since modern multiple-issue CPUs have
a number of functional units that cannot be kept busy with
instructions from a single thread.
o By applying register renaming and dynamic scheduling,
instructions belonging to different threads can be executed concurrently.
(Register Renaming - In the ideal case there is an unlimited number of virtual registers available, so
all WAW and WAR hazards are avoided and an unbounded number of
instructions can begin execution simultaneously.
Dynamic Scheduling - The processor decides at run time which instructions to
execute in parallel.)
o SMT processors often have more functional-unit parallelism available than a single thread can
effectively use.
o SMT is always executing instructions from multiple threads, leaving it up to the
hardware to associate instruction slots with their proper threads.

APPROACHES TO EXECUTE MULTIPLE THREADS

o The figure given below illustrates the possible architectures that involve
multithreading and contrasts these with approaches that do not use multithreading.
o Each horizontal row represents the issue slot or slots for a single execution cycle;
that is, the width of each row corresponds to the maximum number of instructions
that can be issued in a single clock cycle.
o The vertical dimension represents the time sequence of clock cycles.
o An empty slot represents an unused execution slot in one pipeline.

The various approaches to execute Multiple Threads are


1. Approaches with a scalar (Single Issue) processor
2. Approaches with a superscalar (Multiple Issue) processor
Approaches with a Scalar Processor

o The different approaches with a scalar (i.e., single-issue) processor:


1. Single Threaded Scalar:
o This is the simple pipeline found in traditional RISC and CISC machines,
with no multithreading.
2. Interleaved Multithreaded Scalar:
o This is the easiest multithreading approach to implement.
o By switching from one thread to another at each clock cycle, the pipeline
stages can be kept fully occupied, or close to fully occupied.
o The hardware must be capable of switching from one thread context to
another between cycles.
3. Blocked Multithreaded Scalar:
o A single thread is executed until a latency event occurs that would stop the
pipeline, at which time the processor switches to another thread.

Approaches with a Superscalar Processor


1. Superscalar:
o This is the basic superscalar approach with no multithreading.
o It is the most powerful approach to providing parallelism within a
processor.
o During some cycles, not all of the available issue slots are used. During
these cycles, less than the maximum number of instructions is issued; this is
referred to as horizontal loss.
o During other instruction cycles, no issue slots are used; these are cycles
when no instructions can be issued; this is referred to as vertical loss.
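
Horizontal and vertical loss can be counted directly from a record of how many instructions were issued in each cycle. The sketch below assumes the losses are measured in wasted issue slots; the function name and the sample trace are made up for illustration.

```python
def issue_slot_losses(issued_per_cycle, width):
    """Count wasted issue slots on a `width`-issue superscalar processor.

    issued_per_cycle: number of instructions issued in each clock cycle.
    Returns (horizontal_loss, vertical_loss) in issue slots.
    """
    # Vertical loss: cycles in which no slot at all was used.
    vertical = sum(width for n in issued_per_cycle if n == 0)
    # Horizontal loss: unused slots in cycles that issued something,
    # but fewer than `width` instructions.
    horizontal = sum(width - n for n in issued_per_cycle if 0 < n < width)
    return horizontal, vertical


# Hypothetical 6-cycle trace on a 4-issue machine.
print(issue_slot_losses([4, 2, 0, 3, 0, 1], width=4))  # (6, 8)
```

In this made-up trace 14 of the 24 available slots are wasted, which is the kind of loss that multithreading tries to fill with instructions from other threads.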

2. Interleaved Multithreading Superscalar:


o During each cycle, as many instructions as possible are issued from a single
thread.
o With this technique, potential delays due to thread switches are eliminated.
o The number of instructions issued in any given cycle is still limited by
dependencies that exist within any given thread.

3. Blocked Multithreaded Superscalar:


o Instructions from only one thread may be issued during any cycle, and
blocked multithreading is used.

6. MULTICORE PROCESSORS
o A multicore computer, also known as a chip multiprocessor, combines
two or more processors (called cores) on a single IC.
o Each core consists of all of the components of an independent processor,
such as registers, ALU, pipeline hardware, and control unit, plus L1
instruction and data caches.
o In addition to the multiple cores, contemporary multicore chips also include
L2 cache and L3 cache.
Hardware Performance Issues          Software Performance Issues
 Increase in Parallelism             Multi-threaded applications
 Increase in Complexity              Multi-process applications
 Increase in Power Consumption       Multi-instance applications

MULTICORE ORGANIZATION
 The main variables in a multicore organization are as follows:
o The number of core processors on the chip
o The number of levels of cache memory
o The amount of cache memory that is shared
 Each core has its own L1 cache; the L2 cache may be dedicated to each core or shared.
 L1 and L2 are the fastest caches that a CPU can access.
 If the processor can find the instructions or data for its next operation in the
L1 or L2 cache, then it does not need to access the slower L3 cache.
 The four general organizations for multicore systems are :
 Dedicated L1 Cache
 Dedicated L2 Cache
 Shared L2 Cache
 Shared L3 Cache
DEDICATED L1 CACHE
o In this organization, the only on-chip cache is L1 cache,
with each core having its own dedicated L1 cache.
o The L1 cache is divided into instruction and data
caches.
o L1 (Level 1) cache is the fastest memory that is present
in a computer system.
o L1 cache has the data the CPU is most likely to need
while completing a certain task.
o Its size is typically a few tens of kilobytes per core (for example, 32 KB or 64 KB).
o Example : ARM11 MPCore
DEDICATED L2 CACHE
o In this organization, there is no on-chip cache sharing.
o L2 cache is slower than L1 cache, but bigger in size.
o Its size typically varies from 256 KB to 8 MB.
o A potential advantage to having only dedicated L2
caches on the chip is that each core enjoys more rapid
access to its private L2 cache.
o L2 cache holds data that is likely to be accessed by the
CPU next.
o Example : AMD Opteron.
SHARED L2 CACHE
o This organization allocates chip space to memory in a similar way,
but uses an L2 cache that is shared by the cores.
o The use of a shared L2 cache confines the cache
coherency problem to the L1 cache level, which may
provide some additional performance advantage.
o Example : Intel Core Duo.

SHARED L3 CACHE
o L3 (Level 3) cache is the largest cache memory unit,
and also the slowest one.
o Its size typically varies from 4 MB to 50 MB.
o As the amount of cache memory available on the chip
continues to grow, performance considerations dictate
splitting off a separate, shared L3 cache, with dedicated
L1 and L2 caches for each core processor.
o As both the amount of memory available and the
number of cores grow, the use of a shared L3 cache
combined with either a shared L2 cache or dedicated
per core L2 caches seems likely to provide better
performance than simply a massive shared L2 cache.
o Example : AMD K10.

7. SHARED MEMORY MULTIPROCESSORS


o A shared memory multiprocessor is a parallel
processor with a single address space across all
processors.
o All processes on the various CPUs share a unique
logical address space, which is mapped on a
physical memory that can be distributed among the
processors.
o Processors communicate through shared variables
in memory.
o Each process can read and write a data item simply by using load and store operations.
o These systems can still run independent jobs in their own virtual address spaces,
even if they all share a physical address space.
o This architectural model is simple and easy to use for programming.
o It can be applied to a wide variety of problems that can be modeled as a set of
tasks to be executed in parallel.
o The basic issue in shared memory multiprocessor systems is memory itself, since
the larger the number of processors involved, the more difficult it is to use
memory efficiently.
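
Communication through a shared variable can be sketched with threads that all update one counter. Threads stand in for processors here, and the lock is an assumed synchronization mechanism, needed because each update is a load, a modify and a store that must not interleave with another worker's update. All names are hypothetical.

```python
import threading


def shared_counter_demo(workers=4, increments=1000):
    """Several workers add to one variable that lives in shared memory."""
    shared = {"total": 0}  # stands in for a variable in the shared address space
    lock = threading.Lock()

    def work():
        for _ in range(increments):
            with lock:  # protect the load-modify-store sequence
                shared["total"] += 1

    pool = [threading.Thread(target=work) for _ in range(workers)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return shared["total"]


print(shared_counter_demo())  # 4000
```

No messages are exchanged: every worker sees the result of the others simply by reading the same memory location, which is the programming convenience of this model. The time spent waiting for the lock is an example of the synchronization cost that grows with the number of processors.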

TYPES OF SHARED MEMORY MULTIPROCESSORS

There are three types of Shared memory multiprocessors. They are


o Uniform Memory Access multiprocessors (UMA)
o Non-Uniform Memory Access multiprocessors (NUMA)
o Cache Only Memory Access (COMA)

UNIFORM MEMORY ACCESS MULTIPROCESSOR (UMA)

o A multiprocessor in which accesses to main memory take about the same amount of time,
no matter which processor requests the access and no matter which word is requested.
o These systems are also called
Symmetric Shared-memory Multiprocessors.

NON-UNIFORM MEMORY ACCESS MULTIPROCESSOR (NUMA)


o A type of single address space
multiprocessor in which some memory
accesses are much faster than others
depending on which processor asks for
which word.
o These systems are also called Distributed
Shared Memory architectures.
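
The practical difference between UMA and NUMA can be shown with a weighted average of local and remote access times. The latencies and the remote-access fraction below are assumed values chosen only to illustrate the calculation.

```python
def average_latency_ns(local_ns, remote_ns, remote_fraction):
    """Mean memory latency when a given fraction of accesses is remote."""
    return (1 - remote_fraction) * local_ns + remote_fraction * remote_ns


# UMA: every access costs the same, so the placement of data does not matter.
print(average_latency_ns(100, 100, 0.25))  # 100.0
# NUMA (assumed 100 ns local, 300 ns remote): placement matters.
print(average_latency_ns(100, 300, 0.25))  # 150.0
print(average_latency_ns(100, 300, 0.75))  # 250.0
```

On a NUMA machine, software therefore tries to keep the data a processor uses most in that processor's local memory.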
CACHE ONLY MEMORY ACCESS (COMA)
 Data have no specific “permanent” location (no fixed memory address) where they stay.
 The entire physical address space is considered a huge, single cache.
 Data can be read (copied into the local caches) and modified there.
 Data can migrate and/or be replicated in the various memory banks of the
central main memory.

TYPES OF SHARED MEMORY ARCHITECTURES

There are two types of Shared memory Architectures. They are


1. Symmetric Shared-Memory Architecture
2. Distributed Shared Memory Architecture
SYMMETRIC SHARED-MEMORY ARCHITECTURE

o The Symmetric Shared Memory Architecture consists of several processors with a
single physical memory shared by all processors through a shared bus.
o A multicore processor has multiple CPUs or cores on a single chip.
o The cores have private level-1 caches, while other caches may or may not be
shared between the cores.

DISTRIBUTED SHARED MEMORY ARCHITECTURE


o A distributed-memory system, often called a multicomputer, consists of multiple
independent processing nodes with local memory modules which are connected by
a general interconnection network.
o Here, physically separated memories can be addressed as one logically shared
address space.
o Here, the term "shared" does not mean that there is a single centralized memory,
but that the address space is "shared".
o Each node of a cluster has access to shared memory in addition to each node's non-
shared private memory.
