
Memory Hierarchy and Technologies Explained


UNIT V - MEMORY AND CONTROL UNIT

Memory hierarchy – Memory technologies – Cache basics – Measuring and improving cache performance – Virtual memory, TLBs – Input/output system, programmed I/O, DMA and interrupts, I/O processors.

MEMORY HIERARCHY

The principle of locality

The principle of locality states that programs access a relatively small portion of their address space at any instant of time, just as you access only a very small portion of a library’s collection at any one time. There are two different types of locality:

■ Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon. If you recently brought a book to your desk to look at, you will probably need to look at it again soon.

■ Spatial locality (locality in space): if an item is referenced, items whose addresses are close by will tend to be referenced soon. For example, if you are consulting a book, you will probably also consult the books shelved next to it.

A memory hierarchy consists of multiple levels of memory with different speeds and sizes.

The faster memories are more expensive per bit than the slower memories and thus are

smaller.

It is a structure that uses multiple levels of memories; as the distance from the processor increases, the size of the memories and the access time both increase.

Fig:1 The basic structure of memory hierarchy

Today, there are three primary technologies used in building memory hierarchies.
Main memory is implemented from DRAM (dynamic random access memory), while

levels closer to the processor (caches) use SRAM (static random access memory).

The third technology, used to implement the largest and slowest level in the hierarchy,

is usually magnetic disk. (Flash memory is used instead of disks in many embedded

devices)

DRAM is less costly per bit than SRAM, although it is substantially slower. The price

difference arises because DRAM uses significantly less area per bit of memory, and

DRAMs thus have larger capacity.

Because of the differences in cost and access time, it is advantageous to build memory as a hierarchy of levels. Fig 1 shows that the faster memory is close to the processor and the slower, less expensive memory is below it.

The upper level, the one closer to the processor, is smaller and faster than the lower level, since the upper level uses technology that is more expensive.

The minimum unit of information that can be either present or not present in the two-level hierarchy is called a block or a line, as the figure below shows.

Fig 2 Every pair of levels in the memory hierarchy can be thought of as having an upper and lower level. We transfer an entire block when we copy something between levels.

Hit: If the data requested by the processor appears in some block in the upper level, this is called a hit.

Miss: If the data is not found in the upper level, the request is called a miss.

The lower level in the hierarchy is then accessed to retrieve the block containing the requested data.

The hit rate, or hit ratio, is the fraction of memory accesses found in the upper level; it is often used as a measure of the performance of the memory hierarchy.

The miss rate (1 − hit rate) is the fraction of memory accesses not found in the upper level.

Since performance is the major reason for having a memory hierarchy, the time to service

hits and misses is important:

Hit time is the time to access the upper level of the memory hierarchy, which includes the time needed to determine whether the access is a hit or a miss.

The miss penalty is the time to replace a block in the upper level with the corresponding block from the lower level, plus the time to deliver this block to the processor.

Because the upper level is smaller and built using faster memory parts, the hit time will be

much smaller than the time to access the next level in the hierarchy, which is the major

component of the miss penalty.

Programs exhibit both temporal locality, the tendency to reuse recently accessed data

items, and spatial locality, the tendency to reference data items that are close to other

recently accessed items.

Memory hierarchies take advantage of temporal locality by keeping more recently

accessed data items closer to the processor. Memory hierarchies take advantage of

spatial locality by moving blocks consisting of multiple contiguous words in memory to

upper levels of the hierarchy.

Fig.3 This diagram shows the structure of a memory hierarchy: as the distance from the

processor increases, so does the size.

The above fig shows that a memory hierarchy uses smaller and faster memory

technologies close to the processor. Thus, accesses that hit in the highest level of the

hierarchy can be processed quickly. Accesses that miss go to lower levels of the

hierarchy, which are larger but slower.


If the hit rate is high enough, the memory hierarchy has an effective access time

close to that of the highest (and fastest) level and a size equal to that of the lowest

level.

In most systems, the memory is a true hierarchy, meaning that data cannot be present in level i unless it is also present in level i + 1.

MEMORY TECHNOLOGY

SRAM Technology

The first letter of SRAM stands for static. SRAMs don’t need to refresh, so the access time is very close to the cycle time. SRAMs typically use six transistors per bit to prevent the information from being disturbed when read.

The dynamic nature of the circuits in DRAM requires data to be written back after being read; hence the difference between the access time and the cycle time, as well as the need to refresh.

SRAM needs only minimal power to retain the charge in standby mode. SRAM designs are concerned with speed and capacity, while in DRAM designs the emphasis is on cost per bit and capacity.

For memories designed in comparable technologies, the capacity of DRAMs is roughly 4–8 times that of SRAMs. The cycle time of SRAMs is 8–16 times faster than DRAMs, but they are also 8–16 times as expensive.

DRAM Technology

1) As early DRAMs grew in capacity, the cost of a package with all the necessary address

lines was an issue. The solution was to multiplex the address lines, thereby cutting the

number of address pins in half.

2) One-half of the address is sent first, called the row access strobe (RAS). The other half of the address, sent during the column access strobe (CAS), follows it.

3) These names come from the internal chip organization, since the memory is organized as a

rectangular matrix addressed by rows and columns.


Memory controllers include hardware to refresh the DRAMs periodically. This requirement means that the memory system is occasionally unavailable because it is sending a signal telling every chip to refresh. The time for a refresh is typically a full memory access (RAS and CAS) for each row of the DRAM. Since the memory matrix in a DRAM is conceptually square, the number of steps in a refresh is usually the square root of the DRAM capacity.

DRAM designers try to keep time spent refreshing to less than 5% of the total time. Because of refresh, main memory does not operate like a Swiss train that consistently delivers the goods exactly according to schedule; some accesses take longer than others.

Although we have been talking about individual chips, DRAMs are commonly sold on small boards called dual inline memory modules (DIMMs). DIMMs typically contain 4–16 DRAMs, and they are normally organized to be 8 bytes wide (+ ECC) for desktop systems.

Improving Memory Performance inside a DRAM Chip

To improve bandwidth, there has been a variety of evolutionary innovations over time.

The first was timing signals that allow repeated accesses to the row buffer without

another row access time, typically called fast page mode. Such a buffer comes naturally,

as each array will buffer 1024– 2048 bits for each access. Conventional DRAMs had an

asynchronous interface to the memory controller, and hence every transfer involved

overhead to synchronize with the controller.

The second major change was to add a clock signal to the DRAM interface, so that the

repeated transfers would not bear that overhead. Synchronous DRAM (SDRAM) is the

name of this optimization. SDRAMs typically also had a programmable register to hold the
number of bytes requested, and hence can send many bytes over several cycles per

request.

The third major DRAM innovation to increase bandwidth is to transfer data on both the

rising edge and falling edge of the DRAM clock signal, thereby doubling the peak data

rate. This optimization is called double data rate (DDR)

FLASH MEMORY

It is a type of Electrically Erasable Programmable Read-Only Memory (EEPROM). Most flash products include a controller to spread the writes by remapping blocks that have been written many times to less trodden blocks. This technique is called wear leveling. With wear leveling, personal mobile devices are very unlikely to exceed the write limits in the flash. Such wear leveling lowers the potential performance of flash, but it is needed unless higher-level software monitors block wear.

DISK MEMORY

A magnetic hard disk consists of a collection of platters, which rotate on a spindle at 5400

to 15,000 revolutions per minute.

The metal platters are covered with magnetic recording material on both sides, similar to the material found on a cassette or videotape.

To read and write information on a hard disk, a movable arm containing a small electromagnetic coil called a read-write head is located just above each surface.

The entire drive is permanently sealed to control the environment inside the drive, which, in

turn, allows the disk heads to be much closer to the drive surface.

Each disk surface is divided into concentric circles, called tracks. There are typically tens

of thousands of tracks per surface.

Each track is in turn divided into sectors that contain the information; each track may have thousands of sectors. Sectors are typically 512 to 4096 bytes in size.


Seek

The process of positioning a read/write head over the proper track on a disk.

Rotational latency

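Also called rotational delay. The time required for the desired sector of a disk to rotate under the read/write head; usually assumed to be half the rotation time. The average latency to the desired information is halfway around the disk. Disks rotate at 5400 RPM to 15,000 RPM. The average rotational latency at 5400 RPM is

$$\text{Average rotational latency} = \frac{0.5\ \text{rotation}}{5400\ \text{RPM}} = \frac{0.5\ \text{rotation}}{5400\ \text{RPM} / (60\ \text{s/min})} = 0.0056\ \text{s} = 5.6\ \text{ms}$$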
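As a rough check of the half-rotation rule for rotational latency, the short Python sketch below converts a rotation speed into an average latency. The function name is hypothetical and the only assumption is that the desired sector is, on average, half a rotation away.

```python
def average_rotational_latency_ms(rpm: float) -> float:
    """Return the average rotational latency in milliseconds.

    Assumes the desired sector is, on average, half a rotation away.
    """
    seconds_per_rotation = 60.0 / rpm
    return 0.5 * seconds_per_rotation * 1000.0


# The two ends of the rotation-speed range quoted in the text.
for rpm in (5400, 15000):
    print(f"{rpm} RPM: {average_rotational_latency_ms(rpm):.1f} ms")
```

At 5400 RPM this gives about 5.6 ms, and at 15,000 RPM about 2.0 ms.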

Difference between SRAM and DRAM:

CACHE BASICS
A cache is one of the fastest and smallest levels of the memory hierarchy, sitting between the processor and main memory. It is built using SRAMs.

Figure below shows such a simple cache, before and after requesting a data item that is

not initially in the cache.

Before the request, the cache contains a collection of recent references X1, X2, …, Xn−1.

The processor requests a word Xn that is not in the cache. This request results in a miss, and the word Xn is brought from memory into the cache.

The simplest way to assign a location in the cache for each word in memory is to assign

the cache location based on the address of the word in memory.

This cache structure is called direct mapped, since each memory location is mapped

directly to exactly one location in the cache.

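For example, almost all direct-mapped caches use this mapping to find a block:

$$(\text{Block address}) \bmod (\text{Number of blocks in the cache})$$

If the number of entries in the cache is a power of 2, the modulo can be computed simply by using the low-order $\log_2$(cache size in blocks) bits of the address. Thus, an 8-block cache uses the three lowest bits ($8 = 2^3$) of the block address.

For example, the figure below shows how the memory addresses between $1_{ten}$ ($00001_{two}$) and $29_{ten}$ ($11101_{two}$) map to locations $1_{ten}$ ($001_{two}$) and $5_{ten}$ ($101_{two}$) in a direct-mapped cache of eight words.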

A direct-mapped cache with eight entries showing the addresses of memory words

between 0 and 31 that map to the same cache locations
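The direct-mapped placement rule can be tried out with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and it assumes an eight-entry cache and 5-bit word addresses (0 to 31) as in the example above.

```python
NUM_BLOCKS = 8  # eight-entry cache, as in the example above


def cache_index(block_address: int, num_blocks: int = NUM_BLOCKS) -> int:
    """Direct-mapped placement: (block address) modulo (number of blocks)."""
    return block_address % num_blocks


def cache_index_low_bits(block_address: int, num_blocks: int = NUM_BLOCKS) -> int:
    """Same result using the low-order log2(num_blocks) bits.

    Only valid when num_blocks is a power of two.
    """
    return block_address & (num_blocks - 1)


# Group the 32 memory addresses (0..31) by the cache location they map to.
mapping = {}
for address in range(32):
    mapping.setdefault(cache_index(address), []).append(address)

print(mapping[0b001])  # addresses sharing location 001
print(mapping[0b101])  # addresses sharing location 101
```

Addresses 1, 9, 17 and 25 all land in location 001, and 5, 13, 21 and 29 in location 101, which is why a tag is needed to tell them apart.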


To know whether the data in the cache corresponds to a requested word we add a set of

tags to the cache.

The tags contain the address information required to identify whether a word in the cache

corresponds to the requested word.

The tag needs to contain only the upper portion of the address. In this example, the tag holds only the upper 2 of the 5 address bits, while the lower 3-bit index field of the address selects the block.

We also need a way to recognize that a cache block holds no valid information, for instance when the processor starts up. The most common method is to add a valid bit to indicate whether an entry contains a valid address.

If the bit is not set, there cannot be a match for this block.
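The tag comparison and valid-bit check can be sketched as a toy simulator. The class name and the reference sequence are assumptions chosen for illustration; the model keeps only tags and valid bits (no data) and uses one-word blocks.

```python
class DirectMappedCache:
    """Toy direct-mapped cache of one-word blocks (tags and valid bits only)."""

    def __init__(self, num_blocks: int = 8):
        self.num_blocks = num_blocks
        self.valid = [False] * num_blocks
        self.tags = [0] * num_blocks

    def access(self, address: int) -> bool:
        """Return True on a hit, False on a miss (the block is then loaded)."""
        index = address % self.num_blocks   # lower bits select the block
        tag = address // self.num_blocks    # upper bits identify the word
        if self.valid[index] and self.tags[index] == tag:
            return True
        # Miss: fill the entry and mark it valid.
        self.valid[index] = True
        self.tags[index] = tag
        return False


cache = DirectMappedCache()
# Illustrative 5-bit word addresses (an assumed reference sequence).
references = [22, 26, 22, 26, 16, 3, 16, 18, 16]
results = ["hit" if cache.access(a) else "miss" for a in references]
print(list(zip(references, results)))
```

In this run, address 18 misses even though location 010 is valid, because the stored tag belongs to address 26; the new block replaces the old one.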

Accessing a Cache

A sequence of nine memory references to an empty eight-block cache, including the action

for each reference.

Figure below shows how the contents of the cache change on each miss.
• The index of a cache block, together with the tag contents of that block, uniquely specifies

the memory address of the word contained in the cache block.


• The total number of bits needed for a cache is a function of the cache size and the address

size, because the cache includes both the storage for the data and the tags.
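To make the dependence on cache size and address size concrete, the sketch below counts the storage bits of a direct-mapped cache. The function name is hypothetical, and it assumes 32-bit words, byte addressing (2 byte-offset bits) and one valid bit per block.

```python
def total_cache_bits(address_bits: int, index_bits: int, words_per_block_log2: int) -> int:
    """Total storage bits for a direct-mapped cache with 32-bit words.

    address_bits         : width of a byte address
    index_bits           : n, so the cache holds 2**n blocks
    words_per_block_log2 : m, so each block holds 2**m words
    """
    byte_offset_bits = 2  # 4 bytes per word (assumption)
    tag_bits = address_bits - (index_bits + words_per_block_log2 + byte_offset_bits)
    data_bits = (2 ** words_per_block_log2) * 32
    valid_bits = 1
    return (2 ** index_bits) * (data_bits + tag_bits + valid_bits)


# Example: 16 KiB of data, 4-word blocks, 32-bit addresses -> 1024 blocks.
bits = total_cache_bits(address_bits=32, index_bits=10, words_per_block_log2=2)
print(bits, "bits =", bits // 1024, "Kibibits")
```

For this example each block needs 128 data bits, an 18-bit tag and a valid bit, so the cache stores 147 Kibibits for 128 Kibibits of data.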

Handling Cache Misses

• The control unit deals with cache misses. It must detect a miss and process it by fetching the requested data from memory.

• If the cache reports a hit, the computer continues using the data as if nothing happened.

• If the data is not present in the cache, it is a miss. Cache miss handling is done in collaboration with the processor control unit and with a separate controller that initiates the memory access and refills the cache.

• The processing of a cache miss creates a pipeline stall, as opposed to an interrupt, which would require saving the state of all registers.

• To get the proper instruction into the cache, we instruct the lower level in the memory hierarchy to perform a read.

The steps to be taken on an instruction cache miss:

1. Send the original PC value (current PC – 4) to the memory.

2. Instruct main memory to perform a read and wait for the memory to complete its access.

3. Write the cache entry, putting the data from memory in the data portion of the entry, writing

the upper bits of the address (from the ALU) into the tag field, and turning the valid bit on.

4. Restart the instruction execution at the first step, which will refetch the instruction, this time

finding it in the cache.

Handling writes

• Writes work somewhat differently.

• Suppose that on a store instruction we wrote the data only into the data cache (without changing main memory).

• Then, after the write into the cache, memory would have a different value from that in the cache. In such a case, the cache and memory are said to be inconsistent.

• The simplest way to keep the main memory and the cache consistent is always to write the data into both the memory and the cache. This scheme is called write-through.

• Write buffer: a queue that holds data while the data is waiting to be written to memory.


• After writing the data into the cache and into the write buffer, the processor can continue

execution.

• The alternative to a write-through scheme is a scheme called write-back.

• In a write back scheme, when a write occurs, the new value is written only to the block in

the cache. The modified block is written to the main memory only when it is replaced.
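The difference between the two write policies shows up in how many writes reach main memory. The following is a simplified sketch, not a model of any particular processor: the function name is hypothetical, blocks are one word, the cache is direct mapped, write misses are assumed to allocate the block, and modified blocks still in the cache are counted as written back at the end.

```python
def memory_writes(stores, policy, num_blocks=8):
    """Count writes reaching main memory for a sequence of store addresses.

    stores : list of block addresses written by the processor
    policy : "write-through" or "write-back"
    """
    if policy not in ("write-through", "write-back"):
        raise ValueError(f"unknown policy: {policy}")
    tags = [None] * num_blocks
    dirty = [False] * num_blocks
    writes = 0
    for address in stores:
        index, tag = address % num_blocks, address // num_blocks
        if policy == "write-through":
            tags[index] = tag
            writes += 1  # every store also updates memory
        else:
            if tags[index] != tag:
                if dirty[index]:
                    writes += 1  # write the replaced modified block back
                tags[index] = tag
            dirty[index] = True  # the new value lives only in the cache
    if policy == "write-back":
        writes += sum(dirty)  # flush modified blocks still in the cache
    return writes


stores = [3, 3, 3, 11, 3]  # assumed example: 3 and 11 share a cache location
print(memory_writes(stores, "write-through"))
print(memory_writes(stores, "write-back"))
```

Repeated stores to the same block cost one memory write each under write-through, but only one write per replacement under write-back.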

MEASURING AND IMPROVING CACHE PERFORMANCE

• This section shows how to measure and analyze cache performance, and then covers two different techniques for improving it.

• One focuses on reducing the miss rate by reducing the probability that two different memory blocks will contend for the same cache location.

• The second technique reduces the miss penalty by adding an additional level to the hierarchy. This technique is called multilevel caching.

• Memory-stall clock cycles come primarily from cache misses. Stalls generated by reads

and writes can be quite complex.

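• Memory-stall clock cycles can be defined as the sum of the stall cycles coming from reads plus those coming from writes:

$$\text{Memory-stall clock cycles} = \text{Read-stall cycles} + \text{Write-stall cycles}$$

• The read-stall cycles can be defined in terms of the number of read accesses per program, the miss penalty in clock cycles for a read, and the read miss rate:

$$\text{Read-stall cycles} = \frac{\text{Reads}}{\text{Program}} \times \text{Read miss rate} \times \text{Read miss penalty}$$

• Writes are more complicated. For a write-through scheme, we have two sources of stalls:

• Write misses, which usually require that we fetch the block before continuing the write.

• Write buffer stalls, which occur when the write buffer is full when a write occurs.

• Cycles stalled for writes equals the sum of these two:

$$\text{Write-stall cycles} = \left(\frac{\text{Writes}}{\text{Program}} \times \text{Write miss rate} \times \text{Write miss penalty}\right) + \text{Write buffer stalls}$$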


• Write-back schemes also have potential additional stalls arising from the need to write a

cache block back to memory when the block is replaced.
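The read-stall and write-stall terms for a write-through cache can be combined in a small calculation. This is a sketch with hypothetical function names and made-up workload numbers; it takes read stalls as reads times read miss rate times read miss penalty, and write stalls as the analogous product plus write buffer stalls.

```python
def read_stall_cycles(reads, read_miss_rate, read_miss_penalty):
    """Stall cycles caused by read misses."""
    return reads * read_miss_rate * read_miss_penalty


def write_stall_cycles(writes, write_miss_rate, write_miss_penalty, write_buffer_stalls):
    """Stall cycles caused by write misses plus a full write buffer."""
    return writes * write_miss_rate * write_miss_penalty + write_buffer_stalls


def memory_stall_cycles(reads, read_miss_rate, read_miss_penalty,
                        writes, write_miss_rate, write_miss_penalty,
                        write_buffer_stalls=0):
    """Total memory-stall clock cycles: read stalls plus write stalls."""
    return (read_stall_cycles(reads, read_miss_rate, read_miss_penalty)
            + write_stall_cycles(writes, write_miss_rate, write_miss_penalty,
                                 write_buffer_stalls))


# Assumed workload, for illustration only.
total = memory_stall_cycles(reads=1000, read_miss_rate=0.04, read_miss_penalty=100,
                            writes=500, write_miss_rate=0.02, write_miss_penalty=100,
                            write_buffer_stalls=50)
print(f"memory-stall clock cycles: {total:.0f}")
```

With these assumed numbers, reads contribute about 4000 stall cycles and writes about 1050.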

Calculating Cache Performance

Average memory access time (AMAT)

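Average memory access time is the average time to access memory considering both hits and misses and the frequency of different accesses. It is calculated as

$$\text{AMAT} = \text{Time for a hit} + \text{Miss rate} \times \text{Miss penalty}$$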
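A quick numerical illustration of AMAT, using the usual relation of hit time plus miss rate times miss penalty. The function name and the example values (1-cycle hit, 5% miss rate, 20-cycle miss penalty, 1 ns clock) are assumptions chosen for illustration.

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


cycles = amat(hit_time=1, miss_rate=0.05, miss_penalty=20)
clock_ns = 1.0  # assumed clock cycle time
print(f"AMAT = {cycles:.2f} cycles = {cycles * clock_ns:.2f} ns")
```

With these values the average access takes about 2 clock cycles, twice the hit time, even though only 5% of accesses miss.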

Reducing Cache Misses by More Flexible Placement of Blocks

• Direct-mapped cache: a block can go in exactly one place in the cache. There is a direct mapping from any block address in memory to a single location in the upper level of the hierarchy.

The position of a memory block in direct mapping is given by

$$(\text{Block number}) \bmod (\text{Number of blocks in the cache})$$

• Fully associative cache: a scheme where a block can be placed in any location in the cache. Such a scheme is called fully associative because a block in memory may be associated with any entry in the cache. To find a given block in a fully associative cache, all the entries in the cache must be searched, because a block can be placed in any one of them.

• The middle range of designs between direct mapped and fully associative is called set associative.

• Set-associative cache: there is a fixed number of locations where each block can be placed. A set-associative cache with n locations for a block is called an n-way set-associative cache.

• An n-way set-associative cache consists of a number of sets, each of which consists of n blocks. Each block in memory maps to a unique set in the cache, given by the index field, and a block can be placed in any element of that set.

• Set-associative placement combines direct-mapped placement and fully associative placement: a block is directly mapped into a set, and then all the blocks in the set are searched for a match.

The position of a memory block in set-associative mapping is given by

$$(\text{Block number}) \bmod (\text{Number of sets in the cache})$$

The figure below shows where block 12 may be placed in a cache with eight blocks total, according to the three block placement policies: direct mapped, set associative, and fully associative.

• In direct-mapped placement, there is only one cache block where memory block 12 can be found, and that block is given by (12 modulo 8) = 4.

• In a two-way set-associative cache, there would be four sets, and memory block 12 must be in set (12 mod 4) = 0; the memory block could be in either element of the set.

• In a fully associative placement, the memory block for block address 12 can appear in any of the eight cache blocks.

• An 8-block cache configured as direct mapped, 2-way set associative, 4-way set associative, and fully associative
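The three placement policies differ only in the number of ways, so one function can list the candidate locations for a memory block. This is a minimal sketch: the function name is hypothetical, and the slot numbering (set s owns slots s × ways to s × ways + ways − 1) is chosen just for this example.

```python
def candidate_blocks(block_number, num_blocks, ways):
    """Cache block slots where a memory block may be placed.

    ways = 1           -> direct mapped
    ways = num_blocks  -> fully associative
    otherwise          -> n-way set associative
    """
    num_sets = num_blocks // ways
    set_index = block_number % num_sets  # (block number) mod (number of sets)
    return [set_index * ways + way for way in range(ways)]


# Memory block 12 in a cache with eight blocks total.
print("direct mapped     :", candidate_blocks(12, 8, ways=1))
print("2-way set assoc.  :", candidate_blocks(12, 8, ways=2))
print("fully associative :", candidate_blocks(12, 8, ways=8))
```

Block 12 has one candidate slot when direct mapped (slot 4), the two slots of set 0 in the two-way case, and all eight slots when fully associative.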
VIRTUAL MEMORY:

The main memory can act as a “cache” for the secondary storage, usually implemented

with magnetic disks. This technique is called virtual memory.

Techniques that automatically move program and data blocks into physical main memory when they are required for execution are called virtual memory techniques.

Virtual memory implements the translation of a program’s address space to physical addresses. This translation process enforces protection of a program’s address space from other virtual machines.

The binary address that the processor issues for either an instruction or data is called the virtual or logical address.

The virtual address is translated into a physical address by a combination of hardware and software components. This address translation is done by the MMU (Memory Management Unit).

When the desired data are in main memory, they are accessed immediately.

If the data are not in main memory, the MMU causes the operating system to bring the data into memory from the disk. The transfer of data between disk and main memory is performed using the DMA scheme.

Fig: Virtual Memory Organisation


Address Translation:

In address translation, all programs and data are composed of fixed-length units called pages.

A page consists of a block of words that occupy contiguous locations in main memory.

Pages commonly range from 2K to 16K bytes in length.

The virtual-memory mechanism bridges the size and speed gaps between main memory and secondary storage, and it is implemented in part by software techniques.

Each virtual address generated by the processor contains a virtual page number (high-order bits) and an offset (low-order bits).

The offset specifies the location of a particular byte (or word) within a page.

Page Table: It contains the information about the main memory address where the page is stored and the current status of the page.

Page Frame: An area in the main memory that holds one page is called the page frame.

Page Table Base Register: It contains the starting address of the page table.

Virtual page number + page table base register: gives the address of the corresponding entry in the page table; that entry gives the starting address of the page if the page currently resides in memory.

Control Bits in Page Table:


The control bits specify the status of the page while it is in main memory.

Function:

• One control bit indicates the validity of the page, i.e., whether the page is actually loaded in main memory.

• Another bit indicates whether the page has been modified during its residency in memory; this information is needed to determine whether the page should be written back to the disk before it is removed from main memory to make room for another page.

• The page table information is used by the MMU for every read and write access.

• The page table is placed in main memory, but a copy of a small portion of the page table is located within the MMU.

• This small portion, or small cache, is called the Translation Lookaside Buffer (TLB).

• It consists of the page table entries that correspond to the most recently accessed pages, and it also contains the virtual address of each entry.

Fig:Virtual Memory Address Translation

In virtual memory, the address is broken into a virtual page number and a page offset.

The figure below shows the translation of the virtual page number to a physical page number.

The physical page number constitutes the upper portion of the physical address, while the page offset, which is not changed, constitutes the lower portion.

The number of bits in the page offset field determines the page size.
In virtual memory systems, we locate pages by using a table that indexes the memory; this

structure is called a page table, and it resides in memory. Each program has its own page

table, which maps the virtual address space of that program to main memory.

A valid bit is used in each page table entry. If the bit is 0, the page is not present in main memory and a page fault occurs. If the bit is 1, the page is in memory and the entry contains the physical page number.
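Page-table translation with a valid bit can be sketched in a few lines. The names below are hypothetical, the page table is modeled as a dictionary rather than a table in memory, and 4 KiB pages (a 12-bit offset) are assumed.

```python
PAGE_OFFSET_BITS = 12          # assumed 4 KiB pages
PAGE_SIZE = 1 << PAGE_OFFSET_BITS


class PageFault(Exception):
    """Raised when the valid bit of the page table entry is 0."""


def translate(virtual_address, page_table):
    """Translate a virtual address using a page table.

    page_table maps virtual page number -> (valid_bit, physical_page_number).
    """
    vpn = virtual_address >> PAGE_OFFSET_BITS       # upper bits
    offset = virtual_address & (PAGE_SIZE - 1)      # lower bits, unchanged
    valid, ppn = page_table.get(vpn, (0, None))
    if not valid:
        raise PageFault(f"virtual page {vpn} is not in main memory")
    return (ppn << PAGE_OFFSET_BITS) | offset


# Illustrative table: pages 0 and 2 are resident, page 1 is on disk.
page_table = {0: (1, 5), 1: (0, None), 2: (1, 9)}
print(hex(translate(0x2ABC, page_table)))
```

Virtual address 0x2ABC lies in virtual page 2, which maps to physical page 9, so the physical address is 0x9ABC; an access to virtual page 1 raises `PageFault`, the point at which the operating system would take over.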

Page Faults

If the valid bit for a virtual page is 0, a page fault occurs, and the operating system must be given control.

Once the operating system gets control, it must find the page in the next level of the hierarchy (usually flash memory or magnetic disk) and decide where to place the requested page in main memory.

The operating system usually creates the space on flash memory or disk for all the pages of a process when it creates the process. This space is called the swap space.

Fig: Indicates a page fault and the data is brought from the disk storage

Making Address Translation Fast: the TLB - TRANSLATION-LOOKASIDE BUFFER

Modern processors include a special cache that keeps track of recently used translations.

This special address translation cache is traditionally referred to as a translation-lookaside

buffer (TLB), although it would be more accurate to call it a translation cache.


On every reference, we look up the virtual page number in the TLB. If we get a hit, the

physical page number is used to form the address, and the corresponding reference bit is

turned on.

If the processor is performing a write, the dirty bit is set to 1.

If a miss in the TLB occurs, we must determine whether it is a page fault or merely a TLB

miss. If the page exists in memory, then the TLB miss indicates only that the translation is

missing.

The processor can handle the TLB miss by loading the translation from the page table into

the TLB and then trying the reference again.

If the page is not present in memory, the TLB miss indicates a true page fault. In this case,

the processor invokes the operating system using an exception.

TLB misses can be handled either in hardware or in software.

After a TLB miss occurs and the missing translation has been retrieved from the page

table, we will need to select a TLB entry to replace.

Because the reference and dirty bits are contained in the TLB entry, we need to copy these

bits back to the page table entry when we replace an entry.

Some systems use other techniques to approximate the reference and dirty bits,

eliminating the need to write into the TLB except to load a new table entry on a miss.

Some typical values for a TLB might be


■ TLB size: 16–512 entries

■ Block size: 1–2 page table entries (typically 4–8 bytes each)

■ Hit time: 0.5–1 clock cycle

■ Miss penalty: 10–100 clock cycles

■ Miss rate: 0.01%–1%
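The TLB lookup sequence described in this section (hit, TLB miss served from the page table, or true page fault) can be sketched as follows. The class is hypothetical: it assumes a fully associative TLB with least-recently-used replacement and 4 KiB pages, neither of which is specified in the text, and it leaves out the reference and dirty bits.

```python
from collections import OrderedDict

PAGE_OFFSET_BITS = 12  # assumed 4 KiB pages


class TLB:
    """Tiny fully associative TLB with least-recently-used replacement."""

    def __init__(self, page_table, num_entries=16):
        self.page_table = page_table      # vpn -> physical page number
        self.num_entries = num_entries
        self.entries = OrderedDict()      # cached translations
        self.hits = 0
        self.misses = 0

    def translate(self, virtual_address):
        vpn = virtual_address >> PAGE_OFFSET_BITS
        offset = virtual_address & ((1 << PAGE_OFFSET_BITS) - 1)
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)          # mark as recently used
        else:
            self.misses += 1
            if vpn not in self.page_table:
                # Page not in memory: a true page fault for the OS to handle.
                raise LookupError(f"page fault on virtual page {vpn}")
            if len(self.entries) >= self.num_entries:
                self.entries.popitem(last=False)   # evict least recently used
            self.entries[vpn] = self.page_table[vpn]
        return (self.entries[vpn] << PAGE_OFFSET_BITS) | offset


tlb = TLB({0: 5, 1: 7, 2: 9}, num_entries=2)
for address in (0x0004, 0x0008, 0x1000, 0x2000, 0x0000):
    tlb.translate(address)
print(tlb.hits, "hit(s),", tlb.misses, "miss(es)")
```

With only two entries, the translation for page 0 is evicted when page 2 is loaded, so the final access to page 0 misses again.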

DIRECT MEMORY ACCESS

• A special control unit may be provided to allow the transfer of a large block of data at high speed directly between an external device and main memory, without continuous intervention by the processor. This approach is called direct memory access (DMA).

• DMA transfers are performed by a control circuit called the DMA controller.

• To initiate the transfer of a block of words, the processor sends the

i) Starting address

ii) Number of words in the block

iii) Direction of transfer.

• While a block of data is being transferred, the DMA controller increments the memory address for successive words and keeps track of the number of words transferred; when the transfer is complete, it informs the processor by raising an interrupt signal.

• While the DMA transfer is taking place, the program that requested the transfer cannot continue, and the processor can be used to execute another program.

• After the DMA transfer is completed, the processor returns to the program that requested the transfer.

Registers in a DMA Interface


R/W: determines the direction of transfer.

o R/W = 1: the DMA controller performs a read operation, transferring data from memory to the I/O device.

o R/W = 0: the DMA controller performs a write operation, transferring data from the I/O device to memory.

o Done flag = 1: the controller has completed transferring a block of data and is ready to receive another command.

o IE = 1: it causes the controller to raise an interrupt (interrupt enabled) after it has completed transferring the block of data.

o IRQ = 1: it indicates that the controller has requested an interrupt.
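How the registers and flags work together during a block transfer can be sketched with a toy model. The class and attribute names are hypothetical, main memory is a Python list, and the register layout is an assumption; only the roles of the starting address, word count, R/W, IE, Done and IRQ follow the description above.

```python
class DMAController:
    """Toy single-channel DMA controller (register layout is an assumption)."""

    def __init__(self, memory):
        self.memory = memory           # list of words standing in for main memory
        self.address = 0               # starting-address register
        self.word_count = 0            # word-count register
        self.read = True               # R/W: True = memory -> I/O device
        self.interrupt_enable = False  # IE
        self.done = False              # Done flag
        self.irq = False               # IRQ

    def start(self, address, word_count, read, interrupt_enable=True):
        """Processor side: program the registers for a block transfer."""
        self.address, self.word_count = address, word_count
        self.read, self.interrupt_enable = read, interrupt_enable
        self.done = self.irq = False

    def run(self, device_buffer):
        """Controller side: move the block one word at a time."""
        position = 0
        while self.word_count > 0:
            if self.read:
                device_buffer.append(self.memory[self.address])
            else:
                self.memory[self.address] = device_buffer[position]
                position += 1
            self.address += 1          # next successive word
            self.word_count -= 1       # one fewer word left
        self.done = True
        self.irq = self.interrupt_enable  # raise an interrupt if enabled


memory = list(range(100, 110))
dma = DMAController(memory)
dma.start(address=2, word_count=3, read=True)
to_device = []
dma.run(to_device)
print(to_device, dma.done, dma.irq)
```

The processor only programs the registers and is later notified through IRQ; the word-by-word address increment and count decrement happen inside the controller.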

Use of DMA controllers in a computer system

• A DMA controller connects a high-speed network to the computer bus. The disk controller, which controls two disks, also has DMA capability and provides two DMA channels.

• To start a DMA transfer of a block of data from main memory to one of the disks, the program writes the address and word count information into the registers of the corresponding channel of the disk controller.

• When the DMA transfer is completed, this is recorded in the status and control registers of the DMA channel, i.e., Done bit = IRQ = IE = 1.

Cycle Stealing:

• Requests by DMA devices for using the bus have higher priority than processor requests.

• Top priority is given to high-speed peripherals such as a disk, a high-speed network interface, or a graphics display device.

• Since the processor originates most memory access cycles, the DMA controller can be said to steal memory cycles from the processor. This interweaving technique is called cycle stealing.

Burst Mode: The DMA controller may be given exclusive access to the main memory to transfer a block of data without interruption. This is known as burst (or block) mode.

Bus Master: The device that is allowed to initiate data transfers on the bus at any given time is called the bus master.

Bus Arbitration:

It is the process by which the next device to become the bus master is selected and bus mastership is transferred to it.

• Types: There are two approaches to bus arbitration:

i) Centralized arbitration (a single bus arbiter performs the arbitration)

ii) Distributed arbitration (all devices participate in the selection of the next bus master).

Centralized Arbitration:

• Here the processor is normally the bus master, and it may grant bus mastership to one of its DMA controllers.

• A DMA controller indicates that it needs to become the bus master by activating the Bus Request line (BR), which is an open-drain line.

• The signal on BR is the logical OR of the bus requests from all devices connected to it. When BR is activated, the processor activates the Bus Grant signal (BG1), indicating to the DMA controllers that they may use the bus when it becomes free.

• This signal is connected to all devices using a daisy-chain arrangement.

• If a DMA controller is requesting the bus, it blocks the propagation of the grant signal to other devices, and it indicates to all devices that it is using the bus by activating the open-collector line Bus Busy (BBSY).
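The priority that results from daisy-chaining the grant signal can be sketched as follows. The function name is hypothetical, and device 0 is assumed to be the one closest to the processor on the chain.

```python
def daisy_chain_grant(requests):
    """Return the index of the device that receives the bus grant, or None.

    requests[i] is True if device i is activating the bus-request line.
    Device 0 is assumed to be closest to the processor on the chain.
    """
    bus_request = any(requests)     # BR is the OR of all requests
    if not bus_request:
        return None                 # the processor keeps the bus
    for position, requesting in enumerate(requests):
        if requesting:
            return position         # blocks the grant from going further
        # Not requesting: pass the grant signal on to the next device.
    return None


print(daisy_chain_grant([False, True, True]))
print(daisy_chain_grant([False, False, False]))
```

In the first call, devices 1 and 2 both request the bus, but device 1 sees the grant first and blocks it, so a device's position in the chain fixes its priority.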

A simple arrangement for bus arbitration using a daisy chain

Sequence of signals during transfer of bus mastership for the devices

• The timing diagram shows the sequence of events for the devices connected to the processor.

• DMA controller 2 requests and acquires bus mastership and later releases the bus.

• During its tenure as bus master, it may perform one or more data transfers.

• After it releases the bus, the processor resumes bus mastership.

Distributed Arbitration:

It means that all devices waiting to use the bus have equal responsibility in carrying out the

arbitration process.

Fig: A distributed arbitration scheme

• Each device on the bus is assigned a 4-bit ID. When one or more devices request the bus, they assert the Start-Arbitration signal and place their 4-bit ID numbers on four open-collector lines, ARB0 to ARB3.

• A winner is selected as a result of the interaction among the signals transmitted over these lines.

• The net outcome is that the code on the four lines represents the request that has the highest ID number.

• The drivers are of the open-collector type. Hence, if the input to one driver is equal to 1 and the input to another driver connected to the same bus line is equal to 0, the bus line is in the low-voltage state. In other words, the connection performs an OR function in which logic 1 corresponds to the low-voltage state.

• Example: Assume two devices A and B have the IDs 5 (0101) and 6 (0110). The code initially seen on the lines is the OR of the two IDs, 0111.

• Each device compares the pattern on the arbitration lines to its own ID, starting from the MSB.

• If it detects a difference at any bit position, it disables its drivers at that bit position and for all lower-order bits. It does this by placing 0 at the input of these drivers.

• In our example, A detects a difference on line ARB1, hence it disables its drivers on lines ARB1 and ARB0. This causes the pattern on the arbitration lines to change to 0110, which means that B has won the contention.
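The contention between device IDs can be simulated with bitwise operations. This is a sketch under stated assumptions: the function names are hypothetical, the open-collector lines are modeled as a bitwise OR of what each device drives, and the settling of the lines is modeled as repeated rounds until the pattern stops changing.

```python
ID_BITS = 4  # lines ARB3 (MSB) down to ARB0


def bits_still_driven(device_id, lines):
    """Return the ID bits a device keeps driving, given the line pattern."""
    for bit in range(ID_BITS - 1, -1, -1):   # compare starting from the MSB
        mask = 1 << bit
        if (lines & mask) and not (device_id & mask):
            # Difference found: disable drivers at this bit and below.
            return device_id & ~((mask << 1) - 1)
    return device_id


def arbitrate(ids):
    """Return (winning ID, history of patterns seen on the ARB lines)."""
    driven = list(ids)              # every requester first drives its full ID
    history = []
    while True:
        lines = 0
        for value in driven:
            lines |= value          # open-collector lines act as a wired OR
        if history and history[-1] == lines:
            return lines, history   # pattern is stable: arbitration is over
        history.append(lines)
        driven = [bits_still_driven(device_id, lines) for device_id in ids]


winner, history = arbitrate([5, 6])   # devices A and B from the example
print(winner, [format(pattern, "04b") for pattern in history])
```

For A = 5 and B = 6 the lines first show 0111 and then settle at 0110, so B wins, matching the example above.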

INTERRUPTS:

• An interrupt is an external event that causes the execution of one program to be suspended and the execution of another program to begin.

• In program-controlled I/O, the processor continuously monitors the status of the device and cannot perform any other useful work while it waits.

• An alternative approach is for the I/O device to alert the processor when it becomes ready, by sending a hardware signal called an interrupt over the interrupt request line. Since the processor no longer has to check the device status continuously, it can use the waiting period to perform other useful functions.

• The routine executed in response to an interrupt request is called the interrupt service routine (ISR). An interrupt resembles a subroutine call. The interrupt request uses a line in the bus called the interrupt request line.

Fig:Transfer of control through the use of interrupts

 The processor first completes the execution of instruction i. Then it loads the PC(Program

Counter) with the address of the first instruction of the ISR.

 After the execution of ISR, the processor has to come back to instruction i + 1.

 Therefore, when an interrupt occurs, the current contents of the PC, which point to i+1, are put in

temporary storage in a known location.

 A return from interrupt instruction at the end of ISR reloads the PC from that temporary

storage location, causing the execution to resume at instruction i+1.

 When the processor is handling the interrupts, it must inform the device that its request

has been recognized so that it can remove its interrupt request signal.

 This may be accomplished by a special control signal called the interrupt acknowledge

signal.

 The task of saving and restoring the information can be done automatically by the

processor.

 The processor saves only the contents of program counter & status register (ie) it saves

only the minimal amount of information to maintain the integrity of the program execution.

 Saving registers also increases the delay between the time an interrupt request is received

and the start of the execution of the ISR. This delay is called the Interrupt Latency.

 Generally, a long interrupt latency is unacceptable. The concept of interrupts is used in

Operating Systems and in Control Applications, where processing of certain routines must

be accurately timed relative to external events. This type of application is called real-time

processing.
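
The transfer of control described above can be sketched as follows. The function run and its parameter interrupt_after are hypothetical names used only for this illustration; a real processor would also save the status register.

```python
def run(program, isr, interrupt_after):
    """Trace execution of `program`, servicing one interrupt.

    `interrupt_after` is the index i of the instruction during which the
    request arrives (an assumed parameter for this sketch).
    """
    trace = []
    pc = 0
    while pc < len(program):
        trace.append(program[pc])      # complete instruction i
        pc += 1                        # PC now points to i + 1
        if pc - 1 == interrupt_after:
            saved_pc = pc              # return address kept in temporary storage
            trace.append("INTA")       # acknowledge so the device drops its request
            for instr in isr:          # PC is loaded with the first ISR instruction
                trace.append(instr)
            pc = saved_pc              # return-from-interrupt reloads the PC
    return trace


print(run(["i0", "i1", "i2"], ["isr0", "isr1"], 1))
```

The trace shows that instruction i is completed first, the ISR runs next, and execution resumes at instruction i+1.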

Interrupt Hardware:

Fig: An equivalent circuit for an open drain bus used to implement a common interrupt

request line.

 A single interrupt request line may be used to serve 'n' devices.

 All devices are connected to the line via switches to ground. To request an interrupt, a

device closes its associated switch, so the voltage on the INTR line drops to 0 (zero).

 If all the interrupt request signals (INTR1 to INTRn) are inactive, all switches are open and

the voltage on INTR line is equal to Vdd.

 When a device requests an interrupt, the value of INTR is the logical OR of the requests

from individual devices.

(ie) INTR = INTR1 + ... + INTRn

INTR (written with an overbar) -> It is used to name the interrupt request signal on the common line, because this signal is active in the low voltage state.

 Open collector (bipolar ckt) or Open drain (MOS circuits) is used to drive INTR line.

 The output of an open collector (or) open drain gate is equivalent to a switch to ground

that is open when the gate's input is in the '0' state and closed when the gate's input is in the '1'

state.

 Resistor 'R' is called a pull-up resistor because it pulls the line voltage up to the high

voltage state when the switches are open.
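
The behaviour of the common line can be modelled in a few lines. This is a sketch only: the supply voltage value and the function names are assumptions, and each request is represented as True when the device's switch is closed.

```python
VDD = 5.0  # assumed supply voltage in volts, for illustration only


def intr_line_voltage(requests):
    """Voltage on the shared line: 0 V if any switch is closed, else Vdd."""
    return 0.0 if any(requests) else VDD


def intr(requests):
    """Logical INTR seen by the processor: the OR of INTR1..INTRn."""
    return intr_line_voltage(requests) == 0.0


print(intr([False, True, False]))
```

A single closed switch is enough to pull the line low, which is why the line behaves as a logical OR of the individual requests.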

Enabling and Disabling Interrupts:

 The arrival of an interrupt request from an external device causes the processor to suspend

the execution of one program & start the execution of another because the interrupt may

alter the sequence of events to be executed.

 The INTR signal remains active during the execution of the Interrupt Service Routine, until

the device deactivates it.

 There are 3 mechanisms to solve the problem of the infinite loop which would otherwise occur due to

successive interruptions caused by the still-active INTR signal.

The following is the typical scenario.

 The device raises an interrupt request.

 The processor interrupts the program currently being executed.

 Interrupts are disabled by changing the control bits in the PS (Processor Status register).

 The device is informed that its request has been recognized & in response, it deactivates the

INTR signal.

 The requested action is performed by the ISR; interrupts are then enabled & execution of the interrupted program is resumed.

Edge-triggered:

 The processor has a special interrupt request line for which the interrupt handling circuit

responds only to the leading edge of the signal. Such a line is said to be edge-triggered.

Handling Multiple Devices:

 When several devices request interrupts at the same time, some questions arise. They

are,

o How can the processor recognize the device requesting an interrupt?

o Given that different devices are likely to require different ISRs, how can the processor

obtain the starting address of the appropriate routine in each case?

o Should a device be allowed to interrupt the processor while another interrupt is being

serviced?

o How should two or more simultaneous interrupt requests be handled?

Polling Scheme:

 If two devices have activated the interrupt request line, the ISR for the selected device (first

device) will be completed & then the second request can be serviced.

 The simplest way to identify the interrupting device is to have the ISR poll all the I/O devices

connected to the bus. The first device encountered with the IRQ bit set is the device to be serviced.

 IRQ (Interrupt Request) -> when a device raises an interrupt request, the IRQ bit in its status

register is set to 1.

Merit:
 It is easy to implement.

Demerit:

 Time is spent interrogating the IRQ bits of devices that may not be requesting

any service.
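
A minimal sketch of the polling scheme follows. The Device class and polling_isr function are hypothetical; the second value returned counts how many status registers had to be read, which illustrates the demerit noted above.

```python
class Device:
    """Hypothetical I/O device with an IRQ bit in its status register."""

    def __init__(self, name, irq=False):
        self.name = name
        self.irq = irq

    def service(self):
        self.irq = False           # servicing the device clears its request
        return self.name


def polling_isr(devices):
    """Poll devices in order; service the first one found with IRQ set."""
    polled = 0
    for dev in devices:
        polled += 1                # one status-register read per device
        if dev.irq:
            return dev.service(), polled
    return None, polled


devs = [Device("kbd"), Device("disk", True), Device("printer", True)]
print(polling_isr(devs))
```

The order of the list fixes the priority: a device later in the list is serviced only after every earlier device has been interrogated.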

Vectored Interrupt:

 Here the device requesting an interrupt may identify itself to the processor by sending a

special code over the bus & then the processor starts executing the ISR.

 The code supplied by the device indicates the starting address of the ISR for the device.

 The code length ranges from 4 to 8 bits. The location pointed to by the interrupting device

is used to store the starting address of the ISR.

 The processor reads this address, called the interrupt vector, & loads it into the PC.

 The interrupt vector also includes a new value for the Processor Status Register.

 When the processor is ready to receive the interrupt vector code, it activates the interrupt

acknowledge (INTA) line.
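
The vectored scheme can be sketched as a table lookup. The memory contents, addresses and the function name below are made-up values for illustration; only the idea that the device code selects a location holding the ISR address and a new PS value comes from the text.

```python
# Hypothetical memory: each vector location holds (ISR start address, new PS value).
MEMORY = {
    0x04: (0x2000, 0b0001),   # vector selected by device code 0x04
    0x08: (0x3000, 0b0010),   # vector selected by device code 0x08
}


def accept_vectored_interrupt(device_code, pc, ps):
    """Return (new_pc, new_ps, saved_context) once the device has sent its code."""
    saved = (pc, ps)                          # kept for the return from interrupt
    isr_address, new_ps = MEMORY[device_code]
    return isr_address, new_ps, saved


print(accept_vectored_interrupt(0x08, 0x1004, 0))
```

No polling is needed here, because the code sent by the device leads directly to the right routine.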

Interrupt Nesting:

Multiple Priority Scheme:

 In multiple level priority scheme, we assign a priority level to the processor that can be

changed under program control.

 The priority level of the processor is the priority of the program that is currently being

executed.

 The processor accepts interrupts only from devices that have priorities higher than its own.

 At the time the execution of an ISR for some device is started, the priority of the processor

is raised to that of the device.

 This action disables interrupts from devices at the same level of priority or lower.
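
The acceptance rule of the multiple priority scheme can be sketched as below. The Processor class and its method names are assumptions for this example; priorities are plain integers, with larger numbers meaning higher priority.

```python
class Processor:
    def __init__(self, priority=0):
        self.priority = priority          # priority of the program being executed
        self.stack = []                   # priorities of the interrupted programs

    def request(self, device_priority):
        """Accept the request only if it outranks the current priority."""
        if device_priority > self.priority:
            self.stack.append(self.priority)
            self.priority = device_priority   # raised to that of the device
            return True
        return False

    def return_from_interrupt(self):
        self.priority = self.stack.pop()      # restore the interrupted level


cpu = Processor()
print(cpu.request(2), cpu.request(2), cpu.request(3))
```

The second request at level 2 is refused while the level 2 ISR runs, but a level 3 device can still interrupt it, which is how nesting arises.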

Privileged Instruction:

 The processor priority is usually encoded in a few bits of the Processor Status word.

 It can be changed by program instructions that write into the PS. These instructions

are called privileged instructions.

 These can be executed only when the processor is in supervisor mode.

 The processor is in supervisor mode only when executing OS routines. It switches to the

user mode before beginning to execute application programs.

Privileged Exception:

 A user program cannot accidentally or intentionally change the priority of the processor &

disrupt the system operation.

 An attempt to execute a privileged instruction while in user mode leads to a special type of

interrupt called the privileged exception.

Fig: Implementation of Interrupt Priority using individual Interrupt request & acknowledge

lines

 Each of the interrupt request lines is assigned a different priority level.

 Interrupt requests received over these lines are sent to a priority arbitration circuit in the

processor.

 A request is accepted only if it has a higher priority level than that currently assigned to the

processor.

Simultaneous Requests:

Daisy Chain:
 The interrupt request line INTR is common to all devices.

 The interrupt acknowledge line INTA is connected in a daisy chain fashion such that INTA

signal propagates serially through the devices.

 When several devices raise an interrupt request, the INTR line is activated & the processor

responds by setting the INTA line to 1. This signal is received by device 1.

 Device 1 passes the signal on to device 2 only if it does not require any service.

 If device 1 has a pending request for interrupt, it blocks the INTA signal & proceeds to put

its identification code on the data lines. Therefore, the device that is electrically closest to

the processor has the highest priority.

Merits:

 It requires fewer wires than the individual connections.
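
The propagation of INTA along the daisy chain can be sketched as a simple loop. The function name and the list representation of pending requests are assumptions for illustration.

```python
def daisy_chain(pending):
    """pending[k] is True if device k+1 has requested an interrupt.

    Returns the 1-based number of the device that captures INTA,
    or None if INTA passes through every device.
    """
    for position, has_request in enumerate(pending, start=1):
        if has_request:
            return position      # blocks INTA and puts its code on the data lines
        # otherwise the device passes INTA on to its neighbour
    return None


print(daisy_chain([False, True, True]))
```

Device 3 also has a request pending here, but device 2 sits closer to the processor and therefore wins.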

Arrangement of Priority Groups:

 Here the devices are organized in groups & each group is connected at a different priority

level. Within a group, devices are connected in a daisy chain.

 At the device end, an interrupt enable bit in a control register determines whether the

device is allowed to generate an interrupt request.

 At the processor end, either an interrupt enable bit in the PS (Processor Status) or a priority

structure determines whether a given interrupt request will be accepted.

Initiating the Interrupt Process:

Load the starting address of ISR in location INTVEC (vectored interrupt).

Load the address LINE in a memory location PNTR. The ISR will use this location as a

pointer to store the i/p characters in the memory.

Enable the keyboard interrupts by setting bit 2 in register CONTROL to 1.

Execution of ISR:

Read the input character from the keyboard input data register. This will cause the

interface circuit to remove its interrupt request.

Store the character in a memory location pointed to by PNTR & increment PNTR.

When the end of line is reached, disable keyboard interrupt & inform program main.

Return from interrupt.
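
The initialization and ISR steps above can be sketched as a small simulation. The Keyboard class is hypothetical, the Python list stands in for the memory area LINE addressed through PNTR, and "bit 2" is assumed to mean the bit with weight 4 (bits numbered from 0).

```python
class Keyboard:
    """Hypothetical keyboard interface with a data register and CONTROL."""

    def __init__(self, typed):
        self.pending = list(typed)
        self.control = 0                    # bit 2 = keyboard interrupt enable

    def irq(self):
        return bool(self.pending) and bool(self.control & 0b100)

    def read_datain(self):
        return self.pending.pop(0)          # reading removes the request


def read_line(keyboard):
    line = []                               # plays the role of LINE / PNTR
    eol = False

    def isr():
        nonlocal eol
        ch = keyboard.read_datain()         # read the input character
        line.append(ch)                     # store via PNTR, then increment PNTR
        if ch == "\n":
            keyboard.control &= ~0b100      # disable keyboard interrupts
            eol = True                      # inform the main program

    intvec = isr                            # INTVEC <- starting address of ISR
    keyboard.control |= 0b100               # set bit 2 of CONTROL to 1
    while not eol and keyboard.irq():
        intvec()                            # processor vectors to the ISR
    return "".join(line)


kb = Keyboard("hi\nrest")
print(repr(read_line(kb)))
```

Characters typed after the end of line stay in the interface, because the ISR disabled keyboard interrupts before returning.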

STANDARD I/O INTERFACE

 A standard I/O Interface is required to fit the I/O device with an Interface circuit.

 The processor bus is the bus defined by the signals on the processor chip itself.

 The devices that require a very high speed connection to the processor such as the main

memory, may be connected directly to this bus.

 The bridge is a circuit that connects two buses; it translates the signals and protocols of one

bus into those of the other.

 The bridge circuit introduces a small delay in data transfer between processor and the

devices.

 We have 3 bus standards. They are,

PCI (Peripheral Component Inter Connect)

SCSI (Small Computer System Interface)

USB (Universal Serial Bus)

SCSI INTERFACE

SCSI is available in a variety of interfaces. The first, still very common, was parallel SCSI

(now also called SPI), which uses a parallel bus design.

SCSI interfaces have often been included on computers from various manufacturers for

use under Microsoft Windows, Mac OS, Unix, Commodore Amiga and Linux operating

systems, either implemented on the motherboard or by the means of plug-in adaptors.

Short for Small Computer System Interface, SCSI is pronounced as "Scuzzy" and is one of

the most commonly used interfaces for disk drives; it was first completed in 1982.

SCSI-1 is the original SCSI standard, developed back in 1986 as ANSI X3.131-1986. SCSI-1 is

capable of transferring eight bits at a time.

SCSI-2 was approved in 1990 and added new features such as Fast and Wide SCSI, and support

for additional devices.

SCSI-3 was approved in 1996 as ANSI X3.270-1996.

SCSI is a standard for parallel interfaces that transfers information eight bits at a time or

more, which is faster than the average parallel interface. SCSI-2 and above

support up to seven peripheral devices, such as a hard drive, CD-ROM, and scanner, that

can attach to a single SCSI port on a system's bus. SCSI ports were designed for Apple

Macintosh and Unix computers, but also can be used with PCs. Although SCSI has been

popular in the past, today many users are switching over to SATA drives.

SCSI connectors

The illustrations below show some of the most commonly found and used SCSI

connectors on computers and devices.

 SCSI is used for connecting additional devices both inside and outside the computer box.

 SCSI bus is a high speed parallel bus intended for devices such as disk and video display.

 SCSI refers to the standard bus which is defined by ANSI (American National Standards

Institute).

 The SCSI bus has several options. It may be,

Narrow bus → It has 8 data lines & transfers 1 byte at a time.

Wide bus → It has 16 data lines & transfers 2 bytes at a time.

Single-Ended Transmission → Each signal uses a separate wire.

HVD (High Voltage Differential) → It uses 5 V (TTL levels).

LVD (Low Voltage Differential) → It uses 3.3 V.

 Because of these various options, the SCSI connector may have 50, 68 or 80 pins.

 The data transfer rate ranges from 5 MB/s to 160 MB/s, 320 MB/s & 640 MB/s.

 The transfer rate depends on,

Length of the cable

Number of devices connected.

 To achieve a high transfer rate, the bus length should be limited to 1.6 m for SE signaling and 12 m for

LVD signaling.

The SCSI bus is connected to the processor bus through the SCSI controller.

The data are stored on a disk in blocks called sectors.

Each sector contains several hundred bytes. These data may not necessarily be stored in

contiguous sectors.

 SCSI protocol is designed to retrieve the data in the first sector or any other selected

sectors. Using SCSI protocol, the burst of data are transferred at high speed.

 A controller connected to the SCSI bus is one of 2 types. They are,

Initiator

Target

Initiator:

It has the ability to select a particular target & to send commands specifying the operation to

be performed.

They are the controllers on the processor side.

Target:

The disk controller operates as a target.

It carries out the commands it receives from the initiator. The initiator establishes a logical

connection with the intended target.

Steps:

 Consider a disk read operation. It has the following sequence of events.

 The SCSI controller, acting as initiator, contends for control of the bus. When it wins the

arbitration process, it selects the target controller & hands over control of the bus to it.

 The target starts an output operation; in response to this, the initiator sends a command

specifying the required read operation.

 The target, realizing that it first needs to perform a disk seek operation, sends a message to the

initiator indicating that it will temporarily suspend the connection between them.

 Then it releases the bus.

 The target controller sends a command to the disk drive to move the read head to the first

sector involved in the requested read operation. It then reads the data in that sector & stores

them in a data buffer. When it is ready to begin transferring

data to the initiator, the target requests control of the bus. After it wins arbitration, it reselects

the initiator controller, thus restoring the suspended connection.

 The target transfers the contents of the data buffer to the initiator & then suspends the

connection again. Data are transferred either 8 (or) 16 bits in parallel depending on the

width of the bus.

 As the initiator controller receives the data, it stores them into main memory using the DMA

approach.

 The SCSI controller sends an interrupt to the processor to inform it that the requested

operation has been completed.

Bus Signals:-

 The bus has no address lines.

 Instead, it has data lines to identify the bus controllers involved in the selection /

reselection / arbitration process.

 For narrow bus, there are 8 possible controllers numbered from 0 to 7.

 For a wide bus, there are 16 controllers.

 Once a connection is established b/w two controllers, there is no further need for

addressing & the datalines are used to carry the data.
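
One way to picture the use of data lines for identification on a narrow bus is shown below. This is a sketch under stated assumptions: each controller is taken to assert the single data line whose number equals its ID, and the highest asserted ID is taken to win arbitration. The function names are made up for the example, and the wide bus is not modelled.

```python
def assert_id(controller_id):
    """Narrow bus: controller k asserts data line DB(k)."""
    if not 0 <= controller_id <= 7:
        raise ValueError("narrow-bus IDs run from 0 to 7")
    return 1 << controller_id


def arbitration_winner(contenders):
    """Return the winning ID for a non-empty list of contending controllers."""
    lines = 0
    for cid in contenders:
        lines |= assert_id(cid)           # all contenders drive the bus together
    return lines.bit_length() - 1         # number of the highest asserted line


print(arbitration_winner([2, 6, 5]))
```

Since each controller owns one data line, 8 data lines allow exactly the 8 controllers numbered 0 to 7 mentioned above.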

SCSI bus signals:

Category                Name                  Function

Data                    -DB(0) to -DB(7)      Data lines

                        -DB(P)                Parity bit for data bus

Phases                  -BSY                  Busy

                        -SEL                  Selection

Information type        -C/D                  Control / Data

                        -MSG                  Message

Handshake               -REQ                  Request

                        -ACK                  Acknowledge

Direction of transfer   I/O                   Input / Output

Other                   -ATN                  Attention

                        -RST                  Reset
PCI:

 PCI defines an expansion bus on the motherboard.

 PCI was developed as a low cost bus that is truly processor independent.

 It supports high speed disk, graphics and video devices.

 PCI has plug and play capability for connecting I/O devices.

 To connect new devices, the user simply connects the device interface board to the bus.

Data Transfer:

 Data are transferred between the cache and main memory in bursts of several words,

which are stored in successive memory locations.

 When the processor specifies an address and requests a 'read' operation from memory,

the memory responds by sending a sequence of data words starting at that address.

 During a write operation, the processor sends the address followed by a sequence of data

words to be written in successive memory locations.

 PCI supports read and write operation.

 A read / write operation involving a single word is treated as a burst of length one.

 PCI has three address spaces. They are

Memory address space

I/O address space

Configuration address space

 I/O address space → It is intended for use with processors that have a separate I/O address space.

 Configuration space → It is intended to give PCI its plug and play capability.

 PCI Bridge provides a separate physical connection to main memory.

 The master maintains the address information on the bus until data transfer is completed.

 At any time, only one device acts as bus master.

 A master is called an 'initiator' in PCI; it is either a processor or a DMA controller.

 The addressed device that responds to read and write commands is called a target.

 A complete transfer operation on the bus, involving an address and a burst of data, is called a

'transaction'.

Fig:Use of a PCI bus in a Computer system
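
The burst transfers described in this section can be sketched with a word-addressed target. The Target class and its method names are hypothetical; the sketch only shows that one address is followed by several data words in successive locations.

```python
class Target:
    """Hypothetical PCI target backed by a word-addressed memory."""

    def __init__(self, size):
        self.words = [0] * size

    def burst_write(self, address, data):
        # The initiator sends the address once, then the data words.
        for offset, word in enumerate(data):
            self.words[address + offset] = word

    def burst_read(self, address, length):
        # The target returns `length` words starting at `address`.
        return self.words[address:address + length]


mem = Target(16)
mem.burst_write(4, [10, 20, 30])
print(mem.burst_read(4, 3), mem.burst_read(5, 1))
```

The second call reads a single word, which is simply a burst of length one.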

USB – Universal Serial Bus

 USB is used for connecting additional devices both inside and outside the computer box.

 USB uses serial transmission to suit the needs of equipment ranging from keyboards

to game controls to Internet connections.

 USB supports 3 speeds of operation. They are,

Low speed (1.5 Mb/s)

Full speed (12 Mb/s)

High speed (480 Mb/s)

 The USB has been designed to meet certain key objectives. They are,

It provides a simple, low cost & easy to use interconnection system that overcomes the

difficulties due to the limited number of I/O ports available on a computer.

It accommodates a wide range of data transfer characteristics for I/O devices including

telephone & Internet connections.

It enhances user convenience through the ‘Plug & Play’ mode of operation.

Port Limitation:-

 Normally the system has a few limited ports.

 To add new ports, the user must open the computer box to gain access to the internal

expansion bus & install a new interface card.

 The user may also need to know how to configure the device & the s/w.

Merits of USB:-

 USB helps to add many devices to a computer system at any time without opening the

computer box.

Device Characteristics:-

 The kinds of devices that may be connected to a computer cover a wide range of functionality.

 The speed, volume & timing constraints associated with data transfer to & from devices vary

significantly.

Eg:1 Keyboard ->Since the event of pressing a key is not synchronized to any other event in a

computer system, the data generated by keyboard are called asynchronous.

 The rate of the data generated by the keyboard depends upon the speed of the human operator, which

is about 100 bytes/sec.
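
To see how small the keyboard's demand is, its data rate can be compared with the three USB speeds. This is a rough sketch that ignores protocol overhead; the dictionary and function names are assumptions for the example.

```python
USB_SPEEDS_BITS_PER_S = {
    "low": 1.5e6,     # 1.5 Mb/s
    "full": 12e6,     # 12 Mb/s
    "high": 480e6,    # 480 Mb/s
}


def bus_fraction(bytes_per_second, speed):
    """Fraction of the raw bit rate consumed by a device's payload data."""
    return bytes_per_second * 8 / USB_SPEEDS_BITS_PER_S[speed]


print(bus_fraction(100, "low"))
```

At 100 bytes/sec the keyboard needs 800 bits/sec, which is roughly 0.05% of even the low speed rate.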

Plug & Play:-

 The main objective of USB is that it provides a plug & play capability.

 The plug & play feature means that a new device can be connected at any time, while the

system is operating.

 The system should,

Detect the existence of the new device automatically.

Identify the appropriate device driver s/w.

Establish the appropriate addresses.

Establish the logical connection for communication.

USB Architecture:-

 USB has a serial bus format which satisfies the low-cost & flexibility requirements.

 Clock & data information are encoded together & transmitted as a single signal.

 There are no limitations on clock frequency or distance arising from data skew, & hence it

is possible to provide a high data transfer bandwidth by using a high clock frequency.

 To accommodate a large no. of devices that can be added / removed at any time, the USB

has a tree structure.

Fig:USB Tree Structure

 Each node of the tree has a device called a 'hub', which acts as an intermediate control

point b/w host & I/O devices.

 At the root of the tree, the 'root hub' connects the entire tree to the host computer.

 The leaves of the tree are the I/O devices being served.
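
The tree structure can be sketched with a small class. The Hub class, the path_to function and the device names are all made up for illustration; leaves are represented as plain strings.

```python
class Hub:
    """Intermediate control point; children are hubs or device names."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def path_to(node, device, trail=()):
    """Return the list of hubs between the host and `device`, or None."""
    trail = trail + (node.name,)
    for child in node.children:
        if isinstance(child, Hub):
            found = path_to(child, device, trail)
            if found is not None:
                return found
        elif child == device:
            return list(trail)
    return None


root = Hub("root hub", [
    Hub("hub A", ["keyboard", "mouse"]),
    Hub("hub B", [Hub("hub C", ["camera"]), "printer"]),
])
print(path_to(root, "camera"))
```

Every path starts at the root hub, so adding or removing a device only changes one leaf of the tree.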
