Computer Organization & Architecture Overview

The document covers the fundamentals of computer organization and architecture, detailing the distinctions between computer architecture and organization, as well as the evolution of computers and performance improvements. It discusses the structure and function of computer systems, including key components like the CPU, memory, and I/O systems, and introduces the concept of interconnection structures such as buses and point-to-point interconnects. Additionally, it outlines the characteristics of computer memory systems, including access methods and performance metrics.

Uploaded by

aaishaeduc

BSCS- 511 Computer Organization And Architecture

LECTURE – 01 (CHAPTER 01: INTRODUCTION)

COMPUTER ARCHITECTURE vs COMPUTER ORGANIZATION

COMPUTER ARCHITECTURE:
- It refers to those attributes that have a direct impact on the execution of a program.
- It refers to those attributes of a system visible to a programmer.
- Examples of architectural attributes: the instruction set, the number of bits used to represent various data types, I/O mechanisms, and techniques for addressing memory.
- It is an architectural design issue whether a computer will have a multiply instruction.
- An architecture may survive for a long time.

COMPUTER ORGANIZATION:
- It refers to the organizational units and their interconnection that realize the architectural specifications.
- The organizational attributes are those hardware features and details which are transparent to the programmer.
- Examples of organizational attributes: interfaces between the computer and peripherals, and the memory technology used.
- It is an organizational issue whether that instruction will be implemented using a special multiply unit or by a mechanism that makes repeated use of the add unit of the system.
- An organization changes more frequently than the architecture.

Evolution:
- Increase in processor speed, memory size, and I/O capacity and speed
- Decrease in component size
Ways to improve performance:
- By improvement in computer organization
- By heavy use of pipelining, parallel execution techniques, speculative execution techniques, etc.
COMPUTER STRUCTURE AND FUNCTION:
Structure: The way in which the components are interrelated.
Function: The operation of each individual component as part of the structure.
Four main functions:
- Data processing
- Data storage
- Data movement
- Control

AQSA WAHEED B17101016


Four main structural components:

- Central Processing Unit: Controls the operation of the computer and performs its data
processing functions.
- Main memory: Stores data.
- I/O: Moves the data between the computer and its external environment.
- System interconnection: Some mechanism that provides for communication among CPU,
main memory and I/O.
Major structural components of CPU:
- Control Unit: Controls the operation of the processor and hence the computer.
- Arithmetic and Logic Unit: Performs the computer’s data processing functions.
- Registers: Provides storage internal to the CPU.
- CPU interconnection: Some mechanism that provides for communication among the CU,
ALU, and registers.


LECTURE – 02 (CHAPTER 02: COMPUTER EVOLUTION AND PERFORMANCE)

In 1946, von Neumann and his colleagues began the design of a new stored program computer,
referred to as the IAS computer.
The IAS computer, although not completed until 1952, is the prototype of all subsequent general-
purpose computers.

The IAS computer consists of:

- A main memory, which stores both data and instructions in the same form.
- An arithmetic and logic unit capable of operating on binary data.
- A control unit, which interprets the instructions in memory and causes them to be executed.
- I/O equipment operated by the control unit.
- Interconnections.

Key points of the Von Neumann Architecture:

- Data and instructions are stored in a single read write memory.
- The contents of this memory are addressable by location, without regard to the type of data contained there.
- Execution occurs in a sequential order (unless explicitly modified) from one instruction to
the next.


- The memory of the IAS consists of 1000 storage locations, called words, of 40 binary digits (bits) each.
- Numbers are represented in binary form and each instruction is a binary code.
- Each number is represented by a sign bit and a 39-bit value (Fig 2.2a).
- A word may also contain two 20-bit instructions, each consisting of an 8-bit operation code (opcode) specifying the instruction to be performed and a 12-bit address designating one of the words in memory (numbered 0 to 999). A 12-bit field can distinguish $2^{12} = 4096$ locations, which is more than enough for the 1000 words.
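The word layout described above can be sketched in a few lines of Python. This is only an illustration of the field widths given in these notes (40-bit word, 20-bit instructions, 8-bit opcode, 12-bit address); the function names and the sample opcode values are hypothetical and are not taken from the IAS instruction set.

```python
WORD_BITS = 40      # one IAS word
INSTR_BITS = 20     # two instructions per word
OPCODE_BITS = 8
ADDR_BITS = 12


def split_instruction(instr):
    """Split a 20-bit instruction into (opcode, address)."""
    opcode = (instr >> ADDR_BITS) & ((1 << OPCODE_BITS) - 1)
    address = instr & ((1 << ADDR_BITS) - 1)
    return opcode, address


def decode_instruction_word(word):
    """Split a 40-bit word into its left and right instructions."""
    left = (word >> INSTR_BITS) & ((1 << INSTR_BITS) - 1)
    right = word & ((1 << INSTR_BITS) - 1)
    return split_instruction(left), split_instruction(right)


def decode_number_word(word):
    """Split a 40-bit number word into (sign bit, 39-bit value)."""
    sign = (word >> (WORD_BITS - 1)) & 1
    value = word & ((1 << (WORD_BITS - 1)) - 1)
    return sign, value


# Build a sample word from two made-up instructions and decode it again.
left_instr = (0x01 << ADDR_BITS) | 500
right_instr = (0x05 << ADDR_BITS) | 501
sample_word = (left_instr << INSTR_BITS) | right_instr
print(decode_instruction_word(sample_word))
```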


Both the control unit and the ALU contain storage locations, called registers.
- Memory buffer registers (MBR): Contains a word to be sent to the memory or the I/O unit,
or is used to receive a word from the memory or the I/O unit.
- Memory address register (MAR): Specifies the address in memory of the word to be written from or read into the MBR.
- Instruction register (IR): Contains the 8-bit opcode instruction being executed.
- Instruction buffer register (IBR): Employed to store temporarily the right hand instruction
from a word in memory.
- Program counter (PC): Contains the address of the next instruction pair to be fetched from
the memory.
- Accumulator (AC) and multiplier quotient (MQ): Employed to temporarily hold operands and results of ALU operations.

• IAS operates by repetitively performing an instruction cycle.

• Each instruction cycle consists of two sub cycles: fetch cycle and execute cycle.

The IAS computer had a total of 21 instructions. These can be grouped as follows:
- Data transfer: Move data between memory and ALU registers or between two ALU
registers.
- Unconditional branch: Normally, the CU executes instructions in sequence from memory. This sequence can be changed by a branch instruction, which facilitates repetitive operations.
- Arithmetic: Operations performed by the ALU.
- Address modify: Permits addresses to be computed in the ALU and then inserted into
instructions stored in memory. This allows a program considerable addressing flexibility.


LECTURE – 03 (CHAPTER 03: TOP LEVEL VIEW OF COMPUTER FUNCTION AND INTERCONNECTION; 3.1, 3.2 NOT INCLUDED)

- A computer consists of a set of modules of three basic types (memory, I/O, and processor)
that communicate with each other.
- The collection of paths connecting the various modules is called the interconnection structure.
- Design of interconnection structure depends on the nature of exchanges that must be made
among modules.

Types of exchanges that are needed for each type of module:

- Memory module:
o Each memory module consists of N words of equal length.
o Each word is assigned a unique numerical address (0, 1, …, N-1)
o The nature of operation is indicated by the read and write control signals.
o The location of the operation is specified by an address.
o Data is to be written or read.
- I/O module:
o From an internal point of view, the functionality of an I/O module is similar to that of a memory module.
o Nature of operation -> read and write
o Address -> port address identifying the input and output devices.
o Data -> to and from external devices
o Data -> to and from the microprocessor
o Interrupt -> to the processor
- Processor:
o Reads in instruction and data
o Writes out data after processing
o Uses control signals to control the overall operation of the system.
o Receives interrupt from I/O ports.

BUS INTERCONNECTION:

- A bus is a communication pathway connecting two or more devices by a shared transmission medium.
- In bus interconnection, only one device at a time can successfully transmit; the others can only receive.
- Typically, a bus consists of multiple lines, each capable of transmitting a single bit.
- Several lines of a bus can be used to transmit binary digits in parallel.
- A computer system contains a number of different buses to connect components at various levels of the computer system hierarchy.
- A bus that connects major computer components is called a system bus.
- The most common computer interconnection structures are based on the use of one or more
system buses.

Bus Structure:

- A typical bus consists of about 50 to 100 separate lines.

- Each line is assigned a particular meaning or function.
- The lines can be classified into three functional groups:
o Data lines: provide a path for moving data among system modules. The number of lines is the width of the data bus and determines how many bits can be transferred at a time.
o Address lines: used to designate the source or destination of the data on the data
bus. The width of the address bus determines the maximum possible memory
capacity of the system.
o Control lines: used to control the access to and the use of the data and address lines.
Control signals transmit both timing and command information among system
modules.
Example control signals: memory read, memory write, I/O read, I/O write, transfer
ACK, bus request, bus grant, interrupt request, interrupt ACK, clock, reset.
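The effect of bus width can be illustrated with a short calculation. This is a minimal sketch under the assumption that each address selects exactly one location and that an item wider than the data bus is moved in several transfers; the function names are hypothetical.

```python
def max_addressable_locations(address_lines):
    """An address bus with k lines can name 2**k distinct locations."""
    return 2 ** address_lines


def transfers_needed(item_bits, data_lines):
    """Number of bus transfers needed to move one item (ceiling division)."""
    return -(-item_bits // data_lines)


print(max_addressable_locations(16))   # locations reachable with 16 address lines
print(transfers_needed(64, 32))        # a 64-bit item over a 32-bit data bus
```

With a 32-bit data bus, for example, a 64-bit instruction needs two memory accesses, which is why the data bus width is a key factor in overall system performance.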

Operation of the bus:

If one module wishes to send data to another:
- Obtain the use of the bus.
- Transfer data via the bus.
If one module wishes to request data from another:
- Obtain the use of the bus.
- Transfer a request to the other module over the appropriate control and address lines.
- Wait for the other module to send the data.


LECTURE – 04

ELEMENTS OF BUS DESIGN

Bus Types:
- Dedicated: it is permanently assigned to either one function or to a physical subset of
computer components.
- Multiplexed: method for using same lines for multiple purposes is called time multiplexing.
It uses fewer lines which results in less cost. It has complex circuitry within each module.
Events that share the same lines can’t take place in parallel.

Method of Arbitration:
- Centralized Arbitration: a single hardware device, called a bus controller or arbiter, is
responsible for allocating time on the bus.
- Distributed Arbitration: Each module contains access control logic and the modules act together to share the bus.

Timing:
- Synchronous Timing: the occurrence of events on bus is determined by a clock.
- Asynchronous Timing: the occurrence of one event on a bus follows and depends on the
occurrence of a previous event.


POINT-TO-POINT INTERCONNECT

Compared to the shared bus structure, the point-to-point (ptp) interconnect has lower latency, higher data rate, and better scalability.

Intel’s QuickPath Interconnect (QPI): 2008

- Multiple Direct Connections: multiple components within the system have direct connections to other components. No need for arbitration.
- Layered Protocol Architecture: QPI processor-level interconnects use a layered protocol architecture, rather than the simple use of control signals.
- Packetized Data Transfer: Data are not sent as a raw bit stream, but as a sequence of packets, each of which includes control headers and error control codes.

QPI is a four-layer protocol architecture with the following layers (bottom to top):

- Physical: consists of the wires carrying the signals. The unit of transfer at the Physical layer is 20 bits, called a Phit (physical unit).
- Link: responsible for reliable transmission and flow control. The unit of transfer at the Link layer is 80 bits, called a Flit (flow control unit).
- Routing: provides a framework for directing packets through the fabric.
- Protocol: high-level set of rules for exchanging packets of data between devices.

QPI PORT
- 84 individual links.
- Each data path consists of a pair of wires that transmits data one bit at a time.
- The pair is called a lane.
- 20 data lanes in each direction.
- 20 bits are transferred in parallel in each direction.


PCI EXPRESS: (2003)

Peripheral Component Interconnect (PCI) is a high-bandwidth, processor-independent bus that can function as a peripheral or mezzanine bus.

PCIe (Peripheral Component Interconnect Express), as with QPI, is a point-to-point interconnect scheme intended to replace bus-based schemes such as PCI.

Chipset, also called host bridge and root complex device:

- Connects the processor and memory subsystems to the PCIe switch fabric comprising one or
more PCIe and PCIe switch devices.
- Acts as a buffering device, to deal with the difference in data rates between I/O controllers and the memory and processor components.
- Translates between PCIe transaction formats and the processor and memory signal and control requirements.

What types of transfers must a computer’s interconnection structure (e.g., bus) support?

Memory to processor: The processor reads an instruction or a unit of data from memory.
Processor to memory: The processor writes a unit of data to memory.
I/O to processor: The processor reads data from an I/O device via an I/O module.
Processor to I/O: The processor sends data to the I/O device.
I/O to or from memory: For these two cases, an I/O module is allowed to exchange data directly
with memory, without going through the processor, using direct memory access (DMA).

What is the benefit of using a multiple-bus architecture compared to a single-bus architecture?

With multiple buses, there are fewer devices per bus. This (1) reduces propagation delay, because
each bus can be shorter, and (2) reduces bottleneck effects.


LECTURE – 05 (CHAPTER 04: CACHE MEMORY)

CHARACTERISTICS OF COMPUTER MEMORY SYSTEMS

Location:

- INTERNAL: processor register, cache, and main memory.

- EXTERNAL: optical disks, magnetic disks, and tapes.

Capacity:

- Typically expressed in terms of bytes or words.

Unit of Transfer:

- For internal memory, the unit of transfer is equal to the number of electrical lines into and
out of the memory module. It may be equal to word length or larger.
- Number of bits read out or written into the memory at a time.
- For external memory, the unit of transfer is usually referred to as a block.

Access Method:

- SEQUENTIAL ACCESS: Memory is organized into units of data, called records. Access must be made in a specific linear sequence. A shared read/write mechanism must be moved from its current location to the desired location, so the time to access an arbitrary record is highly variable. (tape units)

- DIRECT ACCESS: Individual blocks or records have a unique address based on physical location. Access is accomplished by direct access to reach a general vicinity, plus sequential searching, counting, or waiting to reach the final location. (disk units)

- RANDOM ACCESS: Each addressable location in memory has a unique addressing mechanism. Access time is constant and independent of the sequence of prior accesses. Any location can be selected at random and directly addressed and accessed. (Main memory, and some cache systems)

- ASSOCIATIVE: A storage device in which a location is identified by its contents, rather than by its position, uses the associative access method. It is a random-access type that enables a comparison of the desired information with the stored information. Each location has its own addressing mechanism, and retrieval time is constant, independent of location or prior access patterns. (Cache memories)

Performance:

- ACCESS TIME: (latency)
o For RAM, the time from the instant that an address is presented to the memory to the instant that data have been stored or made available for use.
o For non-RAM, the time it takes to position the read-write mechanism at the desired location. The time required to read the record depends on the length of the record and is therefore not included in the access time.
o For non-RAM, access time depends on the following factors:
   - Location of the information required.
   - Current location of the storage system relative to the desired information.

- MEMORY CYCLE TIME: (concerned with system bus) Applied only to RAM systems,
and consists of the access time plus any additional time required before a second access can
commence.
Cycle time = access time + transient time

- TRANSFER RATE: The rate at which data can be transferred into or out of the memory unit.
For RAM: transfer rate = 1 / (cycle time)
For non-RAM:
$T_N = T_A + \frac{n}{R}$
where
$T_N$ = average time to read or write $n$ bits
$T_A$ = average access time
$n$ = number of bits
$R$ = transfer rate, in bits per second (bps)
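A quick numeric check of the transfer-rate relations above can be written as follows. The function names and the sample figures (10 ms access time, 1 Mbps, 10 ns cycle time) are illustrative assumptions, not values from the lecture.

```python
def average_transfer_time(access_time_s, n_bits, rate_bps):
    """T_N = T_A + n / R for non-random-access memory."""
    return access_time_s + n_bits / rate_bps


def ram_transfer_rate(cycle_time_s):
    """For random-access memory, transfer rate = 1 / cycle time."""
    return 1.0 / cycle_time_s


# Reading 8000 bits from a device with a 10 ms average access time
# and a 1 Mbps transfer rate:
t_n = average_transfer_time(0.010, 8000, 1_000_000)
print(f"T_N = {t_n * 1000:.1f} ms")

# A memory with a 10 ns cycle time:
print(f"{ram_transfer_rate(10e-9):.3g} accesses per second")
```

In this example the positioning time (10 ms) dominates the data transfer itself (8 ms), which is typical of small reads from mechanical storage.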

Physical Types:

- Semiconductor memory, magnetic surface memory, used for disk and tape, and optical and
magneto-optical.
- Access time and physical characteristics depend upon physical type.


Physical Characteristics:

- Volatile memory: information is lost when electrical power is switched off.

- Nonvolatile memory: information once recorded remains without deterioration until
deliberately changed.

LECTURE – 05 (CHAPTER 04: CACHE MEMORY)

The relation among the three key characteristics of computer memory is as follows:
- Faster access time, greater cost per bit.
- Greater capacity, smaller cost per bit.
- Greater capacity, slower access time.

As one goes down in the hierarchy, the following occurs:

- Decreasing cost per bit
- Increasing capacity
- Increasing access time
- Decreasing frequency of access of the memory by the processor

The basis for the validity of the last condition (decreasing frequency of access) is a principle known as locality of reference.


LECTURE – 06 (CHAPTER 04: CACHE MEMORY)

CACHE MEMORY PRINCIPLES

Cache memory is designed to combine the memory access time of expensive, high-speed memory with the large memory size of less expensive, lower-speed memory.

L2 cache is slower and typically larger than L1 cache, and L3 cache is slower and typically larger than L2 cache.

Clearly explain how the working of cache memory is related to some characteristics of large programs.
- Studies of large programs reveal that most of the execution time is spent in a few routines (sub-sections). When execution is localized within these routines, a number of instructions are executed repeatedly. This property of programs is known as LOCALITY OF REFERENCE.
- Thus, while some localized areas of the program are executed repeatedly, the other areas are executed less frequently.
- To reduce the execution time, these most frequently repeated segments may be placed in a fast memory known as cache (buffer) memory.
- The memory control circuitry is designed to take advantage of the property of locality of reference.
- If a word in a block of memory is read, that block is transferred to one of the slots of the cache.


Operation of Cache:
- When the processor attempts to read a word from memory, a check is made to determine if the word is present in the cache.
- If so, the word is delivered to the processor.
- If not, a block of memory, consisting of some fixed number of words, is read into the cache.
- Then the word is delivered from the cache to the processor.
- Cache includes tags to identify which block of main memory is in each cache slot.

The cache consists of m blocks, called lines. Each line contains K words, plus a tag of a few bits.

A control bit is used to show whether the line has been modified or not. The length of the line is called the line size.

Main memory has $2^n$ addressable words, each word having a unique n-bit address.

There are $M = 2^n / K$ blocks in main memory.

Read Address = RA


In the cache organization, cache connects to the processor via control, data and address lines.

The data and address lines also attach to data and address buffers, which attach to a system bus from
which main memory is reached.

During a cache hit, the data and address buffers are disabled and communication is only between
cache and the processor.

During a cache miss, the desired address is loaded onto the system bus and the data are returned
through the data buffer to both the cache and processor.

ELEMENTS OF CACHE DESIGN:


Cache Size:
- The cache should be small enough that the overall average cost per bit is close to that of main memory alone, and large enough that the overall average access time is close to that of the cache alone.
- The larger the cache, the larger the number of gates involved in addressing it, so large caches tend to be slower than small ones.
- The number of lines is considerably less than the number of main memory blocks: m << M.

Mapping Function:

- Because there are fewer cache lines than main memory blocks, an algorithm is needed for mapping main memory blocks into cache lines.
- A means is needed for determining which main memory block currently occupies a cache
line.
- The choice of mapping function dictates how the cache is organized
- Three techniques can be used:

Direct: It is the simplest technique, which maps each block of main memory into only one possible cache line. The mapping is expressed as
i = j modulo m
where
i = cache line number
j = main memory block number
m = number of lines in the cache
The mapping is implemented using the main memory address.


Summary:

- Address length = (s + w) bits
- Number of addressable units = $2^{s+w}$ words or bytes
- Block size = line size = $2^w$ words or bytes
- Number of blocks in main memory = $2^{s+w} / 2^w = 2^s$
- Number of lines in cache = m = $2^r$
- Size of cache = $2^{r+w}$ words or bytes
- Size of tag = (s - r) bits
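The direct-mapping address format summarized above can be sketched as a field split. The function name is hypothetical, and the sample parameters (a 24-bit address with s = 22, r = 14, w = 2) are an assumed configuration chosen only for illustration.

```python
def split_direct_mapped_address(address, s, r, w):
    """Return (tag, line, word) for an (s + w)-bit address.

    w bits select the word within a block, r bits select the cache line,
    and the remaining s - r most significant bits form the tag.
    """
    if not 0 <= address < (1 << (s + w)):
        raise ValueError("address does not fit in s + w bits")
    word = address & ((1 << w) - 1)
    block = address >> w              # main memory block number j (s bits)
    line = block & ((1 << r) - 1)     # i = j modulo m, with m = 2**r lines
    tag = block >> r                  # the s - r most significant bits
    return tag, line, word


tag, line, word = split_direct_mapped_address(0x16339C, s=22, r=14, w=2)
print(hex(tag), hex(line), word)
```

Because m is a power of two, taking the low r bits of the block number is the same as computing j modulo m, so no division hardware is needed.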

- The use of a portion of the address as a line number provides a unique mapping of each block of main memory into the cache.
- When a block is actually read into its assigned line, it is necessary to tag the data to distinguish it from other blocks that can fit into the same line. The most significant s - r bits serve this purpose.
- Direct mapping is simple and inexpensive to implement.
- DISADVANTAGE: if a program references words repeatedly from two different blocks that map into the same line, then the blocks will be continually swapped in the cache and the hit ratio will be low. This continuous swapping is called thrashing.
o Victim cache: a small cache of 4 to 16 lines, residing between a direct-mapped L1 cache and the next level of memory, proposed to hold discarded (thrashed) data.
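Thrashing in a direct-mapped cache can be reproduced with a very small simulation. This is a sketch under simplifying assumptions: references are given as main memory block numbers, the cache starts empty, and no victim cache is modeled. The function name is hypothetical.

```python
def direct_mapped_hit_ratio(block_refs, num_lines):
    """Simulate a direct-mapped cache and return the hit ratio."""
    lines = [None] * num_lines        # block number held in each line
    hits = 0
    for j in block_refs:
        i = j % num_lines             # i = j modulo m
        if lines[i] == j:
            hits += 1
        else:
            lines[i] = j              # miss: replace whatever was there
    return hits / len(block_refs)


# Blocks 0 and 4 both map to line 0 of a 4-line cache: every access misses.
print(direct_mapped_hit_ratio([0, 4] * 5, num_lines=4))
# Blocks 0 and 5 map to different lines: only the first two accesses miss.
print(direct_mapped_hit_ratio([0, 5] * 5, num_lines=4))
```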


Associative:
- Each memory block can be loaded into any line of the cache.
- The cache control logic interprets a memory address simply as a tag and a word field.
- The tag field uniquely identifies a block of main memory.
- To determine whether a block is in the cache, the cache control logic simultaneously examines every line's tag for a match.

Summary:
- Address length = (s + w) bits
- Number of addressable units = $2^{s+w}$ words or bytes
- Block size = line size = $2^w$ words or bytes
- Number of blocks in main memory = $2^{s+w} / 2^w = 2^s$
- Number of lines in cache = m = undetermined from the address format
- Size of tag = s bits

The disadvantage of the associative method is the complex circuitry required to examine the tags of all cache lines in parallel.


Set-Associative:
- The cache consists of a number of sets, each consisting of a number of lines:
m = v * k
i = j modulo v
where,
i = cache set number
j = main memory block number
m = number of lines in the cache
v = number of sets
k = number of lines in each set
This is also referred to as k-way set-associative mapping.

Block $B_j$ can be mapped into any of the lines of set i = j modulo v.

Cache control logic interprets a memory address as three fields:
- Tag
- Set
- Word
Summary:
- Address length = (s + w) bits
- Number of addressable units = $2^{s+w}$ words or bytes
- Block size = line size = $2^w$ words or bytes
- Number of blocks in main memory = $2^{s+w} / 2^w = 2^s$
- Number of lines in set = k
- Number of sets = v = $2^d$
- Number of lines in cache = m = kv = $k \times 2^d$
- Size of cache = $k \times 2^{d+w}$ words or bytes
- Size of tag = (s - d) bits
The s bits of the tag and set fields together specify one of the $2^s$ blocks of main memory.
With k-way set-associative mapping, the tag in a memory address is smaller than in the associative method and is compared only to the k tags within a single set.
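The set-associative address format can be sketched in the same way as a field split. The names below are hypothetical, and the sample configuration (24-bit address, d = 13, w = 2, two-way) is an assumption for illustration.

```python
def split_set_associative_address(address, s, d, w):
    """Return (tag, set_index, word) for an (s + w)-bit address."""
    if not 0 <= address < (1 << (s + w)):
        raise ValueError("address does not fit in s + w bits")
    word = address & ((1 << w) - 1)
    block = address >> w                  # main memory block number j
    set_index = block & ((1 << d) - 1)    # i = j modulo v, with v = 2**d sets
    tag = block >> d                      # the s - d most significant bits
    return tag, set_index, word


def cache_size_words(k, d, w):
    """Size of a k-way cache with 2**d sets and 2**w words per line."""
    return k * 2 ** (d + w)


print(split_set_associative_address(0x16339C, s=22, d=13, w=2))
print(cache_size_words(k=2, d=13, w=2))
```

Only the k lines of the selected set need a tag comparison, which is what keeps the comparison hardware far simpler than in the fully associative case.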


Replacement Algorithm:
- When the cache is full, a decision must be made about which block to remove to make room for a new block from main memory. Such a decision can affect system performance. Locality of reference provides a clue to a reasonable strategy.
- Because sub-programs stay active for reasonable periods of time, it can be assumed that a block that has recently been referenced will also be referenced in the near future.
- Thus the block that has gone longest without being referenced should be removed. Such a block is known as the LEAST RECENTLY USED block, and the algorithm to determine it is known as the LRU algorithm.
- A counter per slot is used to keep track of the LRU block in the cache. For a 4-slot cache, 2-bit counters are used; for an 8-slot cache, 3-bit counters are required.

For a Read Operation:

In case of HIT (Content in Cache):
- Counter for the block referenced is set to ‘0’
- All other counters with value originally lower than referenced one are incremented by 1,
while all others remain unchanged.

In case of MISS (Content not in Cache)

If Cache is Not FULL:
- The counter associated with the new block loaded from main memory is reset to ‘0’.
- Values of all other counters are incremented by 1.

If Cache is FULL:
- The block with the largest count is removed.
- New block is put in its place.
- Its counter is reset to ‘0’
- The remaining counters are incremented by 1
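The counter rules above can be sketched as a small Python class. This is an illustrative model only: it treats the cache as a single group of slots (as in a fully associative cache or one set), stores the counters in a dictionary keyed by block name, and uses a hypothetical class name.

```python
class CounterLRUCache:
    """LRU replacement tracked with one counter per cached block."""

    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.counters = {}            # block -> counter (0 = most recent)

    def access(self, block):
        """Reference a block; return True on a hit, False on a miss."""
        if block in self.counters:    # HIT
            old = self.counters[block]
            for other, count in self.counters.items():
                if count < old:       # only lower counters are incremented
                    self.counters[other] = count + 1
            self.counters[block] = 0
            return True
        if len(self.counters) == self.num_slots:   # MISS, cache full
            victim = max(self.counters, key=self.counters.get)
            del self.counters[victim]               # largest count is evicted
        for other in self.counters:                 # MISS: age everything else
            self.counters[other] += 1
        self.counters[block] = 0
        return False


cache = CounterLRUCache(4)
results = [cache.access(b) for b in "ABCDBE"]
print(results)
print(cache.counters)
```

In the sample run, block A is evicted when E arrives because the hit on B left A with the largest counter.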

Other algorithms are Least Frequently Used (LFU), First In First Out (FIFO), and Random (selecting the line to be replaced at random).

Based on locality of reference, the block referenced most frequently is likely to be referenced next, while a block that is used less frequently may not be referenced again and so can be removed (Least Frequently Used). LFU is also implemented using a counter and performs relatively well. Simulation studies have shown that Random replacement performs only slightly worse than usage-based algorithms, while FIFO has shown poorer performance.


Write Policy:
For a write operation to a word in cache:
WRITE THROUGH POLICY:
- Main memory and cache are updated simultaneously.
- Main memory is always updated (valid)
- Disadvantage:
o Generates substantial memory traffic
o May create a bottleneck
WRITE BACK POLICY:
- Updates only the cache and marks the line with a flag called the dirty bit or use bit.
- When the block is replaced, it is written back to main memory if and only if the dirty bit is set.
- Disadvantage:
o Complex circuitry
o Creates bottleneck
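The difference in memory traffic between the two policies can be sketched with a small counter. The model below rests on several assumptions that are not in the notes: a direct-mapped cache, write-allocate on a miss, one write per referenced block number, and no final flush of dirty lines. The function name is hypothetical.

```python
def count_memory_writes(write_blocks, num_lines, policy):
    """Count main-memory writes caused by a sequence of processor writes."""
    if policy not in ("write-through", "write-back"):
        raise ValueError("unknown policy")
    lines = [None] * num_lines
    dirty = [False] * num_lines
    memory_writes = 0
    for j in write_blocks:
        i = j % num_lines
        if lines[i] != j:             # miss: bring block j into line i
            if dirty[i]:              # write-back: flush the evicted dirty block
                memory_writes += 1
            lines[i] = j
            dirty[i] = False
        if policy == "write-through":
            memory_writes += 1        # every write also goes to main memory
        else:
            dirty[i] = True           # write-back: just set the dirty bit
    return memory_writes


writes = [0, 0, 0, 0, 4]              # block 4 evicts block 0 in a 4-line cache
print(count_memory_writes(writes, 4, "write-through"))
print(count_memory_writes(writes, 4, "write-back"))
```

Repeated writes to the same block cost one memory write each under write through, but only a single write at eviction time under write back.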

In a bus organization in which more than one device has its own cache and main memory is shared, the task of keeping memory valid becomes more complex.

A system that keeps all caches and main memory valid is said to maintain cache coherency.

Techniques for cache coherency include the following:

- Bus watching with write through: Each cache controller monitors the address lines to detect write operations to memory by other bus masters.
- Hardware transparency: additional hardware is used to ensure that all updates to main memory via a cache are reflected in all caches.
- Non-cacheable memory: Only a portion of main memory is shared by more than one processor, and this portion is designated as non-cacheable.

Line Size:
- As the block size increases from very small to larger sizes, the hit ratio at first increases because of the principle of locality. As the block becomes even bigger, the hit ratio begins to decrease.
- Principle of locality: data in the vicinity of a referenced word are likely to be referenced in the near future.
- Two specific effects come into play:
o Larger blocks reduce the number of blocks that fit into the cache. With a small number of blocks, data are overwritten shortly after they are fetched.
o As a block becomes larger, each additional word is farther from the requested word and therefore less likely to be needed in the near future.
- The relation between line size and hit ratio is complex, and no definitive optimum value has been found.
- A line size of 8 to 64 bytes seems reasonably close to optimum in normal computers.
- In high-performance computing systems, line sizes of 64 to 128 bytes are frequently used.


Number of Caches:
Level of caches
Unified vs split cache

MULTILEVEL CACHE:

- If there is no L2 (off-chip) cache and the processor makes an access request for a memory location not in the L1 cache, then the processor must access DRAM or ROM across the bus, resulting in poor performance.
- If an L2 (SRAM) cache is used whose speed matches the bus speed, then data can be accessed using a zero-wait-state transaction.
- Using multilevel caches complicates all of the design issues related to caches, including size, replacement algorithm, and write policy.
- Results show that an L2 cache at least double the size of the L1 cache improves the hit ratio.

UNIFIED VS SPLIT CACHE:

Benefits of Unified Cache:

- Higher hit rate than split cache.
- Only one cache needs to be designed and implemented.
Benefits of Split Cache:
- It supports parallel instruction execution and the prefetching of predicted future instructions.
- It eliminates contention for the cache between the instructions fetch/decode unit and the
execution unit.


LECTURE – 07 (Arithmetic and Logic Unit)

- The ALU is that part of the computer that actually performs arithmetic and logical
operations on data.
- ALU is based on the use of simple digital logic devices that can store binary digits and
perform simple Boolean logic operations.
- Operands for arithmetic and logic operations are presented to the ALU in registers, and the
results of an operation are stored in registers.
- These registers are temporary storage locations within the processor that are connected by
signal paths to the ALU.

- The ALU may also set flags as the result of an operation.
- The flag values are also stored in registers within the processor.

INTEGER: (all whole numbers)

If an n-bit sequence of binary digits $a_{n-1} a_{n-2} \ldots a_1 a_0$ is interpreted as an unsigned integer A, its value is

$A = \sum_{i=0}^{n-1} 2^i a_i$

Sign Magnitude Representation:

- Treat the MSB (most significant bit), the leftmost bit in the word, as the sign bit.
- If the sign bit is 0, the number is positive. If the sign bit is 1, the number is negative.
- In sign magnitude, the rightmost n-1 bits of an n-bit word hold the magnitude of the integer.
Example:
1101 = -5
00110101 = +53
Drawbacks:
- Dual representation of ‘0’
o +0 = 00000000
o -0 = 10000000
- Addition and subtraction require consideration of both sign and magnitude of the number.
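The sign-magnitude rules and the dual representation of zero can be checked with a short sketch. Bit patterns are handled as strings of '0' and '1' for readability; the function names are hypothetical and words are assumed to be at least 2 bits wide.

```python
def sign_magnitude_decode(bits):
    """Decode a sign-magnitude bit string: first bit is the sign."""
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude


def sign_magnitude_encode(value, n):
    """Encode an integer as an n-bit sign-magnitude bit string."""
    magnitude = abs(value)
    if magnitude >= 1 << (n - 1):
        raise ValueError("magnitude does not fit in n - 1 bits")
    sign = "1" if value < 0 else "0"
    return sign + format(magnitude, "0{}b".format(n - 1))


print(sign_magnitude_decode("1101"))        # the 4-bit example above
print(sign_magnitude_decode("00110101"))    # the 8-bit example above
# Both zero patterns decode to the same value:
print(sign_magnitude_decode("00000000"), sign_magnitude_decode("10000000"))
```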


Two’s Complement Representation:

- The MSB represents the sign bit, as well as weight.

To get -5, we should apply two’s complement operation on the binary representation of +5.


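- +8 is out of range of a 4-bit register in every signed representation, and -8 is out of range in sign magnitude, which is why those entries are not listed. In two’s complement, -8 is representable as 1000. A 4-bit register can hold 2^4 = 16 distinct values.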


Negation Rules:
- Take the complement of each bit of the integer (including the sign bit).
- Treating the result as an unsigned integer, add 1.
Special cases:
- Negation of 0 is 0; there is a carry out of the MSB position, which is ignored.
- Negating a bit pattern of 1 followed by all zeroes (the most negative number, -2^(n-1)) gives back the same pattern, because +2^(n-1) cannot be represented in n bits.
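
A minimal sketch of the negation rule and its two special cases, assuming bit patterns are held as non-negative Python integers and n is the word size. The helper names are illustrative only.

```python
def negate(pattern: int, n: int) -> int:
    """Two's complement negation of an n-bit pattern."""
    mask = (1 << n) - 1
    inverted = ~pattern & mask          # complement every bit, sign bit included
    return (inverted + 1) & mask        # add 1; a carry out of the MSB is dropped


def to_signed(pattern: int, n: int) -> int:
    """Read an n-bit pattern as a two's complement value."""
    return pattern - (1 << n) if pattern & (1 << (n - 1)) else pattern


minus_five = negate(0b0101, 4)
print(format(minus_five, "04b"), to_signed(minus_five, 4))
print(format(negate(0b0000, 4), "04b"))   # negation of 0
print(format(negate(0b1000, 4), "04b"))   # 1 followed by all zeroes
```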

Addition:
- Addition proceeds as if the two numbers are unsigned integers.
- Ignore the carry if it occurs.
- If the result is larger than can be held in the word size, an overflow condition occurs. The ALU must signal this fact so that no attempt is made to use the result.
- OVERFLOW RULE: if two numbers are added, and they are both positive or both negative, then overflow occurs if and only if the result has the opposite sign. Overflow can occur whether or not there is a carry.
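
The addition and overflow rules can be checked with a short sketch. It assumes n-bit patterns stored as non-negative integers; the function name is chosen for the example.

```python
def add_twos_complement(a: int, b: int, n: int):
    """Add two n-bit two's complement patterns; return (result, overflow)."""
    mask = (1 << n) - 1
    sign_bit = 1 << (n - 1)
    total = (a + b) & mask                       # add as unsigned, ignore the carry
    same_sign = (a & sign_bit) == (b & sign_bit)
    # Overflow only when both operands share a sign and the result does not.
    overflow = same_sign and (total & sign_bit) != (a & sign_bit)
    return total, overflow


print(add_twos_complement(0b0101, 0b0100, 4))   # 5 + 4 does not fit in 4 bits
print(add_twos_complement(0b1100, 0b0100, 4))   # -4 + 4: carry out, no overflow
```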


Subtraction:
To subtract A from B (that is, to compute B - A), take the two’s complement of A and add it to B.
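
As a small sketch, subtraction can be written as addition of the negated subtrahend. The function and parameter names are assumptions made for the example; patterns are n-bit non-negative integers.

```python
def subtract(minuend: int, subtrahend: int, n: int) -> int:
    """Compute minuend - subtrahend on n-bit two's complement patterns."""
    mask = (1 << n) - 1
    negated = (~subtrahend + 1) & mask      # two's complement of the subtrahend
    return (minuend + negated) & mask       # ordinary addition, carry ignored


print(format(subtract(0b0010, 0b0111, 4), "04b"))   # 2 - 7
print(format(subtract(0b0101, 0b0010, 4), "04b"))   # 5 - 2
```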

Multiplication:
Unsigned Integers:

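- If the rightmost bit of Q (Q0) is 1:
o Add M to A (A = A + M); any carry goes into C.
o Shift C, A and Q right by one bit.
- If the rightmost bit of Q (Q0) is 0:
o Shift C, A and Q right by one bit.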
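
The shift-and-add loop can be sketched as below. The register names A, Q, M and the carry bit C follow the usual textbook presentation of this algorithm; the function name is illustrative.

```python
def multiply_unsigned(multiplicand: int, multiplier: int, n: int) -> int:
    """Shift-and-add multiplication of two n-bit unsigned integers."""
    mask = (1 << n) - 1
    m = multiplicand & mask     # M register
    q = multiplier & mask       # Q register
    a = 0                       # A register (upper half of the product)
    c = 0                       # carry bit out of A
    for _ in range(n):
        if q & 1:               # rightmost bit of Q is 1: A = A + M
            total = a + m
            c = total >> n
            a = total & mask
        # Shift C, A, Q right by one bit as a single long register.
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (c << (n - 1))
        c = 0
    return (a << n) | q         # the 2n-bit product sits in A,Q


print(multiply_unsigned(0b1011, 0b1101, 4))   # 11 x 13
```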


Two’s complement Multiplication:

Two’s Complement Division:


Restoring Division Algorithm is used for unsigned binary division

- The algorithm assumes that the divisor V and the dividend D are both positive:
o |V| < |D|: the general case, in which the algorithm computes Q and R
o |V| = |D|: Q = 1 and R = 0
o |V| > |D|: Q = 0 and R = D
- To do two’s complement division, convert the operands into unsigned values and, at the end, account for the signs by complementing where needed:
o Sign of remainder: Sign(R) = Sign(D)
o Sign of quotient: Sign(Q) = Sign(D) x Sign(V)
- For example, -17 / 5:
o Save the negative sign and divide 17 by 5, giving Q = 3 and R = 2.
o The result is then adjusted according to the signs.
o D is negative, so R = 2 is negated by taking its two’s complement: 0010 -> 1110 (R = -2).
o D and V have opposite signs, so Q is also replaced by its two’s complement: 0011 -> 1101 (Q = -3).
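
A sketch of unsigned restoring division followed by the sign fix-up described above. It assumes the magnitudes of both operands fit in n bits; the function names are made up for the example, and Python integers stand in for the A and Q registers.

```python
def restoring_divide(dividend: int, divisor: int, n: int):
    """Unsigned restoring division on n-bit operands; returns (Q, R)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    mask = (1 << n) - 1
    a, q = 0, dividend & mask
    for _ in range(n):
        # Shift A,Q left by one bit; the MSB of Q moves into A.
        a = (a << 1) | ((q >> (n - 1)) & 1)
        q = (q << 1) & mask
        a -= divisor                # trial subtraction
        if a < 0:
            a += divisor            # restore A, quotient bit stays 0
        else:
            q |= 1                  # subtraction succeeded, quotient bit is 1
    return q, a


def signed_divide(d: int, v: int, n: int):
    """Divide magnitudes, then apply Sign(R) = Sign(D), Sign(Q) = Sign(D) x Sign(V)."""
    q, r = restoring_divide(abs(d), abs(v), n)
    if d < 0:
        r = -r
    if (d < 0) != (v < 0):
        q = -q
    return q, r


print(signed_divide(-17, 5, 8))
```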


FLOATING POINT NUMBERS: (all except integers)

The term floating point is derived from the fact that there is no fixed number of digits before and after the radix point; that is, the radix point can float.

In general, floating point representations are slower and less accurate than fixed point representations, but they can handle a larger range of numbers.

- Floating point numbers can be represented in scientific notation.

- We can represent a number in the following form: ±S x B^(±E)

Sign = plus or minus

Significand = S
Exponent = E

- If the radix point is moved to the left, add the number of positions moved to the exponent.

- If the radix point is moved to the right, subtract the number of positions moved from the exponent.

Base B is implicit and need not be stored.

The exponent is stored in biased representation.

A fixed value, called the bias, is subtracted from the field to get the true exponent value.
Bias = 2^(k-1) - 1, where k is the number of bits in the binary exponent.

For example, if the exponent field has 8 bits, the bias is 2^(8-1) - 1 = 127. Every true exponent is stored with 127 added to it; a true exponent of 0 is stored as 00000000 + 127 = 01111111.

If the value saved in the exponent field is 00000000, then the true exponent is -127.

In 8-bit biased representation, true exponents in the range -127 to +128 can be represented.

Advantage of bias representation is that non negative floating point number can be treated as integer
for comparison purpose.
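
The bias arithmetic can be tried out with a small sketch. It models only the plain biased field described here, before any patterns are reserved for special values; the function names are illustrative.

```python
def bias(k: int) -> int:
    """Bias for a k-bit exponent field: 2^(k-1) - 1."""
    return 2 ** (k - 1) - 1


def encode_exponent(true_exponent: int, k: int) -> str:
    """Store a true exponent in a k-bit biased field."""
    stored = true_exponent + bias(k)
    if not 0 <= stored < 2 ** k:
        raise ValueError("exponent does not fit in the field")
    return format(stored, f"0{k}b")


def decode_exponent(field: str) -> int:
    """Recover the true exponent by subtracting the bias from the field."""
    return int(field, 2) - bias(len(field))


print(bias(8))
print(encode_exponent(0, 8))
print(decode_exponent("00000000"))
```

Because the stored field is an unsigned number that grows with the true exponent, comparing two fields as integers orders the exponents correctly.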


Floating point numbers are stored after normalization. Normalization moves the radix point to the right of the most significant 1, giving the form ±1.bbb...b x 2^(±E).

The leading 1 is not stored, as it is the same for all normalized numbers.

SUMMARY:
- The sign is stored in the first bit of the word.
- The first bit of the true significand is always 1 and need not to be stored in the significand
field.
- The value of the bias (127 in the case of an 8-bit exponent) is added to the true exponent to be stored in the exponent field.
- The base is 2 and need not to be stored.

In a 32-bit word with a 23-bit significand, an 8-bit biased exponent and 1 sign bit, 2^32 different numbers can be represented in the ranges:
Negative numbers: between -(2 - 2^(-23)) x 2^128 and -2^(-127)
Positive numbers: between +2^(-127) and +(2 - 2^(-23)) x 2^128

Floating point numbers cover a large range because the representable values are not evenly spaced along the number line.

Range and Precision:

As the magnitude increases, the distance between adjacent representable numbers also increases.

- More exponent bits yield a larger range.
- Precision increases with the number of bits used for the significand.
- A larger exponent base gives a greater range at the expense of less precision.


Floating Point Representation in IEEE 754 Format:

- An exponent of zero together with a fraction of zero represents positive or negative zero
(depending on sign bit)
0 0000 0000 0000 0000 0000 0000 0000 000
1 0000 0000 0000 0000 0000 0000 0000 000
- An exponent of all ones together with a fraction of zero represents positive or negative infinity (depending on the sign bit).
0 1111 1111 0000 0000 0000 0000 0000 000 = +infinity
1 1111 1111 0000 0000 0000 0000 0000 000 = -infinity
In floating point, overflow should be treated as an error or represented as infinity.
- An exponent of zero together with a non-zero fraction represents a subnormal number (the bit to the left of the radix point is 0 and the true exponent is -126).
0 0000 0000 0000 0000 0000 0000 0000 010
- An exponent of all ones together with a non-zero fraction is the value NaN (not a number), used to represent exception conditions.
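
The special cases above can be recognised by splitting a 32-bit pattern into its three fields. This is a sketch for the single-precision format only; `classify_float32` and `float_to_bits` are names invented for the example, while `struct.pack` and `struct.unpack` are the standard library calls used to obtain the bit pattern.

```python
import struct


def float_to_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 single-precision pattern of x."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def classify_float32(bits: int) -> str:
    """Name the class of a 32-bit pattern from its exponent and fraction."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF      # 8-bit biased exponent
    fraction = bits & 0x7FFFFF          # 23-bit fraction
    if exponent == 0:
        kind = "zero" if fraction == 0 else "subnormal"
    elif exponent == 0xFF:
        kind = "infinity" if fraction == 0 else "NaN"
    else:
        kind = "normal"
    return ("-" if sign else "+") + kind


for pattern in (0x00000000, 0x80000000, 0x7F800000, 0x00000002, 0x7FC00000):
    print(format(pattern, "032b"), classify_float32(pattern))
print(classify_float32(float_to_bits(1.0)))
```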

Floating Point Arithmetic:

For addition and subtraction:
Both operands must have the same exponent value.
Alignment is the process of making the exponent values the same.

For Multiplication and Division:

In multiplication, the exponent values of both operands are added and simple multiplication is performed on the significands.

In division, the exponent of the divisor is subtracted from the exponent of the dividend and simple division is performed on the significands.

Adding two biased exponents counts the bias twice, so 127 is subtracted from the sum in multiplication. Subtracting two biased exponents cancels the bias, so 127 is added back in division.
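
A short sketch of the bias correction for 8-bit exponent fields. It handles the exponent fields only, leaves out the significand arithmetic and any overflow or underflow checks, and uses function names chosen for the example.

```python
BIAS = 127  # bias of an 8-bit exponent field


def product_exponent(e1: int, e2: int) -> int:
    """Biased exponent of a product: the sum holds the bias twice, so remove one."""
    return e1 + e2 - BIAS


def quotient_exponent(e1: int, e2: int) -> int:
    """Biased exponent of a quotient: the difference holds no bias, so add it back."""
    return e1 - e2 + BIAS


# True exponents 3 and 4 are stored as 130 and 131.
print(product_exponent(130, 131) - BIAS)    # true exponent of the product
print(quotient_exponent(130, 131) - BIAS)   # true exponent of the quotient
```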


Four basic phases of the algorithm for addition and subtraction:

1. Check for zeroes.
2. Align the significands (the operand with the smaller exponent is shifted right, and its exponent increased, until both exponents are equal).
3. Add or subtract the significands.
4. Normalize the result.

GUARD BITS: ALU registers are longer than the significand plus the implied bit. The additional bits, called guard bits, are used to pad out the right end of the significand with 0s.


Special conditions in floating point arithmetic:

- Exponent overflow -> positive or negative infinity (the true exponent is 128 or greater).
- Exponent underflow -> the result is too small and may be reported as 0 (or held as a subnormal number).
- Significand underflow -> in the process of aligning the significands, digits may flow off the right end of the significand (the bits shifted out are lost unless guard bits hold them).
- Significand overflow -> the addition of two significands of the same sign may result in a carry out of the MSB. This can be fixed by realignment.

64 bit pattern

LECTURE – 08 (Instruction Set Handout)

MACHINE INSTRUCTION CHARACTERISTICS

The operation of the processor is determined by the instructions it executes, referred to as machine instructions or computer instructions. The collection of different instructions that the processor can execute is referred to as the processor’s instruction set.

Elements of Machine Instruction:

Each instruction must contain information required by the processor for execution.


The elements are as follows:

Operation Code: it specifies the operation to be performed. The operation is specified by a binary
code, called the operation code or opcode.
Source Operand Reference: the operation may involve one or more source operands, that is,
operands that are inputs for the operation.
Result Operand Reference: the operation may produce a result.
Next Instruction Reference: this tells the processor where to fetch the next instruction after the
execution of this instruction is complete.

Source or result operand reference can be in the following areas:

- Main or Virtual memory.

- CPU Registers
- I/O Devices
- Immediate

From Designer’s POV:

The machine instruction set specifies the functional requirements of the processor.
Implementing the processor means implementing the machine instruction set.

From Programmer’s POV:

The programmer becomes aware of the register and memory structure, the types of data directly supported by the machine, and the functioning of the ALU.

In most cases, the next instruction immediately follows the current instruction. If a program counter is used, there is no need for an explicit reference to the next instruction.

Instruction Representation:
Instruction is represented by a sequence of bits.
It is divided in fields, corresponding to the elements of instruction.

OPCODE | Source Operand References (1, 2) | Result Operand Reference (3) | Next Instruction Reference (4)

- Most instruction sets use more than one format.

- The processor extracts data from the various instruction fields once the instruction is in the Instruction Register.
- In common practice, instructions are written in a symbolic representation called mnemonics, such as ADD, SUB, MUL, DIV, STOR and LOAD.
- A machine language instruction expresses operations in a basic form involving movement of data to or from registers, such as ADD x, y.


Instruction Types:
The machine instruction set must be sufficient to express any of the instructions from a high level language.

- Data Processing: Arithmetic and logic instructions.

- Data Storage: Movement of data into or out of the registers or the memory locations.
- Data Movement: I/O instructions.
- Control: Test and Branch instructions

- Arithmetic instructions provide computational capabilities for processing numeric data.

- Logic instructions operate on the bits of a word as bits rather than as numbers, it processes
any kind of data.
- Memory instructions are used for moving data between the memory and the registers.
- I/O instructions are needed to transfer programs and data into the memory and the results of computation back out to the user.
- Test instructions are used to test the value of a data word or the status of a computation.
- Branch instructions are used to branch to a different set of instructions depending on the decision made.

Number of Addresses:
- Processor architecture was traditionally described in terms of the number of addresses contained in each instruction. This is less common today.
- Present day processors do not need all four addresses to be given explicitly.
- One, two or three address instructions may be used; the address of the next instruction is provided implicitly by the program counter.
- Most CPU designs involve a variety of instruction formats.


What do fewer addresses mean?

- More primitive instructions and a less complex CPU design.
- The instruction word is shorter.
- Programs need more instructions, which means longer programs and longer execution times.
- Program design is also more complex.

Most computers employ a mixture of instruction formats in terms of number of explicit addresses.
Number of addresses per instruction is a basic design decision.
Zero-address instructions are applicable to a special memory organization called a stack.

Instruction Set Design:

The most analyzed and most interesting aspect of the computer design is the instruction set design.
The design of the instruction set is very complex since it affects many aspects of the computer
system.

The instruction set defines many of the functions performed by the processor and has a significant
effect on the implementation of the processor. The instruction set is the programmer’s means of
controlling the processor. Therefore, programmer’s requirement must be considered in designing the
instruction set.

The most fundamental design issues that still remain in dispute are:
OPERATION REPERTOIRE: how many and which operation to provide and how complex should
each operation be.
DATA TYPES: the various types of data on which operations are performed.
INSTRUCTION FORMAT: instruction length, number of addresses, size of various fields and so
on.
REGISTERS: number of processor registers that can be referenced by instructions and their use.
ADDRESSING: the mode or modes by which the address of an operand is specified.

The instruction format is part of the instruction set design and is also complex as it includes:
- Specification of total number of bits (instruction length)
- Number of addresses in the instruction.
- Specification of size of various fields in the instruction word.

Types of Operand:
Machine instructions operate on data. The most general categories of data are:
- Address (they are treated as data -> unsigned integers)
- Numbers (limited in terms of magnitude and precision)
- Characters
- Logical Data

NUMBERS:
Three types of numerical data are common in computers:
- Binary integer or binary fixed point
- Binary floating point.
- Decimal
CHARACTERS:
The most commonly used character code is the International Reference Alphabet (IRA), referred to in the United States as the American Standard Code for Information Interchange (ASCII). Each character is represented by a unique 7-bit pattern.


LOGICAL DATA:
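Normally, each word or other addressable unit is treated as a single unit of data. When an n-bit unit is instead viewed as n separate 1-bit items, each having the value 0 or 1, it is considered logical data.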

Arm data Types:

- ARM processors support data types of 8 bits (byte), 16 bits (halfword), and 32 bits (word) in length.
- Halfword accesses should be halfword aligned and word accesses should be word aligned.
- For nonaligned access attempts, the architecture supports three alternatives:
o Default case:
▪ The address is treated as truncated.
▪ Address bits [1:0] are treated as zero for word accesses.
▪ Address bit [0] is treated as zero for halfword accesses.
▪ Load single word instructions rotate right the word-aligned data transferred by a non word-aligned address by one, two or three bytes.
o Alignment checking: when the appropriate control bit is set, a data abort signal indicates an alignment fault for attempting unaligned access.
o Unaligned access: the processor uses one or more memory accesses to generate the required transfer of adjacent bytes transparently to the programmer.
- Unsigned integer interpretation is supported for all three types.
- All three data types can also be used for two’s complement signed integers.
- The majority of ARM processor implementations do not provide floating point hardware, which saves power and area.
- Floating point arithmetic is implemented in software, if required.
- ARM supports an optional floating point coprocessor that supports single and double
precision floating point data types defined in IEEE 754.


Types of Operations:
The number of different opcodes varies from machine to machine.
The general types of operations found on all machines are:
- DATA TRANSFER: move, store, load, exchange, clear, set, push, pop.
- ARITHMETIC: add, subtract, multiply, divide, absolute, negate, increment, decrement.
- LOGICAL: AND, OR, NOT, Exclusive-OR, test, compare, shift, rotate, set control
variables.
- CONVERSION: translate, convert (e.g. binary to decimal) .
- INPUT/OUTPUT: input, output, test I/O, start I/O
- SYSTEM CONTROL
- TRANSFER OF CONTROL: jump, jump conditional, jump to subroutine, return, execute, skip, skip conditional, halt, wait, no operation.

DATA TRANSFER:
A data transfer operation must specify:
- Location of source and destination operands
- Length of data to be transferred.
- Addressing mode for each operand.

Data transfer operation occurs in the following steps:

- Calculate memory address, based on the address mode.
- Determine whether the addressed item is in the cache.
- If not, issue a command to the memory module.

ARITHMETIC:
- Provided for:
o Signed integer numbers.
o Floating point
o Packed decimal numbers.
- Single operand instructions:
o Absolute: |A|
o Negate: -A
o Increment: A++ (add 1 to the operand)
o Decrement: A-- (subtract one from the operand)


LOGICAL:

INPUT/OUTPUT:
- There are a variety of approaches taken to execute I/O operations, including:
o Isolated programmed I/O
o Memory-mapped programmed I/O (using data movement instructions)
o DMA (separate controller)
o Use of an I/O processor.

SYSTEM CONTROL:
- Can be executed only when the processor is in a privileged state (kernel mode).
- These instructions are reserved for use by the OS.
- Example
o A system control instruction may read or alter a control register.
o To read or modify a storage protection key.


TRANSFER OF CONTROL:
- Branch: (Jump)
o Conditional: branch to x if result is positive.
o Unconditional

- Skip:
o Increment and skip if 0
- Procedure Call: (Subroutine Call)
o A procedure is a self-contained computer program that is incorporated into a larger
program.


ADDRESSING MODES:
Most common addressing modes:
Immediate: the operand value is present in the instruction.
Operand = A

Direct: the address field contains the effective address of the operand.
EA = A

Indirect: the address field refers to the address of a word in memory, which in turn contains the full-length address of the operand.
EA = (A)

Register: the address field refers to a register, in which the operand value is contained.
EA = R

Register Indirect: the address field refers to a register, which in turn contains the full-length address of the operand.
EA = (R)

Displacement: combines the capabilities of direct and register indirect addressing.
EA = A + (R)

Stack: a form of implied addressing. Machine instructions need not include a memory reference but implicitly operate on the top of the stack.
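
How each mode reaches its operand can be sketched with a toy memory and register file. Everything here is hypothetical: memory and registers are plain dictionaries, the mode names are strings chosen for the example, and displacement is taken in its usual form EA = A + (R).

```python
def resolve_operand(mode, a, memory, registers, r=None):
    """Return the operand value for address field `a` under the given mode."""
    if mode == "immediate":
        return a                            # Operand = A
    if mode == "direct":
        return memory[a]                    # EA = A
    if mode == "indirect":
        return memory[memory[a]]            # EA = (A)
    if mode == "register":
        return registers[a]                 # operand held in register A
    if mode == "register_indirect":
        return memory[registers[a]]         # EA = (R)
    if mode == "displacement":
        return memory[a + registers[r]]     # EA = A + (R)
    raise ValueError(f"unknown addressing mode: {mode}")


memory = {10: 20, 20: 99, 25: 55, 30: 7}
registers = {1: 30, 2: 5}
for mode in ("immediate", "direct", "indirect"):
    print(mode, resolve_operand(mode, 10, memory, registers))
print("register indirect", resolve_operand("register_indirect", 1, memory, registers))
print("displacement", resolve_operand("displacement", 20, memory, registers, r=2))
```

Note how indirect addressing needs two memory reads and register indirect only one, which is the usual reason for preferring the latter.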


LECTURE – 09 (INSTRUCTION CYCLE)

The processor must be capable of doing the following things:
- Fetch Instruction: reads instruction from the memory.
- Interpret Instruction: instruction is decoded to determine which action is required.
- Fetch Data: execution of instruction may require reading of data from memory or an I/O
module.
- Process Data: execution of instruction may require performing some arithmetic or logical
operation on data.
- Write Data: the results of an execution may require writing data to memory or I/O module.

System performance depends upon:

- The clock cycle: the shorter the clock cycle, the better the system performance.
- Memory accesses: the fewer the memory accesses, the better the performance.

INSTRUCTION CYCLE:
The instruction cycle includes the following stages:
- Fetch: read the instruction from the memory into the processor.
- Execute: interpret the opcode and perform the indicated operation.
- Interrupt: if interrupts are enabled and an interrupt has occurred, save the current process state and service the interrupt.

The Indirect Cycle: This is an additional stage. The execution of an instruction may involve one or
more operands in memory, each of which requires a memory access.
If indirect addressing is used, then additional memory accesses are required. We can think of
fetching of indirect addresses as one more instruction stage.
The main line of activity consists of alternating instruction fetch and instruction execution activities.
After an instruction is fetched, it is examined to determine if any indirect addressing is involved. If
so, the required operands are fetched using indirect addressing.


Data Flow:
- It depends on the processor design.
- For instruction fetch:
o PC contains the address of the next instruction.
o The address is moved to MAR.
o The address is then placed on the address bus.
o The control unit requests a memory read.
o The result is placed on the data bus, then copied to MBR and then to IR.
o Meanwhile PC is incremented by 1.

- For data fetch:

o IR is examined.
o If indirect addressing is used, then an indirect cycle is performed:
▪ The rightmost N bits of MBR are transferred to MAR.
▪ The control unit requests a memory read.
▪ The result (address of the operand) is moved to MBR.
- For interrupt:
o The cycle is simple and predictable.
o The current contents of the PC must be saved so that the processor can resume normal activity after the interrupt.
o The contents of PC are copied to MBR.
o The address of a special memory location (for example, the stack pointer) is loaded into MAR from the control unit.
o The content of MBR is written to memory.
o PC is loaded with the address of the interrupt handling routine.

o Next instruction cycle begins by fetching the appropriate instruction.
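
The register transfers of the fetch, indirect and interrupt cycles can be followed in a toy model. This is a sketch only: the class name, the list used as memory, and the 12-bit address field in the usage lines are assumptions, and the buses and control signals are not modelled.

```python
class ToyProcessor:
    """Holds PC, MAR, MBR and IR and moves data between them."""

    def __init__(self, memory):
        self.memory = memory
        self.pc = self.mar = self.mbr = self.ir = 0

    def fetch(self):
        self.mar = self.pc                    # address of next instruction to MAR
        self.mbr = self.memory[self.mar]      # memory read, result lands in MBR
        self.pc += 1                          # meanwhile PC is incremented
        self.ir = self.mbr                    # instruction moves to IR

    def indirect(self, address_bits):
        # Rightmost N bits of MBR hold the address of the pointer word.
        self.mar = self.mbr & ((1 << address_bits) - 1)
        self.mbr = self.memory[self.mar]      # MBR now holds the operand address

    def interrupt(self, save_address, handler_address):
        self.mbr = self.pc                    # PC copied to MBR
        self.mar = save_address               # special location supplied by the CU
        self.memory[self.mar] = self.mbr      # old PC written to memory
        self.pc = handler_address             # PC points at the handler routine


memory = [0] * 16
memory[0] = 0x1005            # hypothetical word: opcode 1, address field 5
memory[5] = 7                 # location 5 holds the operand's address
cpu = ToyProcessor(memory)
cpu.fetch()
cpu.indirect(address_bits=12)
cpu.interrupt(save_address=15, handler_address=8)
print(hex(cpu.ir), cpu.mbr, cpu.pc, memory[15])
```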

INSTRUCTION PIPELINING:
In instruction pipelining, the instruction execution cycle is divided into a number of stages, so that new instructions are accepted at one end before previously accepted instructions have completed and appeared as output at the other end.

An instruction can be divided into two stages: fetch instruction and execute instruction. There is
time during the execution of instruction when main memory is not being accessed. This time can be
used to fetch the next instruction in parallel with the execution of the current one.

The pipeline has two independent stages:

- The first stage fetches instruction and buffers it.
- When the second stage is free, the first stage passes it the buffered instruction.
- While the second stage is executing the instruction, the first stage takes advantage of any
unused memory cycles to fetch and buffer the next instruction. This is called instruction pre-
fetch or fetch overlap.

Pipelining requires registers to store data between stages.

This process will speed up instruction execution. If the fetch and execute stages were of equal
duration, the instruction cycle time would be halved. However, this doubling of execution rate is
unlikely for two reasons:

1. The execution time will generally be longer than the fetch time. Execution will involve reading
and storing operands and the performance of some operation. Thus, the fetch stage may have to wait
for some time before it can empty its buffer.
2. A conditional branch instruction makes the address of the next instruction to be fetched unknown.
Thus, the fetch stage must wait until it receives the next instruction address from the execute stage.
The execute stage may then have to wait while the next instruction is fetched.

Guessing can reduce the time loss from the 2nd reason.

A simple rule is the following: When a conditional branch instruction is passed on from the fetch to
the execute stage, the fetch stage fetches the next instruction in memory after the branch instruction.
Then, if the branch is not taken, no time is lost.

If the branch is taken, the fetched instruction must be discarded and a new instruction fetched.

To gain further speedup, the pipeline must have more stages. A suggestive decomposition of the
instruction processing:

- Fetch instruction (FI): Read the next expected instruction into a buffer.
- Decode instruction (DI): Determine the opcode and the operand specifiers.
- Calculate operands (CO): Calculate the effective address of each source operand. This may
involve displacement, register indirect, indirect, or other forms of address calculation.
- Fetch operands (FO): Fetch each operand from memory. Operands in registers need not be
fetched.
- Execute instruction (EI): Perform the indicated operation and store the result, if any, in the
specified destination operand location.
- Write operand (WO): Store the result in memory.


With this decomposition, the various stages will be of more nearly equal duration. Assuming equal duration, a six-stage pipeline can reduce the execution time for 9 instructions from 54 time units to 14 time units.
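
The 54 and 14 figures follow from simple counting, sketched below under the idealised assumptions of equal stage durations, no stalls and no branches. The function names are made up for the example.

```python
def sequential_time(n: int, k: int) -> int:
    """n instructions, each passing through all k stages one after another."""
    return n * k


def pipelined_time(n: int, k: int) -> int:
    """The first instruction takes k units; each later one finishes 1 unit after."""
    return k + (n - 1)


def speedup(n: int, k: int) -> float:
    return sequential_time(n, k) / pipelined_time(n, k)


print(sequential_time(9, 6), pipelined_time(9, 6))
```

As n grows, the speedup approaches k, the number of stages, which is why more stages look attractive on paper.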

Assumptions:

1) We assume that each instruction goes through all six stages of the pipeline. This will not always be the case.
For example, a load instruction does not need the Write Operand WO stage. However, to simplify
the pipeline hardware, the timing is set up assuming that each instruction requires all six stages.

2) We assume that all of the stages can be performed in parallel and there are no memory conflicts.
For example, the FI, FO, and WO stages involve a memory access. We assume that all these
accesses can occur simultaneously. Most memory systems will not permit that. However, the desired
value may be in cache, or the FO or WO stage may be null. Thus, much of the time, memory
conflicts will not slow down the pipeline.

Several other factors serve to limit the performance enhancement.

If all the stages are not of equal duration, there will be some waiting involved at various pipeline
stages.

In case of a conditional branch instruction, several instruction fetches can be invalidated.

An interrupt can also invalidate several instruction fetches.

Other problems arise that did not appear in our simple two-stage organization. The CO stage may
depend on the contents of a register that could be altered by a previous instruction that is still in the
pipeline. Other such register and memory conflicts could occur. The system must contain logic to
account for this type of conflict.


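It appears that the greater the number of stages in the pipeline, the faster the execution rate. In practice, the gain is limited by the overhead of moving data between stage buffers and by the growing complexity of the control logic needed to handle dependencies.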

Instruction pipelining is a powerful technique for enhancing performance but requires careful design
to achieve optimum results with reasonable complexity.

PIPELINE HAZARD

A pipeline hazard occurs when the pipeline, or some portion of the pipeline, must stall because
conditions do not permit continued execution. Such a pipeline stall is also referred to as a pipeline
bubble. There are three types of hazards: Resource, Data, and Control.

Resource Hazards:
A resource hazard occurs when two (or more) instructions that are already in the pipeline need the
same resource.
The result is that the instructions must be executed in serial rather than parallel for a portion of the
pipeline.
A resource hazard is sometime referred to as a structural hazard.

Assume a simplified five-stage pipeline, in which each stage takes one clock cycle. Now assume that
main memory has a single port and that all instruction fetches and data reads and writes must be
performed one at a time. Further, ignore the cache. In this case, an operand read from or write to
memory cannot be performed in parallel with an instruction fetch. Therefore, the fetch instruction
stage of the pipeline must idle for one cycle before beginning the instruction fetch for instruction I3.

Another example of a resource conflict is a situation in which multiple instructions are ready to enter
the execute instruction phase and there is a single ALU.

One solution to such resource hazards is to increase the available resources, such as having multiple ports into main memory and multiple ALUs.

Data Hazards:
A data hazard occurs when there is a conflict in the access of an operand location.

In general terms, we can state the hazard in this form:

- Two instructions in a program are to be executed in sequence and both access a particular
memory or register operand. If the two instructions are executed in strict sequence, no
problem occurs. However, if the instructions are executed in a pipeline, then it is possible for
the operand value to be updated in such a way as to produce a different result than would
occur with strict sequential execution. In other words, the program produces an incorrect
result because of the use of pipelining.

There are three types of data hazards:

- Read after write (RAW), or true dependency: An instruction modifies a register or memory
location and a succeeding instruction reads the data in that memory or register location. A
hazard occurs if the read takes place before the write operation is complete.
- Write after read (WAR), or anti-dependency: An instruction reads a register or memory
location and a succeeding instruction writes to the location. A hazard occurs if the write
operation completes before the read operation takes place.

- Write after write (WAW), or output dependency: Two instructions both write to the same
location. A hazard occurs if the write operations take place in the reverse order of the
intended sequence.
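
The three dependency types can be told apart by comparing the locations each instruction reads and writes. The sketch below only reports which dependencies exist between two instructions in program order; whether a hazard actually occurs also depends on pipeline timing, which is not modelled. The dictionary layout and register names are assumptions made for the example.

```python
def classify_dependencies(first: dict, second: dict) -> set:
    """Compare read/write sets of two instructions, `first` preceding `second`."""
    found = set()
    if first["writes"] & second["reads"]:
        found.add("RAW")        # second reads what first writes
    if first["reads"] & second["writes"]:
        found.add("WAR")        # second writes what first reads
    if first["writes"] & second["writes"]:
        found.add("WAW")        # both write the same location
    return found


# ADD EAX, EBX followed by SUB ECX, EAX (destination is also a source)
add = {"reads": {"EAX", "EBX"}, "writes": {"EAX"}}
sub = {"reads": {"ECX", "EAX"}, "writes": {"ECX"}}
print(classify_dependencies(add, sub))
```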

Control Hazards:
A control hazard, also known as a branch hazard, occurs when the pipeline makes the wrong
decision on a branch prediction and therefore brings instructions into the pipeline that must
subsequently be discarded.

AQSA WAHEED B17101016

Common questions

Associative mapping is the most flexible, allowing any memory block to fill any cache line but requires complex circuitry for tag matching. Direct mapping is simple and less costly, accelerating access at the expense of potentially high conflict misses. Set-associative mapping strikes a balance by grouping cache lines into sets, reducing conflicts compared to direct mapping, while maintaining simpler hardware than fully associative mapping. Each technique offers trade-offs between complexity, cost, and performance .

As one moves down the memory hierarchy, the cost per bit decreases, and access time increases. This results from transitioning from expensive, high-speed memories like cache to larger, slower ones like main memory and disk. The implications are a trade-off between memory cost and speed; thus, data is managed to keep frequently accessed information in faster storage while less accessed data remains in slower memory, maximizing cost-effectiveness while minimizing performance impact .

Instruction pipelining improves processor performance by overlapping the fetch and execute stages of instruction cycles. The pipeline allows new instructions to be loaded before the completion of previous ones, theoretically doubling throughput if stages are perfectly balanced. However, discrepancies in fetch and execute time, and control flow changes (e.g., branches) may limit its effectiveness, necessitating strategies like instruction pre-fetching to mitigate execution delays .

The fetch-execute cycle in the Von Neumann architecture involves reading an instruction from memory (fetch) and then interpreting and executing it (execute). Instructions are stored in a single read-write memory and executed sequentially unless a branch instruction modifies this order. This cycle implies a linear workflow where instructions are processed one by one, making it efficient for simple tasks but potentially bottlenecked for complex operations where parallelism is beneficial .

In the Von Neumann architecture, the MBR is used for temporary storage of data being transferred to and from memory, acting as a buffer between memory and other CPU components. The MAR, on the other hand, holds the address of the data in memory to be accessed, directing data flow either for reading or writing. They function together to facilitate memory operations within the sequential fetch-execute cycle .

The Instruction Register (IR) holds the opcode of the currently executing instruction. During the fetch phase, the instruction is placed in the IR from memory, allowing the control unit to decode it and trigger the correct sequence of operations within the CPU. The IR simplifies control unit design, focusing execution processes on one instruction at a time .

The program counter (PC) holds the address of the next instruction to be executed and is normally incremented after each fetch, which gives sequential execution. A branch instruction instead loads the PC with a non-sequential address, redirecting the flow of execution. This ability is central to control flow in programs, since it is what makes loops and conditional operations possible.
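
The way a branch changes the PC can be traced with a small countdown loop. The opcodes below (DEC, JUMP, JUMP_IF_ZERO, HALT) are hypothetical and chosen only to show the PC update rule.

```python
def next_pc(pc, opcode, target, acc):
    """Return the address of the next instruction to execute."""
    if opcode == "JUMP":
        return target                  # unconditional branch
    if opcode == "JUMP_IF_ZERO" and acc == 0:
        return target                  # conditional branch, taken
    return pc + 1                      # sequential flow


program = [
    ("DEC", None),         # 0: acc = acc - 1
    ("JUMP_IF_ZERO", 3),   # 1: leave the loop when acc reaches 0
    ("JUMP", 0),           # 2: otherwise go back to the start
    ("HALT", None),        # 3: stop
]

acc = 3
pc = 0
trace = []   # addresses of the instructions executed, in order
while program[pc][0] != "HALT":
    opcode, target = program[pc]
    trace.append(pc)
    if opcode == "DEC":
        acc -= 1
    pc = next_pc(pc, opcode, target, acc)

print(trace)
```

The trace repeats addresses 0, 1, 2 while the accumulator is nonzero and then jumps from address 1 straight to address 3, which is the non-sequential PC update described above.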

Direct mapping in cache memory is simple and cheap because each block of main memory can map to only one cache line. The drawback of this fixed mapping is a high rate of conflict misses: if two frequently accessed blocks map to the same line, they repeatedly evict each other (thrashing), which lowers the hit rate. Victim caches or mappings with higher associativity can mitigate this.
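
The fixed mapping and the resulting thrashing can be seen in a few lines. This is a sketch with hypothetical parameters (4-byte blocks, 8 cache lines); the function names are illustrative.

```python
def split_address(address, block_size, num_lines):
    """Return (tag, line, offset) of an address in a direct-mapped cache."""
    block = address // block_size
    return block // num_lines, block % num_lines, address % block_size


def count_misses(addresses, block_size, num_lines):
    """Count misses for a sequence of accesses, starting with an empty cache."""
    tags = [None] * num_lines   # tag currently stored in each line
    misses = 0
    for address in addresses:
        tag, line, _ = split_address(address, block_size, num_lines)
        if tags[line] != tag:
            misses += 1
            tags[line] = tag    # the new block replaces whatever was there
    return misses


# Addresses 0 and 32 map to the same line (line 0) with different tags,
# so alternating between them misses every time.
conflicting = [0, 32] * 4
# Addresses 0 and 4 map to different lines, so only the first touches miss.
friendly = [0, 4] * 4

print(count_misses(conflicting, 4, 8), count_misses(friendly, 4, 8))
```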

Locality of reference is the tendency of a program to access a relatively small portion of its address space at any given time. Temporal locality means that recently used items are likely to be used again soon; spatial locality means that items near a recently used address are likely to be used soon. Cache design exploits both: recently used blocks are kept in the small, fast, expensive cache, and data is transferred in blocks so that neighboring words arrive together. This reduces the number of accesses to slower main memory and therefore the execution time.
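
The benefit of locality can be illustrated by comparing hit rates for different access patterns. The sketch assumes a tiny fully associative cache with least-recently-used replacement, 16-byte blocks, and room for 4 blocks; none of these parameters come from the notes.

```python
from collections import OrderedDict


def hit_rate(addresses, block_size=16, num_blocks=4):
    """Fraction of accesses that hit in a small LRU cache (assumed policy)."""
    cache = OrderedDict()   # block numbers, least recently used first
    hits = 0
    for address in addresses:
        block = address // block_size
        if block in cache:
            hits += 1
            cache.move_to_end(block)        # mark as most recently used
        else:
            if len(cache) == num_blocks:
                cache.popitem(last=False)   # evict the least recently used block
            cache[block] = None
    return hits / len(addresses)


sequential = list(range(64))                # spatial locality: consecutive bytes
loop = [0, 16, 32] * 20                     # temporal locality: same blocks reused
scattered = [i * 16 for i in range(64)]     # no locality: a new block every time

print(hit_rate(sequential), hit_rate(loop), hit_rate(scattered))
```

The sequential and looping patterns hit on the great majority of accesses, while the scattered pattern never hits, which is why caches pay off only when programs exhibit locality.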

A victim cache is a small cache placed between a direct-mapped cache and the next memory level that stores blocks evicted because of cache conflicts. It temporarily holds data that is likely to be accessed again soon, so a block that was just replaced can be retrieved quickly instead of being fetched from the next level. This reduces the misses caused by repeated replacement and improves the overall hit rate while keeping the simplicity of the direct-mapped organization.
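
The effect can be sketched by counting how many accesses must go to the next memory level with and without a victim cache. The model below is a simplification with assumed details: accesses are given as block numbers, the victim cache is fully associative, and it drops its oldest entry when full.

```python
from collections import OrderedDict


def count_memory_fetches(blocks, num_lines, victim_size):
    """Count accesses that miss in both the main cache and the victim cache."""
    lines = [None] * num_lines   # direct-mapped cache: one block per line
    victims = OrderedDict()      # victim cache, oldest entry first
    fetches = 0
    for block in blocks:
        line = block % num_lines
        if lines[line] == block:
            continue                          # hit in the direct-mapped cache
        if block in victims:
            del victims[block]                # hit in the victim cache: swap back
        else:
            fetches += 1                      # miss in both: go to the next level
        evicted = lines[line]
        lines[line] = block
        if evicted is not None and victim_size > 0:
            victims[evicted] = None           # keep the evicted block nearby
            if len(victims) > victim_size:
                victims.popitem(last=False)   # drop the oldest victim


    return fetches


# Blocks 0 and 8 both map to line 0 of an 8-line cache and evict each other.
pattern = [0, 8] * 4
without_victim = count_memory_fetches(pattern, num_lines=8, victim_size=0)
with_victim = count_memory_fetches(pattern, num_lines=8, victim_size=1)
print(without_victim, with_victim)
```

In this conflict pattern every access goes to the next level without a victim cache, while a single-entry victim cache leaves only the two initial fetches.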
