Q1. Explain basic concept of pipelined processors with space-time diagrams.
Q2. Explain Handler's classification of pipeline processor according to levels of processing.
Q3. Explain the basic structure of linear pipeline processor
Q4. Explain Ramamoorthy and Li's classification of pipeline processor according to pipeline configurations and control strategies.
Q5. Explain any four performance evaluation factors for pipeline processors.
Q6. Draw and explain S-access memory organization.
Q7. What is the advantage of Interleaved Memory Organizations? Define memory bandwidth and explain the factors affecting on memory bandwidth?
Q8. Illustrate three classes of data-dependent hazards according to various data update patterns.
Q9. Draw and explain C-access memory organization.
Q10. Explain in short how basic scheduling and loop unrolling is used to increase the ILP.
------------------------------------------------------------------------------
Q1. Explain basic concept of pipelined processors with space-time diagrams.

1. Definition of Pipelining
Pipelining is a method of segmenting a sequential process into a number of suboperations. Each suboperation is performed in a dedicated pipeline stage, and multiple tasks are processed simultaneously in different stages.

2. Basic Concept
A pipelined processor divides instruction execution into several stages, such as:
• FI - Fetch Instruction
• DI - Decode Instruction
• EX - Execute Operation
• FO - Fetch Operand / Memory Access
• WB - Write Back
Each stage completes one small part of the instruction. While one instruction moves to the next stage, the following instruction enters the previous stage. This allows the pipeline to remain continuously busy. After the initial filling of the pipeline, one instruction is completed per clock cycle, which increases throughput.

3. Space-Time Diagram
The space-time diagram shows how the instruction stages overlap across time (clock ticks).

Instr.  T1  T2  T3  T4  T5  T6
I1      FI  DI  EX  FO  WB
I2          FI  DI  EX  FO  WB
I3              FI  DI  EX  FO
I4                  FI  DI  EX

Explanation of Diagram:
• Each row represents one instruction.
• Each column represents one pipeline clock tick.
• Instructions enter the pipeline one cycle apart.
• After the initial filling, results come out every cycle.
• This overlapping execution achieves higher instruction throughput and better utilization of hardware resources.

Q2. Explain Handler's classification of pipeline processor according to levels of processing.

Handler classified pipeline processors into three hierarchical levels, based on how deeply parallelism is used inside a processor:
1. Processor-Level Pipelining
2. Functional-Level Pipelining
3. Subsystem-Level Pipelining
These levels show how instructions, operations, and micro-operations flow through pipelines.
• Higher levels give coarse-grain pipelining.
• Lower levels give fine-grain pipelining.
• Modern CPUs use all three levels together.
• Handler’s classification helps understand how parallelism is used inside a processor at multiple layers.

1. Processor-Level Pipelining (Instruction-Level Pipelining)
• This is the highest level of pipelining.
• Here, whole instructions pass through pipeline stages like:
Fetch → Decode → Execute → Write-back
• Many instructions stay in different stages at the same time, increasing speed.
• Examples: RISC processors, scalar pipelines, classical instruction pipelines.
• Main goal: improve instruction throughput (more instructions per second).

2. Functional-Level Pipelining (Operation-Level Pipelining)
• At this level, each instruction is divided into smaller operations, such as:
– ALU operations
– Address calculations
– Shifts and logic operations
• Different functional units are pipelined so multiple operations can overlap.
• Examples:
– Pipelined floating-point adder or multiplier
– Multiple pipelined units working together
• Purpose: increase parallel execution of operations inside an instruction.

3. Subsystem-Level Pipelining (Segment-Level or Task-Level)
• This is the lowest and most detailed level of pipelining.
• Work is divided into micro-operations inside subsystems like:
– Memory subsystem
– Cache subsystem
– I/O subsystem
• Each subsystem has its own mini-pipeline to speed up internal tasks.
• Example: memory performing address translation, access, buffering, and write-back in overlapping pipeline stages.

Q3. Explain the basic structure of linear pipeline processor

A linear pipeline processor is a type of pipeline where operations flow through a sequence of stages arranged in a straight line.
Each stage completes a fixed suboperation, and the output of one stage becomes the input of the next. This is the most common form of pipelining used in arithmetic units, instruction pipelines, and vector processors.

Structural Diagram (Linear Pipeline)
Where each $S_i$ is a pipeline stage.

2. Key Characteristics
The structure is defined by these characteristics:
• Fixed Sequence of Stages: The flow is always left-to-right, passing through all segments.
• Clock Synchronization: All stages operate under a common clock, and each stage must complete one suboperation per cycle.
• Register Buffers: Pipeline registers ($R_i$) are placed between every two stages to hold intermediate results. These registers isolate the stages and allow parallel execution.
• Uniform or Non-uniform Stages: The stages can either have equal delay (Uniform pipeline) or vary in delay (Non-uniform pipeline).

Q4. Explain Ramamoorthy and Li's classification of pipeline processor according to pipeline configurations and control strategies.

Ramamoorthy and Li classified pipeline processors based on how the pipeline stages are organized and controlled. This classification helps in understanding different pipeline structures used in processors and arithmetic units. They divided pipeline processors into four major categories:
1. Asynchronous Pipelines
a. These pipelines do not use a global clock.
b. Each stage starts processing when the previous stage finishes and passes a “done” signal.
c. Communication happens through handshaking signals.
d. Main features:
i. No clock skew problem
ii. Suitable when stage delays are irregular
iii. More flexible and adaptive
e. Applications: Used in variable-speed arithmetic units and special-purpose hardware.
2. Synchronous Pipelines (Clocked Pipelines)
a. All pipeline stages operate under a common global clock.
b. Each stage must complete its operation within one clock cycle.
c. Pipeline registers hold intermediate results between stages.
d. Main features:
i. Simple control mechanism
ii. Easy to design and implement
e. Applications: Used in most RISC and CISC processors.
f. Note: This is the most common form of pipeline in modern CPUs.
3. Dynamic Pipelines (Data-Driven Pipelines)
a. Operation flow depends on the data.
b. Stages may be bypassed or
reconfigured depending on the
instruction type or operand
availability.
c. Example: A floating-point unit may skip
stages for integer operations.
d. Main features:
i. High flexibility
ii. Better suited for processors
executing mixed instruction
types
iii. Improves hardware
utilization
e. Applications: Often used in complex
arithmetic units and vector pipelines.
4. Static Pipelines (Fixed-Function Pipelines)
a. Pipeline configuration is fixed and
predetermined.
b. Every instruction uses the same
sequence of stages.
c. No stage skipping or dynamic behavior.
d. Main features:
i. Simple hardware
ii. Predictable timing
e. Applications: Suitable for simple scalar pipelines and fixed-function units such as adders or multipliers.

Q5. Explain any four performance evaluation factors for pipeline processors.

Q6. Draw and explain S-access memory organization.
Concept:
• Memory words are stored in a single memory module.
• CPU can read or write only one word per
memory cycle.
• Addresses are supplied sequentially; memory
responds one at a time.
• No interleaving or overlapping of accesses.
Working:
• CPU sends an address with Read/Write
command.
• Memory decodes the address and accesses the
word.
• Only one operation per memory cycle; CPU waits
until it completes.
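Taking the single-module model exactly as described above (one word per memory cycle, no overlap), the access time and peak transfer rate follow directly from the cycle time. The short Python sketch below only illustrates that arithmetic; the function names and the 100 ns cycle time are assumptions chosen for the example, not values from these notes.

```python
def sequential_access_time_ns(num_words: int, cycle_time_ns: float) -> float:
    """Total time when each word needs one full memory cycle and none overlap."""
    return num_words * cycle_time_ns


def words_per_second(cycle_time_ns: float) -> float:
    """Peak transfer rate of a module that delivers one word per cycle."""
    return 1e9 / cycle_time_ns


# Hypothetical numbers: 4 words, 100 ns memory cycle.
print(sequential_access_time_ns(4, 100.0))  # 400.0
print(words_per_second(100.0))              # 10000000.0
```

Because nothing overlaps, the total time grows linearly with the number of words requested.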
Characteristics:
1. Single Bank / Single Module: One access at a
time.
2. No Parallelism: No simultaneous reads/writes or instruction/data overlap.
3. Fixed Memory Cycle: Access time is constant.
4. Simple Hardware: Easy address decoding, data transfer, and control.
5. Eliminates Hotspot Problem: Simple control reduces hotspots.
Block Diagram:
• Shows a single memory module connected to CPU with address, data, and control lines.
Advantages:
• Very simple organization.
• Low hardware cost.
• Easy to design and control.

Q7. What is the advantage of Interleaved Memory Organizations? Define memory bandwidth and explain the factors affecting on memory bandwidth?

1. Advantage of Interleaved Memory Organization
• Memory is divided into multiple banks; consecutive addresses are stored in different banks.
• Allows parallel access to multiple memory words.
Main Advantages:
a) Higher Throughput:
• CPU can access different banks in successive cycles.
• Example: With 4 banks, 4 words can be accessed in 4 consecutive cycles.
b) Reduced Memory Stall:
• While one bank is busy, another bank can be accessed.
• Hides memory latency and keeps pipeline busy.
c) Better Support for Pipelined and Vector Processors:
• Vector instructions need a continuous stream of data.
• Interleaving ensures uninterrupted data supply.
• Consecutive addresses do not map to the same bank, avoiding conflicts.
2. Memory Bandwidth
• Definition: Memory bandwidth is the rate at which data can be read from or written to memory, usually measured in bits or words per second.
3. Factors Affecting Memory Bandwidth:
a) Number of Memory Banks: More banks → more parallelism → higher bandwidth.
b) Memory Cycle Time (Tc): Shorter cycle time → faster access → higher bandwidth.
c) Width of Memory Bus: Wider bus transfers more bits per cycle (e.g., 64-bit bus has double bandwidth of 32-bit bus).
d) Interleaving Degree: Higher interleaving reduces access conflicts → increases bandwidth.
e) Access Patterns: Sequential access benefits more; random access may cause conflicts → lower bandwidth.
f) Cache Support: Reduces main memory requests → effective bandwidth increases.

Q8. Illustrate three classes of data-dependent hazards according to various data update patterns.

Concept:
• Data-dependent hazards occur when an instruction depends on the result of a previous instruction that has not yet completed in the pipeline.
• According to Kai Hwang, there are three main classes:
o Read After Write (RAW)
o Write After Read (WAR)
o Write After Write (WAW)

1. Read After Write (RAW Hazard)
• Definition: Occurs when an instruction reads a value that has not yet been written by an earlier instruction. Also called a true dependency.
• Example:
o I1: R1 ← R2 + R3
o I2: R4 ← R1 + 5
o I2 reads R1 before I1 writes → hazard occurs.
• Reason: Write stage of I1 occurs after the read stage of I2.
• Impact: Can cause pipeline stalls unless resolved by forwarding/bypassing.
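The three classes named in the Q8 concept list (RAW, WAR, WAW) can be told apart mechanically by comparing the registers that an earlier and a later instruction read and write. The Python sketch below is a minimal illustration under a simplifying assumption: each instruction has exactly one destination register and a set of source registers. The `Instr` type and the `classify_hazards` helper are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: one destination, any number of sources."""
    dest: str
    srcs: frozenset


def classify_hazards(earlier: Instr, later: Instr) -> set:
    """Return the hazard classes that `later` has with respect to `earlier`."""
    hazards = set()
    if earlier.dest in later.srcs:   # later reads what earlier writes
        hazards.add("RAW")
    if later.dest in earlier.srcs:   # later writes what earlier reads
        hazards.add("WAR")
    if later.dest == earlier.dest:   # both write the same register
        hazards.add("WAW")
    return hazards


# I1: R1 <- R2 + R3 ; I2: R4 <- R1 + 5
i1 = Instr("R1", frozenset({"R2", "R3"}))
i2 = Instr("R4", frozenset({"R1"}))
print(classify_hazards(i1, i2))  # {'RAW'}
```

The check only reports that a dependency exists. Whether it actually stalls the pipeline depends on stage timing and on whether forwarding is available.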
2. Write After Read (WAR Hazard)
• Definition: Occurs when a later instruction writes a value before an earlier instruction reads it. Also called an anti-dependency.
• Example:
o I1: R4 ← R1 + R2
o I2: R1 ← 10
o I2 may write to R1 before I1 reads → hazard occurs.
• Reason: Out-of-order execution or early writes in the pipeline can cause conflict.
• Resolution: Register renaming or in-order read scheduling.

3. Write After Write (WAW Hazard)
• Definition: Occurs when two instructions write to the same destination and writes happen in the wrong order. Also called an output dependency.
• Example:
o I1: R5 ← R2 + 3
o I2: R5 ← R4 + 2
o If I2 writes before I1 → incorrect final value.
• Reason: Out-of-order completion or different write stages.
• Resolution: Register renaming or in-order write-back.

Conclusion:
• RAW, WAR, and WAW hazards arise due to data dependencies in pipelines.
• They are resolved using techniques like forwarding, pipeline stalls, register renaming, and in-order execution to ensure correct program results.

Q9. Draw and explain C-access memory organization.

Introduction:
• C-access memory allows simultaneous access of multiple memory modules by the CPU.
• Unlike S-access memory, which accesses only one module at a time, C-access improves system throughput through parallelism.

1. Concept of C-access Memory:
• Memory is divided into multiple banks/modules.
• Each bank has independent access lines.
• Multiple words can be read or written concurrently depending on CPU requests.
• Used in high-speed, pipelined, and vector processors.

2. Block Diagram:
            CPU
     -----------------
     | Address/Data  |
     -----------------
        /    |    \
+--------+--------+     +--------+
| Bank 0 | Bank 1 | ... | Bank n |
+--------+--------+     +--------+

• Each bank is accessed independently.
• CPU can issue multiple requests simultaneously.
• Banks operate concurrently, avoiding stalls.

3. Working of C-access Memory:
1. CPU sends multiple addresses in one cycle.
2. Memory controller decodes addresses and assigns them to the appropriate banks.
3. Each bank completes its read/write independently.
4. Data is returned to CPU concurrently, improving throughput.

4. Characteristics:
a) High Parallelism: Supports multiple simultaneous accesses; keeps pipelines fed.
b) Complex Control: Memory controller manages concurrent access, avoiding conflicts.
c) High Throughput: Increases with number of banks and concurrency.
d) Suitable for Vector/Pipelined Processors: Supplies multiple data elements per instruction without stalls.

5. Advantages:
• Higher memory bandwidth than S-access memory.
• Reduces CPU idle time.
• Supports parallel processing efficiently.
6. Limitations:
• More expensive hardware (more banks, more control logic).
• Complex management of bank conflicts.
• Higher power consumption with more active banks.
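How bank conflicts limit concurrent access can be sketched with a small Python model. It assumes low-order interleaving (bank number = address mod number of banks), which matches the idea that consecutive addresses fall in different banks but is not spelled out in these notes. The names `bank_of` and `cycles_needed` are illustrative only.

```python
from collections import Counter


def bank_of(address: int, num_banks: int) -> int:
    """Low-order interleaving (assumed): consecutive addresses hit consecutive banks."""
    return address % num_banks


def cycles_needed(addresses, num_banks: int) -> int:
    """Memory cycles to serve a batch of requests issued together.

    Requests to different banks proceed concurrently; requests that collide
    on one bank are served one per cycle, so the busiest bank sets the time.
    """
    if not addresses:
        return 0
    load = Counter(bank_of(a, num_banks) for a in addresses)
    return max(load.values())


print(cycles_needed([0, 1, 2, 3], 4))   # 1: four different banks
print(cycles_needed([0, 4, 8, 12], 4))  # 4: all map to bank 0
```

The second call shows why the access pattern matters: a stride equal to the number of banks sends every request to the same bank, and the concurrency is lost.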
Q10. Explain in short how basic scheduling and loop unrolling is used to increase the ILP.

Introduction:
• Instruction-Level Parallelism (ILP): Ability of a processor to execute multiple instructions simultaneously.
• Two common techniques to exploit ILP:
o Basic Instruction Scheduling
o Loop Unrolling
• These reduce pipeline stalls and increase parallel execution.

1. Basic Instruction Scheduling:
• Definition: Rearranging instruction order without changing program results to avoid pipeline hazards.
• Goal: Minimize stalls due to data or control dependencies.
• Example:
o Original:
I1: R1 ← R2 + R3
I2: R4 ← R1 × 5
I3: R5 ← R6 + R7
o Scheduled:
I1: R1 ← R2 + R3
I3: R5 ← R6 + R7 (moved to avoid stall)
I2: R4 ← R1 × 5
• Effect: I3 does not depend on I1, so it fills the slot in which I2 would otherwise wait for R1 → reduces pipeline bubbles → increases ILP.

2. Loop Unrolling:
• Definition: Replicating loop body multiple times, reducing loop control instructions.
• Effect: Exposes more independent instructions for parallel execution.
• Example:
o Original Loop:
for i = 1 to 4
A[i] = B[i] + C[i]
o Fully unrolled (factor 4):
A[1] = B[1] + C[1]
A[2] = B[2] + C[2]
A[3] = B[3] + C[3]
A[4] = B[4] + C[4]
• Effect: The loop test and branch disappear, and the independent operations (A[1], A[2], …) can execute simultaneously → reduces loop overhead → improves pipeline utilization.

3. Combined Effect:
• Scheduling + Loop Unrolling exposes more independent instructions to the pipeline.
• Reduces stalls caused by data hazards and control hazards.
• Maximizes functional unit usage → improves CPU throughput.

4. Key Points:
• Program results remain unchanged.
• Can be implemented by compiler or hardware.
• Particularly useful in RISC and superscalar architectures.
• ILP improvement depends on the number of independent instructions available.
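For loops whose trip count is not known in advance, unrolling is usually partial. The sketch below unrolls the A[i] = B[i] + C[i] loop by a factor of 2 and handles a leftover element. It is written in Python only to show the shape of the transformation: Python itself gains no ILP from it, and the function names are illustrative, not part of these notes.

```python
def add_rolled(b, c):
    """Original loop: one element and one loop test per iteration."""
    a = [0] * len(b)
    for i in range(len(b)):
        a[i] = b[i] + c[i]
    return a


def add_unrolled_by_2(b, c):
    """Same computation with the loop body replicated twice per iteration."""
    n = len(b)
    a = [0] * n
    i = 0
    while i + 1 < n:
        # These two statements are independent of each other, so a pipelined
        # or superscalar machine could overlap them.
        a[i] = b[i] + c[i]
        a[i + 1] = b[i + 1] + c[i + 1]
        i += 2
    if i < n:  # leftover element when n is odd
        a[i] = b[i] + c[i]
    return a


print(add_unrolled_by_2([1, 2, 3, 4], [10, 20, 30, 40]))  # [11, 22, 33, 44]
```

The unrolled version executes half as many loop tests and branches, and each iteration carries two independent additions that a scheduler is free to reorder or overlap.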