Cache and Pipeline Performance Analysis

Uploaded by

guttamaneesha456

Q1. How many total bits are required for a direct-mapped cache with 128 KB of data and
1-word block size, assuming a 32-bit address?
Q2. How many total bits are required for a direct-mapped cache with 128 KB of data and
4-word block size, assuming a 32-bit address?
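A minimal sketch of the bit count asked for in Q1 and Q2. It assumes 4-byte words, byte addressing, and exactly one valid bit per block with no dirty bit; none of these are stated in the questions, and the function name is hypothetical.

```python
def direct_mapped_total_bits(data_bytes, words_per_block, addr_bits=32, word_bytes=4):
    """Total storage bits (data + tag + valid) of a direct-mapped cache."""
    block_bytes = words_per_block * word_bytes
    num_blocks = data_bytes // block_bytes
    index_bits = (num_blocks - 1).bit_length()    # log2 for a power of two
    offset_bits = (block_bytes - 1).bit_length()  # block offset + byte offset
    tag_bits = addr_bits - index_bits - offset_bits
    # Per block: data bits + tag bits + 1 valid bit (assumption).
    bits_per_block = block_bytes * 8 + tag_bits + 1
    return num_blocks * bits_per_block

q1_bits = direct_mapped_total_bits(128 * 1024, 1)  # 2^15 blocks x 48 bits
q2_bits = direct_mapped_total_bits(128 * 1024, 4)  # 2^13 blocks x 144 bits
print(q1_bits, q2_bits)  # 1572864 1179648
```

Under these assumptions the larger block size lowers the total because fewer tags and valid bits are stored for the same amount of data.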
Q3. Consider a cache with 64 blocks and a block size of 16 bytes. What block number does
byte address 1200 map to?
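The mapping in Q3 can be sketched as follows, assuming a direct-mapped cache where the cache block is the memory block address modulo the number of cache blocks. The function name is hypothetical.

```python
def cache_block_index(byte_address, block_bytes, num_blocks):
    """Cache block that a byte address maps to in a direct-mapped cache."""
    block_address = byte_address // block_bytes  # which memory block holds the byte
    return block_address % num_blocks            # where that block lands in the cache

q3_block = cache_block_index(1200, 16, 64)
print(q3_block)  # 11  (1200 // 16 = 75, 75 mod 64 = 11)
```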
Q4. Assume for a given machine and program:
• instruction cache miss rate 2%
• data cache miss rate 4%
• miss penalty always 40 cycles
• CPI of 2 without memory stalls
• frequency of load/stores 36% of instructions
a) How much faster is a machine with a perfect cache that never misses?
b) What happens if we speed up the machine by reducing its CPI to 1 without changing
the clock rate?
c) What happens if we speed up the machine by doubling its clock rate, while the
absolute time of the miss penalty remains the same?
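One way to work through Q4 is to add memory-stall cycles per instruction to the base CPI. The sketch below assumes every instruction makes one instruction-cache access and that loads/stores add one data-cache access each; the helper name is hypothetical.

```python
def cpi_with_stalls(base_cpi, i_miss_rate, d_miss_rate, mem_fraction, penalty_cycles):
    """Base CPI plus average memory-stall cycles per instruction."""
    instr_stalls = i_miss_rate * penalty_cycles
    data_stalls = mem_fraction * d_miss_rate * penalty_cycles
    return base_cpi + instr_stalls + data_stalls

# a) Perfect cache versus the real cache, same clock.
cpi_real = cpi_with_stalls(2.0, 0.02, 0.04, 0.36, 40)   # about 3.376
speedup_perfect = cpi_real / 2.0                         # about 1.69

# b) Base CPI reduced to 1: the stall cycles are unchanged, so they weigh more.
cpi_fast_core = cpi_with_stalls(1.0, 0.02, 0.04, 0.36, 40)  # about 2.376
speedup_perfect_b = cpi_fast_core / 1.0                      # about 2.38

# c) Clock rate doubled: the same absolute miss time now costs 80 cycles.
cpi_fast_clock = cpi_with_stalls(2.0, 0.02, 0.04, 0.36, 80)  # about 4.752
# Time per instruction is CPI x cycle time; the fast clock halves the cycle time.
speedup_fast_clock = cpi_real / (cpi_fast_clock / 2.0)       # about 1.42, not 2

print(round(speedup_perfect, 2), round(speedup_perfect_b, 2), round(speedup_fast_clock, 2))
```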
Q5. Find the number of misses for a cache with four 1-word blocks given the following
sequence of memory block accesses: 0, 8, 0, 6, 8, for each of the following cache
configurations:
a) direct mapped
b) 2-way set associative (use LRU replacement policy)
c) fully associative
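The three configurations in Q5 can be checked with a small set-associative simulator. This is an illustrative sketch, assuming the cache starts empty, the set index is the block address modulo the number of sets, and LRU replacement is used within each set; the function name is hypothetical.

```python
from collections import OrderedDict

def count_misses(block_accesses, num_blocks, ways):
    """Count misses for a cache of num_blocks blocks organised as `ways`-way sets."""
    num_sets = num_blocks // ways
    # Each set keeps its resident blocks ordered from least to most recently used.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for block in block_accesses:
        lines = sets[block % num_sets]
        if block in lines:
            lines.move_to_end(block)       # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) == ways:
                lines.popitem(last=False)  # set full: evict least recently used
            lines[block] = True
    return misses

accesses = [0, 8, 0, 6, 8]
direct_mapped = count_misses(accesses, 4, 1)      # 5 misses
two_way = count_misses(accesses, 4, 2)            # 4 misses
fully_associative = count_misses(accesses, 4, 4)  # 3 misses
print(direct_mapped, two_way, fully_associative)
```

Direct mapped is the 1-way case and fully associative is the case where `ways` equals the number of blocks, so one routine covers all three parts.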

Q6. Assume a 500 MHz machine with
• base CPI 1.0
• main memory access time 200 ns
• miss rate 5%
How much faster will the machine be if we add a second-level cache with 20 ns access time
that decreases the miss rate to 2%?
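A sketch of the Q6 calculation, under two assumptions that the question leaves implicit: the miss rates are per instruction, and the 2% figure is the global miss rate to main memory once the second-level cache is present.

```python
clock_hz = 500e6
cycle_ns = 1e9 / clock_hz                 # 2 ns per cycle

main_penalty_cycles = 200 / cycle_ns      # 100 cycles to reach main memory
l2_penalty_cycles = 20 / cycle_ns         # 10 cycles to reach the second-level cache

# One cache level: every miss goes to main memory.
cpi_one_level = 1.0 + 0.05 * main_penalty_cycles                               # 6.0

# Two levels: all first-level misses pay the L2 access,
# and the 2% that also miss in L2 pay the main memory access on top.
cpi_two_level = 1.0 + 0.05 * l2_penalty_cycles + 0.02 * main_penalty_cycles    # 3.5

speedup_l2 = cpi_one_level / cpi_two_level
print(round(speedup_l2, 2))  # 1.71
```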
Q7. In this exercise, we examine how resource hazards, control hazards, and Instruction Set
Architecture (ISA) design can affect pipelined execution. Problems in this exercise refer to
the following fragment of MIPS code:
sw r16,12(r6)
lw r16,8(r6)
beq r5,r4,Label # Assume r5!=r4
add r5,r1,r4
slt r5,r15,r4
Assume that individual pipeline stages have the following latencies:
IF: 200 ps, ID: 120 ps, EX: 150 ps, MEM: 190 ps, WB: 100 ps
a) For this problem, assume that all branches are perfectly predicted (this eliminates all
control hazards) and that no delay slots are used. If we only have one memory (for
both instructions and data), there is a structural hazard every time we need to fetch an
instruction in the same cycle in which another instruction accesses data. To guarantee
forward progress, this hazard must always be resolved in favor of the instruction that
accesses data. What is the total execution time of this instruction sequence in the
5-stage pipeline that only has one memory?
b) We have seen that data hazards can be eliminated by adding nops to the code. Can you
do the same with this structural hazard? Why?
c) For this problem, assume that all branches are perfectly predicted (this eliminates all
control hazards) and that no delay slots are used. If we change load/store instructions
to use a register (without an offset) as the address, these instructions no longer need
to use the ALU. As a result, MEM and EX stages can be overlapped and the pipeline
has only 4 stages. Change this code to accommodate this changed ISA. Assuming this
change does not affect clock cycle time, what speedup is achieved in this instruction
sequence?
d) Assuming stall-on-branch and no delay slots, what speedup is achieved on this code if
branch outcomes are determined in the ID stage, relative to the execution where
branch outcomes are determined in the EX stage?
e) Given these pipeline stage latencies, repeat the speedup calculation from part c), but
take into account the (possible) change in clock cycle time. When EX and MEM are
done in a single stage, most of their work can be done in parallel. As a result, the
resulting EX/MEM stage has a latency that is the larger of the original two, plus 20 ps
needed for the work that could not be done in parallel.
f) Given these pipeline stage latencies, repeat the speedup calculation from part d),
taking into account the (possible) change in clock cycle time. Assume that the latency
of the ID stage increases by 50% and the latency of the EX stage decreases by 10 ps when
branch outcome resolution is moved from EX to ID.
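For the last two parts of Q7, the clock period of each design is the latency of its slowest stage. The sketch below computes only those clock periods from the latencies given in the question; the cycle counts still have to come from the hazard analysis of the code fragment and are not computed here. The dictionary names are hypothetical.

```python
stage_ps = {"IF": 200, "ID": 120, "EX": 150, "MEM": 190, "WB": 100}

def clock_period(stages):
    """The clock must accommodate the slowest pipeline stage."""
    return max(stages.values())

base_period = clock_period(stage_ps)  # 200 ps, set by IF

# Merged EX/MEM stage: the larger of the two latencies plus 20 ps.
merged = {
    "IF": stage_ps["IF"],
    "ID": stage_ps["ID"],
    "EX/MEM": max(stage_ps["EX"], stage_ps["MEM"]) + 20,
    "WB": stage_ps["WB"],
}
merged_period = clock_period(merged)  # 210 ps, now set by EX/MEM

# Branch resolved in ID: ID grows by 50%, EX shrinks by 10 ps.
branch_in_id = dict(stage_ps, ID=stage_ps["ID"] * 3 // 2, EX=stage_ps["EX"] - 10)
branch_in_id_period = clock_period(branch_in_id)  # still 200 ps, set by IF

print(base_period, merged_period, branch_in_id_period)  # 200 210 200
```

Each speedup is then (old cycle count × old period) / (new cycle count × new period).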

Q8. A processor X1 operating at 2 GHz has a standard 5-stage RISC instruction pipeline
having a base CPI (cycles per instruction) of one without any pipeline hazards. For a given
program P that has 30% branch instructions, control hazards incur 2 cycles stall for every
branch. A new version of the processor X2 operating at same clock frequency has an
additional branch predictor unit (BPU) that completely eliminates stalls for correctly
predicted branches. There is neither any savings nor any additional stalls for wrong
predictions. There are no structural hazards and data hazards for X1 and X2. If the BPU has a
prediction accuracy of 80% what will be the speed up (rounded off to two decimal places)
obtained by X2 over X1 in executing P?
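A short sketch of the Q8 comparison. Both processors run at the same clock, so the speedup is the ratio of their CPIs; the only assumption is that a mispredicted branch on X2 costs the same 2 stall cycles as any branch on X1, as the question states.

```python
branch_fraction = 0.30
stall_cycles = 2
accuracy = 0.80

cpi_x1 = 1.0 + branch_fraction * stall_cycles                    # every branch stalls
cpi_x2 = 1.0 + branch_fraction * (1 - accuracy) * stall_cycles   # only mispredictions stall

speedup_x2 = cpi_x1 / cpi_x2
print(round(speedup_x2, 2))  # 1.43
```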
Q9. Consider a non-pipelined processor operating at 2.5 GHz. It takes 5 clock cycles to
complete an instruction. You are going to make a 5-stage pipeline out of this processor.
Overheads associated with pipelining force you to operate the pipelined processor at 2 GHz.
In a given program, assume that 30% are memory instructions, 60% are ALU instructions and
the rest are branch instructions. 5% of the memory instructions cause stalls of 50 clock cycles
each due to cache misses and 50% of the branch instructions cause stalls of 2 cycles each.
Assume that there are no stalls associated with the execution of ALU instructions. For this
program, what will be the speedup achieved by the pipelined processor over the non-pipelined
processor (round off to 2 decimal places)?
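Q9 compares average time per instruction on the two designs. The sketch assumes the non-pipelined processor suffers none of the listed stalls, which is how the question is usually read but is not stated outright.

```python
# Non-pipelined: 5 cycles per instruction at 2.5 GHz.
time_nonpipelined_ns = 5 / 2.5                 # 2.0 ns per instruction

# Pipelined: ideal CPI of 1 plus average stall cycles, at 2 GHz.
mem_fraction, alu_fraction = 0.30, 0.60
branch_fraction = 1.0 - mem_fraction - alu_fraction   # the remaining 10%
cpi_pipelined = (1.0
                 + mem_fraction * 0.05 * 50    # cache-miss stalls
                 + branch_fraction * 0.50 * 2) # branch stalls
time_pipelined_ns = cpi_pipelined / 2.0        # 0.925 ns per instruction

speedup_pipelined = time_nonpipelined_ns / time_pipelined_ns
print(round(speedup_pipelined, 2))  # 2.16
```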
Q10. Consider the following processors (ns stands for nanoseconds). Assume that the pipeline
registers have zero latency.
Four-stage pipeline with stage latencies 1ns, 2 ns, 2ns, 1ns
Four-stage pipeline with stage latencies 1ns, 1.5ns, 1.5ns, 1.5ns
Five-stage pipeline with stage latencies 0.5ns, 1ns, 1ns, 0.6ns, 1 ns
Five-stage pipeline with stage latencies 0.5ns, 0.5 ns, 1ns, 1ns, 1.1ns
Which processor has the highest peak clock frequency?
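With zero-latency pipeline registers, the peak clock frequency in Q10 is the reciprocal of the slowest stage latency. The labels P1 to P4 below are hypothetical names for the four processors in the order they are listed.

```python
stage_latencies_ns = {
    "P1": [1, 2, 2, 1],
    "P2": [1, 1.5, 1.5, 1.5],
    "P3": [0.5, 1, 1, 0.6, 1],
    "P4": [0.5, 0.5, 1, 1, 1.1],
}

# Peak frequency in GHz = 1 / (slowest stage latency in ns).
peak_ghz = {name: 1.0 / max(lat) for name, lat in stage_latencies_ns.items()}
fastest = max(peak_ghz, key=peak_ghz.get)

print(fastest, peak_ghz[fastest])  # P3 1.0
```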
Q11. Consider an instruction pipeline with four stages (S1, S2, S3, and S4) and each with
combinational circuit only. The pipeline registers are required between each stage and at the
end of the last stage. Delays for the stages and for the pipeline registers are as given in the
figure.

What is the approximate speed up of the pipeline in steady state under ideal conditions when
compared to the corresponding non-pipeline implementation?
Q12. A 5-stage pipelined processor has Instruction Fetch (IF), Instruction Decode
(ID), Operand Fetch (OF), Perform Operation (PO) and Write Operand (WO) stages.
The IF, ID, OF and WO stages take 1 clock cycle each for any instruction.
The PO stage takes 1 clock cycle for ADD and SUB instructions, 3 clock cycles
for MUL instruction, and 6 clock cycles for DIV instruction respectively. Operand
forwarding is used in the pipeline. What is the number of clock cycles needed to execute the
following sequence of instructions?
Q13. Consider a 4 stage pipeline processor. The number of cycles needed by the four
instructions I1, I2, I3, and I4, in stages S1, S2, S3, S4 is shown below.

What is the number of cycles needed to execute the following loop?

For (i=1 to 2){I1; I2; I3; I4;}
Q14. Consider a pipelined processor with the following four stages
IF: Instruction Fetch
ID: Instruction Decode and Operand Fetch
EX: Execute
WB: Write Back
The IF, ID, and WB stages take one clock cycle each to complete the operation. The number
of clock cycles for the EX stage depends on the instruction. The ADD and SUB instructions
need 1 clock cycle and the MUL instruction needs 3 clock cycles in the EX stage. Operand
forwarding is used in the pipelined processor. What is the number of clock cycles taken to
complete the following sequence of instructions?

Q15. A CPU has a five-stage pipeline and runs at 1 GHz. Instruction fetch happens
in the first stage of the pipeline. A conditional branch instruction computes the target address
and evaluates the condition in the third stage of the pipeline. The processor stops fetching
new instructions following a conditional branch until the branch outcome is known. A program
executes 10^9 instructions out of which 20% are conditional branches. If each instruction takes
one cycle to complete on average, then what will be the total execution time of the program?
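A sketch of the Q15 calculation. It reads the instruction count as 10^9 (the exponent appears to have been lost in extraction, so this is an assumption) and charges each conditional branch 2 stall cycles, because fetch is in stage 1 and the outcome is known in stage 3.

```python
instructions = 10**9                    # assumed reading of the instruction count
branches = instructions * 20 // 100     # 20% are conditional branches
stall_per_branch = 3 - 1                # outcome known in stage 3, fetch is stage 1

total_cycles = instructions + branches * stall_per_branch
clock_hz = 10**9                        # 1 GHz
execution_time_s = total_cycles / clock_hz

print(total_cycles, execution_time_s)  # 1400000000 1.4
```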
Q16. Consider an instruction pipeline with five stages without any branch prediction: Fetch
Instruction (FI), Decode Instruction (DI), Fetch Operand (FO), Execute Instruction (EI) and
Write Operand (WO). The stage delays for FI, DI, FO, EI, and WO are 5 ns, 7 ns, 10 ns, 8 ns,
and 6 ns respectively. There are intermediate storage buffers after each stage and the delay of
each buffer is 1 ns. A program consisting of 12 instructions I1, I2, I3, ..., I12 is executed
in this pipelined processor. Instruction I4 is the only branch instruction and its branch target
is I9. If the branch is taken during the execution of this program, how much time (in ns) will
be needed to complete the program?
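A sketch of one common way to work Q16. It assumes the branch target becomes known at the end of the EI stage, so three fetch slots after I4 are lost before I9 can be fetched; the question does not say where the branch resolves, so treat that as an assumption.

```python
stage_delays_ns = [5, 7, 10, 8, 6]      # FI, DI, FO, EI, WO
buffer_delay_ns = 1
cycle_ns = max(stage_delays_ns) + buffer_delay_ns   # 11 ns

num_stages = len(stage_delays_ns)
executed = 4 + 4                        # I1..I4, then I9..I12 after the taken branch
branch_stall_cycles = 3                 # assumed: target known after EI (stage 4)

# First instruction needs num_stages cycles, each later one adds a cycle,
# and the taken branch adds the stall cycles.
total_cycles_q16 = num_stages + (executed - 1) + branch_stall_cycles
total_time_ns = total_cycles_q16 * cycle_ns

print(total_cycles_q16, total_time_ns)  # 15 165
```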
Q17. A processor with 16 general purpose registers uses a 32-bit instruction format. The
instruction format consists of an opcode field, an addressing mode field, two register operand
fields, and a 16-bit scalar field. If 8 addressing modes are to be supported, what is the
maximum number of unique opcodes possible for every addressing mode?
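The field budget in Q17 can be checked by subtracting the fixed fields from the 32-bit instruction width; whatever remains is the opcode field. The helper name is hypothetical.

```python
def bits_needed(count):
    """Minimum number of bits to encode `count` distinct values."""
    return (count - 1).bit_length()

instruction_bits = 32
scalar_bits = 16
register_bits = bits_needed(16)   # 4 bits per register operand
mode_bits = bits_needed(8)        # 3 bits for 8 addressing modes

opcode_bits = instruction_bits - scalar_bits - 2 * register_bits - mode_bits
max_opcodes = 2 ** opcode_bits

print(opcode_bits, max_opcodes)  # 5 32
```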
Q18. Consider a computer architecture where instructions are 16 bits long. The first 6 bits of
the instruction are reserved for the opcode, and the remaining 10 bits are used for the
operands. There are three addressing modes: immediate, direct, and register. For immediate
addressing, the operand is included in the instruction itself. For direct addressing, the operand
is a memory address. For register addressing, the operand is a register number. Write the
instruction format for each of the addressing modes.
Q19. A computer has a 256 KByte, 4-way set associative, write-back data cache with block
size of 32 Bytes. The processor sends 32-bit addresses to the cache controller. Each cache tag
directory entry contains, in addition to the address tag, 2 valid bits, 1 modified bit and 1
replacement bit. What is the number of bits in the tag field of an address?
Q20. An 8 KB direct-mapped write-back cache is organized as multiple blocks, each of size
32 bytes. The processor generates 32-bit addresses. The cache controller maintains the tag
information for each cache block, comprising the following:
• 1 valid bit
• 1 modified bit
• as many bits as the minimum needed to identify the memory block mapped in the cache
What is the total size of memory needed at the cache controller to store meta-data (tags) for the
cache?
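Q19 and Q20 both come down to splitting a 32-bit address into tag, index, and block offset. The sketch below assumes byte addressing and power-of-two sizes; the function name is hypothetical, and a direct-mapped cache is treated as the 1-way case.

```python
def address_fields(cache_bytes, block_bytes, ways, addr_bits=32):
    """Return (tag_bits, index_bits, offset_bits) for a set-associative cache."""
    num_sets = cache_bytes // (block_bytes * ways)
    offset_bits = (block_bytes - 1).bit_length()
    index_bits = (num_sets - 1).bit_length()
    tag_bits = addr_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits

# Q19: 256 KB, 4-way, 32-byte blocks.
q19_tag, q19_index, q19_offset = address_fields(256 * 1024, 32, 4)
print(q19_tag)  # 16

# Q20: 8 KB, direct mapped, 32-byte blocks; each entry is tag + valid + modified.
q20_tag, q20_index, q20_offset = address_fields(8 * 1024, 32, 1)
q20_blocks = (8 * 1024) // 32
q20_metadata_bits = q20_blocks * (q20_tag + 1 + 1)
print(q20_tag, q20_metadata_bits)  # 19 5376
```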
Q21. A computer system has an L1 cache, an L2 cache, and a main memory unit connected as
shown below. The block size in L1 cache is 4 words. The block size in L2 cache is 16 words.
The memory access times are 2 nanoseconds, 20 nanoseconds and 200 nanoseconds for L1
cache, L2 cache and main memory unit respectively.

When there is a miss in L1 cache and a hit in L2 cache, a block is transferred from L2 cache
to L1 cache. What is the time taken for this transfer?
Q22. A given program has 25% load/store instructions. Suppose the ideal CPI (cycles per
instruction) without any memory stalls is 2. The program exhibits 2% miss rate on instruction
cache and 8% miss rate on data cache. The miss penalty is 100 cycles. What is the
speedup (rounded off to two decimal places) achieved with a perfect cache (i.e., with NO data
or instruction cache misses)?
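A sketch of the Q22 calculation, assuming one instruction-cache access per instruction and one data-cache access per load/store.

```python
ideal_cpi = 2.0
miss_penalty = 100
instr_stalls = 0.02 * miss_penalty           # 2 cycles per instruction
data_stalls = 0.25 * 0.08 * miss_penalty     # 2 cycles per instruction

cpi_with_misses = ideal_cpi + instr_stalls + data_stalls
speedup_perfect_cache = cpi_with_misses / ideal_cpi

print(round(speedup_perfect_cache, 2))  # 3.0
```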
Q23. Explain in detail Flynn’s classification of computer architectures. Differentiate
between SISD, SIMD, MISD, and MIMD architectures with suitable diagrams. Discuss how
instruction and data streams are handled in each category of Flynn’s taxonomy. What are the
limitations of Flynn’s classification in the context of modern parallel processing systems?
Q24. Describe the evolution of computers from the first generation to the fifth generation.
What were the main technological developments that distinguished each generation of
computers? Describe the differences in hardware, software, and performance among the five
generations of computers. Discuss the invention of transistors and how it revolutionized
computer design. Explain the role of integrated circuits (ICs) in the development of the third-
generation computers. Describe how microprocessors led to the birth of personal computing.
What are the main features and goals of the fifth-generation computers? How does artificial
intelligence fit into this generation?
Q25. Problems in this exercise assume that logic blocks needed to implement a processor’s
datapath have the following latencies:
I-Mem: 200 ps, Add: 70 ps, Mux: 20 ps, ALU: 90 ps, Regs: 90 ps, D-Mem: 250 ps,
Sign-Extend: 15 ps, Shift-Left-2: 10 ps
a) If the only thing we need to do in a processor is fetch consecutive instructions, what
would the cycle time be?
b) Consider a datapath for a processor that has only one type of instruction:
unconditional PC-relative branch. What would the cycle time be for this datapath?
c) Repeat b), but this time we need to support only conditional PC-relative branches.
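For Q25 the cycle time is the longest path through the blocks an instruction uses. The paths below are assumptions based on the usual single-cycle MIPS datapath (PC+4 adder in parallel with instruction fetch, branch target through sign-extend, shift and adder, branch condition through the register file, ALU-input mux, ALU and PC mux); the question itself does not spell them out.

```python
latency_ps = {
    "I-Mem": 200, "Add": 70, "Mux": 20, "ALU": 90,
    "Regs": 90, "D-Mem": 250, "Sign-Extend": 15, "Shift-Left-2": 10,
}

def path_delay(*blocks):
    """Sum the latencies of the blocks along one path."""
    return sum(latency_ps[b] for b in blocks)

# a) Fetch only: instruction memory and the PC+4 adder work in parallel.
fetch_only = max(path_delay("I-Mem"), path_delay("Add"))

# b) Unconditional PC-relative branch: fetch, then compute the target.
target_path = path_delay("I-Mem", "Sign-Extend", "Shift-Left-2", "Add", "Mux")
unconditional = max(fetch_only, target_path)

# c) Conditional branch: the condition path through registers and ALU is longer.
condition_path = path_delay("I-Mem", "Regs", "Mux", "ALU", "Mux")
conditional = max(target_path, condition_path)

print(fetch_only, unconditional, conditional)  # 200 315 420
```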
