Tutorial 2 CS305

The document contains a tutorial on computer architecture focusing on various aspects of pipelining, branch prediction, instruction encoding, and performance evaluation of processors. It includes problems related to branch hazards, speedup calculations, instruction execution times, and instruction set architectures. Each section presents a scenario requiring analysis and calculations to understand the impact of different architectural features on processor performance.

Uploaded by

mt2502193025
Copyright
© All Rights Reserved

Computer Architecture (CS-305)

Tutorial-2

1. Suppose the branch frequencies (as percentages of all instructions) are as follows:
–Conditional branches 15%
–Jumps and calls 1%
–Taken conditional branches 60% (of all conditional branches)

(a) We are examining a four-stage pipeline where the branch is resolved at the end of the
second cycle for unconditional branches and at the end of the third cycle for conditional
branches. Assuming that only the first pipe stage can always be completed independent
of whether the branch is taken and ignoring other pipeline stalls, how much faster would
the machine be without any branch hazards?
(b) Now assume a high-performance processor in which we have a 15-deep pipeline where
the branch is resolved at the end of the fifth cycle for unconditional branches and at the
end of the tenth cycle for conditional branches. Assuming that only the first pipe stage
can always be completed independent of whether the branch is taken and ignoring other
pipeline stalls, how much faster would the machine be without any branch hazards?
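
One way to set up problem 1 is sketched below. The stall model is an assumption, not something the problem states outright: because only the first stage of the following instruction can proceed before the branch resolves, a jump or taken branch resolved at the end of cycle r is charged r - 1 stall cycles, and an untaken conditional branch is charged r - 2. The function name `branch_cpi` and its parameters are illustrative. With an ideal CPI of 1, the speedup of the hazard-free machine equals the CPI with branch stalls.

```python
def branch_cpi(p_jump, p_cond, p_taken, jump_resolve, cond_resolve):
    """CPI with branch stalls, under the assumed stall model described above."""
    jump_stall = jump_resolve - 1      # target fetch waits for resolution
    taken_stall = cond_resolve - 1     # target fetch waits for resolution
    untaken_stall = cond_resolve - 2   # fall-through already finished stage 1
    return (1.0
            + p_jump * jump_stall
            + p_cond * p_taken * taken_stall
            + p_cond * (1 - p_taken) * untaken_stall)

# (a) four-stage pipeline, (b) 15-deep pipeline
cpi_a = branch_cpi(0.01, 0.15, 0.60, jump_resolve=2, cond_resolve=3)
cpi_b = branch_cpi(0.01, 0.15, 0.60, jump_resolve=5, cond_resolve=10)
print(f"(a) {cpi_a:.2f}  (b) {cpi_b:.2f}")
```

Under this reading the two CPIs come out to about 1.25 and 2.33; a different reading of the untaken-branch penalty changes the third term only.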

2. A processor X1 operating at 2 GHz has a standard 5-stage RISC instruction pipeline having
a base CPI (cycles per instruction) of one without any pipeline hazards. For a given program
P that has 30% branch instructions, control hazards incur 2 cycles stall for every branch. A
new version of the processor, X2, operating at the same clock frequency has an additional branch
predictor unit (BPU) that completely eliminates stalls for correctly predicted branches. There
are neither savings nor additional stalls for wrong predictions. There are no structural
hazards or data hazards for X1 and X2. If the BPU has a prediction accuracy of 90%, what is the
speedup (rounded off to two decimal places) obtained by X2 over X1 in executing P?
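
A minimal sketch of the arithmetic for problem 2 follows. It assumes that both processors run the same instruction count at the same clock, so the speedup reduces to a ratio of CPIs. The helper name `cpi_with_branch_stalls` is illustrative.

```python
def cpi_with_branch_stalls(branch_fraction, stall_cycles, stalled_fraction=1.0):
    """Base CPI of 1 plus the average branch stall cycles per instruction."""
    return 1.0 + branch_fraction * stalled_fraction * stall_cycles

cpi_x1 = cpi_with_branch_stalls(0.30, 2)         # every branch stalls for 2 cycles
cpi_x2 = cpi_with_branch_stalls(0.30, 2, 0.10)   # only the 10% mispredicted branches stall
speedup = cpi_x1 / cpi_x2                        # same clock, same instruction count
print(round(speedup, 2))                         # prints 1.51
```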

3. Consider a non-pipelined processor operating at 2.5 GHz. It takes 5 clock cycles to complete
an instruction. You are going to make a 5-stage pipeline out of this processor. Overheads
associated with pipelining force you to operate the pipelined processor at 2 GHz. In a given
program, assume that 30% are memory instructions, 60% are ALU instructions and the rest
are branch instructions. 5% of the memory instructions cause stalls of 50 clock cycles each due
to cache misses and 50% of the branch instructions cause stalls of 2 cycles each. Assume that
there are no stalls associated with the execution of ALU instructions. For this program, compute the
speedup achieved by the pipelined processor over the non-pipelined processor (rounded off to 2
decimal places).
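
The computation in problem 3 can be sketched as a comparison of average time per instruction. The sketch assumes an ideal pipelined CPI of 1 before stalls and takes branch instructions as the remaining 10% of the mix. The function name is illustrative.

```python
def time_per_instruction_ns(clock_ghz, cpi):
    """Average time per instruction in nanoseconds (cycle time = 1 / clock_ghz)."""
    return cpi / clock_ghz

# Non-pipelined: 5 cycles per instruction at 2.5 GHz.
t_nonpipelined = time_per_instruction_ns(2.5, 5)

# Pipelined: ideal CPI of 1 plus cache-miss and branch stalls, at 2 GHz.
cpi_pipelined = 1 + 0.30 * 0.05 * 50 + 0.10 * 0.50 * 2
t_pipelined = time_per_instruction_ns(2.0, cpi_pipelined)

speedup = t_nonpipelined / t_pipelined
print(round(speedup, 2))   # prints 2.16
```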

4. The instruction pipeline of a processor has the following stages: Instruction Fetch, Instruction
Decode, Operand Fetch, Perform Operation and Writeback. The IF, ID, OF and WB stages
take 1 clock cycle each for every instruction. Consider a sequence of 100 instructions. In the PO
stage, 40 instructions take 3 clock cycles each, 35 instructions take 2 clock cycles each, and the
remaining 25 instructions take 1 clock cycle each. Assume that there are no data hazards and
no control hazards. Compute the number of clock cycles required for completion of execution
of the sequence of instructions.
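
A sketch of the cycle count for problem 4 is given below. It assumes an in-order pipeline in which the PO stage is the only multi-cycle stage, so PO stays busy back to back once the first instruction reaches it. The total is then the three cycles before PO for the first instruction, the sum of all PO times, and one WB cycle for the last instruction. The function name is illustrative.

```python
def pipeline_cycles(po_cycles, stages_before_po=3, stages_after_po=1):
    """Total cycles when PO is the only stage that can take more than one cycle."""
    return stages_before_po + sum(po_cycles) + stages_after_po

# 40 instructions with a 3-cycle PO, 35 with 2 cycles, 25 with 1 cycle.
po_cycles = [3] * 40 + [2] * 35 + [1] * 25
print(pipeline_cycles(po_cycles))   # prints 219
```

Under these assumptions the order of the instructions does not affect the total, since the earlier stages always have the next instruction ready by the time PO frees up.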

5. An instruction pipeline has five stages where each stage takes 2 nanoseconds and all instructions
use all five stages. Branch instructions are not overlapped, i.e., the instruction after the branch
is not fetched till the branch instruction is completed. Under ideal conditions:
(a) Calculate the average instruction execution time assuming that 20% of all instructions
executed are branch instructions. Ignore the fact that some branch instructions may be conditional.
(b) If a branch instruction is a conditional branch instruction, the branch need not be taken. If
the branch is not taken, the following instructions can be overlapped. When 50% of all branch
instructions are conditional branch instructions, and 80% of the conditional branch instructions
are such that the branch is taken, calculate the average instruction execution time.
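
Problem 5 can be sketched as follows, assuming that a non-overlapped branch costs four extra cycles (the following instruction waits for all five stages of the branch) and that, in part (b), only unconditional branches and taken conditional branches pay that cost. The function name is illustrative.

```python
def avg_instruction_time_ns(stage_ns, num_stages, stalling_fraction):
    """Average time per instruction when a stalling branch blocks the next fetch."""
    stall_cycles = num_stages - 1   # next fetch waits until the branch completes
    return stage_ns * (1 + stalling_fraction * stall_cycles)

# (a) every branch (20% of instructions) stalls the pipeline
time_a = avg_instruction_time_ns(2, 5, 0.20)

# (b) half the branches are conditional and 80% of those are taken
stalling = 0.20 * (0.50 + 0.50 * 0.80)
time_b = avg_instruction_time_ns(2, 5, stalling)

print(f"(a) {time_a:.2f} ns  (b) {time_b:.2f} ns")
```

With these assumptions the two averages are about 3.60 ns and 3.44 ns.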

6. A five-stage pipeline has stage delays of 150, 120, 150, 160 and 140 nanoseconds. The registers
that are used between the pipeline stages have a delay of 5 nanoseconds each. Compute the total time
to execute 100 independent instructions on this pipeline, assuming there are no pipeline stalls.
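
A sketch for problem 6, assuming the clock period is set by the slowest stage plus one register delay, and that n instructions on a k-stage pipeline take k + n - 1 cycles. The function name is illustrative.

```python
def pipeline_time_ns(stage_delays_ns, register_delay_ns, num_instructions):
    """Total time for a stall-free run on a synchronous pipeline."""
    cycle_ns = max(stage_delays_ns) + register_delay_ns
    cycles = len(stage_delays_ns) + num_instructions - 1
    return cycles * cycle_ns

print(pipeline_time_ns([150, 120, 150, 160, 140], 5, 100))   # prints 17160
```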

7. A processor has 64 registers and uses 16-bit instruction format. It has two types of instructions:
I-type and R-type. Each I-type instruction contains an opcode, a register name, and a 4-bit
immediate value. Each R-type instruction contains an opcode and two register names. If there
are 8 distinct I-type opcodes, what is the maximum number of distinct R-type opcodes?
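
For problem 7, one approach is to count bit patterns: each opcode consumes as many of the 2^16 patterns as its operand fields can take values. The variable names below are illustrative.

```python
REGISTER_BITS = 6          # 64 registers
INSTRUCTION_BITS = 16

i_type_operand_bits = REGISTER_BITS + 4        # one register + 4-bit immediate
r_type_operand_bits = 2 * REGISTER_BITS        # two registers

total_patterns = 2 ** INSTRUCTION_BITS
used_by_i_type = 8 * 2 ** i_type_operand_bits  # 8 I-type opcodes
max_r_type = (total_patterns - used_by_i_type) // 2 ** r_type_operand_bits
print(max_r_type)   # prints 14
```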

8. A processor has 16 integer registers and 64 floating point registers. It uses a 2-byte instruction
format. There are four categories of instructions: Type-1, Type-2, Type-3, and Type-4. Type-1
category consists of four instructions, each with 3 integer register operands. Type-2 category
consists of eight instructions, each with 2 floating point register operands. Type-3 category
consists of fourteen instructions, each with one integer register operand and one floating point
register operand. Type-4 category consists of N instructions, each with a floating point register
operand. What is the maximum value of N?
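
Problem 8 can be approached by budgeting the 2^16 available bit patterns across the four categories. The helper `max_remaining_opcodes` is a hypothetical name, and the sketch assumes 4-bit integer register fields and 6-bit floating point register fields.

```python
def max_remaining_opcodes(instruction_bits, used, operand_bits):
    """Opcodes with `operand_bits` of operands that still fit after `used`.

    `used` is a list of (instruction_count, operand_bits) pairs.
    """
    free = 2 ** instruction_bits - sum(n * 2 ** bits for n, bits in used)
    return free // 2 ** operand_bits

INT_BITS, FP_BITS = 4, 6   # 16 integer registers, 64 floating point registers
used = [
    (4, 3 * INT_BITS),          # Type-1: three integer registers
    (8, 2 * FP_BITS),           # Type-2: two floating point registers
    (14, INT_BITS + FP_BITS),   # Type-3: one integer + one floating point register
]
print(max_remaining_opcodes(16, used, FP_BITS))   # prints 32
```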

9. For the following, we consider instruction encoding for instruction set architectures.
(a) Consider the case of a processor with an instruction length of 14 bits and with 64 general-
purpose registers so the size of the address fields is 6 bits. Is it possible to have instruction
encodings for the following?
–3 two-address instructions
–63 one-address instructions
–45 zero-address instructions

(b) Assuming the same instruction length and address field sizes as above, determine if it is
possible to have
–3 two-address instructions
–65 one-address instructions
–35 zero-address instructions
Explain your answer.
(c) Assume the same instruction length and address field sizes as above. Further assume there
are already 3 two-address and 24 zero-address instructions. What is the maximum number
of one-address instructions that can be encoded for this processor?
(d) Assume the same instruction length and address field sizes as above. Further assume there
are already 3 two-address and 65 zero-address instructions. What is the maximum number
of one-address instructions that can be encoded for this processor?
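
The four parts of problem 9 can be checked with a short pattern-counting sketch. It assumes an expanding-opcode scheme, where a set of instructions fits exactly when the bit patterns it needs do not exceed 2^14. The function names are illustrative.

```python
INSTRUCTION_BITS = 14
ADDRESS_BITS = 6

def patterns_needed(two_addr, one_addr, zero_addr):
    """Bit patterns consumed by the given instruction counts."""
    return (two_addr * 2 ** (2 * ADDRESS_BITS)
            + one_addr * 2 ** ADDRESS_BITS
            + zero_addr)

def fits(two_addr, one_addr, zero_addr):
    return patterns_needed(two_addr, one_addr, zero_addr) <= 2 ** INSTRUCTION_BITS

def max_one_address(two_addr, zero_addr):
    """Largest one-address count that still leaves room for the others."""
    free = 2 ** INSTRUCTION_BITS - patterns_needed(two_addr, 0, zero_addr)
    return free // 2 ** ADDRESS_BITS

print(fits(3, 63, 45))          # (a) prints True
print(fits(3, 65, 35))          # (b) prints False
print(max_one_address(3, 24))   # (c) prints 63
print(max_one_address(3, 65))   # (d) prints 62
```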

10. Consider the following instruction sequence where registers R1, R2 and R3 are general purpose
and MEMORY[X] denotes the content at the memory location X.
Instruction       Semantics                                          Instruction Size (bytes)
MOV R1, (5000)    R1 ← MEMORY[5000]                                  4
MOV R2, (R3)      R2 ← MEMORY[R3]                                    4
ADD R2, R1        R2 ← R1 + R2                                       2
MOV (R3), R2      MEMORY[R3] ← R2                                    4
INC R3            R3 ← R3 + 1                                        2
DEC R1            R1 ← R1 - 1                                        2
BNZ 1004          Branch if not zero to the given absolute address   2
HALT              Stop                                               1

Assume that the content of the memory location 5000 is 10, and the content of the register R3
is 3000. The content of each of the memory locations from 3000 to 3010 is 50. The instruction
sequence starts from the memory location 1000. All the numbers are in decimal format. Assume
that the memory is byte addressable.
What is the content of memory location 3010 after the execution of the program?
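
The program in problem 10 can be traced with a small simulation. Since the first instruction occupies addresses 1000 to 1003, BNZ 1004 jumps back to the second instruction. The sketch assumes that BNZ tests the result of the preceding DEC R1 and that each memory location holds a whole value. The function name `run_program` is illustrative.

```python
def run_program(memory, r3):
    """Trace the instruction sequence; returns the final memory contents."""
    r1 = memory[5000]          # 1000: MOV R1, (5000)
    while True:
        r2 = memory[r3]        # 1004: MOV R2, (R3)
        r2 = r1 + r2           # 1008: ADD R2, R1
        memory[r3] = r2        # 1010: MOV (R3), R2
        r3 += 1                # 1014: INC R3
        r1 -= 1                # 1016: DEC R1
        if r1 == 0:            # 1018: BNZ 1004 falls through once R1 is zero
            break
    return memory              # 1020: HALT

memory = {address: 50 for address in range(3000, 3011)}
memory[5000] = 10
final = run_program(memory, r3=3000)
print(final[3010])   # prints 50
```

The loop runs ten times and writes locations 3000 through 3009, so location 3010 is never modified under these assumptions.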
