Chapter A. Tutorial Solution
A.1 Assignment 01 (2025)
A.1.1 Exercise 01
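1. Let’s first find the number of clock cycles required for the program on A (see equation 8, slide 84):
\[
\begin{aligned}
\text{CPU execution time}_A &= \frac{\text{CPU clock cycles}_A}{\text{Clock rate}_A} \\
10\ \text{seconds} &= \frac{\text{CPU clock cycles}_A}{2 \times 10^9\ \frac{\text{cycles}}{\text{second}}} \\
\text{CPU clock cycles}_A &= 10\ \text{seconds} \times 2 \times 10^9\ \frac{\text{cycles}}{\text{second}} = 20 \times 10^9\ \text{cycles}
\end{aligned}
\tag{A.1}
\]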
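CPU time for B can be found using this equation:
\[
\begin{aligned}
\text{CPU execution time}_B &= \frac{1.2 \times \text{CPU clock cycles}_A}{\text{Clock rate}_B} \\
6\ \text{seconds} &= \frac{1.2 \times 20 \times 10^9\ \text{cycles}}{\text{Clock rate}_B} \\
\text{Clock rate}_B &= \frac{1.2 \times 20 \times 10^9\ \text{cycles}}{6\ \text{seconds}} = \frac{0.2 \times 20 \times 10^9\ \text{cycles}}{\text{second}} = \frac{4 \times 10^9\ \text{cycles}}{\text{second}} = 4\ \text{GHz}
\end{aligned}
\tag{A.2}
\]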
To run the program in 6 seconds, B must have twice the clock rate of A.
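The arithmetic in equations A.1 and A.2 can be checked with a few lines of Python. This is only a sketch using the numbers from the solution above; the variable names are illustrative and do not come from the slides.

```python
# Sketch of the Question 1 arithmetic; all names are illustrative.
time_a_s = 10.0          # execution time on computer A, in seconds
clock_rate_a_hz = 2e9    # clock rate of A (2 GHz)
time_b_s = 6.0           # target execution time on computer B, in seconds
cycle_factor_b = 1.2     # B needs 1.2 times as many cycles as A

# Equation A.1: cycles = time x clock rate
cycles_a = time_a_s * clock_rate_a_hz

# Equation A.2: clock rate = cycles / time
clock_rate_b_hz = cycle_factor_b * cycles_a / time_b_s

print(f"CPU clock cycles on A: {cycles_a:.3e}")
print(f"Required clock rate for B: {clock_rate_b_hz / 1e9:.2f} GHz")
```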
2. We know that each computer executes the same number of instructions for the program; let’s
call this number I. First, find the number of processor clock cycles for each computer:
\[
\begin{aligned}
\text{CPU clock cycles}_A &= I \times 2.0 \\
\text{CPU clock cycles}_B &= I \times 1.2
\end{aligned}
\tag{A.3}
\]
Now we can compute the CPU time for each computer (see equation 7, slide 84):
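\[
\begin{aligned}
\text{CPU execution time}_A &= \text{CPU clock cycles}_A \times \text{Clock cycle time} \\
&= I \times 2.0 \times 250\ \text{ps} = 500 \times I\ \text{ps}
\end{aligned}
\tag{A.4}
\]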
Likewise, for B:
\[
\text{CPU execution time}_B = I \times 1.2 \times 500\ \text{ps} = 600 \times I\ \text{ps}
\tag{A.5}
\]
Clearly, computer A is faster. The amount faster is given by the ratio of the execution times:
\[
\frac{\text{CPU performance}_A}{\text{CPU performance}_B} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = \frac{600 \times I\ \text{ps}}{500 \times I\ \text{ps}} = 1.2
\tag{A.6}
\]
We can conclude that computer A is 1.2 times as fast as computer B for this program.
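Because the instruction count I cancels in the ratio, any positive value gives the same result. The sketch below illustrates this with an arbitrary instruction count; the helper `cpu_time_ps` is a hypothetical name introduced only for this example.

```python
def cpu_time_ps(instruction_count, cpi, cycle_time_ps):
    """CPU time in picoseconds: instructions x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_ps

# Arbitrary instruction count (an assumption); it cancels in the ratio.
instruction_count = 1_000_000

time_a_ps = cpu_time_ps(instruction_count, cpi=2.0, cycle_time_ps=250)
time_b_ps = cpu_time_ps(instruction_count, cpi=1.2, cycle_time_ps=500)

# Performance ratio A/B equals the execution time ratio B/A.
speedup_a_over_b = time_b_ps / time_a_ps
print(f"A is {speedup_a_over_b:.2f} times as fast as B")
```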
3. Sequence 1 executes 2+1+2 = 5 instructions. Sequence 2 executes 4+1+1 = 6 instructions.
Therefore, sequence 1 executes fewer instructions.
We can use the equation for CPU clock cycles based on the instruction count and CPI to find
the total number of clock cycles for each sequence:
\[
\text{CPU clock cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times C_i)
\tag{A.7}
\]
This yields
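\[
\begin{aligned}
\text{CPU clock cycles}_1 &= (2 \times 1) + (1 \times 2) + (2 \times 3) = 2 + 2 + 6 = 10\ \text{cycles} \\
\text{CPU clock cycles}_2 &= (4 \times 1) + (1 \times 2) + (1 \times 3) = 4 + 2 + 3 = 9\ \text{cycles}
\end{aligned}
\tag{A.8}
\]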
So code sequence 2 is faster, even though it executes one extra instruction. Since code
sequence 2 takes fewer overall clock cycles but has more instructions, it must have a lower
CPI. The CPI values can be computed by
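\[
\begin{aligned}
\text{CPI} &= \frac{\text{CPU clock cycles}}{\text{Instruction count}} \\
\text{CPI}_1 &= \frac{\text{CPU clock cycles}_1}{\text{Instruction count}_1} = \frac{10}{5} = 2.0 \\
\text{CPI}_2 &= \frac{\text{CPU clock cycles}_2}{\text{Instruction count}_2} = \frac{9}{6} = 1.5
\end{aligned}
\tag{A.9}
\]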
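The weighted sum in equation A.7 maps directly onto a small function. The sketch below uses the per-class CPI values and instruction counts from the solution; the class labels "A", "B", "C" and the function names are assumptions made for this example.

```python
# CPI of each instruction class (labels are illustrative).
CPI_PER_CLASS = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class for the two code sequences.
sequence_1 = {"A": 2, "B": 1, "C": 2}
sequence_2 = {"A": 4, "B": 1, "C": 1}

def clock_cycles(counts):
    """Equation A.7: sum of CPI_i x C_i over all instruction classes."""
    return sum(CPI_PER_CLASS[cls] * n for cls, n in counts.items())

def average_cpi(counts):
    """Equation A.9: total clock cycles divided by total instruction count."""
    return clock_cycles(counts) / sum(counts.values())

for label, seq in (("sequence 1", sequence_1), ("sequence 2", sequence_2)):
    print(label, sum(seq.values()), "instructions,",
          clock_cycles(seq), "cycles, CPI =", average_cpi(seq))
```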
4. From equation 9 slide 86:
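\[
\begin{aligned}
\text{CPU time}_{old} &= \text{IC}_{old} \times \text{CPI}_{old} \times \text{Clock Cycle Time} \\
\text{CPU time}_{new} &= \text{IC}_{new} \times \text{CPI}_{new} \times \text{Clock Cycle Time}
\end{aligned}
\tag{A.10}
\]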
The Clock Cycle Time is the same because it is the same computer:
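\[
\begin{aligned}
\frac{\text{CPU time}_{old}}{\text{IC}_{old} \times \text{CPI}_{old}} &= \frac{\text{CPU time}_{new}}{\text{IC}_{old} \times 0.6 \times \text{CPI}_{old} \times 1.1} \\
15 &= \frac{\text{CPU time}_{new}}{0.6 \times 1.1} \\
\text{CPU time}_{new} &= 15 \times 0.6 \times 1.1 = 9.9\ \text{seconds}
\end{aligned}
\tag{A.11}
\]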
Therefore, b is the correct answer.
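Since the clock cycle time is unchanged, the new CPU time is just the old time scaled by the two factors. A minimal sketch with the values from equation A.11 (variable names are illustrative):

```python
old_time_s = 15.0   # CPU time before the change, in seconds
ic_factor = 0.6     # new instruction count relative to the old one
cpi_factor = 1.1    # new CPI relative to the old one

# The clock cycle time cancels, so only the two scaling factors remain.
new_time_s = old_time_s * ic_factor * cpi_factor
print(f"New CPU time: {new_time_s:.1f} seconds")
```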
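5. a. The clock rate is the inverse of the clock cycle time.
\[
\begin{aligned}
P1 &= \frac{1}{0.33 \times 10^{-9}\ \text{seconds}} \approx 3\ \text{GHz} \\
P2 &= \frac{1}{0.40 \times 10^{-9}\ \text{seconds}} = 2.5\ \text{GHz} \\
P3 &= \frac{1}{0.25 \times 10^{-9}\ \text{seconds}} = 4\ \text{GHz}
\end{aligned}
\tag{A.12}
\]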
Thus, P3 has the highest clock rate.
b. Since all have the same instruction set architecture, all programs have the same instruction count, so we can measure performance as the product of the average clock cycles per instruction (CPI) and the clock cycle time, which is also the average time per instruction:
i. P1 = 1.5 × 0.33 ns = 0.495 ns (you could also calculate the average instruction time using CPI/clock rate: 1.5/3.0 GHz ≈ 0.5 ns, the small difference coming from rounding the clock rate to 3 GHz)
ii. P2 = 1.0 × 0.40 ns = 0.400 ns (or 1.0/2.5 GHz = 0.400 ns)
iii. P3 = 2.2 × 0.25 ns = 0.550 ns (or 2.2/4.0 GHz = 0.550 ns)
P2 is the fastest and P3 is the slowest. Despite having the highest clock rate, on average
P3 takes so many more clock cycles that it loses the benefit of a higher clock rate.
c. The CPI calculation was based on running some benchmarks. If they are representative of real workloads, the answers to these questions are correct. If the benchmarks are unrealistic, they may not be. The difference between things that are easy to advertise, like clock rate, and actual performance highlights the importance of developing good benchmarks.
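The comparison in parts a and b can be reproduced with a short script. This is a sketch under the values given above (CPI and clock cycle time per processor); the dictionary layout and variable names are assumptions for illustration.

```python
# CPI and clock cycle time (in ns) for each processor, from the solution above.
processors = {
    "P1": {"cpi": 1.5, "cycle_ns": 0.33},
    "P2": {"cpi": 1.0, "cycle_ns": 0.40},
    "P3": {"cpi": 2.2, "cycle_ns": 0.25},
}

# Clock rate in GHz is the inverse of the cycle time in ns.
clock_ghz = {name: 1.0 / p["cycle_ns"] for name, p in processors.items()}

# Average time per instruction = CPI x clock cycle time.
avg_instr_ns = {name: p["cpi"] * p["cycle_ns"] for name, p in processors.items()}

highest_clock = max(clock_ghz, key=clock_ghz.get)
fastest = min(avg_instr_ns, key=avg_instr_ns.get)
slowest = max(avg_instr_ns, key=avg_instr_ns.get)

for name in processors:
    print(f"{name}: {clock_ghz[name]:.2f} GHz, {avg_instr_ns[name]:.3f} ns/instruction")
print("highest clock rate:", highest_clock)
print("fastest:", fastest, "slowest:", slowest)
```

The processor with the highest clock rate and the processor with the lowest average instruction time are different, which is the point made in part b.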
A.1.2 Exercise 02
1. For the three processors, we have the clock rate and the CPI:
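\[
\begin{aligned}
\text{Clock rate} &= \frac{\text{Clock cycles}}{\text{Second}} \\
\text{CPI} &= \frac{\text{Clock cycles}}{\text{Instruction}} \\
\text{Clock rate} \times \text{Second} &= \text{CPI} \times \text{Instruction} \\
\frac{\text{Instruction}}{\text{Second}} &= \frac{\text{Clock rate}}{\text{CPI}}
\end{aligned}
\tag{A.13}
\]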
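\[
\begin{aligned}
\text{performance of P1 (instructions/sec)} &= \frac{3 \times 10^9}{1.5} = 2 \times 10^9 \\
\text{performance of P2 (instructions/sec)} &= \frac{2.5 \times 10^9}{1.0} = 2.5 \times 10^9 \\
\text{performance of P3 (instructions/sec)} &= \frac{4 \times 10^9}{2.2} \approx 1.82 \times 10^9
\end{aligned}
\tag{A.14}
\]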
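2. Each processor runs for 10 seconds, so the number of cycles is the time multiplied by the clock rate, and the instruction count is the number of cycles divided by the CPI:
\[
\begin{aligned}
\text{cycles}(P1) &= 10 \times 3 \times 10^9 = 30 \times 10^9\ \text{cycles} \\
\text{cycles}(P2) &= 10 \times 2.5 \times 10^9 = 25 \times 10^9\ \text{cycles} \\
\text{cycles}(P3) &= 10 \times 4 \times 10^9 = 40 \times 10^9\ \text{cycles}
\end{aligned}
\tag{A.15}
\]
\[
\begin{aligned}
\text{IC}(P1) &= \frac{30 \times 10^9}{1.5} = 20 \times 10^9 \\
\text{IC}(P2) &= \frac{25 \times 10^9}{1} = 25 \times 10^9 \\
\text{IC}(P3) &= \frac{40 \times 10^9}{2.2} = 18.18 \times 10^9
\end{aligned}
\tag{A.16}
\]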
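3.
\[
\begin{aligned}
\text{CPI}_{new} &= \text{CPI}_{old} \times 1.2 \\
\text{CPI}(P1) &= 1.8 \\
\text{CPI}(P2) &= 1.2 \\
\text{CPI}(P3) &\approx 2.6
\end{aligned}
\tag{A.17}
\]
\[
\begin{aligned}
\text{Clock rate} &= \frac{\text{IC} \times \text{CPI}}{\text{time}} \\
\text{Clock rate}(P1) &= \frac{20 \times 10^9 \times 1.8}{7} = 5.14\ \text{GHz} \\
\text{Clock rate}(P2) &= \frac{25 \times 10^9 \times 1.2}{7} \approx 4.29\ \text{GHz} \\
\text{Clock rate}(P3) &= \frac{18.18 \times 10^9 \times 2.6}{7} = 6.75\ \text{GHz}
\end{aligned}
\tag{A.18}
\]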
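The three parts of this exercise chain together: instructions per second, then instruction count, then the clock rate needed for the new CPI and time. The sketch below follows equations A.13 to A.18 with the same inputs (10 s original run time, a CPI factor of 1.2, 7 s target time); the names are illustrative. It does not round intermediate values, so P3 comes out near 6.86 GHz instead of the 6.75 GHz obtained above with the CPI rounded to 2.6.

```python
clock_rate_hz = {"P1": 3e9, "P2": 2.5e9, "P3": 4e9}
cpi = {"P1": 1.5, "P2": 1.0, "P3": 2.2}

run_time_s = 10.0     # original execution time used in part 2
cpi_factor = 1.2      # CPI increase used in part 3
target_time_s = 7.0   # execution time used in part 3

results = {}
for name in clock_rate_hz:
    ips = clock_rate_hz[name] / cpi[name]        # A.14: instructions per second
    cycles = run_time_s * clock_rate_hz[name]    # A.15: cycles in 10 seconds
    ic = cycles / cpi[name]                      # A.16: instruction count
    new_cpi = cpi[name] * cpi_factor             # A.17
    new_rate_hz = ic * new_cpi / target_time_s   # A.18
    results[name] = {"ips": ips, "cycles": cycles, "ic": ic,
                     "new_cpi": new_cpi, "new_rate_hz": new_rate_hz}

for name, r in results.items():
    print(f"{name}: {r['ips']:.3e} instr/s, IC = {r['ic']:.3e}, "
          f"new clock rate = {r['new_rate_hz'] / 1e9:.2f} GHz")
```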
A.1.3 Exercise 03
1. Class A: $10^5$ instructions; Class B: $2 \times 10^5$ instructions; Class C: $5 \times 10^5$ instructions; Class D: $2 \times 10^5$ instructions.
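\[
\begin{aligned}
\text{Time} &= \frac{\text{IC} \times \text{CPI}}{\text{clock rate}} \\
\text{Total time P1} &= \frac{10^5 \times 1 + 2 \times 10^5 \times 2 + 5 \times 10^5 \times 3 + 2 \times 10^5 \times 3}{2.5 \times 10^9} = 10.4 \times 10^{-4}\ \text{s} \\
\text{Total time P2} &= \frac{10^5 \times 2 + 2 \times 10^5 \times 2 + 5 \times 10^5 \times 2 + 2 \times 10^5 \times 2}{3 \times 10^9} \approx 6.67 \times 10^{-4}\ \text{s}
\end{aligned}
\tag{A.19}
\]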
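2.
\[
\begin{aligned}
\text{CPI}(P1) &= \frac{10.4 \times 10^{-4} \times 2.5 \times 10^9}{10^6} = 2.6 \\
\text{CPI}(P2) &= \frac{6.67 \times 10^{-4} \times 3 \times 10^9}{10^6} \approx 2.0
\end{aligned}
\tag{A.20}
\]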
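3.
\[
\begin{aligned}
\text{clock cycles}(P1) &= 10^5 \times 1 + 2 \times 10^5 \times 2 + 5 \times 10^5 \times 3 + 2 \times 10^5 \times 3 = 26 \times 10^5 \\
\text{clock cycles}(P2) &= 10^5 \times 2 + 2 \times 10^5 \times 2 + 5 \times 10^5 \times 2 + 2 \times 10^5 \times 2 = 20 \times 10^5
\end{aligned}
\]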
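All three answers in this exercise follow from the per-class instruction counts and CPI values. A minimal sketch, assuming the class mix and per-class CPIs used in equation A.19 (the data layout and names are illustrative):

```python
# Instructions per class (10^6 in total), as listed in part 1.
instruction_count = {"A": 100_000, "B": 200_000, "C": 500_000, "D": 200_000}

# Clock rate and per-class CPI for each implementation, as used in A.19.
machines = {
    "P1": {"clock_hz": 2.5e9, "cpi": {"A": 1, "B": 2, "C": 3, "D": 3}},
    "P2": {"clock_hz": 3.0e9, "cpi": {"A": 2, "B": 2, "C": 2, "D": 2}},
}

total_instructions = sum(instruction_count.values())

def total_cycles(cpi_by_class):
    """Sum of (instructions in class) x (CPI of class)."""
    return sum(instruction_count[c] * cpi_by_class[c] for c in instruction_count)

summary = {}
for name, m in machines.items():
    cycles = total_cycles(m["cpi"])
    summary[name] = {
        "cycles": cycles,                       # part 3
        "time_s": cycles / m["clock_hz"],       # part 1
        "cpi": cycles / total_instructions,     # part 2 (global CPI)
    }

for name, s in summary.items():
    print(f"{name}: {s['cycles']} cycles, {s['time_s']:.3e} s, CPI = {s['cpi']:.2f}")
```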
A.1.4 Exercise 04
I forgot to mention that N = 2.
From equations 2 and 3 slide 78.
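1.
\[
\begin{aligned}
\text{die area}_{15cm} &= \frac{\text{wafer area}}{\text{dies per wafer}} = \frac{\pi \times 7.5^2}{84} = 2.10\ \text{cm}^2 \\
\text{yield}_{15cm} &= \frac{1}{\left(1 + \left(0.020 \times \frac{2.10}{2}\right)\right)^2} = 0.9593 \\
\text{die area}_{20cm} &= \frac{\text{wafer area}}{\text{dies per wafer}} = \frac{\pi \times 10^2}{100} = 3.14\ \text{cm}^2 \\
\text{yield}_{20cm} &= \frac{1}{\left(1 + \left(0.031 \times \frac{3.14}{2}\right)\right)^2} = 0.9093
\end{aligned}
\tag{A.21}
\]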
2. From equation 1, slide 78.
\[
\begin{aligned}
\text{cost/die}_{15cm} &= \frac{12}{84 \times 0.9593} = 0.1489 \\
\text{cost/die}_{20cm} &= \frac{15}{100 \times 0.9093} = 0.1650
\end{aligned}
\tag{A.22}
\]
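The die area, yield, and cost-per-die steps can be collected into three small functions. This is a sketch assuming the yield model with N = 2 used above and the wafer data from equations A.21 and A.22; the function names are hypothetical.

```python
import math

def die_area_cm2(wafer_diameter_cm, dies_per_wafer):
    """Die area = wafer area / dies per wafer."""
    return math.pi * (wafer_diameter_cm / 2) ** 2 / dies_per_wafer

def die_yield(defects_per_cm2, area_cm2):
    """Yield = 1 / (1 + defects per area x die area / 2)^2, i.e. N = 2."""
    return 1.0 / (1.0 + defects_per_cm2 * area_cm2 / 2) ** 2

def cost_per_die(wafer_cost, dies_per_wafer, yield_fraction):
    """Cost per die = wafer cost / (dies per wafer x yield)."""
    return wafer_cost / (dies_per_wafer * yield_fraction)

# Wafer data as used in A.21 and A.22: (diameter, dies, defects/cm^2, cost).
wafers = {"15cm": (15, 84, 0.020, 12), "20cm": (20, 100, 0.031, 15)}

for name, (diameter, dies, defects, cost) in wafers.items():
    area = die_area_cm2(diameter, dies)
    y = die_yield(defects, area)
    print(f"{name}: area = {area:.2f} cm^2, yield = {y:.4f}, "
          f"cost/die = {cost_per_die(cost, dies, y):.4f}")
```

Unrounded intermediate values may differ from the figures above in the last printed digit.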
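3.
\[
\begin{aligned}
\text{die area}_{15cm} &= \frac{\text{wafer area}}{\text{dies per wafer}} = \frac{\pi \times 7.5^2}{84 \times 1.1} = 1.91\ \text{cm}^2 \\
\text{yield}_{15cm} &= \frac{1}{\left(1 + \left(0.020 \times 1.15 \times \frac{1.91}{2}\right)\right)^2} = 0.9575 \\
\text{die area}_{20cm} &= \frac{\text{wafer area}}{\text{dies per wafer}} = \frac{\pi \times 10^2}{100 \times 1.1} = 2.86\ \text{cm}^2 \\
\text{yield}_{20cm} &= \frac{1}{\left(1 + \left(0.031 \times 1.15 \times \frac{2.86}{2}\right)\right)^2} = 0.9053
\end{aligned}
\tag{A.23}
\]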
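4.
\[
\begin{aligned}
\text{defects per area}_{0.92} &= \frac{1 - \sqrt{y}}{\sqrt{y} \times \frac{\text{die area}}{2}} = \frac{1 - \sqrt{0.92}}{\sqrt{0.92} \times \frac{2}{2}} = 0.043\ \text{defects/cm}^2 \\
\text{defects per area}_{0.95} &= \frac{1 - \sqrt{y}}{\sqrt{y} \times \frac{\text{die area}}{2}} = \frac{1 - \sqrt{0.95}}{\sqrt{0.95} \times \frac{2}{2}} = 0.026\ \text{defects/cm}^2
\end{aligned}
\tag{A.24}
\]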
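Equation A.24 is the yield formula (with N = 2) solved for the defect density. The sketch below implements both directions and checks that they are inverses of each other; the function names are hypothetical, and the die area of 2 cm² is the value used in the equation above.

```python
import math

def die_yield(defects_per_cm2, area_cm2):
    """Yield = 1 / (1 + defects per area x die area / 2)^2."""
    return 1.0 / (1.0 + defects_per_cm2 * area_cm2 / 2) ** 2

def defects_per_area(target_yield, area_cm2):
    """Yield formula solved for defects: (1 - sqrt(y)) / (sqrt(y) x area / 2)."""
    root = math.sqrt(target_yield)
    return (1.0 - root) / (root * area_cm2 / 2)

die_area = 2.0  # cm^2, as in equation A.24
for y in (0.92, 0.95):
    d = defects_per_area(y, die_area)
    # Plugging d back into the yield formula should recover the target yield.
    print(f"yield {y}: {d:.3f} defects/cm^2, round trip yield = {die_yield(d, die_area):.4f}")
```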