CS 8491
COMPUTER ARCHITECTURE
TEXT BOOKS
• 1. David A. Patterson and John L. Hennessy, Computer Organization and
Design: The Hardware/Software Interface, Fifth Edition, Morgan
Kaufmann / Elsevier, 2014.
• 2. Carl Hamacher, Zvonko Vranesic, Safwat Zaky and Naraig Manjikian,
Computer Organization and Embedded Systems, Sixth Edition, Tata
McGraw Hill, 2012.
CS8491 - CA - OBJECTIVES
• To learn the basic structure and operations of a computer.
• To learn the arithmetic and logic unit and the implementation of
fixed-point and floating-point arithmetic units.
• To learn the basics of pipelined execution.
• To understand parallelism and multi-core processors.
• To understand the memory hierarchies, cache memories and virtual
memories.
UNIT-1
BASIC STRUCTURE OF A COMPUTER SYSTEM
Functional Units
Basic Operational Concepts
Performance
Instructions: Language of the Computer – Operations, Operands – Instruction representation –
Logical operations – decision making
MIPS Addressing
• What is a Computer?
A computer is an electronic device that automatically accepts and stores
input data, processes it, and produces results under the direction of a
detailed step-by-step program.
• Computer architecture is concerned with the structure and behavior of the
various functional modules of the computer and how they interact to
provide the processing needs of the user.
Eight Great Ideas in Computer Architecture
Computer architects have invented eight great ideas over the last 60 years of
computer design. These ideas are so powerful that they have outlasted the
first computers that used them.
1. Design for Moore’s Law – miniaturization of components in an IC
Moore’s Law states that integrated circuit resources double every
18–24 months. It originated as Gordon Moore’s 1965 prediction of the
growth in IC capacity. We use an “up and to the right” Moore’s Law graph
to represent designing for rapid change.
2. Use Abstraction to Simplify Design
(Resources/Delay/Operating frequency/complexity/Power …)
Both computer architects and programmers had to invent
techniques to make themselves more productive.
A major productivity technique for hardware and software is
to use abstractions to represent the design at different levels
of representation; lower-level details are hidden to offer a
simpler model at higher levels.
3. Make the Common Case Fast
Making the common case fast tends to improve performance more than
optimizing the rare case.
4. Performance via Parallelism
Performance can be improved by doing operations in
parallel.
Throughput increases when a parallel architecture is
used.
5. Performance via Pipelining
A particular pattern of parallelism is so common in computer
architecture that it has its own name: pipelining. Instead of finishing
one task completely before starting the next, the work is split into
stages that overlap, which improves performance.
6. Performance via Prediction
In some cases it can be faster on average to guess and start
working rather than wait until you know for sure. This assumes that
the mechanism to recover from a misprediction is not too expensive
and that the prediction is relatively accurate.
7. Hierarchy of Memories
Programmers want memory to be fast, large, and
cheap. Architects have found that they can
address these conflicting demands with a
hierarchy of memories, with the fastest,
smallest, and most expensive memory per bit at
the top of the hierarchy and the slowest, largest,
and cheapest per bit at the bottom.
8. Dependability via Redundancy
Computers not only need to be fast; they need to be
dependable. Since any physical device can fail, we
make systems dependable by including redundant
components. Those redundant components can take
over when a failure occurs and can help detect
failures. We use the tractor-trailer as our icon, since
the dual tires on each side of its rear axles allow the
truck to continue driving even when one tire fails.
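As a rough illustration of idea 5 (performance via pipelining), the sketch below compares doing tasks strictly one after another with overlapping them in stages. It assumes an idealized pipeline in which every stage takes the same time and nothing stalls; the function names and the numbers are hypothetical, not taken from the slides.

```python
def sequential_time(tasks: int, stages: int, stage_time: float) -> float:
    """Each task runs through all stages before the next task starts."""
    return tasks * stages * stage_time


def pipelined_time(tasks: int, stages: int, stage_time: float) -> float:
    """Idealized pipeline: the first task needs `stages` steps to finish,
    after which one more task completes on every step."""
    if tasks == 0:
        return 0.0
    return (stages + tasks - 1) * stage_time


# Hypothetical workload: 100 tasks, 5 stages, 1 time unit per stage.
n, k, t = 100, 5, 1.0
print("sequential:", sequential_time(n, k, t))
print("pipelined: ", pipelined_time(n, k, t))
print("speedup:   ", sequential_time(n, k, t) / pipelined_time(n, k, t))
```

For many tasks the speedup approaches the number of stages, which is why the overlap pays off even though no single task finishes sooner.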
COMPONENTS OF A COMPUTER SYSTEM
BASIC FUNCTIONAL UNITS
[Figure: block diagram of the basic functional units: Input, Output, Memory, Arithmetic and logic, and Control, grouped into I/O and Processor]
Basic Components
• Computer is a collection of several components
working together.
• The five basic components of a computer system are the input unit,
output unit, memory unit, arithmetic and logic unit, and control unit.
• The Central Processing Unit (CPU), or processor, comprises the arithmetic
and logic unit and the control unit; it works directly with the memory unit.
• The CPU is the brain of the computer system. It is responsible
for all events inside the computer and controls all the other units of
the computer system.
Input Unit
An input unit performs the following functions:
• It accepts (or reads) the list of instructions and data from the
outside world.
• It converts these instructions and data into a computer-acceptable
format.
• It supplies the converted instructions and data to the computer
system for further processing.
• Some of the commonly used input devices are keyboard,
mouse, joystick, digital camera, trackball, scanners etc.
Output Unit
An output unit performs the following functions:
• It accepts the results produced by the computer, which are in
coded form.
• It converts these coded results to human-readable form.
• It supplies the converted results to the outside world.
Memory Unit
• The memory unit is used to store programs and data. The data and
instructions that are entered into the computer system through input units
have to be stored inside the computer before the actual processing starts.
• Similarly, the results produced by the computer after processing must also
be kept somewhere inside the computer system before being passed on to
the output units. Moreover, the intermediate results produced by the
computer must also be preserved for ongoing processing.
• The storage unit, or the primary / main memory, of a computer system is
designed to do all these things. It provides space for storing data and
instructions, space for intermediate results, and space for the final
results. Compared with secondary storage, main memory is fast but small
and expensive.
• The specific functions of the storage unit are to store:
  • All the data to be processed and the instructions required for processing.
  • Intermediate results of processing.
  • Final results of processing before these results are released to an output
    device.
Secondary Storage
• A secondary storage device is also known as an auxiliary storage device
or external storage.
• It can be any storage device beyond the primary storage that enables
permanent data storage. Typically, secondary storage holds anything from
a few megabytes to petabytes (1 petabyte = 1024 terabytes) of data.
Examples include hard disks, floppy disks, CD-ROMs, and magnetic tape.
Arithmetic and Logic Unit (ALU)
• The arithmetic and logic unit performs all the arithmetic and
logical operations.
• Arithmetic operations such as addition, subtraction, and multiplication,
and logical operations such as comparisons, are performed in the
ALU.
• Data is moved from main memory to the ALU for processing and,
after processing is complete, the final results are moved
back to main memory.
Control Unit
• The control unit directs and controls the activities of the
internal and external devices.
• It interprets the instructions fetched into the computer,
determines what data are needed, where they are stored, and where to
store the results of the operation, and sends the control signals
to the devices involved in the execution of the instructions.
• The control unit performs the following functions:
  • Fetches the instruction stored in memory
  • Identifies the operation to be performed
  • Determines the data needed for that operation
  • Identifies the location to store the result
  • Generates control signals to execute the desired operation
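The control unit's functions can be pictured as a loop that repeats for every instruction. The following is a minimal sketch of that loop, not a model of any real processor: the instruction format (a tuple of operation, destination, and two sources) and the `run` function are hypothetical, and "memory" is simply a Python dictionary.

```python
def run(program, data):
    """Toy fetch-decode-execute loop.

    program: list of (op, dest, src1, src2) tuples (hypothetical format).
    data:    dict mapping variable names to integers, standing in for memory.
    """
    alu = {
        "add": lambda x, y: x + y,
        "sub": lambda x, y: x - y,
    }
    pc = 0  # program counter: index of the next instruction
    while pc < len(program):
        instruction = program[pc]            # fetch the instruction from memory
        op, dest, src1, src2 = instruction   # identify operation and result location
        x, y = data[src1], data[src2]        # determine the data needed
        data[dest] = alu[op](x, y)           # "signal" the ALU and store the result
        pc += 1                              # move on to the next instruction
    return data


result = run([("add", "a", "b", "c"), ("sub", "d", "a", "b")], {"b": 2, "c": 3})
print(result)
```

Real control units generate hardware control signals rather than calling functions; the dictionary lookup in `alu[op]` only stands in for that selection step.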
TECHNOLOGY
Technologies for Building Processors and Memory
The table shows the technologies that have been used over
time, with an estimate of the relative performance per unit
cost for each technology. A transistor is an on/off switch
controlled by electricity. The integrated circuit (IC)
combined dozens to hundreds of transistors into a single chip.
A very large-scale integrated (VLSI) circuit is a device
containing hundreds of thousands to millions of transistors.
Growth in DRAM Capacity
[Figure: DRAM capacity per chip (log scale, labelled points from 16K to 4G) versus year of introduction, 1976 to 2012]
• This graph shows the growth in Dynamic Random Access
Memory (DRAM) capacity since 1977. For decades, the
industry consistently quadrupled capacity every 3 years,
an overall increase in excess of 16,000 times.
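The "in excess of 16,000 times" figure is what compound growth predicts: quadrupling every 3 years for about 21 years gives 4^7 = 16,384. The sketch below makes that arithmetic explicit; the 21-year span is an assumption chosen for illustration, and `growth_factor` is a hypothetical helper.

```python
def growth_factor(years: float, factor: float = 4.0, period_years: float = 3.0) -> float:
    """Capacity multiplier after `years` if capacity grows by `factor`
    once every `period_years`."""
    return factor ** (years / period_years)


# Assumed span of 21 years, i.e. seven quadruplings.
print(growth_factor(21))
```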
The Chip Manufacturing Process
• Silicon, a substance found in sand, is used for
manufacturing ICs. With a special chemical process, tiny areas of
silicon can be turned into one of the following three kinds of device:
1) Excellent conductors of electricity (using either
microscopic copper or aluminum wire)
2) Excellent insulators from electricity (like plastic
sheathing or glass)
3) Areas that can conduct or insulate under special
conditions (as a switch).
• Transistors fall in the last category. A VLSI circuit is just
billions of combinations of conductors, insulators, and
switches manufactured in a single small package.
[Figure: block diagram of the chip manufacturing process]
Seven steps used in manufacturing ICs
1. The process starts with a silicon crystal ingot, which looks like a giant
sausage. Ingots are 8–12 inches in diameter and about 12–24
inches long.
2. An ingot is finely sliced into wafers no more than 0.1 inches
thick.
3. The wafers then go through a series of processing steps, during which
patterns of chemicals are placed on each wafer, creating the transistors,
conductors, and insulators.
4. A single microscopic flaw in the wafer itself or in one of the dozens of
patterning steps can result in that area of the wafer failing. These defects
make it virtually impossible to manufacture a perfect wafer.
5. The patterned wafer is then chopped up, or diced, into
components called dies, more informally known as
chips.
6. Yield is the percentage of good dies out of the total number of
dies on the wafer.
7. Once good dies are found, they are connected to the
input/output pins of a package, using a process called
bonding. These packaged parts are tested a final time, since
mistakes can occur in packaging, and then they are shipped to
customers.
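Equations to find the Cost of an IC
$$\text{Cost per die} = \frac{\text{Cost per wafer}}{\text{Dies per wafer} \times \text{Yield}}$$
$$\text{Dies per wafer} \approx \frac{\text{Wafer area}}{\text{Die area}}$$
$$\text{Yield} = \frac{1}{\left(1 + \text{Defects per area} \times \text{Die area}/2\right)^{2}}$$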
PROBLEMS
Example-1:
• Assume a 15 cm diameter wafer has a cost of 12, contains 84
dies, and has 0.020 defects/cm². Assume a 20 cm diameter wafer
has a cost of 15, contains 100 dies, and has 0.031 defects/cm².
– Find the yield for both wafers.
– Find the cost per die for both wafers.
– If the number of dies per wafer is increased by 10% and the
defects per area unit increase by 15%, find the die area and
yield.
Answer
• Given
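Example-1 can be worked numerically with a short script. This is a sketch under stated assumptions: it uses the common textbook model in which die area is approximated as wafer area divided by the number of dies (ignoring dies lost at the wafer edge), yield = 1 / (1 + defects per area × die area / 2)², and cost per die = wafer cost / (dies per wafer × yield). The function names are hypothetical.

```python
import math


def die_area_cm2(wafer_diameter_cm: float, dies_per_wafer: float) -> float:
    """Approximate die area: wafer area divided evenly among the dies."""
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    return wafer_area / dies_per_wafer


def die_yield(defects_per_cm2: float, area_cm2: float) -> float:
    """Fraction of good dies under the assumed yield model."""
    return 1.0 / (1.0 + defects_per_cm2 * area_cm2 / 2.0) ** 2


def analyze(diameter_cm, wafer_cost, dies, defects_per_cm2):
    """Return (die area, yield, cost per die) for one wafer."""
    area = die_area_cm2(diameter_cm, dies)
    y = die_yield(defects_per_cm2, area)
    return area, y, wafer_cost / (dies * y)


for label, (diameter, cost, dies, defects) in {
    "15 cm wafer": (15, 12, 84, 0.020),
    "20 cm wafer": (20, 15, 100, 0.031),
}.items():
    area, y, cpd = analyze(diameter, cost, dies, defects)
    # Third part of the example: 10% more dies, 15% more defects per area.
    area2, y2, _ = analyze(diameter, cost, dies * 1.10, defects * 1.15)
    print(f"{label}: yield={y:.4f}, cost/die={cpd:.4f}, "
          f"new die area={area2:.4f} cm^2, new yield={y2:.4f}")
```

Under these assumptions the yields come out at roughly 0.959 and 0.909, and the cost per die at roughly 0.149 and 0.165, for the 15 cm and 20 cm wafers respectively.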
PERFORMANCE
• When trying to choose among different computers,
performance is an important attribute.
• Accurately measuring and comparing different computers is
critical to purchasers.
• Response time, also called execution time, is the
total time required for the computer to complete a task,
including disk accesses, memory accesses, I/O activities,
operating system overhead, CPU execution time, and so on.
• Throughput, also called bandwidth, is another measure of
performance: the number of tasks completed per unit time.
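To maximize performance, we want to minimize the response time or
execution time for a task. Thus, we can relate performance
and execution time for a computer X:
$$\text{Performance}_X = \frac{1}{\text{Execution time}_X}$$
• This means that for two computers X and Y, if the
performance of X is greater than the performance of Y,
we have
$$\text{Performance}_X > \text{Performance}_Y$$
$$\frac{1}{\text{Execution time}_X} > \frac{1}{\text{Execution time}_Y}$$
$$\text{Execution time}_Y > \text{Execution time}_X$$
That is, the execution time on Y is longer than on X if X is
faster than Y.
In discussing a computer design, we often want to relate the
performance of two different computers quantitatively. We will
use the phrase “X is n times faster than Y”, or equivalently
“X is n times as fast as Y”, to mean
$$\frac{\text{Performance}_X}{\text{Performance}_Y} = n$$
• If X is n times as fast as Y, then the execution time on Y is n
times as long as it is on X:
$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$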
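A small sketch of this comparison, assuming the usual definition that performance is the reciprocal of execution time. The execution times (10 s on X, 15 s on Y) are hypothetical values chosen for illustration.

```python
def performance(execution_time_s: float) -> float:
    """Performance is defined as the reciprocal of execution time."""
    return 1.0 / execution_time_s


def times_as_fast(time_x_s: float, time_y_s: float) -> float:
    """Return n such that X is n times as fast as Y."""
    return time_y_s / time_x_s


# Hypothetical measurement: the same program takes 10 s on X and 15 s on Y.
print(times_as_fast(10.0, 15.0))
print(performance(10.0) / performance(15.0))
```

Both expressions give the same ratio, which is the point of the definition: the performance ratio and the inverted execution-time ratio are interchangeable.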
Measuring Performance
• Time is the measure of computer performance: the computer that
performs the same amount of work in the least time is the fastest.
Program execution time is measured in seconds per program.
• Elapsed time counts everything, including waiting for I/O and running
other programs. CPU execution time, or simply CPU time, recognizes this
distinction: it is the time the CPU spends computing for this task and
does not include time spent waiting for I/O or running other programs.
• CPU time can be further divided into the CPU time spent in the
program, called user CPU time, and the CPU time spent in the
operating system performing tasks on behalf of the program, called
system CPU time.
• Almost all computers are constructed using a clock that
determines when events take place in the hardware. These
discrete time intervals are called clock cycles.
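• Designers refer to the length of a clock period both as the
time for a complete clock cycle (e.g., 250 picoseconds) and as
the clock rate (e.g., 4 GHz), which is the inverse of the clock
period.
• CPU execution time for a program can then be written as
$$\text{CPU execution time} = \text{CPU clock cycles} \times \text{Clock cycle time}$$
Alternatively, because clock rate and clock cycle time are inverses,
$$\text{CPU execution time} = \frac{\text{CPU clock cycles}}{\text{Clock rate}}$$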
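• Execution time can be defined as the number of instructions executed
multiplied by the average time per instruction. Therefore, the number
of clock cycles required for a program can be written as
$$\text{CPU clock cycles} = \text{Instructions for a program} \times \text{Average clock cycles per instruction}$$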
• The term clock cycles per instruction, which is the average
number of clock cycles each instruction takes to execute, is
often abbreviated as CPI.
• Since different instructions may take different amounts of time
depending on what they do, CPI is an average of all the
instructions executed in the program.
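• The basic performance equation can be rewritten in terms of
instruction count (the number of instructions executed by
the program), CPI, and clock rate:
$$\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time} = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}$$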
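The quantities above can be tied together in a few lines. This is a sketch assuming the standard relation CPU time = instruction count × CPI / clock rate; the helper names and the instruction mix are hypothetical, while the 250 ps clock period is the example value from the text.

```python
def clock_rate_hz(clock_period_s: float) -> float:
    """Clock rate is the inverse of the clock period."""
    return 1.0 / clock_period_s


def average_cpi(instruction_mix) -> float:
    """instruction_mix: iterable of (instruction_count, cycles_per_instruction)
    pairs, one pair per instruction class."""
    mix = list(instruction_mix)
    total_instructions = sum(count for count, _ in mix)
    total_cycles = sum(count * cycles for count, cycles in mix)
    return total_cycles / total_instructions


def cpu_time_s(instruction_count: float, cpi: float, rate_hz: float) -> float:
    """CPU time = instruction count x CPI / clock rate."""
    return instruction_count * cpi / rate_hz


rate = clock_rate_hz(250e-12)             # 250 ps period, about 4 GHz
mix = [(2e9, 1), (1e9, 2), (2e9, 3)]      # hypothetical program
cpi = average_cpi(mix)
count = sum(c for c, _ in mix)
print(f"clock rate = {rate:.3e} Hz, CPI = {cpi}, "
      f"CPU time = {cpu_time_s(count, cpi, rate):.3f} s")
```

Because CPI is an average over the instructions actually executed, changing the mix changes the CPI even when the hardware stays the same.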
INSTRUCTIONS
• The words of a computer’s language are called instructions,
and its vocabulary is called an instruction set.
• Here we study the instruction set of the MIPS processor.
• The idea that instructions and data of many types can be
stored in memory as numbers is known as the stored-program
concept.
Operations
• The MIPS assembly language notation
add a, b, c
instructs a computer to add the two variables b and c and to
put their sum in a.
• The following sequence of instructions adds the four variables:
add a, b, c # The sum of b and c is placed in a
add a, a, d # The sum of b, c, and d is now in a
add a, a, e # The sum of b, c, d, and e is now in a
• Thus, it takes three instructions to sum the four variables. The
words to the right of the sharp symbol (#) on each line above
are comments for the human reader, so the computer ignores
them.
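To see why three instructions are needed, the sequence can be mimicked in a high-level language. The sketch below is only a model of the three-operand `add` semantics: the `add` and `sum_four` helpers and the sample values are hypothetical, and 32-bit overflow behaviour of real MIPS hardware is ignored.

```python
def add(registers: dict, dest: str, src1: str, src2: str) -> None:
    """Model of `add dest, src1, src2`: exactly two sources, one destination."""
    registers[dest] = registers[src1] + registers[src2]


def sum_four(b: int, c: int, d: int, e: int) -> int:
    registers = {"b": b, "c": c, "d": d, "e": e}
    add(registers, "a", "b", "c")  # add a, b, c   -> a = b + c
    add(registers, "a", "a", "d")  # add a, a, d   -> a = b + c + d
    add(registers, "a", "a", "e")  # add a, a, e   -> a = b + c + d + e
    return registers["a"]


print(sum_four(1, 2, 3, 4))
```

Each call takes exactly two sources, so summing four values needs three calls, mirroring the three `add` instructions.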
MIPS OPERANDS
• MIPS uses either registers or memory locations as operands.
• MIPS uses 32 registers for storing data temporarily and for
fast access.
• Memory locations are accessed by data transfer instructions.
MIPS assembly language
• There are five categories of instructions: arithmetic,
data transfer, logical, conditional branch, and
unconditional jump.
Data Transfer Instructions
Logical Instructions
Conditional Branch Instructions
Unconditional Jump Instructions