
Computer Architecture Overview and Evolution

The document covers the evolution of computer systems, detailing early devices and the advancements leading to modern computers, including the Von Neumann architecture. It distinguishes between computer architecture, which focuses on high-level design and functionality, and computer organization, which deals with the physical implementation and structure of components. Additionally, it explains the roles of microprocessors, system buses, and the fundamental functions and structure of computer systems.

Lecture note on Computer Architecture and Organization CSC303

MODULE ONE

COMPUTER SYSTEM EVOLUTION

• Early counting devices, generations of computers (Vacuum tube, Transistor, Integrated
Circuit, Very Large Scale Integration, AI).
• Classification
  o Types of data: Digital, Analogue, Hybrid
  o Purpose: General purpose, Special purpose
  o Size: Notebook, microcomputer, minicomputer, mainframe, supercomputer

The ancestors of the modern computer were mechanical, electromechanical and early
electronic devices. These include Blaise Pascal’s machine, the Difference Engine, the
Analytical Engine, ENIAC, EDSAC, EDVAC, UNIVAC, and the MARK I, II and III, etc.

Computer technology has improved enormously over the past half century. In the early
stages of computer evolution there were no stored-program computers, computational
power was low, and the machines were very large. Nowadays, a personal computer has
more computational power, memory and disk storage, is smaller in size, and is available
at an affordable cost. This rapid improvement is the result of advances in the technology
used to build computers and of innovation in computer design.

The Von Neumann Machine. This is also referred to as the stored-program computer.
Stored-program computers have the following characteristics:
- Three hardware systems:
• A central processing unit (CPU)
• A main memory system
• An I/O system
- The capacity to carry out sequential instruction processing.
- A single data path between the CPU and main memory. This single path is known as the Von
Neumann bottleneck.
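
The stored-program idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four instruction names (LOAD, ADD, STORE, HALT), the single accumulator and the memory layout are hypothetical and do not describe any real machine. The point is that instructions and data share one memory, and every fetch and every operand access goes over the same CPU-memory path.

```python
# Toy stored-program machine. The instruction names (LOAD, ADD, STORE, HALT)
# and the memory layout are illustrative assumptions, not a real ISA.

def run(memory):
    """Execute the program held in `memory` and return the final memory."""
    pc = 0    # program counter: address of the next instruction
    acc = 0   # accumulator register inside the CPU
    while True:
        opcode, operand = memory[pc]   # fetch over the single CPU-memory path
        pc += 1
        if opcode == "LOAD":           # decode and execute
            acc = memory[operand]
        elif opcode == "ADD":
            acc += memory[operand]
        elif opcode == "STORE":
            memory[operand] = acc
        elif opcode == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode: {opcode}")


# Addresses 0-3 hold instructions; addresses 4-6 hold data.
program = [
    ("LOAD", 4),     # acc = memory[4]
    ("ADD", 5),      # acc = acc + memory[5]
    ("STORE", 6),    # memory[6] = acc
    ("HALT", None),
    7,               # address 4: first operand
    5,               # address 5: second operand
    0,               # address 6: result
]

result = run(program)
print(result[6])  # 12
```

Because the program is just data in memory, a different list of tuples gives a different computation without any change to `run`.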

The duty of a computer designer/architect is to

• determine what attributes are important for a new computer, then
• design a computer to maximize performance and energy efficiency while staying within
cost, power, and availability constraints.

This task has many aspects, including instruction set design, functional organization,
logic design, and implementation. The implementation may encompass integrated circuit
design, packaging, power, and cooling. Optimizing the design requires familiarity with a
very wide range of technologies, from compilers and operating systems to logic design
and packaging.

Overview of Microprocessor Architecture

A microprocessor is a multipurpose programmable logic device that reads binary
instructions from a storage device called memory, accepts binary data as input, processes
the data according to those instructions, and gives the results as output. The
microprocessor is therefore a programmable digital device that can be used for both data
processing and control applications.
A microcomputer system, as shown in Figure 1, consists of a CPU (microprocessor), memories
(primary and secondary) and I/O devices. The memory and I/O devices are linked to the CPU
by the data, address and control buses. The CPU communicates with only one peripheral at a
time; that peripheral is enabled by a control signal.

For example, to send data to an output device, the CPU places the device address on the
address bus and the data on the data bus, and enables the output device.

System Buses
Buses are wires connecting memory and I/O to the microprocessor. There are three main types of buses:
– Address Bus
• Unidirectional
• Identifying peripheral or memory location
– Data Bus
• Bidirectional
• Transferring data
– Control Bus
• Synchronization signals
• Timing signals
• Control signal

Figure 1: Block diagram of a Microcomputer


Computer Architecture and Computer Organization

Changes in technology not only influence organization but also result in the introduction
of more powerful and more complex architecture. However, because a computer
organization must be designed to implement a particular architectural specification, a
thorough treatment of organization requires a detailed examination of architecture as
well. Computer architecture comes before Computer organization.

Computer architecture and computer organization are related but distinct concepts.

Computer Architecture is a functional description of the requirements and design
implementation for the various parts of a computer. It deals with the functional behavior of
computer systems. It comes before computer organization when designing a computer.

Computer Architecture refers to the design of the internal workings of a computer system,
including the CPU, memory, and other hardware components. It involves decisions about
the organization of the hardware, such as the instruction set architecture, the data path
design, and the control unit design.
Computer Architecture is concerned with optimizing the performance of a computer system
and ensuring that it can execute instructions quickly and efficiently.
On the other hand, Computer Organization refers to the operational units and their
interconnections that implement the architecture specification. It deals with how the
components of a computer system are arranged and how they interact to perform the
required operations.

Computer Organization is concerned with the physical implementation of the architecture
design and includes decisions about the interconnection and communication between
components, such as the bus structure, memory hierarchy, and input/output systems.

Computer Organization is decided after the Computer Architecture has been fixed.

Computer Organization is how operational attributes are linked together and contribute to
realizing the architectural specification, hence Computer Organization deals with a
structural relationship.

In summary, computer architecture is focused on the design of the internal workings of a
computer system, while computer organization is focused on the implementation of that
design. Computer architecture is concerned with the high-level design decisions, while
computer organization deals with the low-level implementation details.
Therefore, architecture describes what the computer does; organization describes how it does it.

Summary difference between Computer Architecture and Computer Organization:

| S. No. | Computer Architecture | Computer Organization |
|---|---|---|
| 1. | Architecture describes what the computer does. | The Organization describes how it does it. |
| 2. | Computer Architecture deals with the functional behavior of computer systems. | Computer Organization deals with a structural relationship. |
| 3. | It deals with high-level design issues, as seen in Figure 2. | It deals with low-level design issues, as seen in Figure 2. |
| 4. | For designing a computer, its architecture is fixed first. | For designing a computer, an organization is decided after its architecture. |
| 5. | Computer Architecture is also called Instruction Set Architecture (ISA). | Computer Organization is frequently called microarchitecture. |
| 6. | Computer Architecture comprises logical functions such as instruction sets, registers, data types, and addressing modes. | Computer Organization consists of physical units like circuit designs, peripherals, and adders. |


Figure 2: Overview of a Computer System

Computer System Structure and Function

A computer system, like any system, consists of an interrelated set of components. The system
is best characterized in terms of structure, the way in which components are interconnected,
and function, the operation of the individual components. Furthermore, a computer’s
organization is hierarchical.
Each major component can be further described by decomposing it into its major
subcomponents and describing their structure and function.

Function
Both the structure and functioning of a computer are, in essence, simple. In general terms,
there are only four basic functions that a computer can perform:
• Data processing: Data may take a wide variety of forms, and the range of processing
requirements is broad.
• Data storage: Even if the computer is processing data on the fly (i.e., data come in and
get processed, and the results go out immediately), the computer must temporarily
store at least those pieces of data that are being worked on at any given moment. Thus,
there is at least a short-term data storage function. Equally important, the computer
performs a long-term data storage function. Files of data are stored on the computer
for subsequent retrieval and update.
• Data movement: The computer’s operating environment consists of devices that serve
as either sources or destinations of data. When data are received from or delivered to a
device that is directly connected to the computer, the process is known as input-output
(I/O), and the device is referred to as a peripheral. When data are moved over longer
distances, to or from a remote device, the process is known as data communications.
• Control: Within the computer, a control unit manages the computer’s resources and
orchestrates the performance of its functional parts in response to instructions.

Structure
There are four main structural components:

• Central processing unit (CPU): Controls the operation of the computer and performs
its data processing functions; often simply referred to as processor.
• Main memory: Stores data.
• I/O: Moves data between the computer and its external environment.
• System interconnection: Some mechanism that provides for communication among
CPU, main memory, and I/O. A common example of system interconnection is by
means of a system bus, consisting of a number of conducting wires to which all the
other components attach.

Components of the Central Processing Unit


The general components upon which the central processing unit is built include:

1. Bus
A bus is a bundle of wires grouped together to serve a single purpose. The main purpose of
the bus is to transfer data from one device to another. The processor's interface to the bus
includes connections used to pass data, connections to represent the address in which the
processor is interested, and control lines to manage and synchronize the transaction. The
three major buses are Data, Address and Control buses. There are internal buses that the
processor uses to move data, instructions, configuration, and status between its subsystems.

• The Data Bus provides a path for moving data among system modules. The data bus
may consist of 32, 64, 128, or even more separate lines, the number of lines being referred
to as the width of the data bus. Because each line can carry only 1 bit at a time, the
number of lines determines how many bits can be transferred at a time. The width of
the data bus is a key factor in determining overall system performance. A narrower
bus width means that it will take more time to communicate a quantity of data as
compared to a wider bus. For example, if the data bus is 32 bits wide and each
instruction is 64 bits long, then the processor must access the memory module twice
during each instruction cycle.

• The Address Bus is used to designate the source or destination of the data on the data
bus. For example, if the processor wishes to read a word (8, 16, or 32 bits) of data from
memory, it puts the address of the desired word on the address lines. Clearly, the
width of the address bus determines the maximum possible memory capacity of
the system. Address space refers to the maximum amount of memory and I/O that a
microprocessor can directly address.

If a microprocessor has a 16-bit address bus, it can address up to $2^{16}$ = 65,536 bytes.
Therefore, it has a 64 kB address space, i.e.

1 byte = 8 bits
1024 bytes = 1 kB
65,536 bytes = 65,536 / 1024 = 64 kB
Furthermore, the address lines are generally also used to address I/O ports. Note that the
address bus is unidirectional (the microprocessor asserts requested addresses to the various
devices), and the data bus is bidirectional (the microprocessor asserts data on a write and the
devices assert data on reads).

• The Control Bus is used to control the access to and the use of the data and address
lines. Because the data and address lines are shared by all components, there must be a
means of controlling their use. Control signals transmit both command and timing
information among system modules. Timing signals indicate the validity of data and
address information. Command signals specify operations to be performed. Typical
control lines include:

• Memory write: Causes data on the bus to be written into the addressed location
• Memory read: Causes data from the addressed location to be placed on the bus
• I/O write: Causes data on the bus to be output to the addressed I/O port
• I/O read: Causes data from the addressed I/O port to be placed on the bus
• Transfer ACK: Indicates that data have been accepted from or placed on the bus
• Bus request: Indicates that a module needs to gain control of the bus

• Bus grant: Indicates that a requesting module has been granted control of the bus
• Interrupt request: Indicates that an interrupt is pending
• Interrupt ACK: Acknowledges that the pending interrupt has been recognized
• Clock: Is used to synchronize operations
• Reset: Initializes all modules
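
The arithmetic given above for the data bus (a 64-bit instruction over a 32-bit bus) and the address bus (a 16-bit address bus) can be checked with a short sketch. The function names are hypothetical, and the address-space calculation assumes that each address selects one byte, which is the assumption the notes make but which is not true of every machine.

```python
def bus_transfers(item_bits, bus_width_bits):
    """Transfers needed to move one item over a data bus of the given width."""
    if item_bits <= 0 or bus_width_bits <= 0:
        raise ValueError("sizes must be positive")
    # Ceiling division: a partly filled final transfer still costs a transfer.
    return -(-item_bits // bus_width_bits)


def address_space_bytes(address_lines):
    """Locations reachable with this many address lines (assumes byte addressing)."""
    if address_lines < 0:
        raise ValueError("number of address lines cannot be negative")
    return 2 ** address_lines


print(bus_transfers(64, 32))            # 2 memory accesses per 64-bit instruction
print(address_space_bytes(16))          # 65536 bytes
print(address_space_bytes(16) // 1024)  # 64 (kB)
```

Doubling the data bus width halves the number of transfers for large items, whereas each extra address line doubles the address space.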

2. Registers
Registers are temporary storage locations in the CPU. A register stores a binary value using
a group of latches. Although variables and pointers used in a program are all stored in
memory, they are moved to registers during periods in which they are the focus of operation.
This is so that they can be manipulated quickly. Once the processor shifts its focus, it stores
the values it doesn't need any longer back in memory. Registers may be used for several
operations. Discussion on types and usage of registers will follow in Module III of this
document.

3. Buffers
A processor does not operate in isolation. Typically there are multiple processors
supporting the operation of the main processor. These include video processors, the
keyboard and mouse interface processor, and the processors providing data from hard
drives and CDROMs. There are also processors to control communication interfaces
such as USB, and Ethernet networks. These processors all operate independently, and
therefore one may finish an operation before a second processor is ready to receive the
results.

If one processor is faster than another or if one processor is tied up with a process
prohibiting it from receiving data from a second process, then there needs to be a
mechanism in place so that data is not lost. This mechanism takes the form of a block of
memory that can hold data until it is ready to be picked up. This block of memory is called
a buffer. Figure 3 below presents the basic block diagram of a system that incorporates
a buffer.
Instead of passing data directly to processor B, processor A stores data in the buffer (a
"memory queue"), and processor B reads data from the buffer. The effects of unbalanced
throughput are eased by the buffer.

[Figure: Processor A → buffer ("memory queue") → Processor B]

Figure 3: Block Diagram of a System incorporating a Buffer

The concept of buffers is presented here because the internal structure of a
processor often relies on buffers to store data while waiting for an external device
to become available.
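
A minimal sketch of the buffer idea, assuming a fixed-capacity first-in first-out queue sitting between a fast producer (processor A) and a slower consumer (processor B). The class and method names are hypothetical, and a real hardware buffer would also need signalling to tell A when it may resume.

```python
from collections import deque


class Buffer:
    """Fixed-capacity FIFO queue between a producer and a consumer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()

    def put(self, item):
        """Processor A stores an item. Returns False if the buffer is full."""
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(item)
        return True

    def get(self):
        """Processor B reads the oldest item. Returns None if the buffer is empty."""
        if not self.queue:
            return None
        return self.queue.popleft()


buffer = Buffer(capacity=4)

# Processor A finishes three results before processor B is ready for any of them.
for value in (10, 20, 30):
    buffer.put(value)

# Processor B later drains the buffer at its own pace, in arrival order.
received = []
item = buffer.get()
while item is not None:
    received.append(item)
    item = buffer.get()

print(received)  # [10, 20, 30]
```

Nothing is lost even though A ran ahead of B; data is only at risk once A outruns B by more than the buffer's capacity, which is when `put` returns False.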

4. The Stack
During the course of normal operation, there will be a number of times when the
processor needs to use a temporary memory, a place where it can store a number for a
while until it is ready to use it again.
For example, every processor has a finite number of registers. If an application needs
more registers than are available, the register values that are not needed immediately can
be stored in this temporary memory. When a processor needs to jump to a subroutine or
function, it needs to remember the instruction it jumped from so that it can pick back up
where it left off when the subroutine is completed. The return address is stored in this
temporary memory. The stack is a block of memory locations reserved to function as
temporary memory. It operates much like the stack of plates at the start of a restaurant
buffet line. When a plate is put on top of an existing stack of plates, the plate that was on
top is now hidden, one position lower in the stack. It is not accessible until the top plate is
removed. There are two main operations that the processor can perform on the stack: it
can either store the value of a register to the top of the stack or remove the top piece of
data from the stack and place it in a register. Storing data to the stack is referred to as
"pushing" while removing the top piece of data is called "popping". The LIFO nature of
the stack makes it so that applications must remove data items in the opposite order from
which they were placed on the stack. For example, assume that a processor needs to store
values from registers A, B, and C onto the stack. If it pushes register A first, B second, and
C last, then to restore the registers it must pull in order C, then B, then A. This is illustrated
in Figure 4a and 4b.

[Figure: Register A = 25, Register B = 83, Register C = 74. After the pushes the stack holds, from top to bottom, 74, 83, 25.]

Assume registers A, B, and C of a processor contain 25, 83, and 74 respectively. If the
processor pushes them onto the stack in the order A, then B, then C then pulls them off the
stack in the order B, then A, then C, what values do the registers contain afterwards? The
solution is explained as follows. First, let's see what the stack looks like after the values from
registers A, B, and C have been pushed. The data from register A is pushed first placing it at
the bottom of the stack of three data items. B is pushed next followed by C which sits at the
top of the stack. In the stack, there is no reference identifying which register each piece of
data came from.

[Figure: Before the pulls the stack holds, from top to bottom, 74, 83, 25. After the pulls, Register A = 83, Register B = 74, Register C = 25.]

When the values are pulled from the stack, B is pulled first and it receives the value from
the top of the stack, i.e., 74. Next, A is pulled. Since the 74 was removed and placed in B,
A gets the next piece of data, 83. Last, 25 is placed in register C.
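
The worked example can be replayed with a Python list standing in for the stack. This is a sketch only: the dictionary of registers and the `push` and `pull` helpers are illustrative names, and the end of the list plays the role of the top of the stack.

```python
stack = []                                   # end of the list = top of the stack
registers = {"A": 25, "B": 83, "C": 74}


def push(name):
    """Copy the value of a register onto the top of the stack."""
    stack.append(registers[name])


def pull(name):
    """Remove the top value from the stack and place it in a register."""
    registers[name] = stack.pop()


for name in ("A", "B", "C"):   # push A, then B, then C
    push(name)

for name in ("B", "A", "C"):   # pull in the "wrong" order: B, then A, then C
    pull(name)

print(registers)  # {'A': 83, 'B': 74, 'C': 25}
```

Pulling in the order C, B, A instead would restore the original register values, which is the LIFO rule stated earlier.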

5. I/O Ports
Input/output ports or I/O ports refer to any connections that exist between the
processor and its external devices. A USB printer or scanner, for example, is connected to
the computer system through an I/O port. The computer can issue commands and send
data to be printed through this port or receive the device's status or scanned images. Some
I/O devices are connected directly to the memory bus and act just like memory devices.
Sending data to the port is done by storing data to a memory address and retrieving data
from the port is done by reading from a memory address.
If the device is incorporated into the processor, then communication with the port is done by
reading and writing to registers. This is sometimes the case for simple serial and parallel
interfaces such as a printer port or keyboard and mouse interface.
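
A minimal sketch of a device that sits on the memory bus and "acts just like a memory device", as described above. Everything here is an assumption made for illustration: the address 0xFF00, the class names, and the convention that reading the port returns a count of bytes received.

```python
class OutputPort:
    """Hypothetical output device that records every byte written to it."""

    def __init__(self):
        self.received = []

    def write(self, value):
        self.received.append(value)

    def read(self):
        # Status read: number of bytes received so far (an assumed convention).
        return len(self.received)


class MemoryBus:
    PORT_ADDRESS = 0xFF00  # assumed address at which the device is mapped

    def __init__(self, port):
        self.ram = {}
        self.port = port

    def store(self, address, value):
        """An ordinary store; the address decides whether RAM or the device sees it."""
        if address == self.PORT_ADDRESS:
            self.port.write(value)
        else:
            self.ram[address] = value

    def load(self, address):
        """An ordinary load; reading the port address returns the device status."""
        if address == self.PORT_ADDRESS:
            return self.port.read()
        return self.ram.get(address, 0)


port = OutputPort()
bus = MemoryBus(port)
bus.store(0x0010, 42)        # normal memory write
bus.store(0xFF00, ord("A"))  # same operation, but it reaches the device
print(bus.load(0x0010), port.received, bus.load(0xFF00))  # 42 [65] 1
```

The processor needs no special I/O instructions in this arrangement; the cost is that part of the address space is given up to devices.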

PROCESSOR DESIGN APPROACH

One of the key features used to categorize a microprocessor is whether it supports reduced
instruction set computing (RISC) or complex instruction set computing (CISC). The distinction
is how complex individual instructions are and how many permutations exist for the same basic
instruction. In practical terms, this distinction directly relates to the complexity of a
microprocessor’s instruction decoding logic; a more complex instruction set requires more
complex decoding logic. The differences are tabulated in Table 1.

Table 1: Differences between CISC and RISC

| CISC | RISC |
|---|---|
| Instructions and addressing modes are complex, hence the instruction decode logic is complex. | Instruction decode logic is simple, since there are few instructions to decode and little operand complexity. |
| Processors are complex, which makes it harder to support high clock rates because complex computations must complete within a single clock period. | A single instruction performs a smaller number of operations, using a simpler set of instructions. |
| In a single instruction, many operations are performed, e.g. fetch, add, increment and store operations all in one instruction. | Has a separate instruction for each operation, which reduces complexity and speeds up the instructions that are frequently used. |
| Not all instructions in CISC microprocessors are used with the same frequency. Only some (a core set) are called most of the time. | Instructions that are not frequently used are removed to simplify the microprocessor control logic. Programs therefore execute faster, throughput improves for the commonly used instructions, and overall performance increases. |
| Instructions that are used less often impose a burden on the entire system because they increase the permutations the decode logic must handle in a given clock cycle. | Reduces the permutations of the decode logic, since there are fewer instructions and only a few memory read/write operations. |
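
The contrast between one complex instruction and several simple ones can be sketched as follows. This is a toy model, not a description of any real instruction set: the function names, the register names r1 to r3 and the LOAD/ADD/STORE mnemonics are all assumptions made for illustration.

```python
def cisc_add_mem(memory, dst, src1, src2):
    """One complex instruction: read both operands, add, and write the result back."""
    memory[dst] = memory[src1] + memory[src2]
    return 1  # number of instructions executed


def risc_add_mem(memory, dst, src1, src2):
    """The same work as a sequence of simple load, add and store instructions."""
    regs = {}
    program = [
        ("LOAD", "r1", src1),          # r1 = memory[src1]
        ("LOAD", "r2", src2),          # r2 = memory[src2]
        ("ADD", "r3", ("r1", "r2")),   # r3 = r1 + r2 (registers only)
        ("STORE", "r3", dst),          # memory[dst] = r3
    ]
    for op, reg, arg in program:
        if op == "LOAD":
            regs[reg] = memory[arg]
        elif op == "ADD":
            regs[reg] = regs[arg[0]] + regs[arg[1]]
        elif op == "STORE":
            memory[arg] = regs[reg]
    return len(program)


mem_cisc = {0: 7, 1: 5, 2: 0}
mem_risc = dict(mem_cisc)
n_cisc = cisc_add_mem(mem_cisc, 2, 0, 1)
n_risc = risc_add_mem(mem_risc, 2, 0, 1)
print(mem_cisc[2], mem_risc[2], n_cisc, n_risc)  # 12 12 1 4
```

Both versions leave the same result in memory. The RISC version executes more instructions, but each one does a single simple thing, which is what keeps its decode logic simple.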


RECENT TRENDS IN PROCESSOR TECHNOLOGY

Data creation is growing exponentially due to the explosion in big data and machine learning.
Processor, storage and memory technologies have all witnessed fundamental changes in size,
speed, capacity and architecture, hence the demand for graphics processing units (GPUs). GPUs
are an ideal fit for many modern applications. A Central Processing Unit (CPU) is a
latency-optimized general-purpose processor designed to handle a wide range of distinct tasks
sequentially, while a Graphics Processing Unit (GPU) is a throughput-optimized specialized
processor designed for high-end parallel computing, as illustrated in Figure 5.

CPU Architecture

A Central Processing Unit (CPU) is the brains of your computer. The main job of the CPU is to
carry out a diverse set of instructions through the fetch-decode-execute cycle to manage all parts
of your computer and run all kinds of computer programs.

A CPU is very fast at processing data in sequence, as it has a few heavyweight cores with high
clock speeds. It is like a Swiss army knife that can handle diverse tasks fairly well. The CPU is
latency-optimized and can switch between a number of tasks very quickly, which may create an
impression of parallelism. Nevertheless, each core is fundamentally designed to run one task at a time.

GPU Architecture

A Graphics Processing Unit (GPU) is a specialized processor whose job is to rapidly manipulate
memory and accelerate the computer for a number of specific tasks that require a high degree of
parallelism.

As the GPU uses thousands of lightweight cores whose instruction sets are optimized for
dimensional matrix arithmetic and floating point calculations, it is extremely fast with linear
algebra and similar tasks that require a high degree of parallelism.

As a rule of thumb, if your algorithm accepts vectorized data, the job is probably well-suited
for GPU computing.

Architecturally, GPU’s internal memory has a wide interface with a point-to-point connection
which accelerates memory throughput and increases the amount of data the GPU can work with
in a given moment. It is designed to rapidly manipulate huge chunks of data all at once.

Figure 5: CPU vs GPU

| CPU | GPU |
|---|---|
| Task parallelism | Data parallelism |
| A few heavyweight cores | Many lightweight cores |
| High memory size | High memory throughput |
| Many diverse instruction sets | A few highly optimized instruction sets |
| Explicit thread management | Threads are managed by hardware |
| A smaller number of larger cores | A larger number (thousands) of smaller cores |
| Low latency | High throughput |
| Optimized for serial processing | Optimized for parallel processing |
| Designed for running complex programs | Designed for simple and repetitive calculations |
| Performs fewer instructions per clock | Performs more instructions per clock |
| Cost-efficient for smaller workloads | Cost-efficient for bigger workloads |
| Automatic cache management | Allows for manual memory management |
When comparing the two, it is important to understand that GPUs were designed to complement
CPUs, not to replace them. The CPU and the GPU work together to increase the amount and
speed of processed data.

A GPU cannot replace a CPU in a computer system. The CPU is necessary to oversee the
execution of tasks on the system. However, the CPU can delegate specific repetitive workloads
to the GPU, freeing its own resources for maintaining the stability of the system and
the programs that are running.

A GPU uses many lightweight processing cores, leverages data parallelism, and has high memory
throughput. While the specific components vary by model, most modern GPUs fundamentally
use a single instruction multiple data (SIMD) stream architecture.

FLYNN’S TAXONOMY

What is Flynn’s Taxonomy?

Flynn’s Taxonomy is a categorization of computer architectures by Stanford University’s Michael
J. Flynn. The basic idea behind Flynn’s Taxonomy is that computations consist of 2
streams (data and instruction streams) that can be processed in sequence (1 stream at a time) or
in parallel (multiple streams at once). It is important to understand this because it has been used
as a tool in the design of modern processors and their functionalities.

Two streams with two possible ways to process each lead to the 4 different categories
in Flynn’s Taxonomy. Let’s take a look at each, as illustrated in Figure 6.

Figure 6: Flynn’s Taxonomy


• Single Instruction Single Data (SISD)

SISD stream is an architecture where a single instruction stream (e.g. a program) executes on one
data stream. This architecture is used in older computers with a single-core processor, as well as
many simple compute devices.

• Single Instruction Multiple Data (SIMD)

A SIMD stream architecture has a single control processor and instruction memory, so only one
instruction can be run at any given point in time. That single instruction is copied to each core
and run on all of them at the same time. This is possible because each processor has its own
dedicated memory, which allows for parallelism at the data level (a.k.a. “data parallelism”).

The fundamental advantage of SIMD is that data parallelism allows it to execute computations
quickly (multiple processors doing the same thing) and efficiently (only one instruction unit).

• Multiple Instruction Single Data (MISD)

MISD stream architecture is effectively the reverse of SIMD architecture. With MISD multiple
instructions are performed on the same data stream. The use cases for MISD are very limited
today. Most practical applications are better addressed by one of the other architectures.

• Multiple Instruction Multiple Data (MIMD)

MIMD stream architecture offers parallelism for both data and instruction streams. With MIMD,
multiple processors execute instruction streams independently against different data streams.
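
The four categories follow mechanically from two yes/no questions: is there more than one instruction stream, and is there more than one data stream? A small sketch of that mapping (the function name is hypothetical):

```python
def flynn_category(instruction_streams, data_streams):
    """Return the Flynn category for the given numbers of streams."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("there must be at least one stream of each kind")
    instruction_part = "S" if instruction_streams == 1 else "M"
    data_part = "S" if data_streams == 1 else "M"
    return f"{instruction_part}I{data_part}D"


print(flynn_category(1, 1))  # SISD
print(flynn_category(1, 8))  # SIMD
print(flynn_category(4, 1))  # MISD
print(flynn_category(4, 4))  # MIMD
```

Only the distinction between one and many matters to the taxonomy; the exact number of streams does not change the category.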

What makes SIMD best for GPUs?

Now that we understand the different architectures, let’s consider why SIMD is the best choice
for GPUs. The answer becomes intuitive when you understand that fundamentally graphics
processing and many other common GPU computing use cases are simply running the same
mathematical function over and over again at scale. In this case, many processors running the
same instruction on multiple data sets is ideal.
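
As a sketch of "the same instruction on multiple data sets", the following applies one function to every element of a list. Python runs the loop one element at a time, so this only models what SIMD means, not its speed; on SIMD hardware each element (lane) would be processed at the same moment. The function names and the sample data are illustrative.

```python
def simd_apply(instruction, lanes):
    """Apply one instruction to every lane; hardware would do this in lockstep."""
    return [instruction(x) for x in lanes]


def scale_and_offset(x):
    """The single 'instruction' that is broadcast to all lanes."""
    return 2 * x + 1


data = [1, 2, 3, 4]
result = simd_apply(scale_and_offset, data)
print(result)  # [3, 5, 7, 9]
```

No lane depends on the result of another lane, which is what makes the work safe to run in parallel.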

What about SIMT?

So where does SIMT fit into Flynn’s Taxonomy? SIMT can be viewed as an extension of SIMD. It
adds multithreading to SIMD which improves efficiency as there is less instruction fetching
overhead.
Terminologies for Future Trends in Computer Architecture

These trends are actively being researched and developed by scientists, engineers, and
tech companies around the world.

While some trends, such as quantum computing, are still in the experimental stage,
others like in-memory computing and reconfigurable architecture are already making
their way into practical applications to drive transformative changes across various
industries. Quantum computing could revolutionize fields like cryptography and drug
discovery, while neuromorphic architecture could lead to breakthroughs in artificial
intelligence. In-memory computing could accelerate data-driven insights, and photonic
computing might reshape communication networks. Reconfigurable architecture could
optimize computing resources for different tasks, improving overall efficiency.

1. Quantum computing

Quantum computing utilizes principles of quantum mechanics to process information
using quantum bits, or qubits. Unlike classical bits, qubits can exist in multiple states
simultaneously, which could enable quantum computers to perform certain complex
calculations exponentially faster than classical computers. Quantum computing has the
potential to revolutionize fields like cryptography, optimization, and materials science.

Quantum computing uses quantum-mechanical phenomena like superposition and
entanglement to process information. It can potentially be beneficial because it can
tackle issues that are difficult for traditional computers to handle, like factoring big
numbers, modelling complicated systems, and optimising complex functions.

Besides, the number of potential states and interactions multiplies exponentially as the
complexity of the problem rises. Although it is still in its initial phase, quantum
computing has the potential to change industries, including cryptography, banking,
and drug discovery. Building a quantum computer can be done in several ways, such
as using topological qubits, trapped ions, and superconducting circuits.

2. Neuromorphic architecture

Neuromorphic architecture is inspired by the human brain’s neural networks. It aims
to create computer systems that can process information and learn in ways similar to
biological systems. By emulating the brain’s efficiency and adaptability, neuromorphic
architecture enhances machine learning and artificial intelligence capabilities, enabling
computers to perform tasks intuitively and efficiently.

Neuromorphic computing is motivated by the structure and operation of the human brain. It
processes information in a way that is fundamentally distinct from conventional computing by
using specialised hardware and software to replicate the brain’s neuronal structure. For instance,
because neuromorphic computing relies on analogue rather than digital computation, it may be more
energy-efficient. Because it can learn from and adjust to new information in real time, it can also
be more versatile and adaptive. Several computing fields, such as artificial intelligence, robotics,
and sensory processing, stand to benefit from it.

3. In-memory computing

In-memory computing challenges the traditional separation of processing and memory
units by performing computations directly within the memory. This approach
reduces the need to transfer data between components, leading to faster and more
efficient data processing. In-memory computing is particularly beneficial for
data-intensive tasks like big data analytics and machine learning. In-memory technologies
address some of the major issues in computer architecture, including power
consumption, performance, and scalability.

4. Reconfigurable architecture
Reconfigurable architecture is a computer architecture combining some of the
flexibility of software with the high performance of hardware.

Reconfigurable architecture allows computer systems to dynamically adjust their
hardware configurations to optimize performance for specific tasks. This adaptability
is crucial in environments with rapidly changing workloads and applications.
Reconfigurable architecture offers versatility and efficiency, making it well-suited for
diverse computing needs, including edge computing and scientific simulations. An
example is the field-programmable gate array (FPGA).
5. Cloud-based computing

Cloud-based computing, commonly referred to as cloud computing, uses remote
servers and networks in place of just a local computer or server to store, administer,
and process data and applications. Cloud computing enables greater flexibility and
scalability in computer resources because resources and services are offered over the
Internet. Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure
as a Service (IaaS) are the three primary divisions of cloud computing.

6. Edge computing

This is a distributed computing paradigm that processes data at the network’s edge,
nearer to the data source. Edge computing enables data to be processed and analysed
locally, on devices or systems closer to the source of data generation, rather than
transferring all of the data to a centralised data center or cloud for processing. This
method is frequently applied to decrease latency and speed up data processing.

CPU Pipelining
Microprocessor designers, in an attempt to squeeze every last bit of performance from their
designs, try to make sure that every circuit of the CPU is doing something productive at all times.
The most common application of this practice applies to the execution of instructions. It is
based on the fact that there are steps to the execution of an instruction, each of which uses
entirely different components of the CPU.
Assume that the execution of a machine code instruction can be broken into three stages:

• Fetch – get the next instruction to execute from its location in memory
• Decode – determine which circuits to energize in order to execute the fetched instruction
• Execute – use the ALU and the processor to memory interface to execute the instruction

By comparing the definitions of the different components of the CPU shown with the needs of
these three different stages or cycles, it can be seen that three different circuits are used for these
three tasks.

• The internal data bus and the instruction pointer perform the fetch.
• The instruction decoder performs the decode cycle.
• The ALU and CPU registers are responsible for the execute cycle.
Once the logic that controls the internal data bus is done fetching the current instruction,
what's to keep it from fetching the next instruction? It may have to guess what the next
instruction is, but if it guesses right, then a new instruction will be available to the instruction
decoder immediately after it finishes decoding the previous one.

Once the instruction decoder has finished telling the ALU what to do to execute the current
instruction, what's to keep it from decoding the next instruction while it's waiting for the ALU
to finish? If the internal data bus logic guessed right about what the next instruction is, then the
ALU won't have to wait for a fetch and subsequent decode in order to execute the next
instruction.

This process of creating a queue of fetched, decoded, and executed instructions is called
pipelining, and it is a common method for improving the performance of a processor.

Therefore, a fast processor can be built by increasing the rate at which instructions are executed.
This can be achieved by increasing the number of instructions that are in progress
simultaneously. Some CPUs break the fetch-decode-execute cycle down into smaller steps,
some of which can be performed in parallel. This overlapping speeds up
execution, i.e. the CPU fetches and executes simultaneously. This method, used by all current
CPUs, is known as pipelining. One way to achieve it is to split the microprocessor into two
units: (1) the bus interface unit (BIU) and (2) the execution unit (EU). The BIU accesses
memory and peripherals while the EU executes instructions. The idea of pipelining is to
have more than one instruction being processed by the processor at the same time.

Figure 7a shows the time-line sequence of the execution of five instructions on a non-pipelined
processor. Notice how a full fetch-decode-execute cycle must be performed on instruction 1 before
instruction 2 can be fetched. This sequential execution of instructions allows for very simple
CPU hardware, but it leaves each portion of the CPU idle for 2 out of every 3 cycles. During the
fetch cycle, the instruction decoder and ALU are idle; during the decode cycle, the bus interface
and the ALU are idle; and during the execute cycle, the bus interface and the instruction decoder
are idle.
Figure 7b on the other hand shows the time-line sequence for the execution of five
instructions using a pipelined processor. Once the bus interface has fetched
instruction 1 and passed it to the instruction decoder for decoding, it can begin its
fetch of instruction 2.

Notice that the first cycle in the figure only has the fetch operation. The second
cycle has both the fetch and the decode cycle happening at the same time. By the
third cycle, all three operations are happening in parallel.

Figure 7a: Non-Pipelined Execution of Five Instructions

Figure 7b: Pipelined Execution of Five Instructions
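Without pipelining, five instructions take 15 cycles to execute. In a pipelined
architecture, those same five instructions take only 7 cycles to execute, a savings of
over 50%. In general, the number of cycles it takes for a non-pipelined architecture
using three cycles to execute an instruction is equal to three times the number of
instructions, i.e. 3n cycles for n instructions. The three-stage pipelined architecture
needs 2 cycles to fill the pipeline and then completes one instruction per cycle, i.e.
n + 2 cycles, provided that no conflict stalls the pipeline.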
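
The cycle counts above can be checked with a short sketch. The function names and the
text timeline are illustrative assumptions, not part of the lecture note, and the model
assumes an ideal pipeline in which no stage ever stalls.

```python
def non_pipelined_cycles(n_instructions: int, n_stages: int = 3) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int = 3) -> int:
    """The first instruction needs n_stages cycles; each later one finishes one cycle after."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def timeline(n_instructions: int, stages=("F", "D", "E")) -> list:
    """One text row per instruction; column c shows the stage active in cycle c + 1."""
    total = pipelined_cycles(n_instructions, len(stages))
    rows = []
    for i in range(n_instructions):
        cells = ["."] * total
        for s, name in enumerate(stages):
            cells[i + s] = name  # instruction i enters stage s in cycle i + s + 1
        rows.append(" ".join(cells))
    return rows


if __name__ == "__main__":
    print(non_pipelined_cycles(5), pipelined_cycles(5))
    for row in timeline(5):
        print(row)
```

Each printed row is one instruction, with F, D and E marking the fetch, decode and
execute cycles, so the overlap described for Figure 7b becomes visible as a diagonal.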

Disadvantages of Pipelining/Pipeline Conflicts

1. Resource Hazards. These occur when two or more instructions that are already in the
pipeline need the same resource. For example, when an instruction is storing a value to memory
while another value is being fetched from memory, both need access to memory, and this results
in a conflict. A resource hazard can also occur when multiple instructions are ready to enter
the execute phase and there is only a single ALU. This can be taken care of in two ways:
(1) allowing instruction execution to continue while the instruction fetch waits, or
(2) providing more resources, such as multiple ports into main memory and multiple ALUs.

2. Data Hazards. These happen when the result of one instruction, not yet available, is to be
used as an operand by a following instruction. There is a conflict in the access of an operand
location, i.e. two or more instructions access a particular register or memory operand
{NB: in sequential processing this is not a problem, but with overlapped execution the values
read may be different}. This can be resolved by altering the order of execution in the program,
or by specialized hardware that detects the conflict and routes data through special paths
that exist between the various stages of the pipeline, thereby reducing the time needed for the
instruction to access the required operand.
3. Control Hazards. These occur when the pipeline makes the wrong decision on a branch
prediction and brings a wrong instruction into the pipe. A conditional branch instruction
makes the address of the next instruction to be fetched unknown, so after a conditional
branch, predicting the instruction that will be needed next becomes a problem. This may be
overcome by (i) rearranging the machine code to cause a delayed branch, or (ii) fetching both
the next sequential instruction and the branch target instruction at the same time and holding
them until the branch is resolved, at which time the true execution path will be known.
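
As an illustration of how a data hazard can be detected, the sketch below scans a short
program for an instruction that reads a register written by a recent predecessor. The
`Instr` record, the `raw_hazards` function and the `window` parameter (how many earlier
instructions are assumed to be still in flight) are hypothetical names chosen for this
example; real processors perform this check in hardware.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    op: str
    dest: Optional[str]  # register written by the instruction, or None
    srcs: tuple          # registers read by the instruction


def raw_hazards(program, window=2):
    """Return (producer_index, consumer_index, register) triples.

    A hazard is reported when an instruction reads a register that one of the
    `window` preceding instructions writes. The window size is an assumption.
    """
    found = []
    for j, consumer in enumerate(program):
        for i in range(max(0, j - window), j):
            producer = program[i]
            if producer.dest is not None and producer.dest in consumer.srcs:
                found.append((i, j, producer.dest))
    return found


program = [
    Instr("ADD", "R3", ("R1", "R2")),  # R3 <- R1 + R2
    Instr("SUB", "R5", ("R3", "R4")),  # needs R3 while ADD may still be in the pipeline
    Instr("AND", "R6", ("R1", "R4")),  # independent of the two instructions above
]

if __name__ == "__main__":
    print(raw_hazards(program))
```

Once such a dependence is found, the pipeline can either delay the consumer or route the
result directly to it through the special paths mentioned under data hazards.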

History/ Evolution of Microprocessors

The first Microprocessor (4004) was designed by Intel Corporation which was founded by Moore
and Noyce in 1968.
In the early years, Intel focused on developing semiconductor memories (DRAMs and EPROMs)
for digital computers.
In 1969, a Japanese calculator manufacturer, Busicom, approached Intel with a design for a small
calculator which needed 12 custom chips. Ted Hoff, an Intel engineer, thought that a general
purpose logic device could replace the multiple components.
This idea led to the development of the first so-called microprocessor. So, microprocessors
started with a modest beginning as drivers for calculators.

With developments in integration technology Intel was able to integrate the additional chips like
8224 clock generator and the 8228 system controller along with 8080 microprocessor within a
single chip and released the 8 bit microprocessor 8085 in the year 1976.
The 8085 microprocessor consisted of 6500 MOS transistors and could work at clock frequencies
of 3-5MHz. The other improved 8 bit microprocessors include Motorola MC 6809, Zilog Z-80 and
RCA COSMAC.

In 1978, Intel introduced the 16-bit microprocessor 8086, followed by the 8088 in 1979. IBM
selected the Intel 8088 for their personal computer (IBM-PC). The 8086 microprocessor was made
up of 29,000 MOS transistors and could work at a clock speed of 5-10 MHz. It has a 16-bit ALU
with a 16-bit data bus and a 20-bit address bus. It can address up to 1 MB of address space.
The pipelining concept was used for the first time to improve the speed of the processor. It had
a pre-fetch queue of 6 instruction bytes, into which the instructions to be executed next were
fetched during the execution of the current instruction. This means the 8086 architecture
overlaps instruction fetch with execution, a limited form of parallel processing.

The 8088 microprocessor is similar to 8086 processor in architecture, but the basic difference is it
has only 8-bit data bus even though the ALU is of 16-bit.
In 1982 Intel released another 16-bit processor called the 80186, designed by a team under the
leadership of Dave Stamm. It offered higher reliability and faster operational speed at a
lower cost. It had a pre-fetch queue of 6 instruction bytes and was suitable for high volume
applications such as computer workstations, word processors and personal computers.
It is made up of 134,000 MOS transistors and could work at clock rates of 4-6 MHz.

Intel released another 16-bit microprocessor, the 80286, with 134,000 transistors in 1982. It was
used as the CPU in the IBM PC-AT, introduced in 1984. It is the second generation microprocessor,
more advanced than the 80186 processor. It could run at clock speeds of 6 to 12.5 MHz. It has a
16-bit data bus and a 24-bit address bus, so that it can address up to 16 MB of address space and
1 GB of virtual memory. Intel introduced the concept of protected mode and virtual mode to ensure
proper operation. It also had an on-chip memory management unit (MMU). It was popularly called
the Intel 286 in those days.

In 1985, Intel released the first 32-bit processor, the 80386, with 275,000 transistors. It has a
32-bit data bus and a 32-bit address bus, so it can address up to 4 GB of memory and a virtual
memory space of 64 TB. It could process five million instructions per second and could work with
all popular operating systems including Windows. It incorporates paging in addition to the
segmentation technique. It uses a math co-processor called the 80387.

Intel introduced the 80486 microprocessor with a built-in maths co-processor and with 1.2 million
transistors. It could run at a clock speed of 50 MHz. This is also a 32-bit processor but it is
twice as fast as the 80386. The additional features in the 486 processor are the built-in cache
and the built-in math co-processor. The address bus here is bidirectional because of the presence
of cache memory.

On 19th October, 1992, Intel released the Pentium-I Processor with 3.1 million transistors. So, the
Pentium began as fifth generation of the Intel x86 architecture. This Pentium was backward
compatible while offering new features. The revolutionary technology is that the CPU is able to
execute two instructions at the same time. This is known as super scalar technology. The Pentium
uses a 32-bit expansion bus, however the data bus is 64 bits.

The 7.5 million transistors based chip, Intel Pentium II processor was released in 1997. It works
at a clock speed of [Link]. Pentium II uses the Dynamic Execution Technology which consists
of three different facilities namely, Multiple branch prediction, Data flow analysis, and
Speculative execution unit. Another important feature is a thermal sensor located on the
motherboard which monitors the die temperature of the processor.
Intel Celeron Processors, the Pentium-III processor with 9.5 million transistors was introduced in
1999. It uses dynamic execution micro-architecture, a unique combination of multiple branch
prediction, dataflow analysis and speculative execution.
The Pentium III has improved MMX and processor serial number feature. The improved MMX
enables advanced imaging, 3D streaming audio and video, and speech recognition and enhanced
Internet facility.

Pentium-IV with 42 million transistors and 1.5 GHz clock speed was released by Intel in
November 2000. The Pentium-IV processor has a system bus with 3.2 G-bytes per second of
bandwidth. This high bandwidth is a key benefit for applications that stream data from memory.
This bandwidth is achieved with a 64-bit wide bus capable of transferring data at a rate of
400 million transfers per second (8 bytes x 400 million = 3.2 G-bytes per second). The
Pentium-IV processor enables real-time MPEG2 video encoding and near real-time
MPEG4 encoding, allowing efficient video editing and video conferencing.

Intel, with partner Hewlett-Packard, developed the next generation 64-bit processor architecture
called IA-64. The first implementation was named Itanium. The Itanium processor which is the first
in a family of 64 bit products was introduced in the year [Link] Itanium processor was
specially designed to provide a very high level of parallel processing, to enable high performance
without requiring very high clock frequencies. The Itanium processor can handle up to 6
simultaneous 64 –bit instructions per clock cycle.

The Itanium II is an IA-64 microprocessor developed jointly by Hewlett-Packard (HP) and Intel
and released on July 8, 2002. It is theoretically capable of performing nearly 8 times more work
per clock cycle than other CISC and RISC architectures due to its parallel computing micro-
architecture.
Pentium 4EE was released by Intel in the year 2003 and Pentium 4E was released in the year 2004.

The Pentium Dual-Core brand was used for mainstream X86-architecture microprocessors from
Intel from 2006 to 2009. The 64 bit Intel Core2 was released on July 27, 2006. In terms of features,
price and performance at a given clock frequency, Pentium Dual Core processors were positioned
above Celeron but below Core and Core 2 microprocessors in Intel's product range.

The Pentium Dual Core, which consists of 167 million transistors was released on January 21,
2007. Intel Core Duo consists of two cores on one die, a 2 MB L2 cache shared by both cores, and
an arbiter bus that controls L2 cache.
Core 2 Quad processors are multi-chip modules consisting of two dies similar to those used in
Core 2 Duo, forming a quad-core processor.
In September 2009, new Core i7 models based on the Lynnfield desktop quad-core processor and
the Clarksfield quad-core mobile processor were added. The first six-core processor in the Core
lineup is the Gulftown, which was launched on March 16, 2010. Both the regular Core i7 and the
Extreme Edition are advertised as five stars in the Intel Processor Rating.

Features of 8086 Microprocessor

– It is a 16-bit microprocessor.
– The 8086 has a 20-bit address bus and can access up to 2^20 memory locations (1 MB).
– It can support up to 64K I/O ports.
– It provides fourteen 16-bit registers.
– It has a multiplexed address and data bus, AD0-AD15 and A16-A19.
– It requires a single-phase clock with a 33% duty cycle to provide internal timing.
– The 8086 is designed to operate in two modes, Minimum and Maximum.
– It prefetches up to 6 instruction bytes from memory and puts them in an instruction queue in
order to speed up instruction execution.
– It requires a +5V power supply.
– It comes in a 40-pin dual in-line package.

The 8086 employs parallel processing. It has 2 parts which operate at the same time: the bus
interface unit (BIU) and the execution unit (EU), as seen in Figure 8 below.

Figure 8: Illustrating Parallel processing

Bus Interface Unit (BIU):

– The BIU performs all bus operations such as instruction fetching, reading and writing
operands for memory and calculating the addresses of the memory operands.
– The instruction bytes are transferred to the instruction queue.
– It provides a full 16 bit bidirectional data bus and 20 bit address bus.
– The bus interface unit is responsible for performing all external bus operations.
Specifically it has the following functions:
– Instruction fetch, instruction queuing, operand fetch and storage, address calculation and
relocation, and bus control.
– The BIU uses a mechanism known as an instruction queue to implement a pipeline architecture.
The BIU contains the following registers:

IP - the Instruction Pointer


CS - the Code Segment Register
DS - the Data Segment Register
SS - the Stack Segment Register
ES - the Extra Segment Register

The BIU fetches instructions using the CS and IP, written CS:IP, to construct the 20-bit address.
Data is fetched using a segment register (usually the DS) and an effective address (EA) computed
by the EU depending on the addressing mode.
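
The way a 16-bit segment register and a 16-bit offset combine into a 20-bit address can be
sketched as follows. On the 8086 the segment value is shifted left by four bits and the
offset is added; the function name and the sample values are illustrative assumptions.

```python
def physical_address(segment: int, offset: int) -> int:
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit physical address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # Shift the segment left by 4 bits (multiply by 16), add the offset,
    # and keep only the 20 bits that fit on the address bus.
    return ((segment << 4) + offset) & 0xFFFFF


if __name__ == "__main__":
    cs, ip = 0x1234, 0x0010          # sample CS:IP pair, chosen for illustration
    print(hex(physical_address(cs, ip)))
    print(2 ** 20)                   # number of locations reachable with 20 address bits
```

The same calculation applies to data accesses, with DS (or another segment register) in
place of CS and the effective address in place of IP.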

Execution Unit (EU):


– The Execution unit is responsible for decoding and executing all instructions.

– The EU extracts instructions from the top of the queue in the BIU, decodes them, generates
operand addresses if necessary, passes them to the BIU and requests it to perform the read or
write bus cycles to memory or I/O, and performs the operation specified by the instruction on
the operands.

– During the execution of the instruction, the EU tests the status and control flags and updates
them based on the results of executing the instruction.
– If the queue is empty, the EU waits for the next instruction byte to be fetched and shifted to top
of the queue.
– When the EU executes a branch or jump instruction, it transfers control to a location
corresponding to another set of sequential instructions.
– Whenever this happens, the BIU automatically resets the queue and then begins to fetch
instructions from this new location to refill the queue.

MODULE TWO

INSTRUCTION SET ARCHITECTURE (ISA)

An instruction set architecture (ISA), is the part of the computer architecture related to
programming, including the native data types, instructions, registers, addressing modes,
memory architecture, interrupt and exception handling, and external I/O. The ISA also includes
a specification of the set of opcodes (machine language) - the native commands for a particular
processor. The ISA is the hardware-software interface.

Figure 8: Illustrating the ISA

An instruction set is a list of all the instructions that a processor can execute.

Typical Categories of Instructions are:


• Arithmetic - add, subtract
• Logic - and, or and not
• Data movement - move, input, output, load and store
• Control flow - goto, if ... goto, call and return.

An instruction is a form of control code, which supplies the information about an
operation and the data on which the operation is to be performed. Each instruction
consists of several elements. An instruction element is a unit of information required by
the CPU for execution. One thing which should be kept in mind is that the instruction set
is a boundary which is looked upon in the same fashion by a computer designer and the
programmer. From the computer designer’s point of view, the instruction set provides
the functional requirements of the CPU.

Features of Instruction Set

• Instructions and Instruction Formats

• Data Types, Encodings, and Representations

• Programmable Storage: Registers and Memory

• Addressing Modes: Accessing Instructions and Data

ISA is at the interface between software and hardware. It is an abstraction which hides
hardware complexity from software through a set of operations and devices. One of the
crucial features of any processor is its instruction set, i.e. the set of machine code
instructions that the processor can carry out. Each processor has its own unique instruction
set specifically designed to make best use of the capabilities of that processor. The actual
number of instructions provided ranges from a few dozen for a simple 8-bit
microprocessor to several hundred for a 32-bit (virtual address extension) VAX processor.
However, it should be pointed out that a large instruction set does not necessarily imply a
more powerful processor.

Instruction set architecture (ISA) describes the processor in terms of what the
programmer sees, i.e. the instructions and registers. Two machines may have the same
ISA, but different organizations. Organization is concerned with the internal design of
the processor, the design of the bus system and its interfaces, the design of memory and
so on. Two machines with the same organization may have different hardware
implementations.

The components/elements of an instruction

i. An operation code, also termed an opcode, which specifies the operation to be
performed.

ii. A reference to the operands on which data processing is to be
performed. For example, an address of an operand.
iii. A reference to the operands which store the results of data processing operations
performed by the instruction.

iv. A reference for the next instruction, to be fetched and executed. The next
instruction which is to be executed is normally the next instruction following the current
instruction in the memory. Therefore, no explicit reference to the next instruction is
provided.

Where are those operands located? In the memory, in the CPU registers, or in the I/O
device.

If the operands are located in the registers then an instruction can be executed faster than
that of the operands located in the memory. The main reason here is that memory access
time is higher in comparison to the register access time.

How is an instruction represented? Instructions are represented as a sequence of bits. An
instruction is divided into a number of fields. Each of these fields corresponds to a
constituent element of the instruction. The layout of an instruction is known as the
instruction format.
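
To make the idea of fields concrete, the sketch below packs an instruction into a 16-bit
word and unpacks it again. The format (a 4-bit opcode followed by three 4-bit register
fields) and the opcode values are invented for illustration and do not describe any real
processor.

```python
# Hypothetical opcode table for the invented 16-bit format.
OPCODES = {"ADD": 0x1, "SUB": 0x2}
MNEMONICS = {value: name for name, value in OPCODES.items()}


def encode(op: str, src1: int, src2: int, dest: int) -> int:
    """Pack opcode and three register numbers into bits 15-12, 11-8, 7-4 and 3-0."""
    for reg in (src1, src2, dest):
        if not 0 <= reg <= 15:
            raise ValueError("register numbers must fit in 4 bits")
    return (OPCODES[op] << 12) | (src1 << 8) | (src2 << 4) | dest


def decode(word: int):
    """Split a 16-bit instruction word back into its four fields."""
    opcode = (word >> 12) & 0xF
    return MNEMONICS[opcode], (word >> 8) & 0xF, (word >> 4) & 0xF, word & 0xF


if __name__ == "__main__":
    word = encode("ADD", 1, 2, 3)    # ADD R1, R2, R3
    print(f"{word:016b}")            # the instruction as a sequence of bits
    print(decode(word))
```

Widening a field (for example to hold a memory address instead of a register number)
lengthens the instruction, which is why instruction length and the number of addresses
are design issues for an instruction set.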

Categories of Instructions/Instruction types

i. Data Processing Instructions: These instructions are used for arithmetical and logic
operations in a machine. Examples of data processing instructions are: Arithmetic,
Boolean, shift, character and string processing instructions, stack and register
manipulation instructions, vector instructions, etc.

ii. Data Storage/Retrieval Instructions: Since the data processing operations are
normally performed on the data stored in CPU registers, we need instructions to bring
data to and from memory to registers. These are called data storage/retrieval instructions.
Examples of data storage and retrieval instructions are load and store instructions.

iii. Data Movement Instructions: These are basically input/output instructions. They
are required to bring in programs and data from various devices to memory or to
communicate the results to the input/output devices. Some of these instructions can be:
start, halt, test etc.
iv. Control Instructions: These instructions are used for testing the status of
computation through the Processor Status Word (PSW). An example is the branch instruction.

v. Miscellaneous Instructions: These instructions do not fit in any of the above
categories. Some of these instructions are: interrupt or supervisory call, swapping, return
from interrupt, halt instruction or some privileged instruction of operating systems.

Factors to consider in designing the instruction set for a machine

Instruction set design is the most complex yet interesting and very much analyzed aspect
of computer design. The instruction set plays an important role in the design of the CPU
as it defines many of its functions. Since instruction sets are the means by which a
programmer can control the CPU, users’ views must be considered while
designing the instruction set.

Some of the important design issues relating to instruction design are:

i. How many and what operations should be provided?

ii. What are the operand data types to be provided?

iii. What should be the instruction format? This includes issues like:
- instruction length,

- number of addresses,

- length of various elements of instructions, etc.

iv. What is the number of registers which can be referenced by an instruction and how are they
used?

v. What are the modes of specifying an operand address?

Operand Data Types


An operand data type specifies the type of data on which a particular operation can be
performed. For example, for an arithmetical operation, numbers are to be used as data
types. In general the operands which can be used in an instruction can be categorised into
four general
categories.

These are:

- Addresses
- Numbers
- Characters
- Logical data

Addresses: Addresses are treated as a form of data which is used in the calculation of
actual physical memory address of an operand. In most of the cases, the addresses
provided in instruction are operand references and not the actual physical memory
addresses.

Numbers: All machines provide numeric data types. One special feature of numbers used
in computers is that they are limited in magnitude, and hence the underflow and overflow
may occur during arithmetical operations on these numbers. The maximum and
minimum magnitude is fixed for an integer number while a limit of precision of numbers
and exponent exist in the floating point numbers. The three numeric data types which are
common in computers are:

- Fixed point numbers or Integers (signed or unsigned)


- Floating point numbers
- Decimal numbers
All the machines provide instructions for performing arithmetical operations on fixed
point and floating point numbers. Many machines provide arithmetical instructions
which perform operations on packed decimal digits.
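
The limited magnitude of fixed point numbers can be demonstrated with a short sketch. It
assumes 8-bit words and two's complement for signed values; both function names are
illustrative.

```python
def add_unsigned(a: int, b: int, bits: int = 8):
    """Add two unsigned values; return (stored result, carry-out flag)."""
    mask = (1 << bits) - 1
    total = a + b
    return total & mask, total > mask


def add_signed(a: int, b: int, bits: int = 8):
    """Add two two's complement values; return (stored result, overflow flag)."""
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    total = a + b
    # Wrap the true sum back into the representable range, as the hardware would.
    wrapped = (total - lowest) % (1 << bits) + lowest
    return wrapped, not (lowest <= total <= highest)


if __name__ == "__main__":
    print(add_unsigned(200, 100))   # true sum 300 does not fit in 8 bits
    print(add_signed(100, 50))      # true sum 150 exceeds +127
    print(add_signed(20, 30))       # fits, so no overflow is flagged
```

In each overflowing case the stored result differs from the true sum, which is why
processors record the condition in a status flag that later instructions can test.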

Characters: Another very common data type is the character or string of characters. The
most widely used character representation is ASCII (American Standard Code for
Information Interchange). It has 7 bits for coding data patterns, which implies 128 different
characters.

Some of these characters are control characters which may be used in data
communication. The eighth bit of ASCII may be used as a parity bit. One special feature
of ASCII, which facilitates conversion between 7-bit ASCII and 4-bit packed decimal
numbers, is that the last four bits of the ASCII codes for the digits 0-9 are the binary
equivalents of those digits.
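
The conversion from ASCII digits to packed decimal can be sketched as below. The function
name is hypothetical, and the choice to pad an odd number of digits with a leading zero
is an assumption made for this example.

```python
def ascii_digits_to_packed_bcd(text: str) -> bytes:
    """Pack a string of ASCII digits two per byte, one digit in each 4-bit half."""
    if not (text.isascii() and text.isdigit()):
        raise ValueError("expected ASCII digits only")
    if len(text) % 2:
        text = "0" + text            # pad so the digits pair up evenly
    # The low four bits of each ASCII digit code are the digit's binary value.
    nibbles = [ord(ch) & 0x0F for ch in text]
    pairs = zip(nibbles[0::2], nibbles[1::2])
    return bytes((high << 4) | low for high, low in pairs)


if __name__ == "__main__":
    print(hex(ord("7")))                              # ASCII code of the digit 7
    print(ascii_digits_to_packed_bcd("1234").hex())   # two bytes, one digit per half
```

Masking with 0x0F is all that is needed because the ASCII codes for 0-9 run from 0x30 to
0x39, so only the upper bits have to be discarded.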
Logical Data: In general a data word or any other addressable unit such as byte, half word
etc. are treated as a single unit of data. But can we consider an n-bit data unit consisting
of n items of 1 bit each? If we treat each bit of an n-bit data as an item then it can be
considered to be logical data. Each of these n items can have a value 0 or 1. What are the
advantages of such a bit oriented view of data? The advantages of such a view will be:

i. We can store an array of Boolean or binary data items most efficiently.
ii. We will be in a position to manipulate the bits of any data item.
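
Both advantages can be seen in a small sketch that treats one integer as an array of
1-bit items. The helper names and the bit positions used are illustrative assumptions.

```python
def set_bit(word: int, position: int) -> int:
    """Return word with the bit at `position` set to 1 (logical OR with a mask)."""
    return word | (1 << position)


def clear_bit(word: int, position: int) -> int:
    """Return word with the bit at `position` cleared to 0 (logical AND with inverted mask)."""
    return word & ~(1 << position)


def is_set(word: int, position: int) -> bool:
    """Report whether the bit at `position` is 1."""
    return (word >> position) & 1 == 1


if __name__ == "__main__":
    flags = 0                        # eight Boolean items stored in one byte
    flags = set_bit(flags, 0)
    flags = set_bit(flags, 5)
    print(f"{flags:08b}", is_set(flags, 5), is_set(flags, 1))
    flags = clear_bit(flags, 0)
    print(f"{flags:08b}")
```

These operations correspond directly to the logic instructions (and, or, not) listed
among the typical categories of instructions.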

Instruction Format

Information involved in any operation performed by the CPU needs to be addressed. In
computer terminology, such information is called the operand.

Therefore, any instruction issued by the processor must carry at least two types of
information.
These are the operation to be performed, encoded in what is called the op-code field, and
the address information of the operand on which the operation is to be performed,
encoded in what is called the address field.

Based on the number of operands, instructions can be classified as:

i. three-address,
ii. two-address,
iii. one-and-half-address,
iv. one-address, and
v. zero-address.

The convention operation, source, destination is used to express an instruction, wherein
operation represents the operation to be performed, for example, Add, Subtract, Write,
or Read.
The source field represents the source operand(s). The source operand can be a constant,
a value stored in a register, or a value stored in the memory. The destination field
represents the place where the result of the operation is to be stored, for example, a
register or a memory location.

• A three-address instruction takes the form operation add-1, add-2, add-3.

In this form, each of add-1, add-2, and add-3 refers to a register or to a memory location.
Consider, for example, the instruction ADD R1, R2, R3.
This instruction indicates that the operation to be performed is addition. It also indicates
that the values to be added are those stored in registers R1 and R2, that the results should
be stored in register R3.
An example of a three-address instruction that refers to memory locations may take the
form ADD A,B,C. The instruction adds the contents of memory location A to the contents
of memory location B and stores the result in memory location C.

• A two-address instruction takes the form operation add-1, add-2.


In this form, each of add-1 and add-2 refers to a register or to a memory location.
Consider, for example, the instruction
ADD R1, R2. This instruction adds the contents of register R1 to the content of R2 and
stores the results in register R2.

The original contents of register R2 are lost due to this operation while the original
contents of register R1 remain intact.
A similar instruction that uses memory locations instead of registers can take the form
ADD A, B. In this case, the contents of memory location A are added to t he contents of
memory location B and the result is used to override the original contents of memory
location B.

• A one-address instruction takes the form ADD R1.

In this case the instruction implicitly refers to a register, called the accumulator (Racc),
such that the contents of the accumulator are added to the contents of register R1 and
the result is stored back into the accumulator Racc.
If a memory location is used instead of a register, then an instruction of the form ADD B
is used. In this case, the instruction adds the content of the accumulator Racc to the
content of memory location B and stores the result back into the accumulator Racc.

• Between the two-address and the one-address instruction, there can be a
one-and-half-address instruction.

Consider, for example, the instruction ADD B, R1. In this case, the instruction adds the
contents of register R1 to the contents of memory location B and stores the result in
register R1.

The instruction uses two types of addressing, that is, a register and a memory location.
Because register addressing needs fewer bits than memory addressing, the register
operand counts as half an address, and the instruction is called a one-and-half-address
instruction.

• Zero-address instructions.

These are the instructions that use stack operations. A stack is a data organization
mechanism in which the last data item stored is the first data item retrieved. Two specific
operations can be performed on a stack. These are the push and the pop operations. A
special register, called the stack pointer (SP), is used to indicate the stack location that
can be addressed. The classes of instructions are summarized in the table below.

Table 3: Instruction Classification
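
The five instruction classes can be compared side by side with a small simulation. The following is a minimal sketch in Python, not part of the original notes: the register values, memory contents and function names (add3, add2, add1_5, add1, add0) are hypothetical and chosen only for illustration.

```python
# Hypothetical toy machine: registers, memory and a stack as plain containers.
regs = {"R1": 5, "R2": 7, "R3": 0, "Racc": 10}
mem = {"A": 1, "B": 2, "C": 0}
stack = []


def add3(src1, src2, dst):
    """Three-address: ADD R1, R2, R3 stores R1 + R2 in R3."""
    regs[dst] = regs[src1] + regs[src2]


def add2(src, dst):
    """Two-address: ADD R1, R2 stores R1 + R2 in R2 (the old R2 is lost)."""
    regs[dst] = regs[src] + regs[dst]


def add1_5(mem_src, reg_dst):
    """One-and-half-address: ADD B, R1 stores M[B] + R1 in R1."""
    regs[reg_dst] = mem[mem_src] + regs[reg_dst]


def add1(src):
    """One-address: ADD R1 stores Racc + R1 in the accumulator."""
    regs["Racc"] = regs["Racc"] + regs[src]


def push(value):
    """Place a value on top of the stack."""
    stack.append(value)


def pop():
    """Remove and return the value on top of the stack."""
    return stack.pop()


def add0():
    """Zero-address: ADD pops two operands and pushes their sum."""
    push(pop() + pop())


add3("R1", "R2", "R3")   # R3 = 5 + 7
add2("R1", "R2")         # R2 = 5 + 7
add1_5("B", "R1")        # R1 = 2 + 5
add1("R1")               # Racc = 10 + 7
push(3)
push(4)
add0()                   # top of stack = 3 + 4
```

Note how the number of names written in each call shrinks from three to none, while the accumulator and the stack take over as implicit operands.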


MEMORY OPERATIONS

The main memory can be modeled as an array of millions of adjacent cells, each
capable of storing a binary digit (bit), having value of 1 or 0. These cells are organized in
the form of groups of fixed number, say n, of cells that can be dealt with as an entity.

An entity consisting of 8 bits is called a byte. The n-bit entity that is stored or retrieved
in one basic memory operation is referred to here as a word, and each word is assigned a
distinct address. This address is used to determine the location in the memory in which a
given word is to be stored. This is called a memory WRITE operation. Similarly, the address
is used to determine the memory location from which a word is to be retrieved. This is
called a memory READ operation.
During a memory write operation a word is stored into a memory location whose address
is specified. During a memory read operation a word is read from a memory location
whose address is specified. Typically, memory read and memory write operations are
performed by the central processing unit (CPU). Three basic steps are needed for the
CPU to perform a write operation into a specified memory location:

1. The word to be stored into the memory location is first loaded by the CPU into a
specified register, called the memory data register (MDR).

2. The address of the location into which the word is to be stored is loaded by
the CPU into a specified register, called the memory address register (MAR).

3. A WRITE signal is issued by the CPU indicating that the word stored in the MDR is to
be stored in the memory location whose address is loaded in the MAR.

Similar to the write operation, three basic steps are needed in order to perform a memory
read operation:

1. The address of the location from which the word is to be read is loaded into the
MAR.

2. A READ signal is issued by the CPU indicating that the word whose address is in
the MAR is to be read into the MDR.

3. After some time, corresponding to the memory delay in reading the specified
word, the required word will be loaded by the memory into the MDR ready for
use by the CPU.
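
The write and read steps above can be sketched as a small Python class. This is an illustrative model only: the class name SimpleMemory and its attributes are hypothetical, and the memory delay of the read operation is not modelled.

```python
class SimpleMemory:
    """Toy main memory accessed only through the MAR and the MDR."""

    def __init__(self, size):
        self.cells = [0] * size  # one word per address
        self.mar = 0             # memory address register
        self.mdr = 0             # memory data register

    def write(self, address, word):
        self.mdr = word                   # step 1: word goes into the MDR
        self.mar = address                # step 2: address goes into the MAR
        self.cells[self.mar] = self.mdr   # step 3: WRITE signal

    def read(self, address):
        self.mar = address                # step 1: address goes into the MAR
        self.mdr = self.cells[self.mar]   # steps 2-3: READ signal, word arrives in the MDR
        return self.mdr


memory = SimpleMemory(16)
memory.write(5, 0x1234)
value = memory.read(5)   # the word stored at address 5
```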

FETCH-EXECUTE CYCLE

Fetch and execute are the fundamental operations of the processor. The
fetch-decode-execute cycle represents the steps that a computer follows to run a program.
The program to be executed is a set of instructions stored in the memory; hence, the CPU
executes the instructions that it finds in the computer’s memory. In order to execute an
instruction:

- the CPU must first fetch (transfer) the instruction from memory into one of its
registers;
- the CPU then decodes the instruction, i.e. it decides which instruction has been
fetched; and
- finally it executes (carries out) the instruction.

The CPU then repeats this procedure, i.e. it fetches an instruction, decodes and executes
it. This process is repeated continuously and is known as the fetch-execute cycle.

This cycle begins when the processor is switched on and continues until the CPU is halted
(via a halt instruction, e.g. the 8086 HLT instruction) or the machine is switched off. The
fetch-execute cycle operates by first fetching an instruction.

Instruction Fetch

An instruction fetch involves the reading of an instruction from the memory location(s)
into the CPU. The execution of this instruction may involve several operations, depending
on the nature of the instruction. The processing needed for a single instruction (fetch and
execution) is referred to as an instruction cycle.

- The Program Counter (PC) keeps track of the instruction that is to be executed next
after the execution of an on-going instruction, i.e. the PC always contains the address
of the next instruction to be executed. The PC is used during the fetch cycle in a
typical CPU.
- The instructions are loaded into the Instruction Register (IR) before their
execution, i.e. the IR holds the instruction to be executed.

Figure 9: The Fetch Execute Cycle

Instruction Execution

Instruction execution makes use of the CPU registers. The following are CPU
registers:

• Memory Address Register (MAR): It specifies the address of the memory location
from which the data or instruction is to be accessed (for a read operation) or to
which the data is to be stored (for a write operation).

• Memory Buffer Register (MBR): It contains the data to be written to the memory
(for a write operation) or receives the data from the memory (for a read operation).
It is the register referred to earlier as the memory data register (MDR).

• Program Counter (PC): It keeps track of the instruction that is to be executed next,
after the execution of an on-going instruction.

• Instruction Register (IR): The instructions are loaded here before their execution.
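
The interplay of the PC and the IR in the fetch-execute cycle can be sketched in Python. This is a minimal illustration under stated assumptions: the three-instruction set (LOAD, ADD, HALT), the tuple encoding and the function name run are hypothetical and do not correspond to any real processor.

```python
def run(program):
    """Run a list of (opcode, operand) tuples until HALT; return the accumulator."""
    pc = 0     # program counter: index of the next instruction
    acc = 0    # accumulator
    while True:
        ir = program[pc]       # fetch: copy the instruction into the IR
        pc += 1                # the PC now points to the next instruction
        opcode, operand = ir   # decode: decide which instruction was fetched
        if opcode == "LOAD":   # execute
            acc = operand
        elif opcode == "ADD":
            acc += operand
        elif opcode == "HALT":
            return acc
        else:
            raise ValueError("unknown opcode: " + str(opcode))


result = run([("LOAD", 2), ("ADD", 3), ("ADD", 4), ("HALT", None)])
```

The loop only stops at HALT, mirroring the way the cycle continues until the CPU is halted.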

THE EVOLUTION OF INTEL X86 ARCHITECTURE


In terms of market share, Intel has ranked as the number one maker of microprocessors for
non-embedded systems for decades.

It is worthwhile to list some of the highlights of the evolution of the Intel product line:

- 8080: The world’s first general-purpose microprocessor. This was an 8-bit machine,
with an 8-bit data path to memory. The 8080 was used in the first personal computer, the
Altair.

- 8086: A far more powerful, 16-bit machine. In addition to a wider data path and
larger registers, the 8086 sported an instruction cache, or queue, that prefetches a few
instructions before they are executed. A variant of this processor, the 8088, was used in
IBM’s first personal computer, securing the success of Intel.

- 80286: This extension of the 8086 enabled addressing a 16-MB memory instead of just 1
MB.

- 80386: Intel’s first 32-bit machine, and a major overhaul of the product. With a 32-bit
architecture, the 80386 rivaled the complexity and power of minicomputers and
mainframes introduced just a few years earlier. This was the first Intel processor to support
multitasking, meaning it could run multiple programs at the same time.

- 80486: The 80486 introduced the use of much more sophisticated and powerful cache
technology and sophisticated instruction pipelining. The 80486 also offered a built-in math
coprocessor, offloading complex math operations from the main CPU.

- Pentium: With the Pentium, Intel introduced the use of superscalar techniques, which
allow multiple instructions to execute in parallel.

- Pentium Pro: The Pentium Pro continued the move into superscalar organization begun
with the Pentium, with aggressive use of register renaming, branch prediction, data flow
analysis, and speculative execution.

- Pentium II: The Pentium II incorporated Intel MMX technology, which is designed
specifically to process video, audio, and graphics data efficiently.

- Pentium III: The Pentium III incorporates additional floating point instructions: the
Streaming SIMD Extensions (SSE) instruction set extension added 70 new instructions
designed to increase performance when exactly the same operations are to be performed
on multiple data objects. Typical applications are digital signal processing and graphics
processing.

- Pentium 4: The Pentium 4 includes additional floating point and other enhancements for
multimedia.

- Core: This is the first Intel x86 microprocessor with a dual core, referring to the
implementation of two cores on a single chip.

- Core 2: The Core 2 extends the Core architecture to 64 bits. The Core 2 Quad provides
four cores on a single chip. More recent Core offerings have up to 10 cores per chip. An
important addition to the architecture was the Advanced Vector Extensions instruction
set that provided a set of 256-bit, and then 512-bit, instructions for efficient processing of
vector data.

Although the organization and technology of the x86 machines have changed dramatically
over the decades, the instruction set architecture has evolved to remain backward
compatible with earlier versions. Thus, any program written on an older version of the x86
architecture can execute on newer versions.

MODULE THREE

REGISTERS, TYPES OF REGISTERS, 80X86 PROGRAMMING MODEL.

Registers are extremely fast memory locations within the CPU that are used to create and
store the results of CPU operations and other calculations. Computers differ in register
sets, number of registers, register types, and the length of each register. They also differ in
the usage of each register.

General-purpose registers can be used for multiple purposes and assigned to a variety of
functions by the programmer.

Special-purpose registers are restricted to only specific functions. In some cases, some
registers are used only to hold data and cannot be used in the calculations of operand
addresses.

The internal registers of the 8086 are shown in Figure 10 below.

Figure 10: 8086 Internal Registers

• The 8086 has the following groups of user-accessible internal registers:
- Instruction pointer (IP)
- Four general purpose registers (AX, BX, CX, DX)
- Four pointer and index registers (SP, BP, SI, DI)
- Four segment registers (CS, DS, SS, ES)
- Flag register (FR)

• The 8086 has a total of fourteen 16-bit registers, including a 16-bit register called the
status register (flag register), with 9 of its bits implemented as status and control flags.

Segment Registers
1) Code segment (CS) is a 16-bit register containing the address of a 64 KB segment with
processor instructions. The processor uses the CS segment for all accesses to instructions
referenced by the instruction pointer (IP) register.

2) Stack segment (SS) is a 16-bit register containing the address of a 64 KB segment with
the program stack. By default, the processor assumes that all data referenced by the stack
pointer (SP) and base pointer (BP) registers is located in the stack segment. The SS register
can be changed directly using the POP instruction.

3) Data segment and extra segment (DS and ES) are 16-bit registers, each containing the
address of a 64 KB segment with program data. By default, the processor assumes that all
data referenced by the general registers (AX, BX, CX, and DX) and the index registers
(SI, DI) is located in the data segment; the extra segment is the default destination
segment, addressed through DI, in string manipulation instructions.
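
A segment register and an offset register together identify one memory location. The sketch below assumes the standard 8086 real-mode rule, which these notes do not state explicitly: the 16-bit segment value is shifted left by 4 bits (multiplied by 16) and added to the 16-bit offset to give a 20-bit physical address. The function name physical_address is hypothetical.

```python
def physical_address(segment, offset):
    """20-bit 8086 real-mode address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


# Example: CS = 1234h and IP = 0010h
addr = physical_address(0x1234, 0x0010)
addr_hex = format(addr, "05X")   # "12350"
```

Because the offset is 16 bits wide, it can reach 2^16 = 64 KB of locations from the segment base, which is why each segment is 64 KB in size.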

Data Registers

1) AX (Accumulator)
• It consists of two 8-bit registers AL and AH, which can be combined together and used
as a 16-bit register AX. AL in this case contains the low-order byte of the word, and AH
contains the high-order byte.
• The accumulator can be used for I/O operations and string manipulation.

2) BX (Base register)
• It consists of two 8-bit registers BL and BH, which can be combined together and
used as a 16-bit register BX. BL in this case contains the low-order byte of the word, and
BH contains the high-order byte.
• The BX register usually contains an offset for the data segment.

3) CX (Count register)
• It consists of two 8-bit registers CL and CH, which can be combined together and
used as a 16-bit register CX. When combined, the CL register contains the low-order byte
of the word, and CH contains the high-order byte.
• The count register can be used in loop and shift/rotate instructions and as a counter in
string manipulation.

4) DX (Data register)
• It consists of two 8-bit registers DL and DH, which can be combined together and
used as a 16-bit register DX. When combined, the DL register contains the low-order byte
of the word, and DH contains the high-order byte.
• DX can be used to hold a port number in I/O operations.
• In integer 32-bit multiply and divide instructions, the DX register contains the
high-order word of the initial or resulting number.
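
The relationship between a 16-bit data register and its two 8-bit halves can be sketched with shifts and masks. This is an illustrative Python example; the function names split16 and join16 are hypothetical.

```python
def split16(word):
    """Split a 16-bit value (e.g. AX) into its high and low bytes (AH, AL)."""
    high = (word >> 8) & 0xFF
    low = word & 0xFF
    return high, low


def join16(high, low):
    """Combine a high byte and a low byte (e.g. AH, AL) into a 16-bit value."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


ah, al = split16(0x12AB)   # AH = 12h, AL = ABh
ax = join16(ah, al)        # AX = 12ABh again
```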

Pointer register

1. Stack Pointer (SP) is a 16-bit register used to hold the offset address for the stack
segment.

2. Base Pointer (BP) is a 16-bit register used to hold the offset address for the stack
segment.
i. The BP register is usually used for based, based indexed or register indirect
addressing.

ii. The difference between SP and BP is that the SP is used internally by the processor
to locate where the return address is stored in case of an interrupt or a CALL
instruction.

3. Source Index (SI) and Destination Index (DI). These two 16-bit registers are used to hold
the offset addresses for DS and ES in string manipulation instructions.
i. SI is used for indexed, based indexed and register indirect addressing, as well as a
source data address in string manipulation instructions.

ii. DI is used for indexed, based indexed and register indirect addressing, as well as
a destination data address in string manipulation instructions.

Instruction Pointer (IP)
It is a 16-bit register. It acts as a program counter and is used to hold the offset address
for CS.

Flag Register
The flag register is a 16-bit register containing nine 1-bit flags, i.e. the flag register is
addressable by bit as shown in the figure below. Each of these bits is a status or control
flag of the microprocessor.

i. Overflow Flag (OF)
- This flag is set if an overflow occurs, i.e. if the result of a signed operation is too large
to be accommodated in the destination register.

ii. Direction Flag (DF)
- This is used by string manipulation instructions. If this flag bit is ‘0’, the string is
processed beginning from the lowest address to the highest address, i.e. auto-incrementing
mode.
- Otherwise, the string is processed from the highest address towards the lowest address,
i.e. auto-decrementing mode.

iii. Interrupt-enable Flag (IF)
- If this flag is set, the maskable interrupts are recognized by the CPU. Otherwise they are
ignored.

iv. Single-step (Trap) Flag (TF)
- If this flag is set, the processor enters the single-step execution mode. In other words, a
trap interrupt is generated after the execution of each instruction. The processor executes
the current instruction and control is transferred to the trap interrupt service routine.

v. Sign Flag (SF)
- This flag is set when the result of any computation is negative. For signed computations,
the sign flag equals the MSB of the result.

vi. Zero Flag (ZF)
- set if the result is zero.

vii. Auxiliary carry Flag (AF)
- set if there was a carry from or borrow to bits 0-3 in the AL register.

viii. Parity Flag (PF)
- set if the parity (the number of "1" bits) in the low-order byte of the result is even.

ix. Carry Flag (CF)
- This flag is set when there is a carry out of the MSB in case of addition, or a borrow in
case of subtraction. For example, when two numbers are added, a carry may be generated
out of the most significant bit position. The carry flag, in this case, will be set to ‘1’. If no
carry is generated, it will be ‘0’.

Memory Access Registers

Some registers are used in memory references. Two registers are essential in memory write and
read operations: the memory data register (MDR) and the memory address register (MAR). The
MDR and MAR are used exclusively by the CPU and are not directly accessible to programmers.

Instruction Fetching Registers

Two main registers are involved in fetching an instruction for execution: the program counter
(PC) and the instruction register (IR). The PC is the register that contains the address of the next
instruction to be fetched. The fetched instruction is loaded in the IR for execution. After a
successful instruction fetch, the PC is updated to point to the next instruction to be executed.
In the case of a branch operation, the PC is updated to point to the branch target instruction after
the branch is resolved, that is, the target address is known.

Condition Registers

Condition registers, or flags, are used to maintain status information. Some architectures
contain a special program status word (PSW) register. The PSW contains bits that are set by the
CPU to indicate the current status of an executing program. These indicators are typically for
arithmetic operations, interrupts, memory protection information, or processor status.

Special-Purpose Address Registers

- Index register, used in index addressing. The address of the operand is obtained by
adding a constant to the content of a register, called the index register. The index register
holds an address displacement. Index addressing is indicated in the instruction by
including the name of the index register in parentheses and using the symbol X to indicate
the constant to be added.

- Segment pointers. In order to support segmentation, the address issued by the
processor should consist of a segment number (base) and a displacement (or an offset)
within the segment. A segment register holds the address of the base of the segment.

- Stack pointer. A stack is a data organization mechanism in which the last data item stored
is the first data item retrieved. Two specific operations can be performed on a stack. These
are the Push and the Pop operations. The stack pointer (SP) is used to indicate the stack
location that can be addressed. In the stack push operation, the SP value is used to indicate
the location (called the top of the stack).

ADDRESSING MODE

This refers to the different ways in which operands can be addressed. Addressing modes differ
in the way the address information of operands is specified. The basic addressing modes are:

i. IMMEDIATE ADDRESSING

- The operand is given explicitly as part of the instruction, so no memory access is required
to obtain it. The operand may also follow immediately after the instruction. According to this
addressing mode, the value of the operand is (immediately) available in the instruction itself.
Consider, for example, the case of loading the decimal value 9000 into a register Ri. This
operation can be performed using the instruction LOAD 9000, Ri.

In this instruction, the operation to be performed is to load a value into a register. The
source operand is (immediately) given as 9000, and the destination is the register Ri.

ii. DIRECT (ABSOLUTE) ADDRESSING

- The address of the operand is given explicitly as part of the instruction. According to this
addressing mode, the address of the memory location that holds the operand is included in the
instruction.
Consider, for example, the case of loading the value of the operand stored in memory location
5000 into register DH. This operation can be performed using the instruction

MOV DH, [5000]

In this instruction, the source operand is the value stored in the memory location whose address
is 5000, and the destination is the register DH.

iii. REGISTER INDIRECT ADDRESSING

- The effective address of the operand is in the register or main memory location whose address
appears in the instruction. The name of a register or a memory location that holds the (effective)
address of the operand is included in the instruction. In order to indicate the use of indirection
in the instruction, it is customary to include the name of the register or the memory location in
parentheses (square brackets in 8086 assembly). For example, the instruction

MOV CL, [BX]

moves the content of the memory location whose address is held in BX into CL, i.e. the address
of the operand is held in register BX.

iv. INDEXED RELATIVE ADDRESSING

- The effective address (EA) of the operand is generated by adding an index register value (X) to
the direct address (DA),

i.e. EA = X + DA

In this addressing mode, the address of the operand is obtained by adding a constant to the
content of a register, called the index register. For example, the instruction

LOAD 5+ [DI], AX

loads register AX with the contents of the memory location whose address is the sum of the
contents of register DI and the value 5. Index addressing is indicated in the instruction by
including the name of the index register in parentheses (square brackets here), with the
constant to be added written alongside it.

v. BASE RELATIVE ADDRESSING

The effective address of the operand is generated by adding a constant to the content of a base
register, which is indicated in parentheses in the instruction. For example, the instruction

MOV DX, [BP] +10

moves into register DX the content of the memory location whose address is BP plus 10, i.e. the
effective address of the operand to be moved into DX is obtained by summing the constant 10
with the value in register BP.
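
The five basic addressing modes differ only in how the operand is located, which can be sketched in Python. This is an illustrative model: the register contents, memory contents and function names are hypothetical, and all numbers are decimal.

```python
# Hypothetical register and memory contents (decimal addresses and values).
regs = {"BX": 200, "DI": 100, "BP": 300}
mem = {5000: 17, 200: 34, 105: 51, 310: 68}


def immediate(value):
    """Immediate: the operand is the value given in the instruction."""
    return value


def direct(address):
    """Direct: the instruction holds the address of the operand."""
    return mem[address]


def register_indirect(reg):
    """Register indirect: the named register holds the operand address."""
    return mem[regs[reg]]


def indexed(constant, index_reg):
    """Indexed: effective address = constant + index register."""
    return mem[constant + regs[index_reg]]


def base_relative(base_reg, constant):
    """Base relative: effective address = base register + constant."""
    return mem[regs[base_reg] + constant]


ri = immediate(9000)           # LOAD 9000, Ri
dh = direct(5000)              # MOV DH, [5000]
cl = register_indirect("BX")   # MOV CL, [BX]
ax = indexed(5, "DI")          # LOAD 5+ [DI], AX  (address 105)
dx = base_relative("BP", 10)   # MOV DX, [BP] +10  (address 310)
```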

Comparing Intel 8086 and 80x86 PROGRAMMER’S MODEL

The programmer’s model is the set of hardware resources of a microprocessor that are
available for direct programming. These include the internal registers, addressing modes,
instruction set, and data types. Intel processors, depending on family generation, can be
programmed to operate on integer data, floating point data, and multimedia data. Figure 11
below shows the programmer’s model of the Intel 8086 and Intel 80x86.

Figure 11: Intel 8086 vs 80x86 Registers


The Intel IA-32 architecture covers the 80x86/Pentium processors, where x ≥ 3. Its features
include:
• Data bus widened from 16 bits to 32 bits
• IA-32 processors are 32-bit integrated processors that can operate on integer and floating
point data
• Backward compatibility with the 16-bit 8086 in real mode
• IA-32 operates in real mode by default, hence it has to be switched to protected mode
• Pentium II processors, as members of the IA-32 family, support MMX, i.e. multimedia data
structures that are SIMD (single instruction multiple data) in nature

IA-32 Processor Internal Registers:

• The Intel 80x86 extends the 4 general purpose registers, the pointer registers and the index
registers to 32 bits. It adds 2 extra segment registers, FS and GS.

• Eight 32-bit registers (general purpose registers, GPRs): EAX, EBX, ECX, EDX, EBP, ESP,
ESI, EDI

• All the general registers (16-bit/8-bit) of the 8086 (AX, BX, CX, DX, BP, SP, SI, DI, AH, AL,
BH, BL, CH, CL, DH, DL), the 16-bit IP, and the 16-bit Flags

• Six 16-bit segment registers: CS, SS, DS, ES, FS, GS

• One 32-bit instruction pointer: EIP

• One 32-bit flags register: EFLAGS

• IA-32 widened the address bus to 32 bits (the 8086 address bus is 20 bits wide), so that the
physical addressable memory is 2^32 bytes = 4 GB.

• The default segment register in 32-bit programming is the DS register

• The extended registers can all be used as pointers, with DS as the default segment register,
unlike the 8086 where only BX, BP, SI and DI can be used as pointers.

IA-32 Data Types:

Byte, Word, Double Word, Single precision floating point, Double precision floating point,
Temporary Real floating point, Packed BCD

IA-32 Addressing Techniques:

(i) Immediate addressing (ii) Register addressing (iii) Direct addressing (iv) Register indirect
addressing

(v) Relative base addressing (vi) Relative index addressing (vii) Based index addressing

(viii) Relative based index addressing (ix) Scaled index addressing

(i) Immediate Addressing

The operand is specified as the source operand in the instruction, e.g.

MOV AL, 22h

MOV EBX, 12345678h

(ii) Register Addressing

The operand is in the specified register. On the 80386 and later processors the registers are:

EAX, AX, AL, AH, EBX, BX, BH, BL, ECX, CX, CH, CL, EDX, DX, DH, DL, EBP, BP, ESP,
SP, ESI, SI, EDI, DI

(iii) Direct Addressing

The operand has its memory address specified in the instruction, e.g.

MOV AL, [1234h]

MOV EBX, list

(iv) Register Indirect

▪ The memory address of the operand is pointed to by the contents of either a base register
(BX, BP), an index register (SI, DI), or any of the general purpose 32-bit registers (EAX, EBX,
ECX, EDX, EBP, ESI, EDI)

▪ Operand size: byte, word, double word

e.g. MOV AL, [ECX]

(v) Base + Index Addressing

- is a register indirect addressing mode where a combination of (base + index) registers is used
as the operand memory address pointer.

- any pair of the general purpose 32-bit registers (EAX, EBX, ECX, EBP, EDI, ESI) can be used;
the first register is the base and the second is the index, regardless of which 32-bit general
purpose registers are used

- e.g. MOV [EAX + ECX], BL

(vi) Register Relative Addressing

- The effective memory address of an operand is the combination of an index register +
displacement, a base register + displacement, or any of the general purpose registers +
displacement

- displacement size: 2^n where n = 2, 3, 4, ... e.g.

MOV AX, [BX + 4]

MOV AX, [ECX + 4]

(vii) Relative Base + Index Addressing

- Same as register relative addressing except that the displacement is added to a pair of base +
index registers to form the effective memory address of the operand. Any pair of general purpose
registers + displacement can be used (with 16-bit registers, the base must be BX or BP and the
index SI or DI), e.g.

MOV AX, [BX + DI + 30]

MOV ECX, ARRAY[EBX + ECX]

(viii) Scaled index addressing

- available on ≥ 80386 only

- the second general purpose register is scaled by a factor of 1, 2, 4, or 8 and added to another
general purpose register to form a memory pointer for the operand.

- e.g.

MOV EDX, [EAX + 4 * EBX]

Scaled index addressing mode allows easy access to arrays, including multidimensional arrays.
In this addressing mode any of the 32-bit registers except ESP can be used as the index, which is
multiplied by a scale factor as stated above; the factors 1, 2, 4 and 8 correspond to byte, word,
doubleword and quadword operands.

Note that only 32-bit registers can be used in scaled index addressing mode. A 16-bit register
cannot be used as a scaled index.

Example: Find the effective address in each of the following cases. Assume that ESI =
200h, ECX = 100h, EBX = 50h and EDI = 100h, and that the displacements are hexadecimal.

1. MOV AX, [2000h + ESI*4]
2. MOV AX, [5000h + ECX*2]
3. MOV ECX, [2400h + EBX*4]
4. MOV DX, [100h + EDI*8]

Solution:

1. EA = 2000h + 200h x 4 = 2000h + 800h = 2800h. Therefore the address of the operand moved
into AX is DS:2800h.

2. EA = 5000h + 100h x 2 = 5000h + 200h = 5200h.

3. EA = 2400h + 50h x 4 = 2400h + 140h = 2540h.

4. EA = 100h + 100h x 8 = 100h + 800h = 900h.
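
The four effective addresses can be checked mechanically. The following Python sketch assumes that every displacement in the example is hexadecimal; the function name scaled_ea is hypothetical.

```python
# Register contents given in the example (hexadecimal).
regs = {"ESI": 0x200, "ECX": 0x100, "EBX": 0x50, "EDI": 0x100}


def scaled_ea(displacement, index_reg, scale):
    """Effective address = displacement + index register * scale."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale factor must be 1, 2, 4 or 8")
    return displacement + regs[index_reg] * scale


cases = [
    (0x2000, "ESI", 4),   # MOV AX,  [2000h + ESI*4]
    (0x5000, "ECX", 2),   # MOV AX,  [5000h + ECX*2]
    (0x2400, "EBX", 4),   # MOV ECX, [2400h + EBX*4]
    (0x100, "EDI", 8),    # MOV DX,  [100h + EDI*8]
]
results = [format(scaled_ea(d, r, s), "X") + "h" for d, r, s in cases]
# results == ["2800h", "5200h", "2540h", "900h"]
```

Note that 50h x 4 is 140h (80 x 4 = 320 in decimal), so the third address is 2540h.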

80X86 FLAG REGISTER BIT AS REGARDS ADD INSTRUCTION

The flag bits affected by the ADD instruction are the carry flag (CF), parity flag (PF), auxiliary
carry flag (AF), zero flag (ZF), sign flag (SF) and overflow flag (OF).
CF - This flag is set whenever there is a carry out either from d7 after an 8-bit operation or from
d15 after a 16-bit operation.
PF - After certain operations, the parity of the result’s low-order byte is checked. If the byte has
an even number of 1’s, the parity flag is set to 1, otherwise it is cleared, i.e. 0. Parity is checked
for the lower 8 bits only in a 16-bit operation.
AF - If there is a carry from d3 to d4 of an operation, this bit is set, otherwise it is cleared.
ZF - Is set to 1 if the result of an arithmetic or logical operation is zero, otherwise it is cleared.
SF - The binary representation of signed numbers uses the most significant bit as the sign bit.
After arithmetic or logic operations, the status of this sign bit is copied into the SF, thereby
indicating the sign of the result.
OF - Is set whenever the result of a signed number operation is too large, causing the high-order
bit to overflow into the sign bit.
Examples: Show how the flag register is affected by the addition of 38h and 2Fh in the following
lines of code.

MOV BH, 38h
ADD BH, 2Fh

38h   0011 1000
2Fh   0010 1111
67h   0110 0111

CF = 0 since there is no carry beyond d7
PF = 0 since there is an odd number of 1’s in the result
AF = 1 since there is a carry from d3 to d4
ZF = 0 since the result is not zero
SF = 0 since d7 of the result is zero

1. Show how the flag register is affected by the following lines of code.

MOV AL, 9Ch     9Ch   1001 1100
MOV DH, 64h     64h   0110 0100
ADD AL, DH      00h   0000 0000

CF = 1 since there is a carry beyond d7
PF = 1 since there is an even number of 1’s in the result
AF = 1 since there is a carry from d3 to d4
ZF = 1 since the result is zero
SF = 0 since d7 of the result is zero

2. Show how the flag register is affected by

MOV BX, AAAAh
ADD BX, 5556h

AAAAh   1010 1010 1010 1010
5556h   0101 0101 0101 0110
0000h   0000 0000 0000 0000

CF = 1 since there is a carry beyond d15
PF = 1 since there is an even number of 1’s in the lower byte
AF = 1 since there is a carry from d3 to d4
ZF = 1 since the result is zero
SF = 0 since d15 of the result is zero

3. How would the status flags be set after the processor performed
the 8-bit addition of 10110101₂ and 10010110₂?

Exercise:

1. Show how the flag register is affected by the instructions

MOV AL, 0F5h
ADD AL, 0Bh
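
The flag rules for ADD can be collected into one function for checking hand-worked answers. This is a minimal Python sketch, not a model of the full processor: the function name add_flags and the dictionary layout are hypothetical, and the overflow rule used (both operands have the same sign but the result has the opposite sign) is the usual two's complement test, stated here as an assumption.

```python
def add_flags(a, b, bits=8):
    """Return (result, flags) for an ADD of two unsigned values of the given width."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    total = a + b
    result = total & mask
    flags = {
        "CF": int(total > mask),                             # carry out of the top bit
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),   # even parity of the low byte
        "AF": int((a & 0xF) + (b & 0xF) > 0xF),              # carry from d3 to d4
        "ZF": int(result == 0),
        "SF": int(bool(result & sign)),                      # copy of the top bit
        "OF": int(bool(~(a ^ b) & (a ^ result) & sign)),     # signed overflow
    }
    return result, flags


# The worked example MOV BH, 38h / ADD BH, 2Fh
result, flags = add_flags(0x38, 0x2F)
```

For a 16-bit addition such as ADD BX, 5556h, the same function can be called with bits=16.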

MODULE FOUR

Numbering Systems

A single transistor can only remember one of two possible numbers, a one or a zero. This
isn't useful for anything more complex than controlling a light bulb, so for larger values,
transistors are grouped together so that their combination of ones and zeros can be used to
represent larger numbers.

Some of the methods that are used to represent numbers with groups of transistors or bits are
discussed here under number systems.
A number system uses a specific radix (base). Radices that are powers of 2 are widely used in
digital systems. These radices include binary (base 2), quaternary (base 4), octal (base 8), and
hexadecimal (base 16). The base 2 binary system is dominant in computer systems.

Binary notation directly represents digital logic states. Hexadecimal numbering system is a
convenient means of representing binary numbers because 1 hexadecimal digit represents four
binary digits. The letters A-F are borrowed for use as hexadecimal digits beyond 9.

The table below shows the relationship between bases 2, 10 and 16. Note that in each base,
when one more is added to the highest digit, that digit becomes zero and a 1 is carried to the
next higher digit position. For example, in binary, once three columns are full (111), adding
one more pushes a carry into a fourth column (1000).

Increasing powers of 2 form the weights of the bit positions, i.e. 2ⁿ where n = 0, 1, ..., N. The first
few decimal values and their binary and hexadecimal equivalents are shown in Table 5.

Table 5: Decimal, binary and hexadecimal equivalents of numbers


Decimal Binary Hexadecimal
0 00000 0
1 00001 1
2 00010 2
3 00011 3
4 00100 4
5 00101 5
6 00110 6
7 00111 7
8 01000 8
9 01001 9
10 01010 A
11 01011 B
12 01100 C
13 01101 D
14 01110 E
15 01111 F
16 10000 10
17 10001 11
18 10010 12
19 10011 13
20 10100 14
21 10101 15
22 10110 16
23 10111 17
24 11000 18
25 11001 19
26 11010 1A
27 11011 1B
28 11100 1C
29 11101 1D
30 11110 1E
31 11111 1F
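
The rows of the table can be regenerated programmatically. The sketch below is one possible way to do so in Python using the built-in `format` function; the function name `table_rows` and its parameters are hypothetical and chosen only for this illustration.

```python
def table_rows(count=32, width=5):
    """Return (decimal, binary, hexadecimal) triples for 0 .. count - 1."""
    return [(n, format(n, f"0{width}b"), format(n, "X")) for n in range(count)]


print("Decimal  Binary  Hexadecimal")
for decimal, binary, hexadecimal in table_rows():
    print(f"{decimal:>7}  {binary:>6}  {hexadecimal}")
```

The `width` argument fixes the number of binary digits shown, so every row is padded with leading zeroes to five bits as in the table.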

Conversion between bases: Binary, Decimal and Hexadecimal

Conversion from decimal to binary and from binary to decimal can be done using the weight
that is associated with each binary bit position. Examples:

1. 191₁₀ = 10111111₂ can be converted to hexadecimal by grouping the 8 bits into two nibbles
and representing each nibble with a single hexadecimal digit, i.e. (1011) (1111).

1011₂ = (8 + 2 + 1)₁₀ = 11₁₀ = B₁₆

1111₂ = (8 + 4 + 2 + 1)₁₀ = 15₁₀ = F₁₆

Therefore 191₁₀ = BF₁₆.

2. Convert 25₁₀ to binary.

Using the weights:

... 32 16  8  4  2  1
     0  1  1  0  0  1

16 + 8 + 1 = 25. Therefore 25₁₀ = 11001₂.

3. Convert 39₁₀ to binary.

... 32 16  8  4  2  1
     1  0  0  1  1  1

32 + 4 + 2 + 1 = 39. Therefore 39₁₀ = 100111₂.

4. Convert 11001₂ to decimal.

... 16  8  4  2  1
     1  1  0  0  1

16 + 8 + 1 = 25₁₀.

5. Convert 29Bh to binary.

   2    9    B
0010 1001 1011

Put the nibbles together and drop the leading zeroes: 1010011011₂.

6. Convert 111101000100₂ to hexadecimal.

Group the bits in nibbles, pad with leading zeroes where needed, and represent each nibble
with a hexadecimal digit.

1111 0100 0100
   F    4    4

= F44h

7. Convert 45₁₀ to hexadecimal.

45/16 = 2 r 13 (D)   least significant digit (lsd)
 2/16 = 0 r 2        most significant digit (msd)

Therefore 45₁₀ = 2D₁₆.

8. Convert 6B2₁₆ to decimal.

Expand each digit, starting the expansion from the right.

 2 x 16⁰ =  2 x 1   = 2
11 x 16¹ = 11 x 16  = 176
 6 x 16² =  6 x 256 = 1536

2 + 176 + 1536 = 1714₁₀.

9. Add 1101₂ + 1001₂.

 Binary   Decimal
   1101      13
 + 1001    +  9
  10110      22

Therefore 1101₂ + 1001₂ = 10110₂ = 22₁₀.
10. Add 23D9h + 94BEh. Start from the lsd. If the result of the addition in a position is
less than 16, write it as the result for that position; otherwise subtract 16 from the
result of the addition, write the remainder as the answer and carry 1 to
the next digit.

  23D9h
+ 94BEh
  B897h

11. Subtract 59Fh - 2B8h. If the digit to be subtracted (the subtrahend digit) is
greater than the minuend digit, borrow 16 from the next higher digit.

  59Fh
- 2B8h
  2E7h
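
The procedures used in the examples above (repeated division, positional expansion, and digit-by-digit hexadecimal arithmetic with a carry or borrow of 16) can be sketched in Python as follows. All function names here (`to_base`, `from_base`, `add_hex`, `sub_hex`) are assumptions made for illustration, and `sub_hex` assumes the first operand is not smaller than the second.

```python
DIGITS = "0123456789ABCDEF"


def to_base(value, base):
    """Repeated division: each remainder is a digit, least significant first."""
    if value == 0:
        return "0"
    out = []
    while value > 0:
        value, remainder = divmod(value, base)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))


def from_base(text, base):
    """Positional expansion: sum of digit * base**position, starting from the right."""
    return sum(DIGITS.index(ch) * base ** i
               for i, ch in enumerate(reversed(text.upper())))


def add_hex(a, b):
    """Add two hex strings digit by digit, carrying 1 when a column reaches 16."""
    width = max(len(a), len(b))
    a, b = a.upper().zfill(width), b.upper().zfill(width)
    carry, out = 0, []
    for x, y in zip(reversed(a), reversed(b)):
        carry, digit = divmod(DIGITS.index(x) + DIGITS.index(y) + carry, 16)
        out.append(DIGITS[digit])
    if carry:
        out.append("1")
    return "".join(reversed(out))


def sub_hex(a, b):
    """Compute a - b (assuming a >= b) digit by digit, borrowing 16 when needed."""
    width = max(len(a), len(b))
    a, b = a.upper().zfill(width), b.upper().zfill(width)
    borrow, out = 0, []
    for x, y in zip(reversed(a), reversed(b)):
        diff = DIGITS.index(x) - DIGITS.index(y) - borrow
        borrow = 1 if diff < 0 else 0
        out.append(DIGITS[diff + 16 * borrow])
    return "".join(reversed(out)).lstrip("0") or "0"


print(to_base(45, 16))            # example 7
print(from_base("6B2", 16))       # example 8
print(add_hex("23D9", "94BE"))    # example 10
print(sub_hex("59F", "2B8"))      # example 11
```

Run on the operands of examples 7, 8, 10 and 11, the sketch gives 2D, 1714, B897 and 2E7 respectively, matching the hand calculations.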
