Pipelining in Computer Architecture
Pipelining is a technique for breaking down a sequential process into various sub-operations
and executing each sub-operation in its own dedicated segment that runs in parallel with all
other segments.
The most significant feature of pipelining is that it allows several computations to be in
progress at the same time, each in a different segment.
Linear Pipeline and Non-Linear Pipeline
1. Linear Pipeline: A linear pipeline is one in which a series of processors (stages) are
connected serially. Data flows from the first block to the final block, and processing
is done in a linear, sequential manner: the input is supplied to the first block and the
output is taken from the last block. Linear pipelines can be further divided into
synchronous and asynchronous models. They are typically used when the data
transformation process is straightforward and can be performed in a single path.
2. Non-Linear Pipeline: A non-linear pipeline is made of different pipelines that are
present at different stages. The different pipelines are connected to perform multiple
functions. It also has feedback and feed-forward connections. It is built so that it
performs various functions at different time intervals. In a non-linear pipeline the
functions are dynamically assigned.
Differentiate between linear and non-linear pipeline.
Feature            Linear Pipeline                              Non-linear Pipeline
Flow of Execution  Sequential flow, step-by-step progression.   Parallel or branching flow, multiple paths.
Dependency         Each step depends on the previous one.       Steps may be independent or have dependencies.
Flexibility        Less flexible, fixed order of execution.     More flexible, allows for dynamic sequencing.
Parallelism        Limited parallelism, one step at a time.     Enables parallel execution of multiple steps.
Complexity         Generally simpler and easier to understand.  Can be more complex due to diverse paths.
Use Cases          Well-suited for straightforward processes.   Suitable for complex, interconnected tasks.
Examples           Linear data processing, assembly line.       Software development, decision trees.
Example of Pipelining in computer architecture
Let us consider a real-life example of taking food from a counter:
The entire process of taking food from the counter can be divided into various steps -
Picking utensils, taking salad, taking food, taking vegetables, etc. Now consider the
following two ways of executing this:
1. One person enters and takes utensils, salad, food, vegetables, and leaves. Then another
person enters and repeats the process.
2. People stand in a queue such that when one person is taking vegetables, some other
person will be taking food, someone will be taking salad and utensils.
You can see that the first process will have much lower efficiency than the second. While
one person is taking food, the utensils, salad, and vegetable stalls are unused. On the
other hand, people are simultaneously using the counter in the second process. Thus we
have improved the efficiency of the process just by simultaneously executing multiple
processes. Note that we have not used any extra resources.
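The food-counter comparison can be put into numbers with a minimal sketch. It assumes, purely for illustration, that every step takes exactly one time unit and that nobody ever has to wait for the person ahead; the function names are hypothetical.

```python
def sequential_time(people: int, steps: int) -> int:
    """Way 1: each person completes every step before the next person enters."""
    return people * steps


def pipelined_time(people: int, steps: int) -> int:
    """Way 2: the first person needs `steps` time units; every later person
    finishes one time unit after the person ahead of them."""
    if people == 0:
        return 0
    return steps + (people - 1)


# 10 people, 4 steps (utensils, salad, food, vegetables).
print(sequential_time(10, 4))  # 40
print(pipelined_time(10, 4))   # 13
```

The counter itself is unchanged in both cases; only the overlap between people differs.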
Types of Pipeline in Computer Architecture
The pipeline is divided into 2 categories:
1. Arithmetic Pipeline
2. Instruction Pipeline
1. Arithmetic Pipeline
An arithmetic pipeline focuses on dividing a single arithmetic operation (like addition,
multiplication, etc.) into smaller stages. These stages could involve fetching operands
from registers, performing the actual arithmetic calculation, and storing the result back
in a register.
By pipelining arithmetic operations, the processor can begin processing the next
operation while the current one is still completing in later stages. This improves the
efficiency of the processor by keeping the arithmetic logic unit (ALU) busy with
calculations.
2. Instruction Pipeline
An instruction pipeline breaks down the entire instruction fetch-decode-execute cycle
into distinct stages. This might involve fetching the instruction from memory, decoding it
to understand its operation, fetching operands, performing the operation, and storing
the result.
With instruction pipelining, multiple instructions can be at different stages of execution
concurrently, improving overall processor performance. This is because the processor is
not stuck waiting for one instruction to complete all stages before it can begin
processing the next one.
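One way to visualise this overlap is a space-time table that records which stage each instruction occupies in each clock cycle. The sketch below is only an illustration: the stage abbreviations (IF for fetch, ID for decode, OF for operand fetch, EX for execute, WB for storing the result) and the one-cycle-per-stage, no-stall timing are assumptions, not a description of any particular processor.

```python
STAGES = ["IF", "ID", "OF", "EX", "WB"]  # assumed stage names, one cycle each


def space_time(num_instructions, stages=STAGES):
    """Return one row per instruction; row[c] is the stage occupied in
    clock cycle c + 1, or "--" if the instruction is not in the pipeline."""
    total_cycles = len(stages) + num_instructions - 1
    table = []
    for i in range(num_instructions):
        row = ["--"] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name  # instruction i enters the pipeline in cycle i + 1
        table.append(row)
    return table


for number, row in enumerate(space_time(3), start=1):
    print(f"I{number}: " + " ".join(row))
```

Reading any column from top to bottom shows the different stages that are busy in that cycle.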
Advantages of Pipelining
Increased Throughput: By executing multiple instructions concurrently, pipelining allows
a processor to complete more work in a shorter time. Imagine a factory assembly line -
pipelining enables a processor to work on several instructions at once, like different
stages of assembling a product, significantly boosting its output.
Improved Performance: The overall performance of the processor is enhanced due to
the increased throughput. More instructions completed in a shorter time translates to a
faster and more responsive computer system.
Efficient Resource Utilization: Pipelining keeps the functional units of the processor busy
most of the time, reducing idle cycles. By dividing tasks into smaller stages, pipelining
ensures the processor's resources are constantly being used, minimizing wasted time.
Potential for Higher Clock Speeds: Pipelining can enable processors to operate at higher
clock speeds because the work is divided into smaller stages. Each stage can potentially
be completed in a shorter amount of time, allowing the processor to handle instructions
at a faster rate.
Reduced Waiting Time: Pipelining can streamline the processing flow, reducing the time
the processor spends waiting for data or resources. With instructions continuously
moving through the pipeline, the processor experiences less downtime and can focus on
completing tasks more efficiently.
Disadvantages of Pipelining
1. The design of a pipelined processor is complex and costly to manufacture.
2. The latency of an individual instruction increases, because of the overhead of passing it
from stage to stage.
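The latency point can be illustrated with a small calculation. The sketch assumes that the clock period must cover the slowest stage plus a fixed latch (pipeline register) delay; the stage delays and the latch delay are hypothetical numbers chosen only for the example.

```python
def unpipelined_latency(stage_delays):
    """Time for one instruction when the stages run back to back with no latches."""
    return sum(stage_delays)


def pipelined_cycle_time(stage_delays, latch_delay):
    """The clock must fit the slowest stage plus the latch between stages."""
    return max(stage_delays) + latch_delay


def pipelined_latency(stage_delays, latch_delay):
    """One instruction still visits every stage, one clock cycle per stage."""
    return len(stage_delays) * pipelined_cycle_time(stage_delays, latch_delay)


# Hypothetical stage delays in nanoseconds (illustrative numbers only).
delays = [10, 8, 10, 10, 7]
print(unpipelined_latency(delays))      # 45
print(pipelined_cycle_time(delays, 1))  # 11
print(pipelined_latency(delays, 1))     # 55
```

Under these assumptions a single instruction takes longer (55 ns instead of 45 ns), yet once the pipeline is full an instruction completes every 11 ns, which is where the throughput gain comes from.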
What are Pipeline Hazards?
CPU speed is already limited by memory, but a pipelined design adds one more case to
consider: several instructions are in different stages of execution at the same time.
These instructions may depend on one another, which reduces the pipeline's pace.
Dependencies arise for a variety of reasons, which we will examine shortly. The
dependencies in the pipeline are referred to as hazards since they put the execution at
risk.
The terms dependency and hazard are often used interchangeably in computer
architecture. A hazard, in essence, prevents an instruction in the pipeline from being
executed during its designated clock cycle. We use the term clock cycle because each
instruction may be in a different machine cycle.
Types of Pipeline Hazards in Computer Architecture
The three different types of hazards in computer architecture are:
1. Structural
2. Data
3. Control
Dependencies can be addressed in a variety of ways. The easiest is to introduce a bubble
into the pipeline, which stalls it and limits throughput. The bubble forces the next
instruction to wait until the previous one is completed.
Structural Hazard
Structural hazards are caused by hardware resource conflicts among the instructions in
the pipeline. The resource might be memory, a general-purpose register (GPR), or an ALU.
A resource conflict arises when more than one instruction in the pipe requires access to
the same resource in the same clock cycle. In an overlapped pipelined execution, this is
a circumstance where the hardware cannot handle all potential combinations of
instructions.
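A resource conflict of this kind can be detected mechanically. The sketch below assumes a hypothetical usage list of (instruction, clock cycle, resource) entries and that each resource can serve only one instruction per cycle; the example schedule, with a single memory shared by instruction fetch and a data access, is made up for illustration.

```python
from collections import defaultdict


def find_structural_hazards(usage):
    """usage: iterable of (instruction, cycle, resource) tuples.
    Returns a sorted list of (cycle, resource, [instructions]) for every
    resource requested by more than one instruction in the same cycle."""
    users = defaultdict(list)
    for instr, cycle, resource in usage:
        users[(cycle, resource)].append(instr)
    return sorted(
        (cycle, resource, names)
        for (cycle, resource), names in users.items()
        if len(names) > 1
    )


# Hypothetical schedule: one shared memory, I1 reads data in cycle 4
# while I4 is being fetched in the same cycle.
usage = [
    ("I1", 1, "memory"),
    ("I2", 2, "memory"),
    ("I3", 3, "memory"),
    ("I1", 4, "memory"),
    ("I4", 4, "memory"),
    ("I1", 3, "alu"),
]
print(find_structural_hazards(usage))  # [(4, 'memory', ['I1', 'I4'])]
```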
Data Hazards
Data hazards in pipelining emerge when the execution of one instruction is dependent
on the results of another instruction that is still being processed in the pipeline. The
order of the READ and WRITE operations on the register is used to classify data hazards
into three groups: read after write (RAW), write after read (WAR), and write after write
(WAW).
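The classification by read/write order can be sketched as a small check on two instructions. The representation is an assumption made for the example: each instruction is a dictionary with one destination register (`dest`) and a collection of source registers (`srcs`), and `first` comes before `second` in program order. The result lists potential hazards; whether a stall actually occurs depends on the pipeline.

```python
def classify_data_hazards(first, second):
    """Return the potential data hazards that `second` has with respect
    to the earlier instruction `first`."""
    hazards = []
    if first["dest"] in second["srcs"]:
        hazards.append("RAW")  # second reads what first writes
    if second["dest"] in first["srcs"]:
        hazards.append("WAR")  # second overwrites what first still reads
    if first["dest"] == second["dest"]:
        hazards.append("WAW")  # both write the same register
    return hazards


add = {"dest": "r1", "srcs": ["r2", "r3"]}  # r1 = r2 + r3
sub = {"dest": "r4", "srcs": ["r1", "r5"]}  # r4 = r1 - r5
print(classify_data_hazards(add, sub))  # ['RAW']
```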
Control Hazards
Control hazards, also known as branch hazards, are caused by branch instructions.
Branch instructions control the flow of program execution. Conditional statements in
higher-level languages, used for iterative loops and condition testing (while, for, and
if statements), are converted into one of the BRANCH instruction variations. As a
result, when the decision to execute one instruction depends on the result of another
instruction, such as a conditional branch that examines a condition's resulting value, a
control hazard develops.
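The cost of stalling on branches can be estimated with a rough model. The sketch below assumes the simplest policy, in which every branch inserts a fixed number of bubbles and there is no branch prediction; the function name and the sample figures are hypothetical.

```python
def effective_cpi(base_cpi, branch_fraction, stall_cycles):
    """Average cycles per instruction when every branch adds `stall_cycles`
    bubbles. `branch_fraction` is the share of instructions that are branches."""
    return base_cpi + branch_fraction * stall_cycles


# Hypothetical workload: ideal CPI of 1, 20% branches, 2 bubbles per branch.
print(effective_cpi(1.0, 0.2, 2))
```

With these sample figures the average instruction takes roughly 1.4 cycles instead of 1, so the pipeline loses a noticeable part of its ideal throughput.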
Instruction Set Principles
Introduction
The field of computer design encompasses an essential component known as Instruction
Set Architecture (ISA). This critical element involves the segment of the computer system
that interfaces with programmers and compilers. Essentially, ISA serves as the language
through which a computer interprets commands. It delineates a collection of
instructions and protocols governing the communication between software and
hardware, effectively serving as their intermediary interface.
Classifying Instruction Set Architectures
In the realm of computing, the efficient handling and internal storage of data within
processors are vital components contributing to the overall performance of a computer
system. Instruction Set Architectures (ISAs) hold a crucial role in determining the
framework for these internal operations.
One of the fundamental differentiators among ISAs revolves around the type of internal
storage utilized within a processor. Three primary alternatives delineate this aspect:
stack, accumulator, or a set of registers. Each of these storage types operates uniquely,
accommodating diverse architectures tailored to specific computing needs.
1. Stack Architecture: In stack architecture, operands are inherently positioned atop the
stack.
2. Accumulator Architecture: In this configuration, one operand assumes the role of the
accumulator by default.
3. General-Purpose Register (GPR) Architecture: This category relies on explicit operands,
such as registers or memory locations.
The approach towards operand handling constitutes another critical aspect in classifying
ISAs. Depending on the design choices, explicit operands might be directly accessed
from memory or necessitate temporary storage before processing.
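The difference between the three classes shows up when the same statement, C = A + B, is written for each of them. The sketch below models each machine with a few lines of Python; the mnemonics in the comments (PUSH, LOAD, ADD, STORE, POP) and the register names are illustrative and do not belong to any specific ISA.

```python
def run_stack(memory):
    """Stack machine: operands are implicitly on top of the stack."""
    stack = []
    stack.append(memory["A"])            # PUSH A
    stack.append(memory["B"])            # PUSH B
    rhs, lhs = stack.pop(), stack.pop()
    stack.append(lhs + rhs)              # ADD
    memory["C"] = stack.pop()            # POP C
    return memory


def run_accumulator(memory):
    """Accumulator machine: one operand is implicitly the accumulator."""
    acc = memory["A"]                    # LOAD A
    acc = acc + memory["B"]              # ADD B
    memory["C"] = acc                    # STORE C
    return memory


def run_gpr(memory):
    """GPR machine: every operand is named explicitly."""
    regs = {}
    regs["R1"] = memory["A"]               # LOAD  R1, A
    regs["R2"] = memory["B"]               # LOAD  R2, B
    regs["R3"] = regs["R1"] + regs["R2"]   # ADD   R3, R1, R2
    memory["C"] = regs["R3"]               # STORE R3, C
    return memory


print(run_stack({"A": 2, "B": 3})["C"])        # 5
print(run_accumulator({"A": 2, "B": 3})["C"])  # 5
print(run_gpr({"A": 2, "B": 3})["C"])          # 5
```

All three produce the same result; what changes is how many operands each instruction has to name.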
Instruction Level Parallelism
Instruction-Level Parallelism (ILP) refers to the capability of a processor to execute
multiple instructions at the same time. Instead of running each instruction strictly one
after another, ILP uses hardware and compiler techniques to overlap instruction
execution wherever dependencies allow.
Identifies independent instructions and runs them in parallel.
Works within a single processor, not across multiple cores.
Basis of modern CPUs with organised instruction scheduling.
ILP processors have the same execution hardware as RISC processors. The machines
without ILP have complex hardware, which is hard to implement. A typical ILP processor
allows multiple-cycle operations to be pipelined.
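Finding instructions that can overlap comes down to checking dependencies. The sketch below is a simplified, in-order model, not how any real scheduler is built: it walks a program in order and starts a new group whenever the next instruction depends on one already in the current group. The instruction format (`name`, `dest`, `srcs`) and the sample program are assumptions made for the example.

```python
def depends_on(later, earlier):
    """True if `later` has a RAW, WAR, or WAW dependence on `earlier`."""
    return (
        earlier["dest"] in later["srcs"]     # RAW
        or later["dest"] in earlier["srcs"]  # WAR
        or later["dest"] == earlier["dest"]  # WAW
    )


def issue_groups(program):
    """Split a program into consecutive groups of mutually independent
    instructions; each group could be executed in parallel."""
    groups, current = [], []
    for instr in program:
        if any(depends_on(instr, prev) for prev in current):
            groups.append(current)
            current = []
        current.append(instr)
    if current:
        groups.append(current)
    return [[instr["name"] for instr in group] for group in groups]


program = [
    {"name": "I1", "dest": "r1", "srcs": ["r2", "r3"]},
    {"name": "I2", "dest": "r4", "srcs": ["r5", "r6"]},
    {"name": "I3", "dest": "r7", "srcs": ["r1", "r4"]},  # needs I1 and I2
    {"name": "I4", "dest": "r8", "srcs": ["r9", "r2"]},
]
print(issue_groups(program))  # [['I1', 'I2'], ['I3', 'I4']]
```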
Instruction Level Parallelism (ILP) Architecture
Instruction Level Parallelism is achieved when multiple operations are performed in a
single cycle, either by executing them simultaneously or by using the gaps between two
successive operations that are created by latencies. The decision of when to execute an
operation depends largely on the compiler rather than the hardware. However, the extent
of the compiler's control depends on the type of ILP architecture, because the amount of
information about parallelism that the compiler passes to the hardware via the program
varies from one architecture to another.
Classification of ILP Architectures
The classification of ILP architectures can be done in the following ways -
Sequential Architecture: Here, the program is not expected to explicitly convey any
information regarding parallelism to the hardware, as in superscalar architectures.
Dependence Architecture: Here, the program explicitly states the dependencies between
operations, as in dataflow architectures.
Independence Architecture: Here, the program gives information regarding which
operations are independent of each other so that they can be executed instead of the
'nops'.
Advantages of Instruction-Level Parallelism
Improved Performance: ILP can significantly improve the performance of processors by
allowing multiple instructions to be executed simultaneously or out-of-order. This can
lead to faster program execution and better system throughput.
Efficient Resource Utilization: ILP can help to efficiently utilize processor resources by
allowing multiple instructions to be executed at the same time. This can help to reduce
resource wastage and increase efficiency.
Reduced Instruction Dependency: Techniques used to exploit ILP, such as register
renaming, can remove some of the instruction dependencies that would otherwise limit
the amount of instruction-level parallelism that can be exploited. This can help to
improve performance and reduce bottlenecks.
Increased Throughput: ILP can help to increase the overall throughput of processors by
allowing multiple instructions to be executed simultaneously or out-of-order. This can
help to improve the performance of multi-threaded applications and other parallel
processing tasks.
Disadvantages of Instruction-Level Parallelism
Increased Complexity: Implementing ILP can be complex and requires additional
hardware resources, which can increase the complexity and cost of processors.
Instruction Overhead: ILP can introduce additional instruction overhead, which can slow
down the execution of some instructions and reduce performance.
Data Dependency: Data dependency can limit the amount of instruction-level
parallelism that can be exploited. This can lead to lower performance and reduced
throughput.
Reduced Energy Efficiency: ILP can reduce the energy efficiency of processors by
requiring additional hardware resources and increasing instruction overhead. This can
increase power consumption and result in higher energy costs.
Limitations of ILP
1. The Hardware Model
An ideal processor is one where all artificial constraints on ILP are removed. The only limits on
ILP in such a processor are those imposed by the actual data flows either through registers or
memory. The assumptions made for such an ideal processor are as follows:
1. Register renaming—There are an infinite number of virtual registers available and hence all
WAW and WAR hazards are avoided and an unbounded number of instructions can begin
execution simultaneously.
2. Branch prediction—Branch prediction is perfect. All conditional branches are predicted
exactly.
3. Jump prediction—All jumps (including jump register used for return and computed jumps)
are perfectly predicted. When combined with perfect branch prediction, this is equivalent to
having a processor with perfect speculation and an unbounded buffer of instructions available
for execution.
4. Memory-address alias analysis—All memory addresses are known exactly and a load
can be moved before a store provided that the addresses are not identical.
Assumptions 2 and 3 eliminate all control dependences. Likewise, assumptions 1 and 4
eliminate all but the true data dependences. Together, these four assumptions mean that any
instruction in the program's execution can be scheduled on the cycle immediately
following the execution of the predecessor on which it depends.
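The scheduling rule in the last sentence can be sketched for register operands. The model below is deliberately simplified and its names are hypothetical: every instruction takes one cycle, only true (read-after-write) dependences through registers are tracked, and WAW and WAR conflicts are ignored, as they would be with unlimited renaming. Memory dependences are left out.

```python
def ideal_schedule(program):
    """Return {name: cycle}; each instruction runs in the cycle right after
    the latest instruction that produces one of its source registers."""
    last_writer = {}  # register -> name of the most recent instruction writing it
    cycle = {}
    for instr in program:
        producers = [last_writer[r] for r in instr["srcs"] if r in last_writer]
        cycle[instr["name"]] = 1 + max((cycle[p] for p in producers), default=0)
        last_writer[instr["dest"]] = instr["name"]
    return cycle


program = [
    {"name": "I1", "dest": "r1", "srcs": ["r2", "r3"]},
    {"name": "I2", "dest": "r4", "srcs": ["r1", "r5"]},  # true dependence on I1
    {"name": "I3", "dest": "r1", "srcs": ["r6", "r7"]},  # WAW/WAR only: no delay
    {"name": "I4", "dest": "r8", "srcs": ["r1", "r4"]},  # needs I3 and I2
]
print(ideal_schedule(program))  # {'I1': 1, 'I2': 2, 'I3': 1, 'I4': 3}
```

Four instructions finish in three cycles here, so the only remaining limit is the chain of true dependences I1, I2, I4.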
Superscalar Architecture
A more aggressive approach is to equip the processor with multiple processing units to handle
several instructions in parallel in each processing stage. With this arrangement, several
instructions start execution in the same clock cycle, and the processor is said to use multiple
issue.
Such processors are capable of achieving an instruction execution throughput of more
than one instruction per cycle.
They are known as 'Superscalar Processors'.
In the above diagram, there is a processor with two execution units; one for integer and one for
floating point operations.
The instruction fetch unit is capable of reading two instructions at a time and storing them in the
instruction queue. In each cycle, the dispatch unit retrieves and decodes up to two instructions
from the front of the queue. If there is one integer instruction, one floating-point instruction and
no hazards, both instructions are dispatched in the same clock cycle.
Example: A Dual-Issue Superscalar CPU
Imagine a processor with:
Two Execution Units: One for integer math (like adding numbers) and one for floating-
point math.
A fast instruction fetch unit that grabs two instructions at once and puts them in a
queue.
A dispatch unit that decodes and sends up to two instructions per cycle.
How it Works in One Clock Cycle:
Fetch grabs two instructions → puts them in queue.
Dispatch checks the first two:
If one is integer and one is floating-point,
And there are no conflicts (hazards),
Then both start executing at the same time.
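The dispatch rule above can be sketched as a short simulation. It is a simplified model under stated assumptions: the instruction format (`name`, `kind`, `dest`, `srcs`) is hypothetical, hazards are checked only between the two instructions at the front of the queue, and execution latency is ignored.

```python
from collections import deque


def has_hazard(first, second):
    """True if the two instructions conflict through a register."""
    return (
        first["dest"] in second["srcs"]
        or second["dest"] in first["srcs"]
        or first["dest"] == second["dest"]
    )


def dual_issue(program):
    """Return, per clock cycle, the names of the instructions dispatched."""
    queue = deque(program)
    cycles = []
    while queue:
        first = queue.popleft()
        issued = [first["name"]]
        if queue:
            second = queue[0]
            # Pair only one integer with one floating-point instruction.
            if first["kind"] != second["kind"] and not has_hazard(first, second):
                queue.popleft()
                issued.append(second["name"])
        cycles.append(issued)
    return cycles


program = [
    {"name": "I1", "kind": "int", "dest": "r1", "srcs": ["r2", "r3"]},
    {"name": "I2", "kind": "fp", "dest": "f1", "srcs": ["f2", "f3"]},
    {"name": "I3", "kind": "int", "dest": "r4", "srcs": ["r1", "r5"]},
    {"name": "I4", "kind": "int", "dest": "r6", "srcs": ["r7", "r8"]},
    {"name": "I5", "kind": "fp", "dest": "f4", "srcs": ["f1", "f5"]},
]
print(dual_issue(program))  # [['I1', 'I2'], ['I3'], ['I4', 'I5']]
```

I3 is dispatched alone because I4 needs the same integer unit, so five instructions take three cycles instead of five.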