Module 3 - Part 2

Pipelining is a technique that decomposes a sequential process into sub-operations executed concurrently across dedicated segments, enhancing processing speed. The document explains the structure of a four-segment pipeline, its speedup ratio compared to non-pipelined processing, and various applications such as instruction and arithmetic pipelines. It also discusses the challenges of instruction-level parallelism and the operation of supercomputers, emphasizing the importance of efficient data handling in pipelined architectures.

Uploaded by

manomitkundu1590
Copyright
© All Rights Reserved

Pipelining

Pipelining is a technique of decomposing a sequential process into sub-operations, with each
sub-operation being executed in a special dedicated segment that operates concurrently with all
other segments. A pipeline can be visualized as a collection of processing segments through
which binary information flows.
General Considerations
Any operation that can be decomposed into a sequence of sub-operations of about the same
complexity can be implemented by a pipeline processor. The general structure of a four-
segment pipeline is illustrated in Fig. 46. The operands pass through all four segments in a
fixed sequence.

The space-time diagram of a four-segment pipeline is shown in Fig. 47.

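The speedup S of pipeline processing over equivalent non-pipeline processing is defined
by the ratio:

$$S = \frac{n t_n}{(k + n - 1) t_p}$$

where n is the number of tasks, k is the number of segments, t_p is the clock cycle time of the
pipeline, and t_n is the time a non-pipeline unit takes to complete one task.
As the number of tasks increases, n becomes much larger than k − 1, and k + n − 1
approaches the value of n. Under this condition, the speedup becomes:

$$S = \frac{t_n}{t_p}$$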
Numerical example: Let the time it takes to process a sub-operation in each segment be equal
to $t_p = 20$ ns. Assume that the pipeline has $k = 4$ segments and executes $n = 100$ tasks in
sequence. The pipeline system will take

$$(k + n - 1) t_p = (4 + 99) \times 20 = 2060 \text{ ns}$$

to complete. Assuming that $t_n = k t_p = 4 \times 20 = 80$ ns, a non-pipeline system requires:

$$n k t_p = 100 \times 80 = 8000 \text{ ns}$$

to complete the 100 tasks. The speedup ratio is equal to:

$$\frac{8000}{2060} = 3.88$$
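
The arithmetic in this example can be checked with a short Python sketch. The function names are chosen here for illustration only; the formulas are the ones given above, with k segments, n tasks, clock cycle t_p, and non-pipeline task time t_n.

```python
def pipeline_time(k, n, t_p):
    """Time for a k-segment pipeline to complete n tasks with clock cycle t_p."""
    return (k + n - 1) * t_p


def non_pipeline_time(n, t_n):
    """Time for a non-pipeline unit to complete n tasks, each taking t_n."""
    return n * t_n


def speedup(k, n, t_p, t_n):
    """Speedup S = n * t_n / ((k + n - 1) * t_p)."""
    return non_pipeline_time(n, t_n) / pipeline_time(k, n, t_p)


if __name__ == "__main__":
    k, n, t_p = 4, 100, 20           # values from the numerical example (ns)
    t_n = k * t_p                    # 80 ns, as assumed in the example
    print(pipeline_time(k, n, t_p))  # pipeline completion time in ns
    print(non_pipeline_time(n, t_n)) # non-pipeline completion time in ns
    print(round(speedup(k, n, t_p, t_n), 2))
    # For a very large number of tasks the speedup approaches t_n / t_p.
    print(speedup(k, 10**6, t_p, t_n))
```

With the values of the example, the sketch reproduces the 2060 ns and 8000 ns totals, and the last line illustrates how the ratio approaches t_n / t_p as n grows.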
Instruction Pipeline
The computer needs to process each instruction with the following sequence of steps:
1. Fetch the instruction from memory.
2. Decode the instruction.
3. Calculate the effective address.
4. Fetch the operands from memory.
5. Execute the instruction.
6. Store the result in the proper place.
Figure 48 shows how the instruction cycle in the CPU can be processed with a four-segment
pipeline. While an instruction is being executed in segment 4, the next instruction in sequence is
busy fetching an operand from memory in segment 3.
The four segments are represented in the flowchart:
1. FI is the segment that fetches an instruction.
2. DA is the segment that decodes the instruction and calculates the effective address.
3. FO is the segment that fetches the operand.
4. EX is the segment that executes the instruction.
A pipeline operation is said to have been stalled if one unit (stage) requires more time to perform
its function, thus forcing other stages to become idle. Consider, for example, the case of an
instruction fetch that incurs a cache miss. Assume also that a cache miss requires three extra time
units.
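
The effect of such a stall can be illustrated with a minimal Python sketch. It rests on assumptions that are not part of the text: every segment normally takes one time unit, instructions flow strictly in order, and a segment accepts an instruction as soon as that instruction has left the previous segment and the preceding instruction has left the current one. The helper names `completion_time` and `four_segment_program` are hypothetical.

```python
def completion_time(latencies):
    """Return the time unit at which the last instruction leaves the last segment.

    latencies[i][s] is the number of time units instruction i spends in segment s.
    """
    n = len(latencies)
    k = len(latencies[0])
    finish = [[0] * k for _ in range(n)]
    for i in range(n):
        for s in range(k):
            # Instruction i must have left the previous segment ...
            left_prev_segment = finish[i][s - 1] if s > 0 else 0
            # ... and the preceding instruction must have left this segment.
            segment_free = finish[i - 1][s] if i > 0 else 0
            finish[i][s] = max(left_prev_segment, segment_free) + latencies[i][s]
    return finish[-1][-1]


def four_segment_program(n, miss_at=None, miss_penalty=3):
    """Build latencies for n instructions in a FI, DA, FO, EX pipeline.

    If miss_at is given, the fetch (FI) of that instruction takes
    miss_penalty extra time units, modelling a cache miss.
    """
    latencies = [[1, 1, 1, 1] for _ in range(n)]
    if miss_at is not None:
        latencies[miss_at][0] += miss_penalty
    return latencies


if __name__ == "__main__":
    print(completion_time(four_segment_program(4)))             # no stall
    print(completion_time(four_segment_program(4, miss_at=2)))  # fetch of third instruction misses
```

Under these assumptions, the three extra time units spent in the fetch segment are added directly to the total completion time, because the later segments sit idle while they wait.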

Instruction-Level Parallelism
In contrast to a single pipeline, instruction-level parallelism (ILP) is based on the idea of
multiple-issue processors (MIP). An MIP has multiple pipelined datapaths for instruction
execution. Each of these pipelines can issue and execute one instruction per cycle. Figure 49
shows the case of a processor having three pipes. For comparison, the same figure also shows
the sequential and the single-pipeline cases.
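
The three cases can be compared with a rough cycle count. The Python sketch below is an idealized model, assuming independent instructions, k stages of one cycle each, and no stalls; the numbers k = 4 and n = 6 are hypothetical and are not taken from the figure.

```python
import math


def sequential_cycles(n, k):
    """Each instruction passes through all k stages before the next one starts."""
    return n * k


def single_pipeline_cycles(n, k):
    """One pipeline: k cycles for the first instruction, then one per cycle."""
    return k + n - 1


def multi_issue_cycles(n, k, pipes):
    """Idealized multiple-issue case: up to `pipes` instructions enter per cycle."""
    return k + math.ceil(n / pipes) - 1


if __name__ == "__main__":
    n, k = 6, 4  # hypothetical instruction and stage counts
    print(sequential_cycles(n, k))
    print(single_pipeline_cycles(n, k))
    print(multi_issue_cycles(n, k, pipes=3))
```

Real programs rarely reach the idealized multiple-issue figure, since dependencies between instructions limit how many can be issued together.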
Arithmetic Pipeline
Pipeline arithmetic units are usually found in very high speed computers. They are used to
implement floating-point operations, multiplication of fixed-point numbers, and similar
computations encountered in scientific problems.
Consider an example of a pipeline unit for floating-point addition and subtraction. The inputs
to the floating-point adder pipeline are two normalized floating-point binary numbers:

$$X = A \times 2^a, \qquad Y = B \times 2^b$$

A and B are two fractions that represent the mantissas, and a and b are the exponents. The
sub-operations that are performed in the four segments are:
1. Compare the exponents.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalize the result.
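
The four sub-operations listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not the hardware algorithm itself: it works on decimal mantissas through the standard `decimal` module, treats a number as normalized when its mantissa m satisfies 0.1 ≤ |m| < 1, and uses operand values chosen here purely for illustration.

```python
from decimal import Decimal


def fp_add(mant_x, exp_x, mant_y, exp_y):
    """Add two decimal floating-point numbers given as (mantissa, exponent)."""
    # Segment 1: compare the exponents and keep the larger one.
    diff = exp_x - exp_y
    exp = max(exp_x, exp_y)

    # Segment 2: align the mantissas by shifting the one with the
    # smaller exponent to the right by |diff| decimal places.
    if diff >= 0:
        mant_y = mant_y.scaleb(-diff)
    else:
        mant_x = mant_x.scaleb(diff)

    # Segment 3: add the aligned mantissas (pass a negated mant_y to subtract).
    mant = mant_x + mant_y

    # Segment 4: normalize the result so that 0.1 <= |mant| < 1.
    while abs(mant) >= 1:
        mant = mant.scaleb(-1)
        exp += 1
    while mant != 0 and abs(mant) < Decimal("0.1"):
        mant = mant.scaleb(1)
        exp -= 1
    return mant, exp


if __name__ == "__main__":
    # Illustrative operands (assumed): 0.9504 x 10^3 and 0.8200 x 10^2.
    mant, exp = fp_add(Decimal("0.9504"), 3, Decimal("0.8200"), 2)
    print(mant, exp)
```

In a pipeline each of the four commented steps would occupy its own segment, so a new pair of operands could enter segment 1 while earlier pairs are still being aligned, added, or normalized.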
A numerical example may clarify the sub-operations performed in each segment. For simplicity,
we use decimal numbers, although Fig. 49 refers to binary numbers. Consider the two normalized
floating-point numbers:

The two exponents are subtracted in the first segment to obtain (3 − 2 = 1). The larger exponent
3 is chosen as the exponent of the result. The next segment shifts the mantissa of Y to the right
to obtain:

This aligns the two mantissas under the same exponent. The addition of the two mantissas in
segment 3 produces the sum:
Suppose that the time delays of the four segments are $t_1 = 60$ ns, $t_2 = 70$ ns, $t_3 = 100$ ns,
and $t_4 = 80$ ns, and the interface registers have a delay of $t_r = 10$ ns. The clock cycle must
accommodate the slowest segment plus the register delay, so it is chosen to be
$t_p = t_3 + t_r = 110$ ns. An equivalent non-pipeline floating-point adder-subtractor will have
a delay time $t_n = t_1 + t_2 + t_3 + t_4 + t_r = 320$ ns. In this case the pipelined adder has a
speedup of $320/110 = 2.9$ over the non-pipelined adder.
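
The same calculation can be written as a small Python sketch. The function names are chosen here for illustration; the delays are the ones given in the text, in nanoseconds.

```python
def clock_cycle(segment_delays, register_delay):
    """Clock cycle of the pipeline: slowest segment plus the register delay."""
    return max(segment_delays) + register_delay


def non_pipeline_delay(segment_delays, register_delay):
    """Delay of the equivalent non-pipeline unit: all segments plus one register."""
    return sum(segment_delays) + register_delay


if __name__ == "__main__":
    delays = [60, 70, 100, 80]  # t1..t4 in ns
    t_r = 10                    # interface register delay in ns
    t_p = clock_cycle(delays, t_r)
    t_n = non_pipeline_delay(delays, t_r)
    print(t_p, t_n, round(t_n / t_p, 1))
```

The sketch makes the bottleneck visible: the speedup stays below the number of segments because the clock must be set by the slowest segment rather than by the average one.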
Supercomputers
Supercomputers are very powerful, high-performance machines used mostly for scientific
computations. To speed up the operation, the components are packed tightly together to minimize
the distance that the electronic signals have to travel. Supercomputers also use special techniques
for removing the heat from circuits to prevent them from burning up because of their close
proximity.
A supercomputer is a computer system best known for its high computational speed, fast and
large memory systems, and the extensive use of parallel processing.
Delayed Branch
Consider now the operation of the following four instructions:

If the three-segment pipeline (I: instruction fetch, A: ALU operation, E: execute instruction)
proceeds without interruptions, there will be a data conflict in instruction 3 because the operand
in R2 is not yet available in the A segment. This can be seen from the timing of the pipeline
shown in Fig. 50(a). The E segment in clock cycle 4 is in the process of placing the memory data
into R2. The A segment in clock cycle 4 is using the data from R2, but the value in R2 will not
be the correct value since it has not yet been transferred from memory. It is up to the compiler
to make sure that the instruction following the load instruction uses the correct data fetched
from memory, for example by inserting a no-operation (NOP) instruction after the load. It was
shown in Fig. 50 that a branch instruction likewise delays the pipeline operation, with NOP
instructions filling the pipeline until the instruction at the branch address is fetched.
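
The compiler's role can be illustrated with a minimal Python sketch of a pass that inserts a NOP after a load whenever the next instruction reads the register being loaded. The instruction encoding, the function name `insert_load_delays`, and the four-instruction program are assumptions made for this sketch; the program is only meant to be consistent with the conflict on R2 described above.

```python
NOP = ("NOP", None, ())


def insert_load_delays(program):
    """Return a copy of `program` with a NOP after each load whose
    destination register is read by the immediately following instruction.

    Each instruction is a tuple (operation, destination, source_registers).
    """
    scheduled = []
    for index, instruction in enumerate(program):
        scheduled.append(instruction)
        operation, destination, _ = instruction
        if operation == "LOAD" and index + 1 < len(program):
            _, _, next_sources = program[index + 1]
            if destination in next_sources:
                scheduled.append(NOP)
    return scheduled


if __name__ == "__main__":
    # Hypothetical program: two loads, an add that needs both, and a store.
    program = [
        ("LOAD", "R1", ()),
        ("LOAD", "R2", ()),
        ("ADD", "R3", ("R1", "R2")),
        ("STORE", None, ("R3",)),
    ]
    for operation, destination, sources in insert_load_delays(program):
        print(operation, destination, sources)
```

A real compiler would first try to move a useful, independent instruction into the slot after the load and fall back to a NOP only when none is available.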
