Unit III Pipelining

The document discusses pipelining as a method to enhance CPU performance by allowing simultaneous execution of multiple instructions, illustrated through a water bottle packaging analogy. It outlines the design of a basic pipeline, execution sequences, and the performance metrics of pipelined versus non-pipelined processors. Additionally, it covers various parallel processor systems, cache coherence issues, and methods to resolve these issues, including different cache coherence protocols.

Uploaded by

mamta devi

UNIT III

Pipelining
To improve the performance of a CPU we have two options: 1) Improve the
hardware by introducing faster circuits. 2) Arrange the hardware such that
more than one operation can be performed at the same time. Since there is a
limit on the speed of hardware and the cost of faster circuits is quite high, we
have to adopt the 2nd option.
Pipelining is a process of arrangement of hardware elements of the CPU
such that its overall performance is increased. Simultaneous execution of
more than one instruction takes place in a pipelined processor. Let us see a
real-life example that works on the concept of pipelined operation. Consider
a water bottle packaging plant. Let there be 3 stages that a bottle must
pass through: inserting the bottle (I), filling water in the bottle (F), and sealing
the bottle (S). Let us call these stage 1, stage 2, and stage 3
respectively. Let each stage take 1 minute to complete its operation. In
a non-pipelined operation, a bottle is first inserted in the plant, and after 1 minute
it is moved to stage 2, where water is filled. Stage 1 is now
idle. Similarly, when the bottle moves to stage 3, both stage 1 and
stage 2 are idle. In pipelined operation, when the bottle is in stage 2,
another bottle can be loaded at stage 1. Similarly, when the bottle is in stage
3, there can be one bottle each in stage 1 and stage 2. So, once the pipeline is
full, a new bottle comes out of stage 3 every minute. Hence, for 3 bottles, the
average time taken to manufacture 1 bottle is:
Without pipelining = 9/3 minutes = 3 minutes
I F S | | | | | |
| | | I F S | | |
| | | | | | I F S (9 minutes)
With pipelining = 5/3 minutes ≈ 1.67 minutes
I F S | |
| I F S |
| | I F S (5 minutes)
Thus, pipelined operation increases the efficiency of a system.
Design of a basic pipeline
• In a pipelined processor, a pipeline has two ends, the input end and the
output end. Between these ends, there are multiple stages/segments
such that the output of one stage is connected to the input of the next
stage and each stage performs a specific operation.
• Interface registers are used to hold the intermediate output between two
stages. These interface registers are also called latches or buffers.
• All the stages in the pipeline along with the interface registers are
controlled by a common clock.
Execution in a pipelined processor
The execution sequence of instructions in a pipelined processor can be
visualized using a space-time diagram. For example, consider a processor
having 4 stages and let there be 2 instructions to be executed. We can
visualize the execution sequence through the following space-time diagrams:
Non-overlapped execution:
Stage / Cycle | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8
S1            | I1 |    |    |    | I2 |    |    |
S2            |    | I1 |    |    |    | I2 |    |
S3            |    |    | I1 |    |    |    | I2 |
S4            |    |    |    | I1 |    |    |    | I2
Total time = 8 cycles

Overlapped execution:
Stage / Cycle | 1  | 2  | 3  | 4  | 5
S1            | I1 | I2 |    |    |
S2            |    | I1 | I2 |    |
S3            |    |    | I1 | I2 |
S4            |    |    |    | I1 | I2
Total time = 5 cycles

Pipeline Stages
A RISC processor has a 5-stage instruction pipeline to execute all the
instructions in the RISC instruction set. Following are the 5 stages of the
RISC pipeline with their respective operations:
• Stage 1 (Instruction Fetch) In this stage, the CPU reads the instruction
from the memory address held in the program counter.
• Stage 2 (Instruction Decode) In this stage, the instruction is decoded and
the register file is accessed to get the values of the registers used in
the instruction.
• Stage 3 (Instruction Execute) In this stage, ALU operations are
performed.
• Stage 4 (Memory Access) In this stage, memory operands are read from or
written to the memory address specified by the instruction.
• Stage 5 (Write Back) In this stage, the computed/fetched value is written
back to the destination register specified by the instruction.
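The overlapped execution shown earlier can be reproduced for the five stages listed above. The following is a minimal sketch, assuming an ideal pipeline with no stalls in which instruction i (counting from 0) occupies stage s (counting from 0) during cycle i + s. The function names and the stage abbreviations IF, ID, EX, MEM and WB are labels chosen for this illustration and do not come from the notes.

```python
def space_time_diagram(stages, n_instructions):
    """Return the rows of an ideal (stall-free) space-time diagram.

    Row s lists, for every clock cycle, the instruction that occupies
    stage s, or "--" when the stage is idle. Assumption: instruction i
    (0-based) is in stage s (0-based) during cycle i + s.
    """
    k = len(stages)
    total_cycles = k + n_instructions - 1
    rows = []
    for s in range(k):
        row = []
        for cycle in range(total_cycles):
            i = cycle - s  # index of the instruction in stage s this cycle
            row.append(f"I{i + 1}" if 0 <= i < n_instructions else "--")
        rows.append(row)
    return rows


def format_diagram(stages, rows):
    """Render the rows as a plain-text table with 1-based cycle numbers."""
    header = "Stage " + " ".join(f"{c + 1:>3}" for c in range(len(rows[0])))
    lines = [header]
    for name, row in zip(stages, rows):
        lines.append(f"{name:<5} " + " ".join(f"{cell:>3}" for cell in row))
    return "\n".join(lines)


stages = ["IF", "ID", "EX", "MEM", "WB"]  # hypothetical stage labels
rows = space_time_diagram(stages, 3)
print(format_diagram(stages, rows))
```

With 5 stages and 3 instructions the diagram spans 5 + 3 - 1 = 7 cycles, and each row is the row above it shifted one cycle to the right.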
Performance of a pipelined processor
Consider a ‘k’ segment pipeline with clock cycle time ‘Tp’. Let there be ‘n’
tasks to be completed in the pipelined processor. The first instruction takes
‘k’ cycles to come out of the pipeline, but the other ‘n – 1’ instructions take
only ‘1’ cycle each, i.e., a total of ‘n – 1’ cycles. So, the time taken to
execute ‘n’ instructions in a pipelined processor is:
ETpipeline = k + n – 1 cycles
= (k + n – 1) * Tp
In the same case, for a non-pipelined processor, the execution time of ‘n’
instructions will be:
ETnon-pipeline = n * k * Tp
So, speedup (S) of the pipelined processor over the non-pipelined processor,
when ‘n’ tasks are executed on the same processor is:
S = Performance of pipelined processor /
Performance of non-pipelined processor
As the performance of a processor is inversely proportional to the execution
time, we have,
S = ETnon-pipeline / ETpipeline
=> S = [n * k * Tp] / [(k + n – 1) * Tp]
S = [n * k] / [k + n – 1]
When the number of tasks ‘n’ is significantly larger than k, that is, n >> k,
we have k + n – 1 ≈ n, so
S ≈ n * k / n
S ≈ k
where ‘k’ is the number of stages in the pipeline.
Also, Efficiency = Given speedup / Max speedup = S / Smax
We know that Smax = k
So, Efficiency = S / k
Throughput = Number of instructions / Total time to complete the instructions
So, Throughput = n / [(k + n – 1) * Tp]
Note: The cycles per instruction (CPI) value of an ideal pipelined processor is 1.
Please see Set 2 for Dependencies and Data Hazard and Set 3 for Types of
pipeline and Stalling.
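The formulas in this section can be checked numerically. The sketch below is only an illustration under the same ideal assumptions (no stalls, every stage takes one clock cycle of length Tp); the function name `pipeline_metrics` and the dictionary keys are hypothetical names introduced here.

```python
def pipeline_metrics(k, n, tp):
    """Metrics for an ideal k-stage pipeline running n tasks, cycle time tp."""
    et_pipeline = (k + n - 1) * tp       # first task takes k cycles, the rest 1 each
    et_non_pipeline = n * k * tp         # every task takes all k cycles in turn
    speedup = et_non_pipeline / et_pipeline
    return {
        "et_pipeline": et_pipeline,
        "et_non_pipeline": et_non_pipeline,
        "speedup": speedup,
        "efficiency": speedup / k,       # S / Smax, with Smax = k
        "throughput": n / et_pipeline,   # tasks completed per unit time
    }


# The bottle plant: 3 stages, 3 bottles, 1 minute per stage.
bottles = pipeline_metrics(k=3, n=3, tp=1)
print(bottles["et_non_pipeline"], bottles["et_pipeline"])  # 9 5

# With n much larger than k, the speedup approaches k.
large = pipeline_metrics(k=4, n=1000, tp=1)
print(round(large["speedup"], 3))
```

For the bottle plant the speedup is 9/5 = 1.8 and the efficiency is 1.8/3 = 0.6; for 1000 tasks on 4 stages the speedup is 4000/1003, which is already close to the limit of 4.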
What are the types of Parallel Processor System in Computer Architecture?
Parallel processing systems are designed to speed up the execution of programs
by breaking the program into several fragments and processing these fragments
simultaneously. Such systems are multiprocessor systems, also referred to as tightly
coupled systems. Parallel processors can be divided into the following four groups
based on the number of instruction and data streams:
SISD Computer Organization (Single Instruction, Single Data Stream)
SISD represents a computer organization with a control unit, a processing unit, and a
memory unit, like the conventional serial computer. An SISD computer executes
instructions sequentially and may or may not have parallel processing capabilities.
Instructions executed sequentially may overlap in their execution stages. An
SISD computer can have more than one functional unit, but all the functional
units are under the supervision of one control unit. Parallel processing in such
systems can be attained by pipeline processing or by using multiple functional units.

SIMD Computer Organization (Single Instruction, Multiple Data Stream)
SIMD organization includes multiple processing elements, all under the
supervision of a common control unit. All processors receive the same
instruction from the control unit but operate on different data items.
The shared memory subsystem contains multiple modules so that it can communicate
with all the processors simultaneously. SIMD is further divided into word-slice and
bit-slice mode organizations.
MISD Computer Organization (Multiple Instruction, Single Data Stream)
MISD organization includes multiple processing units, each receiving separate
instructions and operating over the same data stream. The result of one processor
becomes the input of the next processor. This organization has received little
attention and has not been implemented in practice; the structure is of
only theoretical interest.

MIMD Computer Organization (Multiple Instruction, Multiple Data Stream)
A MIMD computer organization implies interactions among the processors,
since all memory streams are derived from the same data space shared by all
processors. If the multiple data streams were derived from disjoint subspaces of the
shared memories, the result would be a multiple SISD operation, which is equivalent
to a set of ‘n’ independent SISD systems.
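The four groups above are distinguished only by how many instruction streams and how many data streams are active at once. A minimal sketch of that classification rule follows; the function name `flynn_class` and its two integer parameters are hypothetical names used for illustration.

```python
def flynn_class(instruction_streams, data_streams):
    """Map stream counts to one of the four groups described above.

    One stream gives "S" (single); more than one gives "M" (multiple).
    """
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"


print(flynn_class(1, 1))  # SISD
print(flynn_class(1, 8))  # SIMD
print(flynn_class(4, 1))  # MISD
print(flynn_class(4, 4))  # MIMD
```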

Cache Coherence
The cache coherence protocol is discussed in this section as a solution to the
multicache inconsistency problem.

A cache coherence problem arises when several processors operate concurrently
and their caches may hold different versions of the same memory block. Cache
coherence ensures that changes to the values of shared operands are propagated
throughout the system in a timely manner.

Cache coherence has three different levels:

o Each write operation appears to occur instantaneously.
o All processors see exactly the same sequence of changes of values for each operand.
o Different processors may see an operation and assume different sequences of
values; this is non-coherent behavior.

Methods to resolve Cache Coherence

The two methods listed below can be used to resolve the cache coherence issue:

o Write Through
o Write Back
Write Through
Write through is the simplest and most commonly used method. Every memory write
operation updates the main memory. If the word at the requested address is present in
the cache, the cache is updated in parallel with the main memory.

The benefit of this approach is that the main memory and the cache always hold the same
data. This property is important in systems with direct memory access (DMA) transfers:
it ensures that the data in main memory is always up to date, so that a device
communicating through DMA receives the most recent data.

Advantage - It provides the highest level of consistency.

Disadvantage - It requires a greater number of memory accesses.

Write Back
In this approach, only the cache location is updated during a write operation. The location
is then marked with a flag so that, when the word is later removed from the cache, it is
copied into the main memory. The write-back approach was developed because words may be
updated numerous times while they are in the cache. As long as a word remains in the cache,
it does not matter whether the copy in the main memory is outdated, because requests for
the word are fulfilled from the cache.

An accurate copy needs to be transferred back to the main memory only when the word is
removed from the cache. Analytical results indicate that between 10% and 30% of all
memory references in a typical program are writes to memory.

Advantage - A very small number of memory accesses and write operations.

Disadvantage - Inconsistency may occur in this approach.
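The trade-off between the two methods can be seen by counting main-memory writes in a toy model. The sketch below is an illustration only: the class `Cache`, its methods and the `run` helper are hypothetical, main memory is a plain dictionary, and the write-back branch assumes the written word is placed in the cache.

```python
class Cache:
    """Toy single-level cache in front of a dict-based main memory."""

    def __init__(self, memory, policy):
        if policy not in ("write-through", "write-back"):
            raise ValueError("unknown policy")
        self.memory = memory
        self.policy = policy
        self.lines = {}         # address -> cached value
        self.dirty = set()      # flagged addresses not yet copied to memory
        self.memory_writes = 0  # number of writes that reached main memory

    def load(self, addr):
        self.lines[addr] = self.memory[addr]

    def write(self, addr, value):
        if self.policy == "write-through":
            # Main memory is always updated; the cache only if the word is present.
            self.memory[addr] = value
            self.memory_writes += 1
            if addr in self.lines:
                self.lines[addr] = value
        else:
            # Only the cache is updated, and the location is flagged.
            self.lines[addr] = value
            self.dirty.add(addr)

    def evict(self, addr):
        if addr in self.dirty:
            # A flagged word is copied back when it leaves the cache.
            self.memory[addr] = self.lines[addr]
            self.memory_writes += 1
            self.dirty.discard(addr)
        self.lines.pop(addr, None)


def run(policy):
    memory = {0x10: 0}
    cache = Cache(memory, policy)
    cache.load(0x10)
    for value in range(1, 6):   # five updates to the same word
        cache.write(0x10, value)
    stale = memory[0x10] != cache.lines[0x10]
    cache.evict(0x10)
    return cache.memory_writes, stale, memory[0x10]


print(run("write-through"))  # (5, False, 5)
print(run("write-back"))     # (1, True, 5)
```

Write through performs five memory writes and main memory is never stale; write back performs a single memory write at eviction, but main memory holds an outdated value until then.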

The important terms related to the data or information stored in the cache as well as
in the main memory are as follows:

o Modified - The modified term signifies that the data stored in the cache and main
memory are different. This means the data in the cache has been modified, and the
changes need to be reflected in the main memory.
o Exclusive - The exclusive term signifies that the data is clean, i.e., the cache and the
main memory hold identical data.
o Shared - The shared term signifies that the cache holds the most current copy of the
data, and that this copy may also be held by other caches as well as main memory.
o Owned - The owned term indicates that the block is currently held by the cache and
that it has acquired ownership of it, i.e., complete privileges to that specific block.
o Invalid - When a cache block is marked as invalid, it means that it needs to be fetched
from another cache or main memory.

Below is a list of the different Cache Coherence Protocols used in multiprocessor
systems:

o MSI protocol (Modified, Shared, Invalid)
o MOSI protocol (Modified, Owned, Shared, Invalid)
o MESI protocol (Modified, Exclusive, Shared, Invalid)
o MOESI protocol (Modified, Owned, Exclusive, Shared, Invalid)

These protocols are discussed below:

1. MSI Protocol

This is a fundamental cache coherence mechanism that is used in multiprocessor
systems. A cache block may be in any of the states indicated by the letters of the
protocol name. Therefore, for MSI, each block may be in one of the following states:

o Modified - The block has been updated in the cache, so the data in the cache is
inconsistent with the main memory. When a block in the Modified (M) state is
evicted, the cache is responsible for writing the block back to the main memory.
o Shared - The block is unmodified and is present in at least one cache. The block
may be evicted without writing the data back to the backing store.
o Invalid - The block is invalid; if it is to be stored in this cache, it must first be
fetched from main memory or from a different cache.
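The three MSI states can be traced with a small simulation of one memory block shared by several caches. This is a simplified sketch under stated assumptions: a read miss makes any Modified copy write back and drop to Shared, and a write by a cache not already in Modified invalidates all other copies. The class `MSISystem` and its method names are hypothetical and are not taken from the notes.

```python
class MSISystem:
    """One memory block tracked across several caches under MSI (sketch)."""

    def __init__(self, n_caches, initial_value=0):
        self.memory = initial_value
        self.state = ["I"] * n_caches   # per-cache state: "M", "S" or "I"
        self.data = [None] * n_caches   # per-cache copy of the block

    def _flush_modified(self):
        # A cache holding the block in M writes it back to main memory.
        for c, st in enumerate(self.state):
            if st == "M":
                self.memory = self.data[c]

    def read(self, c):
        if self.state[c] == "I":        # read miss
            self._flush_modified()
            for other in range(len(self.state)):
                if other != c and self.state[other] == "M":
                    self.state[other] = "S"
            self.data[c] = self.memory
            self.state[c] = "S"
        return self.data[c]

    def write(self, c, value):
        if self.state[c] != "M":        # need exclusive access first
            self._flush_modified()
            for other in range(len(self.state)):
                if other != c:
                    self.state[other] = "I"
                    self.data[other] = None
            self.state[c] = "M"
        self.data[c] = value

    def evict(self, c):
        if self.state[c] == "M":        # M must be written back on eviction
            self.memory = self.data[c]
        self.state[c] = "I"
        self.data[c] = None


system = MSISystem(2, initial_value=7)
system.read(0)
system.read(1)
states_after_reads = list(system.state)   # both caches Shared
system.write(0, 42)
states_after_write = list(system.state)   # cache 0 Modified, cache 1 Invalid
value_seen_by_1 = system.read(1)          # cache 0 writes back, both Shared
print(states_after_reads, states_after_write, value_seen_by_1, system.state)
```

After the write, main memory still holds 7 until cache 1 reads the block; that read forces the write-back, so cache 1 sees 42 and both caches end in the Shared state.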

2. MOSI Protocol

It has one more state than the MSI protocol, which is discussed below:

Owned - It signifies that the current processor owns this block and will respond
to requests from other processors for the block.

3. MESI Protocol

This is the most widely used cache coherence protocol. Each cache line is marked
with one of the following states:
o Modified - As mentioned above, this term signifies that the data stored in the cache
and main memory are different. This means the data in the cache has been modified,
and the changes need to be reflected in the main memory.
o Exclusive - The exclusive term signifies that the data is clean, i.e., the cache and the
main memory hold identical data.
o Shared - This signifies that other caches in the system may also hold this cache line.
o Invalid - This signifies that the cache line does not hold valid data.

4. MOESI Protocol

This protocol provides comprehensive cache coherence, covering all the states
that are commonly used in other protocols. Each cache line is in one of the
following states:

o Modified - A cache line in this state holds the most recent, correct copy of the data.
The copy in main memory is stale, and no other processor holds a copy.
o Owned - A cache line in this state holds the most recent, correct copy of the data.
As in the shared state, other processors can hold copies of the most recent, correct
data; unlike the shared state, the copy in main memory can be stale. Only one
processor can hold the data in the owned state at a time; all other processors must
hold the data in the shared state.
o Exclusive - A cache line in this state holds the most recent, correct copy of the data.
The copy in main memory is also the most recent, correct copy, and no other
processor holds a copy.
o Shared - A cache line in this state holds the most recent, correct copy of the data.
Other processors in the system may also hold copies in the shared state.
If no other processor holds the data in the owned state, the copy in main memory is
also the most recent, correct version.
o Invalid - A cache line in this state does not hold a valid copy of the data. Valid
copies can be found in main memory or in another processor's cache.

Types of Coherence:
There are three types of coherence mechanisms, which are listed below:
1. Directory Based - In a directory-based system, the data being shared is placed in a
common directory that maintains coherence between the caches. The directory
serves as a filter through which the processor must request permission to load an
entry from primary memory into its cache. When an entry is modified, the directory
either updates or invalidates the other caches that contain that entry.
2. Snooping - In snooping, individual caches monitor the address lines for accesses to
memory locations that they have cached. This is known as a write-invalidate
protocol: when a write operation is observed to a location that a cache holds a copy
of, the cache controller invalidates its own copy of the snooped memory location.
3. Snarfing - In this approach, a cache controller watches both the address and the
data in an attempt to update its own copy of a memory location when a second
master modifies that location in main memory. When a write operation is observed to
a location that a cache holds a copy of, the cache controller updates its own copy of
the snarfed memory location with the new data.
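The difference between snooping (invalidate on an observed write) and snarfing (update on an observed write) can be sketched as two reactions to the same bus event. The classes `SnoopingCache` and `Bus` below are hypothetical names for a toy model in which every write goes to main memory and is visible to all caches on the bus.

```python
class SnoopingCache:
    """Cache that watches bus writes and reacts according to its mode."""

    def __init__(self, mode):
        if mode not in ("invalidate", "update"):
            raise ValueError("mode must be 'invalidate' or 'update'")
        self.mode = mode   # "invalidate" = snooping, "update" = snarfing
        self.lines = {}    # address -> cached value

    def observe_write(self, addr, value):
        if addr not in self.lines:
            return                      # not cached here, nothing to do
        if self.mode == "invalidate":
            del self.lines[addr]        # snooping: drop the stale copy
        else:
            self.lines[addr] = value    # snarfing: take the new data


class Bus:
    """Shared bus: a write updates memory and is seen by every cache."""

    def __init__(self, memory, caches):
        self.memory = memory
        self.caches = caches

    def write(self, addr, value):
        self.memory[addr] = value
        for cache in self.caches:
            cache.observe_write(addr, value)


memory = {0x20: 1}
snooper = SnoopingCache("invalidate")
snarfer = SnoopingCache("update")
snooper.lines[0x20] = 1
snarfer.lines[0x20] = 1

bus = Bus(memory, [snooper, snarfer])
bus.write(0x20, 9)   # a second master changes the location

print(0x20 in snooper.lines)  # False
print(snarfer.lines[0x20])    # 9
```

The invalidating cache must fetch the word again on its next access, while the updating cache keeps a valid copy at the cost of capturing data on every observed write.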
