0% found this document useful (0 votes)

45 views50 pages

Instruction Pipelining in Operating Systems

The document discusses pipelining as a technique to improve processor performance by overlapping the execution of instructions, and describes different pipeline stages and designs including two-stage, six-stage, and dealing with branches. It also introduces parallel processing concepts such as symmetric multiprocessors, clusters, and non-uniform memory access systems. Finally, it categorizes different multiple processor organizations including single instruction single data, single instruction multiple data, multiple instruction single data, and multiple instruction multiple data architectures.

Uploaded by

rajkumar

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

45 views50 pages

Instruction Pipelining in Operating Systems

Uploaded by

rajkumar

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

PIPELINING, INTRODUCTION TO PARALLEL PROCESSING

AND OPERATING SYSTEM

1
PIPELINING
 greater performance can be achieved by taking advantage of
improvements in technology, such as faster circuitry.
 In addition, organizational enhancements to the processor can
improve performance.
 We have already seen some examples of this, such as the use of
multiple registers rather than a single accumulator, and the use
of a cache memory.
 Another approach is Instruction Pipelining

2
PIPELINING
 new inputs are accepted at one end before previously
accepted inputs appear as outputs at the other end
 an instruction has a number of stages.

 Two stage Instruction Pipeline has two stages:

 Fetch
 Execute

3
TWO-STAGE PIPELINING

4
PROBLEMS WITH TWO-STAGE PIPELINE
 The execution time will generally be longer than the
fetch time. Thus, the fetch stage may have to wait for
some time before it can empty its buffer.
 A conditional branch instruction makes the address of
the next instruction to be fetched unknown. Thus, the
fetch stage must wait until it receives the next instruction
address from the execute stage. The execute stage may
then have to wait while the next instruction is fetched.

5
SIX-STAGE PIPELINE
 To gain further speedup, the pipeline must have more stages.
 Fetch instruction (FI): Read the next expected instruction into a
buffer.
 Decode instruction (DI): Determine the opcode and the operand
specifiers.
 Calculate operands (CO): Calculate the effective address of each
source operand. This may involve displacement, register indirect,
indirect, or other forms of address calculation.
 Fetch operands (FO): Fetch each operand from memory.
Operands in registers need not be fetched.
 Execute instruction (EI): Perform the indicated operation and
store the result, if any, in the specified destination operand
location.
 Write operand (WO): Store the result in memory. 6
SIX-STAGE PIPELINE
 With this decomposition, the various stages will be of
more nearly equal duration.
 For the sake of illustration, let us assume equal duration.

 Using this assumption, Figure below shows that a six-

stage pipeline can reduce the execution time for 9
instructions from 54 time units to 14 time units.

7
SIX-STAGE PIPELINE

8
SIX-STAGE PIPELINE
 Factors that limit performance
 Ifthe six stages are not of equal duration, there will be some
waiting involved at various pipeline stages
 Conditional Brach instruction-invalidate several instruction
fetches
 Interrupt

9
BRANCH PENALTY

10
THE LOGIC NEEDED FOR
SIX STAGE
INSTRUCTION PIPELINE

11
DEALING WITH BRANCHES
 A variety of approaches have been taken for dealing with
conditional branches:
 Multiple streams
 Prefetch branch target
 Loop buffer
 Branch prediction
 Delayed branch

12
MULTIPLE STREAMS
 A simple pipeline suffers a penalty for a branch
instruction because it must choose one of two
instructions to fetch next and may make the wrong
choice. A brute-force approach is to replicate the initial
portions of the pipeline and allow the pipeline to fetch
both instructions, making use of two streams.
 There are two problems with this approach:
 With multiple pipelines there are contention delays for access
to the registers and to memory.
 Additional branch instructions may enter the pipeline (either
stream) before the original branch decision is resolved. Each
such instruction needs an additional stream. 13
PREFETCH BRANCH TARGET
 When a conditional branch is recognized, the target of
the branch is prefetched, in addition to the instruction
following the branch.
 This target is then saved until the branch instruction is
executed. If the branch is taken, the target has already
been prefetched.

14
LOOP BUFFER
 A loop buffer is a small, very-high-speed memory
maintained by the instruction fetch stage of the pipeline
and containing the n most recently fetched instructions, in
sequence.
 If a branch is to be taken, the hardware first checks
whether the branch target is within the buffer. If so, the
next instruction is fetched from the buffer
 Benefits:
 Reduces memory access
 useful for the rather common occurrence of IF–THEN and IF–
THEN–ELSE sequences
 well suited to dealing with loops, or iterations; hence the name 15
loop buffer
BRANCH PREDICTION (1)
 Predict never taken
 Assume that jump will not happen
 Always fetch next instruction
 68020 & VAX 11/780
 VAX will not prefetch after branch if a page fault would
result (O/S v CPU design)
 Predict always taken
 Assume that jump will happen
 Always fetch target instruction

16
BRANCH PREDICTION (2)
 Predict by Opcode
 Some instructions are more likely to result in a jump than
others
 Can get up to 75% success

 Taken/Not taken switch

 Based on previous history
 Good for loops

17
BRANCH PREDICTION (3)
 Delayed Branch
 Do not take jump until you have to
 Rearrange instructions

18
BRANCH PREDICTION FLOWCHART

19
PARALLEL PROCESSING
A traditional way to increase system performance is
to use multiple processors that can execute in parallel to
support a given workload.
The two most common multiple-processor
organizations are symmetric multiprocessors (SMPs) and
clusters.
More recently, nonuniform memory access (NUMA)
systems have been introduced commercially.

20
Symmetric multiprocessors
An SMP consists of multiple similar processors within
the same computer, interconnected by a bus or some sort
of switching arrangement.

The most critical problem to address in an SMP is

that of cache coherence.

Each processor has its own cache and so it is possible

for a given line of data to be present in more than one
cache.
21
If such a line is altered in one cache, then both
main memory and the other cache have an invalid
version of that line. Cache coherence protocols are
designed to cope with this problem.

chip multiprocessing :
When more than one processor are implemented on a
single chip, the configuration is referred to as chip
multiprocessing.

22
Multithreaded processor:
A related design scheme is to replicate some of the
components of a single processor so that the processor can
execute multiple threads concurrently; this is known as a
multithreaded processor.

Cluster:
A cluster is a group of interconnected, whole computers
working together as a unified computing resource that can
create the illusion of being one machine.
The term whole computer means a system that can run on
its own, apart from the cluster. 23
Non-uniform memory access

A NUMA system is a shared-memory multiprocessor in
which the access time from a given processor to a word
in memory varies with the location of the memory word.

24
MULTIPLE PROCESSOR ORGANIZATION

I. Single instruction, single data stream - SISD

II. Single instruction, multiple data stream - SIMD
III. Multiple instruction, single data stream - MISD
IV. Multiple instruction, multiple data stream- MIMD

25
SINGLE INSTRUCTION, SINGLE DATA
STREAM - SISD
 Single processor
 Single instruction stream
 Data stored in single memory
 Uni-processor

26
SINGLE INSTRUCTION, MULTIPLE DATA
STREAM - SIMD
 Single machine instruction Controls simultaneous
execution of a Number of processing elements on
Lockstep basis
 Each processing element has associated data memory

 Each instruction executed on different set of data by

different processors
 Vector and array processors

27
MULTIPLE INSTRUCTION, SINGLE
DATA STREAM - MISD
 Sequence of data Transmitted to set of processors
 Each processor executes different instruction sequence

 Never been implemented

28
MULTIPLE INSTRUCTION, MULTIPLE DATA STREAM- MIMD
 Set of processors Simultaneously execute different
instruction sequences on Different sets of data
 SMPs, clusters and NUMA systems

29
TAXONOMY OF PARALLEL PROCESSOR
ARCHITECTURES

30
OPERATING SYSTEM
Operating Systems Definition: The set of software
products that jointly controls the system resources and
processes using these resources on a computer.

It can be thought of as having two objectives:

Convenience: An OS makes a computer more
convenient to use.
Efficiency: An OS allows the computer system resources
to be used in an efficient manner.

31
An operating system (OS) is a software program that
manages the hardware and software resources of a
computer.

The OS performs basic tasks, such as:

 Controlling and allocating memory
 Prioritizing the processing of instructions
 Controlling input and output devices
 Facilitating networking
 Managing files.

32
Layers and Views of a Computer System:

33
Operating System Services:
 Program creation
 Program execution
 Access to I/O devices
 Controlled access to files
 System access
 Error detection and response
 Accounting

34
Kernel:
An operating system generally consists of two parts:
• Kernel space (kernel mode) and
• User space (user mode).
Kernel is the smallest and central component of an
operating system.
Its services include managing memory and devices and
also to provide an interface for software applications to
use the resources.
Additional services such as managing protection of
35
programs and multitasking may be included depending
on architecture of operating system.
There are three broad categories of kernel models
available, namely:
I. Monolithic kernel
II. Microkernel
III. Exokernel

36
Why Operating Systems?
Operating systems provide a layer of abstraction
between the programmer and the machine
It screens the complexities of the computer from the
programmer
It allows the computer to be treated as a virtual machine

37
Operating system types:

Real-time:
A real-time operating system is a multitasking operating
system that aims at executing real-time applications.

 Real-time operating systems often use specialized

scheduling algorithms so that they can achieve a
deterministic nature of behavior.

The main objective of Real-Time operating systems is

their quick and predictable response to events. 38
Multi-user:
A multi-user operating system allows multiple users to
access a computer system concurrently.

Time-sharing system can be classified as multi-user

systems as they enable a multiple user access to a
computer through the sharing of time.

Single-user operating systems, as opposed to a multi-

user operating system, are usable by a single user at a
time.
39
Being able to use multiple accounts on a Windows
operating system does not make it a multi-user system.
For a UNIX-like operating system, it is possible for two
users to login at a time and this capability of the OS
makes it a multi-user operating system.

Below are some examples of multi-user operating

systems.
• Linux
• Unix
• Windows 2000

40
Distributed:
A distributed operating system manages a group of
independent computers and makes them appear to be a
single computer.

The development of networked computers that could

be linked and communicate with each other gave rise to
distributed computing.

Distributed computations are carried out on more than

one machine. When computers in a group work in
cooperation, they make a distributed system. 41
Embedded:
Embedded operating systems are designed to be used
in embedded computer systems.

They are designed to operate on small machines like

PDAs with less autonomy.

They are able to operate with a limited number of

resources.

They are very compact and extremely efficient by

design.
Windows CE and Minix 3 are some examples of 42

embedded operating systems.

CORE I3, CORE I5, CORE I7 — THE
DIFFERENCE IN A NUTSHELL
 Core i7s are better than Core i5s, which are in turn better
than Core i3s.
 Core i7 does not have seven cores nor does Core i3 have
three cores.
 The numbers are simply indicative of their relative
processing powers.

43
RELATIVE PROCESSING POWER
 Determined:
 number of cores,
 clock speed (in GHz), size of cache,
 Intel technologies like Turbo Boost and Hyper-Threading

44
NUMBER OF CORES
 The more cores there are, the more tasks (known as
threads) can be served at the same time.
 The lowest number of cores can be found in Core i3
CPUs, i.e., which have only two cores. Currently, all
Core i3s are dual-core processors

45
INTEL TURBO BOOST
 The Intel Turbo Boost Technology allows a processor to
dynamically increase its clock speed whenever the need
arises.
 The maximum amount that Turbo Boost can raise clock
speed at any given time is dependent on the number of
active cores, the estimated current consumption, the
estimated power consumption, and the processor
temperature.
 Because none of the Core i3 CPUs have Turbo Boost, the
i5-4570T can outrun them whenever it needs to.

46
CACHE SIZE
 Whenever the CPU finds that it keeps on using the same
data over and over, it stores that data in its cache
 with a larger cache, more data can be accessed quickly.

 The Haswell (fourth generation) Core i3 processors have

either 3MB or 4MB of cache.
 The Haswell Core i5s have either 4MB or 6MB of cache.

 Finally, all Core i7 CPUs have 8MB of cache, except for

i7-4770R, which has 6MB.
 This is clearly one reason why an i7 outperforms an i5
— and why an i5 outperforms an i3.
47
HYPER-THREADING
 Strictly speaking, only one thread can be served by one core at a
time.
 So if a CPU is a dual core, then supposedly only two threads can
be served simultaneously.
 However, Intel has a technology called Hyper-Threading. This
enables a single core to serve multiple threads.
 For instance, a Core i3, which is only a dual core, can actually
serve two threads per core. In other words, a total of four threads
can run simultaneously.
 Thus, even if Core i5 processors are quad cores, since they don’t
support Hyper-Threading (again, except the i5-4570T) the
number of threads they can serve at the same time is just about
48
equal to those of their Core i3 counterparts
HYPER-THREADING
 Core i7s Not only are they quad cores, they also support
Hyper-Threading.
 Thus, a total of eight threads can run on them at the same
time.
 Combine that with 8MB of cache and Intel Turbo Boost
Technology, which all of them have, and you’ll see what
sets the Core i7 apart from its siblings.

49
The End!
Thank You So Much!

Computer Architecture and Pipelining Insights
No ratings yet
Computer Architecture and Pipelining Insights
15 pages
RISC vs CISC: CPU Architecture Explained
100% (1)
RISC vs CISC: CPU Architecture Explained
20 pages
Pipelining and Parallel Processing Overview
No ratings yet
Pipelining and Parallel Processing Overview
56 pages
Indirect Addressing in CPU Cycles
No ratings yet
Indirect Addressing in CPU Cycles
56 pages
Understanding Parallel Processing Units
No ratings yet
Understanding Parallel Processing Units
32 pages
Parallel Processing and Pipelining Overview
No ratings yet
Parallel Processing and Pipelining Overview
36 pages
Pipeline and Vector Processing Overview
100% (1)
Pipeline and Vector Processing Overview
18 pages
RISC vs CISC: Architecture Overview
No ratings yet
RISC vs CISC: Architecture Overview
35 pages
Parallel and Pipeline Processing Overview
No ratings yet
Parallel and Pipeline Processing Overview
31 pages
Understanding Multiprocessors and Pipelining
No ratings yet
Understanding Multiprocessors and Pipelining
11 pages
Computer Architecture and Pipelining Basics
No ratings yet
Computer Architecture and Pipelining Basics
37 pages
Understanding Parallelism in Computing
No ratings yet
Understanding Parallelism in Computing
65 pages
Introduction to HPC and Parallel Models
No ratings yet
Introduction to HPC and Parallel Models
76 pages
Introduction to Parallel Computing Models
No ratings yet
Introduction to Parallel Computing Models
65 pages
Pipelining and Performance in CPUs
No ratings yet
Pipelining and Performance in CPUs
8 pages
Instruction Pipelining and Multiprocessor Systems
No ratings yet
Instruction Pipelining and Multiprocessor Systems
23 pages
Pipelining and Flynn's Classification Explained
No ratings yet
Pipelining and Flynn's Classification Explained
86 pages
HPC 1
No ratings yet
HPC 1
57 pages
Processor Structure and Pipelining Insights
No ratings yet
Processor Structure and Pipelining Insights
48 pages
Understanding Parallel Processing Techniques
No ratings yet
Understanding Parallel Processing Techniques
29 pages
Pipelining in Computer Architecture
No ratings yet
Pipelining in Computer Architecture
74 pages
Computer Architecture & Organization Overview
No ratings yet
Computer Architecture & Organization Overview
21 pages
Unit 5
No ratings yet
Unit 5
11 pages
Types and Challenges of Parallelism
No ratings yet
Types and Challenges of Parallelism
66 pages
Processor Structure and Function Overview
No ratings yet
Processor Structure and Function Overview
40 pages
Understanding CPU Structure and Function
No ratings yet
Understanding CPU Structure and Function
44 pages
Importance of Parallelism in Computing
No ratings yet
Importance of Parallelism in Computing
19 pages
Pipeline and Vector Processing Overview
No ratings yet
Pipeline and Vector Processing Overview
11 pages
Instruction Formats and Control Units
No ratings yet
Instruction Formats and Control Units
63 pages
Pipelining in Parallel Processing
No ratings yet
Pipelining in Parallel Processing
63 pages
Understanding Multiprocessor Systems
No ratings yet
Understanding Multiprocessor Systems
53 pages
Computer Architecture Performance Techniques
No ratings yet
Computer Architecture Performance Techniques
48 pages
COA Unit 5
No ratings yet
COA Unit 5
22 pages
Understanding Parallel Processor Architectures
No ratings yet
Understanding Parallel Processor Architectures
117 pages
Instruction Pipelining Basics
No ratings yet
Instruction Pipelining Basics
20 pages
Pipelining and Parallel Processing Concepts
No ratings yet
Pipelining and Parallel Processing Concepts
51 pages
Instruction-Level Parallelism & Pipeline Hazards
No ratings yet
Instruction-Level Parallelism & Pipeline Hazards
34 pages
Processor Structure and Pipelining Overview
No ratings yet
Processor Structure and Pipelining Overview
42 pages
Computer Instruction Execution Overview
No ratings yet
Computer Instruction Execution Overview
11 pages
Parallel and Distributed Programming Overview
No ratings yet
Parallel and Distributed Programming Overview
6 pages
CISC vs RISC: Architecture Insights
No ratings yet
CISC vs RISC: Architecture Insights
51 pages
Instruction Pipelining Overview
No ratings yet
Instruction Pipelining Overview
13 pages
Processor Structure and Function Overview
No ratings yet
Processor Structure and Function Overview
14 pages
Pipelining and Parallel Processing Explained
No ratings yet
Pipelining and Parallel Processing Explained
26 pages
Parallel Processing Techniques Explained
No ratings yet
Parallel Processing Techniques Explained
33 pages
Instruction Cycle and Pipeline Overview
No ratings yet
Instruction Cycle and Pipeline Overview
35 pages
Multiprocessor Computer Architecture Overview
No ratings yet
Multiprocessor Computer Architecture Overview
28 pages
Understanding Pipelining in Microprocessors
No ratings yet
Understanding Pipelining in Microprocessors
17 pages
Understanding Distributed Systems Basics
No ratings yet
Understanding Distributed Systems Basics
110 pages
Advanced Notes - Chapter 6 - Advanced Concepts in Computer Architecture
No ratings yet
Advanced Notes - Chapter 6 - Advanced Concepts in Computer Architecture
9 pages
Instruction Set Architecture Overview
No ratings yet
Instruction Set Architecture Overview
35 pages
CPU Pipeline and Stalling Techniques
No ratings yet
CPU Pipeline and Stalling Techniques
22 pages
CPU Structure and Function Overview
100% (1)
CPU Structure and Function Overview
30 pages
Design Principles of Computer Parallelism
No ratings yet
Design Principles of Computer Parallelism
5 pages
Processor Structure and Function Overview
No ratings yet
Processor Structure and Function Overview
55 pages
Processor Organization and Instruction Cycle
No ratings yet
Processor Organization and Instruction Cycle
78 pages
Understanding Pipelining in Computer Architecture
No ratings yet
Understanding Pipelining in Computer Architecture
6 pages
Understanding Parallelism in Computing
No ratings yet
Understanding Parallelism in Computing
71 pages
Understanding CPU Pipelining Techniques
No ratings yet
Understanding CPU Pipelining Techniques
33 pages
Efficient Warehouse Layout for Modjo Dry Port
No ratings yet
Efficient Warehouse Layout for Modjo Dry Port
8 pages
Ethiopia Research Conference 2021 Call
No ratings yet
Ethiopia Research Conference 2021 Call
1 page
Computer Arithmetic and Twos Complement
No ratings yet
Computer Arithmetic and Twos Complement
56 pages
Memory System Design Characteristics
No ratings yet
Memory System Design Characteristics
87 pages
Basic Computer Organization Overview
No ratings yet
Basic Computer Organization Overview
46 pages
Intrusion Detection Catalogue 2021 July en
No ratings yet
Intrusion Detection Catalogue 2021 July en
96 pages
Intermediate Result ACA
No ratings yet
Intermediate Result ACA
51 pages
Test Report Digital TP 7VR
No ratings yet
Test Report Digital TP 7VR
12 pages
DVD Store Kiosk Project Plan Overview
No ratings yet
DVD Store Kiosk Project Plan Overview
11 pages
Apache Hive, Sharding, and HBase Overview
No ratings yet
Apache Hive, Sharding, and HBase Overview
6 pages
Compiler Design Course Syllabus
No ratings yet
Compiler Design Course Syllabus
2 pages
Bootloader Using CAN
100% (1)
Bootloader Using CAN
10 pages
319962s IARGS-STM32F103ZE
No ratings yet
319962s IARGS-STM32F103ZE
20 pages
Understanding Microkernel Architecture
No ratings yet
Understanding Microkernel Architecture
24 pages
Proxmox VE Datasheet
No ratings yet
Proxmox VE Datasheet
4 pages
Microprocessors and Microcontrollers Overview
No ratings yet
Microprocessors and Microcontrollers Overview
217 pages
NSGA4 Management Information Systems
No ratings yet
NSGA4 Management Information Systems
2 pages
Fsoft IT Infrastructure and Security Standards
No ratings yet
Fsoft IT Infrastructure and Security Standards
12 pages
Database Fundamentals and SQL Guide
No ratings yet
Database Fundamentals and SQL Guide
45 pages
Cloud and SaaS Security Overview
No ratings yet
Cloud and SaaS Security Overview
2 pages
Cisco ISE dot1x & MAB Lab Guide
No ratings yet
Cisco ISE dot1x & MAB Lab Guide
40 pages
Dell g16 7630 Owners Manual en Us
No ratings yet
Dell g16 7630 Owners Manual en Us
100 pages
Essential Linux Command Cheat Sheet
No ratings yet
Essential Linux Command Cheat Sheet
14 pages
LS1012A PFE Ethernet Port Setup Guide
No ratings yet
LS1012A PFE Ethernet Port Setup Guide
9 pages
PIC16F877 Microcontroller Overview
No ratings yet
PIC16F877 Microcontroller Overview
31 pages
Remote Access Trojan
No ratings yet
Remote Access Trojan
7 pages
DS12885 Ds12c887a
No ratings yet
DS12885 Ds12c887a
22 pages
Mapping Excel Data in Google Earth
No ratings yet
Mapping Excel Data in Google Earth
6 pages
System 800xa Architecture: Expert Workshop E143
No ratings yet
System 800xa Architecture: Expert Workshop E143
19 pages
Top Wearable Band Vendors Q1 2025
No ratings yet
Top Wearable Band Vendors Q1 2025
46 pages
Default Variable Type in UiPath
No ratings yet
Default Variable Type in UiPath
3 pages
Semantic Analysis in Compilers
No ratings yet
Semantic Analysis in Compilers
43 pages
Dell Diagnostics and Troubleshooting Guide
No ratings yet
Dell Diagnostics and Troubleshooting Guide
4 pages
Android Mobile Development Overview
No ratings yet
Android Mobile Development Overview
40 pages
SFLIGHT Table ALV Report Generation
No ratings yet
SFLIGHT Table ALV Report Generation
4 pages

Instruction Pipelining in Operating Systems

Uploaded by

Instruction Pipelining in Operating Systems

Uploaded by

PIPELINING, INTRODUCTION TO PARALLEL PROCESSING

AND OPERATING SYSTEM

 Two stage Instruction Pipeline has two stages:

 Using this assumption, Figure below shows that a six-

 Taken/Not taken switch

The most critical problem to address in an SMP is

Each processor has its own cache and so it is possible

I. Single instruction, single data stream - SISD

 Each instruction executed on different set of data by

 Never been implemented

It can be thought of as having two objectives:

The OS performs basic tasks, such as:

 Real-time operating systems often use specialized

The main objective of Real-Time operating systems is

Time-sharing system can be classified as multi-user

Single-user operating systems, as opposed to a multi-

Below are some examples of multi-user operating

The development of networked computers that could

Distributed computations are carried out on more than

They are designed to operate on small machines like

They are able to operate with a limited number of

They are very compact and extremely efficient by

embedded operating systems.

 The Haswell (fourth generation) Core i3 processors have

 Finally, all Core i7 CPUs have 8MB of cache, except for

You might also like