ر
الجامعة السعودية االلكتونية
ر
االلكتونية الجامعة السعودية
26/12/2021
College of Computing and Informatics
Data Science Pre-Master Program
CS351
OPERATING SYSTEMS
Week 2
Chapter 1: Introduction
Chapter 2: Operating-System Services
CONTENTS
➢ What Operating System Do
➢ Computer-System Organization, Architecture and Operations
➢ Resource Management
➢ Virtualization
➢ Computing Environments
➢ Operating-System Services
➢ User and Operating-System Interface
➢ System Calls and Services
Weekly LEARNING OUTCOMES
➢ Describe computer system organization and role.
➢ Define the components a computer system including multiprocessor computer systems.
➢ Explain user and kernel mode transition.
➢ Identify operating system services and system calls.
WHAT IS AN OPERATING SYSTEM?
An operating system is ……
Is it only for Computers? How about these machines?
• Car
• Airplane
• Printer
• Washing Machine
• Toaster
• Compiler
• Etc.
WHAT OPERATING SYSTEM DO
• A program that acts as an intermediary between a user of a
computer and the computer hardware
• Operating system goals:
• Execute user programs and make solving user problems easier
• By providing abstraction to application programs.
• Make the computer system convenient to use
• By turning ugly hardware into beautiful abstractions.
• Use the computer hardware in an efficient manner
• By orderly controlled allocation of resources.
COMPUTER SYSTEM STRUCTURE
Users
(People, Machines, and other computers)
OPERATING SYSTEM DEFINITION
• No universally accepted definition
• “Everything a vendor ships when you order an operating system” is a good
approximation but varies wildly
• “The one program running at all times on the computer” is the kernel, part of the
operating system. Everything else is either
• A system program (ships with the operating system, but not part of the kernel) ,
or
• An application program, all programs not associated with the operating system
• Today’s OSes for general purpose and mobile computing also include middleware –
a set of software frameworks that provide addition services to application
developers such as databases, multimedia, graphics
COMPUTER SYSTEM ORGANIZATION
• Computer-system consists of one or more CPUs, device controllers
connect through common bus providing access to shared memory
• Operating systems (OS) have a device driver for each device controller
which facilitate communication between OS and the device.
I/O OPERATION OVERVIEW
• A program request an I/O operation
• Device driver load appropriate registers in device controller
• The device controller, in turn, examines the contents of these registers to determine what
action to take (such as “read a character from the keyboard”).
• The controller starts the transfer of data from the device to its local buffer. Once the
transfer of data is complete, the device controller informs the device driver that it has
finished its operation.
• The device driver then gives control to other parts of the operating system, possibly
returning the data or a pointer to the data if the operation was a read. For other
operations, the device driver returns status information such as “write completed
successfully” or “device busy”.
• But how does the controller inform the device driver that it has finished its operation? This
is accomplished via an interrupt.
INTERRUPT
• Hardware may trigger an interrupt at any time by sending a signal to the CPU,
usually by way of the system bus.
• Interrupts are used for many other purposes as well and are a key part of how
operating systems and hardware interact.
• Interrupts are an important part of a computer architecture.
• Each computer design has its own interrupt mechanism, but several functions
are common.
INTERRUPT
INTERRUPT HANDLING
• The operating system preserves the state of the CPU by storing the
registers and the program counter
• Determines which type of interrupt has occurred:
• Separate segments of code determine what action should be taken for
each type of interrupt
COMPUTER SYSTEM OPERATION
• I/O devices and the CPU can execute concurrently
• Each device controller is in charge of a particular device type
• Each device controller has a local buffer
• Each device controller type has an operating system device driver to
manage it
• CPU moves data from/to main memory to/from local buffers
• I/O is from the device to local buffer of controller
• Device controller informs CPU that it has finished its operation by
causing an interrupt
COMMON FUNCTIONS OF INTERRUPTS
• Interrupt transfers control to the interrupt service routine generally,
through the interrupt vector, which contains the addresses of all the
service routines
• Interrupt architecture must save the address of the interrupted
instruction
• A trap or exception is a software-generated interrupt caused either by
an error or a user request
• An operating system is interrupt driven
I/O STRUCTURE
• After I/O starts, control returns to user program only upon I/O
completion
• Wait instruction idles the CPU until the next interrupt
• Wait loop (contention for memory access)
• At most one I/O request is outstanding at a time, no simultaneous I/O
processing
• After I/O starts, control returns to user program without waiting for I/O
completion
• System call – request to the OS to allow user to wait for I/O completion
• Device-status table contains entry for each I/O device indicating its type,
address, and state
• OS indexes into I/O device table to determine device status and to modify table
entry to include interrupt
COMPUTER STARTUP
• Bootstrap program is loaded at power-up or reboot
• Typically stored in ROM or EPROM, generally known as firmware
• Initializes all aspects of system
• Loads operating system kernel and starts execution
STORAGE STRUCTURE
• Main memory – only large storage media that the CPU can access
directly
• Random access
• Typically volatile
• Typically random-access memory in the form of Dynamic Random-
access Memory (DRAM)
• Secondary storage – extension of main memory that provides large
nonvolatile storage capacity
STORAGE STRUCTURE (CONT.)
• Hard Disk Drives (HDD) – rigid metal or glass platters covered with
magnetic recording material
• Disk surface is logically divided into tracks, which are subdivided into
sectors
• The disk controller determines the logical interaction between the
device and the computer
• Non-volatile memory (NVM) devices– faster than hard disks,
nonvolatile
• Various technologies
• Becoming more popular as capacity and performance increases,
price drops
STORAGE HIERARCHY
• Storage systems organized in hierarchy
• Speed
• Cost
• Volatility
• Caching – copying information into faster storage system; main memory
can be viewed as a cache for secondary storage
• Device Driver for each device controller to manage I/O
• Provides uniform interface between controller and kernel
STORAGE-DEVICE HIERARCHY
HOW A MODERN COMPUTER WORKS
A von Neumann architecture
DIRECT MEMORY ACCESS STRUCTURE
• Used for high-speed I/O devices able to transmit information at close
to memory speeds
• Device controller transfers blocks of data from buffer storage directly
to main memory without CPU intervention
• Only one interrupt is generated per block, rather than the one
interrupt per byte
OPERATING-SYSTEM OPERATIONS
• Bootstrap program – simple code to initialize the system, load the kernel
• Kernel loads
• Starts system daemons (services provided outside of the kernel)
• Kernel interrupt driven (hardware and software)
• Hardware interrupt by one of the devices
• Software interrupt (exception or trap):
• Software error (e.g., division by zero)
• Request for operating system service – system call
• Other process problems include infinite loop, processes modifying each
other or the operating system
MULTIPROGRAMMING (BATCH SYSTEM)
• Single user cannot always keep CPU and I/O devices busy
• Multiprogramming organizes jobs (code and data) so CPU always has
one to execute
• A subset of total jobs in system is kept in memory
• One job selected and run via job scheduling
• When job has to wait (for I/O for example), OS switches to another
job
MULTITASKING (TIMESHARING)
• A logical extension of Batch systems– the CPU switches jobs so
frequently that users can interact with each job while it is running,
creating interactive computing
• Response time should be < 1 second
• Each user has at least one program executing in memory
process
• If several jobs ready to run at the same time CPU scheduling
• If processes don’t fit in memory, swapping moves them in and
out to run
• Virtual memory allows execution of processes not completely in
memory
MULTITASKING (TIMESHARING)
• A logical extension of Batch systems– the CPU switches jobs so
frequently that users can interact with each job while it is running,
creating interactive computing
• Response time should be < 1 second
• Each user has at least one program executing in memory
process
• If several jobs ready to run at the same time CPU scheduling
• If processes don’t fit in memory, swapping moves them in and
out to run
• Virtual memory allows execution of processes not completely in
memory
MEMORY LAYOUT FOR MULTIPROGRAMMED SYSTEM
DUAL-MODE OPERATION
• Dual-mode operation allows OS to protect itself and other system components
• User mode and kernel mode
• Mode bit provided by hardware
• Provides ability to distinguish when system is running user code or kernel
code.
• When a user is running mode bit is “user”
• When kernel code is executing mode bit is “kernel”
• How do we guarantee that user does not explicitly set the mode bit to
“kernel”?
• System call changes mode to kernel, return from call resets it to user
• Some instructions designated as privileged, only executable in kernel mode
TRANSITION FROM USER TO KERNEL MODE
TIMER
• Timer to prevent infinite loop (or process hogging resources)
• Timer is set to interrupt the computer after some time period
• Keep a counter that is decremented by the physical clock
• Operating system set the counter (privileged instruction)
• When counter zero generate an interrupt
• Set up before scheduling process to regain control or terminate
program that exceeds allotted time
PROCESS MANAGEMENT
• A process is a program in execution. It is a unit of work within the system.
Program is a passive entity; process is an active entity.
• Process needs resources to accomplish its task
• CPU, memory, I/O, files
• Initialization data
• Process termination requires reclaim of any reusable resources
• Single-threaded process has one program counter specifying location of next
instruction to execute
• Process executes instructions sequentially, one at a time, until completion
• Multi-threaded process has one program counter per thread
• Typically system has many processes, some user, some operating system running
concurrently on one or more CPUs
• Concurrency by multiplexing the CPUs among the processes / threads
PROCESS MANAGEMENT ACTIVITIES
The operating system is responsible for the following activities in connection with
process management:
• Creating and deleting both user and system processes
• Suspending and resuming processes
• Providing mechanisms for process synchronization
• Providing mechanisms for process communication
• Providing mechanisms for deadlock handling
MEMORY MANAGEMENT
• To execute a program all (or part) of the instructions must be in
memory
• All (or part) of the data that is needed by the program must be in
memory
• Memory management determines what is in memory and when
• Optimizing CPU utilization and computer response to users
• Memory management activities
• Keeping track of which parts of memory are currently being used and by
whom
• Deciding which processes (or parts thereof) and data to move into and out of
memory
• Allocating and deallocating memory space as needed
FILE-SYSTEM MANAGEMENT
• OS provides uniform, logical view of information storage
• Abstracts physical properties to logical storage unit - file
• Each medium is controlled by device (i.e., disk drive, tape drive)
• Varying properties include access speed, capacity, data-transfer rate, access method
(sequential or random)
• File-System management
• Files usually organized into directories
• Access control on most systems to determine who can access what
• OS activities include
• Creating and deleting files and directories
• Primitives to manipulate files and directories
• Mapping files onto secondary storage
• Backup files onto stable (non-volatile) storage media
CACHING
• Important principle, performed at many levels in a computer (in
hardware, operating system, software)
• Information in use copied from slower to faster storage temporarily
• Faster storage (cache) checked first to determine if information is
there
• If it is, information used directly from the cache (fast)
• If not, data copied to cache and used there
• Cache smaller than storage being cached
• Cache management important design problem
• Cache size and replacement policy
CHARACTERISTICS OF VARIOUS TYPES OF STORAGE
Movement between levels of storage hierarchy can be explicit or implicit
MIGRATION OF DATA “A” FROM DISK TO REGISTER
• Multitasking environments must be careful to use most recent value,
no matter where it is stored in the storage hierarchy
• Multiprocessor environment must provide cache coherency in
hardware such that all CPUs have the most recent value in their cache
• Distributed environment situation even more complex
• Several copies of a datum can exist
• Various solutions covered in Chapter 19
I/O SUBSYSTEM
• One purpose of OS is to hide peculiarities of hardware devices from
the user
• I/O subsystem responsible for
• Memory management of I/O including buffering (storing data temporarily
while it is being transferred), caching (storing parts of data in faster storage for
performance), spooling (the overlapping of output of one job with input of
other jobs)
• General device-driver interface
• Drivers for specific hardware devices
VIRTUALIZATION
• Use cases involve laptops and desktops running multiple OSes for
exploration or compatibility
• Apple laptop running Mac OS X host, Windows as a guest
• Developing apps for multiple OSes without having multiple systems
• Quality assurance testing applications without having multiple systems
• Executing and managing compute environments within data centers
• VMM can run natively, in which case they are also the host
• There is no general-purpose host then (VMware ESX and Citrix XenServer)
VIRTUALIZATION (CONT.)
• Allows operating systems to run applications within other OSes
• Vast and growing industry
• Emulation used when source CPU type different from target type (i.e.
PowerPC to Intel x86)
• Generally slowest method
• When computer language not compiled to native code – Interpretation
• Virtualization – OS natively compiled for CPU, running guest OSes
also natively compiled
• Consider VMware running WinXP guests, each running applications, all on
native WinXP host OS
• VMM (virtual machine Manager) provides virtualization services
COMPUTING ENVIRONMENTS - VIRTUALIZATION
COMPUTER-SYSTEM ARCHITECTURE
• Most systems use a single general-purpose processor
• Most systems have special-purpose processors as well
• Multiprocessors systems growing in use and importance
• Also known as parallel systems, tightly-coupled systems
• Advantages include:
1. Increased throughput
2. Economy of scale
3. Increased reliability – graceful degradation or fault tolerance
• Two types:
1. Asymmetric Multiprocessing – each processor is assigned a specie task.
2. Symmetric Multiprocessing – each processor performs all tasks
SYMMETRIC MULTIPROCESSING ARCHITECTURE
DUAL-CORE DESIGN
Multi-chip and multicore
Systems containing all chips
Chassis containing multiple separate systems
COMPUTING ENVIRONMENTS
• Traditional
• Mobile
• Client Server
• Pear-to-Pear
• Cloud computing
• Real-time Embedded
TRADITIONAL
• Stand-alone general-purpose machines
• But blurred as most systems interconnect with others (i.e., the
Internet)
• Portals provide web access to internal systems
• Network computers (thin clients) are like Web terminals
• Mobile computers interconnect via wireless networks
• Networking becoming ubiquitous – even home systems use
firewalls to protect home computers from Internet attacks
CLIENT SERVER
• Client-Server Computing
• Dumb terminals supplanted by smart PCs
• Many systems now servers, responding to requests generated by
clients
• Compute-server system provides an interface to client to request services (i.e.,
database)
• File-server system provides interface for clients to store and retrieve files
CLOUD COMPUTING
• Delivers computing, storage, even apps as a service across a network
• Logical extension of virtualization because it uses virtualization as the base
for it functionality.
• Amazon EC2 has thousands of servers, millions of virtual machines, petabytes of
storage available across the Internet, pay based on usage
• Many types
• Public cloud – available via Internet to anyone willing to pay
• Private cloud – run by a company for the company’s own use
• Hybrid cloud – includes both public and private cloud components
• Software as a Service (SaaS) – one or more applications available via the Internet
(i.e., word processor)
• Platform as a Service (PaaS) – software stack ready for application use via the
Internet (i.e., a database server)
• Infrastructure as a Service (IaaS) – servers or storage available over Internet (i.e.,
storage available for backup use)
CLOUD COMPUTING (CONT.)
• Cloud computing environments composed of traditional OSes, plus
VMMs, plus cloud management tools
• Internet connectivity requires security like firewalls
• Load balancers spread traffic across multiple applications
FREE AND OPEN-SOURCE OPERATING SYSTEMS
• Operating systems made available in source-code format rather than just
binary closed-source and proprietary
• Counter to the copy protection and Digital Rights Management (DRM)
movement
• Started by Free Software Foundation (FSF), which has “copyleft” GNU
Public License (GPL)
• Free software and open-source software are two different ideas championed by different groups of people
• [Link]
• Examples include GNU/Linux and BSD UNIX (including core of Mac OS X),
and many more
• Can use VMM like VMware Player (Free on Windows), Virtualbox (open
source and free on many platforms - [Link]
• Use to run guest operating systems for exploration
OPERATING SYSTEM SERVICES
• Operating systems provide an environment for execution of
programs and services to programs and users
• One set of operating-system services provides functions that are
helpful to the user:
• User interface - Almost all operating systems have a user interface (UI).
• Varies between Command-Line (CLI), Graphics User Interface (GUI), touch-screen,
Batch
• Program execution - The system must be able to load a program into
memory and to run that program, end execution, either normally or
abnormally (indicating error)
• I/O operations - A running program may require I/O, which may involve a
file or an I/O device
• File-system manipulation - The file system is of particular interest. Programs
need to read and write files and directories, create and delete them, search
them, list file Information, permission management.
OPERATING SYSTEM SERVICES (CONT.)
• One set of operating-system services provides functions that are helpful to the
user (Cont.):
• Communications – Processes may exchange information, on the same
computer or between computers over a network
• Communications may be via shared memory or through message passing
(packets moved by the OS)
• Error detection – OS needs to be constantly aware of possible errors
• May occur in the CPU and memory hardware, in I/O devices, in user
program
• For each type of error, OS should take the appropriate action to ensure
correct and consistent computing
• Debugging facilities can greatly enhance the user’s and programmer’s
abilities to efficiently use the system
OPERATING SYSTEM SERVICES (CONT.)
• Another set of OS functions exists for ensuring the efficient
operation of the system itself via resource sharing
• Resource allocation - When multiple users or multiple jobs running
concurrently, resources must be allocated to each of them
• Many types of resources - CPU cycles, main memory, file storage, I/O devices.
• Logging - To keep track of which users use how much and what kinds of
computer resources
• Protection and security - The owners of information stored in a multiuser or
networked computer system may want to control use of that information,
concurrent processes should not interfere with each other
• Protection involves ensuring that all access to system resources is controlled
• Security of the system from outsiders requires user authentication, extends to
defending external I/O devices from invalid access attempts
A VIEW OF OPERATING SYSTEM SERVICES
BOURNE SHELL COMMAND INTERPRETER
USER OPERATING SYSTEM INTERFACE - GUI
• User-friendly desktop metaphor interface
• Usually mouse, keyboard, and monitor
• Icons represent files, programs, actions, etc
• Various mouse buttons over objects in the interface cause various actions
(provide information, options, execute function, open directory (known as a
folder)
• Invented at Xerox PARC
• Many systems now include both CLI and GUI interfaces
• Microsoft Windows is GUI with CLI “command” shell
• Apple Mac OS X is “Aqua” GUI interface with UNIX kernel underneath and
shells available
• Unix and Linux have CLI with optional GUI interfaces (CDE, KDE, GNOME)
TOUCHSCREEN INTERFACES
• Touchscreen devices require
new interfaces
• Mouse not possible or not desired
• Actions and selection based on gestures
• Virtual keyboard for text entry
• Voice commands
THE MAC OS X GUI
SYSTEM CALLS
• Programming interface to the services provided by the OS
• Typically written in a high-level language (C or C++)
• Mostly accessed by programs via a high-level Application
Programming Interface (API) rather than direct system call
use
• Three most common APIs are Win32 API for Windows,
POSIX API for POSIX-based systems (including virtually all
versions of UNIX, Linux, and Mac OS X), and Java API for the
Java virtual machine (JVM)
Note that the system-call names used throughout this text are
generic
EXAMPLE OF SYSTEM CALLS
• System call sequence to copy the contents of one file to
another file
EXAMPLE OF STANDARD API
SYSTEM CALL IMPLEMENTATION
• Typically, a number is associated with each system call
• System-call interface maintains a table indexed according to these
numbers
• The system call interface invokes the intended system call in OS kernel
and returns status of the system call and any return values
• The caller need know nothing about how the system call is
implemented
• Just needs to obey API and understand what OS will do as a result
call
• Most details of OS interface hidden from programmer by API
• Managed by run-time support library (set of functions built into
libraries included with compiler)
SYSTEM CALL PARAMETER PASSING
• Often, more information is required than simply identity of desired system call
• Exact type and amount of information vary according to OS and call
• Three general methods used to pass parameters to the OS
• Simplest: pass the parameters in registers
• In some cases, may be more parameters than registers
• Parameters stored in a block, or table, in memory, and address of block passed
as a parameter in a register
• This approach taken by Linux and Solaris
• Parameters placed, or pushed, onto the stack by the program and popped off
the stack by the operating system
• Block and stack methods do not limit the number or length of parameters
being passed
PARAMETER PASSING VIA TABLE
TYPES OF SYSTEM CALLS
• Process control
• create process, terminate process
• end, abort
• load, execute
• get process attributes, set process attributes
• wait for time
• wait event, signal event
• allocate and free memory
• Dump memory if error
• Debugger for determining bugs, single step execution
• Locks for managing access to shared data between processes
TYPES OF SYSTEM CALLS (CONT.)
• File management
• create file, delete file
• open, close file
• read, write, reposition
• get and set file attributes
• Device management
• request device, release device
• read, write, reposition
• get device attributes, set device attributes
• logically attach or detach devices
TYPES OF SYSTEM CALLS (CONT.)
• Information maintenance
• get time or date, set time or date
• get system data, set system data
• get and set process, file, or device attributes
• Communications
• create, delete communication connection
• send, receive messages if message passing model to host name or process
name
• From client to server
• Shared-memory model create and gain access to memory regions
• transfer status information
• attach and detach remote devices
TYPES OF SYSTEM CALLS (CONT.)
• Protection
• Control access to resources
• Get and set permissions
• Allow and deny user access
EXAMPLES OF WINDOWS AND UNIX SYSTEM CALLS
SYSTEM SERVICES
• System programs provide a convenient environment for program
development and execution. They can be divided into:
• File manipulation
• Status information sometimes stored in a file
• Programming language support
• Program loading and execution
• Communications
• Background services
• Application programs
• Most users’ view of the operation system is defined by system
programs, not the actual system calls
SYSTEM SERVICES (CONT.)
• Provide a convenient environment for program development and execution
• Some of them are simply user interfaces to system calls; others are considerably more complex
• File management - Create, delete, copy, rename, print, dump, list, and generally
manipulate files and directories
• Status information
• Some ask the system for info - date, time, amount of available memory, disk space, number of
users
• Others provide detailed performance, logging, and debugging information
• Typically, these programs format and print the output to the terminal or other output devices
• Some systems implement a registry - used to store and retrieve configuration information
SYSTEM SERVICES (CONT.)
• File modification
• Text editors to create and modify files
• Special commands to search contents of files or perform transformations of the text
• Programming-language support - Compilers, assemblers, debuggers and
interpreters sometimes provided
• Program loading and execution- Absolute loaders, relocatable loaders, linkage
editors, and overlay-loaders, debugging systems for higher-level and machine
language
• Communications - Provide the mechanism for creating virtual connections among
processes, users, and computer systems
• Allow users to send messages to one another’s screens, browse web pages, send electronic-
mail messages, log in remotely, transfer files from one machine to another
SYSTEM SERVICES (CONT.)
• Background Services
• Launch at boot time
• Some for system startup, then terminate
• Some from system boot to shutdown
• Provide facilities like disk checking, process scheduling, error logging, printing
• Run in user context not kernel context
• Known as services, subsystems, daemons
• Application programs
• Don’t pertain to system
• Run by users
• Not typically considered part of OS
• Launched by command line, mouse click, finger poke
WHY APPLICATIONS ARE OPERATING SYSTEM SPECIFIC
• Apps compiled on one system usually not executable on other operating systems
• Each operating system provides its own unique system calls
• Own file formats, etc.
• Apps can be multi-operating system
• Written in interpreted language like Python, Ruby, and interpreter available on multiple
operating systems
• App written in language that includes a VM containing the running app (like Java)
• Use standard language (like C), compile separately on each operating system to run on each
• Application Binary Interface (ABI) is architecture equivalent of API, defines how
different components of binary code can interface for a given operating system on
a given architecture, CPU, etc.
Required Reading
1. Chapter 1: Introduction (Operating System Concepts by
Silberschatz, Abraham, et al. 10th ed., ISBN: 978-1-119-32091-3,
2018)
2. Chapter 2: Operating-System Structures (Operating System
Concepts by Silberschatz, Abraham, et al. 10th ed., ISBN: 978-1-
119-32091-3, 2018)
Recommended Reading
1. Chapter 1 (Modern Operating Systems by Andrew S. Tanenbaum and
Herbert Bos. 4th ed., ISBN-10: 0-13-359162-X, ISBN-13: 978-0-13-
359162-0, 2015)
This Presentation is mainly dependent on the textbook: Operating System Concepts by Silberschatz, Abraham, et al. 10th ed
Thank You
ر
الجامعة السعودية االلكتونية
ر
االلكتونية الجامعة السعودية
26/12/2021
College of Computing and Informatics
Data Science Pre-Master Program
CS351
OPERATING SYSTEMS
IT241
Operating Systems
Week 3
Chapter 3: Processes
Chapter 4: Threads & Concurrency
CONTENTS
➢ Process Concept
➢ Process Scheduling
➢ Operations on Processes
➢ Inter-Process Communication
➢ IPC in Shared-Memory Systems
➢ IPC in Message-Passing Systems
➢ Multicore Programming
➢ Multithreading Models
➢ Threading Issues
WEEKLY LEARNING OUTCOMES
➢ Identify the separate components of a process and illustrate how they are
represented and scheduled in an operating system.
➢ Describe how processes are created and terminated in an operating
system, including developing programs using the appropriate system
calls that perform these operations.
➢ Compare and contrast inter-process communication using shared
memory and message passing.
WEEKLY LEARNING OUTCOMES
➢ Identify the basic components of a thread, and contrast threads and
processes.
➢ Describe the benefits and challenges of designing multithreaded
applications.
PROCESS CONCEPTS
Process: A program in execution; process execution must progress in
sequential fashion. No parallel execution of instructions of a single process
An operating system executes a variety of programs that run as a process.
• Multiple parts
• Text section—the executable code
• Data section—global variables
• Heap section—memory that is dynamically allocated during
program run time
• Stack section—temporary data storage when invoking functions
(such as function parameters, return addresses, and local
variables)
LAYOUT OF PROCESS IN MEMORY
PROCESS STATES
• As a process executes, it changes state. The state of a process is
defined in part by the current activity of that process.
• New. The process is being created.
• Running. Instructions are being executed.
• Waiting. The process is waiting for some event to occur (such
as an I/O completion or reception of a signal).
• Ready. The process is waiting to be assigned to a processor.
• Terminated. The process has finished execution.
PROCESS STATES
PROCESS CONTROL BLOCK
• Each process is represented in the operating system by a process
control block (PCB)—also called a task control block.
• It contains many pieces of information associated with a specific
process, including these:
• Process state. The state may be new, ready, running, waiting,
halted, and so on.
• Program counter. The counter indicates the address of the
next instruction to be executed for this process.
• CPU registers. The registers vary in number and type,
depending on the computer architecture.
PROCESS CONTROL BLOCK
• CPU-scheduling information. This information includes a
process priority, pointers to scheduling queues, and any other
scheduling parameters.
• Memory-management information. This information may
include such items as the value of the base and limit registers
and the page tables, or the segment tables.
• Accounting information. This information includes the
amount of CPU and real time used, time limits, account
numbers, job or process numbers, and so on.
• I/O status information. This information includes the list of
I/O devices allocated to the process, a list of open files, and so
on.
PROCESS SCHEDULING
• Process scheduler selects among available processes for next
execution on CPU core.
• Goal: Maximize CPU use, quickly switch processes onto CPU core
• Maintains scheduling queues of processes
• Ready queue – set of all processes residing in main memory,
ready and waiting to execute
• Wait queues – set of processes waiting for an event (i.e., I/O)
• Processes migrate among the various queues
PROCESS SCHEDULING
Queueing-diagram representation of process scheduling.
CONTEXT SWITCH
• When CPU switches to another process, the system must save the
state of the old process and load the saved state for the new process
via a context switch
• Context of a process represented in the PCB
• Context-switch time is pure overhead; the system does no useful
work while switching
• The more complex the OS and the PCB ➔ the longer the context
switch
• Time dependent on hardware support
• Some hardware provides multiple sets of registers per CPU ➔
multiple contexts loaded at once
CONTEXT SWITCH
OPERATIONS ON PROCESS
• System must provide mechanisms for:
• Process creation
The creating process is called a parent process, and the new
processes are called the children of that process.
When a process creates a new process, two possibilities for execution exist:
1. The parent continues to execute concurrently with its children.
2. The parent waits until some or all of its children have terminated.
There are also two address-space possibilities for the new process:
1. The child process is a duplicate of the parent process (it has the
same program and data as the parent).
2. The child process has a new program loaded into it.
OPERATIONS ON PROCESS
Process Termination
A process terminates when it finishes executing its final statement and asks the
operating system to delete it.
A parent may terminate the execution of one of its children for a variety of reasons,
such as these:
• The child has exceeded its usage of some of the resources that it has been allocated.
• The task assigned to the child is no longer required.
• The parent is exiting, and the operating system does not allow a child to continue if its
parent terminates.
INTERPROCESS COMMUNICATION
• Processes executing concurrently in the operating system may be
either independent processes or cooperating processes.
• A process is independent if it does not share data with any other
processes executing in the system.
• A process is cooperating if it can affect or be affected by the other
processes executing in the system.
INTERPROCESS COMMUNICATION
There are several reasons for providing an environment that allows
process cooperation:
• Information sharing. Since several applications may be interested in
the same piece of information (for instance, copying and pasting), we
must provide an environment to allow concurrent access to such
information.
• Computation speedup. If we want a particular task to run faster, we
must break it into subtasks, each of which will be executing in parallel
with the others.
• Modularity. We may want to construct the system in a modular
fashion, dividing the system functions into separate processes or
threads.
INTERPROCESS COMMUNICATION
• Two models of IPC
• Shared memory
• Message passing
INTERPROCESS COMMUNICATION
(a) Shared memory. (b) Message passing.
IPC IN SHARED-MEMORY SYSTEMS
• An area of memory shared among the processes that wish to
communicate
• The communication is under the control of the users processes not
the operating system.
• Major issues is to provide mechanism that will allow the user
processes to synchronize their actions when they access shared
memory.
IPC IN SHARED-MEMORY SYSTEMS
Bounded-Buffer – Shared-Memory Solution
#define BUFFER_SIZE 10
typedef struct {
. . .
} item;
item buffer[BUFFER_SIZE];
int in = 0;
int out = 0;
• Solution is correct, but can only use BUFFER_SIZE-1 elements
PRODUCER PROCESS – SHARED MEMORY
item next_produced;
while (true) {
/* produce an item in next produced */
while (((in + 1) % BUFFER_SIZE) == out)
; /* do nothing */
buffer[in] = next_produced;
in = (in + 1) % BUFFER_SIZE;
}
CONSUMER PROCESS – SHARED MEMORY
item next_consumed;
while (true) {
while (in == out)
; /* do nothing */
next_consumed = buffer[out];
out = (out + 1) % BUFFER_SIZE;
/* consume the item in next consumed */
}
IPC IN MESSAGE PASSING SYSTEMS
• Processes communicate with each other without resorting to shared
variables
• IPC facility provides two operations:
• send(message)
• receive(message)
• The message size is either fixed or variable
IPC IN MESSAGE PASSING SYSTEMS
• If processes P and Q wish to communicate, they need to:
• Establish a communication link between them
• Exchange messages via send/receive
• Implementation issues:
• How are links established?
• Can a link be associated with more than two processes?
• How many links can there be between every pair of communicating
processes?
• What is the capacity of a link?
• Is the size of a message that the link can accommodate fixed or variable?
• Is a link unidirectional or bi-directional?
IMPLEMENTATION OF COMMUNICATION LINK
• Physical:
• Shared memory
• Hardware bus
• Network
• Logical:
• Direct or indirect
• Synchronous or asynchronous
• Automatic or explicit buffering
DIRECT COMMUNICATION
• Processes must name each other explicitly:
• send (P, message) – send a message to process P
• receive(Q, message) – receive a message from process Q
• Properties of communication link
• Links are established automatically
• A link is associated with exactly one pair of communicating
processes
• Between each pair there exists exactly one link
• The link may be unidirectional, but is usually bi-directional.
INDIRECT COMMUNICATION
• Messages are directed and received from mailboxes (also referred to
as ports)
• Each mailbox has a unique id
• Processes can communicate only if they share a mailbox
• Properties of communication link
• Link established only if processes share a common mailbox
• A link may be associated with many processes
• Each pair of processes may share several communication links
• Link may be unidirectional or bi-directional.
SINGLE AND MULTITHREADED PROCESS
BENEFITS
• Responsiveness – may allow continued execution if part of process
is blocked, especially important for user interfaces
• Resource Sharing – threads share resources of process, easier than
shared memory or message passing
• Economy – cheaper than process creation, thread switching lower
overhead than context switching
• Scalability – process can take advantage of multicore architectures
MULTICORE PROGRAMMING
• Multicore or multiprocessor systems putting pressure on programmers,
challenges include:
• Dividing activities
• Balance
• Data splitting
• Data dependency
• Testing and debugging
• Parallelism implies a system can perform more than one task
simultaneously
• Concurrency supports more than one task making progress
• Single processor / core, scheduler providing concurrency
MULTICORE PROGRAMMING
▪ Concurrent execution on single-core system:
▪ Parallelism on a multi-core system:
MULTICORE PROGRAMMING
Types of parallelism
Data parallelism – distributes subsets of the same data across
multiple cores, same operation on each
Task parallelism – distributing threads across cores, each
thread performing unique operation
DATA AND TASK PARALLELISM
USER THREADS AND KERNEL THREADS
• User threads - management done by user-level threads library
• Three primary thread libraries:
• POSIX Pthreads
• Windows threads
• Java threads
• Kernel threads - Supported by the Kernel
• Examples – virtually all general -purpose operating systems, including:
• Windows
• Linux
• Mac OS X
• iOS
• Android
USER THREADS AND KERNEL THREADS
MULTITHREADING MODELS
• Many-to-One
• One-to-One
• Many-to-Many
MANY TO ONE MODEL
• Many user-level threads mapped to single kernel thread
• One thread blocking causes all to block
• Multiple threads may not run in parallel on muticore system because
only one may be in kernel at a time
• Few systems currently use this model
• Examples:
• Solaris Green Threads
• GNU Portable Threads
ONE TO ONE MODEL
• Each user-level thread maps to kernel thread
• Creating a user-level thread creates a kernel thread
• More concurrency than many-to-one
• Number of threads per process sometimes restricted due to overhead
• Examples
• Windows
• Linux
MANY TO MANY MODEL
• Allows many user level threads to be mapped to many kernel threads
• Allows the operating system to create a sufficient number of kernel
threads
• Windows with the ThreadFiber package
• Otherwise not very common
THREADING ISSUES
• Semantics of fork() and exec() system calls
• Signal handling
• Synchronous and asynchronous
• Thread cancellation of target thread
• Asynchronous or deferred
• Thread-local storage
• Scheduler Activations
Main Reference
Chapter 3: Processes
Chapter 4: Threads & Concurrency
(Operating System Concepts by Silberschatz, Abraham, et al. 10th ed., ISBN:
978-1-119-32091-3, 2018)
Additional References
Chapter 2.1
Chapter 2.2
(Modern Operating Systems by Andrew S. Tanenbaum and Herbert Bos. 4th ed.,
ISBN-10: 0-13-359162-X, ISBN-13: 978-0-13-359162-0, 2015)
This Presentation is mainly dependent on the textbook: Operating System Concepts by Silberschatz, Abraham, et al. 10th ed.)
Thank You
ر
الجامعة السعودية االلكتونية
ر
االلكتونية الجامعة السعودية
26/12/2021
College of Computing and Informatics
Data Science Pre-Master Program
CS351
OPERATING SYSTEMS
Week 4
Chapter 4: CPU Scheduling
CONTENTS
➢ Basic Concepts
➢ Scheduling Criteria
➢ Scheduling Algorithms
➢ Thread Scheduling
➢ Multi-Processor Scheduling
Weekly LEARNING OUTCOMES
➢ Describe CPU scheduling algorithms and their differences.
➢ Evaluate CPU scheduling algorithms based on scheduling criteria
➢ Explain the issues related to multiprocessor and multicore scheduling
BASIC CONCEPTS
• Maximum CPU utilization obtained with
multiprogramming
• CPU–I/O Burst Cycle – Process execution
consists of a cycle of CPU execution and I/O
wait
• CPU burst followed by I/O burst
• CPU burst distribution is of main concern
HISTOGRAM OF CPU-BURST TIMES
Large number of short bursts
Small number of longer bursts
CPU SCHEDULER
• The CPU scheduler selects from among the processes in ready queue, and
allocates a CPU core to one of them
• Queue may be ordered in various ways
• CPU scheduling decisions may take place when a process:
1. Switches from running to waiting state
2. Switches from running to ready state
3. Switches from waiting to ready
4. Terminates
• For situations 1 and 4, there is no choice in terms of scheduling. A new process (if
one exists in the ready queue) must be selected for execution.
• For situations 2 and 3, however, there is a choice.
PREEMPTIVE AND NONPREEMPTIVE SCHEDULING
• When scheduling takes place only under circumstances 1 and 4, the
scheduling scheme is nonpreemptive.
• Otherwise, it is preemptive.
• Under Nonpreemptive scheduling, once the CPU has been allocated
to a process, the process keeps the CPU until it releases it either by
terminating or by switching to the waiting state.
• Virtually all modern operating systems including Windows, MacOS,
Linux, and UNIX use preemptive scheduling algorithms.
PREEMPTIVE SCHEDULING AND RACE
CONDITIONS
• Preemptive scheduling can result in race conditions when data are
shared among several processes.
• Consider the case of two processes that share data. While one process
is updating the data, it is preempted so that the second process can
run. The second process then tries to read the data, which are in an
inconsistent state.
• This issue will be explored in detail in Chapter 6.
DISPATCHER
• Dispatcher module gives control of the
CPU to the process selected by the CPU
scheduler; this involves:
• Switching context
• Switching to user mode
• Jumping to the proper location in the user
program to restart that program
• Dispatch latency – time it takes for the
dispatcher to stop one process and start
another running
SCHEDULING CRITERIA
• CPU utilization – keep the CPU as busy as possible (Max)
• Throughput – # of processes that complete their execution per time unit
(Max)
• Turnaround time – amount of time to execute a particular process (Min)
• Waiting time – amount of time a process has been waiting in the ready queue
(Min)
• Response time – amount of time it takes from when a request was submitted
until the first response is produced. (Min)
FIRST- COME, FIRST-SERVED (FCFS) SCHEDULING
Process Burst Time
P1 24
P2 3
P3 3
Suppose that the processes arrive in the order: P1 , P2 , P3
The Gantt Chart for the schedule is:
P1 P2 P3
0 24 27 30
Waiting time for P1 = 0; P2 = 24; P3 = 27
Average waiting time: (0 + 24 + 27)/3 = 17
FCFS SCHEDULING (CONT.)
Suppose that the processes arrive in the order:
P2 , P3 , P1
The Gantt chart for the schedule is:
P2 P3 P1
0 3 6 30
Waiting time for P1 = 6; P2 = 0; P3 = 3
Average waiting time: (6 + 0 + 3)/3 = 3
Much better than previous case
Convoy effect - short process behind long process
Consider one CPU-bound and many I/O-bound processes
SHORTEST-JOB-FIRST (SJF) SCHEDULING
• Associate with each process the length of its next CPU burst
• Use these lengths to schedule the process with the shortest time
• SJF is optimal – gives minimum average waiting time for a
given set of processes
• The difficulty is knowing the length of the next CPU request
• Could ask the user
SHORTEST-JOB-FIRST (SJF) SCHEDULING
• Associate with each process the length of its next CPU burst
• Use these lengths to schedule the process with the shortest time
• SJF is optimal – gives minimum average waiting time for a
given set of processes
• Preemptive version called shortest-remaining-time-first
• How do we determine the length of the next CPU burst?
• Could ask the user
• Estimate
EXAMPLE OF SJF
ProcessArrival Time Burst Time
P1 0.0 6
P2 2.0 8
P3 4.0 7
P4 5.0 3
SJF scheduling chart
P4 P1 P3 P2
0 3 9 16 24
Average waiting time = (3 + 16 + 9 + 0) / 4 = 7
COMPUTER STARTUP
• Bootstrap program is loaded at power-up or reboot
• Typically stored in ROM or EPROM, generally known as firmware
• Initializes all aspects of system
• Loads operating system kernel and starts execution
DETERMINING LENGTH OF NEXT CPU
BURST
Can only estimate the length – should be similar to the previous one
Then pick process with shortest predicted next CPU burst
Can be done by using the length of previous CPU bursts, using exponential
averaging
1. t n = actual length of n th CPU burst
2. n +1 = predicted value for the next CPU burst
3. , 0 1
4. Define :
Commonly, α set to ½
PREDICTION OF THE LENGTH OF THE NEXT CPU BURST
EXAMPLES OF EXPONENTIAL AVERAGING
• =0
• n+1 = n
• Recent history does not count
• =1
• n+1 = tn
• Only the actual last CPU burst counts
• If we expand the formula, we get:
n+1 = tn+(1 - ) tn -1 + …
+(1 - )j tn -j + …
+(1 - )n +1 0
• Since both and (1 - ) are less than or equal to 1, each successive
term has less weight than its predecessor
EXAMPLE OF SHORTEST-REMAINING-TIME-FIRST
Now we add the concepts of varying arrival times and preemption to the analysis
ProcessA arri Arrival TimeT Burst Time
P1 0 8
P2 1 4
P3 2 9
P4 3 5
Preemptive SJF Gantt Chart
P1 P2 P4 P1 P3
0 1 5 10 17 26
Average waiting time = [(10-1)+(1-1)+(17-2)+(5-3)]/4 = 26/4 = 6.5
ROUND ROBIN (RR)
• Each process gets a small unit of CPU time (time quantum q), usually 10-100
milliseconds. After this time has elapsed, the process is preempted and added
to the end of the ready queue.
• If there are n processes in the ready queue and the time quantum is q, then
each process gets 1/n of the CPU time in chunks of at most q time units at
once. No process waits more than (n-1)q time units.
• Timer interrupts every quantum to schedule next process
• Performance
• q large FIFO
• q small q must be large with respect to context switch, otherwise overhead is too high
EXAMPLE OF RR WITH TIME QUANTUM = 4
Process Burst Time
P1 24
P2 3
P3 3
The Gantt chart is:
P1 P2 P3 P1 P1 P1 P1 P1
0 4 7 10 14 18 22 26 30
Typically, higher average turnaround than SJF, but better response
q should be large compared to context switch time
q usually 10 milliseconds to 100 milliseconds,
Context switch < 10 microseconds
TIME QUANTUM AND CONTEXT SWITCH TIME
PRIORITY SCHEDULING
• A priority number (integer) is associated with each process
• The CPU is allocated to the process with the highest priority (smallest integer
highest priority)
• Preemptive
• Nonpreemptive
• SJF is priority scheduling where priority is the inverse of predicted next CPU burst
time
• Problem Starvation – low priority processes may never execute
• Solution Aging – as time progresses increase the priority of the process
EXAMPLE OF PRIORITY SCHEDULING
ProcessA arri Burst TimeT Priority
P1 10 3
P2 1 1
P3 2 4
P4 1 5
P5 5 2
• Priority scheduling Gantt Chart
• Average waiting time = 8.2
PRIORITY SCHEDULING W/ ROUND-ROBIN
ProcessAarri Burst TimeT Priority
P1 4 3
P2 5 2
P3 8 2
P4 7 1
P5 3 3
• Run the process with the highest priority. Processes with the same priority run
round-robin
• Gantt Chart with time quantum = 2
MULTILEVEL QUEUE
With priority scheduling, have separate queues for each priority.
Schedule the process in the highest-priority queue!
MULTILEVEL QUEUE
• Prioritization based upon process type
MULTILEVEL FEEDBACK QUEUE
• Three queues:
• Q0 – RR with time quantum 8 milliseconds
• Q1 – RR time quantum 16 milliseconds
• Q2 – FCFS
• Scheduling
• A new process enters queue Q0 which is
served in RR
• When it gains CPU, the process receives 8
milliseconds
• If it does not finish in 8 milliseconds, the
process is moved to queue Q1
• At Q1 job is again served in RR and receives
16 additional milliseconds
• If it still does not complete, it is preempted
and moved to queue Q2
THREAD SCHEDULING
• Distinction between user-level and kernel-level threads
• When threads supported, threads scheduled, not processes
• Many-to-one and many-to-many models, thread library
schedules user-level threads to run on LWP
• Known as process-contention scope (PCS) since scheduling
competition is within the process
• Typically done via priority set by programmer
• Kernel thread scheduled onto available CPU is system-
contention scope (SCS) – competition among all threads in
system
MULTIPLE-PROCESSOR SCHEDULING
• Symmetric multiprocessing (SMP) is where each processor is self
scheduling.
• All threads may be in a common ready queue (a)
• Each processor may have its own private queue of threads (b)
MULTICORE PROCESSORS
• Recent trend to place multiple processor cores on same physical chip
• Faster and consumes less power
• Multiple threads per core also growing
• Takes advantage of memory stall to make progress on another thread while memory
retrieve happens
• Figure
MULTITHREADED MULTICORE SYSTEM
• Each core has > 1 hardware threads.
• If one thread has a memory stall, switch to another thread!
• Figure
MULTITHREADED MULTICORE SYSTEM
• Chip-multithreading (CMT) assigns
each core multiple hardware
threads. (Intel refers to this as
hyperthreading.)
• On a quad-core system with 2
hardware threads per core, the
operating system sees 8 logical
processors.
MULTITHREADED MULTICORE SYSTEM
Two levels of scheduling:
[Link] operating system deciding
which software thread to run on a
logical CPU
[Link] each core decides which
hardware thread to run on the
physical core.
MULTIPLE-PROCESSOR SCHEDULING – LOAD BALANCING
• If SMP, need to keep all CPUs loaded for efficiency
• Load balancing attempts to keep workload evenly distributed
• Push migration – periodic task checks load on each processor, and if
found pushes task from overloaded CPU to other CPUs
• Pull migration – idle processors pulls waiting task from busy processor
MULTIPLE-PROCESSOR SCHEDULING – PROCESSOR
AFFINITY
• When a thread has been running on one processor, the cache contents of that
processor stores the memory accesses by that thread.
• We refer to this as a thread having affinity for a processor (i.e., “processor
affinity”)
• Load balancing may affect processor affinity as a thread may be moved from one
processor to another to balance loads, yet that thread loses the contents of what
it had in the cache of the processor it was moved off of.
• Soft affinity – the operating system attempts to keep a thread running on the
same processor, but no guarantees.
• Hard affinity – allows a process to specify a set of processors it may run on.
NUMA AND CPU SCHEDULING
If the operating system is NUMA-aware, it will assign memory closes
to the CPU the thread is running on.
Required Reading
1. Chapter 5: CPU Scheduling (Operating System Concepts by
Silberschatz, Abraham, et al. 10th ed., ISBN: 978-1-119-32091-3,
2018)
Recommended Reading
1. Chapter 2.4 (Modern Operating Systems by Andrew S. Tanenbaum and
Herbert Bos. 4th ed., ISBN-10: 0-13-359162-X, ISBN-13: 978-0-13-
359162-0, 2015)
This Presentation is mainly dependent on the textbook: Operating System Concepts by Silberschatz, Abraham, et al. 10th ed
Thank You
ر
الجامعة السعودية االلكتونية
ر
االلكتونية الجامعة السعودية
26/12/2021
College of Computing and Informatics
Data Science Pre-Master Program
CS351
OPERATING SYSTEMS
IT241
Operating Systems
Week 5
Chapter 6: Synchronization Tools
CONTENTS
➢ The Critical-Section Problem
➢ Peterson’s Solution
➢ Hardware Support for Synchronization
➢ Mutex Locks
➢ Semaphores
WEEKLY LEARNING OUTCOMES
➢ Describe the critical-section problem and illustrate a race condition
➢ Illustrate hardware solutions to the critical-section problem using
memory barriers, compare-and-swap operations, and atomic
variables
➢ Demonstrate how Mutex locks and semaphores can be used to solve
the critical section problem
➢ Explain the bounded-buffer, readers-writers and dining-philosophers
synchronization problem.
INTRODUCTION
• Processes can execute concurrently
• May be interrupted at any time, partially completing execution
• Concurrent access to shared data may result in data inconsistency
• Maintaining data consistency requires mechanisms to ensure the
orderly execution of cooperating processes
• We illustrated in chapter 4 the problem when we considered the
Bounded Buffer problem with use of a counter that is updated
concurrently by the producer and consumer,. Which lead to race
condition.
CRITICAL SECTION PROBLEM
• Consider system of n processes {p0, p1, … pn-1}
• Each process has critical section segment of code
• Process may be changing common variables, updating table,
writing file, etc.
• When one process in critical section, no other may be in its
critical section
• Critical section problem is to design protocol to solve this
• Each process must ask permission to enter critical section in entry
section, may follow critical section with exit section, then
remainder section
CRITICAL SECTION
• General structure of process Pi
REQUIREMENT FOR CRITICAL SECTION
1. Mutual Exclusion - If process Pi is executing in its critical section,
then no other processes can be executing in their critical sections
2. Progress - If no process is executing in its critical section and there
exist some processes that wish to enter their critical section, then
the selection of the process that will enter the critical section next
cannot be postponed indefinitely
3. Bounded Waiting - A bound must exist on the number of times that
other processes are allowed to enter their critical sections after a
process has made a request to enter its critical section and before
that request is granted
• Assume that each process executes at a nonzero speed
• No assumption concerning relative speed of the n processes
RACE CONDITION
• Processes P0 and P1 are creating child processes using the fork() system call
• Race condition on kernel variable next_available_pid which represents the next
available process identifier (pid)
• Unless there is a mechanism to prevent P0 and P1 from accessing the variable
next_available_pid the same pid could be assigned to two different processes!
PETERSON’S SOLUTION
• Two process solution
• Assume that the load and store machine-language instructions are
atomic; that is, cannot be interrupted
• The two processes share two variables:
• int turn;
• boolean flag[2]
• The variable turn indicates whose turn it is to enter the critical section
• The flag array is used to indicate if a process is ready to enter the
critical section.
• flag[i] = true implies that process Pi is ready!
ALGORITHM FOR PROCESS
while (true){
flag[i] = true;
turn = j;
while (flag[j] && turn = = j)
;
/* critical section */
flag[i] = false;
/* remainder section */
}
CORRECTNESS OF PETERSON’S SOLUTION
• Provable that the three CS requirement are met:
1. Mutual exclusion is preserved
Pi enters CS only if:
either flag[j] = false or turn = i
2. Progress requirement is satisfied
3. Bounded-waiting requirement is met
PETERSON’S SOLUTION
• Although useful for demonstrating an algorithm, Peterson’s Solution
is not guaranteed to work on modern architectures.
• To improve performance, processors and/or compilers may
reorder operations that have no dependencies
• Understanding why it will not work is useful for better
understanding race conditions.
• For single-threaded this is ok as the result will always be the same.
• For multithreaded the reordering may produce inconsistent or
unexpected results!
SYNCHRONIZATION HARDWARE
• Many systems provide hardware support for implementing the
critical section code.
• Uniprocessors – could disable interrupts
• Currently running code would execute without preemption
• Generally too inefficient on multiprocessor systems
• Operating systems using this not broadly scalable
• We will look at three forms of hardware support:
1. Hardware instructions
2. Atomic variables
HARDWARE INSTRUCTIONS
• Special hardware instructions that allow us to either test-and-modify
the content of a word, or to swap the contents of two words
atomically (uninterruptedly.)
• Test-and-Set instruction
• Compare-and-Swap instruction
The test_and_set Instruction
• Definition
boolean test_and_set (boolean *target)
{
boolean rv = *target;
*target = true;
return rv:
}
• Properties
• Executed atomically
• Returns the original value of passed parameter
• Set the new value of passed parameter to true
Solution Using test_and_set()
• Shared boolean variable lock, initialized to false
• Solution:
do {
while (test_and_set(&lock))
; /* do nothing */
/* critical section */
lock = false;
/* remainder section */
} while (true);
• Does it solve the critical-section problem?
Atomic Variables
• Typically, instructions such as compare-and-swap are used as
building blocks for other synchronization tools.
• One tool is an atomic variable that provides atomic
(uninterruptible) updates on basic data types such as integers and
booleans.
• For example:
• Let sequence be an atomic variable
• Let increment() be operation on the atomic variable sequence
• The Command:
increment(&sequence);
ensures sequence is incremented without interruption:
MUTEX LOCK
• Previous solutions are complicated and generally inaccessible to application
programmers
• OS designers build software tools to solve critical section problem
• Simplest is mutex lock
• Boolean variable indicating if lock is available or not
• Protect a critical section by
• First acquire() a lock
• Then release() the lock
• Calls to acquire() and release() must be atomic
• Usually implemented via hardware atomic instructions such as compare-and-swap.
• But this solution requires busy waiting
• This lock therefore called a spinlock
SOLUTION USING MUTEX LOCK
while (true) {
acquire lock
critical section
release lock
remainder section
}
SEMAPHORE
• Synchronization tool that provides more sophisticated ways (than Mutex locks) for processes to synchronize their
activities.
• Semaphore S – integer variable
• Can only be accessed via two indivisible (atomic) operations
• wait() and signal()
• Originally called P() and V()
• Definition of the wait() operation
wait(S) {
while (S <= 0)
; // busy wait
S--;
}
• Definition of the signal() operation
signal(S) {
S++;
}
SEMAPHORE
• Counting semaphore – integer value can range over an unrestricted
domain
• Binary semaphore – integer value can range only between 0 and 1
• Same as a mutex lock
• Can implement a counting semaphore S as a binary semaphore
• With semaphores we can solve various synchronization problems
SEMAPHORE EXAMPLE
• Solution to the CS Problem
• Create a semaphore “mutex” initialized to 1
wait(mutex);
CS
signal(mutex);
• Consider P1 and P2 that with two statements S1 and S2 and the requirement that S1 to happen
before S2
• Create a semaphore “synch” initialized to 0
P1:
S1;
signal(synch);
P2:
wait(synch);
S2;
SEMAPHORE IMPLEMENTATION
• Must guarantee that no two processes can execute the wait() and
signal() on the same semaphore at the same time
• Thus, the implementation becomes the critical section problem
where the wait and signal code are placed in the critical section
• Could now have busy waiting in critical section implementation
• But implementation code is short
• Little busy waiting if critical section rarely occupied
• Note that applications may spend lots of time in critical sections and
therefore this is not a good solution
SEMAPHORE IMPLEMENTATION WITH NO BUSY WAIT
• With each semaphore there is an associated waiting queue
• Each entry in a waiting queue has two data items:
• Value (of type integer)
• Pointer to next record in the list
• Two operations:
• block – place the process invoking the operation on the
appropriate waiting queue
• wakeup – remove one of processes in the waiting queue and
place it in the ready queue
Main Reference
Chapter 6: Synchronization Tools
Chapter 7: Synchronization Examples
(Operating System Concepts by Silberschatz, Abraham, et al. 10th ed., ISBN:
978-1-119-32091-3, 2018)
Additional References
Chapter 2.3
Chapter 2.5
(Modern Operating Systems by Andrew S. Tanenbaum and Herbert Bos. 4th ed.,
ISBN-10: 0-13-359162-X, ISBN-13: 978-0-13-359162-0, 2015)
This Presentation is mainly dependent on the textbook: Operating System Concepts by Silberschatz, Abraham, et al. 10th ed.)
Thank You
ر
الجامعة السعودية االلكتونية
ر
االلكتونية الجامعة السعودية
26/12/2021
College of Computing and Informatics
Data Science Pre-Master Program
CS351
OPERATING SYSTEMS
IT241
Operating Systems
Week 5
Chapter 6: Synchronization Tools
Chapter 7: Synchronization Examples
CONTENTS
➢ Classic Problems of Synchronization
WEEKLY LEARNING OUTCOMES
➢ Describe the critical-section problem and illustrate a race condition
➢ Illustrate hardware solutions to the critical-section problem using
memory barriers, compare-and-swap operations, and atomic
variables
➢ Demonstrate how Mutex locks and semaphores can be used to solve
the critical section problem
➢ Explain the bounded-buffer, readers-writers and dining-philosophers
synchronization problem.
CLASSICAL PROBLEMS OF SYNCHRONIZATION
• Classical problems used to test newly-proposed synchronization
schemes
• Bounded-Buffer Problem
• Readers and Writers Problem
• Dining-Philosophers Problem
BOUNDED BUFFER PROBLEM
• n buffers, each can hold one item
• Semaphore mutex initialized to the value 1
• Semaphore full initialized to the value 0
• Semaphore empty initialized to the value n
BOUNDED BUFFER PROBLEM
• The structure of the producer process
while (true) {
...
/* produce an item in next_produced */
...
wait(empty);
wait(mutex);
...
/* add next produced to the buffer */
...
signal(mutex);
signal(full);
}
BOUNDED BUFFER PROBLEM
• The structure of the consumer process
while (true) {
wait(full);
wait(mutex);
...
/* remove an item from buffer to next_consumed */
...
signal(mutex);
signal(empty);
...
/* consume the item in next consumed */
...
}
READERS-WRITERS PROBLEM
• A data set is shared among a number of concurrent processes
• Readers – only read the data set; they do not perform any
updates
• Writers – can both read and write
• Problem – allow multiple readers to read at the same time
• Only one single writer can access the shared data at the same
time
• Several variations of how readers and writers are considered – all
involve some form of priorities
READERS-WRITERS PROBLEM
• Shared Data
• Data set
• Semaphore rw_mutex initialized to 1
• Semaphore mutex initialized to 1
• Integer read_count initialized to 0
READERS-WRITERS PROBLEM
• The structure of a writer process
while (true) {
wait(rw_mutex);
...
/* writing is performed */
...
signal(rw_mutex);
}
READERS-WRITERS PROBLEM
• The structure of a reader process
while (true){
wait(mutex);
read_count++;
if (read_count == 1) /* first reader */
wait(rw_mutex);
signal(mutex);
...
/* reading is performed */
...
wait(mutex);
read count--;
if (read_count == 0) /* last reader */
signal(rw_mutex);
signal(mutex);
}
READERS-WRITERS PROBLEM VARIATION
• The solution in previous slide can result in a situation where a writer
process never writes. It is referred to as the “First reader-writer”
problem.
• The “Second reader-writer” problem is a variation the first reader-
writer problem that state:
• Once a writer is ready to write, no “newly arrived reader” is
allowed to read.
• Both the first and second may result in starvation. leading to even
more variations
• Problem is solved on some systems by kernel providing reader-
writer locks
DINING PHILOSOPHERS PROBLEM
• N philosophers’ sit at a round table with a bowel of rice in the middle.
• They spend their lives alternating thinking and eating.
• They do not interact with their neighbors.
• Occasionally try to pick up 2 chopsticks (one at a time) to eat from bowl
• Need both to eat, then release both when done
• In the case of 5 philosophers, the shared data
• Bowl of rice (data set)
• Semaphore chopstick [5] initialized to 1
DINING PHILOSOPHERS PROBLEM ALGORITHM
• Semaphore Solution
• The structure of Philosopher i :
while (true){
wait (chopstick[i] );
wait (chopStick[ (i + 1) % 5] );
/* eat for a while */
signal (chopstick[i] );
signal (chopstick[ (i + 1) % 5] );
/* think for awhile */
}
• What is the problem with this algorithm?
KERNEL SYNCHRONIZATION - WINDOWS
• Uses interrupt masks to protect access to global resources on
uniprocessor systems
• Uses spinlocks on multiprocessor systems
• Spinlocking-thread will never be preempted
• Also provides dispatcher objects user-land which may act mutexes,
semaphores, events, and timers
• Events
• An event acts much like a condition variable
• Timers notify one or more thread when time expired
• Dispatcher objects either signaled-state (object available) or
non-signaled state (thread will block)
KERNEL SYNCHRONIZATION - WINDOWS
Mutex dispatcher object
Main Reference
Chapter 6: Synchronization Tools
Chapter 7: Synchronization Examples
(Operating System Concepts by Silberschatz, Abraham, et al. 10th ed., ISBN:
978-1-119-32091-3, 2018)
Additional References
Chapter 2.3
Chapter 2.5
(Modern Operating Systems by Andrew S. Tanenbaum and Herbert Bos. 4th ed.,
ISBN-10: 0-13-359162-X, ISBN-13: 978-0-13-359162-0, 2015)
This Presentation is mainly dependent on the textbook: Operating System Concepts by Silberschatz, Abraham, et al. 10th ed.)
Thank You
ر
الجامعة السعودية االلكتونية
ر
االلكتونية الجامعة السعودية
26/12/2021
College of Computing and Informatics
Data Science Pre-Master Program
CS351
OPERATING SYSTEMS
Week 6
Chapter 8: Deadlocks
CONTENTS
➢ System Model
➢ Deadlock Characterization
➢ Methods for Handling Deadlocks
➢ Deadlock Prevention
➢ Deadlock Avoidance
Weekly LEARNING OUTCOMES
➢ Illustrate how deadlock can occur when mutex locks are used
➢ Define the four necessary conditions that characterize deadlock
➢ Identify a deadlock situation in a resource allocation graph
➢ Evaluate the four different approaches for preventing deadlocks
SYSTEM MODEL
• System consists of resources
• Resource types R1, R2, . . ., Rm
• CPU cycles, memory space, I/O devices
• Each resource type Ri has Wi instances.
• Each process utilizes a resource as follows:
• request
• use
• release
DEADLOCK WITH SEMAPHORES
• Data:
• A semaphore S1 initialized to 1
• A semaphore S2 initialized to 1
• Two processes P1 and P2
• P1:
wait(s1)
wait(s2)
• P2:
wait(s2)
wait(s1)
DEADLOCK CHARACTERIZATION
Deadlock can arise if four conditions hold simultaneously.
• Mutual exclusion: only one process at a time can use a resource
• Hold and wait: a process holding at least one resource is waiting to acquire
additional resources held by other processes
• No preemption: a resource can be released only voluntarily by the process
holding it, after that process has completed its task
• Circular wait: there exists a set {P0, P1, …, Pn} of waiting processes such that P0 is
waiting for a resource that is held by P1, P1 is waiting for a resource that is held
by P2, …, Pn–1 is waiting for a resource that is held by Pn, and Pn is waiting for a
resource that is held by P0.
RESOURCE-ALLOCATION GRAPH
A set of vertices V and a set of edges E.
• V is partitioned into two types:
• P = {P1, P2, …, Pn}, the set consisting of all the processes in the system
• R = {R1, R2, …, Rm}, the set consisting of all resource types in the system
• request edge – directed edge Pi → Rj
• assignment edge – directed edge Rj → Pi
RESOURCE ALLOCATION GRAPH EXAMPLE
• One instance of R1
• Two instances of R2
• One instance of R3
• Three instance of R4
• T1 holds one instance of R2 and is waiting
for an instance of R1
• T2 holds one instance of R1, one instance of
R2, and is waiting for an instance of R3
• T3 is holds one instance of R3
RESOURCE ALLOCATION GRAPH WITH A
DEADLOCK
GRAPH WITH A CYCLE BUT NO DEADLOCK
BASIC FACTS ON CYCLES AND DEADLOCK
• If graph contains no cycles no deadlock
• If graph contains a cycle
• if only one instance per resource type, then deadlock
• if several instances per resource type, possibility of deadlock
METHODS FOR HANDLING DEADLOCKS
• Ensure that the system will never enter a deadlock state:
• Deadlock prevention
• Deadlock avoidance
• Allow the system to enter a deadlock state and then recover
• Ignore the problem and pretend that deadlocks never occur in the system.
DEADLOCK PREVENTION
Invalidate one of the four necessary conditions for
deadlock:
• Mutual Exclusion – not required for sharable resources (e.g.,
read-only files); must hold for non-sharable resources
• Hold and Wait – must guarantee that whenever a process
requests a resource, it does not hold any other resources
• Require process to request and be allocated all its resources before
it begins execution, or allow process to request resources only
when the process has none allocated to it.
• Low resource utilization; starvation possible
DEADLOCK PREVENTION (CONT.)
• No Preemption:
• If a process that is holding some resources requests another resource that cannot
be immediately allocated to it, then all resources currently being held are released
• Preempted resources are added to the list of resources for which the process is
waiting
• Process will be restarted only when it can regain its old resources, as well as the
new ones that it is requesting
• Circular Wait:
• Impose a total ordering of all resource types, and require that each process
requests resources in an increasing order of enumeration
CIRCULAR WAIT
• Invalidating the circular wait condition is most common.
• Simply assign each resource (i.e., mutex locks) a unique number.
• Resources must be acquired in order.
• If:
first_mutex = 1
second_mutex = 5
code for thread_two could not be
written as follows:
DEADLOCK AVOIDANCE
Requires that the system has some additional a priori information
available:
• Simplest and most useful model requires that each process declare the
maximum number of resources of each type that it may need
• The deadlock-avoidance algorithm dynamically examines the resource-
allocation state to ensure that there can never be a circular-wait condition
• Resource-allocation state is defined by the number of available and
allocated resources, and the maximum demands of the processes
SAFE STATE
• When a process requests an available resource, system must decide if
immediate allocation leaves the system in a safe state
• System is in safe state if there exists a sequence <P1, P2, …, Pn> of ALL the
processes in the systems such that for each Pi, the resources that Pi can still
request can be satisfied by currently available resources + resources held by all
the Pj, with j < I
• That is:
• If Pi resource needs are not immediately available, then Pi can wait until all Pj have
finished
• When Pj is finished, Pi can obtain needed resources, execute, return allocated resources,
and terminate
• When Pi terminates, Pi +1 can obtain its needed resources, and so on
BASIC FACTS ON SAFE STATE AND DEADLOCK
• If a system is in safe state no deadlocks
• If a system is in unsafe state possibility of deadlock
• Avoidance ensure that a system will never enter an unsafe
state.
SAFE, UNSAFE, DEADLOCK STATE
AVOIDANCE ALGORITHMS
• Single instance of a resource type
• Use a resource-allocation graph
• Multiple instances of a resource type
• Use the Banker’s Algorithm
RESOURCE-ALLOCATION GRAPH SCHEME
• Claim edge Pi → Rj indicated that process Pj may request
resource Rj; represented by a dashed line
• Claim edge converts to request edge when a process requests
a resource
• Request edge converted to an assignment edge when the
resource is allocated to the process
• When a resource is released by a process, assignment edge
reconverts to a claim edge
• Resources must be claimed a priori in the system
RESOURCE-ALLOCATION GRAPH
(a) Safe state (b) Unsafe state
RESOURCE-ALLOCATION GRAPH ALGORITHM
• Suppose that process Pi requests a resource Rj
• The request can be granted only if converting the request edge to an
assignment edge does not result in the formation of a cycle in the resource
allocation graph
BANKER’S ALGORITHM
• Multiple instances of resources
• Each process must a priori claim maximum use
• When a process requests a resource it may have to wait
• When a process gets all its resources it must return them in a finite amount of
time
DATA STRUCTURES FOR THE BANKER’S ALGORITHM
Let n = number of processes, and m = number of resources types.
• Available: Vector of length m. If available [j] = k, there are k instances of
resource type Rj available
• Max: n x m matrix. If Max [i,j] = k, then process Pi may request at most k
instances of resource type Rj
• Allocation: n x m matrix. If Allocation[i,j] = k then Pi is currently allocated k
instances of Rj
• Need: n x m matrix. If Need[i,j] = k, then Pi may need k more instances of Rj to
complete its task
Need [i,j] = Max[i,j] – Allocation [i,j]
SAFETY ALGORITHM
1. Let Work and Finish be vectors of length m and n, respectively. Initialize:
Work = Available
Finish [i] = false for i = 0, 1, …, n- 1
2. Find an i such that both:
(a) Finish [i] = false
(b) Needi Work
If no such i exists, go to step 4
3. Work = Work + Allocationi
Finish[i] = true
go to step 2
4. If Finish [i] == true for all i, then the system is in a safe state
RESOURCE-REQUEST ALGORITHM FOR PROCESS PI
Requesti = request vector for process Pi. If Requesti [j] = k then process
Pi wants k instances of resource type Rj
1. If Requesti Needi go to step 2. Otherwise, raise error condition, since
process has exceeded its maximum claim
2. If Requesti Available, go to step 3. Otherwise Pi must wait, since resources
are not available
3. Pretend to allocate requested resources to Pi by modifying the state as
follows:
Available = Available – Requesti;
Allocationi = Allocationi + Requesti;
Needi = Needi – Requesti;
•If safe the resources are allocated to Pi
•If unsafe Pi must wait, and the old resource-allocation state is restored
EXAMPLE OF BANKER’S ALGORITHM
• 5 processes P0 through P4;
3 resource types:
A (10 instances), B (5instances), and C (7 instances)
• Snapshot at time T0:
Allocation Max Available
ABC ABC ABC
P0 010 753 332
P1 200 322
P2 302 902
P3 211 222
P4 002 433
EXAMPLE (CONT.)
• The content of the matrix Need is defined to be Max – Allocation
Need
ABC
P0 7 4 3
P1 1 2 2
P2 6 0 0
P3 0 1 1
P4 4 3 1
• The system is in a safe state since the sequence < P1, P3, P4, P2, P0>
satisfies safety criteria
EXAMPLE: P1 REQUEST (1,0,2)
• Check that Request Available (that is, (1,0,2) (3,3,2) true
Allocation Need Available
ABC ABC ABC
P0 010 743 230
P1 302 020
P2 302 600
P3 211 011
P4 002 431
• Executing safety algorithm shows that sequence < P1, P3, P4, P0, P2> satisfies safety
requirement
• Can request for (3,3,0) by P4 be granted?
• Can request for (0,2,0) by P0 be granted?
Required Reading
1. Chapter 8: Deadlocks (Operating System Concepts by Silberschatz,
Abraham, et al. 10th ed., ISBN: 978-1-119-32091-3, 2018)
Recommended Reading
1. Chapter 6 (Modern Operating Systems by Andrew S. Tanenbaum and
Herbert Bos. 4th ed., ISBN-10: 0-13-359162-X, ISBN-13: 978-0-13-
359162-0, 2015)
This Presentation is mainly dependent on the textbook: Operating System Concepts by Silberschatz, Abraham, et al. 10th ed
Thank You
ر
الجامعة السعودية االلكتونية
ر
االلكتونية الجامعة السعودية
26/12/2021
College of Computing and Informatics
Data Science Pre-Master Program
CS351
OPERATING SYSTEMS
Week 6
Chapter 8: Deadlocks
CONTENTS
➢ Deadlock Detection
➢ Recovery from Deadlock
Weekly LEARNING OUTCOMES
➢ Illustrate how deadlock can occur when mutex locks are used
➢ Define the four necessary conditions that characterize deadlock
➢ Identify a deadlock situation in a resource allocation graph
➢ Evaluate the four different approaches for preventing deadlocks
DEADLOCK DETECTION
• Allow system to enter deadlock state
• Detection algorithm
• Recovery scheme
SINGLE INSTANCE OF EACH RESOURCE TYPE
• Maintain wait-for graph
• Nodes are processes
• Pi → Pj if Pi is waiting for Pj
• Periodically invoke an algorithm that searches for a cycle in
the graph. If there is a cycle, there exists a deadlock
• An algorithm to detect a cycle in a graph requires an order of
n2 operations, where n is the number of vertices in the graph
RESOURCE-ALLOCATION GRAPH AND WAIT-FOR GRAPH
Resource-Allocation Graph Corresponding wait-for graph
SEVERAL INSTANCES OF A RESOURCE TYPE
• Available: A vector of length m indicates the number of available
resources of each type
• Allocation: An n x m matrix defines the number of resources of each
type currently allocated to each process
• Request: An n x m matrix indicates the current request of each
process. If Request [i][j] = k, then process Pi is requesting k more
instances of resource type Rj.
DETECTION ALGORITHM
[Link] Work and Finish be vectors of length m and n,
respectively Initialize:
a) Work = Available
b) For i = 1,2, …, n, if Allocationi 0, then
Finish[i] = false; otherwise, Finish[i] = true
[Link] an index i such that both:
a) Finish[i] == false
b) Requesti Work
If no such i exists, go to step 4
DETECTION ALGORITHM (CONT.)
3. Work = Work + Allocationi
Finish[i] = true
go to step 2
4. If Finish[i] == false, for some i, 1 i n, then the system is in
deadlock state. Moreover, if Finish[i] == false, then Pi is
deadlocked
Note: Algorithm requires an order of O(m x n2) operations to
detect whether the system is in deadlocked state
EXAMPLE OF DETECTION ALGORITHM
• Five processes P0 through P4; three resource types
A (7 instances), B (2 instances), and C (6 instances)
• Snapshot at time T0:
Allocation Request Available
ABC ABC ABC
P0 010 000 000
P1 200 202
P2 303 000
P3 211 100
P4 002 002
• Sequence <P0, P2, P3, P1, P4> will result in Finish[i] = true for all i
EXAMPLE (CONT.)
P2 requests an additional instance of type C
Request
ABC
P0 000
P1 202
P2 001
P3 100
P4 002
State of system?
Can reclaim resources held by process P0, but insufficient resources to fulfill other processes;
requests
Deadlock exists, consisting of processes P1, P2, P3, and P4
DETECTION-ALGORITHM USAGE
• When, and how often, to invoke depends on:
• How often a deadlock is likely to occur?
• How many processes will need to be rolled back?
• one for each disjoint cycle
• If detection algorithm is invoked arbitrarily, there may be many cycles
in the resource graph and so we would not be able to tell which of the
many deadlocked processes “caused” the deadlock.
RECOVERY FROM DEADLOCK: PROCESS TERMINATION
Abort all deadlocked processes
Abort one process at a time until the deadlock cycle is eliminated
In which order should we choose to abort?
1. Priority of the process
2. How long process has computed, and how much longer to
completion
3. Resources the process has used
4. Resources process needs to complete
5. How many processes will need to be terminated
6. Is process interactive or batch?
RECOVERY FROM DEADLOCK: RESOURCE PREEMPTION
• Selecting a victim – minimize cost
• Rollback – return to some safe state, restart process for that
state
• Starvation – same process may always be picked as victim,
include number of rollback in cost factor
Required Reading
1. Chapter 8: Deadlocks (Operating System Concepts by Silberschatz,
Abraham, et al. 10th ed., ISBN: 978-1-119-32091-3, 2018)
Recommended Reading
1. Chapter 6 (Modern Operating Systems by Andrew S. Tanenbaum and
Herbert Bos. 4th ed., ISBN-10: 0-13-359162-X, ISBN-13: 978-0-13-
359162-0, 2015)
This Presentation is mainly dependent on the textbook: Operating System Concepts by Silberschatz, Abraham, et al. 10th ed
Thank You
ر
الجامعة السعودية االلكتونية
ر
االلكتونية الجامعة السعودية
26/12/2021
College of Computing and Informatics
Data Science Pre-Master Program
CS351
OPERATING SYSTEMS
IT241
Operating Systems
Week 7
Chapter 9: Main Memory
CONTENTS
➢ Contiguous Memory Allocation
➢ Paging
➢ Structure of the Page Table
➢ Swapping
WEEKLY LEARNING OUTCOMES
➢ To provide a detailed description of various ways of
organizing memory hardware
➢ To evaluate various memory-management techniques
➢ To understand the concept of Swapping
INTRODUCTION
• Program must be brought (from disk) into memory and placed
within a process for it to be run
• Main memory and registers are only storage CPU can access directly
• Memory unit only sees a stream of:
• addresses + read requests, or
• address + data and write requests
• Register access is done in one CPU clock (or less)
• Main memory can take many cycles, causing a stall
• Cache sits between main memory and CPU registers
• Protection of memory required to ensure correct operation
PROTECTION
• Need to censure that a process can access only access those addresses
in its address space.
• We can provide this protection by using a pair of base and limit
registers define the logical address space of a process
HARDWARE ADDRESS PROTECTION
• CPU must check every memory access generated in user mode to be sure
it is between base and limit for that user
• the instructions to loading the base and limit registers are privileged
MEMORY BINDING
• Address binding of instructions and data to memory addresses can
happen at three different stages
• Compile time: If memory location known a priori, absolute code
can be generated; must recompile code if starting location
changes
• Load time: Must generate relocatable code if memory location
is not known at compile time
• Execution time: Binding delayed until run time if the process
can be moved during its execution from one memory segment to
another
• Need hardware support for address maps (e.g., base and
limit registers)
PROCESSING OF USER PROGRAM
ADDRESS SPACE
• The concept of a logical address space that is bound to a separate
physical address space is central to proper memory management
• Logical address – generated by the CPU; also referred to as
virtual address
• Physical address – address seen by the memory unit
• Logical and physical addresses are the same in compile-time and
load-time address-binding schemes; logical (virtual) and physical
addresses differ in execution-time address-binding scheme
• Logical address space is the set of all logical addresses generated by
a program
• Physical address space is the set of all physical addresses generated
by a program
MEMORY MANAGEMENT UNIT (MMU)
• Hardware device that at run time maps virtual to physical address
MEMORY MANAGEMENT UNIT (MMU)
• Consider simple scheme. which is a generalization of the base-
register scheme.
▪ The base register now called relocation register
• The value in the relocation register is added to every address
generated by a user process at the time it is sent to memory
• The user program deals with logical addresses; it never sees the real
physical addresses
• Execution-time binding occurs when reference is made to
location in memory
• Logical address bound to physical addresses
MEMORY MANAGEMENT UNIT (MMU)
• Consider simple scheme. which is a generalization of the base-
register scheme.
▪ The base register now called relocation register
• The value in the relocation register is added to every address
generated by a user process at the time it is sent to memory
CONTIGUOUS ALLOCATION
• Main memory must support both OS and user processes
• Limited resource, must allocate efficiently
• Contiguous allocation is one early method
• Main memory usually into two partitions:
• Resident operating system, usually held in low memory with
interrupt vector
• User processes then held in high memory
• Each process contained in single contiguous section of memory
CONTIGUOUS ALLOCATION
• Relocation registers used to protect user processes from each other,
and from changing operating-system code and data
• Base register contains value of smallest physical address
• Limit register contains range of logical addresses – each logical
address must be less than the limit register
• MMU maps logical address dynamically
• Can then allow actions such as kernel code being transient and
kernel changing size
HARDWARE SUPPORT
VARIABLE PARTITION
• Multiple-partition allocation
• Degree of multiprogramming limited by number of partitions
• Variable-partition sizes for efficiency (sized to a given process’ needs)
• Hole – block of available memory; holes of various size are scattered throughout memory
• When a process arrives, it is allocated memory from a hole large enough to accommodate it
• Process exiting frees its partition, adjacent free partitions combined
• Operating system maintains information about:
a) allocated partitions b) free partitions (hole)
DYNAMIC STORAGE ALLOCATION
• First-fit: Allocate the first hole that is big enough
• Best-fit: Allocate the smallest hole that is big enough; must search
entire list, unless ordered by size
• Produces the smallest leftover hole
• Worst-fit: Allocate the largest hole; must also search entire list
• Produces the largest leftover hole
• First-fit and best-fit better than worst-fit in terms of speed and
storage utilization
FRAGMENTATION
• External Fragmentation – total memory space exists to satisfy a
request, but it is not contiguous
• Internal Fragmentation – allocated memory may be slightly larger
than requested memory; this size difference is memory internal to a
partition, but not being used
• First fit analysis reveals that given N blocks allocated, 0.5 N blocks
lost to fragmentation
• 1/3 may be unusable -> 50-percent rule
ADDRESS SPACE
• Reduce external fragmentation by Compaction
• Shuffle memory contents to place all free memory together in
one large block
• Compaction is possible only if relocation is dynamic, and is done
at execution time
• I/O problem
• Latch job in memory while it is involved in I/O
• Do I/O only into OS buffers
• Now consider that backing store has same fragmentation problems
PAGING
• Physical address space of a process can be noncontiguous; process is allocated physical
memory whenever the latter is available
• Avoids external fragmentation
• Avoids problem of varying sized memory chunks
• Divide physical memory into fixed-sized blocks called frames
• Size is power of 2, between 512 bytes and 16 Mbytes
• Divide logical memory into blocks of same size called pages
• Keep track of all free frames
• To run a program of size N pages, need to find N free frames and load program
• Set up a page table to translate logical to physical addresses
• Backing store likewise split into pages
• Still have Internal fragmentation
ADDRESS TRANSLATION
• Address generated by CPU is divided into:
• Page number (p) – used as an index into a page table which
contains base address of each page in physical memory
• Page offset (d) – combined with base address to define the
physical memory address that is sent to the memory unit
page number page offset
p d
m -n n
• For given logical address space 2m and page size 2n
PAGING HARDWARE
PAGING EXAMPLE
PAGING EXAMPLE
PAGING - CALCULATION
• Page size = 2,048 bytes
• Process size = 72,766 bytes
• 35 pages + 1,086 bytes
• Internal fragmentation of 2,048 - 1,086 = 962 bytes
• Worst case fragmentation = 1 frame – 1 byte
• On average fragmentation = 1 / 2 frame size
• So small frame sizes desirable?
• But each page table entry takes memory to track
• Page sizes growing over time
FREE FRAMES
Before allocation After allocation
PAGE TABLE - IMPLEMENTATION
• Page table is kept in main memory
• Page-table base register (PTBR) points to the page table
• Page-table length register (PTLR) indicates size of the page table
• In this scheme every data/instruction access requires two memory
accesses
• One for the page table and one for the data / instruction
• The two-memory access problem can be solved by the use of a
special fast-lookup hardware cache called translation look-aside
buffers (TLBs) (also called associative memory).
TRANSLATION LOOK-ASIDE BUFFER
• Some TLBs store address-space identifiers (ASIDs) in each TLB entry
– uniquely identifies each process to provide address-space
protection for that process
• Otherwise need to flush at every context switch
• TLBs typically small (64 to 1,024 entries)
• On a TLB miss, value is loaded into the TLB for faster access next
time
• Replacement policies must be considered
• Some entries can be wired down for permanent fast access
ADDRESS SPACE
• Associative memory – parallel search
Page # Frame #
• Address translation (p, d)
• If p is in associative register, get frame # out
• Otherwise get frame # from page table in memory
PAGING HARDWARE WITH TLB
EFFECTIVE ACCESS TIME
▪ Hit ratio – percentage of times that a page number is found in the TLB
• An 80% hit ratio means that we find the desired page number in the TLB 80% of the
time.
• Suppose that 10 nanoseconds to access memory.
• If we find the desired page in TLB then a mapped-memory access take 10 ns
• Otherwise we need two memory access so it is 20 ns
• Effective Access Time (EAT)
EAT = 0.80 x 10 + 0.20 x 20 = 12 nanoseconds
implying 20% slowdown in access time
• Consider amore realistic hit ratio of 99%,
EAT = 0.99 x 10 + 0.01 x 20 = 10.1ns
implying only 1% slowdown in access time.
MEMORY PROTECTION
• Memory protection implemented by associating protection bit with
each frame to indicate if read-only or read-write access is allowed
• Can also add more bits to indicate page execute-only, and so on
• Valid-invalid bit attached to each entry in the page table:
• “valid” indicates that the associated page is in the process’
logical address space, and is thus a legal page
• “invalid” indicates that the page is not in the process’ logical
address space
• Or use page-table length register (PTLR)
• Any violations result in a trap to the kernel
INVERTED PAGE TABLE
• Rather than each process having a page table and keeping track of all
possible logical pages, track all physical pages
• One entry for each real page of memory
• Entry consists of the virtual address of the page stored in that real memory
location, with information about the process that owns that page
• Decreases memory needed to store each page table, but increases time
needed to search the table when a page reference occurs
• Use hash table to limit the search to one — or at most a few — page-table
entries
• TLB can accelerate access
• But how to implement shared memory?
• One mapping of a virtual address to the shared physical address
MEMORY PROTECTION
Valid (v) or Invalid (i) Bit In A Page Table
SWAPPING
• A process can be swapped temporarily out of memory to a backing store,
and then brought back into memory for continued execution
• Total physical memory space of processes can exceed physical
memory
• Backing store – fast disk large enough to accommodate copies of all
memory images for all users; must provide direct access to these memory
images
• Roll out, roll in – swapping variant used for priority-based scheduling
algorithms; lower-priority process is swapped out so higher-priority
process can be loaded and executed
• Major part of swap time is transfer time; total transfer time is directly
proportional to the amount of memory swapped
• System maintains a ready queue of ready-to-run processes which have
memory images on disk
SCHEMATIC VIEW
SWAPPING WITH PAGING
Main Reference
Chapter 9: Main Memory
(Operating System Concepts by Silberschatz, Abraham, et al. 10th ed., ISBN:
978-1-119-32091-3, 2018)
Additional References
Chapter 3.1
Chapter 3.2
Chapter 3.7
(Modern Operating Systems by Andrew S. Tanenbaum and Herbert Bos. 4th ed.,
ISBN-10: 0-13-359162-X, ISBN-13: 978-0-13-359162-0, 2015)
This Presentation is mainly dependent on the textbook: Operating System Concepts by Silberschatz, Abraham, et al. 10th ed.)
Thank You
ر
الجامعة السعودية االلكتونية
ر
االلكتونية الجامعة السعودية
26/12/2021
College of Computing and Informatics
Data Science Pre-Master Program
CS351
OPERATING SYSTEMS
IT241
Operating Systems
Week 9
Chapter 10: Virtual Memory
CONTENTS
➢ Demand Paging
➢ Copy-on-Write
➢ Page Replacement
➢ Allocation of Frames
WEEKLY LEARNING OUTCOMES
➢ Define virtual memory and describe its benefits.
➢ Illustrate how pages are loaded into memory using demand
paging.
➢ Apply the FIFO, optimal, and LRU page-replacement
algorithms.
➢ Describe the working set of a process, and explain how it is
related to program locality.
INTRODUCTION
• Code needs to be in memory to execute, but entire program rarely
used
• Error code, unusual routines, large data structures
• Entire program code not needed at same time
• Consider ability to execute partially-loaded program
• Program no longer constrained by limits of physical memory
• Each program takes less memory while running -> more
programs run at the same time
• Increased CPU utilization and throughput with no increase in
response time or turnaround time
• Less I/O needed to load or swap programs into memory -> each
user program runs faster
VIRTUAL MEMORY
• Virtual memory – separation of user logical memory from physical
memory
• Only part of the program needs to be in memory for execution
• Logical address space can therefore be much larger than physical
address space
• Allows address spaces to be shared by several processes
• Allows for more efficient process creation
• More programs running concurrently
• Less I/O needed to load or swap processes
VIRTUAL MEMORY
• Virtual address space – logical view of how process is stored in
memory
• Usually start at address 0, contiguous addresses until end of
space
• Meanwhile, physical memory organized in page frames
• MMU must map logical to physical
• Virtual memory can be implemented via:
• Demand paging
• Demand segmentation
VIRTUAL MEMORY
Virtual Memory That is Larger Than Physical Memory
DEMAND PAGING
• Could bring entire process into
memory at load time
• Or bring a page into memory only
when it is needed
• Less I/O needed, no unnecessary
I/O
• Less memory needed
• Faster response
• More users
• Similar to paging system with
swapping (diagram on right)
BASIC CONCEPTS
• With swapping, pager guesses which pages will be used before
swapping out again
• Instead, pager brings in only those pages into memory
• How to determine that set of pages?
• Need new MMU functionality to implement demand paging
• If pages needed are already memory resident
• No difference from non demand-paging
• If page needed and not memory resident
• Need to detect and load the page into memory from storage
• Without changing program behavior
• Without programmer needing to change code
VALID - INVALID BIT
• With each page table entry a valid–invalid bit is associated
(v in-memory – memory resident, i not-in-memory)
• Initially valid–invalid bit is set to i on all entries
• Example of a page table snapshot:
During MMU address translation, if valid–invalid bit in page table entry is i page fault
VALID - INVALID BIT
• Page Table When Some Pages Are Not in Main Memory
STEPS – PAGE FAULT
[Link] there is a reference to a page, first reference to that page will trap
to operating system
• Page fault
[Link] system looks at another table to decide:
• Invalid reference abort
• Just not in memory
[Link] free frame
[Link] page into frame via scheduled disk operation
[Link] tables to indicate page now in memory
Set validation bit = v
[Link] the instruction that caused the page fault
STEPS – PAGE FAULT
STAGES IN DEMAND PAGES
[Link] to the operating system
[Link] the user registers and process state
[Link] that the interrupt was a page fault
[Link] that the page reference was legal and determine the location
of the page on the disk
[Link] a read from the disk to a free frame:
a) Wait in a queue for this device until the read request is serviced
b) Wait for the device seek and/or latency time
c) Begin the transfer of the page to a free frame
STAGES IN DEMAND PAGES
6. While waiting, allocate the CPU to some other user
7. Receive an interrupt from the disk I/O subsystem (I/O completed)
8. Save the registers and process state for the other user
9. Determine that the interrupt was from the disk
10. Correct the page table and other tables to show page is now in
memory
11. Wait for the CPU to be allocated to this process again
12. Restore the user registers, process state, and new page table, and
then resume the interrupted instruction
PAGE REPLACEMENT
• Used up by process pages
• Also in demand from the kernel, I/O buffers, etc
• How much to allocate to each?
• Page replacement – find some page in memory, but not really in
use, page it out
• Algorithm – terminate? swap out? replace the page?
• Performance – want an algorithm which will result in minimum
number of page faults
• Same page may be brought into memory several times
PAGE REPLACEMENT
• Prevent over-allocation of memory by modifying page-fault
service routine to include page replacement
• Use modify (dirty) bit to reduce overhead of page transfers –
only modified pages are written to disk
• Page replacement completes separation between logical memory
and physical memory – large virtual memory can be provided on a
smaller physical memory
NEED FOR PAGE REPLACEMENT
BASIC PAGE REPLACEMENT
1. Find the location of the desired page on disk
2. Find a free frame:
- If there is a free frame, use it
- If there is no free frame, use a page replacement algorithm to
select a victim frame
- Write victim frame to disk if dirty
3. Bring the desired page into the (newly) free frame; update the
page and frame tables
4. Continue the process by restarting the instruction that caused the
trap
PAGE REPLACEMENT
PAGE REPLACEMENT ALGORITHMS
• Frame-allocation algorithm determines
• How many frames to give each process
• Which frames to replace
• Page-replacement algorithm
• Want lowest page-fault rate on both first access and re-access
• Evaluate algorithm by running it on a particular string of memory references
(reference string) and computing the number of page faults on that string
• String is just page numbers, not full addresses
• Repeated access to the same page does not cause a page fault
• Results depend on number of frames available
• In all our examples, the reference string of referenced page numbers is
7,0,1,2,0,3,0,4,2,3,0,3,0,3,2,1,2,0,1,7,0,1
FIRST-IN-FIRST-OUT (FIFO) ALGORITHM
• Reference string: 7,0,1,2,0,3,0,4,2,3,0,3,0,3,2,1,2,0,1,7,0,1
• 3 frames (3 pages can be in memory at a time per process)
15 page faults
• Can vary by reference string: consider 1,2,3,4,1,2,5,1,2,3,4,5
• Adding more frames can cause more page faults!
• Belady’s Anomaly
• How to track ages of pages?
• Just use a FIFO queue
FIRST-IN-FIRST-OUT (FIFO) ALGORITHM
Belady’s Anomaly
OPTIMAL ALGORITHM
• Replace page that will not be used for longest period of time
• 9 is optimal for the example
• How do you know this?
• Can’t read the future
• Used for measuring how well your algorithm performs
LEAST RECENTLY USED (LRU) ALGORITHM
• Use past knowledge rather than future
• Replace page that has not been used in the most amount of time
• Associate time of last use with each page
• 12 faults – better than FIFO but worse than OPT
• Generally good algorithm and frequently used
• But how to implement?
LEAST RECENTLY USED (LRU) APPROXIMATION
• Second-chance algorithm
• Generally FIFO, plus hardware-provided reference bit
• Clock replacement
• If page to be replaced has
• Reference bit = 0 -> replace it
• reference bit = 1 then:
• set reference bit 0, leave page in memory
• replace next page, subject to same rules
SECOND-CHANCE ALGORITHM
ALLOCATION OF FRAMES
• Each process needs minimum number of frames
• Example: IBM 370 – 6 pages to handle SS MOVE instruction:
• instruction is 6 bytes, might span 2 pages
• 2 pages to handle from
• 2 pages to handle to
• Maximum of course is total frames in the system
• Two major allocation schemes
• fixed allocation
• priority allocation
• Many variations
FIXED ALLOCATION
• Equal allocation – For example, if there are 100 frames (after
allocating frames for the OS) and 5 processes, give each process 20
frames
• Keep some as free frame buffer pool
• Proportional allocation – Allocate according to the size of process
• Dynamic as degree of multiprogramming, process sizes change
si = size of process pi m = 64
s1 = 10
S = si
s2 = 127
m = total number of frames 10
a1 = ´ 62 » 4
si 137
ai = allocation for pi = m 127
S a2 = ´ 62 » 57
137
GLOBAL AND LOCAL ALLOCATION
• Global replacement – process selects a replacement frame from the
set of all frames; one process can take a frame from another
• But then process execution time can vary greatly
• But greater throughput so more common
• Local replacement – each process selects from only its own set of
allocated frames
• More consistent per-process performance
• But possibly underutilized memory
THRASHING
• If a process does not have “enough” pages, the page-fault rate is
very high
• Page fault to get page
• Replace existing frame
• But quickly need replaced frame back
• This leads to:
• Low CPU utilization
• Operating system thinking that it needs to increase the
degree of multiprogramming
• Another process added to the system
THRASHING
• Thrashing. A process is busy swapping pages in and out
DEMAND PAGING AND THRASHING
• Why does demand paging work?
Locality model
• Process migrates from one locality to another
• Localities may overlap
• Why does thrashing occur?
size of locality > total memory size
• Limit effects by using local or priority page replacement
Main Reference
Chapter 10: Virtual Memory
(Operating System Concepts by Silberschatz, Abraham, et al. 10th ed., ISBN:
978-1-119-32091-3, 2018)
Additional References
Chapter 3.3
Chapter 3.4
Chapter 3.5
(Modern Operating Systems by Andrew S. Tanenbaum and Herbert Bos. 4th ed.,
ISBN-10: 0-13-359162-X, ISBN-13: 978-0-13-359162-0, 2015)
This Presentation is mainly dependent on the textbook: Operating System Concepts by Silberschatz, Abraham, et al. 10th ed.)
Thank You
ر
الجامعة السعودية االلكتونية
ر
االلكتونية الجامعة السعودية
26/12/2021
College of Computing and Informatics
Data Science Pre-Master Program
CS351
OPERATING SYSTEMS
Week 10
Chapter 11: Mass-Storage Systems
CONTENTS
➢ Overview of Mass Storage Structure
➢ HDD Scheduling
Weekly LEARNING OUTCOMES
➢ Describe the physical structure of secondary storage devices and the effect of a device’s
structure on its uses
➢ Explain the performance characteristics of mass-storage devices
➢ Evaluate I/O scheduling algorithms
➢ Discuss operating-system services provided for mass storage, including RAID
OVERVIEW OF MASS STORAGE STRUCTURE
• Bulk of secondary storage for modern computers is hard disk drives
(HDDs) and nonvolatile memory (NVM) devices
• HDDs spin platters of magnetically-coated material under moving read-
write heads
• Drives rotate at 60 to 250 times per second
• Transfer rate is rate at which data flow between drive and computer
• Positioning time (random-access time) is time to move disk arm to
desired cylinder (seek time) and time for desired sector to rotate
under the disk head (rotational latency)
• Head crash results from disk head making contact with the disk
surface -- That’s bad
• Disks can be removable
HARD DISK DRIVES
• Platters range from .85” to 14” (historically)
• Commonly 3.5”, 2.5”, and 1.8”
• Range from 30GB to 3TB per drive
• Performance
• Transfer Rate – theoretical – 6 Gb/sec
• Effective Transfer Rate – real – 1Gb/sec
• Seek time from 3ms to 12ms – 9ms
common for desktop drives
• Average seek time measured or
calculated based on 1/3 of tracks
• Latency based on spindle speed
• 1 / (RPM / 60) = 60 / RPM
• Average latency = ½ latency
HARD DISK PERFORMANCE
• Access Latency = Average access time = average seek time + average
latency
• For fastest disk 3ms + 2ms = 5ms
• For slow disk 9ms + 5.56ms = 14.56ms
• Average I/O time = average access time + (amount to transfer / transfer
rate) + controller overhead
• For example to transfer a 4KB block on a 7200 RPM disk with a 5ms
average seek time, 1Gb/sec transfer rate with a .1ms controller overhead
=
• 5ms + 4.17ms + 0.1ms + transfer time =
• Transfer time = 4KB / 1Gb/s * 8Gb / GB * 1GB / 10242KB = 32 / (10242) = 0.031 ms
• Average I/O time for 4KB block = 9.27ms + .031ms = 9.301ms
THE FIRST COMMERCIAL DISK DRIVE
1956
IBM RAMDAC computer included the IBM
Model 350 disk storage system
5M (7 bit) characters
50 x 24” platters
Access time = < 1 second
NONVOLATILE MEMORY DEVICES
• If disk-drive like, then called solid-state disks (SSDs)
• Other forms include USB drives (thumb drive, flash drive), DRAM disk
replacements, surface-mounted on motherboards, and main storage in devices like
smartphones
• Can be more reliable than HDDs
• More expensive per MB
• Maybe have shorter life span – need careful management
• Less capacity
• But much faster
• Busses can be too slow -> connect directly to PCI for example
• No moving parts, so no seek time or rotational latency
NONVOLATILE MEMORY DEVICES
• Have characteristics that present
challenges
• Read and written in “page”
increments (think sector) but can’t
overwrite in place
• Must first be erased, and erases
happen in larger ”block” increments
• Can only be erased a limited number
of times before worn out – ~ 100,000
• Life span measured in drive writes per
day (DWPD)
• A 1TB NAND drive with rating of 5DWPD
is expected to have 5TB per day written
within warrantee period without failing
NAND FLASH CONTROLLER ALGORITHMS
• With no overwrite, pages end up with mix of valid and invalid data
• To track which logical blocks are valid, controller maintains flash translation
layer (FTL) table
• Also implements garbage collection to free invalid page space
• Allocates overprovisioning to provide working space for GC
• Each cell has lifespan, so wear leveling needed to write equally to all cells
• NAND block with valid and invalid pages
VOLATILE MEMORY
• DRAM frequently used as mass-storage device
• Not technically secondary storage because volatile, but can have file systems, be used
like very fast secondary storage
• RAM drives (with many names, including RAM disks) present as raw block
devices, commonly file system formatted
• Computers have buffering, caching via RAM, so why RAM drives?
• Caches / buffers allocated / managed by programmer, operating system, hardware
• RAM drives under user control
• Found in all major operating systems
• Linux /dev/ram, macOS diskutil to create them, Linux /tmp of file system type tmpfs
• Used as high speed temporary storage
• Programs could share bulk date, quickly, by reading/writing to RAM drive
MAGNETIC TAPE
DISK ATTACHMENT
• Host-attached storage accessed through I/O ports talking to I/O busses
• Several busses available, including advanced technology attachment (ATA), serial
ATA (SATA), eSATA, serial attached SCSI (SAS), universal serial bus (USB), and fibre
channel (FC).
• Most common is SATA
• Because NVM much faster than HDD, new fast interface for NVM called NVM
express (NVMe), connecting directly to PCI bus
• Data transfers on a bus carried out by special electronic processors called
controllers (or host-bus adapters, HBAs)
• Host controller on the computer end of the bus, device controller on device end
• Computer places command on host controller, using memory-mapped I/O ports
• Host controller sends messages to device controller
• Data transferred via DMA between device and computer DRAM
ADDRESS MAPPING
• Disk drives are addressed as large 1-dimensional arrays of
logical blocks, where the logical block is the smallest unit of
transfer
• Low-level formatting creates logical blocks on physical media
• The 1-dimensional array of logical blocks is mapped into the
sectors of the disk sequentially
• Sector 0 is the first sector of the first track on the outermost cylinder
• Mapping proceeds in order through that track, then the rest of the
tracks in that cylinder, and then through the rest of the cylinders from
outermost to innermost
• Logical to physical address should be easy
• Except for bad sectors
• Non-constant # of sectors per track via constant angular velocity
HDD SCHEDULING
• The operating system is responsible for using hardware
efficiently — for the disk drives, this means having a fast
access time and disk bandwidth
• Minimize seek time
• Seek time seek distance
• Disk bandwidth is the total number of bytes transferred,
divided by the total time between the first request for service
and the completion of the last transfer
DISK SCHEDULING (CONT.)
• There are many sources of disk I/O request
• OS
• System processes
• Users processes
• I/O request includes input or output mode, disk address, memory address, number of
sectors to transfer
• OS maintains queue of requests, per disk or device
• Idle disk can immediately work on I/O request, busy disk means work must queue
• Optimization algorithms only make sense when a queue exists
• In the past, operating system responsible for queue management, disk drive head
scheduling
• Now, built into the storage devices, controllers
• Just provide LBAs, handle sorting of requests
• Some of the algorithms they use described next
DISK SCHEDULING (CONT.)
Note that drive controllers have small buffers and can
manage a queue of I/O requests (of varying “depth”)
Several algorithms exist to schedule the servicing of disk I/O
requests
The analysis is true for one or many platters
We illustrate scheduling algorithms with a request queue (0-
199)
98, 183, 37, 122, 14, 124, 65, 67
Head pointer 53
FCFS
Illustration shows total head movement of 640 cylinders
SCAN
• The disk arm starts at one end of the disk, and moves toward the
other end, servicing requests until it gets to the other end of the
disk, where the head movement is reversed and servicing
continues.
• SCAN algorithm Sometimes called the elevator algorithm
• Illustration shows total head movement of 208 cylinders
• But note that if requests are uniformly dense, largest density at
other end of disk and those wait the longest
SCAN (CONT.)
C-SCAN
• Provides a more uniform wait time than SCAN
• The head moves from one end of the disk to the other, servicing
requests as it goes
• When it reaches the other end, however, it immediately returns to the
beginning of the disk, without servicing any requests on the return trip
• Treats the cylinders as a circular list that wraps around from the last
cylinder to the first one
• Total number of cylinders?
C-SCAN (CONT.)
SELECTING A DISK-SCHEDULING ALGORITHM
• SSTF is common and has a natural appeal
• SCAN and C-SCAN perform better for systems that place a heavy load on the disk
• Less starvation, but still possible
• To avoid starvation Linux implements deadline scheduler
• Maintains separate read and write queues, gives read priority
• Because processes more likely to block on read than write
• Implements four queues: 2 x read and 2 x write
• 1 read and 1 write queue sorted in LBA order, essentially implementing C-SCAN
• 1 read and 1 write queue sorted in FCFS order
• All I/O requests sent in batch sorted in that queue’s order
• After each batch, checks if any requests in FCFS older than configured age (default
500ms)
• If so, LBA queue containing that request is selected for next batch of I/O
• In RHEL 7 also NOOP and completely fair queueing scheduler (CFQ) also
available, defaults vary by storage device
Required Reading
1. Chapter 8: Deadlocks (Operating System Concepts by Silberschatz,
Abraham, et al. 10th ed., ISBN: 978-1-119-32091-3, 2018)
Recommended Reading
1. Chapter 6 (Modern Operating Systems by Andrew S. Tanenbaum and
Herbert Bos. 4th ed., ISBN-10: 0-13-359162-X, ISBN-13: 978-0-13-
359162-0, 2015)
This Presentation is mainly dependent on the textbook: Operating System Concepts by Silberschatz, Abraham, et al. 10th ed
Thank You
الجامعة السعودية االلكترونية
الجامعة السعودية االلكترونية
26/12/2021
College of Computing and Informatics
Data Science Pre-Master Program
CS351
OPERATING SYSTEMS
Week 10
Chapter 11: Mass-Storage Systems
CONTENTS
Storage Device Management
Swap-Space Management
Storage Attachment
RAID Structure
Weekly LEARNING OUTCOMES
Describe the physical structure of secondary storage devices and the effect of a device’s
structure on its uses
Explain the performance characteristics of mass-storage devices
Evaluate I/O scheduling algorithms
Discuss operating-system services provided for mass storage, including RAID
STORAGE DEVICE MANAGEMENT
• Low-level formatting, or physical formatting — Dividing a disk into
sectors that the disk controller can read and write
• Each sector can hold header information, plus data, plus error correction
code (ECC)
• Usually 512 bytes of data but can be selectable
• To use a disk to hold files, the operating system still needs to record
its own data structures on the disk
• Partition the disk into one or more groups of cylinders, each treated as a
logical disk
• Logical formatting or “making a file system”
• To increase efficiency most file systems group blocks into clusters
• Disk I/O done in blocks
• File I/O done in clusters
STORAGE DEVICE MANAGEMENT (CONT.)
Root partition contains the OS, other partitions can hold other Oses,
other file systems, or be raw
Mounted at boot time
Other partitions can mount automatically or manually
At mount time, file system consistency checked
Is all metadata correct?
If not, fix it, try again
If yes, add to mount table, allow access
Boot block can point to boot volume or boot loader set of blocks that
contain enough code to know how to load the kernel from the file
system
Or a boot management program for multi-os booting
DEVICE STORAGE MANAGEMENT (CONT.)
• Raw disk access for apps that
want to do their own block
management, keep OS out of
the way (databases for example)
• Boot block initializes system
• The bootstrap is stored in ROM,
firmware
• Bootstrap loader program stored Booting from secondary storage
in boot blocks of boot partition
in Windows
• Methods such as sector sparing
used to handle bad blocks
STORAGE ATTACHMENT
• Computers access storage in three ways
• host-attached
• network-attached
• cloud
• Host attached access through local I/O ports, using one of
several technologies
• To attach many devices, use storage busses such as USB, firewire,
thunderbolt
• High-end systems use fibre channel (FC)
• High-speed serial architecture using fibre or copper cables
• Multiple hosts and storage devices can connect to the FC fabric
NETWORK-ATTACHED STORAGE
• Network-attached storage (NAS) is storage made available over a network rather
than over a local connection (such as a bus)
• Remotely attaching to file systems
• NFS and CIFS are common protocols
• Implemented via remote procedure calls (RPCs) between host and storage over
typically TCP or UDP on IP network
• iSCSI protocol uses IP network to carry the SCSI protocol
• Remotely attaching to devices (blocks)
CLOUD STORAGE
• Similar to NAS, provides access to storage across a network
• Unlike NAS, accessed over the Internet or a WAN to remote data
center
• NAS presented as just another file system, while cloud
storage is API based, with programs using the APIs to
provide access
• Examples include Dropbox, Amazon S3, Microsoft OneDrive,
Apple iCloud
• Use APIs because of latency and failure scenarios (NAS protocols
wouldn’t work well)
STORAGE ARRAY
• Can just attach disks, or arrays of disks
• Avoids the NAS drawback of using network bandwidth
• Storage Array has controller(s), provides features to attached host(s)
• Ports to connect hosts to array
• Memory, controlling software (sometimes NVRAM, etc)
• A few to thousands of disks
• RAID, hot spares, hot swap (discussed later)
• Shared storage -> more efficiency
• Features found in some file systems
• Snaphots, clones, thin provisioning, replication, deduplication, etc
STORAGE AREA NETWORK
• Common in large storage environments
• Multiple hosts attached to multiple storage arrays – flexible
STORAGE AREA NETWORK (CONT.)
• SAN is one or more storage arrays
• Connected to one or more Fibre Channel switches or
InfiniBand (IB) network
• Hosts also attach to the switches
• Storage made available via LUN Masking from specific
arrays to specific servers
• Easy to add or remove storage, add new host and
allocate it storage A Storage Array
• Why have separate storage networks and
communications networks?
• Consider iSCSI, FCOE
RAID STRUCTURE
• RAID – redundant array of inexpensive disks
• multiple disk drives provides reliability via redundancy
• Increases the mean time to failure
• Mean time to repair – exposure time when another failure could cause data
loss
• Mean time to data loss based on above factors
• If mirrored disks fail independently, consider disk with 1300,000 mean time
to failure and 10 hour mean time to repair
• Mean time to data loss is 100, 0002 / (2 ∗ 10) = 500 ∗ 106 hours, or
57,000 years!
• Frequently combined with NVRAM to improve write performance
• Several improvements in disk-use techniques involve the use of multiple
disks working cooperatively
RAID (CONT.)
• Disk striping uses a group of disks as one storage unit
• RAID is arranged into six different levels
• RAID schemes improve performance and improve the reliability of the
storage system by storing redundant data
• Mirroring or shadowing (RAID 1) keeps duplicate of each disk
• Striped mirrors (RAID 1+0) or mirrored stripes (RAID 0+1) provides high
performance and high reliability
• Block interleaved parity (RAID 4, 5, 6) uses much less redundancy
• RAID within a storage array can still fail if the array fails, so automatic
replication of the data between arrays is common
• Frequently, a small number of hot-spare disks are left unallocated,
automatically replacing a failed disk and having data rebuilt onto them
RAID LEVELS
RAID (0 + 1) AND (1 + 0)
OTHER FEATURES
• Regardless of where RAID implemented, other useful features can be
added
• Snapshot is a view of file system before a set of changes take place
(i.e. at a point in time)
• More in Ch 12
• Replication is automatic duplication of writes between separate sites
• For redundancy and disaster recovery
• Can be synchronous or asynchronous
• Hot spare disk is unused, automatically used by RAID production if a
disk fails to replace the failed disk and rebuild the RAID set if possible
• Decreases mean time to repair
EXTENSIONS
• RAID alone does not prevent or detect data
corruption or other errors, just disk failures
• Solaris ZFS adds checksums of all data and
metadata
• Checksums kept with pointer to object, to
detect if object is the right one and whether
it changed
• Can detect and correct data and metadata
corruption
ZFS checksums all metadata
• ZFS also removes volumes, partitions and data
• Disks allocated in pools
• Filesystems with a pool share that pool, use and
release space like malloc() and free()
memory allocate / release calls
TRADITIONAL AND POOLED STORAGE
OBJECT STORAGE
• General-purpose computing, file systems not sufficient for very large
scale
• Another approach – start with a storage pool and place objects in it
• Object just a container of data
• No way to navigate the pool to find objects (no directory
structures, few services
• Computer-oriented, not user-oriented
• Typical sequence
• Create an object within the pool, receive an object ID
• Access object via that ID
• Delete object via that ID
OBJECT STORAGE (CONT.)
• Object storage management software like Hadoop file system
(HDFS) and Ceph determine where to store objects, manages
protection
• Typically by storing N copies, across N systems, in the
object storage cluster
• Horizontally scalable
• Content addressable, unstructured
Required Reading
1. Chapter 8: Deadlocks (Operating System Concepts by
Silberschatz, Abraham, et al. 10th ed., ISBN: 978-1-119-
32091-3, 2018)
Recommended Reading
1. Chapter 6 (Modern Operating Systems by Andrew S. Tanenbaum
and Herbert Bos. 4th ed., ISBN-10: 0-13-359162-X, ISBN-13:
978-0-13-359162-0, 2015)
This Presentation is mainly dependent on the textbook: Operating System Concepts by Silberschatz, Abraham, et al. 10th ed
Thank You
الجامعة السعودية االلكترونية
الجامعة السعودية االلكترونية
26/12/2021
College of Computing and Informatics
Data Science Pre-Master Program
CS351
OPERATING SYSTEMS
IT241
Operating Systems
Week 11
Chapter 12: I/O Systems
CONTENTS
I/O Hardware
Application I/O Interface
Kernel I/O Subsystem
Transforming I/O Requests to Hardware Operations
WEEKLY LEARNING OUTCOMES
Explore the structure of an operating system’s I/O
subsystem
Discuss the principles and complexities of I/O hardware
Explain the performance aspects of I/O hardware and
software
INTRODUCTION
• I/O management is a major component of operating system design
and operation
• Important aspect of computer operation
• I/O devices vary greatly
• Various methods to control them
• Performance management
• New types of devices frequent
• Ports, busses, device controllers connect to various devices
• Device drivers encapsulate device details
• Present uniform device-access interface to I/O subsystem
I/O HARDWARE
• Incredible variety of I/O devices
• Storage
• Transmission
• Human-interface
• Common concepts – signals from I/O devices interface with
computer
• Port – connection point for device
• Bus - daisy chain or shared direct access
• PCI bus common in PCs and servers, PCI Express (PCIe)
• expansion bus connects relatively slow devices
• Serial-attached SCSI (SAS) common disk interface
I/O HARDWARE
• Controller (host adapter) – electronics that operate port, bus,
device
• Sometimes integrated
• Sometimes separate circuit board (host adapter)
• Contains processor, microcode, private memory, bus
controller, etc.
• Some talk to per-device controller with bus controller,
microcode, memory, etc.
A TYPICAL BUS STRUCTURE
I/O HARDWARE
• Fibre channel (FC) is complex controller, usually separate circuit
board (host-bus adapter, HBA) plugging into bus
• I/O instructions control devices
• Devices usually have registers where device driver places
commands, addresses, and data to write, or read data from registers
after command execution
• Data-in register, data-out register, status register, control
register
• Typically 1-4 bytes, or FIFO buffer
I/O HARDWARE
• The data-in register is read by the host to get input.
• The data-out register is written by the host to send output.
• The status register contains bits that can be read by the host. These
bits indicate states, such as whether the current command has
completed, whether a byte is available to be read from the data-in
register, and whether a device error has occurred
• The control register can be written by the host to start a command or
to change the mode of a device.
I/O HARDWARE
• Devices have addresses, used by
• Direct I/O instructions
• Memory-mapped I/O
• Device data and command registers mapped to processor
address space
• Especially for large address spaces (graphics)
I/O HARDWARE
Device I/O Port Locations on PCs (partial)
POLLING
• For each byte of I/O
1. Read busy bit from status register until 0
2. Host sets read or write bit and if write copies data into data-out register
3. Host sets command-ready bit
4. Controller sets busy bit, executes transfer
5. Controller clears busy bit, error bit, command-ready bit when transfer
done
• Step 1 is busy-wait cycle to wait for I/O from device
• Reasonable if device is fast
• But inefficient if device slow
• CPU switches to other tasks?
• But if miss a cycle data overwritten / lost
POLLING
• 1. The host repeatedly reads the busy bit until that bit becomes clear.
2. The host sets the write bit in the command register and writes a byte
into the data-out register.
3. The host sets the command-ready bit.
4. When the controller notices that the command-ready bit is set, it sets
the busy bit.
5. The controller reads the command register and sees the write
command. It reads the data-out register to get the byte and does the I/O
to the device.
6. The controller clears the command-ready bit, clears the error bit in the
status register to indicate that the device I/O succeeded, and clears the
busy bit to indicate that it is finished
INTERRUPTS
• Polling can happen in 3 instruction cycles
• Read status, logical-and to extract status bit, branch if not zero
• How to be more efficient if non-zero infrequently?
• CPU Interrupt-request line triggered by I/O device
• Checked by processor after each instruction
• Interrupt handler receives interrupts
• Maskable to ignore or delay some interrupts
• Interrupt vector to dispatch interrupt to correct handler
• Context switch at start and end
• Based on priority
• Some nonmaskable
• Interrupt chaining if more than one device at same interrupt number
INTERRUPT DRIVEN I/O CYCLE
INTERRUPTS
• Interrupt mechanism also used for exceptions
• Terminate process, crash system due to hardware error
• Page fault executes when memory access error
• System call executes via trap to trigger kernel to execute request
• Multi-CPU systems can process interrupts concurrently
• If operating system designed to handle it
• Used for time-sensitive processing, frequent, must be fast
INTERRUPTS HANDLING FEATURES
• 1. We need the ability to defer interrupt handling during critical
processing.
2. We need an efficient way to dispatch to the proper interrupt
handler for a device without first polling all the devices to see which
one raised the interrupt.
3. We need multilevel interrupts, so that the operating system can
distinguish between high- and low-priority interrupts and can
respond with the appropriate degree of urgency when there are
multiple concurrent interrupts.
4. We need a way for an instruction to get the operating system’s
attention directly (separately from I/O requests), for activities such
as page faults and errors such as division by zero. As we shall see,
this task is accomplished by “traps.”
INTERRUPTS
Intel Pentium Processor Event-Vector Table
DIRECT MEMORY ACCESS
• Used to avoid programmed I/O (one byte at a time) for large data movement
• Requires DMA controller
• Bypasses CPU to transfer data directly between I/O device and memory
• OS writes DMA command block into memory
• Source and destination addresses
• Read or write mode
• Count of bytes
• Writes location of command block to DMA controller
• Bus mastering of DMA controller – grabs bus from CPU
• Cycle stealing from CPU but still much more efficient
• When done, interrupts to signal completion
• Version that is aware of virtual addresses can be even more efficient - DVMA
DIRECT MEMORY ACCESS STEPS
APPLICATION I/O INTERFACE
• I/O system calls encapsulate device behaviors in generic classes
• Device-driver layer hides differences among I/O controllers from kernel
• New devices talking already-implemented protocols need no extra work
• Each OS has its own I/O subsystem structures and device driver frameworks
• Devices vary in many dimensions
• Character-stream or block
• Sequential or random-access
• Synchronous or asynchronous (or both)
• Sharable or dedicated
• Speed of operation
• read-write, read only, or write only
APPLICATION I/O INTERFACE
• Character-stream or block. A character-stream device transfers bytes one by
one, whereas a block device transfers a block of bytes as a unit.
• Sequential or random access. A sequential device transfers data in a fixed order
determined by the device, whereas the user of a random-access device can instruct
the device to seek to any of the available data storage locations.
• Synchronous or asynchronous. A synchronous device performs data transfers
with predictable response times, in coordination with other aspects of the system.
An asynchronous device exhibits irregular or unpredictable response times not
coordinated with other computer events.
APPLICATION I/O INTERFACE
• Sharable or dedicated. A sharable device can be used concurrently by several
processes or threads; a dedicated device cannot.
• Speed of operation. Device speeds range from a few bytes per second to
gigabytes per second.
• Read–write, read only, write once. Some devices perform both input and
output, but others support only one data transfer direction. Some allow data to be
modified after write, but others can be written only once and are read-only
thereafter
KERNEL I/O STRUCTURE
CHARACTERISTICS OF I/O DEVICES
CHARACTERISTICS OF I/O DEVICES
• Subtleties of devices handled by device drivers
• Broadly I/O devices can be grouped by the OS into
• Block I/O
• Character I/O (Stream)
• Memory-mapped file access
• Network sockets
• For direct manipulation of I/O device specific characteristics, usually an escape / back door
• Unix ioctl() call to send arbitrary bits to a device control register and data to device
data register
• UNIX and Linux use tuple of “major” and “minor” device numbers to identify type and instance
of devices (here major 8 and minors 0-4)
% ls –l /dev/sda*
BLOCK AND CHARACTER DEVICES
• Block devices include disk drives
• Commands include read, write, seek
• Raw I/O, direct I/O, or file-system access
• Memory-mapped file access possible
• File mapped to virtual memory and clusters brought via
demand paging
• DMA
• Character devices include keyboards, mice, serial ports
• Commands include get(), put()
• Libraries layered on top allow line editing
NETWORK DEVICES
• Varying enough from block and character to have own interface
• Linux, Unix, Windows and many others include socket interface
• Separates network protocol from network operation
• Includes select() functionality
• Approaches vary widely (pipes, FIFOs, streams, queues, mailboxes)
CLOCKS AND TIMERS
• Provide current time, elapsed time, timer
• Normal resolution about 1/60 second
• Some systems provide higher-resolution timers
• Programmable interval timer used for timings, periodic
interrupts
• ioctl() (on UNIX) covers odd aspects of I/O such as clocks and timers
NON-BLOCKING AND ASYNCHRONOUS I/O
• Blocking - process suspended until I/O completed
• Easy to use and understand
• Insufficient for some needs
• Nonblocking - I/O call returns as much as available
• User interface, data copy (buffered I/O)
• Implemented via multi-threading
• Returns quickly with count of bytes read or written
• select() to find if data ready then read() or write() to transfer
• Asynchronous - process runs while I/O executes
• Difficult to use
• I/O subsystem signals process when I/O completed
I/O METHODS
Synchronous Asynchronous
VECTORED I/O
• Vectored I/O allows one system call to perform multiple I/O
operations
• For example, Unix readve() accepts a vector of multiple buffers to
read into or write from
• This scatter-gather method better than multiple individual I/O calls
• Decreases context switching and system call overhead
• Some versions provide atomicity
• Avoid for example worry about multiple threads changing
data as reads / writes occurring
KERNEL I/O SUBSYSTEM
• Scheduling
• Some I/O request ordering via per-device queue
• Some OSs try fairness
• Some implement Quality Of Service (i.e. IPQOS)
• Buffering - store data in memory while transferring between devices
• To cope with device speed mismatch
• To cope with device transfer size mismatch
• To maintain “copy semantics”
• Double buffering – two copies of the data
• Kernel and user
• Varying sizes
• Full / being processed and not-full / being used
• Copy-on-write can be used for efficiency in some cases
DEVICE STATUS TABLE
I/O DEVICE AND INTERFACE SPEEDS
KERNEL I/O SUBSYSTEM
• Caching - faster device holding copy of data
• Always just a copy
• Key to performance
• Sometimes combined with buffering
• Spooling - hold output for a device
• If device can serve only one request at a time
• i.e., Printing
• Device reservation - provides exclusive access to a device
• System calls for allocation and de-allocation
• Watch out for deadlock
ERROR HANDLING
• OS can recover from disk read, device unavailable, transient write
failures
• Retry a read or write, for example
• Some systems more advanced – Solaris FMA, AIX
• Track error frequencies, stop using device with increasing
frequency of retry-able errors
• Most return an error number or code when I/O request fails
• System error logs hold problem reports
I/O PROTECTION
• User process may accidentally or purposefully attempt to disrupt
normal operation via illegal I/O instructions
• All I/O instructions defined to be privileged
• I/O must be performed via system calls
• Memory-mapped and I/O port memory locations must be
protected too
SYSTEM CALL
POWER MANAGEMENT
• Not strictly domain of I/O, but much is I/O related
• Computers and devices use electricity, generate heat, frequently
require cooling
• OSes can help manage and improve use
• Cloud computing environments move virtual machines between
servers
• Can end up evacuating whole systems and shutting them
down
• Mobile computing has power management as first class OS aspect
TRANSFORMING I/O REQUESTS TO HARDWARE
OPERATIONS
• Consider reading a file from disk for a process:
• Determine device holding file
• Translate name to device representation
• Physically read data from disk into buffer
• Make data available to requesting process
• Return control to process
LIFE CYCLE OF AN I/O REQUEST
Main Reference
Chapter 12: I/O Systems
(Operating System Concepts by Silberschatz, Abraham, et al. 10th ed., ISBN:
978-1-119-32091-3, 2018)
Additional References
Chapter 5.1
Chapter 5.2
Chapter 5.3
(Modern Operating Systems by Andrew S. Tanenbaum and Herbert Bos. 4th ed.,
ISBN-10: 0-13-359162-X, ISBN-13: 978-0-13-359162-0, 2015)
This Presentation is mainly dependent on the textbook: Operating System Concepts by Silberschatz, Abraham, et al. 10th ed.)
Thank You
الجامعة السعودية االلكترونية
الجامعة السعودية االلكترونية
26/12/2021
College of Computing and Informatics
Data Science Pre-Master Program
CS351
OPERATING SYSTEMS
Week 12
Chapter 13: File System
Chapter 14: File-System Implementation
CONTENTS
File Concept
Access Methods
Disk and Directory Structure
Protection
File-System Structure
File-System Operations
Directory Implementation
Allocation Methods
Free-Space Management
Weekly LEARNING OUTCOMES
Explain the function of file systems
Describe the interfaces to file systems
Evaluate file-system design tradeoffs, including access methods, file sharing, file
locking, and directory structures
Define file-system protection methods
Describe the details of implementing local file systems and directory structures
Evaluate block allocation and free-block algorithms and trade-offs
Explore file system efficiency and performance issues
FILE CONCEPT
• Contiguous logical address space
• Types:
• Data
• Numeric
• Character
• Binary
• Program
• Contents defined by file’s creator
• Many types
• text file,
• source file,
• executable file
FILE ATTRIBUTES
• Name – only information kept in human-readable form
• Identifier – unique tag (number) identifies file within file system
• Type – needed for systems that support different types
• Location – pointer to file location on device
• Size – current file size
• Protection – controls who can do reading, writing, executing
• Time, date, and user identification – data for protection, security, and usage
monitoring
• Information about files are kept in the directory structure, which is maintained
on the disk
• Many variations, including extended file attributes such as file checksum
• Information kept in the directory structure
FILE INFO WINDOW ON MAC OS X
FILE OPERATIONS
• Create
• Write – at write pointer location
• Read – at read pointer location
• Reposition within file - seek
• Delete
• Truncate
• Open (Fi) – search the directory structure on disk for entry Fi, and move
the content of entry to memory
• Close (Fi) – move the content of entry Fi in memory to directory
structure on disk
FILE LOCKING
• Provided by some operating systems and file systems
• Similar to reader-writer locks
• Shared lock similar to reader lock – several processes can acquire
concurrently
• Exclusive lock similar to writer lock
• Mediates access to a file
• Mandatory or advisory:
• Mandatory – access is denied depending on locks held and requested
• Advisory – processes can find status of locks and decide what to do
FILE TYPES – NAME, EXTENSION
ACCESS METHODS
A file is fixed length logical records
Sequential Access - Operations
• read next
• write next
• Reset
• no read after last write (rewrite)
Direct Access - Operations
• read n
• write n
• position to n
• read next
• write next
• rewrite n note: n = relative block number
• Relative block numbers allow OS to decide where file should be placed
SIMULATION OF SEQUENTIAL ACCESS ON
DIRECT-ACCESS FILE
OTHER ACCESS METHODS
• Can be other access methods built on top of base methods
• General involve creation of an index for the file
• Keep index in memory for fast determination of location of data to be operated on
(consider Universal Produce Code (UPC code) plus record of data about that item)
• If the index is too large, create an in-memory index, which an index of a disk index
• IBM indexed sequential-access method (ISAM)
• Small master index, points to disk blocks of secondary index
• File kept sorted on a defined key
• All done by the OS
• VMS operating system provides index and relative files as another example (see
next slide)
EXAMPLE OF INDEX AND RELATIVE FILES
DISK STRUCTURE
• Disk can be subdivided into partitions
• Disks or partitions can be RAID protected against failure
• Disk or partition can be used raw – without a file system, or formatted
with a file system
• Partitions also known as minidisks, slices
• Entity containing file system is known as a volume
• Each volume containing a file system also tracks that file system’s info in
device directory or volume table of contents
• In addition to general-purpose file systems there are many special-
purpose file systems, frequently all within the same operating system
or computer
A TYPICAL FILE-SYSTEM ORGANIZATION
TYPES OF FILE SYSTEMS
• We mostly talk of general-purpose file systems
• But systems frequently have may file systems, some general- and some
special- purpose
• Consider Solaris has
• tmpfs – memory-based volatile FS for fast, temporary I/O
• objfs – interface into kernel memory to get kernel symbols for debugging
• ctfs – contract file system for managing daemons
• lofs – loopback file system allows one FS to be accessed in place of another
• procfs – kernel interface to process structures
• ufs, zfs – general purpose file systems
DIRECTORY STRUCTURE
• A collection of nodes containing information about all files
• Both the directory structure and the files reside on disk
OPERATIONS PERFORMED ON DIRECTORY
• Search for a file
• Create a file
• Delete a file
• List a directory
• Rename a file
• Traverse the file system
DIRECTORY ORGANIZATION
The directory is organized logically to obtain
• Efficiency – locating a file quickly
• Naming – convenient to users
• Two users can have same name for different files
• The same file can have several different names
• Grouping – logical grouping of files by properties, (e.g., all
Java programs, all games, …)
SINGLE-LEVEL DIRECTORY
• A single directory for all users
• Naming problem
• Grouping problem
TWO-LEVEL DIRECTORY
• Separate directory for each user
• Path name
• Can have the same file name for different user
• Efficient searching
• No grouping capability
TREE-STRUCTURED DIRECTORIES
ACYCLIC-GRAPH DIRECTORIES
• Have shared subdirectories and files
• Example
ACYCLIC-GRAPH DIRECTORIES (CONT.)
• Two different names (aliasing)
• If dict deletes w/list dangling pointer
Solutions:
• Backpointers, so we can delete all pointers.
• Variable size records a problem
• Backpointers using a daisy chain organization
• Entry-hold-count solution
• New directory entry type
• Link – another name (pointer) to an existing file
• Resolve the link – follow pointer to locate the file
GENERAL GRAPH DIRECTORY
GENERAL GRAPH DIRECTORY (CONT.)
• How do we guarantee no cycles?
• Allow only links to files not subdirectories
• Garbage collection
• Every time a new link is added use a cycle detection
algorithm to determine whether it is OK
CURRENT DIRECTORY
• Can designate one of the directories as the current (working) directory
• cd /spell/mail/prog
• type list
• Creating and deleting a file is done in current directory
• Example of creating a new file
• If in current directory is /mail
• The command
• mkdir <dir-name>
• Results in:
• Deleting “mail” deleting the entire subtree rooted by “mail”
PROTECTION
• File owner/creator should be able to control:
• What can be done
• By whom
• Types of access
• Read
• Write
• Execute
• Append
• Delete
• List
ACCESS LISTS AND GROUPS IN UNIX
• Mode of access: read, write, execute
• Three classes of users on Unix / Linux
RWX
a) owner access 7 111
RWX
b) group access 6 110
RWX
c) public access 1 001
• Ask manager to create a group (unique name), say G, and add some users to the
group.
• For a file (say game) or subdirectory, define an appropriate access.
• Attach a group to a file
chgrp G game
A SAMPLE UNIX DIRECTORY LISTING
WINDOWS 7 ACCESS-CONTROL LIST MANAGEMENT
FILE-SYSTEM STRUCTURE
• File structure
• Logical storage unit
• Collection of related information
• File system resides on secondary storage (disks)
• Provided user interface to storage, mapping logical to physical
• Provides efficient and convenient access to disk by allowing data to be stored,
located retrieved easily
• Disk provides in-place rewrite and random access
• I/O transfers performed in blocks of sectors (usually 512 bytes)
• File control block (FCB) – storage structure consisting of information
about a file
• Device driver controls the physical device
• File system organized into layers
LAYERED FILE SYSTEM
FILE SYSTEM LAYERS
• Device drivers manage I/O devices at the I/O control layer
Given commands like
read drive1, cylinder 72, track 2, sector 10, into memory location 1060
Outputs low-level hardware specific commands to hardware controller
• Basic file system given command like “retrieve block 123” translates to device
driver
• Also manages memory buffers and caches (allocation, freeing, replacement)
• Buffers hold data in transit
• Caches hold frequently used data
• File organization module understands files, logical address, and physical blocks
Translates logical block # to physical block #
Manages free space, disk allocation
FILE SYSTEM LAYERS (CONT.)
• Logical file system manages metadata information
• Translates file name into file number, file handle, location by
maintaining file control blocks (inodes in UNIX)
• Directory management
• Protection
• Layering useful for reducing complexity and redundancy, but
adds overhead and can decrease performance
• Logical layers can be implemented by any coding method
according to OS designer
FILE SYSTEM LAYERS (CONT.)
• Many file systems, sometimes many within an operating system
• Each with its own format:
• CD-ROM is ISO 9660;
• Unix has UFS, FFS;
• Windows has FAT, FAT32, NTFS as well as floppy, CD, DVD Blu-ray,
• Linux has more than 130 types, with extended file system ext3 and ext4
leading; plus distributed file systems, etc.)
• New ones still arriving – ZFS, GoogleFS, Oracle ASM, FUSE
FILE-SYSTEM OPERATIONS
• We have system calls at the API level, but how do we implement their
functions?
• On-disk and in-memory structures
• Boot control block contains info needed by system to boot OS from
that volume
• Needed if volume contains OS, usually first block of volume
• Volume control block (superblock, master file table) contains volume
details
• Total # of blocks, # of free blocks, block size, free block pointers or array
• Directory structure organizes the files
• Names and inode numbers, master file table
FILE CONTROL BLOCK (FCB)
• OS maintains FCB per file, which contains many details about the file
• Typically, inode number, permissions, size, dates
• Example
IN-MEMORY FILE SYSTEM STRUCTURES
• Mount table storing file system mounts, mount points, file
system types
• System-wide open-file table contains a copy of the FCB of
each file and other info
• Per-process open-file table contains pointers to appropriate
entries in system-wide open-file table as well as other info
IN-MEMORY FILE SYSTEM STRUCTURES (CONT.)
• Figure 12-3(a) refers to opening a file
• Figure 12-3(b) refers to reading a file
DIRECTORY IMPLEMENTATION
• Linear list of file names with pointer to the data blocks
• Simple to program
• Time-consuming to execute
• Linear search time
• Could keep ordered alphabetically via linked list or use B+ tree
• Hash Table – linear list with hash data structure
• Decreases directory search time
• Collisions – situations where two file names hash to the same
location
• Only good if entries are fixed size, or use chained-overflow
method
ALLOCATION METHOD
• An allocation method refers to how disk blocks are allocated for files:
• Contiguous
• Linked
• File Allocation Table (FAT)
CONTIGUOUS ALLOCATION METHOD
• An allocation method refers to how disk blocks are allocated
for files:
• Each file occupies set of contiguous blocks
• Best performance in most cases
• Simple – only starting location (block #) and length (number of
blocks) are required
• Problems include:
• Finding space on the disk for a file,
• Knowing file size,
• External fragmentation, need for compaction off-line (downtime) or on-line
CONTIGUOUS ALLOCATION (CONT.)
Mapping from logical to physical (block size
=512 bytes)
LA/512
Block to be accessed = starting address + Q
Displacement into block = R
EXTENT-BASED SYSTEMS
• Many newer file systems (i.e., Veritas File System) use a
modified contiguous allocation scheme
• Extent-based file systems allocate disk blocks in extents
• An extent is a contiguous block of disks
• Extents are allocated for file allocation
• A file consists of one or more extents
LINKED ALLOCATION
• Each file is a linked list of blocks
• File ends at nil pointer
• No external fragmentation
• Each block contains pointer to next block
• No compaction, external fragmentation
• Free space management system called when new block needed
• Improve efficiency by clustering blocks into groups but increases
internal fragmentation
• Reliability can be a problem
• Locating a block can take many I/Os and disk seeks
LINKED ALLOCATION EXAMPLE
• Each file is a linked list of disk blocks: blocks may be scattered
anywhere on the disk
• Scheme
LINKED ALLOCATION (CONT.)
• Mapping
Q
LA/511
R
• Block to be accessed is the Qth block in the linked chain of
blocks representing the file.
• Displacement into block = R + 1
FAT ALLOCATION METHOD
• Beginning of volume has table, indexed by block number
• Much like a linked list, but faster on disk and cacheable
• New block allocation simple
INDEXED ALLOCATION METHOD
• Each file has its own index block(s) of pointers to its data blocks
• Logical view
index table
EXAMPLE OF INDEXED ALLOCATION
INDEXED ALLOCATION – SMALL FILES
• Need index table
• Random access
• Dynamic access without external fragmentation, but have overhead of index
block
• Mapping from logical to physical in a file of maximum size of 256K bytes and
block size of 512 bytes. We need only 1 block for index table
Q
LA/512
• Calculation:
• Q = displacement into index table
• R = displacement into block
INDEXED ALLOCATION – LARGE FILES
• Mapping from logical to physical in a file of unbounded length (block
size of 512 words)
• Linked scheme – Link blocks of index table (no limit on size)
• Multi-level indexing
INDEXED ALLOCATION – LINKED SCHEME
• Link blocks of index table (no limit on size)
Q1
LA / (512 x 511)
R1
• Outer-level mapping scheme
• Q1 = block of index table
• R1 is used as follows
Q2
R1 / 512
R2
• Inner-level mapping scheme
• Q2 = displacement into block of index table
• R2 displacement into block of file
INDEXED ALLOCATION – TWO-LEVEL SCHEME
• Two-level index (4K blocks could store 1,024 four-byte pointers in outer
index -> 1,048,567 data blocks and file size of up to 4GB)
Q1
LA / (512 x 512)
R1
• Mapping scheme for outer-index:
• Q1 = displacement into outer-index
• R1 is used as follows:
Q2
R1 / 512
R2
• Mapping scheme for index level:
• Q2 = displacement into block of index table
• R2 displacement into block of file
INDEXED ALLOCATION – TWO-LEVEL SCHEME
COMBINED SCHEME : UNIX UFS
• 4K bytes per block, 32-bit addresses
• More index blocks than can be addressed with 32-bit file pointer
PERFORMANCE
• Best method depends on file access type
• Contiguous great for sequential and random
• Linked good for sequential, not random
• Declare access type at creation
• Select either contiguous or linked
• Indexed more complex
• Single block access could require 2 index block reads then data block read
• Clustering can help improve throughput, reduce CPU overhead
• For NVM, no disk head so different algorithms and optimizations needed
• Using old algorithm uses many CPU cycles trying to avoid non-existent head movement
• Goal is to reduce CPU cycles and overall path needed for I/O
PERFORMANCE (CONT.)
• Adding instructions to the execution path to save one disk
I/O is reasonable
• Intel Core i7 Extreme Edition 990x (2011) at 3.46Ghz = 159,000
MIPS
• [Link]
• Typical disk drive at 250 I/Os per second
• 159,000 MIPS / 250 = 630 million instructions during one disk I/O
• Fast SSD drives provide 60,000 IOPS
• 159,000 MIPS / 60,000 = 2.65 millions instructions during one disk I/O
FREE-SPACE MANAGEMENT
• File system maintains free-space list to track available blocks/clusters
• (Using term “block” for simplicity)
• Bit vector or bit map (n blocks)
01 2 n-1
…
1 block[i] free
bit[i] =
0 block[i] occupied
Block number calculation
(number of bits per word) *
(number of 0-value words) +
offset of first 1 bit
CPUs have instructions to return offset within word
of first “1” bit
FREE-SPACE MANAGEMENT
File system maintains free-space list to track available blocks
Bit vector or bit map (n blocks)
01 2 n-1
…
1 block[i] free
bit[i] =
0 block[i] occupied
Bit map requires extra space
Example:
block size = 4KB = 212 bytes
disk size = 240 bytes (1 terabyte)
n = 240/212 = 228 bits (or 32MB)
if clusters of 4 blocks -> 8MB of memory
Easy to get contiguous files
LINKED FREE SPACE LIST ON DISK
Linked list (free list)
• Cannot get contiguous space
easily
• No waste. Linked Free Space
List on Disk of space
• No need to traverse the entire
list (if # free blocks recorded)
FREE-SPACE MANAGEMENT (CONT.)
Grouping
Modify linked list to store address of next n-1 free blocks
in first free block, plus a pointer to next block that contains
free-block-pointers (like this one)
Counting
Because space is frequently contiguously used and freed,
with contiguous-allocation allocation, extents, or clustering
Keep address of first free block and count of following
free blocks
Free space list then has entries containing addresses
and counts
FREE-SPACE MANAGEMENT (CONT.)
Space Maps
Used in ZFS
Consider meta-data I/O on very large file systems
Full data structures like bit maps cannot fit in memory thousands of I/Os
Divides device space into metaslab units and manages metaslabs
Given volume can contain hundreds of metaslabs
Each metaslab has associated space map
Uses counting algorithm
But records to log file rather than file system
Log of all block activity, in time order, in counting format
Metaslab activity load space map into memory in balanced-tree structure, indexed by
offset
Replay log into that structure
Combine contiguous free blocks into single entry
Required Reading
1. Chapter 13: FILE SYSTEM (Operating System Concepts by
Silberschatz, Abraham, et al. 10th ed., ISBN: 978-1-119-32091-3, 2018)
2. Chapter 14: File-System Implementation (Operating System Concepts
by Silberschatz, Abraham, et al. 10th ed., ISBN: 978-1-119-32091-3,
2018)
Recommended Reading
1. Chapter 4.1 (Modern Operating Systems by Andrew S. Tanenbaum and Herbert Bos. 4th ed., ISBN-
10: 0-13-359162-X, ISBN-13: 978-0-13-359162-0, 2015)
2. Chapter 4.3 (Modern Operating Systems by Andrew S. Tanenbaum and Herbert Bos. 4th ed., ISBN-
10: 0-13-359162-X, ISBN-13: 978-0-13-359162-0, 2015)
This Presentation is mainly dependent on the textbook: Operating System Concepts by Silberschatz, Abraham, et al. 10th ed
Thank You
الجامعة السعودية االلكترونية
الجامعة السعودية االلكترونية
26/12/2021
College of Computing and Informatics
Data Science Pre-Master Program
CS351
OPERATING SYSTEMS
IT241
Operating Systems
Week 13
Chapter 16: Security
Chapter 17: Protection
CONTENTS
Program Threats
System and Network Threats
An Example: Windows 10
Goals of Protection
Principles of Protection
Domain of Protection
Access Matrix
Implementation of the Access Matrix
WEEKLY LEARNING OUTCOMES
Discuss security threats and attacks
Discuss the goals and principles of protection in a modern
computer system
Explain how protection domains combined with an access
matrix are used to specify the resources a process may
access
INTRODUCTION
• System secure if resources used and accessed as intended under all
circumstances
• Unachievable
• Intruders (crackers) attempt to breach security
• Threat is potential security violation
• Attack is attempt to breach security
• Attack can be accidental or malicious
• Easier to protect against accidental than malicious misuse
SECURITY VIOLATION CATEGORIES
• Breach of confidentiality
• Unauthorized reading of data
• Breach of integrity
• Unauthorized modification of data
• Breach of availability
• Unauthorized destruction of data
• Theft of service
• Unauthorized use of resources
• Denial of service (DOS)
• Prevention of legitimate use
SECURITY VIOLATION METHODS
• Masquerading (breach authentication)
• Pretending to be an authorized user to escalate privileges
• Replay attack
• As is or with message modification
• Man-in-the-middle attack
• Intruder sits in data flow, masquerading as sender to receiver and vice versa
• Session hijacking
• Intercept an already-established session to bypass authentication
• Privilege escalation
• Common attack type with access beyond what a user or resource is
supposed to have
STANDARD SECURITY ATTACK
SECURITY MEASURE LEVELS
• Impossible to have absolute security, but make cost to perpetrator sufficiently high to deter
most intruders
• Security must occur at four levels to be effective:
• Physical
• Data centers, servers, connected terminals
• Application
• Benign or malicious apps can cause security problems
• Operating System
• Protection mechanisms, debugging
• Network
• Intercepted communications, interruption, DOS
• Security is as weak as the weakest link in the chain
• Humans a risk too via phishing and social-engineering attacks
• But can too much security be a problem?
FOUR LAYERED MODEL OF SECURITY
PROGRAM THREATS
• Malware - Software designed to exploit, disable, or damage
computer
• Trojan Horse – Program that acts in a clandestine manner
• Spyware – Program frequently installed with legitimate software
to display adds, capture user data
• Ransomware – locks up data via encryption, demanding
payment to unlock it
• Others include trap doors, logic boms
• All try to violate the Principle of Least Privilege
• Goal frequently is to leave behind Remote Access Tool (RAT) for
repeated access
CODE INJECTION
Code-injection attack occurs when system code is not malicious but
has bugs allowing executable code to be added or modified
• Results from poor or insecure programming paradigms,
commonly in low level languages like C or C++ which allow for
direct memory access through pointers
• Goal is a buffer overflow in which code is placed in a buffer and
execution caused by the attack
• Can be run by script kiddies – use tools written but exploit
identifiers
PROGRAM THREATS
• Viruses
• Code fragment embedded in legitimate program
• Self-replicating, designed to infect other computers
• Very specific to CPU architecture, operating system, applications
• Usually borne via email or as a macro
• Visual Basic Macro to reformat hard drive
Sub AutoOpen()
Dim oFS
Set oFS = CreateObject(’’[Link]’’)
vs = Shell(’’c:[Link] /k format c:’’,vbHide)
End Sub
VIRUS CATEGORIES
• Many categories of viruses, literally many thousands of viruses
• File / parasitic
• Boot / memory
• Macro
• Source code
• Polymorphic to avoid having a virus signature
• Encrypted
• Stealth
• Tunneling
• Multipartite
• Armored
BOOT SECTOR VIRUS
PROGRAM THREATS
• Many variations, many names
• Trojan Horse
• Code segment that misuses its environment
• Exploits mechanisms for allowing programs written by users to be executed
by other users
• Spyware, pop-up browser windows, covert channels
• Up to 80% of spam delivered by spyware-infected systems
• Trap Door
• Specific user identifier or password that circumvents normal security
procedures
• Could be included in a compiler
• How to detect them?
PROGRAM THREATS
• Attacks still common, still occurring
• Attacks moved over time from science experiments to tools of
organized crime
• Targeting specific companies
• Creating botnets to use as tool for spam and DDOS delivery
• Keystroke logger to grab passwords, credit card numbers
• Why is Windows the target for most attacks?
• Most common
• Everyone is an administrator
• Licensing required?
• Monoculture considered harmful
SYSTEM AND NETWORK THREATS
• Some systems “open” rather than secure by default
• Reduce attack surface
• But harder to use, more knowledge needed to administer
• Network threats harder to detect, prevent
• Protection systems weaker
• More difficult to have a shared secret on which to base access
• No physical limits once system attached to internet
• Or on network with system attached to internet
• Even determining location of connecting system difficult
• IP address is only knowledge
SYSTEM AND NETWORK THREATS
• Worms – use spawn mechanism; standalone program
• Internet worm
• Exploited UNIX networking features (remote access) and bugs in finger and
sendmail programs
• Exploited trust-relationship mechanism used by rsh to access friendly
systems without use of password
• Grappling hook program uploaded main worm program
• 99 lines of C code
• Hooked system then uploaded main code, tried to attack connected
systems
• Also tried to break into other users accounts on local system via password
guessing
• If target system already infected, abort, except for every 7th time
SYSTEM AND NETWORK THREATS
• Denial of Service
• Overload the targeted computer preventing it from doing any useful work
• Distributed Denial-of-Service (DDoS) come from multiple sites at once
• Consider the start of the IP-connection handshake (SYN)
• How many started-connections can the OS handle?
• Consider traffic to a web site
• How can you tell the difference between being a target and being really
popular?
• Accidental – CS students writing bad fork() code
• Purposeful – extortion, punishment
• Port scanning
• Automated tool to look for network ports accepting connections
• Used for good and evil
SYSTEM AND NETWORK THREATS
• Port scanning
• Automated attempt to connect to a range of ports on one or a
range of IP addresses
• Detection of answering service protocol
• Detection of OS and version running on system
• nmap scans all ports in a given IP range for a response
• nessus has a database of protocols and bugs (and exploits) to
apply against a system
• Frequently launched from zombie systems
• To decrease trace-ability
EXAMPLE - WINDOWS 10
• Security is based on user accounts
• Each user has unique security ID
• Login to ID creates security access token
• Includes security ID for user, for user’s groups, and special privileges
• Every process gets copy of token
• System checks token to determine if access allowed or denied
• Uses a subject model to ensure access security
• A subject tracks and manages permissions for each program that a user
runs
• Each object in Windows has a security attribute defined by a security descriptor
• For example, a file has a security descriptor that indicates the access
permissions for all users
EXAMPLE - WINDOWS 7
• Win added mandatory integrity controls – assigns integrity label to each
securable object and subject
• Subject must have access requested in discretionary access-control list to
gain access to object
• Security attributes described by security descriptor
• Owner ID, group security ID, discretionary access-control list, system access-
control list
• Objects are either container objects (containing other objects, for example a
file system directory) or noncontainer objects
• By default an object created in a container inherits permissions from the
parent object
• Some Win 10 security challenges result from security settings being weak by
default, the number of services included in a Win 10 system, and the number of
applications typically installed on a Win 10 system
GOAL OF PROTECTION
• In one protection model, computer consists of a collection of
objects, hardware or software
• Each object has a unique name and can be accessed through a well-
defined set of operations
• Protection problem - ensure that each object is accessed correctly
and only by those processes that are allowed to do so
PRINCIPLES OF PROTECTION
• Guiding principle – principle of least privilege
• Programs, users and systems should be given just enough privileges
to perform their tasks
• Properly set permissions can limit damage if entity has a bug, gets
abused
• Can be static (during life of system, during life of process)
• Or dynamic (changed by process as needed) – domain switching,
privilege escalation
• Compartmentalization a derivative concept regarding access to data
• Process of protecting each individual system component through
the use of specific permissions and access restrictions
PRINCIPLES OF PROTECTION
• Must consider “grain” aspect
• Rough-grained privilege management easier, simpler, but least
privilege now done in large chunks
• For example, traditional Unix processes either have abilities of the
associated user, or of root
• Fine-grained management more complex, more overhead, but more
protective
• File ACL lists, RBAC
• Domain can be user, process, procedure
• Audit trail – recording all protection-orientated activities, important to
understanding what happened, why, and catching things that shouldn’t
• No single principle is a panacea for security vulnerabilities – need defense
in depth
DOMAIN OF PROTECTION
• Rings of protection separate functions into domains and order them
hierarchically
• Computer can be treated as processes and objects
• Hardware objects (such as devices) and software objects (such
as files, programs, semaphores
• Process for example should only have access to objects it currently
requires to complete its task – the need-to-know principle
DOMAIN OF PROTECTION
• Implementation can be via process operating in a protection
domain
• Specifies resources process may access
• Each domain specifies set of objects and types of operations on
them
• Ability to execute an operation on an object is an access right
• <object-name, rights-set>
• Domains may share access rights
• Associations can be static or dynamic
• If dynamic, processes can domain switch
DOMAIN STRUCTURE
• Access-right = <object-name, rights-set>
where rights-set is a subset of all valid operations that can be
performed on the object
• Domain = set of access-rights
DOMAIN STRUCTURE
• Each user may be a domain. In this case, the set of objects that can
be accessed depends on the identity of the user.
• Each process may be a domain. In this case, the set of objects that
can be accessed depends on the identity of the process.
• Each procedure may be a domain. In this case, the set of objects
that can be accessed corresponds to the local variables defined within
the procedure.
ACCESS MATRIX
• View protection as a matrix (access matrix)
• Rows represent domains
• Columns represent objects
• Access(i, j) is the set of operations that a process executing
in Domaini can invoke on Objectj
USE OF ACCESS MATRIX
• If a process in Domain Di tries to do “op” on object Oj, then “op” must be
in the access matrix
• User who creates object can define access column for that object
• Can be expanded to dynamic protection
• Operations to add, delete access rights
• Special access rights:
• owner of Oi
• copy op from Oi to Oj (denoted by “*”)
• control – Di can modify Dj access rights
• transfer – switch from domain Di to Dj
• Copy and Owner applicable to an object
• Control applicable to domain object
USE OF ACCESS MATRIX
• Access matrix design separates mechanism from policy
• Mechanism
• Operating system provides access-matrix + rules
• If ensures that the matrix is only manipulated by authorized
agents and that rules are strictly enforced
• Policy
• User dictates policy
• Who can access what object and in what mode
• But doesn’t solve the general confinement problem
ACCESS MATRIX WITH DOMAIN AS OBJECTS
ACCESS MATRIX WITH COPY RIGHTS
ACCESS MATRIX WITH OWNER RIGHTS
ACCESS MATRIX - IMPLEMENTATION
• Generally, a sparse matrix
• Option 1 – Global table
• Store ordered triples <domain, object, rights-set>
in table
• A requested operation M on object Oj within domain Di -> search
table for < Di, Oj, Rk >
• with M ∈ Rk
• But table could be large -> won’t fit in main memory
• Difficult to group objects (consider an object that all domains can
read)
ACCESS MATRIX - IMPLEMENTATION
• Option 2 – Access lists for objects
• Each column implemented as an access list for one object
• Resulting per-object list consists of ordered pairs <domain,
rights-set> defining all domains with non-empty set of
access rights for the object
• Easily extended to contain default set -> If M ∈ default set, also
allow access
ACCESS MATRIX - IMPLEMENTATION
• Each column = Access-control list for one object
Defines who can perform what operation
Domain 1 = Read, Write
Domain 2 = Read
Domain 3 = Read
• Each Row = Capability List (like a key)
For each domain, what operations allowed on what objects
Object F1 – Read
Object F4 – Read, Write, Execute
Object F5 – Read, Write, Delete, Copy
ACCESS MATRIX - IMPLEMENTATION
• Option 3 – Capability list for domains
• Instead of object-based, list is domain based
• Capability list for domain is list of objects together with operations allows
on them
• Object represented by its name or address, called a capability
• Execute operation M on object Oj, process requests operation and specifies
capability as parameter
• Possession of capability means access is allowed
• Capability list associated with domain but never directly accessible by
domain
• Rather, protected object, maintained by OS and accessed indirectly
• Like a “secure pointer”
• Idea can be extended up to applications
ACCESS MATRIX - IMPLEMENTATION
• Option 4 – Lock-key
• Compromise between access lists and capability lists
• Each object has list of unique bit patterns, called locks
• Each domain as list of unique bit patterns called keys
• Process in a domain can only access object if domain has key
that matches one of the locks
Main Reference
Chapter 16: Security
Chapter 17: Protection
(Operating System Concepts by Silberschatz, Abraham, et al. 10th ed., ISBN:
978-1-119-32091-3, 2018)
Additional References
Chapter 9 (9.1 to 9.4)
(Modern Operating Systems by Andrew S. Tanenbaum and Herbert Bos. 4th ed.,
ISBN-10: 0-13-359162-X, ISBN-13: 978-0-13-359162-0, 2015)
This Presentation is mainly dependent on the textbook: Operating System Concepts by Silberschatz, Abraham, et al. 10th ed.)
Thank You
الجامعة السعودية االلكترونية
الجامعة السعودية االلكترونية
26/12/2021
College of Computing and Informatics
Data Science Pre-Master Program
CS351
OPERATING SYSTEMS
Week 14
Chapter 18: Virtual Machines
Chapter 19: Networks and Distributed Systems
CONTENTS
Overview
Benefits and Features
Types of Virtual Machines and Their Implementations
Virtualization and Operating-System Components
Advantages of Distributed Systems
Design Issues of Distributed Systems
Distributed File Systems
Weekly LEARNING OUTCOMES
Define benefits of virtual machines and distributed systems
Explain the various virtual machine technologies
Describe the methods used to implement virtualization
Evaluate hardware features that support virtualization and explain how they are used by
operating-system modules
Define the roles and types of distributed systems in use today
Describe issues concerning the design of distributed file systems
SYSTEM MODELS
Non-virtual machine Virtual machine
IMPLEMENTATION OF VMMS
• Vary greatly, with options including:
• Type 0 hypervisors - Hardware-based solutions that provide support for
virtual machine creation and management via firmware
• IBM LPARs and Oracle LDOMs are examples
• Type 1 hypervisors - Operating-system-like software built to provide
virtualization
• Including VMware ESX, Joyent SmartOS, and Citrix XenServer
• Type 1 hypervisors – Also includes general-purpose operating systems that
provide standard functions as well as VMM functions
• Including Microsoft Windows Server with HyperV and RedHat Linux with KVM
• Type 2 hypervisors - Applications that run on standard operating systems
but provide VMM features to guest operating systems
• Including VMware Workstation and Fusion, Parallels Desktop, and Oracle VirtualBox
IMPLEMENTATION OF VMMS (CONT.)
• Other variations include:
• Paravirtualization - Technique in which the guest operating
system is modified to work in cooperation with the VMM to
optimize performance
• Programming-environment virtualization - VMMs do not
virtualize real hardware but instead create an optimized virtual
system
• Used by Oracle Java and [Link]
• Emulators – Allow applications written for one hardware
environment to run on a very different hardware environment,
such as a different type of CPU
IMPLEMENTATION OF VMMS (CONT.)
• Application containment - Not virtualization at all but rather
provides virtualization-like features by segregating applications
from the operating system, making them more secure,
manageable
• Including Oracle Solaris Zones, BSD Jails, and IBM AIX WPARs
• Much variation due to breadth, depth and importance of
virtualization in modern computing
BENEFITS AND FEATURES (CONT.)
• Templating – create an OS + application VM, provide it to
customers, use it to create multiple instances of that
combination
• Live migration – move a running VM from one host to
another!
• No interruption of user access
• All those features taken together -> cloud computing
• Using APIs, programs tell cloud infrastructure (servers, networking,
storage) to create new guests, VMs, virtual desktops
TYPES OF VIRTUAL MACHINES AND IMPLEMENTATIONS
• Many variations as well as HW details
• Assume VMMs take advantage of HW features
HW features can simplify implementation, improve performance
• Whatever the type, a VM has a lifecycle
• Created by VMM
• Resources assigned to it (number of cores, amount of memory, networking details,
storage details)
• In type 0 hypervisor, resources usually dedicated
• Other types dedicate or share resources, or a mix
• When no longer needed, VM can be deleted, freeing resources
• Steps simpler, faster than with a physical machine install
• Can lead to virtual machine sprawl with lots of VMs, history and state difficult to track
TYPES OF VMS – TYPE 0 HYPERVISOR
• Old idea, under many names by HW manufacturers
• “partitions”, “domains”
• A HW feature implemented by firmware
• OS need to nothing special, VMM is in firmware
• Smaller feature set than other types
• Each guest has dedicated HW
• I/O a challenge as difficult to have enough devices, controllers to
dedicate to each guest
• Sometimes VMM implements a control partition running daemons
that other guests communicate with for shared I/O
• Can provide virtualization-within-virtualization (guest itself can be a
VMM with guests
• Other types have difficulty doing this
TYPE 0 HYPERVISOR
TYPES OF VMS – TYPE 1 HYPERVISOR
• Commonly found in company datacenters
• In a sense becoming “datacenter operating systems”
• Datacenter managers control and manage OSes in new,
sophisticated ways by controlling the Type 1 hypervisor
• Consolidation of multiple OSes and apps onto less HW
• Move guests between systems to balance performance
• Snapshots and cloning
TYPES OF VMS – TYPE 1 HYPERVISOR (CONT.)
• Special purpose operating systems that run natively on HW
• Rather than providing system call interface, create run and manage
guest OSes
• Can run on Type 0 hypervisors but not on other Type 1s
• Run in kernel mode
• Guests generally don’t know they are running in a VM
• Implement device drivers for host HW because no other component
can
• Also provide other traditional OS services like CPU and memory
management
TYPES OF VMS – TYPE 1 HYPERVISOR (CONT.)
• Another variation is a general purpose OS that also provides
VMM functionality
• RedHat Enterprise Linux with KVM, Windows with Hyper-V, Oracle
Solaris
• Perform normal duties as well as VMM duties
• Typically less feature rich than dedicated Type 1 hypervisors
• In many ways, treat guests OSes as just another process
• Albeit with special handling when guest tries to execute special
instructions
TYPES OF VMS – TYPE 2 HYPERVISOR
• Less interesting from an OS perspective
• Very little OS involvement in virtualization
• VMM is simply another process, run and managed by host
• Even the host doesn’t know they are a VMM running guests
• Tend to have poorer overall performance because can’t take
advantage of some HW features
• But also a benefit because require no changes to host OS
• Student could have Type 2 hypervisor on native host, run
multiple guests, all on standard host OS such as Windows, Linux,
MacOS
TYPES OF VMS – PARAVIRTUALIZATION
• Does not fit the definition of virtualization – VMM not presenting
an exact duplication of underlying hardware
• But still useful!
• VMM provides services that guest must be modified to use
• Leads to increased performance
• Less needed as hardware support for VMs grows
• Xen, leader in paravirtualized space, adds several techniques
• For example, clean and simple device abstractions
• Efficient I/O
• Good communication between guest and VMM about device I/O
• Each device has circular buffer shared by guest and VMM via shared memory
XEN I/O VIA SHARED CIRCULAR BUFFER
TYPES OF VMS – PARAVIRTUALIZATION (CONT.)
• Xen, leader in paravirtualized space, adds several techniques
(Cont.)
• Memory management does not include nested page tables
Each guest has own read-only tables
Guest uses hypercall (call to hypervisor) when page-table changes needed
• Paravirtualization allowed virtualization of older x86 CPUs (and
others) without binary translation
• Guest had to be modified to use run on paravirtualized VMM
• But on modern CPUs Xen no longer requires guest
modification -> no longer paravirtualization
TYPES OF VMS – PROGRAMMING ENVIRONMENT VIRTUALIZATION
• Also not-really-virtualization but using same techniques, providing similar features
• Programming language is designed to run within custom-built virtualized
environment
• For example Oracle Java has many features that depend on running in Java Virtual Machine
(JVM)
• In this case virtualization is defined as providing APIs that define a set of features
made available to a language and programs written in that language to provide an
improved execution environment
• JVM compiled to run on many systems (including some smart phones even)
• Programs written in Java run in the JVM no matter the underlying system
• Similar to interpreted languages
TYPES OF VMS – EMULATION
• Another (older) way for running one operating system on a different operating system
• Virtualization requires underlying CPU to be same as guest was compiled for
• Emulation allows guest to run on different CPU
• Necessary to translate all guest instructions from guest CPU to native CPU
• Emulation, not virtualization
• Useful when host system has one architecture, guest compiled for other architecture
• Company replacing outdated servers with new servers containing different CPU
architecture, but still want to run old applications
• Performance challenge – order of magnitude slower than native code
• New machines faster than older machines so can reduce slowdown
• Very popular – especially in gaming where old consoles emulated on new
TYPES OF VMS – APPLICATION CONTAINMENT
• Some goals of virtualization are segregation of apps, performance and resource
management, easy start, stop, move, and management of them
• Can do those things without full-fledged virtualization
• If applications compiled for the host operating system, don’t need full virtualization to meet
these goals
• Oracle containers / zones for example create virtual layer between OS and apps
• Only one kernel running – host OS
• OS and devices are virtualized, providing resources within zone with impression that they are
only processes on system
• Each zone has its own applications; networking stack, addresses, and ports; user accounts, etc
• CPU and memory resources divided between zones
• Zone can have its own scheduler to use those resources
VIRTUALIZATION AND OPERATING-SYSTEM COMPONENTS
• Now look at operating system aspects of virtualization
• CPU scheduling, memory management, I/O, storage, and unique
VM migration feature
How do VMMs schedule CPU use when guests believe they have
dedicated CPUs?
How can memory management work when many guests require large
amounts of memory?
OS COMPONENT – CPU SCHEDULING
• Even single-CPU systems act like multiprocessor ones when virtualized
• One or more virtual CPUs per guest
• Generally VMM has one or more physical CPUs and number of threads to
run on them
• Guests configured with certain number of VCPUs
Can be adjusted throughout life of VM
• When enough CPUs for all guests -> VMM can allocate dedicated CPUs,
each guest much like native operating system managing its CPUs
• Usually not enough CPUs -> CPU overcommitment
VMM can use standard scheduling algorithms to put threads on CPUs
Some add fairness aspect
OS COMPONENT – CPU SCHEDULING (CONT.)
• Cycle stealing by VMM and oversubscription of
CPUs means guests don’t get CPU cycles they
expect
• Consider timesharing scheduler in a guest trying to
schedule 100ms time slices -> each may take 100ms, 1
second, or longer
• Poor response times for users of guest
• Time-of-day clocks incorrect
• Some VMMs provide application to run in each guest
to fix time-of-day and provide other integration
features
OS COMPONENT – I/O
• Easier for VMMs to integrate with guests because I/O has lots of variation
• Already somewhat segregated / flexible via device drivers
• VMM can provide new devices and device drivers
• But overall I/O is complicated for VMMs
• Many short paths for I/O in standard OSes for improved performance
• Less hypervisor needs to do for I/O for guests, the better
• Possibilities include direct device access, DMA pass-through, direct interrupt delivery
Again, HW support needed for these
• Networking also complex as VMM and guests all need network access
• VMM can bridge guest to network (allowing direct access)
• And / or provide network address translation (NAT)
NAT address local to machine on which guest is running, VMM provides address translation to
guest to hide its address
OS COMPONENT – STORAGE MANAGEMENT
• Both boot disk and general data access need be provided by VMM
• Need to support potentially dozens of guests per VMM (so standard disk
partitioning not sufficient)
• Type 1 – storage guest root disks and config information within file system
provided by VMM as a disk image
• Type 2 – store as files in file system provided by host OS
• Duplicate file -> create new guest
• Move file to another system -> move guest
• Physical-to-virtual (P-to-V) convert native disk blocks into VMM format
• Virtual-to-physical (V-to-P) convert from virtual format to native or disk format
• VMM also needs to provide access to network attached storage (just
networking) and other disk images, disk partitions, disks, etc.
OS COMPONENT – LIVE MIGRATION
• Taking advantage of VMM features leads to new functionality not found on general operating
systems such as live migration
• Running guest can be moved between systems, without interrupting user access to the guest or
its apps
• Very useful for resource management, maintenance downtime windows, etc.
• The source VMM establishes a connection with the target VMM
• The target creates a new guest by creating a new VCPU, etc.
• The source sends all read-only guest memory pages to the target
• The source sends all read-write pages to the target, marking them as clean
• The source repeats step 4, as during that step some pages were probably modified by the
guest and are now dirty
• When cycle of steps 4 and 5 becomes very short, source VMM freezes guest, sends VCPU’s
final state, sends other state details, sends final dirty pages, and tells target to start running
the guest
• Once target acknowledges that guest running, source terminates guest
LIVE MIGRATION OF GUEST BETWEEN SERVERS
OVERVIEW
• A distributed system is a collection of loosely coupled nodes
interconnected by a communications network
• Nodes variously called processors, computers, machines, hosts
• Site is location of the machine, node refers to specific system
• Generally a server has a resource a client node at a different site wants to
use
OVERVIEW (CONT.)
• Nodes may exist in a client-server, peer-to-peer, or hybrid
configuration.
• In client-server configuration, server has a resource that a client would like to
use
• In peer-to-peer configuration, each node shares equal responsibilities and can
act as both clients and servers
• Communication over a network occurs through message passing
• All higher-level functions of a standalone system can be expanded to
encompass a distributed system
REASONS FOR DISTRIBUTED SYSTEMS
• Resource sharing
• Sharing files or printing at remote sites
• Processing information in a distributed database
• Using remote specialized hardware devices such as graphics processing units
(GPUs)
• Computation speedup
• Distribute subcomputations among various sites to run concurrently
• Load balancing – moving jobs to more lightly-loaded sites
• Reliability
• Detect and recover from site failure, function transfer, reintegrate failed site
DESIGN ISSUES OF DISTRIBUTED SYSTEMS
• We investigate three design questions:
• Robustness – Can the distributed system withstand
failures?
• Transparency – Can the distributed system be
transparent to the user both in terms of where files are
stored and user mobility?
• Scalability – Can the distributed system be scalable to
allow addition of more computation power, storage, or
users?
ROBUSTNESS
• Hardware failures can include failure of a link, failure of a
site, and loss of a message.
• A fault-tolerant system can tolerate a certain level of failure
• Degree of fault tolerance depends on design of system and the
specific fault
• The more fault tolerance, the better!
• Involves failure detection, reconfiguration, and recovery
FAILURE DETECTION
• Detecting hardware failure is difficult
• To detect a link failure, a heartbeat protocol can be used
• Assume Site A and Site B have established a link
• At fixed intervals, each site will exchange an I-am-up message indicating that
they are up and running
• If Site A does not receive a message within the fixed interval, it
assumes either (a) the other site is not up or (b) the message was lost
• Site A can now send an Are-you-up? message to Site B
• If Site A does not receive a reply, it can repeat the message or try an
alternate route to Site B
FAILURE DETECTION (CONT.)
• If Site A does not ultimately receive a reply from Site B, it concludes
some type of failure has occurred
• Types of failures:
- Site B is down
- The direct link between A and B is down
- The alternate link from A to B is down
- The message has been lost
• However, Site A cannot determine exactly why the failure has
occurred
RECONFIGURATION AND RECOVERY
• When Site A determines a failure has occurred, it must
reconfigure the system:
• If the link from A to B has failed, this must be broadcast to every
site in the system
• If a site has failed, every other site must also be notified indicating
that the services offered by the failed site are no longer available
• When the link or the site becomes available again, this
information must again be broadcast to all other sites
TRANSPARENCY
• The distributed system should appear as a
conventional, centralized system to the user
• User interface should not distinguish between local and
remote resources
• Example: Network File system (NFS)
• User mobility allows users to log into any machine in the
environment and see his/her environment
• Example: Lightweight Directory Access Protocol (LDAP) plus
desktop virtualization
SCALABILITY
• As demands increase, the system should easily accept the
addition of new resources to accommodate the increased
demand
• Reacts gracefully to increased load
• Adding more resources may generate additional indirect load on
other resources if not careful
• Data compression or deduplication can cut down on storage and
network resources used
DISTRIBUTED FILE SYSTEM
• Distributed file system (DFS) – a file system whose clients,
servers, and storage devices are dispersed among the
machines of a distributed system
• Should appear to its clients as a conventional, centralized file
system
• Key distinguishing feature is management of dispersed
storage devices
DISTRIBUTED FILE SYSTEM (CONT.)
• Service – software entity running on one or more machines and
providing a particular type of function to a priori unknown clients
• Server – service software running on a single machine
• Client – process that can invoke a service using a set of operations
that forms its client interface
• A client interface for a file service is formed by a set of primitive file
operations (create, delete, read, write)
• Client interface of a DFS should be transparent; i.e., not distinguish
between local and remote files
• Sometimes lower level inter-machine interface need for cross-
machine interaction
DISTRIBUTED FILE SYSTEM (CONT.)
• Two widely-used architectural models include client-server model
and cluster-based model
• Challenges include:
• Naming and transparency
• Remote file access
• Caching and cache consistency
CLIENT-SERVER DFS MODEL
• Server(s) store both files and metadata on attached storage
• Clients contact the server to request files
• Sever responsible for authentication, checking file permissions, and delivering
the file
• Changes client makes to file must be propagated back to the server
• Popular examples include NFS and OpenAFS
• Design suffers from single point of failure if server crashes
• Server presents a bottleneck for all requests of data and metadata
• Could pose problems with scalability and bandwidth
CLIENT-SERVER DFS MODEL (CONT.)
CLUSTER-BASED DFS MODEL
• Built to be more fault-tolerant and scalable than client-server
DFS
• Examples include the Google File System (GFS) and Hadoop
Distributed File System (HDFS)
• Clients connected to master metadata server and several data
servers that hold “chunks” (portions) of files
• Metadata server keeps mapping of which data servers hold chunks
of which files
• As well as hierarchical mapping of directories and files
• File chunks replicated n times
CLUSTER-BASED DFS MODEL (CONT.)
CLUSTER-BASED DFS MODEL (CONT.)
• GFS design was influenced by following observations:
• Hardware component failures are the norm rather than the exception
and should be routinely expected.
• Files stored on such a system are very large.
• Most files are changed by appending new data to the end rather than
overwriting existing data.
• Redesigning the applications and file system API increases system
flexibility
Requires applications to be programmed specially with new API
• Modularized software layer MapReduce can sit on top of GFS
to carry out large-scale parallel computations while utilizing
benefits of GFS
• Hadoop framework also stackable and modularized
NAMING AND TRANSPARENCY
• Naming – mapping between logical and physical objects
• Multilevel mapping – abstraction of a file that hides the details
of how and where on the disk the file is actually stored
• A transparent DFS hides the location where in the network the
file is stored
• For a file being replicated in several sites, the mapping returns
a set of the locations of this file’s replicas; both the existence of
multiple copies and their location are hidden
NAMING STRUCTURES
• Location transparency – file name does not reveal the file’s
physical storage location
• Location independence – file name does not need to be
changed when the file’s physical storage location changes
• In practice most DFSs use static, location-transparent mapping
for user-level names
• Some support file migration (e.g. OpenAFS)
• Hadoop supports file migration but without following POSIX
standards; hides information from clients
• Amazon S3 provides blocks of storage on demand via APIs, placing
storage dynamically and moving data as necessary
NAMING SCHEMES
• Three approaches:
• Files named by combination of their host name and local name;
guarantees a unique system-wide name. This naming scheme is
neither location transparent nor location independent.
• Attach remote directories to local directories, giving the appearance
of a coherent directory tree; only previously mounted remote
directories can be accessed transparently
• Single global name structures spans all files in the system. If a server is
unavailable, some arbitrary set of directories on different machines
also becomes unavailable
REMOTE FILE ACCESS
• Consider a user who requests access to a remote file. The server
storing the file has been located by the naming scheme, and now the
actual data transfer must take place.
• Remote-service mechanism is one transfer approach.
• A requests for accesses are delivered to the server, the server machine
performs the accesses, and their results are forwarded back to the user.
• One of the most common ways of implementing remote service is the RPC
paradigm
REMOTE FILE ACCESS (CONT.)
• Reduce network traffic by retaining recently accessed disk blocks in a
cache, so that repeated accesses to the same information can be
handled locally
• If needed data not already cached, a copy of data is brought from the server
to the user
• Accesses are performed on the cached copy
• Files identified with one master copy residing at the server machine, but
copies of (parts of) the file are scattered in different caches
• Cache-consistency problem – keeping the cached copies consistent
with the master file
• Could be called network virtual memory
CACHE LOCATION – DISK VS. MAIN MEMORY
• Advantages of disk caches
• More reliable
• Cached data kept on disk are still there during recovery and don’t need to
be fetched again
• Advantages of main-memory caches:
• Permit workstations to be diskless
• Data can be accessed more quickly
• Performance speedup in bigger memories
• Server caches (used to speed up disk I/O) are in main memory regardless
of where user caches are located; using main-memory caches on the user
machine permits a single caching mechanism for servers and users
CACHE UPDATE POLICY
• Write-through – write data through to disk as soon as they are
placed on any cache
• Reliable, but poor performance
• Delayed-write (write-back) – modifications are written to the cache
and then written through to the server later
• Write accesses complete quickly; some data may be overwritten before they
are written back, and so need never be written at all
• Poor reliability; unwritten data will be lost whenever a user machine crashes
• Variation – scan cache at regular intervals and flush blocks that have been
modified since the last scan
• Variation – write-on-close, writes data back to the server when the file is
closed
• Best for files that are open for long periods and frequently modified
CONSISTENCY
• Is locally cached copy of the data consistent with the master copy?
• Client-initiated approach
• Client initiates a validity check
• Server checks whether the local data are consistent with the master copy
• Server-initiated approach
• Server records, for each client, the (parts of) files it caches
• When server detects a potential inconsistency, it must react
CONSISTENCY (CONT.)
• In cluster-based DFS, cache-consistency issue more
complicated due to presence of metadata server and
replicated file data chunks
• HDFS allows append-only write operations (no random writes) and a
single file writer
• GFS allows random writes with concurrent writers
• Complicates write consistency guarantees for GFS while
simplifying it for HDFS
Required Reading
1. Chapter 18: Virtual Machines (Operating System Concepts by
Silberschatz, Abraham, et al. 10th ed., ISBN: 978-1-119-32091-3, 2018)
2. Chapter 19: Networks and Distributed Systems (Operating System
Concepts by Silberschatz, Abraham, et al. 10th ed., ISBN: 978-1-119-
32091-3, 2018)
Recommended Reading
1. Chapter 7 (Modern Operating Systems by Andrew S. Tanenbaum and Herbert Bos. 4th ed., ISBN-10:
0-13-359162-X, ISBN-13: 978-0-13-359162-0, 2015)
2. Chapter 8.3 (Modern Operating Systems by Andrew S. Tanenbaum and Herbert Bos. 4th ed., ISBN-
10: 0-13-359162-X, ISBN-13: 978-0-13-359162-0, 2015)
This Presentation is mainly dependent on the textbook: Operating System Concepts by Silberschatz, Abraham, et al. 10th ed
Thank You