Operating Systems Study Notes
Operating Systems Study Notes
Systems
This minisite contains notes taken by Chris Northwood whilst studying Computer Science at
the University of York between 2005-09 and the University of Sheffield 2009-10.
They are published here in case others find them useful, but I provide no warranty for their
accuracy, completeness or whether or not they are up-to-date.
The contents of this page have dubious copyright status, as great portions of some of my
revision notes are verbatim from the lecture slides, what the lecturer wrote on the board, or
what they said. Additionally, lots of the images have been captured from the lecture slides.
System Components
A computer system can be abstracted into various components. Users (people, or
other machines and systems) interact with system and application programs (defines
the way in which system resources are used to solve the computing problems of the
users), which interact with the OS (controls and co-ordinates the use of hardware
among the various application programs for the user), which interacts with the
hardware (the most basic computing resources - CPU, memory, I/O, etc). When
developing an OS, all levels must be considered, not just the OS level.
For users, they interact with the hardware (monitor, keyboard, mouse, etc) to control
the system and application programs. The program needs an efficient interface to
interact with the OS to utilise the resources it needs to accomplish the tasks the user
wants to do and to also communicate with the user. The OS interface describes the
calls that an application program can make into the OS.
The OS is designed for ease of use, which is in terms of not having to deal with low
level hardware issues. This is more important than CPU usage - a user doesn't care if
the CPU is idle 99% of the time. The goal of OS design is to provide a useful,
simple and efficient system:
ease of use
minimise resource usage
maximise resource utilisation
However, abstraction away from the physical machine to provide ease of use can
lead to inefficiencies. There's a trade off required between the extra functionality of
the virtual machine and the overheads required to implement this.
These systems had minimal operating systems and were typically mainframe
computers used to support commercial and scientific applications. Jobs and
programs were written using punch cards and the computers were normally
application specific.
The minimal OS was required to interact with I/O devices and a subroutine
library existed to manage I/O interaction. The library is loaded into the top of
memory and stays there. Application programs existed (compiler/card reader),
but were limited.
These systems had a new memory layout and the processor had two modes -
Normal, where some instructions (e.g., HALT) are forbidden, which was used
by the batch programs, and Privileged, where all instructions are available, this
was the level the OS ran in. Some CPUs also implemented protection in
Normal mode to prevent the user program overriding the first section of
memory in which the OS existed. Normal mode also has an instruction to call
the OS (hand back control).
Batch systems were good for CPU bound scientific computing with minimal
I/O, but bad for data processing with high I/O as the CPU is often idle. The
speeds of the mechanical I/O devices is considerably slower than that of the
electronic computer and jobs often have to pause when performing I/O. In
some cases, the CPU was often idle for >80% of the time.
In these OSs, the OS picks a job and then passes control to it. Eventually, the
job either finishes or waits for some task (e.g., I/O) to complete. In a non-
multiprogrammed system, the CPU would now be idle. In a multiprogrammed
system, the OS switches the CPU to another job.
Eventually, the operation the original job was waiting for completes, enabling
that job to be eligable to be waiting for CPU time. As long as there is at least
one job waiting to execute, the CPU is not idle.
When a job completes, the OS can load another job into that memory slot. This
technique is called spooling - Simultaneous Peripheral Operation On Line, and
is still used today for operations such as print queues.
Multiprogramming OSs are quite complex - they now handle quite complex
I/O devices (disks, etc) and swapping between applications when suspended on
I/O.
Multiprogramming works best when there are a small number of large jobs -
this reduces the overhead of the OS due to managing memory between less
jobs. This method also encourages a fairer overall use of the computer.
In time-share systems, the user has the illusion of using the computer
continuously and the user can now interact directly with the computer whilst it
is running. This encourages smaller jobs and more efficient use of the
computer. Additionally, many users can share the computer simultaneously,
submitting jobs/commands directly via keyboards/terminals, etc.
The first time-sharing system was CTSS, developed by MIT. This formed the
basis for MULTICS, the first commercial OS that had limited success. UNIX
was then developed by Bell Labs in the high-level language C, based on B.
The fourth generation of operating systems broguth about the first mass user
operating systems, as LSI and VLSI enabled low-cost computing, with mini
and desktop computers becoming common.
The age of mass computing is based around desktop computers. The desktop
PC is dedicated to a single user and the distinction between desktop and
minicomputers become increasingly blurred. Many minicomputers are now
merely highend desktop PCs, with advanced I/O capabilities and
multiprocessors.
The multiprogrammed OS (UNIX, VMS, etc) was developed further for mass
user computers, adding functionality like: massive file storage, fast
networking, complex application programs and the potential for
multiprocessor. Modern desktop systems have dedicated single user OSs (e.g.,
CP/M, DOS, MS-DOS, Mac OS, Windows) with complex I/O devices:
keyboards, mice, display screens, small printers, graphics tablets, etc, and
complex graphic applications, with user convenience and responsiveness
considered of high importance.
Much of the current OS trends are based in the relative cost of computers and
people. In the beginning, computers were few with very expensive hardware, so the
people cost was cheap compared to the computer cost, therefore the goal was the
maximise hardware utilisation. Now we have an amazing number of computers with
cheap hardware and the human cost is expensive compared to the hardware cost.
The goal now is ease of use.
Current OS trends are towards specialist requirements - moving away from the
general purpose OS for specialist needs, or tuning existing general purpose OSs. The
introduction of real-time OSs, which require low overheads and high efficiency of
resource usage. Multi-processor machines are becoming more common and OSs
need to update to be able to handle the extra hardware. On the other hand, embedded
devices are now become very common, which have limited power, space and weight
requirements.
Control Program
This hides interfaces to I/O devices, filing systems, etc, and provides a
programming interface for applications.
Kernel
The kernel is the only program resident all the time (all other applications are
application programs).
Process management
Main memory management
File management
I/O system management
Secondary storage management
Protection system
Command interpreter system
All but the last two in the above list can be consdered to be resource management
Processes
A program is an algorithm expressed in some notation. A process is the activity
of executing the program.
Multiple user processes run in seperate (protected) address space, and can't
directly access other memory spaces. Some user programs can share address
space (e.g., an interpreter), but only the code (which is read only) is shared.
The kernel can access all, however.
As the CPU only has one program counter, so the OS must use a system known
as process switching.
Process Creation
When a new process is created, the OS must assign memory and create a
task/process control block (TCB/PCB) which holds various data about the
currently running process, such as:
Process state
Unique program ID
Program counter
CPU registers
Memory information (this, along with the program counter and
CPU registers forms the process context)
I/O status information (e.g., open files)
Process Hierarchies
When a parent creates a child process, the child process can create its own
child processes, and we get a hierarchy (or tree) of processes. UNIX terms
this as a process group, whereas Windows has no such concept, and all
processes are created equal.
Process State
In single processor systems, only one process can be in the running state
at any time. If all processes are waiting (or blocked), then the processor is
idle, until one of the processes become unblocked, or ready.
Process Termination
System Calls
System calls provide an interface between the running process and the
OS. They are available as an assembly level instruction, normally some
kind of software interrupt (INT 0x80 on Linux) which corresponds to a
system call to which the OS reacts appropriately.
System calls normally map onto the services provided by the OS.
Threads
Traditional (heavyweight) processes have a single control path through their code.
Many applications are concurrent, but do not want the overhead of multiple
processes and inter-process communication (e.g., a word processor has UI, reading
input, a background spell checker, etc).
Many OS's provide threads (sometimes called lightweight processes), a basic unit of
CPU use. Each is a seperate control path through the code, and threads within a
process are essentially independent. Threads can access all the address space by the
process, and threads have no protection against each other (therefore, you should
program threads safely).
Each thread comprises of a thread ID, program counter, register set and a stack.
The thread shares with other threads belonging to a process, the code section,
the data section and OS resources such as open files. Each thread needs
context, but not as much as the process needs, as it inherits some of the context
of the process.
Threading has various benefits, such as responsiveness - other threads can continue
if one blocks (e.g., on I/O), resource benefits - there's only one copy of the code and
TCB, etc, economy - thread creation and management is less costly than that of
processes, and better utilisations of multi-processor architectures (threads can run in
parallel).
User Threads
This method does have disadvantages, however. When a thread calls I/O, the
whole process is blocked, as the OS sees the whole process as waiting for I/O.
When an application has threads that are likely to block often, then user threads
may not be the best option.
Kernel Threads
Kernel threads are implemented at the OS level, and therefore has higher
overheads (a thread control space/table is now needed in the TCB) and is
slower, due to syscalls. With kernel threads, the OS schedules all threads, so it
can schedule appropriately (e.g., when one thread blocks, only move that into
waiting instead of the whole process). Threads can also be scheduled over
multiple processes, so we have parallelism, instead of concurrency (which can
introduce its own set of problems).
Hybrid Threading
Both user and kernel threading has their advantages and disadvantages, so
using the hybrid methods give you the best of both worlds. Most real
implementations of threading are hybrid.
Many-to-One
One-to-One
All threads here are kernel threads - each user thread maps directly on to
one kernel thread. This has more concurrency than the many-to-one
model, as when one thread blocks another can run. This does have the
overhead of having to create a kernel thread for each user thread,
however.
Many-to-Many
This is the normal method of implementing threading. Here, the user
library controls all - which threads are in the kernel, which are not and
also controls swapping, etc. The library tries to control the threads so that
not all of the kernel threads end up blocked at the same time, by swapping
in and out waiting user threads to kernel threads.
This has the advantages of there being control over concurrency in the
application, and the application can allocate user threads to kernel threads
in a way to optimise the concurrency and ensure that at least one user
thread is runnable at any time.
This method is quite complex, however and there are overheads in both
the OS and the application. The user thread scheduling policy may also
conflict with the kernel thread scheduling policy, introducing
inefficiencies.
Threading Issues
fork() creates a seperate duplicate process and exec() replaces the entire
process with the program specified by its parameter. However, we need to
consider what happens in a multi-threaded process. exec() works in the same
manor - replacing the entire process including any threads (kernel threads are
reclaimed by the kernel), but if a thread calls fork(), should all threads be
duplicated, or is the new process single threaded?
Some UNIX systems work around this by having two versions of fork(), a
fork() that duplicates all threads and a fork that duplicates only the calling
thread, and usage of these two versions depends entirely on the application. If
exec() is called immediately after a fork(), then duplicating all threads is
not necessary, but if exec() is not called, all threads are copied (an OS
overhead is implied to copy all the kernel threads as well).
Thread Cancellation
Signal Handling
Synchronous signals are delivered to the same process that performs the
operation that caused the signal, e.g., divide by 0 errors, etc...
To the thread where the signal applies (how do you know this?)
To every thread (has a lot of overhead)
To certain threads (which ones?)
To a designated thread, which handles all signals (has the problem of
there being the overhead of another thread, and the thread may not be
scheduled in the best way to handle the signal)
CPU Scheduling
CPU scheduling refers to the assignment of the CPU amongst processes that are
ready. It selects from among the processes in memory that are ready to execute and
allocates the CPU to one of them. In terms of process state, the scheduler decides
which of the processes that are in the ready state should be allocated to the CPU and
enter the running state.
A scheduling decision may also take place as the result of a hardware interrupt.
Dispatcher
The dispatcher module gives control of the CPU to the process selected by the
scheduler. This involves:
switching context
switching to user mode
jumping to the proper location in the user program to restart that
program
With the dispatcher, we have a new metric, dispatch latency, which is the time
it takes for the dispatcher to stop one process and start another running.
Dispatch latency is optimal when it is minimised.
This is based upon the states in the process state transition graph. The OS
needs to find the TCBs of all processes efficiently, and needs to find a process
waiting on a particular event quickly when that event occurs. Specifically, the
OS needs to find the next process to dispatch (i.e., from the ready queue)
quickly on a context switch. You also need to be able to move processes
between queues efficiently.
A process is only in one queue at a time. For example, you may have a ready
queue, which is the set of all processes residing in main memory, ready to
execute, and also a number of waiting queues, one for each different I/O
device, which is the set of processes waiting on I/O from that device. Processes
move between queues as they change state. Pictorally, this may look like this:
With this model, we have two types of schedulers, the short-term scheduler (or
CPU scheduler), which selects which process should be executed next from the
ready queue and allocates CPU. It is invoked very frequently (every 100 or so
milliseconds), and must be very fast. Additionally, we have the long-term
scheduler (or job scheduler). This selects which new processes should be
brought into the ready queue, and is invoked very infrequently (in terms of
seconds and perhaps even minutes). The long-term scheduler controls the
degree of multi-programming (i.e., the number of processes in memory).
However, how do we decide which jobs should be brought in from main
memory?
Processes can be described as being either I/O-bound (spends more time doing
I/O than computations, and has many short CPU bursts) or CPU-bound (spends
more time doing computation, has a few very long CPU bursts). The long term
scheduler tries to maintain a balance between these two forms.
Some OSs also have a medium term scheduler, which swaps in/out partially
executed processes.
Scheduling Policy
When the OS needs to select another process to execute, the scheduling policy
decides which of the processes to assign to the CPU. Scheduling policy
effectively determines the order of the ready queue so that the next process to
run is at the head of the queue.
Scheduling policies are often defined on the CPU-I/O burst cycle. Most
process executions consist of a CPU execution and an I/O wait. The
characteristics of this cycle help determine policies. The relationship between
relative sizes of CPU and I/O bursts are very important. I/O bursts are very
long compared to CPU bursts in most applications. A histogram of CPU-burst
times gives us a typical CPU-burst graph, showing many short CPU bursts and
few long CPU bursts, which are the CPU bound processes.
The average waiting time for FCFS is not optimal, however, as it suffers
from the convoy effect, where short processes are delayed behind long
processes.
Shortest-Job-First Scheduling
This has two schemes: non-preemptive (once the CPU is given to the
process, it cannot be preempted until it completes its CPU burst); and
preemptive (if a new process arrives with CPU burst length less than the
remaining time of the current executing process, preempt), which is
sometimes known as Shortest-Remaining-Time-First (SRTF) .
SJF is optimal for giving a minimum averasge waiting time for a given set
of processes.
Determining the length of the next CPU burst can only be done by
estimating the length. One way of accomplishing this is by using the
length of previous CPU bursts and exponential averaging with the
forumla τn + 1 = αtn + (1 - α)τn, where tn = the actual length of the nth
CPU burst, τn + 1 = predicted value for next CPU burst and 0 ≤ α ≤ 1.
Priority Scheduling
A priority number (integer) is associated with each process and the CPU
is allocated to the process with the highest priority (the smallest integer).
This can be either preemptive or non-preemptive. SJF is a priority
scheduling policy if the priority is predicted next CPU burst time.
This does have the problem of starvation, where low priority processes
may never execute, and the solution to this is aging, where as time
progresses, the priority of the process is increased.
Round Robin
Each process gets a small unit of CPU time (a time quanta), which is
usually about 10-100 ms. After this time has elapsed, the process is
preempted and added to the end of the ready queue. A smaller time quanta
means more context switches and therefore more overheads, so the time
quanta needs to be set to some appropriate value that is large with respect
to the size of the context switch, but not too high as to basically become a
FIFO policy. If there are n processes in the ready queue and the time
quantum is q, then the process gets 1/n of the CPU time in chunks of at
most q time units at once. No process waits more than (n - 1)q time units.
Multi-level Scheduling
number of queues
scheduling algorithms for each queue
method used to determine when to upgrade a process
method used to determine when to downgrade a process
method used to determine which queue a process will enter when
that process needs service
Thread Scheduling
For user-level threads, any general purpose scheduling policy can be used
according to the best policy for that application. General user-level thread
implementations should allow the application sufficient freedom to define an
application-specific scheduling regime, i.e., allow the application to reorder the
thread scheduling queues at any time, with any policy.
Memory Management
Processes (including threads) need physical memory in order to execute. The OS
manages memory and allocates memory to processes from physical memory, and
virtual memory. Physical memory management enables the OS to safely share
physical memory among processes. This allows us to multiprogram the processor, to
time-share the system (overlap computation and I/O) and to prevent one process
from using another memory.
Most modern OSs work by giving each process a part of memory that it is allowed
to access and check each memory access to ensure that the process stays within its
own memory.
Main memory must accomodate both the OS and user processes. Usually, the
main memory is divided into a number of distinct partitions, such as the
resident OS, which is normally a single partition held in low memory, as this is
where CPUs place interrupt vectors. User processes normally consist of one or
more partitions, within memory above the OS. However, we need to consider
how we're going to divide up these user partitions.
The simplest scheme is to partition into regions with fixed boundaries - this is
called fixed partitioning and has two types, where the partitions are of equal
size, or where they are of unequal size. Any process whose size is less than or
equal to the partition size can be loaded into an available partition. If all the
partitions are full, the operating system can swap a process out of a partition.
However, if a process doesn't fit into a partition, it is the programmers
responsibility to make sure that the program can be split over multiple
partitions. With this, main memory use is inefficient. Any program, no matter
how small, occupies an entire partion. This is called internal fragmentation.
With unequal partitions, the problem is reduced, but not eliminated. Instead of
creating x partitions of equal size, partitions of varying sizes are created, so a
program can be loaded into the one most appropriate to its size.
First-fit is usually the best and fastest, but analysis has shown that under first-
fit, fragmentation can cause up to half of the available memory to be lost in
holes (external fragmentation) or unused memory (internal fragmentation).
Best-fit has the smallest amount of external fragmentation. Next-fit can
produce external fragmentation similar to best or first, depending upon the next
block found. It does use holes at the end of memory more frequently than
others perhaps leading to the largest blocks of contiguous memory.
At any time, the buddy system maintains a list of holes of each size 2i. A hole
on the i + 1 list can be split into 2 equal holes of size 2i in the i list.
To work round this, we have the concept of a logical address space that is
bound to a seperate physical address space, which is the key to memory
management. Logical addresses are the ones generated by the CPU, and
physical addresses are the ones seen by the memory unit. When using compile
or load-time address-binding schemes, the logical and physical addresses are
the same, but with the execution-time addres-binding scheme, the logical
(virtual) and physical addresses differ. This is accomplished using a memory
management unit.
The most simple form of MMU is the relocation register. The relocation
register holds the value of an offset and then adds the offset to all logical
addresses given to it to give the physical address. In this system, logical
addresses are in the range 0..max for all processes and physical addresses are in
the range R..R+max, where R is the value in the relocation register. The
process only needs to generate the logical addresses, and only sees the process
running in addresses 0..max. This principle of mapping is central to memory
management.
The relocation register can provide memory protection with the addition of a
limit register, which contains the allowed range of logical addresses - each
logical address must be less than the limit register. Hardware support is
obviously required for this.
Another way of thinking of this is not thinking of a program as one large chunk
of contiguous memory - a process may be broken up into pieces that do not
need to be loaded into main memory all the time. This increases opportunities
for multiprogramming as with many processes in main memory, it is more
likely a process will be in the ready state at any particular point. This is
achieved by programming techniques with minimal OS support.
So far, we have assumed that the entire program data and code must be
loaded into memory for process to execute - this is called static loading.
An alternative approach is called dynamic loading which works by not
loading routines until it is called. This gives a better memory-space
utilisation and unused routines are never loaded. This is useful when large
amounts of code are required to handle infrequently occuring cases. If
dynamic loading is implemented through program design, no special
support from the OS is required.
Overlays
These are slightly more structured, and codes are put into phases that are
loaded over one another. Again, the motivation is to keep only code and
data in memory that is actually needed. When one phase completes, the
application arranges to overlay its memory with the next phase when
required. The total memory required is the same as the memory required
by the largest phase. Some OS support may be given to implement this.
This is useful when there is a limited amount of physical RAM, and some
secondary storage is available to hold code/data, e.g., in embedded
systems.
Shared Libraries
Swapping
A process (or part of a process) can be swapped temporarily out of
memory to a backing store (e.g., disk) and then brought back into memory
for continued execution. This is how medium-level scheduling is used. If
you want to reload at a different address, execution time binding is
required.
The issue with swapping is the backing store - a fast disk large enough to
accomodate copies of all memory images for all users that must provide
direct access to these memory images is required. A major part of the
swap time overhead is the transfer time. The total transfer time is directly
proportional to the amount of memory swapped.
One of the more common swapping variants is called roll-out, roll-in and
it is used with priority-based scheduling algorithms. Lower-priority
processes are swapped out so higher-priority processes can be loaded and
executed.
Paging
The problems with using physical memory allocation with unequal, fixed
or variable sized partitions are fragmentation and inefficiencies
(algorithms to track holes, and compaction). Paging and segmentation are
memory allocation techniques that largely eliminate these problems.
With paging, the logical address generated by the CPU is divided into a
page number p and a page offset d. The page number is used as an index
into a page table which contains the base address of each page in physical
memory and the page offset is combined with the base address to define
the physical memory address that is sent to the memory unit.
Usually, page tables are a fixed length, often the same size as a frame. If a
process does not require all entries in the page table, the redundant entries
are set to be invalid. If a reference is made to an invalid page table entry,
the OS must handle a page fault and there is the potential for an erroneous
logical address to have been generated by the application process.
The page table is normally kept in main memory, so in this scheme every
data/instruction access requires two memory lookups, one for the page
table and then one for the actual instruction. The two memory access
problem can be solved using a special peice of hardware called a
translation look-aside buffer (TLB).
Often, having one page table per process can be expensive in terms of
physical memory. The alternative is the inverted page table, where there is
one entry for each real page of memory and the entry consists of the
virtual address of the page stored in that real memory location, with
information about the process that owns that page. This decreases
memory usage, but increases the time needed to search the table when a
page reference occurs. Hash tables can be used to limit the search to one,
or at most a few, page-table entries.
Segmentation
Paging seperates the user's view of memory from actual physical memory.
A user thinks of the program as a collection of segments. A segment is a
logical unit, such as: the main program, procedures, functions, methods,
objects, local variables, global variables, common blocks, the stack, the
symbol table and arrays. Segmentation is a memory management scheme
that supports a user view of memory.
Segmented Paging
Virtual Memory
Virtual memory is the seperation of user logical memory from physical
memory. Only part of the program needs to be in memory for execution (i.e.,
only the part needed for current execution - follows from the principle of
locality, where programs and data references inside a process tend to cluster
and only a few pieces of a process will be needed in any period of time). The
major advantages of virtual memory is that the logical address space can be
much larger than the physical address space and more processes can be partly
in physical memory at any time.
Virtual memory can be much larger that physical memory, with n pages of
virtual memory mapped to m pages of physical memory and other pages on
secondary storage.
Physical memory nowadays is very large, so virtual memory is only useful for
large data sets, and for compilation models.
Demand Paging
Demand paging brings a page into memory only when it is needed. This means
there is less I/O, less memory, a faster response time and more
multiprogramming.
When the first reference to a page occurs, the page will not be in memory
under this system, so a page fault occurs. This is a non-maskable interrupts
(can't be covered up by the user or OS) and it is synchronous, you need to
know the running process that caused the interrupt. This allows for the OS to
bring the appropriate page into main memory and for it to detect invalid
memory references - the OS contains a range of valid addresses for a process.
The process often does not need the full range of virtual memory, hence code
size, data size and stack size can be used to check if a memory reference is
valid. This information is stored on a per process basis in the TCB.
When a page fault occurs, the OS looks at the internal table to decide if the
process is permitted to make the memory reference and if it is, it finds an
empty frame and swaps the page into that frame, updates the page table and the
internal OS tables and continues the instruction that caused the page fault - as
with other standard interrupts.
Pure demand paging is where you start executing a program with no pages
loaded into memory. This will trigger a page fault immediately, however.
Page Replacement
When we want to load a page into memory, we try to find a free frame to
put the page into. If there is no free frame, then we use a page
replacement algorithm to select a victim frame, and then the victim is
copied back onto disk and the new frame is loaded into its place and the
page and frame tables are updated.
The aim for designing a page replacement algorithm is to give the lowest
page-fault rate. An algorithm is evaluated by running a particular string of
memory references and computing the number of page faults on that
string. An ideal page fault graph would look like:
Some algorithms, however give more page faults if the number of pages
is increased - this is called Belady's Anomaly.
The second chance algorithm uses a reference bit, which is initially set to
0 and then is updated to 1 when it is accessed. The second chance
algorithm uses a circular list of pages. When searching for a page to
replace, if a page has a reference bit one, it is set to 0 and the page left in
memory, and the next page in circular order is replaced subject to the
same rules.
The optimal algorithm replaces the page that will not be used for the
longest period of time. It is impossible to have perfect knowledge of
future events, but if you already know what's happening (as with a
reference string), you can use this algorithm to measure how well your
algorithm performs.
The OS may wish some frames to remain in physical memory frames, such as
parts of the OS itself, control structures, I/O buffers, etc..., which is why we
can use page and frame locking. When a frame is locked, it may not be
replaced. This usually requires a lock bit to be added with each frame, and is
usually supported in the OS, not hardware.
Frame Allocation
Again, there are different methods of allocating frames. If there are 100
frames and 5 processes, each process can be given 20 pages. This is called
equal allocation. A different way is called proportional allocation, where
allocation is given according to the size of the process: ai, the allocation
for process pi = si/S × m, where si = size of process pi, S = Σsi and m =
total number of frames.
Thrashing
If a process does not have enough frames allocated, the page fault rate can
get very high, so a lot of swapping back and forth between main memory
and secondary storage is done, and the result is thrashing, which is where
more work is done by the OS swapping and paging than is done by the
application in doing something useful. This has severe performance
implications.
This is not entirely accurate, as you cannot tell within an interval of 5000
when a page reference actually occured. You can improve this by
interrupting more frequently and increasing the number of reference bits,
but this has a greater overhead and reduces the time available to do useful
work.
The working set model is quite clumsy and expensive to implement, and
page-fault frequency is better. An "acceptable" page-fault rate is better,
and if the page fault rate travels outside of these bounds, then the process
loses a frame, and if the page-fault rate gets too high, the process gains a
frame.
Inter Process Communication
We need processes to co-operate to carry out particular jobs, e.g., one process
requests a service from another (such as requesting an I/O operation from the kernel,
as the kernel is still a process), pipelined processes working together (one process
can't proceed until the previous one has finished) and where multiple processes work
on the same task, for example one might read from disk to memory, and another
reads from memory to the audio device. This can be accomplished using inter
process communication (IPC).
The OS must solve some of the problems that IPC causes, however. It must provide
mechanisms to control interaction betwen the processes and must order accesses to
shared resources by processes. The OS must also provide mechanisms by which
resources can wait until the resource is available.
We now have a race condition, which is the situation where several processes access
and manipulate shared data concurrently and the final value of the shared data
depends on which process finishes last, so the data may become inconsistent. To
prevent race conditions and to ensure consistent data, concurrent processes must be
synchronised. This requires mechanisms to ensure the orderly execution of co-
operating processes.
Critical Sections
The final algorithm is a mixture of turn and flags, which waits for the turn of a
process that is waiting to enter its critical section to finish and then enter its
own. This meets all three requirements, and solves the critical section problem
for two processes.
However, we need to solve the problem for n processes, and one way of
implementing this is to use the Bakery (or Lamport) algorithm. Before entering
its critical section, a process receives a number, and the holder of the smallest
number enters its critical section. If processes i and j receive the same number,
then if i < j, then i is served first, else j is served first. The number scheme used
always generates numbers in an increasing order of enumeration.
The software algorithms discussed above for exclusion of processes from critical
sections are problematic as they rely on the continuous testing of variables (busy
waiting), which is inefficient and wasteful of CPU cycles. The protocols are
complex and unenforceable, relying on programmer discipline and co-operation.
The OS implements efficient IPC mechanisms instead.
Signals
Synchronous signals occur at the same place, every time in the program (e.g.,
SIGFPE, SIGILL, SIGSEGV) and are caused by the program itself.
Asynchronous signals may occur at any time from a source other than the
process (another process e.g., SIGKILL, a terminal driver on a key press e.g.,
SIGINT on Ctrl+C, on an alarm clock, e.g., SIGALRM) and a process can
choose how to handle the specific signal.
A signal can be treated in the following ways, it can be ignored (except for
SIGKILL), the default action set by the OS can be executed (the default action
is decided by the signal used, such as terminating the process, suspending the
process, terminating the process and saving a copy in memory, continue a
suspended process or just ignore), or a specific signal-catching function can be
invoked.
Semaphores
Semaphores are "a non-negative integer, which apart from initialisation can
only be acted on by two standard, atomic, uninterruptible operations, WAIT
(which decrements the value of the semaphore) and SIGNAL (which
increments the value)" - E. Dijkstra
You must guarantee that a maximum of one process can execute a WAIT or
SIGNAL operation on the same semaphore at the same time (atomicity).
There can be two types of semaphore, a counting semaphore (the integer value
can range over an unrestricted domain) and a binary semaphore, where the
binary value can range only between 0 and 1. This can be simpler to
implement.
Semaphores can be used to implement mutual exclusion for the critical section
of n processes. The shared data is a binary-semaphore mutex which is WAITed
on when a program wants to enter the critical section and then SIGNALed
when the critical section completes.
POSIX semaphores support two interfaces (Linux does not support these fully).
Unnamed semaphores are used within a process (or between related processes)
and are created by a call to sem_init(...) by a process. Operations
include sem_wait(...) for a wait and sem_post(...) for a signal.
sem_trywait(...) also exists, which works like wait if the semaphore has
a greater count than 0, but doesn't wait if the semaphore is 0, returning
immediately to the calling process. These semaphores are destroyed by a call to
sem_destroy(...), or if the creating process dies.
Named semaphores can be used between any process. The namespace is the
same as the filesystem (so the name of a semaphore is effectively a filename).
These are created using sem_create(...) and then can be manipulated as
above. sem_close(...) removes the link between a process and a
semaphore, and they can live on past the termination of the creating process, so
must be explictly destroyed using sem_destroy(...). Here, the OS
merely stores a list of processes waiting on a particular semaphore (i.e.,
processes that are blocked on a semaphore), but hardware support is still
required for atomicity.
Message Passing
With message passing, processes must name each other explicitly, such as:
send(P, message), to send a message to process P and receive(Q,
message) to receive a message from process Q.
With this method, links are established automatically and a link is associated
with exactly one pair of communicating processes, and between each pair their
exists exactly one link. The link may be unidirectional, but it is usually
bidirectional.
A method of indirect communication involves using messages being directed
and received from mailboxes. Each mailbox has a unique ID and processes can
only communicate only if they share a mailbox. This introduces the problem of
deciding who gets a message if a mailbox is shared, however. The primitives in
this method are: send(A, message) and receive(A, message),
which sends a receives messages from mailbox A.
Zero capacity - 0 messages are held, and the sender must wait for the
receiver
Bounded capacity - here there is a buffer with a finite length of n
messages and the sender must wait if the link is full
Unbounded capacity - infinite length, and the sender never waits
Client-Server Communication
In POSIX, message queues between processes are used. Messages are stored in
FIFO order, and the queue is set up so that either the process sends or receives
blocking (on the full or empty queue respectively), or a the process sends or
receives non-blocking. A process reading from an empty queue will block.
These processes are not implemented at the OS level, but usually as part of the
programming language. There is no atomicity at the application level.
Regions referring to the same shared variable exclude each other in time (i.e.,
mutual exclusion is enforced). When a process tries to execute the region
statement, the boolean expression B is evaluated. If B is true, S is executed, but
if B is false, the process is delayed until B becomes true and no other process is
in the region associated with v.
Critical regions are implemented using semaphores. With the shared variable x,
the following are associated: semaphore mutex, first-delay,
second-delay; int first-count, second-count;. Mutually
exclusive access to the critical section is provided by mutex. If a process can
not enter the critical section because B is false, it initially waits on first-
delay and is moved to second-delay before it is allowed to re-evaluate B.
A track is kept on the number of processes waiting on first-delay and second-
delay by using first-count and second-count respectively.
Monitors are high-level synchronisation constructs that allows the safe sharing
of an abstract data type among concurrent processes. The monitor encapsulates
shared data together with procedures that act upon that data. Only the
procedures can be accessed from outside the monitor; the data inside the
monitor can not be directly accessed from outside the monitor. Only one
process can be active inside a monitor at a time.
To allow a process to wait within a monitor, a condition variable must be
declared as: condition x, y;. The condition variable can only be used
with the operations wait and signal. [Link]() means that the process
invoking the operation is suspended until another process invokes
[Link](), which resumes exactly one suspended process. If no process is
suspended, then the signal operation has no effect.
User processes must always make their calls on the monitor in a correct
sequence.
You must ensure that an uncooperative process does not ignore the
mutual-exclusion gateway provided by the monitor and try to access the
shared resource directly, without using the access protocols.
Deadlocks
Deadlocks occur when a set of blocked processes are each holding a resource and
waiting to acquire a resource held by another process in the set. Deadlocks can be
characterised by the bridge-crossing problem:
There is only one lane across the bridge. We can view each section of the bridge as a
resource, and if a deadlock occurs, it can be resolved if one car backs up (pre-empt
resources and rollback), but sometimes that will require the cars following it to be
backed up also. Starvation can also occur, if only cars from one direction are
allowed to pass.
1. Mutual exclusion: only one process at a time can use the resource
2. Hold and wait: a process holding at least one resource is waiting to acquire
additional resources held by other processes
3. No preemption: a resource can only be released voluntarily by the process
holding it, after the process has completed its task
4. Circular wait: there exists a set {P0, P1, ..., Pn} of waiting processes such that
P0 is waiting for a resource that is held by P1, which is waiting for a resource
that is held by P2, ..., Pn - 1 is waiting for a resource that is held by Pn, that is
waiting for a resource that is held by P0.
We could plot a resource allocation graph which consists of a set of vertices V and a
set of edges E, such that V is partitioned into two types, P = {P1, P2, ..., Pn}, the set
of all processes in the system and R = {R1, R2, ..., Rm}, the set of all resource types
in the system. Each process utilises a resource using: requests, a directed edge Pi →
Rj, which if it can not be assigned immediately, the requesting process is blocked;
and resource use, indicated by a directed edge Rj → Pi. That is, the process uses the
resource. When a resource is released, edges are removed.
The above resource allocation graph has no deadlocks, as there are no circularities.
P3 has all the resources it needs; when it frees R3 then P2 can proceed; when P2
releases R1, then P1 can proceed.
The above resource allocation graph does have a deadlock, however. Mutual
exclusion holds over all of the resources, a number of processes hold resources and
are waiting for others and no process wants to release a resource. P3 holds R3 and
wants R2; P2 holds R1 and R2 and wants R3 and P1 holds R2 and wants R1. This is a
circular wait, and we have all the conditions for deadlock.
If a graph contains a cycle, the following basic rules can be applied: if the graph
contains no cycles, no deadlock occurs; if the graph contains a cycle, if there is only
one instance per resource type, then deadlock occurs, otherwise if there are several
instances per resource type, there is the possibility of deadlock.
Handling Deadlocks
The free basic methods for handling deadlocks are prevention/avoidance
(ensuring that the system will never enter a deadlock state), detection and
recovery (allows the system to enter a deadlock state and then recover from it)
and the head-in-the-sand approach (ignore the problem and pretend that
deadlocks never occur - this is the approach used by most OS, including UNIX,
Linux and Windows).
Mutual exclusion is not required for sharable resources, but it must hold
for non-sharable resources.
Hold and wait must guarantee that whenever a process requests a
resource, it does not hold any other resources. The process must request
and be allocated all of its resources before it begins execution, or the
process can only request resources when the process currently has none.
This does have the possibility of low resource utilisation and starvation.
No preemption - If a process that is holding some resources requests
another resource that can not be immediately allocated to it, then all
resources currently being held are released. Preempted resources are
added to the list of resources for which the process is waiting and the
process will only be restarted when it can regain its old resources, as
well as the new ones it is requesting.
Circular wait - impose a total ordering of all resource types and require
that each process requests resources in an increasing order of
enumeration.
Deadlock algorithms prevent deadlock, so deadlock will never occur if (at least
one of) the restrictions are enforced. But, there is a cost to this system. To
restrain how resource requests are made can have the potential for low device
utilisation and a reduced system throughput.
Systems States
Avoidance requires that the system will never enter an unsafe state, so an
algorithm must pick a safe sequence of events and processes for execution.
One such algorithm is the resource-allocation graph algorithm. A claim edge Pi
→ Rj indicates that process Pi may request resource Rj which is represented by
a dashed line. A claim edge is converted to a request edge when a process
requests a resource. When a resource is released by a process, the assignment
reconverts to a claim edge. Resources must be claimed a priori in the system.
File Systems
Files are an integral part of most computer systems. Input to applications can be by
means of a file, and output can be saved in a file for long-term storage. File systems
can store large amounts of data, and the storage is persistent. The file management
system is considered part of the OS.
The file managment system is a way a process may access files. The programmer
does not need to develop file management software, this is provided by the OS. The
objectives of a file management system are to meet the data management needs of
the user, to guarantee the data in the file is valid, to minimise the potential for lost or
destroyed data and to optimise performance. We can also break down what the user
requirements for data management are into:
To enable this, the file management system typically provides functions to indentify
and locate a selected file, to organise files in a structure (often accomplished using
directories) which describes the locations of all files and their attributes, and to
provide mechanisms for protection. The file management systems provides
secondary storage management, that is, it allocates files to free disk blocks and
manages free storage for available disk blocks.
Files
A file is a contiguous, logical addres space (apart from sparse files), however,
they may not be stored as a contiguous file. Files may be of different types, but
from the perspective of the OS, files have no structure - all a file is is a
sequence of machine words and bytes. However, from the user or application
perspective, files may have a simple record structure made up of lines (of fixed
or variable length) or complex structures, which have special control characters
in the structure to make up files of various types.
The OS only provides basic file operations on files, such as: create, write, read,
seek, delete and truncate, so more complex operations must be made up of
combinations of these operations, e.g., copy is a create, followed by a read
from a source file and a write to the new file.
File systems have criteria for file organisation, such as rapid access (especially
needed when accessing a single data item), ease of update (not always a concern,
however, e.g., CD-ROM), economy of storage (want to reduced wasted space,
although this is less of a concern in modern computing), simple maintenance and
reliability.
A fixed format is used for structuring data into records. Records are normally
the same length, and the fields in a record are in fixed order. One field (usually
the first) is a key, which uniquely identifies the record within that file. Usually,
records are stored in key order and batch updates are used to reduce the update
time (records are cached and then ordered before an update is done). This
method is designed for sequential access devices (tapes), but sequential files
can be implemented on random access devices, such as disks.
Additions to and deletions from the file can cause problems. The file is stored
in a sequential ordering of record keys, so insertion requires "moving" some of
the records on the storage medium. This is very expensive in time. Usually,
new records are placed in a new file, called the overflow, log or transaction file
and a periodic batch update occurs that merges the overflow file with the
original file. Alternatively, records are organised as a linked list. More
overhead in terms of storage is required, and additional processing is required
also. This only really works (on sequential storage devices) if the records
remain in key order. Deletion requests can also be stored in the log file.
To reduce slow, sequential access files, and to support random access storage
devices better, indexed sequential files can be used. The index provides the
lookup capability to get in the vicinity of a desired record faster. The index is
searched for a key field that is equal or less than the desired key value which
contains a pointer to the main file. The search then continues in the main file
by the location indicated by the pointer. The records are still stored in order,
and new records are added to an overflow file (a record in the main file
preceding the new record is updated to contain a pointer to the new record in
the overflow file). The overflow is then periodically merged with the main file
during a batch update.
Direct access files exploit the capability of random access devices (such as
disks) to access directly any address on the device. No sequential ordering of
data in the file is required, and this method can be used where rapid access is
required, such as in swapping or convential user files. It is not that useful for
backups or archives where only occasional access is required (these can
therefore be implemented on cheaper media, such as tapes).
Directories
Directories are used to organise files, and they themselves are files owned by
the OS. Directories contain information about files, such as name, attributes,
location, ownership, size, access/update times and protection information. They
provide a mapping between file names and the files themselves.
From a user perspective, they need to search for a file (by name or attribute),
create a file, delete a file, list a directory and rename a file, etc...
There are multiple ways of implementing directories. One such way is the
single-level directory, where a single directory exists for all users. However, if
all users share the same directory, they must ensure that all file names are
unique, and there is the grouping problem of little structure or organisation.
In a two-level directory setup, each user has their own directory, which allows
a user to have the same filename as another user, this allows for efficient
searching, but has the problem of there being no grouping capability between
users.
Tree structure prohibits the sharing of files and directories, however. Acyclic
graphs remove this restriction. They are more complex than tree, and two
different names can refer to the same file (aliasing), so there can be multiple
path names for the same file. This does introduce a deletion problem, however
- you can not delete a file until all references to it are solved. This is difficult,
so we can solve it using backpointers (so we can delete all the pointers to a
file), have backpointers in a daisy-chain organisation, or by using an entry-
hold-count solution.
Acyclic graphs are complex to manage, however. You need to ensure there are
no cycles, which links can introduce. Merely having subdirectories and files
can not introduce cycles. Another way of guaranteeing no cycles is to only
allow links to files, not subdirectories, or by using a cycle detection algorithm
every time a new link is added to determine whether or not it is okay (this is
expensive).
A file has to be opened before it is of any use, and similarly, a file system must be
mounted before any files or directories are available.
Protection
Protection exists to make sure that files are not read or modified by
unauthorised persons. Protection mechanisms are provided by the OS to
safeguard information inside the computer. Objects and the rights that they
have in particular contexts are identified in the aim to prohibit processes from
accessing objects they are not authorised to access.
A domain is associated with a set of pairs <objects, rights>, where rights will
be a subset of operations that are allowed on the object, and a domain could be
a user or a process.
The capability list is a row of the matrix, and a capability then is like an
abstract data type, that is, it points to an object and specifies which operations
are allowed on it (it may also have a field 'type' indicating what type of object
it is). The list then specifies what capabilities are accessible in this domain.
In UNIX, the file owner (by default, the creator) is able to control what should
be done, and by whom. Types of accesses include read, write and execute. In
UNIX, this is implemented as a bitmask with three domains, owner, group and
other. Groups in this case are created by a sysadmin, not the user.
File systems reside on secondary storage and contains both programs and data. It
also contains swapped out pages as part of the virtual memory management system.
The main issues to be addressed here are how to store files (a contiguous logical
structure) on secondary storage (disk) and mapping high level file operations on to
low-level disk operations, whilst requiring efficient use of the disk and fast
read/write times.
Disks provide a series of blocks, usually of a fixed size, which are sometimes related
to the memory frame/page size, often as a multiple. The number of file blocks can
be loaded into one physical memory frame.
Disks are random access, that is, you can directly address any given block on the
disk, and it is simple to access any file sequentially or randomly. For I/O efficiency,
transfers between memory and disk are often performed in units of blocks.
For convenient access to disks, the OS imposes a logical file system, which defines
how files look to the user (i.e., operations, file attributes, directories, etc...).
Algorithms and data structures are needed to map the logical file structure onto the
physical disk. This file organisation maps logical file structure onto physical block
addresses. A basic file system issues generic commands to a device driver to read
and write blocks. Finally, the I/O control contains device drivers to translate basic
file system commands into commands for a specific disk.
Several structures are used to implement file systems, such as on disk structures
including the boot control block, the partition control block, the directory structure
to organise the files and a per file file control block (FCB) containing permissions,
dates (access, create, modify), owner, group, size and blocks. In memory structures
are also used, such as the partition table, the directory structure (with recently
accessed directories) and an open file table (both system wide, and per process).
The above diagram shows the necessary file system structures provided by the OS to
open a file (a) and reading a file (b).
Virtual file systems exist so that a single interface can be provided by the OS to
many different types of file systems. Different file systems can be mounted on
different mount points within the same directory structure, and operations have
the same effect on all the different file systems.
This logical file system is called the Virtual File System, or VFS and it allows
the same system call interface (the API) to be used for different types of file
system. The API is to the VFS interface, rather than any specific type of file
system, and the VFS can distinguish between local and remote files.
Directory Implementation
Linked allocation solves the problems of contiguous allocation, but it does not
support random access tables without a FAT. This can be achieved with an
index table - all the pointers are brought together into an index block, which
exists per file, where the ith entry points to the ith block of file, and this gives
us random access. With large files, however, large index tables are needed,
which is a disadvantage; multi-level indexing can be introduced to combat this,
however. Additionally, the index table is an overhead.
Since disk space is limited, space needs to be reused from deleted files and free
space needs to be monitored. One way of implementing this is using a bit
vector which shows when a block is free or occupied. An alternative to this bit
vector is the linked list. A pointer is kept to the first free block, which contains
a pointer to the next free block, etc... This is inefficient, as to traverse the list
requires multiple disk block reads, however this is an uncommon operation, as
most operations require just one block. However, the linked list does not
require any extra free space, unlike the bit vector, although there is no easy
way to get contiguous space easily.
These methods can be modified using grouping, storing the addresses of the
next n free blocks in the first free block; next n addresses of free blocks in the
next free block, etc... - this is more efficient in the number of blocks that need
to be read; and counting, which takes advantage of the fact that free blocks
usually form together. The first free block contains the number of free blocks
immediately following (i.e., forming a contiguous free store), together with the
address of the next free block at the start of some contiguous free store.
This assumes a structure where all files have distinct absolute paths, using a
tree hierarchy. It is possible to set up cycles via symbolic links
The VPS provides uniform access to all file systems, including special files
(e.g., /proc, /dev).
In an i-node, the first few disk addresses are stored (which for small files, will
be all the disk addresses) and there are links to further i-nodes which contain
further disk addresses. This may go up to further indirection, which gives us a
very flexible system at the expense of complexity and storage space.
For directories, each entry contains a file name and an i-node number. Here,
the i-node contains information about the time, size, type, ownership and disk
blocks contained in the i-node. If you imagine that the file system knows the
root directory and wants to look up /usr/neil/ops/exam, then from the root i-
node, the system locates the i-node number of the file /usr, and then looks for
the next component neil, and so on. Relative paths work in the same way,
where there is a known current directory i-node.
The Ext2 Linux general purpose filesystem implements this, with a mostly
static allocation of i-nodes. The disk is split up into block groups of one or
more disk cylinders (i.e., those physically together) and blocks are used for i-
nodes and file data blocks. Bitmaps are used for keeping data blocks and i-
nodes in the group. When allocating disk blocks, related information is
attempted to be kept together (i.e., i-nodes of all files in a directory in the same
block group) and new directories are created in block groups with the smallest
number of directories.
This scheme works well as long as there is > 10% free space.
With journalling file systems, files are stored in both disc and main memory (a
memory cache of disk blocks is used for efficiency), however we have to
consider the possibility of what happens when the system fails. On the disk,
file system data structure may be corrupted, but we must ensure there is no loss
of data (i.e., consistency is maintained). Journaling (or log-based transaction)
file systems attempt to maintain consistency.
For a file operation, updates to the disk structure are written to a log, and when
complete, the changes are considered committed. The updates for a file
operation form a transaction, and actual updates to disk structures are
performed so that if there is a crash, it is known how much of the transaction
has been performed (and which other transactions remain in the log). ReiserFS
under Linux is an example of a journalling file system.
I/O
Handling I/O efficiently is important to the user, and therefore to the OS. All user
interaction with the machine is done using I/O and most user invoked operations
(i.e., applications) will use I/O at some point.
Devices tend to be accessible in the same place for a particular architecture and each
address range maps to the I/O control registers for the deviced accessed by special
I/O instructions. Some devices have both direct and memory-mapped address space
(e.g., graphics cards).
I/O ports typically consist of a status register, which contains the bits read by the
host that indiciate device state (e.g., data ready for a ready), a control register which
enables the host to change the mode of a device (e.g., from full to half-duplex for
serial ports), a data-in register which is read by the host to get input and the data-out
register, which is written to by the host for data to be sent to the device. This
introduces the issue of how does the host know when a particular condition has
arisen in the status register.
Polling is one way to solve this problem, and is a host initiated approach. The device
status register is polled until the wanted condition occurs, but this way is inefficient
as there is a busy-wait cycle to wait for I/O from a device. The alternative solution is
using interrupts, which is initiated by both the host and the device. A CPU interrupt
request line is triggered by the device and the interrupt handler receives interrupts
and uses an interrupt vector to dispatch an interrupt to the correct handler based on
priority. Most interrupts can be maskable to ignore or delay some interrupts, and the
interrupt mechanism can be used for exceptions. There is no need for busy-waiting
in this model.
In the IA32 architecture, interrupts 0-31 are unmaskable, as some are exceptions,
and 32-255 are maskable and are used for devices.
There are many different hardware devices, but application processes want a
common interface, but devices can vary in many dimensions: character (deals
with bytes or words of data, e.g., keyboards, mice, serial ports, etc and
commands to deal with this include get and put - often libraries are
implemented to allow in-line editing), stream (a stream of bits) or block
(devices deal with blocks of data, such as disks, commands include read, write,
seek and most raw I/O filesystem access uses this) transfer; sequential or
random access; sharable or dedicated device; device speed; read-write, read-
only or write-only. Timer devices also exist, which provide the current time,
elapsed time, or just a standard timer.
The I/O subsystem of the kernel implements I/O scheduling and some I/O
request ordering via a per-device queue, although some OSs do try to
implement fairness. Additionally, buffering is also implemented, where data is
stored in memory while it is transferred between devices (this copes with
device speed mismatch and transfer size mismatch and "copy semantics).
The kernel has data structures for things like keeping state info for I/O
components including open file tables, network connections and the character
device state, as well as many, many other data structures to track buffers,
memory allocation and "dirty" blocks. Some OSs use object-orrientated
methods and message passing to implement I/O.
Some OSs use a file based approach to kernel I/O structures. Devices are
named as files and accesses to devices are by file operations (with some
exceptions, e.g., seek in a character device may have no effect). In Linux, an
additional command ioctl() provides direct access to the device driver, perhaps
to set the device into a particular mode (e.g., changing speed or other
parameters of a serial port).
I/O Performance
The main challenge with direct access files is efficiently organizing and retrieving data without requiring structurally ordered storage, which can lead to complexities in locating and updating records . These files leverage random access capabilities of storage devices to address this by enabling direct retrieval of data blocks regardless of their physical location . Algorithms and data structures, such as hash tables, provide efficient mappings from logical to physical storage . This organization allows direct access files to provide rapid data retrieval and update capabilities.
Sequential file access reads records in order, which can be inefficient, especially for large files requiring mid-file access, as it necessitates starting from the file's beginning each time . Indexed sequential access improves efficiency by adding an index used to quickly locate desired records, reducing lookup times drastically. For example, with 1,000,000 records, sequential access may take 500,000 accesses on average, whereas indexed sequential access reduces this to about 1000 accesses by first locating record pointers via an index . Hence, indexed sequential access is much more efficient for large datasets requiring frequent non-sequential access.
The VFS facilitates interoperability by providing a uniform API for accessing various file systems, regardless of their type or underlying structure . It abstracts the complexities of different file systems, allowing users and applications to interact with files without concerning themselves with physical details. VFS manages files mounted at different points within the file structure and distinguishes between local and remote files, enabling seamless operations across disparate systems . This interface enhances flexibility and reduces the need for specialized handling of different file types within applications.
The multilevel feedback queue allows processes to move between various queues based on specific scheduling policies . It provides the flexibility to adjust process priorities dynamically, a key advantage for accommodating different process requirements and improving overall system performance. The inclusion of aging helps address starvation by ensuring that lower-priority processes eventually receive CPU time, as they get promoted to higher priority queues over time . This balances workload distribution effectively across different types of processes.
Dynamic linking defers the resolution of external symbols until program execution, unlike static linking, where all needed libraries are linked at compile time . The main benefits of dynamic linking include reduced memory usage since libraries are loaded only as needed during execution, leading to shared code and data across different programs . This results in decreased disk space and memory requirements, as multiple programs can share the same copy of a library, enhancing system efficiency and flexibility.
Paging addresses memory fragmentation by dividing the physical memory into fixed-size blocks called frames and the logical memory into pages that fit into these frames . This eliminates the need for contiguous memory allocation, thus tackling issues of external fragmentation. However, paging still results in internal fragmentation when a process does not perfectly use up a complete number of frames, but this is minimal compared to problems arising from contiguous memory allocation . By ensuring processes are given noncontiguous blocks of memory, paging effectively resolves the issues of fragmentation and enhances memory utilization.
Memory management allocates and manages both physical and virtual memory to ensure safe sharing of memory among processes . It provides protections to prevent processes from accessing each other's memory spaces, using techniques such as partitioning memory and implementing paging to manage workload effectively. These techniques help in time-sharing the system and multiprogramming the processor without interfering with each process's allocated memory space . Techniques like virtual memory allow processes to utilize more memory than the physically available memory, ensuring efficiency.
Using a hash table in directory implementation offers the advantage of decreased search time due to its efficient lookup capabilities, making it suitable for quick access to files . However, it introduces challenges related to hash collisions, where multiple file names map to the same hash location. This necessitates additional handling mechanisms, such as chained-overflow hash tables, to resolve conflicts effectively, adding overhead to maintenance . While offering speed advantages, the complexity and potential increase in storage due to collision resolution can be seen as drawbacks depending on the specific use case requirements.
Swapping is necessary when processes exceed available physical memory, requiring temporary storage of inactive process data on a backing store such as a disk . It allows the system to manage memory usage dynamically, making room for higher-priority processes by moving lower-priority ones out temporarily, known as roll-out, roll-in . While swapping enables better resource allocation and CPU time sharing, its impact on system performance includes overhead related to transfer time, which is proportional to the amount of memory swapped . Thus, while necessary for resource management, efficient swapping mechanisms are crucial to mitigate potential performance degradation.
In UNIX systems, processes can create a hierarchy or tree structure called a process group, where a parent process can create child processes, and each child can further spawn its own children . In contrast, Windows does not implement a process hierarchy in the same way; all processes are created as equals, without forming a structured tree . This difference affects how processes are managed and terminated, especially concerning resource allocation and signal broadcasting within the systems.