0% found this document useful (0 votes)
7 views22 pages

Disk Performance Optimization Techniques

The document discusses disk performance optimization, focusing on disk scheduling algorithms used by operating systems to manage I/O requests efficiently. It covers various algorithms such as FCFS, SSTF, SCAN, C-SCAN, and LOOK, explaining their advantages and disadvantages in terms of seek time, rotational latency, and response time. Additionally, it addresses file management, including file operations, structures, types, access mechanisms, and space allocation methods.

Uploaded by

Indrakumari
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
7 views22 pages

Disk Performance Optimization Techniques

The document discusses disk performance optimization, focusing on disk scheduling algorithms used by operating systems to manage I/O requests efficiently. It covers various algorithms such as FCFS, SSTF, SCAN, C-SCAN, and LOOK, explaining their advantages and disadvantages in terms of seek time, rotational latency, and response time. Additionally, it addresses file management, including file operations, structures, types, access mechanisms, and space allocation methods.

Uploaded by

Indrakumari
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

UNIT V

DEVICE AND INFORMATION MANAGEMENT DISK PERFORMACE OPTIMIZATION

Disk scheduling is done by operating systems to schedule I/O requests arriving for the disk. Disk
scheduling is also known as I/O scheduling.
Disk scheduling is important because:
• Multiple I/O requests may arrive by different processes and only one I/O request can be served at a
time by the disk controller. Thus other I/O requests need to wait in the waiting queue and need to be
scheduled.
• Two or more request may be far from each other so can result in greater disk arm movement.
• Hard drives are one of the slowest parts of the computer system and thus need to be accessed in an
efficient manner.
There are many Disk Scheduling Algorithms but before discussing them let’s have a quick look at some of
the important terms:
OPERATION OF MOVING-HEAD DISK STORAGE

• Data is recorded in a system using disk and can be access when needed.
• Data is recorded on a serious of magnetic disk or platters.
• These disks are connected by a common spindle that spins at very high speed. (30600 revaluation
per minutes)
• The data is access by the read / write head, one head per disk surface.
• A read / write head can access only the data adjacent to it.
• For accessing the data the disk must rotate until the read / write head is adjacent to it.
• The time it takes for the data to rotate from the current position to s position to read /write
head is called latency time.
• All read / write heads are attached to a single boom or moving arm assembly.
• The boom may move in or out.
• It moves in when the disk is to be read or write and
• Moves out, when the read / write operation is completed.

Figure: Schematic of a moving-head disk

73
Components of a disk access

• Seek Time:Seek time is the time taken to locate the disk arm to a specified track where the data is
to be read or write. So the disk scheduling algorithm that gives minimum average seek time is better.
• Rotational Latency: Rotational Latency is the time taken by the desired sector of disk to rotate into
a position so that it can access the read/write heads. So the disk scheduling algorithm that gives
minimum rotational latency is better.
• Transfer Time: Transfer time is the time to transfer the data. It depends on the rotating speed of
the disk and number of bytes to be transferred.
• Disk Access Time: Disk Access Time is:

Disk Access Time = Seek Time +


Rotational Latency +
Transfer Time

74
• Disk Response Time: Response Time is the average of time spent by a request waiting to perform
its I/O operation. Average Response time is the response time of the all requests. Variance Response
Time is measure of how individual request are serviced with respect to average response time. So the
disk scheduling algorithm that gives minimum variance response time is better.
NEED FOR DISK SCHEDULING

Disk Scheduling

• disk transfer speeds are limited primarily by seek times and rotational latency. When multiple
requests are to be processed there is also some inherent delay in waiting for other requests to be
processed.
• Bandwidth is measured by the amount of data transferred divided by the total amount of time from
the first request being made to the last transfer being completed, ( for a series of disk requests. )
• Both bandwidth and access time can be improved by processing requests in a good order.
• Disk requests include the disk address, memory address, number of sectors to transfer, and whether
the request is for reading or writing.

FCFS Scheduling

• First-Come First-Serve is simple and intrinsically fair, but not very efficient. Consider in
the following sequence the wild swing from cylinder 122 to 14 and then back to 124:

Figure - FCFS disk scheduling.

SSTF Scheduling

• Shortest Seek Time First scheduling is more efficient, but may lead to starvation if a
constant stream of requests arrives for the same general area of the disk.
• SSTF reduces the total head movement to 236 cylinders, down from 640 required for the
same set of requests under FCFS. Note, however that the distance could be reduced still
further to 208 by starting with 37 and then 14 first before processing the rest of the requests.

75
Figure - SSTF disk scheduling.

SCAN Scheduling

• The SCAN algorithm, a.k.a. the elevator algorithm moves back and forth from one end of
the disk to the other, similarly to an elevator processing requests in a tall building.

Figure - SCAN disk scheduling.

76
• Under the SCAN algorithm, if a request arrives just ahead of the moving head then it will be
processed right away, but if it arrives just after the head has passed, then it will have to wait
for the head to pass going the other way on the return trip.
• Consider, for example, when the head reaches the high end of the disk: Requests with high
cylinder numbers just missed the passing head, which means they are all fairly recent
requests,
• whereas requests with low numbers may have been waiting for a much longer time. Making
the return scan from high to low then ends up accessing recent requests first and making
older requests wait that much longer.

C-SCAN Scheduling

• The Circular-SCAN algorithm improves upon SCAN by treating all requests in a circular
queue fashion - Once the head reaches the end of the disk, it returns to the other end without
processing any requests, and then starts again from the beginning of the disk:

Figure - C-SCAN disk scheduling.

LOOK Scheduling

• LOOK scheduling improves upon SCAN by looking ahead at the queue of pending
requests, and not moving the heads any farther towards the end of the disk than is necessary.
The following diagram illustrates the circular form of LOOK:

77
Figure - C-LOOK disk scheduling.

Selection of a Disk-Scheduling Algorithm

• With very low loads all algorithms are equal, since there will normally only be one request
to process at a time.
• For slightly larger loads, SSTF offers better performance than FCFS, but may lead to
starvation when loads become heavy enough.
• For busier systems, SCAN and LOOK algorithms eliminate starvation problems.
• The actual optimal algorithm may be something even more complex than those discussed
here, but the incremental improvements are generally not worth the additional overhead.
• Some improvement to overall filesystem access times can be made by intelligent placement
of directory and/or inode information.
• On modern disks the rotational latency can be almost as significant as the seek time,
however it is not within the OSes control to account for that, because modern disks do not
reveal their internal sector mapping schemes, ( particularly when bad blocks have been
remapped to spare sectors. )
• Some disk manufacturers provide for disk scheduling algorithms directly on their
disk controllers, ( which do know the actual geometry of the disk as well as any
remapping ), so that if a series of requests are sent from the computer to the
controller then those requests can be processed in an optimal order.
• Unfortunately there are some considerations that the OS must take into account that
are beyond the abilities of the on-board disk-scheduling algorithms, such as priorities
of some requests over others, or the need to process certain requests in a particular
order. For this reason OSes may elect to spoon-feed requests to the disk controller
one at a time in certain situations.

78
SEEK OPTIMIZATION

1) FCFS (First-Come-First Served) Scheduling:

• In FCFS scheduling, the first request to arrive is the first one serviced. FCFS is fair in the same that
once a request has arrived, its place in the schedule is fixed. A request cannot be displaced because
of the arrival of a higher priority request.
• FCFS will actually do a lengthy seek to service a distant waiting request even though another
request may have just arrived on the same cylinder to which the read-write head is currently
positioned.
• It ignores the positional relationships among the pending requests in the queue. FCFS is acceptable
when the load on a disk is light. FCFS tend to saturate the device and response times become large.

2) SSTF(Shortest-Seek-Time-First) Scheduling:

• In SSTF Scheduling, the request that results in the shortest seek distance is serviced next, even if
that request is not the first one in the queue.
• SSTF is a cylinder –oriented scheme SSTF seek patterns tend to be highly localized with the result
that the innermost and outermost tracks can receive poor service compared with the mid-range
tracks.

• SSTF results in better throughput rates than FCFS, and mean response times tend to be lower for
moderate loads.
• One significant drawback is that higher variance occur on response times because of the
discrimination against the outermost and innermost tracks.
• SSTF is useful in batch processing systems where throughput is the major consideration. But its
high variance of response times (ie., its lack of predictability) makes it unacceptable in interactive
systems.

79
3) SCAN Scheduling:

• Denning developed the SCAN scheduling strategy to overcome the discrimination and high
variance in response times of SSTF.
• SCAN operates like SSTF except that it chooses the request that results in the shortest seek distance
in a preferred direction.
• If the preferred direction is currently outward, then the SCAN strategy chooses the shortest seek
distance in the outward direction.
• SCAN does not change direction until it reaches the outermost cylinder or until there are no further
requests pending in the preferred direction.
• It is sometimes called the elevator algorithm because an elevator normally continues in one
direction until there are no more requests pending and then it reverses direction.

• SCAN behaves very much like SSTF in terms of improved throughput and improved mean
response times, but it eliminates much of the discrimination inherent in SSTF schemes and offers
much lower variance.

4) N-STEP SCAN SCHEDULING:

• One interesting modification to the basic SCAN strategy is called N-STEP SCAN . In this strategy,
the disk arm moves back and forth as in SCAN except that it services only those requests waiting
when a particular sweep begins.
• N-STEP SCAN offers good performance in throughput and mean response time
• N-STEP SCAN avoids the possibility of indefinite postponement occurring if a large number of
requests arrive for the current cylinder.

80
5)C-SCAN SCHEDULING:
• Another interesting modification to the basic SCAN strategy is called C-SCAN (for circular
SCAN).
• The arm moves from the outer cylinder to the inner cylinder, servicing requests on a shortest-seek
basis.
• When the arm has completed its inward sweep, it jumps (without servicing requests) to the request
nearer the outermost cylinder, and then resumes its inward sweep processing requests.
• Thus C-SCAN completely eliminates the discrimination against requests for the innermost or
outermost cylinder.
• It has a very small variance in response times. At low loading, the SCAN policy is best. At medium
to heavy loading, C-SCAN yields the best results.

81
FILE AND DATABASE SYSTEM

Introduction

• File is a named collection of data it resides on a secondary storage device such as disk or tap.

File operations:

• Open: prepare a file to be referenced.

• Close: prevent file reference.

• Create: build a new file.

• Destroy: remove a file.

82
• Copy: create another version of file with a new name.

• Rename: change the name of a file.

• List: print or display the content of a file.

File Manipulation operations:

• Read: Input a data item to a process from a file.

• Write: Output a data item from a process to a file.

• Update: Modify an existing data item in a file.

• Insert: Add a new data item in to the file.

• Delete: Remove a data item in a file.

File are characterized by the following:

• Volatility: it refers to the frequency in which addition and deletion are mad in file.

• Activity: it refers to percentage of a files record accessed in a given times.

• Size: it refers to the amount of information stored the file.

THE FILE SYSTEM

83
File
A file is a named collection of related information that is recorded on secondary storage such as magnetic
disks, magnetic tapes and optical disks. In general, a file is a sequence of bits, bytes, lines or records
whose meaning is defined by the files creator and user.
File Structure
A File Structure should be according to a required format that the operating system can understand.
• A file has a certain defined structure according to its type.
• A text file is a sequence of characters organized into lines.
• A source file is a sequence of procedures and functions.
• An object file is a sequence of bytes organized into blocks that are understandable by the machine.
• When operating system defines different file structures, it also contains the code to support these
file structure. Unix, MS-DOS support minimum number of file structure.
File Type
File type refers to the ability of the operating system to distinguish different types of file such as text files
source files and binary files etc. Many operating systems support many types of files. Operating system
like MS-DOS and UNIX have the following types of files −

Ordinary files

• These are the files that contain user information.


• These may have text, databases or executable program.
• The user can apply various operations on such files like add, modify, delete or even remove the
entire file.

Directory files

• These files contain list of file names and other information related to these files.

Special files

• These files are also known as device files.


• These files represent physical device like disks, terminals, printers, networks, tape drive etc.
These files are of two types −
• Character special files − data is handled character by character as in case of terminals or printers.
• Block special files − data is handled in blocks as in the case of disks and tapes.
File Access Mechanisms
File access mechanism refers to the manner in which the records of a file may be accessed. There are
several ways to access files −

• Sequential access
• Direct/Random access

84
• Indexed sequential access

Sequential access

A sequential access is that in which the records are accessed in some sequence, i.e., the information in the
file is processed in order, one record after the other. This access method is the most primitive one.
Example: Compilers usually access files in this fashion.

Direct/Random access

• Random access file organization provides, accessing the records directly.


• Each record has its own address on the file with by the help of which it can be directly accessed for
reading or writing.
• The records need not be in any sequence within the file and they need not be in adjacent locations
on the storage medium.

Indexed sequential access

• This mechanism is built up on base of sequential access.


• An index is created for each file which contains pointers to various blocks.
• Index is searched sequentially and its pointer is used to access the file directly.
Space Allocation
Files are allocated disk spaces by operating system. Operating systems deploy following three main ways
to allocate disk space to files.

• Contiguous Allocation
• Linked Allocation
• Indexed Allocation

Contiguous Allocation

• Each file occupies a contiguous address space on disk.


• Assigned disk address is in linear order.
• Easy to implement.
• External fragmentation is a major issue with this type of allocation technique.

Linked Allocation

• Each file carries a list of links to disk blocks.


• Directory contains link / pointer to first block of a file.
• No external fragmentation
• Effectively used in sequential access file.
• Inefficient in case of direct access file.

85
Indexed Allocation

• Provides solutions to problems of contiguous and linked allocation.


• A index block is created having all pointers to files.
• Each file has its own index block which stores the addresses of disk space occupied by the file.
• Directory contains the addresses of index blocks of files.

FILE ORGANIZATION

• File organization refers to the way data is stored in a file. File organization is very important
because it determines the methods of access, efficiency, flexibility and storage devices to use. There
are four methods of organizing files on a storage media. This include:

• sequential,
• random,
• serial and
• indexed-sequential

Sequential file organization

• Records are stored and accessed in a particular order sorted using a key field.
• Retrieval requires searching sequentially through the entire file record by record to the end.
• Because the record in a file are sorted in a particular order, better file searching methods like the
binary search technique can be used to reduce the time used for searching a file .
• Since the records are sorted, it is possible to know in which half of the file a particular record being
searched is located, Hence this method repeatedly divides the set of records in the file into two
halves and searches only the half on which the records is found.
• For example, of the file has records with key fields 20, 30, 40, 50, 60 and the computer is searching
for a record with key field 50, it starts at 40 upwards in its search, ignoring the first half of the set.

Advantages of sequential file organization

• The sorting makes it easy to access records.


• The binary chop technique can be used to reduce record search time by as much as half the time
taken.

Disadvantages of sequential file organization

• The sorting does not remove the need to access other records as the search looks for particular
records.
• Sequential records cannot support modern technologies that require fast access to stored records.
• The requirement that all records be of the same size is sometimes difficult to enforce.

Random or direct file organization

• Records are stored randomly but accessed directly.


• To access a file stored randomly, a record key is used to determine where a record is stored on the
storage media.

86
• Magnetic and optical disks allow data to be stored and accessed randomly.

Advantages of random file access

• Quick retrieval of records.


• The records can be of different sizes.

Serial file organization

• Records in a file are stored and accessed one after another.


• The records are not stored in any way on the storage medium this type of organization is mainly
used on magnetic tapes.

Advantages of serial file organization

• It is simple
• It is cheap

Disadvantages of serial file organization

• It is cumbersome to access because you have to access all proceeding records before retrieving the
one being searched.
• Wastage of space on medium in form of inter-record gap.
• It cannot support modern high speed requirements for quick record access.

Indexed-sequential file organization method

• Almost similar to sequential method only that, an index is used to enable the computer to locate
individual records on the storage media. For example, on a magnetic drum, records are stored
sequential on the tracks. However, each record is assigned an index that can be used to access it
directly.

ALLOCATING AND FREEING SPACE

The allocation methods define how the files are stored in the disk blocks. There are three main disk space
or file allocation methods.
• Contiguous Allocation
• Linked Allocation
• Indexed Allocation

The main idea behind these methods is to provide:


• Efficient disk space utilization.
• Fast access to the file blocks.
All the three methods have their own advantages and disadvantages as discussed below:
1. Contiguous Allocation

In this scheme, each file occupies a contiguous set of blocks on the disk. For example, if a file requires n
blocks and is given a block b as the starting location, then the blocks assigned to the file will be: b, b+1,

87
b+2,……b+n-1. This means that given the starting block address and the length of the file (in terms of
blocks required), we can determine the blocks occupied by the file.
The directory entry for a file with contiguous allocation contains
• Address of starting block
• Length of the allocated portion.
The file ‘mail’ in the following figure starts from the block 19 with length = 6 blocks. Therefore, it
occupies 19, 20, 21, 22, 23, 24 blocks.

Advantages:
• Both the Sequential and Direct Accesses are supported by this. For direct access, the address of the
kth block of the file which starts at block b can easily be obtained as (b+k).
• This is extremely fast since the number of seeks are minimal because of contiguous allocation of
file blocks.
Disadvantages:
• This method suffers from both internal and external fragmentation. This makes it inefficient in
terms of memory utilization.
• Increasing file size is difficult because it depends on the availability of contiguous memory at a
particular instance.

2. Linked List Allocation


In this scheme, each file is a linked list of disk blocks which need not be contiguous. The disk blocks can
be scattered anywhere on the disk.
The directory entry contains a pointer to the starting and the ending file block. Each block contains a
pointer to the next block occupied by the file.
The file ‘jeep’ in following image shows how the blocks are randomly distributed. The last block (25)
contains -1 indicating a null pointer and does not point to any other block.

88
Advantages:

• This is very flexible in terms of file size. File size can be increased easily since the system does not
have to look for a contiguous chunk of memory.
• This method does not suffer from external fragmentation. This makes it relatively better in terms of
memory utilization.
Disadvantages:
• Because the file blocks are distributed randomly on the disk, a large number of seeks are needed to
access every block individually. This makes linked allocation slower.
• It does not support random or direct access. We can not directly access the blocks of a file. A block
k of a file can be accessed by traversing k blocks sequentially (sequential access ) from the starting
block of the file via block pointers.
• Pointers required in the linked allocation incur some extra overhead.

3. Indexed Allocation

In this scheme, a special block known as the Index block contains the pointers to all the blocks occupied
by a file. Each file has its own index block. The ith entry in the index block contains the disk address of the
ith file block. The directory entry contains the address of the index block as shown in the image:

89
Advantages:
• This supports direct access to the blocks occupied by the file and therefore provides fast access to
the file blocks.
• It overcomes the problem of external fragmentation.
Disadvantages:
• The pointer overhead for indexed allocation is greater than linked allocation.
• For very small files, say files that expand only 2-3 blocks, the indexed allocation would keep one
entire block (index block) for the pointers which is inefficient in terms of memory utilization.
However, in linked allocation we lose the space of only 1 pointer per block.
For files that are very large, single index block may not be able to hold all the pointers.
Following mechanisms can be used to resolve this:
1. Linked scheme: This scheme links two or more index blocks together for holding the pointers.
Every index block would then contain a pointer or the address to the next index block.
2. Multilevel index: In this policy, a first level index block is used to point to the second level index
blocks which inturn points to the disk blocks occupied by the file. This can be extended to 3 or more
levels depending on the maximum file size.
3. Combined Scheme: In this scheme, a special block called the Inode (information Node) contains
all the information about the file such as the name, size, authority, etc and the remaining space of
Inode is used to store the Disk Block addresses which contain the actual file as shown in the image
below.

90
FILE DESCRIPTOR

A file descriptor is a number that uniquely identifies an open file in a computer's operating system. It
describes a data resource, and how that resource may be accessed.
When a program asks to open a file — or another data resource, like a network socket — the kernel:

1. Grants access.
2. Creates an entry in the global file table.
3. Provides the software with the location of that entry.

• The descriptor is identified by a unique non-negative integer, such as 0, 12, or 567. At least one
file descriptor exists for every open file on the system.
• File descriptors were first used in Unix, and are used by modern operating systems
including Linux, macOS, and BSD. In Microsoft Windows, file descriptors are known as file
handles.

• When a process makes a successful request to open a file, the kernel returns a file descriptor
which points to an entry in the kernel's global file table. The file table entry contains information
such as the inode of the file, byte offset, and the access restrictions for that data stream (read-
only, write-only, etc.).

91
On a Unix-like operating system, the first three file descriptors, by default, are STDIN (standard input),
STDOUT (standard output), and STDERR (standard error).

File
Name Description Abbreviation
descriptor

The default data stream for input, for example in a command


Standard
0 pipeline. In the terminal, this defaults to keyboard input from the stdin
input
user.
Standard The default data stream for output, for example when a command
1 stdout
output prints text. In the terminal, this defaults to the user's screen.
Standard The default data stream for output that relates to an error occurring.
2 stderr
error In the terminal, this defaults to the user's screen.

Redirecting file descriptors

File descriptors may be directly accessed using bash, the default shell of Linux, macOS X, and Windows
Subsystem for Linux.

For example, when you use the find command, successful output goes to stdout (file descriptor 1), and error
messages go to stderr (file descriptor 2). Both streams display as terminal output:

ACCESS CONTROL MATRIX

• Access Matrix is a security model of protection state in computer system. It is represented as a matrix.

• Access matrix is used to define the rights of each process executing in the domain with respect to each
object.

92
• The rows of matrix represent domains and columns represent objects.

• Each cell of matrix represents set of access rights which are given to the processes of domain means
each entry(i, j) defines the set of operations that a process executing in domain Di can invoke on object
Oj.

F1 F2 F3 PRINTER

D1 read read

D2 print

D3 read execute

D4 read write read write

• According to the above matrix: there are four domains and four objects- three files(F1, F2, F3) and
one printer. A process executing in D1 can read files F1 and F3. A process executing in domain D4
has same rights as D1 but it can also write on files. Printer can be accessed by only one process
executing in domain D2. The mechanism of access matrix consists of many policies and semantic
properties. Specifically, We must ensure that a process executing in domain Di can access only
those objects that are specified in row i.
• Policies of access matrix concerning protection involve which rights should be included in the (i,
j)th entry. We must also decide the domain in which each process executes. This policy is usually
decided by the operating system. The Users decide the contents of the access-matrix entries.
• Association between the domain and processes can be either static or dynamic. Access matrix
provides an mechanism for defining the control for this association between domain and processes.
When we switch a process from one domain to another, we execute a switch operation on an
object(the domain). We can control domain switching by including domains among the objects of
the access matrix. Processes should be able to switch from one domain (Di) to another domain (Dj)
if and only is a switch right is given to access(i, j).

93
F1 F2 F3 PRINTER D1 D2 D3 D4

D1 read read switch

D2 print switch switch

D3 read execute

D4 read write read write switch

According to the matrix: a process executing in domain D2 can switch to domain D3 and D4. A process
executing in domain D4 can switch to domain D1 and process executing in domain D1 can switch to
domain D2.

94

You might also like