MODULE 4
FILE CONCEPT
● A file is a named collection of related information that is recorded on
secondary storage such as magnetic disks, magnetic tapes an optical disks.
● In general, a file is a sequence of bits, bytes, lines or records whose meaning
is defined by the files creator and user.
● A file can be defined as a data structure which stores the sequence of
records.
● Files are stored in a file system, which may exist on a disk or in the main
memory. Files can be simple (plain text) or complex (specially-formatted).
● The collection of files is known as Directory.
● The collection of directories at the different levels, is known as File System.
FILE ATTRIBUTES
• Name : Every file carries a name by which the file is recognized in
the file system.
• Identifier : Along with the name, Each File has its own extension
which identifies the type of the file. For example, a text file has the
extension .txt, A video file can have the extension .mp4
• Type : In a File System, the Files are classified in different types
such as video files, audio files, text files, executable files, etc.
• Location : In the File System, there are several locations on which,
the files can be stored.
• Size: The Size of the File is one of its most important attribute.
• Protection : The Admin of the computer may want the different
protections for the different files.
• Time and Date : Every file carries a time stamp which contains the
time and date on which the file is last modified.
FILE OPERATIONS
Create operation: This operation is used to create a file in the file
system. To create a new file of a particular type the associated
application program calls the file system. This file system allocates
space to the file. As the file system knows the format of directory
structure, so entry of this new file is made into the appropriate
directory.
Open operation: Once the file is created, it must be opened
before performing the file processing operations. When the user
wants to open a file, it provides a file name to open the particular
file in the file system. It tells the operating system to invoke the
open system call and passes the file name to the file system.
Write operation: A system call write is issued that specifies the
name of the file and the length of the data has to be written to the
file. Whenever the file length is increased by specified value and
the file pointer is repositioned after the last byte written.
Read operation: This operation reads the contents from a file. A
Read pointer is maintained by the OS, pointing to the position up
to which the data has been read.
Re-position or Seek operation: The seek system call re-positions
the file pointers from the current position to a specific place in the
file i.e. forward or backward depending upon the user's
requirement. This operation is generally performed with those file
management systems that support direct access files.
Delete operation: Deleting the file will not only delete all the data
stored inside the file it is also used so that disk space occupied by it
is freed. In order to delete the specified file the directory is
searched. When the directory entry is located, all the associated
file space and the directory entry is released.
Truncate operation: Truncating is simply deleting the file except
deleting attributes. The file is not completely deleted although the
information stored inside the file gets replaced.
Close operation: When the processing of the file is complete, it
should be closed so that all the changes made
File organization
File organization indicates how the records are organized in a file. There are
different types of organizations for files so as to increase their efficiency of
accessing the records. Following are the types of file organization schemes −
1. Sequential file organization
2. Indexed sequential file organization
Sequential file organization
● Records can be read in sequential order. For reading the 10th record, all the
previous 9 records should be read.
● Records are written in sequential order. A new record cannot be inserted in
between. A new record is always inserted at the end of the file.
● After placing a record into a sequential file, it is not possible to delete,
shorten, or lengthen a record.
● Order of the records, once inserted, can never be changed.
● Updation of record is possible. A record can be overwritten, if the new
record length is same as the old record length.
● Sequential output files are good option for printing
Indexed Access
● If a file can be sorted on any place, then an index can be
assigned to a group of certain records. However, A particular
record can be accessed by its index.
● The index is nothing but the address of a record in the file.
● In index accessing, searching in a large database became very
quick and easy but we need to have some extra space in the
memory to store the index value.
• Single Level index
• Multi Level index
DIRECTORY STRUCTURE
• Directory can be defined as the listing of the related files on the
disk. The directory may store some or the entire file attributes.
• To get the benefit of different file systems on the different
operating systems, A hard disk can be divided into the number of
partitions of different sizes. The partitions are also called volumes
or mini disks.
• Each partition must have at least one directory in which, all the files of
the partition can be listed.
• A directory entry is maintained for each file in the directory which
stores all the information related to that file.
• A directory can be viewed as a file which contains the Meta data of
the bunch of files.
[Link] Level Directory
The simplest method is to have one big list of all the files on the
disk. The entire system will contain only one directory which is
supposed to mention all the files present in the file
Advantages
• Implementation is very simple.
• If the sizes of the files are very small then the searching becomes
faster.
• File creation, searching, deletion is very simple since we have only
one directory.
Disadvantages
• We cannot have two files with the same name.
• The directory may be very big therefore searching for a file may take
so much time.
• Protection cannot be implemented for multiple
[Link] Level Directory
In two level directory systems, we can create a separate directory
for each user.
• There is one master directory which contains separate directories
dedicated to each user.
• For each user, there is a different directory present at the second
level, containing a group of user's file.
• The system doesn't let a user to enter in the other user's
directory without permission.
Characteristics of two level directory system
• Each files has a path name as /User-name/directory-name/
• Different users can have the same file name.
• Searching becomes more efficient as only one user's list needs to
be traversed.
• The same kind of files cannot be grouped into a single directory
for a particular user.
Every Operating System maintains a variable as PWD which
contains the present directory name (present user name) so that
the searching can be done appropriately.
[Link] Structured Directory
• In Tree structured directory system, any directory entry can either
be a file or sub directory.
Tree structured directory system overcomes the drawbacks of two level
directory system. The similar kind of files can now be grouped in one
directory.
• Each user has its own directory and it cannot enter in the other
user's directory. However, the user has the permission to read the
root's data but he cannot write or modify this. Only administrator
of the system has the complete access of root directory.
• Searching is more efficient in this directory structure. The concept
of current working directory is used. A file can be accessed by two
types of path, either relative or absolute.
● Absolute path is the path of the file with respect to the root
directory of the system while relative path is the path with
respect to the current working directory of the system.
In tree structured directory systems, the user is given the privilege to
create the files as well as directories.
FILE ALLOCATION METHODS
1. Contiguous Allocation
• Each file occupies a contiguous address space on disk.
• Assigned disk address is in linear order.
• Easy to implement. External fragmentation is a major issue with
this type of allocation technique.
2. Linked List Allocation
● Linked List allocation solves all problems of contiguous
allocation.
● In linked list allocation, each file is considered as the linked
list of disk blocks.
● The disk blocks allocated to a particular file need not to be
contiguous on the disk.
● Each disk block allocated to a file contains a pointer which
points to the next disk block allocated to the same file.
Advantages
• There is no external fragmentation with linked allocation.
• Any free block can be utilized in order to satisfy the file block
requests.
• File can continue to grow as long as the free blocks are available.
• Directory entry will only contain the starting block address.
Disadvantages
• Random Access is not provided.
• Pointers require some space in the disk blocks.
• Any of the pointers in the linked list must not be broken otherwise
the file will get corrupted.
• Need to traverse each block.
Indexed Allocation
• File allocation table tries to solve as many problems as possible but
leads to a drawback.
• Therefore, we need to allocate more space to a file allocation table.
Since, file allocation table needs to be cached therefore it is
impossible to have as many space in cache.
• Instead of maintaining a file allocation table of all the disk pointers,
Indexed allocation scheme stores all the disk pointers in one of the
blocks called as indexed block.
Indexed block doesn't hold the file data, but it holds the pointers to
all the disk blocks allocated to that particular file.
• Directory entry will only contain the index block address.
DISK SCHEDULING
• A process needs two type of time, CPU time and IO time.
• For I/O, it requests the Operating system to access the disk.
• However, the operating system must be fare enough to satisfy each
request and at the same time, operating system must maintain the
efficiency and speed of process execution.
• The technique that operating system uses to determine the request
which is to be satisfied next is called disk scheduling.
Important terms related to disk scheduling
Seek Time : Seek time is the time taken in locating the disk arm to
a specified track where the read/write request will be satisfied.
Rotational Latency : It is the time taken by the desired sector to
rotate itself to the position from where it can access the R/W
heads.
Transfer Time : It is the time taken to transfer the data.
Disk Access Time : Disk access time is given as,
Disk Access Time = Rotational Latency + Seek Time + Transfer Time
Disk Response Time : It is the average of time spent by each
request waiting for the IO operation.
DISK SCHEDULING ALGORITHMS
• The main purpose of disk scheduling algorithm is to select a disk
request from the queue of IO requests and decide the schedule
when this request will be processed.
FCFS SCHEDULING
• It is the simplest Disk Scheduling algorithm.
• It services the IO requests in the order in which they arrive. • There is
no starvation in this algorithm, every request is serviced.
Example
Example:
Suppose the order of request is- (82,170,43,140,24,16,190)
And current position of Read/Write head is: 50
So, total overhead movement (total distance covered by the disk arm) =
(82-50)+(170-82)+(170-43)+(140-43)+(140-24)+(24-16)+(190-16) =642
SSTF (Shortest Seek Time First)
In SSTF (Shortest Seek Time First), requests having the shortest seek
time are executed first. So, the seek time of every request is calculated in
advance in the queue and then they are scheduled according to their
calculated seek time. As a result, the request near the disk arm will get
executed first. SSTF is certainly an improvement over FCFS as it
decreases the average response time and increases the throughput of the
system. Let us understand this with the help of an example.
Example:
Shortest Seek Time First
Suppose the order of request is- (82,170,43,140,24,16,190)
And current position of Read/Write head is: 50
So,
total overhead movement (total distance covered by the disk arm) =
(50-43)+(43-24)+(24-16)+(82-16)+(140-82)+(170-140)+(190-170) =208
Advantages of Shortest Seek Time First
Here are some of the advantages of Shortest Seek Time First.
● The average Response Time decreases
● Throughput increases
Disadvantages of Shortest Seek Time First
Here are some of the disadvantages of Shortest Seek Time First.
● Overhead to calculate seek time in advance
● Can cause Starvation for a request if it has a higher seek time as
compared to incoming requests
● The high variance of response time as SSTF favors only some
requests
SCAN
In the SCAN algorithm the disk arm moves in a particular direction and
services the requests coming in its path and after reaching the end of the
disk, it reverses its direction and again services the request arriving in its
path. So, this algorithm works as an elevator and is hence also known as
an elevator algorithm. As a result, the requests at the midrange are
serviced more and those arriving behind the disk arm will have to wait.
Example:
SCAN Algorithm
Suppose the requests to be addressed are-82,170,43,140,24,16,190. And
the Read/Write arm is at 50, and it is also given that the disk arm should
move “towards the larger value”.
Therefore, the total overhead movement (total distance covered by the
disk arm) is calculated as
= (199-50) + (199-16) = 332
Advantages of SCAN Algorithm
Here are some of the advantages of the SCAN Algorithm.
● High throughput
● Low variance of response time
● Average response time
Disadvantages of SCAN Algorithm
Here are some of the disadvantages of the SCAN Algorithm.
● Long waiting time for requests for locations just visited by disk
arm
C-SCAN
In the SCAN algorithm, the disk arm again scans the path that has been
scanned, after reversing its direction. So, it may be possible that too
many requests are waiting at the other end or there may be zero or few
requests pending at the scanned area.
These situations are avoided in the CSCAN algorithm in which the disk
arm instead of reversing its direction goes to the other end of the disk
and starts servicing the requests from there. So, the disk arm moves in a
circular fashion and this algorithm is also similar to the SCAN algorithm
hence it is known as C-SCAN (Circular SCAN).
Example:
Circular SCAN
Suppose the requests to be addressed are-82,170,43,140,24,16,190. And
the Read/Write arm is at 50, and it is also given that the disk arm should
move “towards the larger value”.
So, the total overhead movement (total distance covered by the disk
arm) is calculated as:
=(199-50) + (199-0) + (43-0) = 391
Advantages of C-SCAN Algorithm
Here are some of the advantages of C-SCAN.
● Provides more uniform wait time compared to SCAN.
LOOK
LOOK Algorithm is similar to the SCAN disk scheduling algorithm
except for the difference that the disk arm in spite of going to the end of
the disk goes only to the last request to be serviced in front of the head
and then reverses its direction from there only. Thus it prevents the extra
delay which occurred due to unnecessary traversal to the end of the disk.
Example:
LOOK Algorithm
Suppose the requests to be addressed are-82,170,43,140,24,16,190. And
the Read/Write arm is at 50, and it is also given that the disk arm should
move “towards the larger value”.
So, the total overhead movement (total distance covered by the disk
arm) is calculated as:
= (190-50) + (190-16) = 314
C-LOOK
As LOOK is similar to the SCAN algorithm, in a similar way, C-LOOK
is similar to the CSCAN disk scheduling algorithm. In CLOOK, the disk
arm in spite of going to the end goes only to the last request to be
serviced in front of the head and then from there goes to the other end’s
last request. Thus, it also prevents the extra delay which occurred due to
unnecessary traversal to the end of the disk.
Example:
1. Suppose the requests to be addressed are-82,170,43,140,24,16,190.
And the Read/Write arm is at 50, and it is also given that the disk
arm should move “towards the larger value”
C-LOOK
So, the total overhead movement (total distance covered by the disk
arm) is calculated as
= (190-50) + (190-16) + (43-16) = 341