Understanding File Systems and Operations
File Concept: A file is a collection of similar records. The file is treated as a single entity by
users and applications and may be referred to by name. Files have unique file names and may be
created and deleted. Access control restrictions usually apply at the file level.
A file is a container for a collection of information. The file manager provides a protection
mechanism that lets users control how processes executing on behalf of different users can
access the information in a file. File protection is a fundamental property of files because it
allows different people to store their information on a shared computer.
Files represent programs and data. Data files may be numeric, alphabetic, alphanumeric or
binary. Files may be free form, such as text files. In general, a file is a sequence of bits, bytes, lines
or records.
A common technique for implementing file types is to include the type as part of the file name.
The name is split into two parts: a name and an extension. Various file types are shown in the
following table.
For example, a telephone book is analogous to a file. It contains a list of records, each of which
consists of three fields: name, address, and telephone number.
File Attributes: A file has certain attributes, which vary from one operating system to another.
Name: Every file has a name by which it is referred to.
Identifier: A unique number that identifies the file within the file system.
Type: This information is needed for those systems that support different types of files.
Location: A pointer to a device and to the location of the file on that device.
Size: The current size of the file in bytes, words or blocks.
Protection: The access control information that determines who can read, write and
execute the file.
Time, date and user identification: Information about the time of creation, last
modification and last use.
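As an illustration of the attributes listed above, the following minimal Python sketch reads some of them through the standard os.stat call. The helper name describe and the temporary demo file are assumptions made for this example; which attributes are available, and what the identifier means, depends on the operating system.

```python
import os
import stat
import tempfile
import time


def describe(path):
    """Return a dictionary of common attributes for the file at `path`."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "identifier": info.st_ino,                  # inode number on POSIX systems
        "size_bytes": info.st_size,
        "protection": stat.filemode(info.st_mode),  # e.g. "-rw-------"
        "last_modified": time.ctime(info.st_mtime),
        "last_accessed": time.ctime(info.st_atime),
    }


# Create a small temporary file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("hello")
    demo_path = f.name

attributes = describe(demo_path)
for key, value in attributes.items():
    print(f"{key}: {value}")

os.remove(demo_path)
```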
File Operations: Any file system provides not only a means to store data organized as files,
but also a collection of functions that can be performed on files. Typical operations include the
following:
Creating a file: Two steps are necessary to create a file. First, space must be found for the
file in the file system. Second, an entry must be made in the directory for the new file.
Reading a file: Data are read from the file at the current position. The system must keep a
read pointer to know the location in the file from which the next read is to take place.
Once the read has taken place, the read pointer is updated.
Writing a file: Data are written to the file at the current position. The system must keep a
write pointer to know the location in the file where the next write is to take place. The
write pointer must be updated whenever a write occurs.
Repositioning within a file: The directory is searched for the appropriate entry and the
current file position is set to a given value. After repositioning, data can be read from or
written at that position. Repositioning within a file does not need to involve any actual
I/O. This file operation is also known as a file seek.
Deleting a file: To delete a file, we search the directory for the required file. After
deletion, the space is released so that it can be reused by other files.
Truncating a file: The user may want to erase the contents of a file but keep its attributes.
Rather than forcing the user to delete the file and then recreate it, this function allows all
attributes to remain unchanged, except for file length, and lets the file be reset to length
zero and its file space released.
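The operations above map closely onto the file API of most languages. The following is a minimal Python sketch, assuming a temporary directory and a hypothetical file name demo.txt; Python keeps a single current position per open file rather than separate read and write pointers.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo.txt")   # hypothetical file name

# Create and write: the current position advances as data is written.
with open(path, "w+b") as f:
    f.write(b"0123456789")
    write_position = f.tell()

    # Reposition (seek): only the current position changes.
    f.seek(3)

    # Read: data comes from the current position, which then advances.
    chunk = f.read(4)
    read_position = f.tell()

    # Truncate: keep the file but reset its length to zero.
    f.truncate(0)

size_after_truncate = os.path.getsize(path)

# Delete: remove the directory entry so the space can be reused.
os.remove(path)
exists_after_delete = os.path.exists(path)
os.rmdir(tmp_dir)

print(write_position, chunk, read_position, size_after_truncate, exists_after_delete)
```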
File Structure: File types can also be used to indicate the internal structure of the file. The
operating system requires that an executable file have a specific structure so that it can determine
where in memory to load the file and what the location of the first instruction is. If the
OS supports multiple file structures, the resulting size of the OS is large: if the OS defines 5
different file structures, it needs to contain the code to support all of them. Every OS must
support at least one structure, that of an executable file, so that the system is able to load and run
programs.
Internal File Structure: The UNIX OS defines all files to be simply streams of bytes. Each
byte is individually addressable by its offset from the beginning or end of the file. In this case,
the logical record size is 1 byte. The file system automatically packs and unpacks bytes into
physical disk blocks, say 512 bytes per block.
The logical record size, the physical block size and the packing technique determine how many
logical records are in each physical block. The packing can be done by the user’s application
program or by the OS. A file may be considered a sequence of blocks. If each block were 512 bytes,
a file of 1949 bytes would be allocated 4 blocks (2048 bytes). The last 99 bytes would be wasted.
This waste is called internal fragmentation. All file systems suffer from internal fragmentation;
the larger the block size, the greater the internal fragmentation.
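The block arithmetic in this example can be checked with a short Python sketch. The function names are assumptions made for illustration; the 4096-byte case is an extra data point showing how a larger block size increases the waste.

```python
def blocks_needed(file_size, block_size):
    """Number of whole blocks required to hold file_size bytes."""
    return (file_size + block_size - 1) // block_size   # integer ceiling


def internal_fragmentation(file_size, block_size):
    """Bytes allocated to the file but not used by it."""
    return blocks_needed(file_size, block_size) * block_size - file_size


print(blocks_needed(1949, 512))             # 4 blocks, i.e. 2048 bytes
print(internal_fragmentation(1949, 512))    # 99 bytes wasted
print(internal_fragmentation(1949, 4096))   # 2147 bytes wasted with larger blocks
```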
File Access Methods: Files store information. When it is used, this information must be
accessed and read into computer memory. The information in the file can be accessed in several
ways.
Sequential Access: It is the simplest access method. Information in the file is processed in
order, i.e. one record after another. A process can read all the data in a file in order, starting
from the beginning, but cannot skip ahead and read from an arbitrary location. Sequential files can
be rewound. This method was convenient when the storage medium was magnetic tape rather than disk.
E.g.: In a file consisting of 100 records, suppose the read/write head is currently at the 45th
record and we want to read the 75th record. The file is accessed sequentially: 45, 46, 47, ..., 74,
75. So the read/write head traverses all the records between 45 and 75.
Fig 1: Sequential access file (beginning = record 0, current position = record 45, target record = 75, end = record 100)
Direct Access: A file is made up of fixed-length logical records that allow programs to read and
write records rapidly in no particular order. This method can be used when disks are used for
storing files. It is used in many applications, e.g. database systems. If an airline
customer wants to reserve a seat on a particular flight, the reservation program must be able to
access the record for that flight directly without reading the records before it. In a direct access
file, there is no restriction on the order of reading or writing. For example, we can read block 14,
then read block 50 and then write block 7. Direct access files are very useful for immediate
access to large amounts of information.
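With fixed-length records, the byte offset of record n is simply n times the record size, which is what makes direct access possible. The sketch below assumes a hypothetical record layout (a 4-byte flight number followed by a 16-byte name) and uses an in-memory buffer in place of a disk file.

```python
import io
import struct

# Hypothetical fixed-length record: 4-byte flight number + 16-byte name.
RECORD = struct.Struct("<I16s")


def write_records(f, rows):
    """Append fixed-length records to the file object f."""
    for number, name in rows:
        f.write(RECORD.pack(number, name.encode("ascii")))


def read_record(f, n):
    """Direct access: jump straight to record n (0-based) and read it."""
    f.seek(n * RECORD.size)
    number, raw = RECORD.unpack(f.read(RECORD.size))
    return number, raw.rstrip(b"\x00").decode("ascii")


store = io.BytesIO()   # stands in for a file on disk
write_records(store, [(100 + i, f"FLIGHT-{i}") for i in range(100)])

# Records can be read in any order without touching the ones in between.
print(read_record(store, 14))
print(read_record(store, 50))
print(read_record(store, 7))
```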
Indexed Access: In this method an index is created which contains a key field and pointers to
the various blocks. To find an entry in the file for a key value, we first search the index and then
use the pointer to directly access a file and find the desired entry.
With large files, the index file itself may become too large to be kept in memory. One
solution is to create an index for the index file. The primary index file would contain pointers to
secondary index files, which would point to the actual data items. Figure 2 shows such a situation as
implemented by VMS (Virtual Memory System) index and relative files.
Fig 2: Example of index and relative files. Index file columns: last name (Gupta, Rohit, Ankur, ..., Singh) and record number. Relative file record: Singh, Ram; employee ID; age.
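The index lookup described above can be sketched in a few lines of Python. This is a minimal single-level index held in a dictionary; the record size, the employee IDs and the ages are hypothetical values chosen for the example, and an in-memory buffer stands in for the relative file.

```python
import io

RECORD_SIZE = 32   # hypothetical fixed record length in bytes


def build_file(rows):
    """Write fixed-length records and build an index: last name -> record number."""
    data = io.BytesIO()
    index = {}
    for record_number, (last_name, rest) in enumerate(rows):
        index[last_name] = record_number
        data.write(f"{last_name},{rest}".encode("ascii").ljust(RECORD_SIZE))
    return data, index


def lookup(data, index, last_name):
    """Search the index first, then go directly to the record it points to."""
    record_number = index[last_name]
    data.seek(record_number * RECORD_SIZE)
    return data.read(RECORD_SIZE).decode("ascii").rstrip()


# Names follow Fig 2; IDs and ages are made up for illustration.
relative_file, name_index = build_file([
    ("Gupta", "Rohit,1001,31"),
    ("Ankur", "Dev,1002,27"),
    ("Singh", "Ram,1003,45"),
])
print(lookup(relative_file, name_index, "Singh"))
```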
File Directories: A file system may contain millions of files, which makes the files
very hard to manage. To manage them, the files are grouped and each group is placed
in its own partition. Each partition is called a directory. A directory structure provides a
mechanism for organizing many files in the file system.
Operations on directories: The various operations that can be performed on a directory
are:
Searching for a file: We need to search a directory structure to find the entry for a
particular file. Since files have symbolic names in a user-readable form, and similar
names may indicate a relationship between files, we may want to be able to find all files
whose names match a particular pattern.
Create a file: New files are created and added to the directory.
Delete a file: When a particular file is no longer required, it is removed from the
directory.
List a directory: We may want to list all the files in a particular directory, along with the
contents of the directory entry for each file in the list.
Rename a file: Whenever we need to change the name of a file, we can change the
name.
Traverse the file system: In a directory structure, we may wish to access every directory
and every file within the directory structure.
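The directory operations listed above can be tried out with Python's standard library. The sketch below works in a temporary directory; all file and directory names are hypothetical.

```python
import fnmatch
import os
import shutil
import tempfile

root = tempfile.mkdtemp()

# Create: new files are added to the directory.
for name in ("notes.txt", "report.txt", "image.png"):
    with open(os.path.join(root, name), "w") as f:
        f.write("demo")

# List: all entries in the directory.
listing = sorted(os.listdir(root))

# Search: find all files whose names match a pattern.
text_files = sorted(fnmatch.filter(os.listdir(root), "*.txt"))

# Rename and delete.
os.rename(os.path.join(root, "notes.txt"), os.path.join(root, "todo.txt"))
os.remove(os.path.join(root, "image.png"))

# Traverse: visit every directory and every file below the root.
os.mkdir(os.path.join(root, "sub"))
with open(os.path.join(root, "sub", "deep.txt"), "w") as f:
    f.write("demo")
visited = []
for dirpath, dirnames, filenames in os.walk(root):
    for filename in filenames:
        visited.append(os.path.relpath(os.path.join(dirpath, filename), root))
visited.sort()

print(listing, text_files, visited)
shutil.rmtree(root)
```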
Directory Structure: The most common schemes for defining the structure of the directory
are:
(i) Single-level Directory: It is the simplest directory structure. All files are
contained in the same directory, which is easy to support and understand (figure 3).
Fig 3: Single-level directory (one directory with entries D1 to D7, each pointing to one of the files F1 to F7)
Advantages:
Since it is a single directory, its implementation is very easy.
If the directory holds few files, searching is fast.
Operations like file creation, searching, deletion and updating are very easy in such a
directory structure.
Disadvantages:
Name collisions may occur, because two files cannot have the same name.
Searching becomes time-consuming if the directory is large.
Files of the same type cannot be grouped together.
(ii) Two-level Directory: The disadvantage of a single-level directory is the confusion
of file names between different users. The solution to this problem is to create a
directory for each user, as shown in figure 4.
In the two-level directory structure, each user has his own user file directory (UFD).
Each UFD has a similar structure but lists only the files of a single user. When a user logs in, the
system’s master file directory (MFD) is searched. The master file directory is indexed by
user name or account number and each entry points to the user directory for that user.
When users refer to a particular file, only their own user file directory is searched. Thus
different users may have files with the same name, as long as all file names within each user
file directory are unique.
Fig 4: Two-level directory (the MFD has entries D1 to D8, each pointing to a user file directory that holds files such as F1 to F8)
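The two-step lookup (MFD first, then the user's UFD) can be modelled with nested dictionaries. This is only an illustrative data-structure sketch; the user names, file names and contents are hypothetical.

```python
# Master file directory: user name -> that user's file directory (UFD).
mfd = {
    "alice": {"test": "alice's data"},
    "bob": {"test": "bob's data"},
}


def open_file(user, file_name):
    """Look up the user's UFD in the MFD, then search only that UFD."""
    ufd = mfd[user]
    return ufd[file_name]


# The same file name can exist once per user without a collision.
print(open_file("alice", "test"))
print(open_file("bob", "test"))
```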
(iii) Tree-Structured Directory: The tree structure allows users to create their own
sub-directories and organize their files accordingly. The tree has a root directory. Every file in
the system has a unique path name. A path name is the path from the root through all the
sub-directories to a specified file. A directory contains a set of files and/or sub-directories.
Fig 5: Tree-structured Directory
Advantages:
Users can access other users’ files by specifying the path name.
Users can create their own sub-directories.
Searching becomes very easy; we can use both absolute and relative path names.
Disadvantages:
Not every file fits into the hierarchical model; a file may need to be saved in multiple
directories.
We cannot share files.
It is inefficient, because accessing a file may require traversing multiple directories.
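The difference between absolute and relative path names can be illustrated with Python's pathlib. The directory and file names below are hypothetical; PurePosixPath only manipulates path strings and does not touch the file system.

```python
from pathlib import PurePosixPath

current_directory = PurePosixPath("/home/alice/projects")

# An absolute path name starts at the root directory.
absolute = PurePosixPath("/home/alice/projects/os/notes.txt")

# A relative path name is interpreted from the current directory.
relative = PurePosixPath("os/notes.txt")
resolved = current_directory / relative

print(absolute.is_absolute(), relative.is_absolute())
print(resolved == absolute)
print(absolute.parts)   # the directories traversed from the root to the file
```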
(iv) Acyclic Graph Directories: A shared directory or file exists in the file system in
two or more places at once. A shared directory or file is not the same as two copies of the
file. With a shared file there is only one actual file, so any changes made by one person
are immediately visible to the other.
An acyclic graph allows directories to have shared sub-directories and files. The
same file or sub-directory may exist in two or more places in the file system at a
time. An acyclic graph directory structure is more flexible than a simple tree structure, but it is
also more complex.
Fig 6: Acyclic graph structured directory
Advantages:
We can share files.
Searching is easy because a file can be reached through different paths.
Multiple directories can contain the same file.
Disadvantages:
Files are shared via linking, so deleting a shared file may create problems.
We need to be cautious of dangling pointers when files are deleted.
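One common way to realize such sharing is a hard link, where two directory entries refer to the same underlying file. The sketch below assumes a file system that supports hard links (os.link may fail elsewhere); the directory and file names are hypothetical. Symbolic links, by contrast, are the kind of link that can be left dangling when the target is deleted.

```python
import os
import shutil
import tempfile

root = tempfile.mkdtemp()
dir_a = os.path.join(root, "a")
dir_b = os.path.join(root, "b")
os.mkdir(dir_a)
os.mkdir(dir_b)

original = os.path.join(dir_a, "shared.txt")
alias = os.path.join(dir_b, "shared.txt")

with open(original, "w") as f:
    f.write("v1")

# Second directory entry for the same file: not a copy.
os.link(original, alias)
link_count = os.stat(original).st_nlink

# A change made through one name is visible through the other.
with open(alias, "a") as f:
    f.write("+v2")
with open(original) as f:
    seen_via_original = f.read()

# Removing one name leaves the file reachable through the other.
os.remove(original)
still_reachable = os.path.exists(alias)

print(link_count, seen_via_original, still_reachable)
shutil.rmtree(root)
```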
Operating System Handout
File Types
2. Direct Access
• Sometimes it is not necessary to process every record in a file.
• It is also not always necessary to process the records in the order in which they are
stored. In all such cases, direct access is used.
• The disk is a direct access device, which allows random access to any
file block.
• The file is a collection of physical blocks, each of which holds records.
• Example: Databases are often of this type since they allow query processing that
involves immediate access to large amounts of information. All reservation systems fall
into this category.
In brief:
• This method is useful for disks.
• The file is viewed as a numbered sequence of blocks or records.
• There are no restrictions on which blocks are read or written; it can be done in any
order.
• The user now says "read n" rather than "read next".
• "n" is a number relative to the beginning of the file, not relative to an absolute
physical disk location.
Advantages:
• Direct access files help in online transaction processing (OLTP) systems such as an
online railway reservation system.
• In a direct access file, sorting of the records is not required.
• It accesses the desired records immediately.
• It updates several files quickly.
• It has better control over record allocation.
Disadvantages:
• A direct access file does not provide a backup facility.
• It is expensive.
• It uses storage space less efficiently than a sequential file.
Swapping:
• Swapping is a mechanism in which a process can be temporarily swapped (moved) out of
main memory to secondary storage (disk), making that memory
available to other processes.
• At some later time, the system swaps the process back from secondary
storage into main memory.
• Although performance is usually affected by swapping, it helps in
running multiple large processes in parallel, and for that reason
swapping is also known as a technique for memory compaction.
• Swap space is space on the hard disk that serves as a substitute for physical memory.
• It is used as virtual memory, which contains process memory images.
• Whenever the computer runs short of physical memory, it uses its virtual memory
and stores information from memory on disk.
• In contiguous allocation, given the starting block address and the length of the file (in
terms of blocks required), we can determine the blocks occupied by the file.
• The directory entry for a file with contiguous allocation contains
1. Address of the starting block
2. Length of the allocated portion.
• The file ‘mail’ in the following figure starts from block 19 with length = 6
blocks. Therefore, it occupies blocks 19, 20, 21, 22, 23 and 24.
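The block calculation for contiguous allocation is simple enough to sketch directly. The function names are assumptions made for illustration; the values for the file ‘mail’ come from the example above.

```python
def contiguous_blocks(start, length):
    """All disk blocks occupied by a contiguously allocated file."""
    return list(range(start, start + length))


def physical_block(start, length, logical_block):
    """Map a logical block number (0-based) to its disk block."""
    if not 0 <= logical_block < length:
        raise IndexError("logical block outside the file")
    return start + logical_block


# Directory entry for 'mail': starting block 19, length 6.
print(contiguous_blocks(19, 6))   # [19, 20, 21, 22, 23, 24]
print(physical_block(19, 6, 2))   # 21
```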
• In indexed allocation, each file has its own index block, which stores the addresses of the
disk blocks occupied by the file.
• The directory contains the addresses of the index blocks of files.
Advantages:
• This supports direct access to the blocks occupied by the file and therefore
provides fast access to the file blocks.
• It overcomes the problem of external fragmentation.
Disadvantages:
• The pointer overhead for indexed allocation is greater than for linked allocation.
• For very small files, say files that span only 2-3 blocks, indexed allocation
would keep one entire block (the index block) for the pointers, which is inefficient in
terms of storage utilization. In linked allocation, however, we lose the space of
only 1 pointer per block.
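The following Python sketch models indexed allocation with dictionaries and compares the pointer overhead with linked allocation for a small file. The file name, the block numbers, the 512-byte block size and the 4-byte pointer size are all hypothetical values chosen for the example.

```python
BLOCK_SIZE = 512     # hypothetical block size in bytes
POINTER_SIZE = 4     # hypothetical pointer size in bytes

# Directory: file name -> address of the file's index block.
directory = {"jeep": 7}
# Index block 7 lists the disk blocks of the file, in logical order.
index_blocks = {7: [9, 16, 1]}


def physical_block(file_name, logical_block):
    """Direct access: one lookup in the index block gives the disk block."""
    return index_blocks[directory[file_name]][logical_block]


def indexed_overhead(file_blocks):
    """One whole index block is reserved, however small the file is."""
    return BLOCK_SIZE


def linked_overhead(file_blocks):
    """Linked allocation loses one pointer per data block."""
    return file_blocks * POINTER_SIZE


print(physical_block("jeep", 1))                  # 16
print(indexed_overhead(3), linked_overhead(3))    # 512 versus 12 bytes
```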
1. Single-level directory –
• The single-level directory is the simplest directory structure.
• All files are contained in the same directory, which makes it easy to support and
understand.
• A single-level directory has a significant limitation, however, when the number
of files increases or when the system has more than one user.
• Since all the files are in the same directory, they must have unique names. If
two users both call their data set test, the unique-name rule is violated.
Advantages:
• Since it is a single directory, its implementation is very easy.
• If the directory holds few files, searching is fast.
• Operations like file creation, searching, deletion and updating are very easy in
such a directory structure.
Disadvantages:
• Name collisions may occur, because two files cannot have the same
name.
• Searching becomes time-consuming if the directory is large.
• Files of the same type cannot be grouped together.
2. Two-level directory –
• Since a single-level directory often leads to confusion of file names among
different users, the solution to this problem is to create a separate directory
for each user.
• In the two-level directory structure, each user has their own user file directory
(UFD).
• The UFDs have similar structures, but each lists only the files of a single user. The
system’s master file directory (MFD) is searched whenever a user
logs in.
• The MFD is indexed by username or account number, and each entry points to
the UFD for that user.
Advantages:
• We can give a full path like /User-name/directory-name/.
• Different users can have the same directory name as well as the same file name.
• Searching for files becomes easier due to path names and user grouping.
Disadvantages:
• A user is not allowed to share files with other users.
• It is still not very scalable; two files of the same type cannot be grouped together
for the same user.
3. Tree-structured directory –
• Once we have seen a two-level directory as a tree of height 2, the natural
generalization is to extend the directory structure to a tree of arbitrary height.
• This generalization allows users to create their own subdirectories and to
organize their files accordingly.
• A tree structure is the most common directory structure. The tree has a root
directory, and every file in the system has a unique path.
Advantages:
• Very general, since a full path name can be given.
• Very scalable; the probability of name collision is lower.
• Searching becomes very easy; we can use both absolute and relative paths.
Disadvantages:
• Not every file fits into the hierarchical model; a file may need to be saved in
multiple directories.
• We cannot share files.
• It is inefficient, because accessing a file may require traversing multiple directories.
Disk Organization:
The physical structure of a disk, a storage device, looks like this:
• The disk head can read or write data only when the desired area of the disk surface is
under the disk head.
• The Read-Write (R-W) head moves over the rotating hard disk.
• It is this Read-Write head that performs all the read and write operations on the
disk; hence, the position of the R-W head is a major concern.
• To perform a read or write operation on a disk location, we need to place the
R-W head over that position. Some important terms must be noted here:
1. Seek time – The time taken by the R-W head to reach the desired track from
its current position.
2. Rotational latency – The time taken by the desired sector to come under the R-W head.
3. Data transfer time – The time taken to transfer the required amount of data. It
depends upon the rotational speed.
4. Controller time – The processing time taken by the controller.
5. Average access time = seek time + average rotational latency + data
transfer time + controller time.
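The access-time formula can be evaluated numerically. The sketch below uses the standard result that the average rotational latency is half of one full rotation; the drive parameters (5 ms seek, 7200 rpm, 4096 bytes at 100 MB/s, 0.2 ms controller time) are hypothetical, and passing the transfer rate in directly is a simplifying assumption.

```python
def average_rotational_latency_ms(rpm):
    """On average the sector is half a rotation away from the head."""
    full_rotation_ms = 60_000 / rpm
    return full_rotation_ms / 2


def transfer_time_ms(bytes_to_transfer, bytes_per_second):
    """Time to move the requested data at the given sustained rate."""
    return 1000 * bytes_to_transfer / bytes_per_second


def average_access_time_ms(seek_ms, rpm, bytes_to_transfer,
                           bytes_per_second, controller_ms):
    """Seek time + average rotational latency + transfer time + controller time."""
    return (seek_ms
            + average_rotational_latency_ms(rpm)
            + transfer_time_ms(bytes_to_transfer, bytes_per_second)
            + controller_ms)


# Hypothetical drive parameters.
total = average_access_time_ms(5.0, 7200, 4096, 100_000_000, 0.2)
print(f"{average_rotational_latency_ms(7200):.3f} ms average rotational latency")
print(f"{total:.3f} ms average access time")
```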
Logically, a hard disk can be divided into the following five
areas:
• MBR (Master Boot Record)
• DBR (DOS Boot Record)
• FAT (File Allocation Tables)
• Root Directory
• Data Area
1. The Master Boot Record (or MBR)
• At the beginning of the hard drive is the MBR. When your computer starts using
your hard drive, this is where it looks first.
• The MBR itself has a specific organization. The size of the MBR is 512 bytes.
• The boot loader occupies the first 446 bytes of the MBR. This section contains
executable code.
• The partition table consists of 4 slots of 16 bytes each, each containing the description of a
partition (primary or extended) on the disk.
Here is how a partition is described:
• State of the partition (inactive or bootable) - (1 byte)
• Head at the beginning of the partition - (1 byte)
• Sector and cylinder at the beginning of the partition - (2 bytes)
• Type of partition (file system, e.g. FAT32, ext2, etc.) - (1 byte)
• Head at the end of the partition - (1 byte)
• Sector and cylinder at the end of the partition - (2 bytes)
• Number of sectors between the MBR and the first sector of the partition - (4
bytes)
• Number of sectors in the partition - (4 bytes)
• The Magic Number is two bytes used to determine whether the hard disk has a
bootloader or not. If it does, the magic number should be equal in value to
hexadecimal 55AA.
the hard disk drive into the main memory of the computer and give control of the
system to the loaded program.
• Previously, the root directory was fixed in size and located at a fixed
position on disk, but now it is treated as a file and is free to grow as
necessary.
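The MBR layout described above (446 bytes of boot code, four 16-byte partition entries and the 2-byte 55AA signature) can be decoded with Python's struct module. This is a minimal sketch run against a synthetic sector built in the example itself, not a real disk; the function name parse_mbr and the sample values are assumptions. The status value 0x80 (bootable), type 0x0C (a FAT32 variant) and type 0 (unused slot) follow the usual convention.

```python
import struct

MBR_SIZE = 512
BOOT_CODE_SIZE = 446
# status, start head+sector/cylinder (3 bytes), type,
# end head+sector/cylinder (3 bytes), first sector, sector count
ENTRY = struct.Struct("<B3sB3sII")   # 16 bytes per partition entry


def parse_mbr(sector):
    """Return the non-empty partition entries of a 512-byte MBR sector."""
    if len(sector) != MBR_SIZE:
        raise ValueError("an MBR is exactly 512 bytes")
    if sector[510:512] != b"\x55\xaa":
        raise ValueError("missing 55AA signature")
    partitions = []
    for slot in range(4):
        offset = BOOT_CODE_SIZE + slot * ENTRY.size
        status, _chs_first, kind, _chs_last, first, count = ENTRY.unpack_from(sector, offset)
        if kind == 0:          # type 0 conventionally marks an unused slot
            continue
        partitions.append({
            "slot": slot,
            "bootable": status == 0x80,
            "type": kind,
            "first_sector": first,
            "sector_count": count,
        })
    return partitions


# Build a synthetic MBR with one bootable partition for the demonstration.
sector = bytearray(MBR_SIZE)
ENTRY.pack_into(sector, BOOT_CODE_SIZE, 0x80, b"\x00\x00\x00", 0x0C,
                b"\x00\x00\x00", 2048, 1_000_000)
sector[510:512] = b"\x55\xaa"

partitions = parse_mbr(bytes(sector))
print(partitions)
```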
RAID 1: Also known as disk mirroring, this configuration consists of at least two drives
that duplicate the storage of data. There is no striping. Read performance is improved
since either disk can be read at the same time. Write performance is the same as for
single disk storage.
RAID 2: This configuration uses striping across disks, with some disks storing error
checking and correcting (ECC) information. It has no advantage over RAID 3 and is no
longer used.
RAID 3: This technique uses striping and dedicates one drive to storing parity
information. The embedded ECC information is used to detect errors. Data recovery is
accomplished by calculating the exclusive OR (XOR) of the information recorded on
the other drives. Since an I/O operation addresses all the drives at the same time, RAID
3 cannot overlap I/O. For this reason, RAID 3 is best for single-user systems with long
record applications.
RAID 4: This level uses large stripes, which means you can read records from any
single drive. This allows you to use overlapped I/O for read operations. Since all write
operations have to update the parity drive, no I/O overlapping is possible. RAID 4 offers
no advantage over RAID 5.
RAID 5: This level is based on block-level striping with parity. The parity information
is striped across each drive, allowing the array to function even if one drive were to fail.
The array's architecture allows read and write operations to span multiple drives. This
results in performance that is usually better than that of a single drive, but not as high
as that of a RAID 0 array. RAID 5 requires at least three disks, but it is often
recommended to use at least five disks for performance reasons.
RAID 6: This technique is similar to RAID 5, but includes a second parity scheme that
is distributed across the drives in the array. The use of additional parity allows the array
to continue to function even if two disks fail simultaneously. However, this extra
protection comes at a cost. RAID 6 arrays have a higher cost per gigabyte (GB) and
often have slower write performance than RAID 5 arrays.
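The XOR-based recovery mentioned for the parity levels can be demonstrated in a few lines. This sketch assumes three data blocks of equal size plus one parity block, with made-up contents; it shows only the parity arithmetic, not striping or any real RAID implementation.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks (hypothetical contents) and their parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Suppose the drive holding data[1] fails: XOR the survivors with the parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt)   # b'BBBB'
```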
Disk Scheduling:
When there are multiple I/O requests, the disk scheduling
algorithm must decide which request is
served first.
Disk Scheduling Algorithm:
Advantage:
o Easy to understand
o Easy to implement
o It can be used when the request load is light
Example answer: total head movement = 202 moves.
Fig: head movement over cylinders 0, 12, 14, 25, 34, 39, 52, 53, 68, 80, 90
SSTF (Shortest Seek Time First)
Disadvantage:
o Overhead of finding the closest request.
Example answer: total head movement = 97 moves.
Fig: head movement over cylinders 0, 12, 14, 25, 34, 39, 52, 53, 68, 80, 90
SCAN
Disadvantage:
o Long waiting time for locations just visited
by the head.
Example answer: total head movement = 115 moves.
Fig: head movement over cylinders 0, 12, 14, 25, 34, 39, 52, 53, 68, 80, 90
C-SCAN
Disadvantage:
o More risk compared to the simple SCAN
algorithm
Example answer: total head movement = 179 moves.
Fig: head movement over cylinders 0, 12, 14, 25, 34, 39, 52, 53, 68, 80, 90
LOOK
Advantage:
• Better performance compared to the SCAN
algorithm
Example answer: total head movement = 71 moves.
Fig: head movement over cylinders 0, 12, 14, 25, 34, 39, 52, 53, 68, 80, 90
C-LOOK
Fig: head movement over cylinders 0, 12, 14, 25, 34, 39, 52, 53, 68, 80, 90
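To make the comparison between the scheduling policies concrete, the following Python sketch computes the total head movement for first-come-first-served order, SSTF, SCAN, C-SCAN, LOOK and C-LOOK. The request queue, the starting head position (53) and the disk size (cylinders 0 to 199) are hypothetical and are not the handout's example, whose request list is not given here. The sketch assumes the head initially moves toward higher cylinders and that the return sweep in C-SCAN and C-LOOK counts as head movement.

```python
def fcfs(requests, head):
    """Serve requests in arrival order."""
    total = 0
    for r in requests:
        total += abs(r - head)
        head = r
    return total


def sstf(requests, head):
    """Always serve the pending request closest to the head."""
    pending, total = list(requests), 0
    while pending:
        nearest = min(pending, key=lambda r: abs(r - head))
        total += abs(nearest - head)
        head = nearest
        pending.remove(nearest)
    return total


def _split(requests, head):
    lower = sorted(r for r in requests if r < head)
    upper = sorted(r for r in requests if r >= head)
    return lower, upper


def scan(requests, head, max_cylinder):
    """Sweep up to the last cylinder, then reverse down to the lowest request."""
    lower, upper = _split(requests, head)
    if not lower:   # nothing below the head: stop at the last request
        return upper[-1] - head if upper else 0
    return (max_cylinder - head) + (max_cylinder - lower[0])


def c_scan(requests, head, max_cylinder):
    """Sweep up to the last cylinder, return to 0, then sweep up again."""
    lower, upper = _split(requests, head)
    if not lower:
        return upper[-1] - head if upper else 0
    return (max_cylinder - head) + max_cylinder + lower[-1]


def look(requests, head):
    """Like SCAN, but reverse at the highest request instead of the disk edge."""
    lower, upper = _split(requests, head)
    top = upper[-1] if upper else head
    total = top - head
    if lower:
        total += top - lower[0]
    return total


def c_look(requests, head):
    """Like C-SCAN, but travel only between the lowest and highest requests."""
    lower, upper = _split(requests, head)
    top = upper[-1] if upper else head
    total = top - head
    if lower:
        total += (top - lower[0]) + (lower[-1] - lower[0])
    return total


queue = [98, 183, 37, 122, 14, 124, 65, 67]   # hypothetical request queue
start, last = 53, 199
for name, moves in [("FCFS", fcfs(queue, start)), ("SSTF", sstf(queue, start)),
                    ("SCAN", scan(queue, start, last)),
                    ("C-SCAN", c_scan(queue, start, last)),
                    ("LOOK", look(queue, start)), ("C-LOOK", c_look(queue, start))]:
    print(f"{name}: {moves} moves")
```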