File Management
File System Architecture
Field
Records
Files
Database
Field
Roll No. Name Address Telephone …….
865472 Jhony North East 985647234 ….
P.
…. ….. ….. ….. …..
….. ….. ….. ….. …..
Record
FILE
File
File is a named collection of related
information that is recorded on a secondary
storage.
It made of fixed length logical records that
allow programs to read and write records
rapidly in no particular order.
File Concept
File system is the most visible aspect of an operating
system.
It consists of 2 parts: collection of files(each storing
related data) and a directory structure which
organizes and provides all the information about all
the files in your system.
File is the named collection of related information.
File is a sequence of bits, bytes, lines, or records, the
meaning of which is defined by the file’s creator and
user.
Commonly files represent programs(both source and
object forms) and data.
File Attributes
Name – only information kept in human-readable form.
Name is usually a string of characters such as example.c.
some systems differentiate between upper case and lower
case and some don’t.
Identifier – unique tag (number) identifies file within file
system
Type – needed for systems that support different types
Location – pointer to file location on device
Size – current file size
Protection – controls who can do reading, writing,
executing
Time, date, and user identification – data for protection,
security, and usage monitoring
Information about files are kept in the directory structure,
File Operations
File is an abstract data type. To define a file properly,
we need to consider some operations which can be
performed on files
Create
Write
Read
Reposition within file
Delete
Truncate
Open(Fi) – search the directory structure on disk for entry
Fi, and move the content of entry to memory
Close (Fi) – move the content of entry Fi in memory to
directory structure on disk
File Structure
Internal File Structure
External File structure
Operating system provides various file
structures for managing files.
Also provides set of special operations for
manipulating files with these file structures.
Access Methods
Files store information and this information
must be accessed and read into computer
memory.
The information in the file can be accessed in
several ways.
Sequential Access
Information is processed on one order, one record after the
other.
A read operation—read next—reads the next portion of the
file and automatically
advances a file pointer, which tracks the I/O location.
Similarly, the write
operation—write next—appends to the end of the file and
advances to the
end of the newly written material (the new end of file).
Direct Access
In direct access file is seen as a numbered
sequence of blocks or records.
Immediate access to large amount of
information.
The operations include:
read n, where n is the block number, rather
than read next.
write n, where n is the block number, rather
than write next.
position to n, where n is the block number.
Sequential Access • Direct Access
read next read n
write next write n
reset position to n
read next
write next
n = relative block number
Indexed Access Method
An index is associated with the file containing
pointers to the blocks.
To find the record in the file, we search the index
and then use the pointer to access the file
directly and to find the desired record.
Disk Structure
As the number of files increase the issue of
managing the files becomes an issue.
The organization is done in two parts, Firstly:
Disk can be subdivided into partitions
Disks or partitions can be RAID protected
against failure
Disk or partition can be formatted with a file
system
Partitions also known as minidisks, slices or
volume
Secondly, Each volume containing file system also
tracks that file system’s info in device directory
or volume table of contents.
A Typical File-system Organization
Operations Performed on Directory
Search for a file
Create a file
Delete a file
List a directory
Rename a file
Traverse the file system
Directory Structure
The directory contains information for all files.
Both the directory structure and the files reside on disk
Single-Level Directory
A single directory for all users
Easy to support and understand.
Limitation:
When number of files increases or when the system has more than one
user, then Naming problem occurs. All files should have unique names.
Two-Level Directory
In two level directory, each user has his own user file
directory(UFD). UFDs have the similar structure, but each
lists only the files of a single user.
Path name
Can have the same file name for different user
Efficient searching
Tree-Structured Directories
Tree-Structured Directories (Cont)
Absolute or relative path name
Creating a new file is done in current directory
Delete a file
rm <file-name>
Creating a new subdirectory is done in current directory
mkdir <dir-name>
Example: if in current directory /mail
mkdir count
mail
prog copy prt exp count
Deleting “mail” deleting the entire subtree rooted by “mail”
Acyclic-Graph Directories
Have shared subdirectories and files
General Graph Directory
Directory Implementation
1. Linear List
Simple to program
Time consuming to execute
Create new file
Make sure it’s a unique file name
Add entry at end of directory
Delete a file
Search for the file
Release the space allocated to it
Mark the space as unused or attach it to list of free
directory entries or copy the last directory entry here.
2. Hash Table
Linear list + hash structure
Compute hash value of file names and returns
pointer to file in the linear list
Decreases search time
Collisions are to be avoided in case two file
names hash to same location
Disadvantage – hash functions are of fixed size
(0-64 or 0-128 etc)
Allocation Methods
An allocation method refers to how disk
blocks are allocated for files so that disk
space is utilized effectively and files can be
accessed quickly.
Contiguous allocation
Linked allocation
Indexed allocation
Allocation Methods
1) Contiguous Allocation
In this scheme, each file occupies a contiguous set of blocks on
the disk.
For example, if a file requires n blocks and is given a block b as
the starting location, then the blocks assigned to the file will
be: b, b+1, b+2,……b+n-1.
This means that given the starting block address and the length
of the file (in terms of blocks required), we can determine the
blocks occupied by the file.
The directory entry for a file with contiguous allocation contains
Address of starting block
Length of the allocated portion.
The file ‘mail’ in the following figure starts from the block 19
with length = 6 blocks. Therefore, it occupies 19, 20, 21, 22, 23,
24 blocks.
Advantages:
Both the Sequential and Direct Accesses are supported by
this. For direct access, the address of the kth block of the
file which starts at block b can easily be obtained as
(b+k).
This is extremely fast since the number of seeks are
minimal because of contiguous allocation of file blocks.
Disadvantages:
This method suffers from both internal and external
fragmentation. This makes it inefficient in terms of
memory utilization.
Increasing file size is difficult because it depends on the
availability of contiguous memory at a particular instance.
2) Linked Allocation
In this scheme, each file is a linked list of disk blocks
which need not be contiguous. The disk blocks can
be scattered anywhere on the disk.
The directory entry contains a pointer to the starting
and the ending file block. Each block contains a
pointer to the next block occupied by the file.
The file ‘jeep’ in following image shows how the
blocks are randomly distributed. The last block (25)
contains -1 indicating a null pointer and does not
point to any other block.
Advantages:
This is very flexible in terms of file size. File size can be increased
easily since the system does not have to look for a contiguous
chunk of memory.
This method does not suffer from external fragmentation. This
makes it relatively better in terms of memory utilization.
Disadvantages:
Because the file blocks are distributed randomly on the disk, a
large number of seeks are needed to access every block
individually. This makes linked allocation slower.
It does not support random or direct access. We can not directly
access the blocks of a file. A block k of a file can be accessed by
traversing k blocks sequentially (sequential access ) from the
starting block of the file via block pointers.
Pointers required in the linked allocation incur some extra
overhead.
3) Indexed Allocation
In this scheme, a special block known as
the Index block contains the pointers to all the
blocks occupied by a file. Each file has its own
index block. The ith entry in the index block
contains the disk address of the ith file block. The
directory entry contains the address of the index
block as shown in the image:
Advantages:
This supports direct access to the blocks occupied by
the file and therefore provides fast access to the file
blocks.
It overcomes the problem of external fragmentation.
Disadvantages:
The pointer overhead for indexed allocation is greater
than linked allocation.
For very small files, say files that expand only 2-3
blocks, the indexed allocation would keep one entire
block (index block) for the pointers which is inefficient
in terms of memory utilization. However, in linked
allocation we lose the space of only 1 pointer per block.
Performance
Contiguous
Requires only 1 access to get a disk block
Linked
Requires i disk reads to read ith block
Indexed
Depends on
level of indexing
Size of file
Position of desired block
Free Space management
Disk space is limited, the space from deleted
files is reused for new files, if possible. To keep
track of free disk space, the system maintains a
free-space list.
1. Bit vector
2. Linked list
3. Grouping
4. Counting
Bit Vector
The free-space list is implemented as a bit
map or bit vector. Each block is
represented by 1 bit. If the block is free,
the bit is 1; if the block is allocated, the bit
is 0.
Linked List
Link together all the
free disk blocks, keeping
a pointer to the first free
block in a special
location on the disk
and caching it in
memory.
Grouping
Store the addresses of n free blocks in the
first free block.
The first n—1 of these blocks are actually
free.
The last block contains the addresses of
another n free blocks, and so on.
Counting
Keep the address of the first free block and
the number n of free contiguous blocks that
follow the first block.
Each entry in the free-space list then consists
of a disk address and a count.
Each entry requires more space than would a
simple disk address, overall list will be
shorter, as long as the count is generally
greater than 1.
Summary
Files
File Structure
Access Methods
Directory Structure
Directory Implementation
Allocation Methods
Free Space Management