File Structures
Lecture 1
Dr. Neveen
COURSE OBJECTIVES
Study efficient techniques for organization and
manipulation of data in secondary storage.
• You will learn aspects of file manipulation:
– Basic file operations (create, read, write, etc.)
– Secondary storage devices.
• Then, you will learn the most important high-level file
structures tools (indexing, B-trees, etc.)
COURSE OUTLINE
1) Introduction to file management. Fundamental file
processing operations.
2) Secondary storage, physical storage devices: disks,
and buffer management.
3) Explain the concepts of records, record types, and
files, as well as the different techniques for placing
file records on disk.
4) Organizing files for performance: data compression
(Huffman and Lempel-Ziv codes), reclaiming
space in files, binary searching, and keysorting.
5) Indexing.
6) Describe the relationships among hashing,
compression, and efficient database searches.
7) Multilevel indexing and B trees.
Introduction
Computers exist to process data
(data represents
information)
Data processing from a computer science
perspective means:
– Storage of data.
– Organization of data. Understand Data Structures
– Access to data.
– Processing of data.
What is a file?
A file is a named collection of data that is
recorded on secondary storage.
Computer storage media
Primary Storage Secondary Storage
Storage media that can be Data cannot be processed
operated on directly by the directly by the computer CPU;
computer CPU. it must first be copied into
primary storage.
Fast access to data (since Slow (since electronic and
electronic). mechanical).
Small (since expensive). Large (cost less).
Volatile (data is lost when Stable, persistent (data is
power failure occurs). preserved longer).
Examples: main memory, and Examples: magnetic and
cache memory. optical disk, CD-ROM, DVD.
Why files are important?
Data cannot be written to secondary storage
unless they are within a file.
What is a file structure?
A file structure is a combination of representation
for data in files and of operations for accessing the
data.
Data structures versus File structures
• Both involve: representation of data and
operation for accessing data.
• Difference: data structures deal with data in main
memory, while file structures deal with data in
secondary storage (files).
Why file structures are important?
A user or programmer needs to know how the data are
organized, or structured and then how to find the
data.
Example:
• Databases are stored physically as files of records,
which are typically stored on magnetic disks.
• It is important to study and understand the properties
and characteristics of magnetic disks and the way
data files can be organized on disk in order to design
effective database with acceptable performance.
What are the objectives of file structures?
• Minimize number of trips to the disk in order
to get desired data. Ideally get what we need in
one disk access or get it with as few disk
accesses as possible.
• Grouping related data so that we are likely to
get everything we need only one trip to the
disk.
What is a file management system?
A file management system is that set of system
software that provides services to users and
applications related to the use of files.
Definition of File structure
• File Structure is a combination of
representations for data in files and of
operations for accessing the data. A file
structure allows applications to read, write,
and modify data.
Goal of file structure design
• Our goal is to show you how to think
creatively about file structure design
problem.
• We want to get the information with one
access or as few accesses as possible.
• We want our file structures to group
information so we are likely to get
everything we need with only one trip to
the disk.
• In 1963 researchers developed the self-
adjusting binary tree structure, called AVL tree
for data in memory and researchers apply it to
files but the problem was appear in the balance
of the tree which lead to dozens of accesses
required to fined the record.
• After 10 years in research they reach to “B-tree”
this solution provide excellent access
performance.
• After that the combination of B-tree and a
sequential linked list yield B+ tree.
• After that reach to hashing technique but it is not
practical with volatile and dynamic files.
File structure litracy
• Over the last three decades, watching file
structure design evolve as it addresses
dynamic files fist sequentially, then tree
structures and finally through direct
access, we see that the same design
problems and design tools keep emerging.
• We decrease the number of disk access
by collecting data into buffers, blocks or
buckets.
An Object oriented toolkit:
Making File structure usable
• To make the file structure more powerful and
uasable in application development it is
required an advanced programming
language employ an object oriented
approach.
• OOP approach in which data types and
operations are presented in a unified fashion
as class definitions.
• We here will use the C++ programming
language to give precise specifications to the
file structure classes.
Objects in C++
• In OOP data content and behavior are
integrated into a single design called class.
• Ex: class Person
{ public: char
lastName[11],firstName[11],address[16];
char City[16],state[3],Zipcode[10];
Person();
};
Access modifiers
• There are three levels of access to class
members:
• public
• private
• protected
• For class the default modifier(access) is
private but for structure is public.
Overleading
• Overloading of symbols allows a particular
symbol to have more than one meaning.
• Ex: the operators = and == overloaded to
the class String to assign and compare
between Strings.
• + is used for numeric types we can
overload it to combine tow strings …and
so on for other operators to facilitate deal
with classes.