Key Issues in Physical Database Design
Key Issues in Physical Database Design
6
File Organization
The database design involves the process of logical design with the help of E-R diagram, normalisation,
etc., followed by the physical design.
The Key issues in the Physical Database Design are:
l The purpose of physical database design is to translate the logical description of data into the
technical specifications for storing and retrieving data for the DBMS.
l The goal is to create a design for storing data that will provide adequate performance and ensure
database integrity, security and recoverability.
Some of the basic inputs required for Physical Database Design are:
l Normalised relations
l Attribute definitions
l Data usage: entered, retrieved, deleted, updated
l Requirements for security, backup, recovery, retention, integrity
Copyright © 2014. Alpha Science International. All rights reserved.
l DBMS characteristics.
l Performance criteria such as response time requirement with respect to volume estimates.
However, for such data some of the Physical Database Design Decisions that are to be taken are:
l Optimising attribute data types.
l Modifying the logical design.
l Specifying the file organization.
l Choosing indexes.
Bhatia, Ashima Bhatnagar, and Ashima Bhatnagar Bhatia. Database Management System, Alpha Science International, 2014. ProQuest Ebook Central,
[Link]
Created from georgetown on 2023-02-21 16:28:53.
6.2 Database Management System
Physical Records
These are the records that are stored in the secondary storage devices. For a database relation, physical
records are the group of fields stored in adjacent memory locations and retrieved together as a unit.
Considering the page memory system, data page is the amount of data read or written in one I/O
operation to and from secondary storage device to the memory and vice versa. In this context we
define a term blocking factor that is defined as the number of physical records per page.
The issues relating to the Design of the Physical Database Files
Physical File is a file as stored on the disk. The main issues relating to physical files are:
l Constructs to link two pieces of data:
v Sequential storage.
v Pointers.
l File Organization: How the files are arranged on the disk?
l Access Method: How the data can be retrieved based on the file Organization?
Copyright © 2014. Alpha Science International. All rights reserved.
At this point, it is worthwhile to note the difference between the terms file Organization and the
access method. A file organization refers to the organization of the data of a file into records, blocks,
and access structures; this includes the way records and blocks are placed on the storage medium
and interlinked. An access method, on the other hand, is the way how the data can be retrieved based
on the file Organization.
Mostly the databases are stored persistently on magnetic disks for the reasons given below:
l The databases being very large may not fit completely in the main memory.
l Storing the data permanently using the non-volatile storage and provide access to the users with
the help of front end applications.
Bhatia, Ashima Bhatnagar, and Ashima Bhatnagar Bhatia. Database Management System, Alpha Science International, 2014. ProQuest Ebook Central,
[Link]
Created from georgetown on 2023-02-21 16:28:53.
File Organization 6.3
l Primary storage is considered to be very expensive and in order to cut short the cost of the
storage per unit of data to substantially less.
Each hard drive is usually composed of a set of disk platters. Each disk platter has a layer of
magnetic material deposited on its surface. The entire disk can contain a large amount of data, which
is organized into smaller packages called BLOCKS (or pages). On most computers, one block is
equivalent to 1 KB of data (= 1024 Bytes).
A block is the smallest unit of data transfer between the hard disk and the processor of the
computer. Each block therefore has a fixed, assigned, address. Typically, the computer processor
will submit a read/write request, which includes the address of the block, and the address of RAM
in the computer memory area called a buffer (or cache) where the data must be stored / taken from.
The processor then reads and modifies the buffer data as required, and, if required, writes the block
back to the disk. Let us see how the tables of the database are stored on the hard disk.
How are tables stored on Disk?
We realise that each record of a table can contain different amounts of data. This is because in some
records, some attribute values may be ‘null’. Or, some attributes may be of type varchar (), and
therefore each record may have a different length string as the value of this attribute. Therefore, the
record is stored with each subsequent attribute separated by the next by a special ASCII character
called a field separator. Of course, in each block, we may place many records. Each record is separated
from the next, again by another special ASCII character called the record separator. Let us see in the
next section about the types of file Organization briefly.
Just as arrays, lists, trees and other data structures are used to implement data Organization in main
memory, a number of strategies are used to support the Organization of data in secondary memory.
A file organization is a technique to organise data in the secondary memory. In this section, we are
concerned with obtaining data representation for files on external storage devices so that required
functions (e.g. retrieval, update) may be carried out efficiently.
File Organization is a way of arranging the records in a file when the file is stored on the disk. Data
files are organized so as to facilitate access to records and to ensure their efficient storage. A tradeoff
Copyright © 2014. Alpha Science International. All rights reserved.
between these two requirements generally exists: if rapid access is required, more storage is required
to make it possible. Selection of File Organizations is dependent on two factors as shown below:
l Typical DBMS applications need a small subset of the DB at any given time.
l When a portion of the data is needed it must be located on disk, copied to memory for processing
and rewritten to disk if the data was modified.
A file of record is likely to be accessed and modified in a variety of ways, and different ways of
arranging the records enable different operations over the file to be carried out efficiently. A DBMS
supports several file Organization techniques. The important task of the DBA is to choose a good
Organization for each file, based on its type of use.
The particular organization most suitable for any application will depend upon such factors as
the kind of external storage available, types of queries allowed, number of keys, mode of retrieval
and mode of update.
Bhatia, Ashima Bhatnagar, and Ashima Bhatnagar Bhatia. Database Management System, Alpha Science International, 2014. ProQuest Ebook Central,
[Link]
Created from georgetown on 2023-02-21 16:28:53.
6.4 Database Management System
File organization refers to the relationship of the key of the record to the physical location of that
record in the computer file. File organization may be either physical file or a logical file. A physical
file is a physical unit, such as magnetic tape or a disk.
One access key ?
No
Yes using secondary
also
ISAM VSAM
Binary
search tree
B
tree
B+
tree } Implementation
mechanism
Hashing
technique } Implementation
mechanism
A logical file on the other hand is a complete set of records for a specific application or purpose.
A logical file may occupy a part of physical file or may extend over more than one physical file.
The objectives of computer based file organization:
l Ease of file creation and maintenance
Efficient means of storing and retrieving information.
Copyright © 2014. Alpha Science International. All rights reserved.
Bhatia, Ashima Bhatnagar, and Ashima Bhatnagar Bhatia. Database Management System, Alpha Science International, 2014. ProQuest Ebook Central,
[Link]
Created from georgetown on 2023-02-21 16:28:53.
File Organization 6.5
location identification. It is used in applications like payroll management where the file is to be
processed in entirety, i.e., each record is processed. Here, to have an access to a particular record,
each record must be examined until we get the desired record.
Sequential files are normally created and stored on magnetic tape using batch processing method.
Bhatia, Ashima Bhatnagar, and Ashima Bhatnagar Bhatia. Database Management System, Alpha Science International, 2014. ProQuest Ebook Central,
[Link]
Created from georgetown on 2023-02-21 16:28:53.
6.6 Database Management System
Advantages:
l Simple to understand.
l Easy to maintain and organize
l Loading a record requires only the record key.
l Relatively inexpensive I/O media and devices can be used.
l Easy to reconstruct the files.
l The proportion of file records to be processed is high.
Disadvantages:
l Entire file must be processed, to get specific information.
l Very low activity rate stored.
l Transactions must be stored and placed in sequence prior to processing.
l Data redundancy is high, as same data can be stored at different places with different keys.
l Impossible to handle random enquiries.
Bhatia, Ashima Bhatnagar, and Ashima Bhatnagar Bhatia. Database Management System, Alpha Science International, 2014. ProQuest Ebook Central,
[Link]
Created from georgetown on 2023-02-21 16:28:53.
File Organization 6.7
reports are generated. The data is generated in the sequence in which it was recorded on the
storage media. The records must be accessed one after the other. The user cannot jump from one
record to another. It is not accessible to the CPU until it has been loaded on to an input device.
Punched cards are the earliest types of storage media which are sequential access.
2. Direct access storage and retrieval: It is also called as random access, is best suited to situations
in which only few records directly in no particular order. The most common approach is to
use a unique element of data called a key field or key contained in each record as a basis for
identifying the record and for determining which storage location on the disk the record should
be stored in or retrieved from.
3. Indexed sequential storage and retrieval: To access the data in either a sequential or direct
fashion, a third storage and retrieval methodology was developed. Indexed sequential access
method. It is used with direct access micro computer storage devices to provide maximum
Bhatia, Ashima Bhatnagar, and Ashima Bhatnagar Bhatia. Database Management System, Alpha Science International, 2014. ProQuest Ebook Central,
[Link]
Created from georgetown on 2023-02-21 16:28:53.
6.8 Database Management System
flexibility for processing and has proved to be the most flexible for business applications. In
this approach, each file contains an index of the records stored in it. This index is some what
like the index at the back of the book. When you want to look up something in the book, you
check the index for the item you want and locate the page number. In the case of on the disk.
The file can be indexed using any key field. Multiple indexes can be created on the same file
using one or more fields as primary key field.
Les us first reconsider the binary search tree. A BST is a data structure that has a property that all
the keys that are to the left of a node are smaller than the key value of the node and all the keys to
Copyright © 2014. Alpha Science International. All rights reserved.
the right are larger than the key value of the node.
To search a typical key value, you start from the root and move towards left or right depending
on the value of key that is being searched. Since an index is a <value, address> pair, thus while using
BST, we need to use the value as the key and address field must also be specified in order to locate the
records in the file that is stored on the secondary storage devices. The following figure demonstrates
the use of BST index for a University where a dense index exists on the enrolment number field.
Bhatia, Ashima Bhatnagar, and Ashima Bhatnagar Bhatia. Database Management System, Alpha Science International, 2014. ProQuest Ebook Central,
[Link]
Created from georgetown on 2023-02-21 16:28:53.
File Organization 6.9
Now, let us examine the suitability of BST as a data structure to implement index. A BST as
a data structure is very much suitable for an index, if an index is to be contained completely in the
primary memory. However, indexes are quite large in nature and require a combination of primary
and secondary storage. As far as BST is concerned it might be stored level by level on a secondary
storage which would require the additional problem of finding the correct sub-tree and also it may
require a number of transfers, with the worst condition as one block transfer for each level of a tree
being searched. This situation can be drastically remedied if we use B-Tree as data structure.
A B-Tree as an index has two advantages:
l It is completely balanced
l Each node of B-Tree can have a number of keys. Ideal node size would be if it is somewhat
equal to the block size of secondary storage.
The question that needs to be answered here is what should be the order of B-Tree for an index.
It ranges from 80-200 depending on various index structures and block size.
Let us recollect some basic facts about B-Trees indexes.
The basic B-tree structure was discovered by [Link] and [Link] (1970) of Bell Scientific
Research Labs and has become one of the popular structures for organising an index structure. Many
variations on the basic B-tree structure have been developed.
The B-tree is a useful balanced sort-tree for external sorting. There are strong uses of B-trees
in a database system as pointed out by D. Comer (1979): “While no single scheme can be optimum
for all applications, the techniques of organising a file and its index called the B-tree is the standard
Organization for indexes in a database system.”
A B-tree of order N is a tree in which:
l Each node has a maximum of N children and a minimum of the ceiling of [N/2] children.
However, the root node of the tree can have 2 to N children.
l Each node can have one fewer keys than the number of children, but a maximum of N-1 keys
can be stored in a node.
l The keys are normally arranged in an increasing order. All keys in the sub tree to the left of a
key are less than the key, and all the keys in the sub-tree to the right of a key are higher then
the value of the key.
Copyright © 2014. Alpha Science International. All rights reserved.
l If a new key is inserted into a full node, the node is split into two nodes, and the key with the
median value is inserted in the parent node. If the parent node is the root, a new root node is
created.
l All the leaves of B-tree are on the same level. There is no empty sub-tree above the level of the
leaves. Thus a B-tree is completely balanced.
To implement a database efficiently, there are several design tradeoffs needed. One of the most
important ones is the file organization. For example, if there were to be an application that required
only sequential batch processing, then the use of indexing techniques would be pointless and wasteful.
Bhatia, Ashima Bhatnagar, and Ashima Bhatnagar Bhatia. Database Management System, Alpha Science International, 2014. ProQuest Ebook Central,
[Link]
Created from georgetown on 2023-02-21 16:28:53.
6.10 Database Management System
There are several important consequences of an inappropriate file organization being used
in a database. Thus using replication would be wasteful of space besides posing the problem of
inconsistency in the data. The wrong file
Organization can also-
l Mean much larger processing time for retrieving or modifying the required record
l Require undue disk access that could stress the hardware
Needless to say, these could have many undesirable consequences at the user level, such as
making some applications impractical.
Copyright © 2014. Alpha Science International. All rights reserved.
Bhatia, Ashima Bhatnagar, and Ashima Bhatnagar Bhatia. Database Management System, Alpha Science International, 2014. ProQuest Ebook Central,
[Link]
Created from georgetown on 2023-02-21 16:28:53.