0% found this document useful (0 votes)
9 views10 pages

Key Issues in Physical Database Design

Chapter 6 discusses physical database design, focusing on translating logical data descriptions into technical specifications for efficient storage and retrieval. Key considerations include optimizing data types, file organization, and access methods, while various file organization techniques such as heap files, sequential access, and direct access are examined. The chapter emphasizes the importance of balancing performance, storage efficiency, and data integrity in database management systems.

Uploaded by

abelahabtamu1063
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
9 views10 pages

Key Issues in Physical Database Design

Chapter 6 discusses physical database design, focusing on translating logical data descriptions into technical specifications for efficient storage and retrieval. Key considerations include optimizing data types, file organization, and access methods, while various file organization techniques such as heap files, sequential access, and direct access are examined. The chapter emphasizes the importance of balancing performance, storage efficiency, and data integrity in database management systems.

Uploaded by

abelahabtamu1063
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Chapter

6
File Organization

PHYSICAL DATABASE DESIGN ISSUES

The database design involves the process of logical design with the help of E-R diagram, normalisation,
etc., followed by the physical design.
The Key issues in the Physical Database Design are:
l The purpose of physical database design is to translate the logical description of data into the
technical specifications for storing and retrieving data for the DBMS.
l The goal is to create a design for storing data that will provide adequate performance and ensure
database integrity, security and recoverability.
Some of the basic inputs required for Physical Database Design are:
l Normalised relations
l Attribute definitions
l Data usage: entered, retrieved, deleted, updated
l Requirements for security, backup, recovery, retention, integrity
Copyright © 2014. Alpha Science International. All rights reserved.

l DBMS characteristics.
l Performance criteria such as response time requirement with respect to volume estimates.
However, for such data some of the Physical Database Design Decisions that are to be taken are:
l Optimising attribute data types.
l Modifying the logical design.
l Specifying the file organization.
l Choosing indexes.

Designing the Fields in the Database


The following are the considerations one has to keep in mind while designing the fields in the data base.

Bhatia, Ashima Bhatnagar, and Ashima Bhatnagar Bhatia. Database Management System, Alpha Science International, 2014. ProQuest Ebook Central,
[Link]
Created from georgetown on 2023-02-21 16:28:53.
6.2 Database Management System

l Choosing data type


l Coding, compression, encryption
l Controlling data integrity
l Default value
v Range control
v Null value control
v Referential integrity
l Handling missing data
v Substitute an estimate of the missing value
v Trigger a report listing missing values
v In programs, ignore missing data unless the value is significant.

Physical Records
These are the records that are stored in the secondary storage devices. For a database relation, physical
records are the group of fields stored in adjacent memory locations and retrieved together as a unit.
Considering the page memory system, data page is the amount of data read or written in one I/O
operation to and from secondary storage device to the memory and vice versa. In this context we
define a term blocking factor that is defined as the number of physical records per page.
The issues relating to the Design of the Physical Database Files
Physical File is a file as stored on the disk. The main issues relating to physical files are:
l Constructs to link two pieces of data:
v Sequential storage.
v Pointers.
l File Organization: How the files are arranged on the disk?
l Access Method: How the data can be retrieved based on the file Organization?
Copyright © 2014. Alpha Science International. All rights reserved.

STORAGE OF DATABASE ON HARD DISKS

At this point, it is worthwhile to note the difference between the terms file Organization and the
access method. A file organization refers to the organization of the data of a file into records, blocks,
and access structures; this includes the way records and blocks are placed on the storage medium
and interlinked. An access method, on the other hand, is the way how the data can be retrieved based
on the file Organization.
Mostly the databases are stored persistently on magnetic disks for the reasons given below:
l The databases being very large may not fit completely in the main memory.
l Storing the data permanently using the non-volatile storage and provide access to the users with
the help of front end applications.

Bhatia, Ashima Bhatnagar, and Ashima Bhatnagar Bhatia. Database Management System, Alpha Science International, 2014. ProQuest Ebook Central,
[Link]
Created from georgetown on 2023-02-21 16:28:53.
File Organization 6.3

l Primary storage is considered to be very expensive and in order to cut short the cost of the
storage per unit of data to substantially less.
Each hard drive is usually composed of a set of disk platters. Each disk platter has a layer of
magnetic material deposited on its surface. The entire disk can contain a large amount of data, which
is organized into smaller packages called BLOCKS (or pages). On most computers, one block is
equivalent to 1 KB of data (= 1024 Bytes).
A block is the smallest unit of data transfer between the hard disk and the processor of the
computer. Each block therefore has a fixed, assigned, address. Typically, the computer processor
will submit a read/write request, which includes the address of the block, and the address of RAM
in the computer memory area called a buffer (or cache) where the data must be stored / taken from.
The processor then reads and modifies the buffer data as required, and, if required, writes the block
back to the disk. Let us see how the tables of the database are stored on the hard disk.
How are tables stored on Disk?
We realise that each record of a table can contain different amounts of data. This is because in some
records, some attribute values may be ‘null’. Or, some attributes may be of type varchar (), and
therefore each record may have a different length string as the value of this attribute. Therefore, the
record is stored with each subsequent attribute separated by the next by a special ASCII character
called a field separator. Of course, in each block, we may place many records. Each record is separated
from the next, again by another special ASCII character called the record separator. Let us see in the
next section about the types of file Organization briefly.

FILE ORGANIZATION AND ITS TYPES

Just as arrays, lists, trees and other data structures are used to implement data Organization in main
memory, a number of strategies are used to support the Organization of data in secondary memory.
A file organization is a technique to organise data in the secondary memory. In this section, we are
concerned with obtaining data representation for files on external storage devices so that required
functions (e.g. retrieval, update) may be carried out efficiently.
File Organization is a way of arranging the records in a file when the file is stored on the disk. Data
files are organized so as to facilitate access to records and to ensure their efficient storage. A tradeoff
Copyright © 2014. Alpha Science International. All rights reserved.

between these two requirements generally exists: if rapid access is required, more storage is required
to make it possible. Selection of File Organizations is dependent on two factors as shown below:
l Typical DBMS applications need a small subset of the DB at any given time.
l When a portion of the data is needed it must be located on disk, copied to memory for processing
and rewritten to disk if the data was modified.
A file of record is likely to be accessed and modified in a variety of ways, and different ways of
arranging the records enable different operations over the file to be carried out efficiently. A DBMS
supports several file Organization techniques. The important task of the DBA is to choose a good
Organization for each file, based on its type of use.
The particular organization most suitable for any application will depend upon such factors as
the kind of external storage available, types of queries allowed, number of keys, mode of retrieval
and mode of update.

Bhatia, Ashima Bhatnagar, and Ashima Bhatnagar Bhatia. Database Management System, Alpha Science International, 2014. ProQuest Ebook Central,
[Link]
Created from georgetown on 2023-02-21 16:28:53.
6.4 Database Management System

File organization refers to the relationship of the key of the record to the physical location of that
record in the computer file. File organization may be either physical file or a logical file. A physical
file is a physical unit, such as magnetic tape or a disk.
One access key ?

No
Yes using secondary
also

Sequential access only


Yes No
Multi-key organization
Sequential Direct access
Organisation only?
Yes No
Inverted Multi-list
file Cellular
Direct Index file Structure
Organisation Structure Structure
Sequential
File Ring file
Organisation
Technology (Sequential)
Technology Technology
dependent independent

ISAM VSAM

Binary
search tree
B
tree
B+
tree } Implementation
mechanism
Hashing
technique } Implementation
mechanism

Fig. 6.1 File organization

A logical file on the other hand is a complete set of records for a specific application or purpose.
A logical file may occupy a part of physical file or may extend over more than one physical file.
The objectives of computer based file organization:
l Ease of file creation and maintenance
Efficient means of storing and retrieving information.
Copyright © 2014. Alpha Science International. All rights reserved.

The various file organization methods are:


l Heap files
l Sequential access.
l Direct or random access.
l Index sequential access.
l Hash files
The selection of a particular method depends on:
l Type of application.
l Method of processing.

Bhatia, Ashima Bhatnagar, and Ashima Bhatnagar Bhatia. Database Management System, Alpha Science International, 2014. ProQuest Ebook Central,
[Link]
Created from georgetown on 2023-02-21 16:28:53.
File Organization 6.5

l Size of the file.


l File inquiry capabilities.
l File volatility.
l The response time.

Heap Files (Unordered File)


Basically these files are unordered files. It is the simplest and most basic type. These files consist of
randomly ordered records. The records will have no particular order. The operations we can perform on
the records are insert, retrieve and delete. The features of the heap file or the pile file organization are:
l New records can be inserted in any empty space that can accommodate them.
l When old records are deleted, the occupied space becomes empty and available for any new
insertion.
l If updated records grow; they may need to be relocated (moved) to a new empty space. This
needs to keep a list of empty space.
Advantages of heap files
1. This is a simple file organization method.
2. Insertion is somehow efficient.
3. Good for bulk-loading data into a table.
4. Best if file scans are common or insertions are frequent.
Disadvantages of heap files
1. Retrieval requires a linear search and is inefficient.
2. Deletion can result in unused space/need for reorganization.

Sequential Access Method


Here the records are arranged in the ascending or descending order or chronological order of a key
field which may be numeric or both. Since the records are ordered by a key field, there is no storage
Copyright © 2014. Alpha Science International. All rights reserved.

location identification. It is used in applications like payroll management where the file is to be
processed in entirety, i.e., each record is processed. Here, to have an access to a particular record,
each record must be examined until we get the desired record.

Fig. 6.2 Sequential access method

Sequential files are normally created and stored on magnetic tape using batch processing method.

Bhatia, Ashima Bhatnagar, and Ashima Bhatnagar Bhatia. Database Management System, Alpha Science International, 2014. ProQuest Ebook Central,
[Link]
Created from georgetown on 2023-02-21 16:28:53.
6.6 Database Management System

Advantages:
l Simple to understand.
l Easy to maintain and organize
l Loading a record requires only the record key.
l Relatively inexpensive I/O media and devices can be used.
l Easy to reconstruct the files.
l The proportion of file records to be processed is high.
Disadvantages:
l Entire file must be processed, to get specific information.
l Very low activity rate stored.
l Transactions must be stored and placed in sequence prior to processing.
l Data redundancy is high, as same data can be stored at different places with different keys.
l Impossible to handle random enquiries.

Direct Access Files Organization


Also known as Random or relative organization. Files in his type are stored in direct access storage
devices such as magnetic disk, using an identifying key. The identifying key relates to tits actual
storage position in the file. The computer can directly locate the key to find the desired record without
having to search through any other record first. Here the records are stored randomly, hence the name
random file. It uses online system where the response and updation are fast.
Advantages:
l Records can be immediately accessed for updation.
l Several files can be simultaneously updated during transaction processing.
l Transaction need not be sorted.
l Existing records can be amended or modified.
Copyright © 2014. Alpha Science International. All rights reserved.

l Very easy to handle random enquiries.


l Most suitable for interactive online applications.
Disadvantages:
l Data may be accidentally erased or over written unless special precautions are taken.
l Risk of loss of accuracy and breach of security. Special backup and reconstruction procedures
must be established.
l Les efficient use of storage space.
l Expensive hardware and software are required.
l High complexity in programming.
l File updation is more difficult when compared to that of sequential method.

Bhatia, Ashima Bhatnagar, and Ashima Bhatnagar Bhatia. Database Management System, Alpha Science International, 2014. ProQuest Ebook Central,
[Link]
Created from georgetown on 2023-02-21 16:28:53.
File Organization 6.7

Indexed Sequential Access Organization


Here the records are stored sequentially on a direct access device i.e., magnetic disk and the data is
accessible randomly and sequentially. It covers the positive aspects of both sequential and direct access files.
The type of file organization is suitable for both batch processing and online processing.
Here, the records are organized in sequence foe efficient processing of large batch jobs but an index
is also used to speed up access to the records.
Indexing permit access to selected records without searching the entire file.
Advantages:
l Permits efficient and economic use of sequential processing technique when the activity rate is
high.
l Permits quick access to records, in a relatively efficient way when this activity is a fraction of
the work load.
Disadvantages:
l Slow retrieval, when compared to other methods.
l Does not use the storage space efficiently.
l Hardware and software used are relatively expensive.

Various Data Storage and Retrieval Methods


It deals with the storage and retrieval of data in computer usable form. The three principal methods
of storing and retrieving the data are
1. Sequential: Meaning that the records are stored and retrieved in a sequential order.
2. Direct: Meaning that the records are not stored or retrieved in any special order.
3. Indexed sequential access method: Indexed sequential is a combination of the preceding two
methods where by, records are stored in sequential order but with an index that allows both
sequential and random retrieval.
Each of them is suited to different applications and processing requirements.
1. Sequential storage and retrieval: This method is suitable for situations in which comprehensive
Copyright © 2014. Alpha Science International. All rights reserved.

reports are generated. The data is generated in the sequence in which it was recorded on the
storage media. The records must be accessed one after the other. The user cannot jump from one
record to another. It is not accessible to the CPU until it has been loaded on to an input device.
Punched cards are the earliest types of storage media which are sequential access.
2. Direct access storage and retrieval: It is also called as random access, is best suited to situations
in which only few records directly in no particular order. The most common approach is to
use a unique element of data called a key field or key contained in each record as a basis for
identifying the record and for determining which storage location on the disk the record should
be stored in or retrieved from.
3. Indexed sequential storage and retrieval: To access the data in either a sequential or direct
fashion, a third storage and retrieval methodology was developed. Indexed sequential access
method. It is used with direct access micro computer storage devices to provide maximum

Bhatia, Ashima Bhatnagar, and Ashima Bhatnagar Bhatia. Database Management System, Alpha Science International, 2014. ProQuest Ebook Central,
[Link]
Created from georgetown on 2023-02-21 16:28:53.
6.8 Database Management System

flexibility for processing and has proved to be the most flexible for business applications. In
this approach, each file contains an index of the records stored in it. This index is some what
like the index at the back of the book. When you want to look up something in the book, you
check the index for the item you want and locate the page number. In the case of on the disk.
The file can be indexed using any key field. Multiple indexes can be created on the same file
using one or more fields as primary key field.

Hashed File Organization


Hashing is the most common form of purely random access to a file or database. It is also used to
access columns that do not have an index as an optimisation technique. Hash functions calculate
the address of the page in which the record is to be stored based on one or more fields in the record.
The records in a hash file appear randomly distributed across the available space. It requires some
hashing algorithm and the technique. Hashing Algorithm converts a primary key value into a record
address. The most popular form of hashing is division hashing with chained overflow.
Advantages of Hashed file organization
1. Insertion or search on hash-key is fast.
2. Best if equality search is needed on hash-key.
Disadvantages of Hashed file organization
1. It is a complex file organization method.
2. Search is slow.
3. It suffers from disk space overhead.
4. Unbalanced buckets degrade performance.
5. Range search is slow.

INDEX AND TREE STRUCTURE

Les us first reconsider the binary search tree. A BST is a data structure that has a property that all
the keys that are to the left of a node are smaller than the key value of the node and all the keys to
Copyright © 2014. Alpha Science International. All rights reserved.

the right are larger than the key value of the node.
To search a typical key value, you start from the root and move towards left or right depending
on the value of key that is being searched. Since an index is a <value, address> pair, thus while using
BST, we need to use the value as the key and address field must also be specified in order to locate the
records in the file that is stored on the secondary storage devices. The following figure demonstrates
the use of BST index for a University where a dense index exists on the enrolment number field.

Fig. 6. 3 Binary search tree

Bhatia, Ashima Bhatnagar, and Ashima Bhatnagar Bhatia. Database Management System, Alpha Science International, 2014. ProQuest Ebook Central,
[Link]
Created from georgetown on 2023-02-21 16:28:53.
File Organization 6.9

Now, let us examine the suitability of BST as a data structure to implement index. A BST as
a data structure is very much suitable for an index, if an index is to be contained completely in the
primary memory. However, indexes are quite large in nature and require a combination of primary
and secondary storage. As far as BST is concerned it might be stored level by level on a secondary
storage which would require the additional problem of finding the correct sub-tree and also it may
require a number of transfers, with the worst condition as one block transfer for each level of a tree
being searched. This situation can be drastically remedied if we use B-Tree as data structure.
A B-Tree as an index has two advantages:

l It is completely balanced
l Each node of B-Tree can have a number of keys. Ideal node size would be if it is somewhat
equal to the block size of secondary storage.
The question that needs to be answered here is what should be the order of B-Tree for an index.
It ranges from 80-200 depending on various index structures and block size.
Let us recollect some basic facts about B-Trees indexes.
The basic B-tree structure was discovered by [Link] and [Link] (1970) of Bell Scientific
Research Labs and has become one of the popular structures for organising an index structure. Many
variations on the basic B-tree structure have been developed.
The B-tree is a useful balanced sort-tree for external sorting. There are strong uses of B-trees
in a database system as pointed out by D. Comer (1979): “While no single scheme can be optimum
for all applications, the techniques of organising a file and its index called the B-tree is the standard
Organization for indexes in a database system.”
A B-tree of order N is a tree in which:
l Each node has a maximum of N children and a minimum of the ceiling of [N/2] children.
However, the root node of the tree can have 2 to N children.
l Each node can have one fewer keys than the number of children, but a maximum of N-1 keys
can be stored in a node.
l The keys are normally arranged in an increasing order. All keys in the sub tree to the left of a
key are less than the key, and all the keys in the sub-tree to the right of a key are higher then
the value of the key.
Copyright © 2014. Alpha Science International. All rights reserved.

l If a new key is inserted into a full node, the node is split into two nodes, and the key with the
median value is inserted in the parent node. If the parent node is the root, a new root node is
created.
l All the leaves of B-tree are on the same level. There is no empty sub-tree above the level of the
leaves. Thus a B-tree is completely balanced.

IMPORTANCE OF FILE ORGANIZATION IN DATABASE

To implement a database efficiently, there are several design tradeoffs needed. One of the most
important ones is the file organization. For example, if there were to be an application that required
only sequential batch processing, then the use of indexing techniques would be pointless and wasteful.

Bhatia, Ashima Bhatnagar, and Ashima Bhatnagar Bhatia. Database Management System, Alpha Science International, 2014. ProQuest Ebook Central,
[Link]
Created from georgetown on 2023-02-21 16:28:53.
6.10 Database Management System

There are several important consequences of an inappropriate file organization being used
in a database. Thus using replication would be wasteful of space besides posing the problem of
inconsistency in the data. The wrong file
Organization can also-
l Mean much larger processing time for retrieving or modifying the required record
l Require undue disk access that could stress the hardware
Needless to say, these could have many undesirable consequences at the user level, such as
making some applications impractical.
Copyright © 2014. Alpha Science International. All rights reserved.

Bhatia, Ashima Bhatnagar, and Ashima Bhatnagar Bhatia. Database Management System, Alpha Science International, 2014. ProQuest Ebook Central,
[Link]
Created from georgetown on 2023-02-21 16:28:53.

You might also like