0% found this document useful (0 votes)

11 views111 pages

Database Management Systems Overview

Q: What are the differences between the UNION and INTERSECT operators in SQL, and how do their behaviors affect the results of queries?

The UNION operator combines the results of two or more SELECT statements into a single result set, with only distinct values being included by default . It requires the same number of columns and compatible data types across all SELECT statements . In contrast, the INTERSECT operator returns only the common rows found in both result sets, effectively an intersection . While UNION eliminates duplicates and allows ordering, INTERSECT inherently has no duplicates and also arranges data in ascending order by default. The choice between these operators impacts query results by either combining distinct datasets or focusing only on mutual elements .

Q: What is the purpose of using a hash function in hash file organization, and what are its implications for data retrieval?

A hash function is used in hash file organization to convert a record's key value into an address, which determines the data block where the record is stored . This method allows for quick access to records by directly calculating the data location without needing sequential searching. However, it means that the data is not stored in a specific order, possibly causing collisions where multiple records are mapped to the same bucket. This necessitates mechanisms to handle such collisions, influencing retrieval efficiency .

Q: What are the limitations of Indexed Sequential Access Method (ISAM), and how do these affect database performance?

ISAM's main limitations include its static nature and the need for periodic maintenance of overflow areas . Since ISAM uses a static primary index that does not adjust dynamically with data changes, it can lead to inefficient data access patterns if the data distribution significantly alters. Overflow areas temporarily house new records, which disrupts the sequential ordering, requiring periodic merging back into the main file to restore order and performance. These factors can degrade retrieval performance over time, requiring careful management and maintenance to uphold optimal access speeds .

Q: In what ways do indexed and hashed file organizations differ in structure and performance, particularly regarding data retrieval?

Indexed and hashed file organizations differ significantly in structure and performance. Indexed file organization uses an auxiliary file (index) that contains key values and pointers to database records, allowing quick random access based on key values . This structure is beneficial for range and partial retrievals due to orderly data storage . In contrast, hashed file organization uses a hash function to compute a record's storage address, offering fast data retrieval for exact queries but often leading to unordered data storage. Hashed organization is typically faster for individual record access but less efficient for range queries due to unordered storage .

Q: How do different types of SQL JOIN operations selectively combine data from multiple tables, and what are the typical use cases for each type?

SQL JOIN operations are used to selectively combine data from multiple tables. An INNER JOIN returns records with matching values in both tables, typically used when ensuring data inclusion from both datasets . LEFT JOIN returns all records from the left table and matched records from the right table, useful for retaining full data from a primary dataset while including related data from another . RIGHT JOIN is similar, but focuses on preserving all records from the right table. FULL JOIN combines results of both LEFT and RIGHT JOIN operations, including all records from both tables with matches where available, addressing comprehensive data integration scenarios .

Q: Explain the role of transaction timestamps in timestamp-based concurrency control protocols.

In timestamp-based concurrency control protocols, each transaction is given a unique timestamp upon initiation, determining its priority and order in the schedule . The timestamps ensure that operations comply with their temporal order, maintaining consistency. Two-time stamp values, R_TS (read) and W_TS (write), track the timestamps of successful read and write operations on database items, respectively. These timestamps prevent conflicting transactions from violating the set order, thus enforcing serializability by giving priority to older transactions, ensuring isolation and consistency in transaction execution .

Q: Describe how the concept of view serializability is utilized to prevent database inconsistency.

View serializability is used to prevent database inconsistency by ensuring that the database reflects a sequence of transactions as if they were executed in isolation, even if they were interleaved in practice . It maintains consistency by allowing the user to perceive transactions in a serial order that could produce the same results as if the transactions were processed one after the other without interference. The primary focus is on ensuring that the final state of the database and the views drawn from it remain consistent, avoiding conflicts between transactions .

Q: How does the concept of multiple granularity affect lock management in databases, and why is it beneficial?

Multiple granularity enhances lock management by hierarchically breaking down the database into blocks that can be independently locked . This approach reduces lock overhead and improves concurrency control by allowing different levels of locking granularity, which can either narrow down or expand depending on the transaction needs. By distinguishing hierarchically between data items, multiple granularity allows flexible lock control, reducing unnecessary locking and unlocking actions, thereby improving database performance and resource efficiency .

Q: How does the implementation of DBMS recoverability ensure data consistency after a system failure?

DBMS recoverability ensures data consistency post-failure by recording all changes made during transaction processing in transaction logs . These logs enable the system to differentiate between committed and uncommitted transactions. Upon failure, the system reverts uncommitted changes to maintain consistency or redoes committed changes to ensure durability. This process guarantees that the database returns to a consistent state where only durable transactions persist, safeguarding against data inconsistency caused by incomplete or erroneous transactions .

Q: How do optimistic concurrency control techniques differ from traditional locking methods in handling database transactions?

Optimistic concurrency control techniques differ from traditional locking methods by assuming low conflict rates and deferring consistency checks until the transaction's end . Instead of acquiring locks, these techniques use phases: read phase, where data is read and local copies are updated; validation phase, which checks for serializability violations before committing changes; and a write phase to apply updates if validation succeeds. Traditional methods like locking involve acquiring locks during the transaction, potentially causing wait times and deadlocks. Optimistic techniques minimize such contention but risk rollback if validation fails, trading immediate consistency for increased concurrency .

Uploaded by

seheraliya786

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

11 views111 pages

Database Management Systems Overview

Uploaded by

seheraliya786

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

AY 2022-23 J. B. Institute of Engineering and Technology B.

Tech: AI & ML
onwards (UGC Autonomous) II Year – I Sem

DATABASE MANAGEMENT SYSTEMS

(Common to AI & ML, AI&DS, CSE & IT)

MODULE-1 UNIT-1: NOTES

Unit-1: Introduction
Syllabus:
Database System Applications, Database Systems Vs File Systems,

View of Data-Data Abstraction, Instances and Schemas. Data Models

– The ER Model, Relational Model, Other Data Models. Database

Languages – DDL, DML. Database Access for Applications Programs,

Data Base Users and Administrator, Transaction Management, Data

Base System Structure, Storage Manager, The Query Processor.

Topics Covered:
Database System Applications:
The different applications of DBMS are as follows −

 Railway Reservation System − It is used to keep record of booking of tickets,

departure of the train and the status of arrival and give updates to the passengers

with the help of a database.

1
 Library Management System − There will be so many numbers of books in the

library and it is very hard to keep a record of all the books in a register or a copy.

So, DBMS is necessary to keep track of all the book records, issue dates, name of

the books, author and maintain the records.

 Banking − We are doing a lot of transactions daily without directly going to the

banks. The only reason is the usage of databases and it manages all the data of

the customers over the database.

 Educational Institutions − All the examinations and the data related to the

students maintained over the internet with the help of a database management

system. It contains registration details of the student, results, grades and courses

available. All these works can be done online without visiting an institution.

 Social Media Websites − By filling the required details we are able to access
social media platforms. Many users daily sign up for social websites such as

Facebook, Pinterest and Instagram. All the information related to the users are

stored and maintained with the help of DBMS.

.
2
Database Systems Vs File Systems
[Link].
File System DBMS
File system is software that DBMS is software to create and manage the
manages the data file and the user database file, it provides more advantages
has to write the programs for than a file system. DBMS is a collection of
managing the database. Handling data. In DBMS the user is not required to
a file system is easier than DBMS, write the programs.

In this system, the user has to Example:

write the programs for managing
ORACLE, MySQL, Microsoft SQL Server,
the database.
Postre SQL, IBM DB2, SQLite, Microsoft
Access, MariaDB and Microsoft Azure SQL
1. Examples:
Database.
DOS(Disk operating system)
follows FAT (File allocation Table)
system As of June 2021, the most popular RDBMS in
WINDOWS follows NTFS (Newest the world was ORACLE.
Technology File System)

UNIX follows UNIX file system

Linux follows Extended File

System
Bigdata follows Zetta byte File
System.

File system stored data in the form Data stored in a tabular form i.e., structured
of unstructured format. format.
Exampe: Data stores in a tabular form
Exampe: In a Notepad file data
2 stores like Empno Name Salary
101 xyz 20000
1. Abc 20000 manager 102 Abc 40000
hyderabad 103 Sam 35000
104 Smith 45000
3
2. Xyz 4000 director pune

Data redundancy and Data In DBMS, the related data resides in the
inconsistency exist in file system. same storage location or the same
File processing system has more information may not be duplicated as a result
data redundancy minimizing data redundancy and reduced
3 data inconsistency. In DBMS there is no
redundancy and no inconsistency.

In file system, data integrity In DBMS, there is no data integrity problem.

problem. Data integrity refers to Example of data integrity: User cannot insert
4. the data accuracy. Here, user can string data in integer column
insert string data in instead of
integer by mistakenly.

In the file system, concurrent DBMS takes care of concurrent access of

access has many problems like data by using some form of locking. For
redirecting the file while other example, there are two transactions T1 and
5.
deleting some information or T2 and data item x. T1(x). Here, x data item
updating some information. Since cannot be accessed by T2 until T1 finished
there is no locking system in the i.e., locking.
file system.

The file system offers lesser Database Management System offers high
6. security. security.

them.
7.

4
Unauthorized access is not Unauthorized access is restricted in DBMS
restricted in file system

8. Multiple user access to the Allow multiple users to access the database
information is difficult to provide at the same time

.
5
View of Data-Data Abstraction
Levels of Abstraction in a DBMS

There are three levels of data abstraction in DBMS which reduce the complexity
of the database. They are

1. Internal Level or Physical Schema

2. Logical Level or Conceptual Schema

3. View Level or External Schema

The pictorial representation of three levels of data abstraction is:

6
.1. Internal Level:

Physical Schema or Internal level is the lowest level of abstraction in the DBMS which
describes how the data is actually stored in the database and it also describes complex
low level data structure and access methods used by database. This internal level deals
with the storage of the data for the whole database system. This is the first or lowest
level of abstraction which describes how a record is actually stored in the system
memory. It is a low-level representation of the database

2. Logical Level:

Logical level or Conceptual Schema is the intermediate level, next in higher level and
also known as conceptual level which describes what data is stored and reveals the
relationships that exists among the stored data.

It tried to describe the entire data. It means what tables to be created and what are the
links are between these tables are mentioned in this level. This is less complicated than
the physical level. Little bit of complexity over here as well. But, it is not that much like at
the physical level. This level is used by database administrators or developer.

Example: Let us take an example where we use the relational model for storing the
data. We have to store the data of a student, the columns in the student table will be
student_name, age, mail_id, roll_no etc. We have to define all these at this level while
we are creating the database.

It is less complex than the physical level. So, overall, the logical level contains tables
(fields and attributes) and relationships among table attributes.

7
3. View Level:
View level or External Schema: When coming to the third level that is the view level and
it is the highest level. This is also called the external level. There are different levels of
views in view level and each view defines only a part of a entire data. It also interacts
with the user since it provides the different views of the same database. It also provides
multiple views for the same database. View level can be used by all the users of the
database.

Example: If the student has a login-id and password in a university system, then as
a student, he can view his marks, attendance, fee structure, etc. But the faculty of the
university will have a different view. He will have options like salary, edit marks of a
student, enter attendance of the students, etc. So, both the student and the faculty
have a different view.

Instances and Schemas:

The instances can be changed by certain CRUD operations, such as like
addition, and deletion of data. It may be noted that any search query will not make
any kind of changes in the instances.

Example:
Let’s say a table teacher in our database whose name is School, suppose the table
has 50 records so the instance of the database has 50 records for now and tomorrow
we are going to add another fifty records so tomorrow the instance has a total of 100
records. This is called an instance.

Schema
Schema is the overall description of the database. The basic structure of how the
data will be stored in the database is called schema.

Example:
Let’s say a table teacher in our database named school, the teacher table requires
the name, dob, and doj in their table so we design a structure as:
Teacher table
name: String
doj: date
dob: date

8
Data Models
Data Model:

Data model means to model the data i.e., to give a shape to the data and to give a
figure to the stored data. A data model makes it easier to understand the meaning of
the data by its figure.

There are various types of data model but the relational model is the most widely used
model.

The different Data Models in DBMS are:

1. Hierarchical Model
2. Network Model
3. Relational Model
4. Object-Oriented Model
5. Object-Relational Model
6. Entity-Relationship Model

1. Hierarchical Model
Hierarchical Model was the first DBMS model and one of the oldest Database Model
The general shape of this model is like an Organizational chart (Example-2). A node
on the chart represents a particular entity. The terms parent and child are used in
describing a hierarchical model. This model organizes the data in the hierarchical tree
structure. The hierarchy starts from the root which has root data and then it expands
in the form of a tree adding child node to the parent node.

.
9
Example: Organization Chart

2. Network Model
This model is the same as the hierarchical model; the only difference is that a record
can have more than one parent. The network model was developed to overcome the
limited scope of hierarchical model. It replaces the hierarchical tree with a graph.

In Network Model, multiple parent-child relationships are used. The network model
uses a network structure, which is a data structure of nodes and branches.

Example-2:

10
The network model for ‘UNIVERSITY’ system is shown above figure, the Mathematics
Department node is associated with ‘Computer Department’ node. Similarly ‘Computer
Lab; and ‘Library’ nodes are associated with both the ‘Mathematics Department’ and
‘Computer Department’ nodes.

3. Relational Model

Relational Model is the most widely used model. In this model, the data is maintained
in the form of a two-dimensional table. All the information is stored in the form of rows
and columns. The basic structure of a relational model is tables. So, the tables are also
called relations in the relational model.

For example student roll number, student name, student course etc., and columns
also known as attributes. An attribute represents the characteristics of a person, place
or thing, for example Salary attribute in a given below example.

11
Popular examples of standard relational databases include Microsoft SQL Server,
Oracle Database, MySQL and IBM DB2.

4. Object-Oriented Model

The object-oriented model is based on a collection of objects. An object contains values

stored in variables within the object. An object also contains code that operates on the
object. This code is called methods. Objects that contain the same types of values and
the same methods are grouped together into classes. A class may be viewed as a
definition for objects.

An object-oriented data model is one of the most developed data models which contain
video, graphical files, and audio.

5. Object-Relational Model
An Object relational model is a combination of a Object oriented model and a Relational
model. So, it supports objects, classes, inheritance etc. just like Object Oriented models
and has support for data types, tabular structures etc. like Relational data model.

One of the major goals of Object relational data model is to close the gap between
relational database and the object oriented database frequently used in many
programming languages such as C++, C#, Java etc.

Example: ORDBMSs include PostgreSQL and Oracle

6. Entity-Relationship Model
Entity-Relationship Model or simply ER Model is a high-level data model diagram. In
this model, we represent the real-world problem in the pictorial form to make it easy to
understand. It is also very easy for the developers to understand the system by just

12
looking at the ER diagram. We use the ER diagram as a visual tool to represent an ER
Model.

The following ER diagram syntax has the three components:

1. Entity

2. Attribute

3. Relationship

Example:

.
13
Database Languages

DDL- Data Definition Language

It is a language that allows the user to define the data and their relationship to other
types of data.

The DDL Commands in Structured Query Language are used to create and modify the
schema of the database and its objects. The syntax of DDL commands is predefined for
describing the data. The commands of Data Definition Language deal with how the data
should exist in the database.

Following are the five DDL commands in SQL:

1. CREATE Command
2. DROP Command
3. ALTER Command
4. TRUNCATE Command
5. RENAME Command

1. CREATE Command
CREATE is a DDL command used to create databases, tables, triggers and other
database objects.

Examples of CREATE Command in SQL

Example 1: This example describes how to create a new database using the CREATE
DDL command.

Create Database Books;

14
Example 2: This example describes how to create a new table using the CREATE DDL
command.

CREATE TABLE Student

(
Roll_No. Int ,
First_Name Varchar (20) ,
Last_Name Varchar (20) ,
Age Int ,
Marks Int ,
);

Example 3: This example describes how to create a new index using the CREATE DDL
command.

CREATE INDEX city ON Employee City;

2. DROP Command
DROP command used to delete/remove the database objects from the SQL database.
We can easily remove the entire table, view, or index from the database using this
DDL command.

Examples of DROP Command in SQL

DROP DATABASE Books;

DROP TABLE Student;

DROP INDEX Index_city;

3. ALTER Command
ALTER command which changes or modifies the existing structure of the database, and
it also changes the schema of database objects.

Examples of ALTER Command in SQL

Example: This example shows how to add a new field to the existing table.

ALTER TABLE Student ADD Father's_Name Varchar(60);

ALTER TABLE table_name MODIFY ( Last_Name varchar(25));

15
4. TRUNCATE Command
TRUNCATE command which deletes or removes all the records from the table but
without structure.

This command also removes the space allocated for storing the table records.

TRUNCATE TABLE Student;

5. RENAME Command
RENAME command which is used to change the name of the database table.

RENAME TABLE Old_Table_Name TO New_Table_Name;

Example

RENAME TABLE Student TO Student_Details ;

This query changes the name of the table from Student to Student_Details.

.
16
DML:

Data Manipulation Language (DML)

It is a language that provides a set of operations to support the basic data manipulation
operation on data held in the database.

We can easily access, store, modify, update and delete the existing records from the database using DML
commands.

Following are the four main DML commands in SQL:

1. SELECT Command

2. INSERT Command

3. UPDATE Command

4. DELETE Command

1. SELECT Command
SELECT is the most important data manipulation command in Structured Query
Language. The SELECT command shows the records of the specified table. It also
shows the particular record of a particular column by using the WHERE clause.

Examples of SELECT Command

Example 1: This example shows all the values of every column from the table.

SELECT * FROM Student;

This SQL statement displays the following values of the student table:

Example 2: This example shows all the values of a specific column from the table.

SELECT Emp_Id, Emp_Salary FROM Employee;

17
This SELECT statement displays all the values of Emp_Salary and Emp_Id column
of Employee table:

SELECT * FROM Student WHERE Stu_Marks = 80;

2. INSERT Command
INSERT command allows users to insert data in database tables.

Example:

INSERT INTO Customers (CustomerName, ContactName, Address, City, PostalCode,

Country)
VALUES ('Cardinal', 'Tom B. Erichsen', 'Skagen
21', 'Stavanger', '4006', 'Norway');

3. UPDATE Command
UPDATE command which allows users to update or modify the existing data in
database tables.

UPDATE Table_name SET [column_name1= value_1, ….., column_nameN = value_N

] WHERE CONDITION;

Here, 'UPDATE', 'SET', and 'WHERE' are the SQL keywords, and 'Table_name' is the
name of the table whose values you want to update.

Examples of the UPDATE command

Suppose, you want to update the Product_Price of the product whose Product_Id is
P102. To do this, you have to write the following DML UPDATE command:

UPDATE Product SET Product_Price = 80 WHERE Product_Id = 'P102' ;

UPDATE Student SET Stu_Marks = 80, Stu_Age = 21 WHERE Stu_Id = 103 AND Stu
_Id = 202;

4. DELETE Command
DELETE is a DML command which allows SQL users to remove single or multiple
existing records from the database tables.

18
We use the WHERE clause with the DELETE command to select specific rows from the
table.

Examples of DELETE Command

DELETE FROM Product WHERE Product_Id = 'P202' ;

DELETE FROM Student WHERE Stu_Marks > 70 ;

Database Access for Applications Programs:

This access facility, known as the application program interface, makes the internal
structure of the database known to application programs. This capability of moving data
between database files and application programs is crucial in implementing the
integration procedure.

A user issues an access request, using some application program or data manipulation
language. The application program determines what data are needed and
communicates the need to the database management system. The DBMS intercepts
the request and interprets it.

The database management system is a bridge between the application program, (that
determines what data are needed and how they are processed), and the operating
system of the computer, which is responsible for placing data on the magnetic storage
devices.

To retrieve data from the database, the following operations are performed internally:

1. A user issues an access request, using some application program or data

manipulation language.

2. The application program determines what data are needed and communicates the
need to the database management system.

There are 2 ways application programs can access the database through ODBC and
JDBC.

19
Database Users and Administrator:
There are two types of database users, Users, and Administrators.

Database Users
Database users are the ones who really use and take the benefits of the database. There
will be different types of users depending on their needs and way of accessing the
database

1. Application Programmers – who interact with the database by means

of DML queries. These DML queries are written in the application programs like
C, C++, JAVA,

2. Sophisticated Users – who write SQL queries to select/insert/delete/update data.

They do not use any application or programs to request the database. They
directly interact with the database by means of a query language like SQL.

3. Stand-alone Users – These users will have a stand-alone database for their
personal use. These kinds of the database will have readymade database
packages which will have menus and graphical interfaces.

4. Naive Users – these are the users who use the existing application to interact with
the database. For example, online library system, ticket booking systems, ATMs
etc

For examples, Railway’s ticket booking users are naive users. Clerks in any bank
is a naive user because they don’t have any DBMS knowledge but they still use
the database and perform their given task.

Database Administrators
A DBA has many responsibilities. A good-performing database is in the hands of DBA.
Database Administrator (DBA) is a person/team who defines the schema and also
controls the levels of database.
The DBA will then create a new account id and password for the user if he/she need to
access the data base. DBA is also responsible for providing security to the data base
and he allows only the authorized users to access/modify the data base.

20
DBA also monitors the recovery and back up and provide technical support.
DBA repairs damage caused due to hardware and/or software failures.

Transaction Management:
Transactions are a set of operations used to perform a logical set of work. It is all the
instructions of a logical operation. A transaction usually means that the data in the
database has changed. One of the major uses of DBMS is to protect the user’s data
from system failures. It is done by ensuring that all the data is restored to a consistent
state when the computer is restarted after a crash. The transaction is any one
execution of the user program in a DBMS. One of the important properties of the
transaction is that it contains a finite number of steps. Executing the same program
multiple times will generate multiple transactions.

Example: Consider the following example of transaction operations to be performed

to withdraw cash from an ATM vestibule.

Steps for ATM Transaction

1. Transaction Start.
2. Insert your ATM card.
3. Select a language for your transaction.
4. Select the Savings Account option.
5. Enter the amount you want to withdraw.
6. Enter your secret pin.
7. Wait for some time for processing.
8. Collect your Cash.
9. Transaction Completed.

.
21
Data Base System Structure:

Storage Manager:
Storage Manager: Storage Manager is a program that provides an interface
between the data stored in the database and the queries received. It is also known
as Database Control System. It maintains the consistency and integrity of the
database by applying the constraints and executing the DCL statements. It is

22
responsible for updating, storing, deleting, and retrieving data in the database.
It contains the following components –
 Authorization Manager: It ensures role-based access control, i.e,. checks whether
the particular person is privileged to perform the requested operation or not.

 Integrity Manager: It checks the integrity constraints when the database is

modified.

 Transaction Manager: It controls concurrent access by performing the operations

in a scheduled way that it receives the transaction. Thus, it ensures that the
database remains in the consistent state before and after the execution of a
transaction.

 File Manager: It manages the file space and the data structure used to represent
information in the database.

 Buffer Manager: It is responsible for cache memory and the transfer of data
between the secondary storage and main memory.

Disk Storage: It contains the following components –

 Data Files: It stores the data.

 Data Dictionary: It contains the information about the structure of any database
object. It is the repository of information that governs the metadata.
 Indices: It provides faster retrieval of data item.

23
The Query Processor:
Query Processor: It interprets the requests (queries) received from end user via an
application program into instructions. It also executes the user request which is
received from the DML compiler.
Query Processor contains the following components –
 DML Compiler: It processes the DML statements into low level instruction (machine
language), so that they can be executed.
 DDL Interpreter: It processes the DDL statements into a set of table containing meta
data (data about data).
 Embedded DML Pre-compiler: It processes DML statements embedded in an
application program into procedural calls.
 Query Optimizer: It executes the instruction generated by DML Compiler.

.
24
MODULE-1 UNIT-2 : NOTES
Syllabus: Beyond ER Design, Entities, Attributes and Entity Sets,
Relationships and Relationship Sets, Additional Features of ER Model
odels

E-R diagrams:
Attributes and Entity Sets:
Attributes:
In DBMS, we have entities, and each entity contains some property about their behavior
which is also called the attribute. In relational databases, we have tables, and each
column contains some entity that has some attributes, so all the entries for that column
should strictly follow the attribute of the entity. Entities define the characteristic property
of the attributes.

Following is the Attributes of an Entity

1. Simple Attribute
2. Composite Attribute
3. Single-valued Attribute
4. Multi-valued Attribute
5. Derived Attribute

25
.
1. Simple Attribute:

It is also known as atomic attributes. When an attribute cannot be divided further, then it is
called a simple attribute.

For example, in a student table, the branch attribute cannot be further divided. It is called a
simple or atomic attribute because it contains only a single value that cannot be broken further.

2. Composite Attribute:

Composite attributes are those that are made up of the composition of more than one
attribute. When any attribute can be divided further into more sub-attributes, then that
attribute is called a composite attribute.

26
For example, in a student table, we have attributes of student names that can be further
broken down into first name, middle name, and last name. So the student name will be
a composite attribute.

Another example from a personal detail table would be the attribute of address. The
address can be divided into a street, area, district, and state.

3. Single-valued Attribute:

Those attributes which can have exactly one value are known as single valued
attributes. They contain singular values, so more than one value is not allowed.

For example, the DOB of a student can be a single valued attribute. Another example is
gender because one person can have only one gender.

4. Multi-valued Attribute:

Those attributes which can have more than one entry or which contain more than one
value are called multi valued attributes.

27
In the Entity Relationship (ER) diagram, we represent the multi valued attribute by
double oval representation.

For example, one person can have more than one phone number, so that it would be a
multi valued attribute. Another example is the hobbies of a person because one can
have more than one hobby.

.5. Derived Attribute:

Derived attributes are also called stored attributes. When one attribute can be derived
from the other attribute, then it is called a derived attribute. We can do some calculations
on normal attributes and create derived attributes.

For example, the age of a student can be a derived attribute because we can get it by
the DOB of the student.

Another example can be of working experience, which can be obtained by the date of
joining of an employee.

In the ER diagram, we represent the derived attributes by a dotted oval shape.

Entity Sets:

An entity set is a group of similar entities and these entities can have attributes.
In terms of DBMS, an entity is a table or attribute of a table in database, so by showing
relationship among tables and their attributes, ER diagram shows the complete logical
structure of a database.

28
An entity set is a collection or set of all entities of a particular entity type at any point in
time. The type of all the entities should be the same.

Example:
 The collection of all the students from the student table at a particular instant of
time is an example of an entity set.
 The collection of all the employees from the employee table at a particular instant
of time is an example of an entity set.

Relationships and Relationship sets

Relationships:
A relationship in DBMS is the way in which two or more data sets are linked, i.e., any
association between two entity types is called a relationship.
Relationships tell us how two or more datasets/entities are linked. It helps in storing
the data in separate tables.
A relationship is represented by a diamond shape in an ER diagram.

Types of relationships between 2 entities

1. One-to-One Relationship
According to this relationship, a single record in Table A is related to a single record in
Table B, and vice-versa is also true.

29
Example: Consider 2 entities -
Person - ID, name, age, address
Passport - passport_ID, passport_no

Only one passport is allowed per person, and each passport belongs to only one person.
Therefore it is a one-to-one relationship.

2. One-to-Many or Many-to-One Relationship

According to this relationship, each record of Table A can be related to one or

more than one record of table B.

Example: Consider 2 entities -

Each customer can have multiple accounts, so it is a one-to-many relationship.

But if we see the other way, many accounts are held by one customer only.
Therefore it is a many-to-one relationship also.

3. Many-to-Many Relationship
According to this relationship, each record of Table A can be related to one or
more than one record of table B and vice versa is also true, i.e., each record of
Table B can be connected to one or more than one record of table A.

30
Example: Consider 2 entities -

Each customer can buy more than 1 product, and many customers can buy one.
Therefore it is a many-to-many relationship.

Relationship sets
A relationship set is a set of relationships of same type. The number of entity sets that
participate in a relationship set is termed as the degree of that relationship set.

31
Additional Features of the ER Model in DBMS:
Three new concepts were added to the existing ER Model, they were:

1. Generalization
2. Specialization
3. Aggregration

1. Generalization:
Generalization is a process of extracting common properties from a set of entities and
creating a generalized entity from it. It is a bottom-up approach, and it helps to reduce
the size and complexity of the schema.
Example: Let us take two low-level entities as Car and Bus, and these two will have
many common attributes and some specific attributes. And We will generalize and link
the common attributes to the newly formed high-level entity named Vehicle.

2. Specialization
Specialization is opposite to Generalization. In this, entity is divided into subentities bases on
their charactertics(distingvishing features). It breaks an entity into multiple entities from higher
level to lower level. It is a top down approach.

32
3. Aggregration
Aggregation refers to the process by which entities are combined to form a single
meaningful entity. The specific entities are combined because they do not make sense
on their own. To establish a single entity, aggregation creates a relationship that
combines these entities. The resulting entity makes sense because it enables the
system to function well.

33
Module-2
Unit-1
SYLLABUS:
Unit-1: The Relational Model

Notes:
Topic-1: Introduction to the Relational Model
The relational model represents how data is stored in Relational Databases. A
relational database consists of a collection of tables, each of which is assigned a
unique name. Consider a relation STUDENT with attributes ROLL_NO, NAME,
ADDRESS, PHONE, and AGE shown in the table.
Table Student
ROLL_NO NAME ADDRESS PHONE AGE

1 RAM DELHI 9455123451 18

2 RAMESH GURGAON 9652431543 18

3 SUJIT ROHTAK 9156253131 20

34
ROLL_NO NAME ADDRESS PHONE AGE

4 SURESH DELHI 18

Properties of Relations
o Name of the relation is distinct from all other relations.
o Each relation cell contains exactly one atomic (single) value
o Each attribute contains a distinct name
o Attribute domain has no significance
o tuple has no duplicate value
o Order of tuple can have a different sequence

Advantages of the Relational Model

 Simple model: Relational Model is simple and easy to use in comparison to other
languages.
 Flexible: Relational Model is more flexible than any other relational model present.
 Secure: Relational Model is more secure than any other relational model.
 Data Accuracy: Data is more accurate in the relational data model.
 Data Integrity: The integrity of the data is maintained in the relational model.
 Operations can be Applied Easily: It is better to perform operations in the
relational model.
Topic-2: Integrity Constraints Over Relations:
Integrity constraints are rules that help to maintain the accuracy and consistency of data in a
database. For example, a simple integrity constraint in DBMS might state that all students must
have a valid Roll Number. This would prevent someone from accidentally entering an invalid
roll number into the database.
Integrity constraints can also be used to enforce relationships between tables. For example, if
a student can only have one aadhaar number, then an integrity constraint can be used to ensure
that only one aadhaar number is entered for each student.

Different types of Integrity Constraints

 Domain Constraint
 Entity Integer Constraint

35
 Referential Integrity Constraint
 Key Constraints

Domain Constraint:
A domain constraint is a restriction on the values that can be stored in a column. Strings,
character, time, integer, currency, date etc. Are examples of the data type of domain
constraints.

Entity integrity:
Entity integrity constraints would ensure that null values are not entered into any required
columns. It states that primary key value can't be null. This is because the primary key value is
used to identify individual rows in relation and if the primary key has a null value, then we can't
identify those rows
For example, if you have a column for "roll_number" an entity integrity constraint in DBMS would
ensure that this column cannot contain any null values.

Referential Integrity Constraint:

A referential integrity constraint is a restriction on how foreign keys can be used. A foreign key
is a column in one table that references a primary key in another table.
For example, let's say you have a table of Students and a table of Marks. The "roll_number"
column in the Marks table would be a foreign key that references the "roll_number" column in
the Students table.

Key Constraints:
A key constraint is a rule that defines how data in a column(s) can be stored in a table. A key is
composed of one or more columns whose values uniquely identify each row in the table. There
are several different types of key constraints in DBMS, each with its own specific purpose.

Topic-3: Enforcing Integrity Constraints:

Integrity Constraints are specified when a relation is created and enforced when a relation is
modified. The impact of domain, PRIMARY KEY, and UNIQUE constraints is straightforward: If
an insert, delete, or update command causes a violation, it is rejected.
Every potential Integrity violation is generally checked at the end of each SQL statement
execution, although it can be deferred until the end of the transaction executing the statement.
On the other hand, insertions of Students tuples do not violate referential integrity, and deletions
of Students tuples could cause violations.

36
SQL provides several alternative ways to handle foreign key violations. We must consider three
basic questions:
1. If an Enrolled row is inserted, with a student_id column value that does not appear in
any row of the Students table
In this case, the INSERT command is simply rejected.

2. If a Students row is deleted

The options are:
Delete all Enrolled rows that refer to the deleted Students row.

3. If the primary key value of a Students row is updated

The options here are similar to the previous case. Replace ON DELETE with ON UPDATE

Topic-4: Querying Relational Data:

A relational database query is a question about the data, and the answer consists of a new
relation containing the result. For example, we might want to find all students AGE less than 18
or all students enrolled in perticular course.
The SELECT statement is used to fetch the data from a database table which returns this data
in the form of a result table. These result tables are called result-sets.
Syntax
SELECT column1, column2, ...
FROM table_name;
If you want to select all the fields available in the table, use the following syntax:
Syntax
SELECT * FROM table_name;

The symbol ´*´ means that we retain all fields of selected tuples in the result.
We can retrieve rows corresponding to students who are younger than 18 withthe following SQL
query:

Example
SELECT * FROM Students WHERE age < 18;

The condition age < 18 in the WHERE clause specifies that we want to select only tuples in
which the age field has a value less than 18.

37
In addition to selecting a subset of tuples, a query can extract a subset of the fields of each
selected tuple. we can compute the student_id and First_name of students who are younger
than 18 with the following query:

Example:
SELECT ID,FirstName FROM Students WHERE age < 18;

SELECT data from Multiple Tables

We can also combine information from multiple tables.

Syntax
SELECT table1.column1, table2.column2
FROM table1, table2
WHERE table1.column1 = table2.column1;

SELECT [Link], [Link]

FROM Students AS S, Enrolled AS E
WHERE [Link] = [Link];

Topic-5: Logical Database Design:

The purpose of logical design is to translate the conceptual representation to the logical
structure of the database, which includes designing the relations.

Logical database design is the process of transforming (or mapping) a conceptual schema of
the application domain into a schema for the data model underlying a particular DBMS, such
as the relational or object-oriented data model.

Logical database design is the process of deciding how to arrange the attributes of the entities
in a given business environment into database structures, such as the tables of a relational
database. The goal of logical database design is to create well structured tables that properly
reflect the company's business environment. The tables will be able to store data about the
company's entities in a non-redundant manner and foreign keys will be placed in the tables so
that all the relationships among the entities will be supported.

38
Logical database design helps you define and communicate your business' information
requirements. When you create a logical database design, you describe each piece of
information you need to track and the relationships among, or the business rules that govern,
those pieces of information.
Once you create a logical database design, you can verify with users and management that the
design is complete (that is, it contains all of the data that must be tracked) and accurate (that
is, it reflects the correct table relationships and enforces the business rules).

Topic-6: Introduction to Views:

View in dbms are virtual table obtained as a result of a query in SQL i.e. Structured Query
Language. Whenever the end-user requests the data according to specified needs rather than
complete information of the table.

A view of data in DBMS is created to display the specific information that a user needs, rather
than revealing sensitive portions of the schema that should remain confidential to the owner.
Essentially, the data view in a DBMS presents a selective representation instead of exposing
the complete database. It’s important to note that any modifications made to the database will
also be reflected in the corresponding view.

How to get a view of data in DBMS

Acquiring a view in DBMS involves formulating a set of statements designed to extract specific
elements from the database. These statements then construct a table on the user’s device,
organizing the outcomes in a tabular format. The pseudocode to obtain a data view can be
outlined as follows:

create

view view

select columnName1, columnName2,...

from tableName

where condition;

Topic-7: Destroying Views:

The SQL DROP VIEW statement is used to delete an existing view, along with its definition and
other information. Once the view is dropped, all the permissions for it will also be removed. We
can also drop indexed views with this statement.
39
Suppose a table is dropped using the DROP TABLE command and it has a view associated to
it, this view must also be dropped explicitly using the DROP VIEW command.

Example:

DROP VIEW CUSTOMERS_VIEW1;

Topic-8: Altering Views:

Use the ALTER VIEW command to change the name or the owner of a view or to refresh or
suspend a materialized view. Use the ALTER VIEWS ON command to refresh or suspend all
the materialized views for a base table. Views are read-only. The system does not allow an
insert, update, or delete on a [Link] ALTER VIEW statement changes the characteristics of
a view.

The ALTER VIEW statement modifies an existing view by altering a reference type column to
add a scope. The ALTER VIEW statement also enables or disables a view for use in query
optimization.
The ALTER VIEW command modifies views created using the CREATE VIEW command or a
view projected from a persistent class. The altered view replaces the existing view, so you
cannot modify specific columns in a view.

Topic-9: Destroying Tables:

Drop command delete the table existence completely i.e drop statement destroys the existing
database object of that particular table, index or view.

After dropping a table if you try to use the table then compiler shows an error as ” table or view
does not exist

Topic-10: Altering Tables:

The ALTER TABLE statement is used to add, delete, or modify columns in an existing table

1. Alter command in SQL is used to make modifications to the columns in the existing table.
2. It is used to add columns, delete columns, drop constraints, renaming the columns, changing
the data type and data type size of the column existing in the table.

40
.
MODULE-2: Unit-2
Topic-1: Relational Algebra:
Relational algebra in dbms helps in the design, implementation, and optimization of queries and
provides a foundation for query languages such as SQL. It also helps to ensure that queries
are correct, efficient, and scalable.

Relational Algebra is a procedural query language. Relational algebra mainly provides a

theoretical foundation for relational databases and SQL. The main purpose of using Relational
Algebra is to define operators that transform one or more input relations into an output relation.
Given that these operators accept relations as input and produce relations as output, they can
be combined and used to express potentially complex queries that transform potentially many
input relations As it is pure mathematics, there is no use of English Keywords in Relational
Algebra and operators are represented using symbols.

a) SELECT Operation in Relational Algebra

Select operation chooses the subset of tuples from the relation that satisfies the
given condition mentioned in the syntax of selection. The selection operation is also
known as horizontal partitioning since it partitions the table or relation horizontally.

Notation:

σ c(R)
where ‘c’ is selection condition which is a boolean expression(condition), we can
have a single condition like Roll= 3 or combination of condition like X>2 AND Y<1,
symbol ‘σ (sigma)’ is used to denote select(choose) operator,

σ Place = 'Mumbai' or Salary >= 1000000 (Citizen)

σ Department = 'Analytics'(σLocation = 'NewYork'(Manager))

b) Projection:
This operation selects certain required attributes, while discarding other attributes.
41
Notation –
πA (R)
where ‘A’ is the attribute list, it is the desired set of attributes from the attributes of
relation(R),
symbol ‘π(pi)’ is used to denote the Project operator,
R is generally a relational algebra expression, which results in a relation.

Topic-2 Set Operations:

The SQL Set operation is used to combine the two or more SQL SELECT statements.

Types of Set Operation

1. UNION
2. UNIONALL
3. INTERSECT
4. MINUS

1. Union
o The SQL Union operation is used to combine the result of two or more SQL SELECT
queries.
o In the union operation, all the number of datatype and columns must be same in both
the tables on which UNION operation is being applied.
o The union operation eliminates the duplicate rows from its resultset.

Example:

Union SQL query will be:

SELECT * FROM First
UNION
SELECT * FROM Second;

2. Union All
Union All operation is equal to the Union operation. It returns the set without removing
duplication and sorting the data.

42
Example:
Union All query will be like:

SELECT * FROM First

UNION ALL
SELECT * FROM Second;

3. Intersect
o It is used to combine two SELECT statements. The Intersect operation returns the
common rows from both the SELECT statements.
o In the Intersect operation, the number of datatype and columns must be the same.
o It has no duplicates and it arranges the data in ascending order by default.

Example:

Intersect query will be:

SELECT * FROM First

INTERSECT
SELECT * FROM Second;

4. Minus
o It combines the result of two SELECT statements. Minus operator is used to display the
rows which are present in the first query but absent in the second query.
o It has no duplicates and data arranged in ascending order by default.

Example

Minus query will be:

SELECT * FROM First

MINUS
SELECT * FROM Second;

43
Topic-3 Renaming:
The rename operator ρ is one of the unary operators in relational algebra and is used to
rename relations in a DBMS. In other words, some relations or attributes may have complex
names and can be changed to make writing queries easier.

Types
Three following types of renaming can be done:

 Renaming a relation
 Renaming an attribute
 Renaming both

Let's look at the differences between them using examples:

Rename a relation
Suppose we have a relation named Students and we want to change it
to FinalYrStudents, the rename operation works as follows:

a base for SQL (Structured Query Language) and is used to rename relations.

Syntax

A rename query looks like the following:

ρnew_relation_name(new_attribute_name_list)(R)

The new_relation_name contains the name we want to replace with the original relation R and
the new_attribute_name_list contains the changed attribute names of R.

ρFinalYrStudents(Students)

Rename an attribute
Suppose we have a relation named Students and we want to change its
attributes StudentID, StudentName to SID and SName, the rename operation works as follows:

ρ(SID,SName)(Students)
The attributes will be ordered as same as the tables and must be renamed in the
same order. If only selective attributes are to be changed, rewrite the original attribute
names of those that are supposed to be unchanged.

44
Rename both
Next, we'll change both the relation name and attributes of
the Students class:
()()ρFinalYrStudents(SID,SName)(Students)

Topic-4: Joins:
As the name shows, JOIN means to combine something. In case of SQL, JOIN means "to
combine two or more tables".

In SQL, JOIN clause is used to combine the records from two or more tables in a database.

Types of SQL JOIN

1. INNER JOIN
2. LEFT JOIN
3. RIGHT JOIN
4. FULL JOIN

1. INNER JOIN
In SQL, INNER JOIN selects records that have matching values in both tables as long as the
condition is satisfied. It returns the combination of all rows from both the tables where the
condition satisfies.

Syntax

1. SELECT table1.column1, table1.column2, table2.column1,....

2. FROM table1
3. INNER JOIN table2
4. ON table1.matching_column = table2.matching_column;

2. LEFT JOIN
The SQL left join returns all the values from left table and the matching values from the right
table. If there is no matching join value, it will return NULL.

45
Syntax

1. SELECT table1.column1, table1.column2, table2.column1,....

2. FROM table1
3. LEFT JOIN table2
4. ON table1.matching_column = table2.matching_column;

In SQL, RIGHT JOIN returns all the values from the values from the rows of right table and the
matched values from the left table. If there is no matching in both tables, it will return NULL.

Syntax

1. SELECT table1.column1, table1.column2, table2.column1,....

2. FROM table1
3. RIGHT JOIN table2
4. ON table1.matching_column = table2.matching_column;

4. FULL JOIN
In SQL, FULL JOIN is the result of a combination of both left and right outer join. Join tables
have all the records from both tables. It puts NULL on the place of matches not found.

Syntax

1. SELECT table1.column1, table1.column2, table2.column1,....

2. FROM table1
3. FULL JOIN table2
4. ON table1.matching_column = table2.matching_column;

Topic-5: Relational Calculus:

Relational calculus is a non-procedural query language. In the non-procedural query language,
the user is concerned with the details of how to obtain the end results. The relational calculus
tells what to do but never explains how to do.

It is based on Predicate calculus, a name derived from branch of symbolic language. A

predicate is a truth-valued function with arguments. On substituting values for the arguments,

46
the function result in an expression called a proposition. It can be either true or false. It is a
tailored version of a subset of the Predicate Calculus to communicate with the relational
database.

1. Tuple Relational Calculus (TRC)

It is a non-procedural query language which is based on finding a number of tuple variables
also known as range variable for which predicate holds true. It describes the desired information
without giving a specific procedure for obtaining that information. The tuple relational calculus
is specified to select the tuples in a relation. In TRC, filtering variable uses the tuples of a
relation. The result of the relation can have one or more tuples.

For example:

{ [Link] | Author(T) AND [Link] = 'database' }

Output: This query selects the tuples from the AUTHOR relation. It returns a tuple with 'name' from
Author who has written an article on 'database'.

2. Domain Relational Calculus (DRC)

The second form of relation is known as Domain relational calculus. In domain relational
calculus, filtering variable uses the domain of attributes. Domain relational calculus uses the
same operators as tuple calculus. It uses logical connectives ∧ (and), ∨ (or) and ┓ (not). It uses
Existential (∃) and Universal Quantifiers (∀) to bind the variable. The QBE or Query by example
is a query language related to domain relational calculus.

47
For example:

{< article, page, subject > | ∈ javatpoint ∧ subject = 'database'}

Output: This query will yield the article, page, and subject from the relational javatpoint, where the
subject is a database.

Module-3
Unit-1:
Syllabus:

Form of Basic SQL Query:

The general form of the SELECT command is:
SELECT select_list FROM source
[ WHERE condition(s) ]
[ GROUP BY expression ]
[ HAVING condition ]
[ ORDER BY expression ] ;
The SELECT statement indicates that you which to query and retrieve information from a
database.

 The SELECT statement indicates that you which to query and retrieve information from
a database. The select_list specifies the type of information (or column names) to

48
retrieve. The keyword ALL or the wildcard character asterisk (*) could be used to signify
all columns in the database. Also, the keyword DISTINCT could be used to discard
duplicate records and retrieve only the unique records for the specified columns.
 The FROM clause is the only required clause in the SELECT statement. The FROM
clause specifies the specific database tables to retrieve data from.
 The WHERE clause limits the results to those records (or rows) that meet some particular
conditions (optional).
 The GROUP BY clause specifies the format of the output. The expression specifies a
column listing such that all rows contained in the specified columns will be aggregated
together (optional).
 The HAVING clause specifies the specific conditions to group by (optional).
 The ORDER BY clause specifies whether to output the query result in ascending or
descending order. The expression specifies a column listing to order by (optional).

Example:
the command would be:

SELECT * FROM source;

SELECT ALL FROM source;

Examples of Basic SQL Queries:

The following SQL query displays all the values of each column from the above Student_records table:
SELECT * FROM Student_Records;

The output of the above query is:

Student_ID First_Name Address Age Percentage Grade

201 Akash Delhi 18 89 A2
202 Bhavesh Kanpur 19 93 A1

49
Example 2:

The following query displays the values of particular column from the
above Student_Record table:

SELECT Student_Id, Age, Percentage, Grade FROM Employee;

Student_ID Age Percentage Grade
201 18 89 A2
202 19 93 A1

Example 3:

The following query shows the record of those employees from the above table whose
Emp_Panelty is 500:

SELECT * FROM Employee_Details WHERE Emp_Panelty = 500;

his SELECT query displays the following table in result:

Employee_Id Emp_Name Emp_City Emp_Salary Emp_Panelty

101 Anuj Ghaziabad 25000 500

103 Vivek Kolkata 35000 500

Example 4:

The following SELECT with GROUP BY query lists the number of cars of the same price:

SELECT COUNT (Car_Name), Car_Price FROM Cars_Details GROUP BY Car_Price;

Car_Number Car_Name Car_Amount Car_Price

2578 Creta 3 1000000

9258 Audi 2 900000

50
Example 5:

The following query sorts the salary of employees in descending order from the above
Employee_Order table:

SELECT * FROM Employee_Order ORDER BY Emp_Salary DESC;

Emp_Id Emp_Name Emp_Salary Emp_City

204 Anuj 90000 Goa

203 Rashet 80000 Jaipur

205 Sumit 50000 Delhi

UNION:
The UNION operator is used to combine the result-set of two or
more SELECT statements.

 Every SELECT statement within UNION must have the same number of columns
 The columns must also have similar data types
 The columns in every SELECT statement must also be in the same order

UNION Syntax
SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;

Example:
SELECT City FROM Customers
UNION
SELECT City FROM Suppliers
ORDER BY City;

51
UNION ALL Syntax
The UNION operator selects only distinct values by default. To allow duplicate values,
use UNION ALL:

SELECT column_name(s) FROM table1

UNION ALL
SELECT column_name(s) FROM table2;

Example:
SELECT City FROM Customers
UNION ALL
SELECT City FROM Suppliers
ORDER BY City;

INTERSECT:
The INTERSECT operator in SQL is used to retrieve the records that are
identical/common between the result sets of two or more tables.

Let us consider the below tables as an example to get a better

understanding −

52
EXCEPT:
The EXCEPT operator in SQL is used to retrieve all the unique records
from the left operand (query), except the records that are present in the
result set of the right operand (query).

For better understanding consider two tables with records as shown in

the following image –

If we perform the EXCEPT operator on the above two tables to retrieve

the names, it will display the distinct records only from the first table

53
which are not in common with the records of the second table.

Introduction to Nested Queries:

A nested query in SQL contains a query inside another query. The outer query will use the
result of the inner query. For instance, a nested query can have two SELECT statements, one
on the inner query and the other on the outer query.

The general syntax of nested queries will be:

SELECT column_name [, column_name ]
FROM table1 [, table2 ]
WHERE column_name OPERATOR
(
SELECT column_name [, column_name ]
FROM table1 [, table2 ]
[WHERE]
)
Example: IN

Select all employees who won an award.

SELECT id, name FROM Employees

WHERE id IN (SELECT employee_id FROM Awards);

Correlated Nested Queries:

In correlated nested queries, the inner query uses values from the outer query, and the
execution order is different from that of independent nested queries. First, the outer query
selects the first row. Inner query uses the value of the selected row. It executes its query and
returns a result set.
Correlated subqueries are used for row-by-row processing. Each subquery is executed once

54
for every row of the outer query.

A correlated subquery is evaluated once for each row processed by the parent statement.
The parent statement can be a SELECT, UPDATE, or DELETE statement.

SELECT column1, column2, ....

FROM table1 outer
WHERE column1 operator
(SELECT column1, column2
FROM table2
WHERE expr1 =
outer.expr2);
A correlated subquery is one way of reading every row in a table and comparing values in
each row against related data. It is used whenever a subquery must return a different result
or set of results for each candidate row considered by the main query. In other words, you
can use a correlated subquery to answer a multipart question whose answer depends on the
value in each row processed by the parent statement.

55
Set Comparison operators: IN, ANY, ALL, EXISTS
Both the ANY and ALL operators can be used with various comparison operators such as =,
<>, !=, >, <, >=, <=.

1. IN
The IN operator allows the specification of a set of expressions (or constants) that are
subsequently used for the query search. This operator can be applied to a subquery
for the same reason—that is, when the result of an inner query contains a set of
values. Example 6.52 shows the use of the IN operator in a subquery.
Get full details of all employees whose department is located in Dallas:
SELECT *
FROM employee
WHERE dept_no IN
(SELECT dept_no
FROM department
WHERE location = 'Dallas');

2. <> ANY
The <> ANY operator returns true if the value is not equal to any value in the list.
SELECT * FROM Orders
WHERE Amount <> ANY (SELECT Amount FROM Orders WHERE Amount IN (100,
200));
This query will return all orders where the amount is not equal to either
100 or 200.

3. = ALL
The = ALL operator returns true if the value is equal to all values in the list.
SELECT * FROM Orders
WHERE Amount = ALL (SELECT Amount FROM OrdersWHERE Amount IN (500,
500));
This query will return all orders where the amount is equal to both 500 and 500 (which
essentially means equal to 500).
when using ALL, the condition must be true when compared to every single result in
the set.

4. EXISTS:
The EXISTS operator is used to test for the existence of any record in a subquery.

56
The EXISTS operator returns TRUE if the subquery returns one or more records.

EXISTS Syntax
SELECT column_name(s)
FROM table_name
WHERE EXISTS
(SELECT column_name FROM table_name WHERE condition);

EXAMPLE:
SELECT SupplierName
FROM Suppliers
WHERE EXISTS (SELECT ProductName FROM Products WHERE [Link] =
[Link] AND Price = 22);

Aggregate Operators:
o Aggregation operators is used to perform the calculations on multiple rows of a single
column of a table. It returns a single value.
o It is also used to summarize the data.

1. COUNT:
o COUNT function is used to Count the number of rows in a database table. It can work on both
numeric and non-numeric data types.
o COUNT function uses the COUNT(*) that returns the count of all the rows in a specified table.
COUNT(*) considers duplicate and Null.

57
Syntax

COUNT(*)
or
COUNT( [ALL|DISTINCT] expression )

Example:
SELECT COUNT(*)
FROM PRODUCT_MAST;

Example:
SELECT COUNT(*)
FROM PRODUCT_MAST;
WHERE RATE>=20;

Example:
SELECT COUNT(DISTINCT COMPANY)
FROM PRODUCT_MAST;

Example:
SELECT COMPANY, COUNT(*)
FROM PRODUCT_MAST
GROUP BY COMPANY;

2. SUM:
Sum function is used to calculate the sum of all selected columns. It works on numeric fields
only.

Syntax

SUM()
or
SUM( [ALL|DISTINCT] expression )

58
SELECT SUM(COST)
FROM PRODUCT_MAST;

3. AVG:
The AVG function is used to calculate the average value of the numeric type. AVG function
returns the average of all non-Null values.

Syntax

AVG()
or
AVG( [ALL|DISTINCT] expression )

Example:

SELECT AVG(COST)
FROM PRODUCT_MAST;

4. MAX:
MAX function is used to find the maximum value of a certain column. This function determines
the largest value of all selected values of a column.

Syntax

MAX()
or
MAX( [ALL|DISTINCT] expression )

Example:

SELECT MAX(RATE)
FROM PRODUCT_MAST;

59
5. MIN:
MIN function is used to find the minimum value of a certain column. This function determines
the smallest value of all selected values of a column.

Syntax

MIN()
or
MIN( [ALL|DISTINCT] expression )

Example:

SELECT MIN(RATE)
FROM PRODUCT_MAST;

Null Values::
Null values are special values in DBMS that represent values which are
unknown and are always different from zero value.

NULL values could be possible because at the time of data entry information is not
available. So SQL supports a special value known as NULL which is used to represent
the values of attributes that may be unknown or not apply to a tuple. SQL places a
NULL value in the field in the absence of a user-defined value.

For example, the Apartment_number attribute of an address applies only to addresses

that are in apartment buildings and not to other types of residences.
So, NULL values are those values in which there is no data value in the particular field
in the table.
Importance of NULL Value

 It is important to understand that a NULL value differs from a zero value.

 A NULL value is used to represent a missing value,
 The value unknown. Value exists but is not known
 Value not available

Comparison using Null Values:

To handle comparison with NULL values:
 IS NULL – equals the operation '= NULL' (records with NULL values)

60
 IS NOT NULL – equals the operation '<> NULL' (records without NULL values)

IS NULL:

It is not possible to test for NULL values with comparison operators, such as =, <, or
<>.

We will have to use the IS NULL and IS NOT NULL operators instead. The IS NULL operator
is used to test for empty values (NULL values).

IS NULL Syntax

SELECT column_names
FROM table_name
WHERE column_name IS NULL;

The following SQL lists all customers with a NULL value in the "Address" field:

Example:
SELECT CustomerName, ContactName, Address
FROM Customers
WHERE Address IS NULL;

IS NOT NULL:

The IS NOT NULL operator is used to test for non-empty values (NOT NULL values).

IS NOT NULL Syntax

SELECT column_names
FROM table_name
WHERE column_name IS NOT NULL;

The following SQL lists all customers with a value in the "Address" field:

Example
SELECT CustomerName, ContactName, Address
FROM Customers
WHERE Address IS NOT NULL;

Logical Connectives AND, OR and NOT:

Logical operators are used to specify conditions in the structured query language They

61
are also used to serve as conjunctions for multiple conditions in a statement.

The different logical operators are shown below −

AND − Returns TRUE if both component conditions are TRUE. Returns FALSE if either
is FALSE; otherwise returns UNKNOWN.

For example,

select * from emp where job=’manager’ AND deptno=20;

OR − Return TRUE if either component condition is TRUE. Return FALSE if both are
FALSE. Otherwise, return UNKNOWN.

For example,

select * from emp where job=’manager’ OR deptno=20;

NOT − Returns TRUE if the condition is FALSE. Returns FALSE, if it is TRUE. If it is

UNKNOWN, it remains UNKNOWN.

For example,

select * from emp where NOT (job is NULL)

select * from emp where NOT(salary between 2000 AND 5000);

Outer Joins:
Outer joins are joins that return matched values and unmatched values from either or both
tables. There are a few types of outer joins: LEFT JOIN returns only unmatched rows from the
left table, as well as matched rows in both tables

Outer joins are joins that return matched values and unmatched values from either or both
tables. There are a few types of outer joins:

 LEFT JOIN returns only unmatched rows from the left table, as well as matched rows
in both tables.
 RIGHT JOIN returns only unmatched rows from the right table , as well as matched
rows in both tables.

62
 FULL OUTER JOIN returns unmatched rows from both tables,as well as matched rows
in both tables.

Note: LEFT JOIN is also refered to as OUTER LEFT JOIN. RIGHT JOIN is also refered to
as OUTER RIGHT JOIN. FULL OUTER JOIN is also refered to as OUTER JOIN.

o In the SQL outer JOIN, all the content from both the tables is integrated together.
o Even though the records from both the tables are matched or not, the matching and non-
matching records from both the tables will be considered an output of the outer join in
SQL.

Disallowing Null values:

If you want to disallow null values in a column: Attribute-based schema – allows nulls by default;
add use="required" to the attribute. Element-based schema – disallows nulls by default; the
default is nillable="false" , so the column below is not nullable
To disallow NULL values in any of the columns, add NOT NULL to the definition of each one.

Complex integrity constraints in SQL:

I
ntegrity constraints in SQL are rules that help ensure the accuracy and reliability of
data in the database. They ensure that certain conditions are met when data is
inserted, updated, or deleted. While primary key, unique, and foreign key constraints
are commonly used, SQL allows for more complex constraints through the use of
CHECK and custom triggers.

Using CHECK Constraints

Ensuring a range: You might want a column to only have values within a certain range.
Example:
CREATE TABLE Employees (
ID INT PRIMARY KEY,
Age INT CHECK (Age >= 18 AND Age <= 30)
);

63
Triggers and Active Databases:
Triggers:

A trigger is a procedure which is automatically invoked by the DBMS in response to

changes to the database, and is specified by the database administrator (DBA). A
database with a set of associated triggers is generally called an active database.

Use of trigger

Triggers may be used for any of the following reasons −

 To implement any complex business rule, that cannot be implemented using integrity constraints.
 Triggers will be used to audit the process. For example, to keep track of changes made to a table.
 Trigger is used to perform automatic action when another concerned action takes place.

The syntax to create database trigger is as follows −

CREATE [OR REPLACE] TRIGGER triggername

{BEFORE|AFTER}
{DELETE|INSERT|UPDATE[OF COLUMNS]} ON table
[FOR EACH ROW {WHEN condition]]
[REFERENCE [OLD AS old] [NEW AS new]]
BEGIN
PL/SQL BLOCK
END.

Active Databases:

An active Database is a database consisting of a set of triggers. These databases are

very difficult to be maintained because of the complexity that arises in understanding
the effect of these triggers. In such database, DBMS initially verifies whether the
particular trigger specified in the statement that modifies the database is activated or
not, prior to executing the statement. If the trigger is active then DBMS executes the
condition part and then executes the action part only if the specified condition is
evaluated to true. It is possible to activate more than one trigger within a single
statement. In such situation,

64
Features of Active Database:

1. It possess all the concepts of a conventional database i.e. data modelling

facilities, query language etc.
2. It supports all the functions of a traditional database like data definition, data
manipulation, storage management etc.
3. It must be able to evaluate conditions and to execute actions.

MODULE-3: UNIT-2
Topic-1: Problem Caused by Redundancy

Redundancy in DBMS is having several copies of the same data in the database. Redundancy
in DBMS occurs when the database is not normalized. Redundancy causes insertion, deletion,
and updation anomalies. Redundancy can be avoided by normalizing the database, maintaining
master data

Insertion Anomaly
If a student detail has to be inserted whose course is not being decided yet then insertion will
not be possible till the time course is decided for the student.

Student_ID Name Contact College Course Rank

100 Himanshu 7300934851 GEU 1

This problem happens when the insertion of a data record is not possible without
adding some additional unrelated data to the record.

Deletion Anomaly:
If the details of students in this table are deleted then the details of the college will also get
deleted which should not occur by common sense. This anomaly happens when the deletion
of a data record results in losing some unrelated information that was stored as part of the
record that was deleted from a table.
It is not possible to delete some information without losing some other information in the table
as well.

65
Updation Anomaly
Suppose the rank of the college changes then changes will have to be all over the database
which will be time-consuming and computationally costly.

Topic-2: Decompositions

The process of breaking up or dividing a single relation into two or more sub relations is called
as decomposition of a relation. No information is lost from the original relation during
decomposition. When the sub relations are joined back, the same relation is obtained that was
decomposed.

Types of Decomposition
Decomposition is of two major types in DBMS:

 Lossless
 Lossy

1. Lossless Decomposition
A decomposition is said to be lossless when it is feasible to reconstruct the original relation R
using joins from the decomposed tables. It is the most preferred choice. This way, the
information will not be lost from the relation when we decompose it. A lossless join would
eventually result in the original relation that is very similar.

For example,

Let us take ‘A’ as the Relational Schema, having an instance of ‘a’. Consider that it is
decomposed into: A1, A2, A3, . . . . An; with instance: a1, a2, a3, . . .. an, If a1 ⋈ a2 ⋈ a3 . . .
. ⋈ an, then it is known as ‘Lossless Join Decomposition’.

Just like the name suggests, whenever we decompose a relation into multiple relational
schemas, then the loss of data/information is unavoidable whenever we try to retrieve the
original relation.

2. Lossy Decomposition
Just like the name suggests, whenever we decompose a relation into multiple relational
schemas, then the loss of data/information is unavoidable whenever we try to retrieve the
original relation.

66
Properties of Decomposition
Decomposition must have the following properties:

1. Decomposition Must be Lossless

2. Dependency Preservation

3. Lack of Data Redundancy

Topic-3: Problem Related to Decomposition

1. Loss of Information.
2. Loss of Functional Dependency. ...
3. Increased Complexity. ...
4. Redundancy.
5. Performance Overhead.

1. Loss of Information

Non-loss decomposition: When a relation is decomposed into two or more smaller relations,
and the original relation can be perfectly reconstructed by taking the natural join of the
decomposed relations, then it is termed as lossless decomposition. If not, it is termed "lossy
decomposition."

Example: Let's consider a table `R(A, B, C)` with a dependency `A → B`. If you decompose it
into `R1(A, B)` and `R2(B, C)`, it would be lossy because you can't recreate the original table
using natural joins.

2. Loss of Functional Dependency

Once tables are decomposed, certain functional dependencies might not be preserved, which
can lead to the inability to enforce specific integrity constraints.

Example: If you have the functional dependency `A → B` in the original table, but in the
decomposed tables, there is no table with both `A` and `B`, this functional dependency can't be
preserved.

3. Increased Complexity

67
Decomposition leads to an increase in the number of tables, which can complicate
queries and maintenance tasks. While tools and ORM (Object-Relational Mapping)
libraries can mitigate this to some extent, it still adds complexity.

4. Redundancy

Incorrect decomposition might not eliminate redundancy, and in some cases, can
even introduce new redundancies.

5. Performance Overhead

An increased number of tables, while aiding normalization, can also lead to more
complex SQL queries involving multiple joins, which can introduce performance
overheads.

Topic-4: Reasoning about FDs

Functional dependencies play a vital role in the normalization process in relational database
design. They help in defining the relationships between attributes in a relation and are used
to formalize the properties of the relation and drive the process of decomposition.

1. Trivial Dependency
- If Y is a subset of X, then the dependency X -> Y is trivial.
- For example, in {A, B} -> {A}, the dependency is trivial because A is part of {A, B}.
Example: For attributes A,B,C:

 \( A \rightarrow A \) is a trivial dependency because an attribute always determines

itself.
 \( AB \rightarrow A \) is a trivial dependency because the combined attributes A and B
always determine A as it's a subset.
 \( ABC \rightarrow AC \) is a trivial dependency for the same reason; the combined
attributes A,B, and C always determine A and C.

68
2. Full Functional Dependency
- An attribute functionally depends on a set of attributes, X, and does not functionally depend
on any proper subset of X.
Example: Consider a relation StudentCourses that has the following attributes:

 StudentID (unique identifier for each student)

 CourseID (unique identifier for each course)
 Instructor (name of the instructor teaching the course)
The relation is used to keep track of which student is enrolled in which course and who the
instructor for that course is.
Now, assume we have the following functional dependency:
(StudentID,CourseID) → Instructor
Thus, Instructor is fully functionally dependent on the combined attributes StudentID and
CourseID.

3. Transitive Dependency
- If A -> B and B -> C, then A has a transitive dependency on C through B.
Example: Consider a relation Employees with the following attributes:
EmployeeID (unique identifier for each employee)
EmployeeName
Department (department in which the employee works)
DepartmentLocation (location of the department)
Now, let's consider the following functional dependencies:
EmployeeID → Department
Department → DepartmentLocation
From the above functional dependencies:
An EmployeeID determines the Department an employee works in.
A Department determines its DepartmentLocation.
However, the DepartmentLocation is also dependent on the EmployeeID through Department.
This means the DepartmentLocation has a transitive dependency on EmployeeID via
Department.

69
.
Topic-5: Normal Forms

Normalization is the process of minimizing redundancy from a relation or set of

relations. Redundancy in relation may cause insertion, deletion, and update
anomalies. So, it helps to minimize the redundancy in relations. Normal forms are
used to eliminate or reduce redundancy in database tables.

In database management systems (DBMS), normal forms are a series of guidelines

that help to ensure that the design of a database is efficient, organized, and free from
data anomalies. There are several levels of normalization, each with its own set of
guidelines, known as normal forms.

First, Second, Third & BCNF Normal Forms:

 First Normal Form (1NF): This is the most basic level of normalization. In 1NF,
each table cell should contain only a single value, and each column should have a
unique name. The first normal form helps to eliminate duplicate data and simplify
queries.

The first normal form (1NF) is the first step in normalizing a table by reducing
confusion and redundancy. In 1NF, we remove the redundant columns (columns
with the same name and/or data) and redundant fields (such as a full name field
when we already have first and last names), and add a primary key.

Example 1:
Relation STUDENT in table 1 is not in 1NF because of multi-valued attribute
STUD_PHONE. Its decomposition into 1NF has been shown in table 2.

70
 Second Normal Form (2NF): 2NF eliminates redundant data by requiring that
each non-key attribute be dependent on the primary key. This means that each
column should be directly related to the primary key, and not to other columns.
A relation is said to be in the 2nd Normal Form in DBMS (or 2NF) when it is in the
First Normal Form but has no non-prime attribute functionally dependent on any
candidate key's proper subset in a relation. A relation's non-prime attribute refers
to that attribute that isn't a part of a relation's candidate key.

Example:
Look at the table given below: NOT IN SECOND NORMAL FORM :

CAND_ID SUBJECT_NO SUBJECT_FEE

111 S1 1000

222 S2 1500

111 S4 2000

444 S3 1000

444 S1 1000

222 S5 2000

SECOND NORMAL FORM:

71
Table 1:
CAND_NO SUBJECT_NO

111 S1

222 S2

111 S4

444 S3

444 S1

222 S5

Table 2:
SUBJECT_NO SUBJECT_FEE

S1 1000

S2 1500

S3 1000

S4 2000

S5 2000

 Third Normal Form (3NF): 3NF builds on 2NF by requiring that all non-key
attributes are independent of each other. This means that each column should be
directly related to the primary key, and not to any other columns in the same table.

A given relation is said to be in its third normal form when it's in 2NF but has no
transitive partial dependency. Meaning, when no transitive dependency exists for
the attributes that are non-prime, then the relation can be said to be in 3NF.

Example:
EMPLOYEE_DETAIL table : NOT IN THIRD NORMAL FORM

72
EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY

222 Harry 201010 UP Noida

333 Stephan 02228 US Boston

444 Lan 60007 US Chicago

555 Katharine 06389 UK Norwich

666 John 462007 MP Bhopal

IN THIRD NORMAL FORM:

EMPLOYEE table:

EMP_ID EMP_NAME EMP_ZIP

222 Harry 201010

333 Stephan 02228

444 Lan 60007

555 Katharine 06389

666 John 462007

EMPLOYEE_ZIP table:

EMP_ZIP EMP_STATE EMP_CITY

201010 UP Noida

02228 US Boston

60007 US Chicago

06389 UK Norwich

73
462007 MP Bhopal

 Boyce-Codd Normal Form (BCNF): BCNF is a stricter form of 3NF that ensures
that each determinant in a table is a candidate key. In other words, BCNF ensures
that each non-key attribute is dependent only on the candidate key.
BCNF is a normal form that applies to relational databases that have a primary key,
which is a column or a combination of columns that uniquely identifies each row in
a table. BCNF states that every non-key column in a table should depend on the
whole primary key, and not on any subset of it.

EXAMPLE: NOT IN BCNF

Stu_
Branch_ Cours
Stu_ID Stu_Branch Stu_Course Number e_No

101 CSE DBMS 1 201

Computer
101 CSE 1 202
Networks

VLSI
102 ECE 3 401
Technology

Mobile
102 ECE Communicati 3 402
on

IN BCNF:

Stu_Branch Table

Stu_ID Stu_Branch

101 CSE

102 ECE

74
Stu_Course Table

Stu_Course Branch_Number Stu_Course_No

DBMS B_001 201

Computer Networks B_001 202

VLSI Technology B_003 401

Mobile Communication B_003 402

Stu_ID to Stu_Course_No Table

Stu_ID Stu_Course_No

101 201

101 202

102 401

102 402

Topic-6: Lossless Join Decomposition

Lossless-join decomposition is a process in which a relation is decomposed into two or
more relations. This property guarantees that the extra or less tuple generation
problem does not occur and no information is lost from the original relation during the
decomposition. It is also known as non-additive join decomposition.

When the sub relations combine again then the new relation must be the same as the
original relation was before decomposition.

Consider a relation R if we decomposed it into sub-parts relation R1 and relation R2.

75
The decomposition is lossless when it satisfies the following statement −

 If we union the sub Relation R1 and R2 then it must contain all the attributes that are available in the
original relation R before decomposition.
 Intersections of R1 and R2 cannot be Null. The sub relation must contain a common attribute. The
common attribute must contain unique data.

EXAMPLE:

R (A, B, C)

A B C

12 25 34

10 36 09

12 42 30

It decomposes into the two sub relations −

R1 (A, B)

A B

12 25

10 36

12 42

R2 (B, C)

B C

25 34

36 09

76
B C

42 30

Now, we can check the first condition for Lossless-join decomposition.

The union of sub relation R1 and R2 is the same as relation R.

R1U R2 = R

We get the following result −

A B C

12 25 34

10 36 09

12 42 30

The relation is the same as the original relation R. Hence, the above decomposition is
Lossless-join decomposition.

Topic-7: Dependency Preserving Decomposition

The dependency preservation decomposition is another property of decomposed relational

database schema D in which each functional dependency X -> Y specified in F either
appeared directly in one of the relation schemas Ri in the decomposed D or could be inferred
from the dependencies that appear in some Ri.

Decomposition D = { R1 , R2, R3,,.., ,Rm} of R is said to be dependency-preserving with respect

to F if the union of the projections of F on each Ri , in D is equivalent to F. In other words, R
⊂ join of R1, R1 over X. The dependencies are preserved because each dependency in F
represents a constraint on the database. If decomposition is not dependency-preserving,
some dependency is lost in the decomposition.
Example:
Let a relation R(A,B,C,D) and set a FDs F = { A -> B , A -> C , C -> D} are given.
A relation R is decomposed into -

77
R1 = (A, B, C) with FDs F1 = {A -> B, A -> C}, and
R2 = (C, D) with FDs F2 = {C -> D}.
F' = F1 ∪ F2 = {A -> B, A -> C, C -> D}
so, F' = F.
And so, F'+ = F+.

.
Topic-8: Schema Refinement in Database design

Schema Refinement is a technique of organizing the data in the database. It is a systematic

approach of decomposing tables to eliminate data redundancy and undesirable
characteristics like Insertion, Update and Deletion Anomalies.

The Schema Refinement refers to refine the schema by using some technique. The best
technique of schema refinement is decomposition.

Identifying and clearing the future problems in the database is called schema refinement.
In this refinement main problem is data redundancy. It is avoided by normalization
technique.
 The Basic Goal of Normalization is used to eliminate redundancy.
Redundancy refers to repetition of same data or duplicate copies of same data stored in
different locations.

Topic-9: Multivalued Dependencies

Multivalued Dependency (MVD) is a form of data dependency where two or more attributes,
other than the key attribute, are functionally dependent on each other, but not on the key itself.
Data errors and redundancies may result from Multivalued Dependency.

o Multivalued dependency occurs when two attributes in a table are independent of each
other but, both depend on a third attribute.
o A multivalued dependency consists of at least two attributes that are dependent on a
third attribute that's why it always requires at least three attributes.

78
Example: Suppose there is a bike manufacturer company which produces two colors(white
and black) of each model every year.

BIKE_MODEL MANUF_YEAR COLOR

M2011 2008 White

M2001 2008 Black

M3001 2013 White

M3001 2013 Black

M4006 2017 White

M4006 2017 Black

Here columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and independent
of each other.

In this case, these two columns can be called as multivalued dependent on BIKE_MODEL.
The representation of these dependencies is shown below:

1. BIKE_MODEL → → MANUF_YEAR
2. BIKE_MODEL → → COLOR

Topic-10: Fourth Normal Form

The Fourth Normal Form (4NF) is a level of database normalization where there are no non-
trivial multivalued dependencies other than a candidate key. It builds on the first three normal
forms (1NF, 2NF, and 3NF) and the Boyce-Codd Normal Form (BCNF)

A relation R is in 4NF if and only if the following conditions are satisfied:

o A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued
dependency.
o For a dependency A → B, if for a single value of A, multiple values of B exists, then the
relation will be a multi-valued dependency.

79
A table with a multivalued dependency violates the normalization standard of the Fourth
Normal Form (4NF) because it creates unnecessary redundancies and can contribute to
inconsistent data. To bring this up to 4NF, it is necessary to break this information into two
tables.

.
Example
STUDENT

STU_ID COURSE HOBBY

21 Computer Dancing

21 Math Singing

34 Chemistry Dancing

74 Biology Cricket

59 Physics Hockey

The given STUDENT table is in 3NF, but the COURSE and HOBBY are two independent entity.
Hence, there is no relationship between COURSE and HOBBY.

In the STUDENT relation, a student with STU_ID, 21 contains two

courses, Computer and Math and two hobbies, Dancing and Singing. So there is a Multi-
valued dependency on STU_ID, which leads to unnecessary repetition of data.

So to make the above table into 4NF, we can decompose it into two tables:

STUDENT_COURSE

STU_ID COURSE

80
21 Computer

21 Math

34 Chemistry

74 Biology

59 Physics

STUDENT_HOBBY

STU_ID HOBBY

21 Dancing

21 Singing

34 Dancing

74 Cricket

59 Hockey

MODULE-4: UNIT-1 : NOTES

UNIT-1: Syllabus: Transaction Management
Transaction Concept, Transaction State, Implementation of Atomicity and Durability,
Concurrent Executions, Serializability, Recoverability, Implementation of Isolation,
Testing for Serializability, Lock Based Protocols, Timestamp Based Protocols,
Validation Based Protocols, Multiple Granularity

Topics Covered:

81
Transaction Concept:
Transaction in Database Management Systems (DBMS) can be defined as a set of
logically related operations. It is the result of a request made by the user to access the
contents of the database and perform operations on it. It consists of various operations
and has various states in its completion journey. It also has some specific properties
that must be followed to keep the database consistent.

Operations of Transaction
A user can make different types of requests to access and modify the contents of a
database. So, we have different types of operations relating to a transaction. They are
discussed as follows:

i) Read(X)
A read operation is used to read the value of X from the database and store it in a
buffer in the main memory for further actions such as displaying that value.

ii) Write(X)
A write operation is used to write the value to the database from the buffer in the main
memory. For a write operation to be performed, first a read operation is performed to
bring its value in buffer, and then some changes are made to it,

iii) Commit
This operation in transactions is used to maintain integrity in the database. Due to
some failure of power, hardware, or software, etc., a transaction might get interrupted
before all its operations are completed. a commit operation is performed to the
changes made by a transaction permanently to the database.

iv) Rollback
This operation is performed to bring the database to the last saved state when any
transaction is interrupted in between due to any power, hardware, or software failure.

Transaction State:

82
These are different types of Transaction States :
Active State –
When the instructions of the transaction are running then the transaction is in active
state. If all the ‘read and write’ operations are performed without any error then it
goes to the “partially committed state”; if any instruction fails, it goes to the “failed
state”.

1. Partially Committed –
After completion of all the read and write operation the changes are made in main
memory or local buffer.

2. Failed State –
When any instruction of the transaction fails, it goes to the “failed state” or if
failure occurs in making a permanent change of data on Data Base.

3. Aborted State –
After having any type of failure the transaction goes from “failed state” to “aborted
state” and since in previous states, the changes are only made to local buffer or
main memory and hence these changes are deleted or rolled-back.

4. Committed State –
It is the state when the changes are made permanent on the Data Base and the
transaction is complete and therefore terminated in the “terminated state”.

5. Terminated State –
If there isn’t any roll-back or the transaction comes from the “committed state”,
then the system is consistent and ready for new transaction and the old
transaction is terminated.

Implementation of Atomicity and Durability::

83
Shadow Copy Scheme

This scheme is based on making copies of the Database. These copies of Databases are also
known as Shadow Copies. Suppose there is an active transaction T1. Now, A pointer called
Db_pointer is maintained on the disk, which points to the current copy of the Database. The
transaction T1 that want to update the Database will first create a complete copy of the
Database. All further operations are done on new copy and leave that original copy
untouched.

If at any point transaction T1 is aborted, then system delete new copy and the old copy
is not affected. It remain untouched.

if Transaction T1 success then OS (Operating System) will make sure that all the pages
of new copy of Database are written in the disk. Database system update Db_pointer
to point to the new copy of Database. Now, new copy will become current copy of
Database. Also old copy of Database will be deleted. Transaction T1 is said to
committed only if updated Db-pointer is written on disk.

Concurrent Executions:
In a multi-user system, multiple users can access and use the same database at one time,
which is known as the concurrent execution of the database. It means that the same database
is executed simultaneously on a multi-user system by different users. While working on the
database transactions, there occurs the requirement of using the database by multiple users
for performing different operations, and in that case, concurrent execution of the database is
performed.
The thing is that the simultaneous execution that is performed should be done in an
interleaved manner, and no operation should affect the other executing operations, thus
maintaining the consistency of the database. Thus, on making the concurrent execution of the
transaction operations, there occur several challenging problems that need to be solved.

Serializability:
Serializability of schedules ensures that a non-serial schedule is equivalent to a serial schedule.
It helps in maintaining the transactions to execute simultaneously without interleaving one
another. In simple words, serializability is a way to check if the execution of two or more
transactions are maintaining the database consistency or not.
84
Types of Serializability
In DBMS, there are different types of serializable. Each type of serializable has some
advantages and disadvantages. The two most common types of serializable are view
serializability and conflict serializability.

1. Conflict Serializability
Conflict serializability is a type of conflict operation in serializability that operates the same data
item that should be executed in a particular order and maintains the consistency of the
database. In DBMS, each transaction has some unique value, and every transaction of the
database is based on that unique value of the database.

2. View Serializability
View serializability is a type of operation in the serializable in which each transaction should
produce some result and these results are the output of proper sequential execution of the data
item. Unlike conflict serialized, the view serializability focuses on preventing inconsistency in
the database. In DBMS, the view serializability provides the user to view the database in a
conflicting way.

Recoverability:
Recoverability is a property of database systems that ensures that, in the event of a failure or
error, the system can recover the database to a consistent state. Recoverability guarantees
that all committed transactions are durable and that their effects are permanently stored in
the database, while the effects of uncommitted transactions are undone to maintain data
consistency.
The recoverability property is enforced through the use of transaction logs, which record all
changes made to the database during transaction processing. When a failure occurs, the
system uses the log to recover the database to a consistent state, which involves either
undoing the effects of uncommitted transactions or redoing the effects of committed
transactions.

Implementation of Isolation:
Isolation is one of the core ACID properties of a database transaction, ensuring that
the operations of one transaction remain hidden from other transactions until
completion. It means that no two transactions should interfere with each other and
affect the other's intermediate state.
Implementing isolation typically involves concurrency control mechanisms. Here are
common mechanisms used:

85
1. Locking Mechanisms
Locking ensures exclusive access to a data item for a transaction. This means that while one
transaction holds a lock on a data item, no other transaction can access that item.

 Shared Lock (S-lock): Allows a transaction to read an item but not write to it.

 Exclusive Lock (X-lock): Allows a transaction to read and write an item. No other
transaction can read or write until the lock is released.

 Two-phase Locking (2PL): This protocol ensures that a transaction acquires all the locks
before it releases any. This results in a growing phase (acquiring locks and not releasing
any) and a shrinking phase (releasing locks and not acquiring any).

2. Timestamp-based Protocols
 Every transaction is assigned a unique timestamp when it starts. This timestamp
determines the order of transactions. Transactions can only access the database if they
respect the timestamp order, ensuring older transactions get priority.

Testing for Serializability:

We can utilize the Serialization Graph or Precedence Graph to examine a schedule’s
serializability. A schedule’s full transactions are organized into a Directed Graph, what
a serialization graph is.

Precedence Graph

It can be described as a Graph G(V, E) with vertices V = “V1, V2, V3,…, Vn” and
directed edges E = “E1, E2, E3,…, En”. One of the two operations—READ or WRITE—
performed by a certain transaction is contained in the collection of edges. Where Ti ->
Tj, means Transaction-Ti is either performing read or write before the transaction-Tj.

86
Lock Based Protocols:

In this type of protocol, any transaction cannot read or write data until it acquires an appropriate
lock on it. There are two types of lock:

1. Shared lock:

o It is also known as a Read-only lock. In a shared lock, the data item can only read by the
transaction.
o It can be shared between the transactions because when the transaction holds a lock,
then it can't update the data on the data item.

2. Exclusive lock:

o In the exclusive lock, the data item can be both reads as well as written by the transaction.
o This lock is exclusive, and in this lock, multiple transactions do not modify the same data
simultaneously.

There are four types of lock protocols available:

1. Simplistic lock protocol

It is the simplest way of locking the data while transaction. Simplistic lock-based
protocols allow all the transactions to get the lock on the data before insert or delete or
update on it. It will unlock the data item after completing the transaction.

2. Pre-claiming Lock Protocol

o Pre-claiming Lock Protocols evaluate the transaction to list all the data items on which they need
locks.
o Before initiating an execution of the transaction, it requests DBMS for all the lock on all those
data items.
o If all the locks are granted then this protocol allows the transaction to begin. When the
transaction is completed then it releases all the lock.

87
3. Two-phase locking (2PL)
o The two-phase locking protocol divides the execution phase of the transaction into three
parts.
o In the first part, when the execution of the transaction starts, it seeks permission for the
lock it requires.
o In the second part, the transaction acquires all the locks. The third phase is started as
soon as the transaction releases its first lock.
o In the third phase, the transaction cannot demand any new locks. It only releases the
acquired locks.

88
4. Strict Two-phase locking (Strict-2PL)
o The first phase of Strict-2PL is similar to 2PL. In the first phase, after acquiring all the locks, the
transaction continues to execute normally.
o The only difference between 2PL and strict 2PL is that Strict-2PL does not release a lock after
using it.
o Strict-2PL waits until the whole transaction to commit, and then it releases all the locks at a time.

Timestamp Based Protocols:

Timestamp-based protocols in dbms are used to order the transaction in ascending order of
their creation time. The creation time is the system time or a logical counter.

The transaction which is created first or you can say older transactions are given high priority
over new transactions.

For example, if there are two transactions T1 and T2. T1 enters the system at 008 and T2 enters
the system at 009 then T1 is given priority over T2.

89
Timestamp-based protocols in dbms order the transaction according to their transaction
timestamps. A schedule that is ordered in the serial order of their transaction timestamp is the
only serializable schedule equivalent to the timestamp-ordered-based transaction schedule.

In order to ensure that the conflicting operation occurring in the schedule does not violate the
timestamp ordering two-time stamp values relating to each database item (X) are used :

 W_TS(X) (write timestamp) is the largest timestamp of any transaction that executed
write(X) successfully.
 R_TS(X) (read timestamp) is the largest timestamp of any transaction that executed
read(X) successfully.

Validation Based Protocols:

Validation Based Protocol is also called Optimistic Concurrency Control Technique. This
protocol is used in DBMS (Database Management System) for avoiding concurrency in
transactions. It is called optimistic because of the assumption it makes, i.e. very less
interference occurs, therefore, there is no need for checking while the transaction is executed.

In this technique, no checking is done while the transaction is been executed. Until the
transaction end is reached updates in the transaction are not applied directly to the database.
All updates are applied to local copies of data items kept for the transaction. At the end of
transaction execution, while execution of the transaction, a validation phase checks whether
any of transaction updates violate serializability. If there is no violation of serializability the
transaction is committed and the database is updated; or else, the transaction is updated and
then restarted.

Optimistic Concurrency Control is a three-phase protocol. The three phases for

validation based protocol:

1. Read Phase:
Values of committed data items from the database can be read by a transaction.
Updates are only applied to local data versions.

2. Validation Phase:
Checking is performed to make sure that there is no violation of serializability
90
when the transaction updates are applied to the database.

3. Write Phase:
On the success of the validation phase, the transaction updates are applied to the
database, otherwise, the updates are discarded and the transaction is slowed
down.

Multiple Granularity:

o It can be defined as hierarchically breaking up the database into blocks which can be
locked.
o The Multiple Granularity protocol enhances concurrency and reduces lock overhead.
o It maintains the track of what to lock and how to lock.
o It makes easy to decide either to lock a data item or to unlock a data item. This type of
hierarchy can be graphically represented as a tree.

For example: Consider a tree which has four levels of nodes.

o The first level or higher level shows the entire database.

o The second level represents a node of type area. The higher level database consists of
exactly these areas.
o The area consists of children nodes which are known as files. No file can be present in
more than one area.
o Finally, each file contains child nodes known as records. The file has exactly those records
that are its child nodes. No records represent in more than one file.

Hence, the levels of the tree starting from the top level are as follows:

1. Database
2. Area
3. File
4. Record

91
MODULE-4: UNIT-2
UNIT-2: Syllabus: Recovery System
Failure Classification, Storage Structure, Recovery and Atomicity, Log Based Recovery,
Recovery and Concurrent Transactions, Buffer Management, Failure and Loss of Non-
volatile Storage, Advance Recovery Techniques, Remote Backup Systems.

Failure Classification:

To find that where the problem has occurred, we generalize a failure into the following
categories:

1. Transaction failure
2. System crash
3. Disk failure

92
1. Transaction failure
The transaction failure occurs when it fails to execute or when it reaches a point from
where it can't go any further. If a few transaction or process is hurt, then this is called as
transaction failure.

Reasons for a transaction failure could be -

1. Logical errors: If a transaction cannot complete due to some code error or an internal
error condition, then the logical error occurs.
2. Syntax error: It occurs where the DBMS itself terminates an active transaction because
the database system is not able to execute it. For example, The system aborts an active
transaction, in case of deadlock or resource unavailability.

2. System Crash
o System failure can occur due to power failure or other hardware or software
failure. Example: Operating system error.

Fail-stop assumption: In the system crash, non-volatile storage is assumed not

to be corrupted.

3. Disk Failure
o It occurs where hard-disk drives or storage drives used to fail frequently. It was a common
problem in the early days of technology evolution.
o Disk failure occurs due to the formation of bad sectors, disk head crash, and
unreachability to the disk or any other failure, which destroy all or part of disk storage.

Storage Structure:
The electromagnetic devices used in database systems for data storage are classified as
follows:

1. Primary Memory
2. Secondary Memory
3. Tertiary Memory

93
1. Primary Memory
The primary memory of a server is the type of data storage that is directly accessible by the
central processing unit. The primary memory must, in general, function. ..

 The size of these devices is considerably smaller and they are volatile.
 According to performance and speed, the primary memory devices are the fastest
devices, and this feature is in direct correlation with their capacity.
 These primary memory devices are usually more expensive due to their increased speed
and performance.

2. Secondary Memory
Data storage devices known as secondary storage, as the name suggests, are devices that
can be accessed for storing data that will be needed at a later point in time for various
purposes or database actions. ,

Example:
 Flash Memory: Flash memory, also known as flash storage, is a type of nonvolatile
memory that erases data in units called blocks and rewrites data at the byte level. Flash
memory is widely used for storage and data transfer in consumer devices, enterprise
systems, and industrial applications.

94
 Magnetic Disk Storage: A Magnetic Disk is a type of secondary memory that is a flat disc
covered with a magnetic coating to hold information. It is used to store various programs
and files.

3. Tertiary Memory
For data storage, Tertiary Memory refers to devices that can hold a large amount of data
without being constantly connected to the server or the peripherals. A device of this type is
connected either to a server or to a device where the database is stored from the outside.

 Due to the fact that tertiary storage provides more space than other types of device
memory but is most slowly performing, the cost of tertiary storage is lower than primary
and secondary. .
 The ability to use secondary devices and to delete the contents of the tertiary devices is
similar.

Example:
Optical Storage: Typically data written on CDs and DVDs are examples of Optical
Storage.

 Tape Storage: Tape Storage is a type of storage data where we use magnetic tape to
store data. It is used to store data for a long time and also helps in the backup of data in
case of data loss.

Recovery and Atomicity:

The recovery procedures in DBMS ensure the database's atomicity and durability. If a system
crashes in the middle of a transaction and all of its data is lost, it is not regarded as durable. If
just a portion of the data is updated during the transaction, it is not considered atomic. Data
recovery procedures in DBMS make sure that the data is always recoverable to protect the
durability property and that its state is retained to protect the atomic property. The procedures
listed below are used to recover data from a DBMS,

o Recovery based on logs.

o Recovery through Deferred Update

o Immediate Recovery via Immediate Update

95
The atomicity attribute of DBMS safeguards the data state. If a data modification is performed,
the operation must be completed entirely, or the data's state must be maintained as if the
manipulation never occurred. This characteristic may be impacted by DBMS failure brought on
by transactions, but DBMS recovery methods will protect it.

Log Based Recovery:

Log is nothing but a file which contains a sequence of records, each log record refers
to a write operation. All the log records are recorded step by step in the log file. We
can say, log files store the history of all updates activities.

Log contains start of transaction, transaction number, record number, old value, new
value, end of transaction etc. For example, mini statements in bank ATMs.

Log based recovery techniques

Log based recovery uses one of the techniques −

Deferred database modification

It modifies the database after completion of transaction. The database modification is

deferred or delayed until the last operation of the transaction is executed. Update log
records maintain the new value of the data item.

Recover system uses one operation which is as follows −

 Redo(Ti) − All data items updated by the transaction Ti are set to a new value.

Immediate database modification

It modifies the database after a write operation, database modification is immediately

done when a transaction performs an update/ write operation. Update log records
maintain both old and new values of data items.

The recovery system uses two operations, which are as follows −

 Undo(Ti) − All data items updated by the transaction Ti, are set to old value.
 Redo(Ti) − All data items updated by the transaction Ti are set to a new value.

96
Recovery and Concurrent Transactions:

Whenever more than one transaction is being executed, then the interleaved of logs occur.
During recovery, it would become difficult for the recovery system to backtrack all logs and then
start recovering.

Recovery with concurrent transactions can be done in the following four ways.
1. Interaction with concurrency control
2. Transaction rollback
3. Checkpoints
4. Restart recovery

Interaction with Concurrency Control:

This pertains to how the recovery system interacts with the concurrency control component of
the DBMS.
Recovery must be aware of concurrency control mechanisms such as locks to ensure that,
during recovery, operations are redone or undone in a manner consistent with the original
schedule of transactions.

Transaction Rollback
Sometimes, instead of recovering the whole system, it's more efficient to just rollback a
particular transaction that has caused inconsistency or when a deadlock occurs.
When errors are detected during transaction execution (like constraint violations) or if a user
issues a rollback command, the system uses the logs to undo the actions of that specific
transaction, ensuring the database remains consistent.

Checkpoints
Checkpointing is a technique where the DBMS periodically takes a snapshot of the current state
of the database and writes all changes to the disk. This reduces the amount of work during
recovery.
During the checkpoint, ongoing transactions might be temporarily suspended, or their logs
might be force-written to the stable storage, depending on the implementation.

97
Restart Recovery
In the case of a system crash, the recovery manager uses the logs to restore the database to
the most recent consistent state. This process is called restart recovery.
The redo phase then ensures that all logged updates of committed transactions are reflected in
the database. Lastly, the undo phase rolls back the transactions that were active during the
crash to ensure atomicity.

Buffer Management:

A buffer is a memory location used by a database management system (DBMS) to

temporarily hold data that has recently been accessed or updated in the database. This
buffer, often referred to as a database buffer, acts as a link between the programs
accessing the data and the physical storage devices.

Purpose of the Database Buffer

A crucial part of a DBMS is the database buffer. It does a variety of things, like as

Reducing Disk I/O Operations

Data is read and written to and from the disc by a DBMS. In general, disc I/O operations
take longer than memory operations. Data that is often accessed is kept in memory by
the database buffer, which improves speed by lowering the need for disc I/O
operations.

Improving Data Access Time

The database buffer reduces the amount of time it takes to retrieve frequently used
data by storing it in memory.

Managing Data Consistency

Data consistency is managed by the database buffer as well. When an application alters
data in the buffer, the buffer labels the modified data as "dirty." When there is enough
room in the buffer or when the data is no longer required, the buffer sends the
modifications back to the disc.

98
Supporting Concurrency Control

Concurrency management is also supported by the database buffer. The buffer makes
sure that the data is consistent and that all apps may access it without interfering with
one another when several applications are using the same data at once.

Failure and Loss of Non-volatile Storage:

The basic scheme is to dump the entire content of the database to stable storage periodically—
say, once per day. For example, we may dump the database to one or more magnetic tapes. If
a failure occurs that results in the loss of physical database blocks, the system uses the most
recent dump in restoring the database to a previous consistent state.

Once this restoration has been accomplished, the system uses the log to bring the database
system to the most recent consistent state.

More precisely, no transaction may be active during the dump procedure, and a procedure
similar to checkpointing must take place:

1. Output all log records currently residing in main memory onto stable storage.

2. Output all buffer blocks onto the disk.

3. Copy the contents of the database to stable storage.

4. Output a log record <dump> onto the stable storage.

To recover from the loss of nonvolatile storage, the system restores the database to disk by
using the most recent dump. Then, it consults the log and redoes all the transactions that
have committed since the most recent dump occurred. Notice that no undo operations need to
be executed.

A dump of the database contents is also referred to as an archival dump, since we can
archive the dumps and use them later to examine old states of the database. Dumps of a
database and checkpointing of buffers are similar. .

Remote Backup Systems:

99
Remote backup systems provide a wide range of availability, allowing the transaction
processing to continue even if the primary site is destroyed by a fire, flood or
earthquake.
Data and log records from a primary site are continuously backed up into a remote
backup site.
One can achieve ‘wide range availability’ of data by performing transaction processing
at one site, called the ‘primary site’, and having a ‘remote backup’ site where all the
data from the primary site are duplicated.
The remote site is also called ‘secondary site’.
The remote site must be synchronized with the primary site, as updates are performed
at the primary.
In designing a remote backup system, the following points are important.
a) Detection of failure: It is important for the remote backup system to detect when the
primary has failed.

b) Transfer of control: When the primary site fails, the backup site takes over the
processing and becomes the new primary site.

c) Time to recover: If the log at the remote backup becomes large, recovery will take a
long time.

d) Time to commit: To ensure that the updates of a committed transaction are

durable, a transaction should not be announced committed until its log records have
reached the backup site.

MODULE-5: UNIT-1
UNIT-1: Syllabus: Storage and Indexing
Data on External Storage, File Organization and indexing, Cluster indexes, Primary and
Secondary indexes, Index Data Structures, Hash Based indexing, Tree Based indexing,
Comparison of File Organizations, Indexes and Performance.

Topics:
Data on External Storage:

100
An external storage device, also referred to as auxiliary storage and secondary storage, is a
device that contains all the addressable data storage that is not inside a computer's main
storage or memory.

An external storage device can be removable or nonremovable, temporary or permanent, and

accessible over a wired or wireless network.

External storage enables users to store data separately from a computer's main or primary
storage and memory at a relatively low cost. It increases storage capacity without having to
open up a system.

External storage often stores information that's accessed less frequently by applications running
on a desktop, laptop, server or mobile device, such as an Android or iOS smartphone or tablet.

For PCs, an external storage device often consists of stationary or portable hard disk drives
(HDDs) or solid-state drives (SSDs), possibly attached through a USB or wirelessly.

For enterprises, an external storage device can serve as primary storage connected to servers
through Ethernet or Fibre Channel switches, or as secondary storage for backup and archiving
purposes. External storage offers HDD, all-flash and hybrid storage arrays for block-based, file-
based or object storage, or a mix of these three protocols known as unified storage. Storage
area networks (SANs) for block-level storage and network-attached storage (NAS) devices for
file-based storage are examples of external storage.

File Organization and indexing:

The following are the different types of file organization:

File organization and indexing are fundamental concepts in database management systems
(DBMS) that deal with efficient storage, retrieval, and management of data.

File Organization
File organization refers to the arrangement of data on storage devices. The method chosen can
have a profound effect on the efficiency of various database operations.

101
Common methods of file organization include:

Sequential (or Serial) File Organization

In sequential file organization, records are stored in sequence, one after the other, based on a
key field. This key field is a unique identifier for records, ensuring that they have some order.
The records are inserted at the end of the file, ensuring the sequence is maintained.

Features of Sequential File Organization

 Ordered Records: Records in a sequential file are stored based on a key field.
 Continuous Memory Allocation: The records are stored in contiguous memory locations.
 No Direct Access: To access a record, you have to traverse from the first record until you find
the desired one.

 Less Overhead: There's no need for complex algorithms or mechanisms to store records.

Direct (or Hashed) File Organization

In hash file organization, a hash function is used to compute the address of a block (or bucket)
where the record is stored. The value returned by the hash function using a record's key value
is its address in the database.

Features of Hash File Organization

 Hash Function: A hash function converts a record's key value into an address.
 Buckets: A bucket typically stores one or more records. A hash function might map multiple
keys to the same bucket.
 No Ordering of Records: Records are not stored in any specific logical order.

Indexed File Organization

Indexed file organization is a method used to store and retrieve data in databases. It is designed
to provide quick random access to records based on key values. In this organization, an index
is created which helps in achieving faster search and access times.

Features Indexed File Organization:

 Primary Data File: The actual database file where records are stored.
 Index: An auxiliary file that contains key values and pointers to the corresponding records in the
data file.

102
 Multi-level Index: Sometimes, if the index becomes large, a secondary (or even tertiary) index
can be created on the primary index to expedite searching further.

Indexed Sequential Access Method (ISAM)

ISAM is a popular method for indexed file organization. In ISAM:

 The primary file is stored in a sequential manner based on a primary key.

 There's a static primary index built on the primary key.
 Overflow areas are designated for insertion of new records, which keeps the main file in
sequence. Periodically, the overflow area can be merged back into the main file.

Indexing
Indexing involves creating an auxiliary structure (an index) to improve data retrieval times. Just
like the index in the back of a book, a database index provides pointers to the locations of
records.

Structure of Index
We can create indices using some columns of the database.

|-------------|----------------|
| Search Key | Data Reference |
|-------------|----------------|

 The search key is the database’s first column, and it contains a duplicate or copy of the table’s
candidate key or primary key. The primary key values are saved in sorted order so that the
related data can be quickly accessible.
 The data reference is the database’s second column. It contains a group of pointers that point to
the disk block where the value of a specific key can be found.

Cluster indexes
Clustering indexing is a database indexing technique that is used to physically arrange
the data in a table based on the values of the clustered index key. This means that the
rows in the table are stored on disk in the same order as the clustered index key. With
a clustered index, the database can more efficiently retrieve data because it doesn’t
have to scan the entire table to find the data it needs. Instead, it can use the clustered
index to quickly locate the data, resulting in faster query execution times and improved
overall performance.

103
Advantages
 Improved Query Performance: Clustering indexing results in faster query
performance, as the data is stored in a way that makes it easier to retrieve the
desired information.
 Reduced Disk Space Usage: Clustering indexing reduces the amount of disk
space required to store the index. This is because the index contains only the
information necessary to retrieve the data, rather than storing a copy of the data
itself.
 Better Handling of Complex Queries: Clustering indexing provides better
performance for complex queries that involve multiple columns.
 Improved Insert Performance: Clustering indexing can result in improved insert
performance, as the database does not have to update the index every time a new
record is inserted.
 Improved Data Retrieval: Clustering indexing can also improve the efficiency of
data retrieval operations. In a clustered index, the data is stored in a logical order,
which makes it easier to locate and retrieve the data.

.
Primary and Secondary indexes
Primary Index
o If the index is created on the basis of the primary key of the table, then it is known as primary
indexing. These primary keys are unique to each record and contain 1:1 relation between the
records.
o As primary keys are stored in sorted order, the performance of the searching operation is quite
efficient.

Secondary indexes

In secondary indexing, to reduce the size of mapping, another level of indexing is

introduced. In this method, the huge range for the columns is selected initially so that
the mapping size of the first level becomes small. Then each range is further divided
into smaller ranges. The mapping of the first level is stored in the primary memory, so
that address fetch is faster. The mapping of the second level and actual data are stored
in the secondary memory (hard disk).

For example:

104
o If you want to find the record of roll 111 in the diagram, then it will search the highest
entry which is smaller than or equal to 111 in the first level index. It will get 100 at this
level.
o Then in the second index level, again it does max (111) <= 111 and gets 110. Now using
the address 110, it goes to the data block and starts searching each record till it gets 111.

Index Data Structures:

Indexes can be created using some database columns.

o The first column of the database is the search key that contains a copy of the primary key or
candidate key of the table. The values of the primary key are stored in sorted order so that the
corresponding data can be accessed easily.
o The second column of the database is the data reference. It contains a set of pointers holding
the address of the disk block where the value of the particular key can be found.

Hash Based indexing:

105
Hashing in DBMS is a technique to quickly locate a data record in a database
irrespective of the size of the database. For larger databases containing thousands
and millions of records, the indexing data structure technique becomes very inefficient
because searching a specific record through indexing will consume more time. This
doesn’t align with the goals of DBMS, especially when performance and date retrieval
time are minimized. So, to counter this problem hashing technique is used. In this
article, we will learn about various hashing techniques.
Types of Hashing in DBMS

There are two primary hashing techniques in DBMS.

1. Static Hashing
In static hashing, the hash function always generates the same bucket’s address. For
example, if we have a data record for employee_id = 107, the hash function is mod-5
which is – H(x) % 5, where x = id. Then the operation will take place like this:

2. Dynamic Hashing
Dynamic hashing is also known as extendible hashing, used to handle database that
frequently changes data sets. This method offers us a way to add and remove data
buckets on demand dynamically. This way as the number of data records varies, the
buckets will also grow and shrink in size periodically whenever a change is made.

Tree Based indexing:

Tree-based indexing maintains the order of the search-key values. It is thus applicable
to attributes that are numeric, and hence it can be used to address range queries (and
also equality queries, when the range query interval is reduced to one value)
B-tree indexing is the process of sorting large blocks of data to make searching through
that data faster and easier. A B-tree stores data such that each node contains keys in
ascending order. Each of these keys has two references to another two child nodes.
The left side child node keys are less than the current keys, and the right side child
node keys are more than the current keys. Its search time is O(log(n)).
EXAMPLE:

106
Comparison of File Organizations

Sequential (or Serial) File Organization

Heap File

Heap File Organization works with data blocks. In this method records are inserted at the
end ofthe file, into the data blocks. No Sorting or Ordering is required in this method. If a data
block isfull, the new record is stored in some other block, Here the other data block need not
be the verynext data block, but it can be any block in the memory. It is the responsibility of
DBMS to storeand manage the new records.

Direct (or Hashed) File Organization

107
Indexed File Organization

Indexed Sequential Access Method (ISAM)

ISAM is a popular method for indexed file organization. In ISAM:

 The primary file is stored in a sequential manner based on a primary key.

 There's a static primary index built on the primary key.

 Overflow areas are designated for insertion of new records, which keeps the main file in sequence.
Periodically, the overflow area can be merged back into the main file.

Indexes and Performance:

Indexing is used to optimize the performance of a database by minimizing the number of disk
accesses required when a query is processed. The index is a type of data structure. It is used
to locate and access the data in a database table quickly.
A database index is a special data structure that allows quick access to specific pieces of
information without having to read all data stored in a particular table. They ensure database
performance in transactional environments
Modern databases store terabytes or petabytes of data and are a key resource for
organizational operations and decision-making. A database index is a special data structure
that allows quick access to specific pieces of information without having to read all data stored
in a particular table. They ensure database performance in transactional environments.

.
UNIT-2: Syllabus: Tree Structured Indexing
108
Indexed Sequential Access Method s (ISAM), B+ Trees, A Dynamic Index Structure

Indexed Sequential Access Method (ISAM):

ISAM method is an advanced sequential file organization. In this method, records are stored in
the file using the primary key. An index value is generated for each primary key and mapped
with the record. This index contains the address of the record in the file.

If any record has to be retrieved based on its index value, then the address of the data block is
fetched and the record is retrieved from the memory.

Advantages:
o In this method, each record has the address of its data block, searching a record in a
huge database is quick and easy.
o This method supports range retrieval and partial retrieval of records. Since the index is
based on the primary key values, we can retrieve the data for the given range of value.
In the same way, the partial value can also be easily searched, i.e., the student name
starting with 'JA' can be easily searched.

Disadvantages:
o This method requires extra space in the disk to store the index value.

109
o When the new records are inserted, then these files have to be reconstructed to maintain the
sequence.
o When the record is deleted, then the space used by it needs to be released. Otherwise, the
performance of the database will slow down.

B+ Trees:
A B+ Tree is a type of self-balancing tree structure commonly used in databases and file
systems to maintain sorted data in a way that allows for efficient insertion, deletion, and search
operations. B+ Trees are an extension of B-Trees but differ mainly in the way they handle leaf
nodes, which contain all the key values and point to the actual records.

B+ Tree of order `n` has the following properties:

1. Every node has a maximum of `n` children.

2. Every node (except the root) has a minimum of `n/2` children.

3. The tree is perfectly balanced, meaning that all leaf nodes are at the same level.

4. All keys are stored in the leaf nodes, and the internal nodes act as 'guides' to locate the leaf nodes
faster.

Operations on B+ Trees:

1. Search: Starts at the root and traverses down the tree, guided by the key values in each node,
until it reaches the appropriate leaf node.

2. Insert: Inserts a new key-value pair and then reorganizes the tree as needed to maintain its
properties.

3. Delete: Removes a key-value pair and then reorganizes the tree, again to maintain its properties.

Dynamic Index Structure:

A dynamic index structure is an index that automatically adjusts its structure as the underlying
data grows, shrinks, or otherwise changes. Unlike static index structures, which are fixed in size
and organization, dynamic index structures reorganize themselves in response to changes in
the data they index.

110
The purpose of indexing is to provide an efficient way to search from a large collection of the
database. In order to generate meaningful retrieval results, recent retrieval systems have
incorporated users' relevance feedback to modify the retrieval process.
This makes them more flexible and scalable, capable of handling varying workloads and data
distributions efficiently.
Dynamic index structures are crucial for databases that are expected to evolve over time,
especially when it comes to insertion, deletion, and updating of records. They maintain balanced
tree structures, self-adjust, and optimize, ensuring that the database operations remain efficient
even as the data grows or changes.

Examples of Dynamic Index Structures

B-Trees and B+-Trees: These are perhaps the most well-known examples of dynamic index
structures. B-Trees and B+-Trees automatically split or merge nodes to maintain a balanced
tree, ensuring that search, insert, and delete operations take logarithmic time. They are widely
used in various types of databases and filesystems.

111

Common questions