0% found this document useful (0 votes)
11 views134 pages

Overview of Database Management Systems

The document provides an overview of Database Management Systems (DBMS), including their historical development, applications across various sectors, and differences from traditional file systems. It discusses key concepts such as data independence, levels of abstraction, and the structure of a DBMS, along with various database models like relational and object-oriented models. Additionally, it outlines the database design process and the use of Entity-Relationship diagrams for conceptual design.

Uploaded by

faziyamila
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
11 views134 pages

Overview of Database Management Systems

The document provides an overview of Database Management Systems (DBMS), including their historical development, applications across various sectors, and differences from traditional file systems. It discusses key concepts such as data independence, levels of abstraction, and the structure of a DBMS, along with various database models like relational and object-oriented models. Additionally, it outlines the database design process and the use of Entity-Relationship diagrams for conceptual design.

Uploaded by

faziyamila
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

DATABASE MANAGEMENT SYSTEM (23CY502)

UNIT– I
Database System Applications: A Historical Perspective, File Systems versus a DBMS, the
Data Model, Levels of Abstraction in a DBMS, Data Independence, Structure of a DBMS
Introduction to Database Design: Database Design and ER Diagrams, Entities, Attributes,
and Entity Sets, Relationships and Relationship Sets, Additional Features of the ER Model,
Conceptual Design With the ER Model

1. INTRODUCTION
Data: Data is a piece of information. Data can exist in a variety of forms:
 As numbers or text on pieces of paper
 As bits and bytes stored in computer memory
 As facts stored in a person's mind.

Data is raw information and it does not give correct meaning. The processed data becomes
Information and it gives correct meaning.

Database: Database is a collection of inter-related data which is used to retrieve, insert,


delete and manipulate the data efficiently.

Database Management System (DBMS): The software which is used to


manage database is called Database Management System (DBMS).

Examples of popular DBMS:

 MySql  Microsoft Access


 Oracle  IBMDB2

Department of CSE( Cyber Security), NRCM Page 1


DATABASE MANAGEMENT SYSTEM (23CY502)

2. A HISTORICAL PERSPECTIVE
From the earliest days of computers, storing and manipulating data have been a major application
focus. The first general-purpose DBMS called the Integrated Data Store (IDS) was designed by
Charles Bachman in the early1960s. It formed the basis for the network data model.
In the late 1960s, IBM developed the Information Management System (IMS). This formed
the basis for an alternative data representation framework called the hierarchical data model.

In 1970, Edgar Codd, proposed a new data representation framework called the relational data
model. In a relational data model, the data is stored in the form of table containing rows and
columns. This became very famous database model. The SQL (Structured Query Language) is
the standard query language used to access relational databases.
Several vendors (e.g., IBM's DB2, Oracle 8, etc) developed data warehouses. A Data
warehouse collects data from several databases and this data is used for carrying out specialized
analysis.
In mid 90s, DBMSs have entered the Internet Age. All the database vendors are added
features to their DBMS aimed at making it more suitable for deployment over the Internet.
Database management continues to gain more popularity and more data is brought online to
access through computer networking.
Today the field is being driven by exciting visions such streaming data (youtube, vimeo, etc)
as interactive video (flash, wire wax etc), multimedia databases (facebook, instagram, gaana etc),
digital libraries (DELNET, Shodh ganga, etc). Thus the study of database systems could prove to
be richly rewarding in more.

3. DATA BASE APPLICATIONS


We use Database Management Systems in almost all application sectors. They are:

1. Telecom: A database is required to keep track of the information regarding calls history,
network usage, customer details, generating monthly bills, maintaining balances on
prepaid calling cards etc. Without the database systems it is hard to maintain that huge
amount of data that keeps updating every millisecond. Ex: Airtel, IDEA, Jio, etc
2. Banking System: A database stores bank customer’s information, maintain day to day

Department of CSE( Cyber Security), NRCM Page 2


DATABASE MANAGEMENT SYSTEM (23CY502)

credit and debit transactions, generate bank statements etc. Ex: SBI, HDFC, etc
3. Online shopping: The online shopping websites store the product information, your
addresses and preferences, credit details and provide you the relevant list of products
based on your query. Ex: Amazon, Flipkart etc.
4. Airlines: Passenger details, reservation in formation along with flight schedule is stored
in database. Eg: Air India, Indigo, etc

5. Education sector: Database systems are used in schools, colleges and universities tostore
and retrieve the data regarding student details, staff details, course details, exam details,
attendance details, fees details etc. Ex: JNTUH, IITB, etc
6. Sales: To store customer information, stock details and invoice details a database is
needed. Ex: Reliance Fresh, D-Mart etc.
7. Human resources: For information about employees, salaries, payroll taxes, and benefits
and for generation of paychecks a database is required.
8. Credit card transactions: For purchases on credit cards and generation of monthly
statements.
9. Stock market: For storing information about holdings, sales, and purchases of stocks;
also for storing real-time market data to enable online trading.

4. DIFFERENCE BETWEEN FILE SYSTEM AND DBMS

File System DBMS


A file system is a software that DBMS is a software used to create and
Definition
manages the data files in a computer manage databases.

Operations such as storing, Op


retrieving and searching are done
Operations erations such as storing, retrieving and
manually in a file system. Therefore,
searching data is easier in DBMS
it is difficult to manage data.
because it allows using SQL query
language.

Department of CSE( Cyber Security), NRCM Page 3


DATABASE MANAGEMENT SYSTEM (23CY502)

Data Data Inconsistency is more in file Data Inconsistency is less in DBMS.


Consistency system.

Data Data Redundancy is more in file Data Redundancy is less in a DBMS.


Redundancy system.

Backup and Backup and recovery process is not DBMS has a sophisticated backup and
Recovery efficient in files system. recovery techniques.
Process

Concurrent Concurrent access to the data in the DBMS takes care of Concurrent access
access file system has many problems using some form of locking.
User can locates the physical
Physical In DBMS, user is unaware of physical
address of the files to access data in
address address where data is stored.
File System.

Security File system provides less security to DBMS provides more security to the
the data as compared to DBMS. data.

FAT, NTFS and Ext are some MySQL, MS-Access, Oracle, and DB2
Example examples of file systems. are some examples of DBMS.

5. DBMS DATABASE MODELS


A Database model defines the logical design and structure of a database. It explains how data
will be stored, accessed and updated in a DBMS. The different DBMS data models are:
 Network Model
 Hierarchical Model
 Entity-relationship Model
 Relational Model
 Object oriented data model

Network Data Model


Network model has the entities which are organized in a graphical representation and some
entities in the graph can be accessed through several paths. The data in this model is represented
as collection of records and the relationship among data are represented by links.

Department of CSE( Cyber Security), NRCM Page 4


DATABASE MANAGEMENT SYSTEM (23CY502)

Figure:NetworkModel

Department of CSE( Cyber Security), NRCM Page 5


DATABASE MANAGEMENT SYSTEM (23CY502)

Hierarchical Model
Hierarchical database model organizes data into a tree-like-structure, with a single root, to which
all the other data is linked. In this model, a child node will only have a single parent node.

Figure:HierarchicalData Model

Entity-Relationship Model

Entity-Relationship (ER) Model is based on the notion of real-world entities and relationships
among them. ER Model is best used for the conceptual design of a database. While formulating
real-world scenario into the database model, it depend on two important things. They are:
 Entity and their attributes
 Relationships among entities

SID SName CID CName

Student Enroll Course

Relational Model

The most popular data model in DBMS is the Relational Model. The relational model contains a
set of tables (relations). Each table has a specified number of columns but can have any number
of rows.
Admission No Name Age Class
1001 Ram 15 9
1002 Ajay 14 9
1003 Jhon 14 9
1004 Akbar 15 10

Department of CSE( Cyber Security), NRCM Page 6


DATABASE MANAGEMENT SYSTEM (23CY502)

Object oriented Data Model


Object oriented data model defines a database as a collection of objects with associated attributes
and methods. This model can incorporates multimedia, such as images, audio, video. The object-
oriented database model is the best known post-relational database model, since it incorporates
tables, but isn’t limited to tables. Such models are also known as hybrid database models.

6. LEVELS OFABSTRACTION IN A DBMS


Database systems are made-up of complex data structures. To ease the user interaction with
database, the developers hide internal irrelevant details from users. This process of hiding
irrelevant details from user is called data abstraction. We have three levels of abstraction.

View Level

LogicalLevel

PhysicalLevel

10. Physical level: This is the lowest level of data abstraction. It describes how data is actually

stored in database. It deals with physical memory storage details of records. These details are
often hidden from the programmers.
11. Logical level: This is the middle level of 3-level data abstraction architecture. It describes

what data is stored in database. This level gives details about each attribute data type and size,
the relationship among attributes and defined constraints (Primary key, foreign key etc) on the
table. The programmers generally work at this level because they are aware of such things
about database systems.

Department of CSE( Cyber Security), NRCM Page 7


DATABASE MANAGEMENT SYSTEM (23CY502)

12. View level: This is the highest level of data abstraction. This level describes the user

interaction with database system. At this level, user enters the query to get the answer. Many
users may require different sets of fields from a table. Therefore there exist many view levels.

7. DATA INDEPENDENCE
Data Independence is defined as a property of DBMS that helps you to change the Database
schema at one level without requiring changing the schema at the next higher level. Data
independence helps you to keep data separated from all programs that make use of it.

In DBMS there are two types of data independence

1. Physical data independence


2. Logical data independence.

Physical Data Independence: Physical data independence is the ability to change the
internal schema without having to change the conceptual schema. That is, if we do any changes
in the storage side of the database system server, then the Conceptual structure of the database
will not be affected. For example, in case we want to change or upgrade the storage system itself
−suppose we want to replace hard-disks with SSD−it should not have any impact on the logical
data or schemas.
Logical Data Independence: Logical data independence is the ability to change the
conceptual schema without having to change the external schema. That is, if we do any changes
in the logical view of the data, then the user view/ external view of the data should not be
affected.

8. STRUCTUREOFADBMS

The DBMS accepts SQL commands generated from a variety of user interfaces such as web
forms, applications, SQL interface and etc. When a user issues a query, the parsed query is
presented to a query optimizer, which uses information about how the data is stored to produce
an efficient execution plan for evaluating the query. An execution plan is a blueprint for
evaluating a query. It executes these plans against the database, and returns the answers to the
user.

Department of CSE( Cyber Security), NRCM Page 8


DATABASE MANAGEMENT SYSTEM (23CY502)

QueryEval
uationEngi
ne

Figure:DBMSStructure
 DBMS consists of a Query Evaluation engine which accepts commands from the front end
applications like web forms, SQL interfaces and evaluates the query to retrieve the requested
data.
 Query Evaluation engine consists of the following components
o Parser: It parses the received SQL commands.
o Operator evaluator: It evaluates the operators used in the query.
o Plan executor: It designs a plan to obtain the result.
o Optimizer: It optimizes the query to improve the process of retrieving the result ant
data.
 File and access methods: It is responsible for the abstraction of file structures stored and for
creating indexes on the files for faster access.
 Buffer Manager: The purpose of buffer manager is to move pages in and out from a disk to
main memory.
 Disk-Space Manager: It manages space on the disk by providing empty space for new
requests, deleting space allocated for existing files which are deleted by the user.
 Transaction Manager and lock manager: It is responsible for maintaining concurrency of
the data, when accessed by multiple users.

Department of CSE( Cyber Security), NRCM Page 9


DATABASE MANAGEMENT SYSTEM (23CY502)

 Recovery manager: It is responsible for maintaining log files and supports crash recovery.
When a system crashes recovery manager is responsible for bringing the system to a safe
state.

9. DATABASE DESIGN
The database design process can be divided into six steps.
i. Requirements Analysis: The very first step in designing a database application is to gather
information from different stake holders such as management, employees and end users. The
development team conducts discussions with different user groups, study the current
operating environment, analyze any available documentation on existing applications and
gather all of the types of information that to be recorded in the database. The gathered
information is documented properly.
ii. Conceptual Database Design: The information gathered in the requirements analysis step is
used to develop Entity Relationship (ER) model. The ER model facilitates discussion among
all the people involved in the design process, even those who have no technical background.
iii. Logical Database Design: The task in this stage is to convert the ER model into relational
schemas. Each entity and each relationship is converted into a relation or a table.
iv. Schema Refinement: The fourth step in database design is to analyze the collection of
relations in our relational database schema to identify potential problems, and to refine it.
This process is called normalization.
v. Physical Database Design: In this step, the database design is refined to ensure that it meets
desired performance criteria and satisfies the expected workload. This step may simply
involve building indexes on some tables and clustering some tables, or it may involve a
substantial redesign of parts of the database schema obtained from the earlier design steps.
vi. Application and Security Design: Foreachrole(manager/accountant/clerk),some part of the
database is accessible and other part of the database is not accessible. The software developer
should enforce these accessing rules while developing the applications (using application
languages like java) to access data using DBMS.
Realistically, all above six design steps are repeated until the design is satisfactory to complete
database design.

Department of CSE( Cyber Security), NRCM Page 10


DATABASE MANAGEMENT SYSTEM (23CY502)

10. ERDIAGRAMS
An entity-relationship (ER) diagram is a graphical representation of entities and their
relationships to each other, typically used to the organization of data within databases. An entity-
relationship (ER) diagram is also called as an entity relationship model.

Component of ER Diagram

Symbol Name Description

Entity / An entity may be any object, class, person


Strong entity or place.

Weak entities depend on some other entity


type. They don't have primary keys, and
Weak entity
have no meaning in the diagram without
their parent entity.

Relationships are associations between or


Relationship
among entities.

Department of CSE( Cyber Security), NRCM Page 11


DATABASE MANAGEMENT SYSTEM (23CY502)

Symbol Name Description

Weak Weak Relationships are connections


relationship between a weak entity and its owner.

Attributes are characteristics of an entity.


Attribute The attribute is used to describe the
property of an entity.

A key attribute is the unique characteristic


Key Attribute
of the entity. It represents a primary key.

Multi Multi valued attributes are those that are


valued can take on more than one value.
attribute
Derived attributes are attributes whose
Derived
value can be calculated from other
attribute
attribute values.

An attribute that composed of many other


Composite
attributes is known as a composite
attribute
attribute.

Types of relationship areas follows:

The cardinality of a relationship is the number of instances of entity B that can be associated
with entity A. Based on the cardinality; the relationships are classified into four types. They are:
a. One-to-One Relationship: When only one instance of an entity is associated with the
relationship, then it is known as one to one relationship.
Example: A female can marry to one male, and a male can marry to one female.

b. One-to-many relationship: When only one instance of the entity on the left, and more than
one instance of an entity on the right associates with the relationship then this is known as a one-
to-many relationship.

Department of CSE( Cyber Security), NRCM Page 12


DATABASE MANAGEMENT SYSTEM (23CY502)

Example: Scientist can invent many inventions, but the invention is done by the only specific
scientist.

c. Many-to-one relationship: When more than one instance of the entity on the left, and only
one instance of an entity on the right associates with the relationship then it is known as a many-
to-one relationship.
Example: Student enrolls for only one course, but a course can have many students.

d. Many-to-many relationship: When more than one instance of the entity on the left, and more
than one instance of an entity on the right associates with the relationship then it is known as a
many-to-many relationship.
Example: Employee can assign by many projects and project can have many employees.

Entity Set and Relationship Set

IID IName SID SName

Instructor Advice Student

Entity Set: An Entity set is a set of entities of the same type that share the same properties.
The above diagram contains two entities; Instructor and Student. In the below figure, Instructor
entity contains six different instructor values (rows) called as Instructor entity set and Student
entity contain seven different student values (rows) called as Student entity set.

Department of CSE( Cyber Security), NRCM Page 13


DATABASE MANAGEMENT SYSTEM (23CY502)

Figure: Entity set Instructor and Student

Relationship Set: A relationship set is a set of relationships of the same type. In the below
figure one instructor can advice many students but every student is advised by only one
instructor. This relationship is called as many-to-one relationship.

Figure: Relationship set advisor

11. ADDITIONAL FEATURES OF THE ER MODEL


N-array relationship
In an n-ary relationship, then shows the number of entities in the relationship. It can be
anything but the most popular relationships are unary, binary and ternary relationship.

Department of CSE( Cyber Security), NRCM Page 14


Unary Relationship: When there is a relationship between two entities of the same type, it is
known as a unary or recursive relationship. This means that the relationship is between different
instances of the same entity type.

For example, an employee can supervise multiple employees. The role of one employee is HOD
and the role and other employees is faculty. That is, one HOD supervises many faculties.

Binary Relationship: When there is a relationship between two different entities, it is known as
a binary relationship.

Each employee only has a single ID card. Hence this is a one to one binary relationship where 1
employee has 1 ID card.

Ternary Relationship: When there is a relationship between three different entities, it is known
as a ternary relationship. An example of a ternary relationship can be shown as follows:

In this example, there is a ternary relationship between Doctor, Patient and Medicine.

Department of CSE( Cyber Security), NRCM Page 15


Weak Entity
A Weak entity is the one that depends on its owner entity for its existence. A weak entity is
denoted by the double rectangle. Weak entity does not have the primary key. The primary
key of a weak entity is a composite key formed from the primary key of the strong entity and
partial key of the weak entity.

EID EName DName Address

Employee Depends Dependent

There can be an employee without a dependent in the Company but there will be no record of the
Dependent in the company systems without any association with an Employee .

Generalization
Generalization is a bottom-up approach in which two or more lower-level entities combines to
form a new higher-level entity. In generalization, the generalized entity of higher level can also
combinewithentitiesofthelower-leveltomakefurtherhigher-levelentity. It is like a super class and
subclass system, but the only difference is that it uses the bottom-up approach. In this process,
the common attributes of two or more lower level entities are given to higher level entity

Department of CSE( Cyber Security), NRCM Page 16


Specialization
Specialization is opposite to Generalization. It is a top-down approach in whichone higher level
entity can be broken down into two or more lower level entity.

Aggregation
Aggregation is a process when relation between two entities is treated as a single entity.

In the diagram above, the relationship between Center and Course together, is acting as an
Entity, which is in relationship with another entity Visitor. Now in real world, if a Visitor or a
Student visits a Coaching Center, he/she will never enquire about the center only or just aboutthe
course, rather he/she will ask enquire about both.

12. CONCEPTUAL DESIGN WITH THE ER MODEL


The document prepared in the requirement analysis phase is used to generate ER Model by
following below six steps:

Department of CSE( Cyber Security), NRCM Page 17


 Find the entities: Look for general nouns in requirement specification document which are
of business interest to business users.
 Identify relevant attributes: Identify all attributes related to eachentity.
 Find the key attributes for every entity: Identifythe attribute or set of attributes which can
identify each entity instance uniquely.
 Find the relationships: Identify the natural relationship and their cardinalities between all
possible combinations of the entities.
 CompleteE-Rdiagram: Draw E-Rdiagramalongwithallattributesandentities.
 Reviewyourresultswithyourbusinessusers: Show the completed ER diagram to your
business user and make necessary changes.

PROBLEM: UNIVERSITY CASE STUDY


A University has many departments. Each department has a name and location. Each
department has multiple instructors; one among them is the head of the department. Every
instructor has a name, mobile number and room number. An instructor belongs to only one
department. Each department offers multiple courses, each of which is taught by a single
[Link] course has unique course number, name, duration and pre-requisite course. A
student may enroll for many courses offered by different departments. Every student has a ID,
name and date of birth.

SOLUTION
Step1: Identify theEntities
1. DEPARTMENT
2. COURSE
3. INSTRUCTOR
4. STUDENT

Step 2:Identify all relevant attributes


1. Forthe "Department"entity, the relevant attribute are"Department Name"is"Location".
2. For the "Course" entity, the relevant attributes are "Course Number" are "Course Name",
"Duration" and "Pre Requisite".
3. Forthe"Instructor"entity, the relevant attribute sare"InstructorName"are"RoomNumber"
and "Telephone Number".
4. Forthe"Student"entity, the relevant attributes are"StudentNumber"are"StudentName" and
"Date of Birth".

Department of CSE( Cyber Security), NRCM Page 18


Step3: Identifythekey attributes
1. DName (DepartmentName) which identifies the department uniquely will be the key
attribute for "DEPARTMENT" entity.
2. STUDENT# (Student Number) which identifies the student entity uniquely will be
thekey attribute for "STUDENT" entity.
3. IName(InstructorName)isthekeyattributefor "INSTRUCTOR" entity.
4. COURSE#(CourseNumber)is thekeyattributeforCOURSEentity.

STEP4: Find relationships.


Wecan derivethefollowing relationships:
1. The department offers multiple courses and each course belongs to only one department.
So the cardinality between department and course is one to many.

1 N
Department Offers Course

2. Onecourseisenrolledbymultiplestudentsandalsoonestudentenrollsformultiple courses. So
the relationship is many to many.

M Enrolled N
Course Student
by

3. One department has multiple instructors and also one instructor belongs to one and only
one department. So the relationship is one to many.

1 Has N
Department Instructor

4. Each department has one "Head of Department" and one Instructor is Department" for
only one department, hence the relationship is one to one.

1 Headed N
Department Instructor
by

5. One course is taught by only one instructor but one instructor teaches many courses,
hence the relationship between course and instructor is many to one.

N Taught 1
Course Instructor
by

The relationship between instructor and student need NOT be defined in the [Link]
reasons are as follows:
1. There is no business significance of this relationship.
2. We can always derive this relationship indirectly through course and instructor, and
course and students.

Department of CSE( Cyber Security), NRCM Page 19


Step5: Complete E-R diagram
After considering all the above mentioned guidelines one can generate the E-R Model for the
university database as shown in Figure.
DeptName Location

Instructor

1 1 1

Offers Headed Has


by
PreRequisite

N 1
N
Course#
Instructor N Taught 1 Student
by
Duration
N
Course Name InstructorName Room#

Enrolled Telephone#
by

Student

Student# Student Name Dateof Birth

13. DESIGN CHOICES IN CONCEPTUAL DESIGN


a. Shoulda concept be modeled as an entity or an attribute?
b. Shoulda concept be modeled as an entity or a relationship?
c. Identifying relationships: Binary or ternary? Aggregation?

Entity vs. Attribute


• Should address be an attribute of Employees or an entity (related to Employees)?
• Depends up on how we want to use address information, and the semantics of the data:

Department of CSE( Cyber Security), NRCM Page 20


• If we have several addresses per employee, address must be an entity
• If the structure of address is important (plotNo, street, city, state, country and
pinCode values are compulsory for each address) then, address must be modeled
as an entity.
• Otherwise address can be modeled as an attribute.

Binary [Link] Relationship


• A relationship can also have attributes.
• If an employee work in a department for a period time, then it can be modeled as
givenbelow.
• This is a binary relationship diagram

• If an employee work in a department for two or more periods , then it should be


remodeled as given below.
• Then it becomes as a ternary relationship diagram

When to use aggregation?


When an entity maintains a common relationship with two or more entities, not
individually then aggregation need to be used.

Department of CSE( Cyber Security), NRCM Page 21


MULTIPLE CHOICE QUESTIONS

1. An entity set that does not have sufficient attributes to form a primary key is a
a) Strong entity set b)Variant setc)Weak entity set d)Variable set
2. In the relational model,cardinality is termed as:
a) [Link] tuples b)no. of attributes c)[Link] tables d) [Link] constraints
3. In a relational model,relations are termed as
a) Tuples. b) Attributes c)Tables. d) Rows.
4. In an E-R diagram attributes are represented by
a) rectangle b) square c)ellipse d) triangle.
5. An abstraction concept for building composite object from their component to bject is
called:
a).Specializationb).Normalization c).Generalization d).Aggregation

Department of CSE, NRCM Page 22


UNIT– II
Introduction to the Relational Model: Integrity constraint over relations, enforcing integrity
constraints, querying relational data, logical data base design, introduction to views,
destroying/altering tables and views. Relational Algebra, Tuple relational Calculus, Domain
relational calculus.

1. RELATIONAL MODEL
Relational data model is the most popular data model used widely around the world for data
storage. In this model data is stored in the form of tables.

Relational Model Concepts

Table is also called Relation. Let the below table name be SUDENT_DATA

Attribute / Column / Field

Degree=No ofcolumns=4

htno Name age city


501 Amar 19 Hyderabad
502 Akbar 18 Warngal Tuple / Row / Record

503 Antony 19 Karimnagar Cardinality=Noofrows=3

Table: In relational model the data is saved in the form of tables. A table has two properties
rows and columns. Rows represent records and columns represent attributes.

Attribute: Each column in a Table is an attribute. Attributes are the properties that define a
relation. e.g., HTNO, NAME, AGE, CITY in the above relation.

Tuple: Every single row of a table is called record or tuple.

Relation Schema: It represents the name of the relation (Table) with its attributes.

Eg., STUDENT_DATA( htno, name, age, city)


Department of CSE, NRCM Page 23
Degree: The total number of attributes in the relation is called the degree of the relation.

Cardinality: Total number of rows present intheTable.


2. INTEGRITY CONSTRAINT
 Integrity constraints are a set of rules that the database should not violate.
 Integrity constraints ensure that authorized changes (update deletion, insertion) made to
the database should not affect data consistency.
 Integrity constraints may apply to attribute or to relationships between tables.

TYPES OF INTEGRITY CONSTRAINTS


The integrity constraints supported by DBMS are:
1. Domain Integrity Constraint
2. Entity Integrity Constraint
3. Referential Integrity Constraint
4. Key Constraints
Integrityconstraint

Domain EntityIntegrity Referential Key


constraint constraint Integrityconstraint constraint

 Domain Constraint: These are attribute level constraints. An attribute can only take
values which lie inside the domain [Link]: If a constrain AGE > 0 is applied on
STUDENT relation, inserting negative value of AGE will result in failure. If the domain of
AGE is defined as integer, inserting an alphabet in age column is not accepted.
Example:
ID NAME SEMESTER AGE
1001 TOM I 18
1002 JHONSON IV 20
1003 KATE VI 21
1004 JHON II 19
1005 MORGAN II A
Not allowed. Because AGE is an integer attribute

 Entity integrity constraints: The entity integrity constraint states that primary key
value can't be null. This is because the primary key value is used to identify individual rows
in relation. A table can contain a null value other than the primary key field.
Department of CSE, NRCM Page 24
Example: Let ID be the primary key in the below table.

ID NAME SEMESTER AGE


1001 TOM I 18
1002 JHONSON IV 20
KATE VI 21

Not allowed. Because primary key can’t be NULL value.


 Referential Integrity Constraints: It is also called as foreign key constraint. A
referential integrity constraint is specified between two tables. In this type of constraints,
if a foreign key in Table 2 refers to the Primary Key of Table 1, then every value of the
Foreign Key in Table 2 must be null or be available in Table 1.
Example: Department Table(Table1)
PrimaryKey
Dept_No Dept_Name
05 CSE
02 EEE
04 ECE
PrimaryKey
Relationship
Employee Table (Table 2)
EID NAME AGE Dept_No Foreign Key
1001 TOM 45 04
Not allowed as Dept_No 01. Because 01
1002 JHONSON 38 01
value is not present as a primary key in
1003 KATE 54 05 Table1.Dept_Noisaforeignkeydefined
1004 MORGAN 29 02 inTable2.

 Key Constraints: A Key Constraintis a statement that a certain minimal subset of the
fields of a relation is a unique identifier for a [Link] are 4 types ofkey constraints.
Theyare

i. Candidate key: The candidate keys in a table are defined as the set of keys that is
minimal and can uniquely identify any data row in the table.
ii. Primary key: It can uniquely identify any data row of the table. The primary key is
one of the selected candidate key.
iii. Super key: Super Keyis the superset of primary key. The super key contains a set of
attributes, including the primarykey, which can uniquelyidentify any data row in the
table.

Department of CSE, NRCM Page 25


iv. Foreign key: It is a key used to link two tables together. A FOREIGN KEY is a field
(or collection of fields) in one table that refers to the PRIMARY KEY in another
table.

CompositeKey: If any single attribute of a table is not capable of being the key i.e it
cannot identify a row uniquely, then we combine two or more attributes to form akey.
This is known as a composite key.
Secondary Key: Only one of the candidate keys is selected as the primary key. The
rest of them are known as secondary keys.

3. ENFORCING INTEGRITY CONSTRAINTS


Database Constraints are declarative integrity rules of defining table structures. They include the
following 7 constraint types:

1. Data type constraint: This defines the type of data, data length, and a few other
attributes which are specifically associated with the type of data in a column.
2. Default constraint: This defines what value the column should use when no value has
been supplied explicitly when inserting a record in the table.
3. Nullability constraint: This defines that if a column is NOT NULL or allow NULL
values to be stored in it.
4. Primary key constraint: This is the unique identifier of the table. Each row must have a
distinct value. The primary key can be either a sequentially incremented integer number
or a natural selection of data that represents what is happening in the real world (e.g.
Social Security Number). NULL values are not allowed in primary key values.
5. Unique constraint: This defines that the values in a column must be unique and no
duplicates should be stored. Sometimes the data in a column must be unique even though
the column does not act as Primary Key of the table. Only one of the values can be
NULL.
6. Foreign key constraint: This defines how referential integrity is enforced between two
tables.
7. Check constraint: This defines a validation rule for the data values in a column so it is a
user-defined data integrity constraint. This rule is defined bythe user when designing the
column in a table.

Department of CSE, NRCM Page 26


4. LOGICAL DATABASE DESIGN
1. Each entity in the ER model will become a table and all attributes of that entity will become
columns of the table. Key attribute of the entity will become primary key in the table.
Example:
EName salary since DeptID DName

EmpID Employee Works_In Department Location

Employee Department
EmpID EName salary DeptID DName Location

CREATETABLEEmployee ( CREATETABLEDepartment (
EmpID NUMBER(3), DeptID NUMBER(3),
ENameVARCHAR(20), DNameVARCHAR(15),
Salary NUMBER(5), LocationVARCHAR(15),
PRIMARYKEY(EmpID) PRIMARYKEY(DeptID)
); );

2. Each relationship in the ER model will become a table. Key attributes of participating entities
in the relationship will become columns of the table. If the relationship has any attributes,
then they also will become columns of the table.
Example: From the above ER diagram, theWorks_In relationship converted as
Employee Department
EmpID EName Salary DeptID DName Location

Works_In
EmpID DeptID since

CREATETABLEWorks_In (
EmpIDNUMBER(3),
DeptIDNUMBER(3),
Since DATE,
PRIMARYKEY(EmpID, DeptID),
FOREIGNKEY(EmpID)REFERENCESEmployee(EmpID),
FOREIGNKEY(DeptID)REFERENCESDepartment(DeptID),
);

Department of CSE, NRCM Page 27


3. Any multi-valued attribute is converted into new table. The primary key of the entity will be
added as column in the new table.

name age HouseNo street

city
address
HTNO Student state

Student Address
HTNO name age HTNO houseNo street city state

CREATETABLEStudent ( CREATETABLEAddress
HTNOCHAR(10),name (
VARCHAR(20), HTNOCHAR(10),
ageNUMBER(2), houseNo NUMBER(3),
PRIMARYKEY(HTNO) street VARCHAR(20),
); cityVARCHAR(15), state
VARCHAR(15),
PRIMARYKEY(HTNO),
FOREIGNKEY(HTNO)REFERENCESStudent(HTNO),
);
4. Each weak entity is converted into a table with all its attributes as columns and primary key
of the strong entity acts as a foreign key in this table.

CName NOofCredits SectionNo location

CourseID Course has


Section

Course Section
CourseID CName NOofCredits CourseID SectionNo location

CREATETABLECourse ( CREATETABLESection (
CourseIDNUMBER(2), SectionNoCHAR(2),
CName VARCHAR(20), CourseIDNUMBER(2),
NOofCreditsNUMBER(2), location VARCHAR(15),
PRIMARYKEY(CourseID) PRIMARYKEY(CourseID,SectionNo),
); FOREIGNKEY(CourseID)REFERENCES Course(CourseId)
);

Department of CSE, NRCM Page 28


5. INTRODUCTION TO VIEWS
A view is virtual tables whose rows are not explicitly stored in the database but are
computed as needed from a view definition. They are used to restrict access to the database or to
hide data complexity. A view contains rows and columns, just like a real table. Creating a view
does not take any storage space as only the view query is stored in the data dictionary and the
actual data is not [Link] tables referred in the views areknown as Base tables. Views do not
contain data of their own. They take data from the base tables.

Thereasonsforusingviewsare

 Security is increased-sensitive information can be excluded from a view.


 Views can represent a subset of the data contained in a table.
 Views can join and simplify multiple tables into a single virtual table.
 Views take very little space to store; the database contains only the definition of a
view, not a copy of all the data it presents.
 Different views can be created on the same basetable for different categories of users.

Creating Views syntax: CREATE VIEW view_name AS


SELECT column_list
FROM table_name[WHERE condition];
Examples: Consider the below given employees table. employees(eid,name,salary,experience)
employees
eid ename salary Experience
101 Jhon 20000 2
105 Sam 18000 2
108 Ram 30000 4

If we want to hide the salary column from accessing a group of users, then we can create view on
employees table as follows.
CREATE VIEWemp AS
SELECTeid,ename,experience FROM employees;
emp
eid Name Experience
101 Jhon 2
105 Sam 2
108 Ram 4

Department of CSE, NRCM Page 29


The view emp is a virtual table. The data in the emp table is not saved in the database but
collectedfrom employees [Link]
operations (INSERT, DELETE, UPDATE) on a view just like on a table but under some
restrictions.

When can insertion, delete or update performed on view?

 The view is defined from one and only one table.


 The view must include the PRIMARY KEY of the base table.
 The base table columns which are not part of view should not have NOTNULL constraint.
 The view should not have any field made out of aggregate functions.
 The view must not have any DISTINCT clause in its definition.
 The view must not have any GROUPBY or HAVING clause in its definition.
 The view must not have any SUBQUERIES in its definitions.
i. Inserting Rows into a View: A new row can be inserted into a view in a similar way as you
insert them in a table. When an insert operation performed on view, first a new row is
inserted into the base table and the same is reflected in the view.
ii. Deleting Rows into a View: A row(s) can be deleted from a view in a similar way as you
delete them from a table. When an delete operation performed on view, first row(s) is/are
deleted from the base table and the same is reflected in the view.
iii. Updating Rows into a View: A row(s) can be updated in a view in a similar way as you
update them in a table. When an update operation performed on view, first data is updated in
the base table and the same is reflected in the view.
iv. Dropping/Destroying View: Whenever you do not need the view anymore, we can destroy
the view by using DROP command. The syntax is very simple and is given below −

DROPVIEW view_name;
Example:DROPVIEWemp;

6. RELATIONAL ALGEBRA
Relational Algebra is procedural query language, which takes Relation as input and
generates relation as output. Relational algebra mainly provides theoretical foundation for
relational databases and SQL.
Department of CSE, NRCM Page 30
OperatorSymbol OperatorName Explanation
Projection Selectcolumnnames
Selection Selectrow values

Renaming Renameatablenameorexpression results


Union Performunion operation
Intersection Performintersectionoperation
- Setdeference Performsetdifference operation
Cartesianproduct Everyrowoffirsttableisjoinedwithevery row
of second table
Join Jointwo tables based onsome condition

i. Select Operation(σ):Itselectstuplesthatsatisfythegivenpredicatefromarelation.
Notation:σp(r)

where σ stands for selecting tuples (rows) and r stands for relation (table) name. p is
prepositional logic formula which may use connectors like and, or, and not. These termsmay
use relational operators like = ,≠ ,≥ ,<,> ,≤ .
Example1:σsubject="database"(Books)
Output:Selects rowswhosesubjectis'database'frombooks table.
Example2:σsubject="database"andprice="450"(Books)
Output:Selects rowsfrombookswheresubjectis'database'and'price'is450.
Example3:σsubject="database"andprice="450"oryear>"2010"(Books)
Output:Selectsrowsfrombookswheresubjectis'database'and'price'is450orthose books
published after 2010.

ii. Project Operation(∏):Itprojectscolumn(s)thatsatisfyagiven predicate.


Notation:∏A1,A2,…An(r)
whereA1,A2,Anarecolumn(attribute)[Link] automatically
eliminated in the output.
Example:∏subject,author (Books)
Displayvalues fromcolumnssubjectand authorfrom therelation Books.

iii. Union Operation(∪):[Link]


rows from two given relations.
Notation:rUs

Department of CSE, NRCM Page 31


Where r and s are either database relations or relation result set (temporary relation). rU s
returns a relation instance containing all tuples that occur in either relation instance r or
relation instance s(or both). For a union operation to be valid, the following conditions must
hold:
 Rands must have the same number of attributes.
 Attribute domains must be compatible in rands.

Example:∏author(Books)∪∏author(Articles)
Output: Projects the names of the authors who have either written a book or an article or
both.

iv. Intersection Operation(∩):It performs intersection operation between two given


relations . It collect only rows which are common in the two given relations.
Notation:R∩S

R ∩ S returns a relation instance containing all tuples that occur in both R and S. The
relations R and S must be union-compatible, and the schema of the result is defined to be
identical to the schema of R.
∏author(Books)∩∏author(Articles)
Output: Projects the names of the authors who have written bothbook and an article.

v. Set Difference (−): It finds tuples which are present in one relation but not in the second
relation.
Notation: r−s
Finds all the tuples that are present in r but not ins.
Example: ∏author(Books)−∏author(Articles)
Output− Provides the name of authors who have written books but not articles.

vi. Cartesian Product (Χ): It returns a relation instance whose schema contains all the
fields of table-1 (in the same order as they appear in table-1) followed by all the fields of
table-2. It combines every row in first table with every row in the second table.

Notation: r Χ s
Where rand sare relations and their output will be defined as :r Χs={ qt|q∈r and t ∈s}
vii. Natural join( ):The most general version of the join operation accepts a join condition
C and a pair of relation instances as arguments and returns a relation instance. The join condition is
identical to a selection condition in form. The operation is defined as follows:

Department of CSE, NRCM Page 32


R cS =σc(RXS)
Thus is defined to be a cross-product followed by a [Link] that thec ondition c can
refer to attributes of both R and S.
Note: If thecondition c in R cS contain equal operator, then itis called equi-join

viii. Natural Join( ): In this case, we can simply omit the join condition; the default is that
the join condition is a collection of equalities on all common fields. We call this special case
as natural join, and it has the nice propertythat the result is guaranteed not to have two fields
with the same name.

ix. Rename Operation (ρ): The results of relational algebra are also relations but without
any name. The rename operation allows us to rename the output relation. 'rename' operation
is denoted with small Greek letter rho ρ.
Notation:ρ(temp,E)

Where the result of expression E is saved with name of temp.

x. Division (/): Consider two relation instances A and B in which A has (exactly) two fields x
and y and B has just one field y, with the same domain as in A. We define the division
operation A / B as the set of all x values (in the form of unary tuples) such that for every y
value in (a tuple of) B, there is a tuple (x,y) in A.
Example:

SNO PNO SNO


PNO A/B1 S1
S1 P1 B1
P2 S2
S1 P2
S1 P3 PNO S3
A B2 P2 S4
S1 P4
S2 P1 P4
P1

S2 P2 SNO
PNO A/B2
S3 P2 S1
B3 P2 S4
S4 P2
P4
S4 P4 SNO
A/B3
S1

SampleQueries: We present a number of sample queries using the following schema:

Sailors (sid:integer,sname:string,rating:integer,age: real)


Boats (bid: integer, bname: string, color: string) Reserves
(sid: integer, bid: integer, day: date)

Department of CSE, NRCM Page 33


The key fields are underlined, and the domain of each field is listed after the field
name. Thus sid is the key for Sailors, bid is the key for Boats, and all three fields together
form thekeyforReserves. Fields in an instanceofoneoftheserelations will bereferred to by
name, or positionally, using the order in which they are listed above.

(Q1)Find the names of sailors who have reserved boat 103.


This query can be written as follows:
πsname((σbid=103Reserves) Sailors)
We first compute the set of tuples in Reserves with bid = 103 and then take the natural
join of this set with Sailors. This expression can be evaluated on instances of Reservesand
Sailors. Evaluated on the instances R2 and S3, it yields a relation

(Q2)Find the names of sailors who have reserved are d boat.


πsname((σcolor=′red′Boats) Reserves Sailors
This query involves a series of two joins. First we choose (tuples describing) red boats. Then,
we join this set with Reserves (natural join, with equality specified on the bid column) to
identify reservations of red boats. Next, we join the resulting intermediate relation with Sailors
(natural join, with equality specified on the sid column) to retrieve the names of sailors who
have rnade reservations for red boats. Finally, we project the sailors' names.

(Q3)Find the colors of boats reserved by Lubber.


πcolor((σsname=‘Lubber’Sailors) Reserves Boats)

(Q4)Find thenames ofsailors whohavereserved atleast oneboat.


πsname(Sailors Reserves)
The join of Sailors and Reserves creates an intermediate relation in which tuples consist of a
Sailors tuple 'attached to' a Reserves tuple. A Sailors tuple appears in (some tuple of) this
intermediate relation only if at least one Reserves tuple has the same sid value, that is, the sailor
has made some reservation.

(Q5)Find the names of sailors who have reserved a red or a green boat.
ρ(Tempboats,(σcolor=′red′Boats)U(σcolor=′green′Boats))
πsname(Tempboats Reserves Sailors)

Department of CSE, NRCM Page 34


We identify the set of all the rows that are either red or green from boats table. We rename this
[Link]’[Link], we join
with Sailors to find the names of Sailors with those sids.

(Q6)Find the names of sailors who have reserved a red and a green boat
ρ(Tempboats2,(σcolor=′red′Boats)∩(σcolor=′green′Boats))
πsname(Tempboats2 Reserves Sailors)

However, this solution is incorrect-it instead tries to compute sailors who have reserved a boat
[Link];thisquerywillalwaysreturnanempty answer set.
The right answer is

ρ(T empred, πsid((σcolor=′red′Boats) Reserves))


ρ(Tempgreen,πsid((σcolor=′green′Boats) Reserves))
πsname((Tempred ∩ Tempgreen) Sailors)
Thetwotemporaryrelationscomputethesidsofsailors,andtheirintersection identifiessailors who
have reserved both red and green boats.
(Q7)Find the names of sailors who have reserved at least two boats.

ρ(Reservations, πsid,sname,bid(Sailors Reserves))

ρ(Reservationpairs(1→sid1,2→sname1,3→bid1,4→

sid2,5→sname2,6→bid2),Reservations×Reservations)

πsname1σ(sid1=sid2)∩ (bid1=bid2)Reservationpairs

First, we compute tuples of the form (sid, sname, bid), where sailor sid has made a reservationfor
boat bid; this set of tuples is the temporary relation Reservations. Next we find all pairs of
Reservations tuples where the same sailor has made both reservations and the boats involved are
distinct. Here is the central idea: To show that a sailor has reserved two boats, we must find two
Reservations tuples involving the same sailor but distinct boats. Finally, we project the names of
such sailors.

(Q8)Find the sids of sailors with age over 20 who have not reserved ared boat.
πsid(σage>20Sailors)−πsid((σcolor=′red′Boats) Reserves Sailors)
This query illustrates the use of the set-difference operator. Again, we use the fact that sid is

Department of CSE, NRCM Page 35


The key for [Link] first identify sailors aged over 20 instances and then discard those who
have reserved a red boat to obtain the answer.

(Q9)Find the names of sailors who have reserved all boats.

The use of the word all(orevery)is a good indication that the division operation might be
applicable:
ρ(Tempsids,(πsid,bidReserves)/(πbidBoats))
πsname(Tempsids Sailors)
(Q10)Find the names of sailors who have reserved all boats called Interlake.
ρ(Tempsids, (πsid,bid Reserves)/(πbid(σbname=′Interlake′ Boats)))
πsname(Tempsids Sailors)

7. RELATIONAL CALCULUS
Relational calculus is an alternative to relational algebra. In contrast to the algebra, which is
procedural, the calculus is nonprocedural, or declarative, in that it allows us to describe the set
of answers without being explicit about how they should be computed.

TupleRelationalCalculus
Tuple Relational Calculus is a non-procedural query language unlike relational [Link]
Calculus provides only the description of the query but it does not provide the methods to solve
it. Thus, it explains what to do but not how to do.

In Tuple Relational Calculus, a query is expressed as {t|P(t)}

Where t=resulting tuples,P(t)=known as Predicate and these are the conditions that are used to
fetch t. Thus, it generates set of all tuples t, such that Predicate P(t) is true for t.

P(t) may have various conditions logically combined with OR(∨),AND(∧),NOT(¬). It


also uses quantifiers:
∃t ∈r(Q(t))=”there exists”a tuple int in relationr such that predicate Q(t) is true.
∀t∈r(Q(t))=Q(t)is true“for all”tuples in relation r.

Department of CSE, NRCM Page 36


(Q12)Findthenamesandagesof sailorswitharatingabove7.

{P| ∃S ∈Sailors([Link]>7∧[Link]=[Link] ∧[Link] =[Link])}

This queryillustrates a useful convention: P is considered to be a tuple variable with exactly


two fields, which are called name and age,.

(Q13)Findthesailorname,boatid,andreservationdateforeachreservation
{P|∃R∈Reserves S∈Sailors

([Link]= [Link]∧[Link] =[Link]∧[Link] = [Link]∧[Link] =[Link])}

(Q1)Findthenamesofsailorswhohavereservedboat103.(similarquestionQ1from relational
algebra)

{P|∃S∈Sailors∃R∈Reserves([Link]=[Link]∧[Link]=103∧[Link]∧[Link])}

This query can be read as follows: “Retrieve all sailor tuples for which there exists a tuple in
Reserves, having the same value in the sid field, and with bid = 103.”

(Q2)Findthenamesofsailorswhohavereservedaredboat. (similarquestionQ2from relational


algebra)
{P| ∃S ∈Sailors R∈Reserves([Link] =[Link] ∧[Link]=[Link]

∧∃B∈Boats([Link]=[Link]∧[Link]=′red′))}

Thisquerycanbereadasfollows:“Retrieveallsailortuples Sforwhichthereexisttuples Rin Reserves


and B in Boats such that [Link] = [Link], [Link] = [Link], and [Link] =′red′.”

(Q7)Findthenamesofsailorswhohavereservedatleasttwoboats. (similarquestionQ7 from


relational algebra)

{P|∃S∈Sailors∃R1∈Reserves∃R2∈Reserves([Link]=[Link]
[Link] =[Link] ∧[Link]≠ [Link]∧[Link]= [Link])}

(Q9)Findthenamesofsailorswhohavereservedallboats. (similarquestionQ9from relational


algebra)
{P|∃S∈Sailors ∀B∈Boats

(∃R∈Reserves([Link]=[Link]∧[Link] =[Link]∧[Link] =[Link]))}

Department of CSE, NRCM Page 37


(Q14)Find sailorswhohavereserved allred boats.
{S| S∃Sailors ∈∀ B∈Boats
([Link]=′red′=>(∃R∈Reserves([Link]=[Link]∧[Link]=[Link])))}

Domain Relational Calculus


A domain variable is a variable that ranges over the values in the domain of some
attribute (e.g., the variable can be assigned an integer if it appears in an attribute whosedomain
is the setofintegers).
ADRCqueryhastheform{ 〈 x1, x2, . . . , xn 〉 | p(〈 x1,x2,.. ., xn〉 )}
where eachxiis either a domain variableor a constant andp(〈 x1,x2,.. ., xn〉) denotes a
DRC formula whose only free variables are the variablesamong the xi, 1 ≤ i ≤ n. The result of
this query is the set of all tuples 〈x1, x2,.. .,xn〉 for which the formula evaluates to true.
A DRC formula is defined in a manner very similar to the definition of a TRC formula.
The main difference is that the variables are now domain variables. Let op denote an operatorin
theset {<, >, =, ≤, ≥ , ≠} and let X and Ybedomain variables. Anatomicformulain DRC is one of
the following:
 (x1, x2, . . . , xn)∈Rel, where Rel is a relation with n attributes; each xi, 1 ≤ i ≤ n is
either a variable or a constant
 X op Y
 X opconstant,orconstantop X

A formula is recursively defined to be one of the following, where P and q are themselves
formulas and p(X) denotes a formula in which the variable X appears:
 anyatomic formula
 ┐p,P/\q, P V q, orp=>q
 ∃X(p(X)), whereX isadomain variable
 ∀X(p(X)),whereX isa domain variable

(Q1)Find the names of sailors who have reserved boat 103.


{ (N ) |∃ I,T,A (〈I,N, T,A〉∈ Sailors

∧ ∃ Ir,Br,D(〈Ir,Br,D〉∈ Reserves∧ Ir= I∧ Br= 103) )}

Department of CSE, NRCM Page 38


(Q2)Find the names of sailors who have reserved a red boat.

{〈N〉 |∃ I,T, A(〈I,N,T,A〉∈ Sailors

∧ ∃〈I,Br,D〉∈ Reserves∧∃〈Br,BN,′red′〉∈ Boats)}

(Q7)Find the names of sailors who have reserved at least two boats.
{〈N〉 |∃ I,T, A(〈I,N,T,A〉∈ Sailors∧

∃ Br1, Br2, D1, D2 (〈I,Br1, D1〉∈ Reserves

∧〈I,Br2, D2〉∈ Reserves∧Br1 ≠ Br2)

(Q9)Find thenames of sailors who have reserved all boats.


{〈N〉 |∃ I,T,A(〈I,N,T,A〉∈ Sailors∧

∀ B,BN,C(¬(〈B,BN,C〉∈ Boats) V

(∃〈Ir,Br,D〉∈ Reserves(I = Ir∧ Br = B))))}

Department of CSE, NRCM Page 39


UNIT– III
SQL: QUERIES, CONSTRAINTS, TRIGGERS: form of basic SQL query, UNION, INTERSECT,
and EXCEPT, Nested Queries, aggregation operators, NULL values, complex integrity constraints in
SQL, triggers and active databases. Schema Refinement: Problems caused by redundancy,
decompositions, problems related to decomposition, reasoning about functional dependencies, FIRST,
SECOND, THIRD normal forms, BCNF, lossless join decomposition, multi-valued dependencies,
FOURTH normal form, FIFTH normal form.

1. SQLCOMMANDS
Structured Query Language (SQL) is the database language used to create a database and
to perform operations on the existing database. SQL commands are instructions used to
communicate with the database to perform specific tasks and queries with data. These SQL
commands are categorized into five categories as:

i. DDL:Data Definition Language


ii. DML:Data ManipulationLanguage
iii. DQL:Data QueryLanguage
iv. DCL: Data ControlLanguage
v. TCL: Transaction ControlLanguage.

SQL commands

DDL DML DQL DCL TCL


DataDefinition DataManipulation DataQuery DataControl Transaction
Language Language Language Language ControlLanguage

CREATE INSERT GRANT COMMIT


SELECT

ALTER DELETE REVOKE ROLLBACK

DROP SAVEPOINT
UPDATE

TRUNCATE

Department of CSE, NRCM Page 40


i. DDL(Data Definition Language) :DDL or Data Definition Language consists of the
SQL commands that can be used to define the database schema. It simply deals with
descriptions of the database schema and is used to create and modify the structure ofdatabase
objects in the database. The DQL commands are:

 CREATE: It is used to create the database or its objects (like table, index, function,
views, store procedure and triggers).
 DROP: It is used to delete objects from the database.
 ALTER: It is used to alter the structure of the database.
 TRUNCATE: It is used to remove all records from a table, including all
spaces allocated for the records are removed.
ii. DQL (Data Query Language): DML statements are used for performing queries on the
data within schema objects. The purpose of DQL Command is to get data from some schema
relation based on the query passed to [Link] DQL commands are:
 SELECT–is used to retrieve data from the database.

iii. DML (Data Manipulation Language):The SQL commands that deals with the
manipulation of data present in the database belong to DML or Data Manipulation Language
and this includes most of the SQL statements. The DML commands are:

 INSERT–is used to insert data into a table.


 UPDATE– is used to update existing data with in a table.
 DELETE–is used to delete records from a database table.
iv. DCL (Data Control Language): DCL includes commands which mainly deal with the
rights, permissions and other controls of the database system. The DCL commands are:
 GRANT-givesuser’s access privileges to database.
 REVOKE-withdraw user’s access privileges given by using the GRANT command.
v. TCL (transaction Control Language):TCL commands deals with the transaction with
in the database. The TCL commands are:
 COMMIT–commitsa Transaction.
 ROLLBACK–rollbacks a transaction in case of any error occurs.
 SAVEPOINT–sets a save point within a transaction.

2. DDL COMMANDS
DDL or Data Definition Language consists of the SQL commands that can be used to define

Department of CSE, NRCM Page 41


the database schema. It simply deals with descriptions of the database schema and is used to
create and modify the structureof database objects in the [Link] DQL commands are:

CREATE: It is used to create the database or its objects like table, index, function, views,
store procedure and triggers.

a) The ‘CREATE DATABASE’ Statement: This statement is used to create a database.

Syntax:CREATE DATABASE Database_Name;

Example: CREATE DATABAS EEmployee;

It creates Employee database.

b) The ‘CREATE TABLE’ Statement: This statement is used to create a table.


Syntax:
CREATE TABLE TableName(
Column1 datatype(size)[column_constraint],
Column2 datatype(size)[column_constraint],
....
ColumnN datatype(size)[column_constraint],
[table_constraint]
[,table_constraint]
);

Note: The content in the squarebrackets indicates it is [Link] not required,you can skip it.

Column constraints
o PRIMARYKEY //Use only,If one column name as primarykey.
o NOTNULL //It does not accept NULLvalue in that column.
o DEFAULTvalue //It storede fault value in that column,if no value is inserted
o UNIQUE //It allows to store only unique values in the column

Table constraints

o PRIMARY KEY(column_name1,column_name2,…)
Use it,If one column name or multiple column names acts as primarykey.
o UNIQUE(column_name1,column_name2,…)
Use it,if one column name o rmultiple column names should contain unique values.
If multiple column names are used, then for each row,it consider values from all the columns
mentioned to decide the uniqueness, but not column wise.
o FOREIGN KEY(column_name1) REFERENCES other_table_name(column_name2)
It is used to link data from one table to other table.
o CHECK(condition)
It does not allow inserting value(s), if the condition is not satisfied. The condition may
alsocontain multiple column names.

Department of CSE, NRCM Page 42


Example1:Creating table without any constraints
CREATE TABLE Employee_Info (
EmployeeID int, EmployeeName
varchar(20), PhoneNumber
numeric(10), City varchar(20),
Country varchar(20)
);
Example2:Using PRIMARYKEY and NOTNULL as column constraints
CREATE TABLE Departments (
DeptID int PRIMARY KEY, DeptName
varchar(20)NOTNULL, Hod varchar(20),
Location varchar(20)
);
Example 3: Using PRIMARY KEY, NOT NULL, UNIQUE and DEFAULT as column constraints and FOREIGN
KEY as table constraint.
CREATE TABLE Students_Info (
HallTicketNo int PRIMARYKEY, Name
varchar(20)NOT NULL,
Mobile numeric(10)NOTNULL UNIQUE, DepartmentID int,
City varchar(20)DEFAULT‘Hyderabad’,
FOREIGNKEY(DepartmentID)REFERENCES Departments(DeptID)
);

Example4:Using NOTNULL,UNIQUE as column constraints and PRIMARYKEYand CHECK as table constraints.


CREATE TABLE Voter_list (
VoterID numeric(10),
Adhaar Nonumeric(12)NOTNULL UNIQUE, Name
varchar(20)NOT NULL,
Age int,
Mobile numeric(10)UNIQUE, City
varchar(20),
PRIMARYKEY(VoterID),
CHECK(AGE>18)
);

c) The ‘CREATETABLEAS’ Statement:You can also create a table from another existing
table. The newly created table also contains data of existing table.
Syntax: CREATE TABLE NewTableName AS(SELECT Column1,column2,...,ColumnN
FROM ExistingTableName
WHERE[condition]);

Example:CREATE TABLE Example Table AS(SELECT EmployeeName,PhoneNumber FROM Employee_Info);

Department of CSE, NRCM Page 43


ii. DROP:This statement is used to drop an existing table or a database.
a) The ‘DROP DATABASE’ Statement: This statement is used to drop an existing
database. When you use this statement, complete information present in the database will
be lost.
Syntax: DROP DATABASE DatabaseName; .

Example: DROP DATABASE Employee;

b) The‘DROPTABLE’Statement:This statement is used to drop an existing [Link]


you use this statement, complete information present in the table will be lost.
Syntax: DROP TABLE TableName; .

Example: DROP TABLE Employee;

iii. TRUNCATE: This command is used to delete the information present in the table but
does not delete the table. So, once you use this command, your information will be lost, but
not the table.
Syntax: TRUNCATE TABLE TableName; .

Example: TRUNCATE TABLE Employee_Info;

iv. ALTER: This command is used to add, delete or modifycolumn(s) in an existing table. It
can also be used to rename the existing table and also to rename the existing column name.
a) The‘ALTERTABLE’ with ADD column:You can use this command to add a new
column to the existing table.
Syntax: ALTER TABLE TableName ADD
Column Name Datatype;

Example:Adding Blood Group column to the Employee_Info table


ALTER TABLE Employee_Info ADD
Blood Group varchar(10);
b) The ‘ALTER TABLE’ with DROP column: You can use this command to remove
acolumn from the existing table.
Syntax: ALTER TABLE TableName
DROP ColumnName;
Example: Removing BloodGroup column from the Employee_Info table
ALTER TABLE Employee_Info DROP
BloodGroup;

Department of CSE, NRCM Page 44


c) The ‘ALTER TABLE’ with MODIFY COLUMN: This statement is used to change
the data type or size of data type of an existing column in a table.

Syntax: ALTER TABLE TableName


MODIFY COLUMN ColumnName Datatype;

Example 1: Changing the size of column ‘EmployeeName’ in table ‘Employee_info’ from


20 to 30.

ALTER TABLE Employee_Info


MODIFY EmployeeName varchar(30);

Example 2: Changing the data type of column ‘EmployeeID’ in the table ‘Employee_info’
from int to char(10).

ALTER TABLE Employee_Info MODIFY


Employee IDchar(10);

d) The‘ALTER TABLE’ with CHANGE column name: This statement is used to change
the column name of an existing column in a table.

Syntax: ALTER TABLE TableName


CHANGE COLUMNOldColumnNameNewColumnName;

Example1: Changing the column name ‘EmployeeName’ to ‘EmpName’in table ‘Employee_info’.

ALTER TABLE Employee_Info


CHANGE COLUMN EmployeeName EmpName;

e) The ‘ALTER TABLE’ with RENAME table name: This statement is used to change
the table name in the database.

Syntax: ALTER TABLE OldTableName


RENAME TO NewTableName;

Example: Changing the table name from‘Employee_Info’to ‘Employee_Data’.

ALTER TABLE Employee_Info RENAME


TO Employee_Data;

3. DML COMMANDS: The SQL commands that deals with the manipulation of data
present in the database belong to DML or Data Manipulation Language and this includes
most of the SQL statements. The DML commands are:
i. INSERT: This statement is used to insert new record (row) into the table.

Department of CSE, NRCM Page 45


Syntax: INSERT INTO TableName[(Column1,Column2,...,ColumnN)]
VALUES (value1, value2,..., valueN);

Example1:
INSERT INTO Employee_Info(EmployeeID,EmployeeName,PhoneNumber,City,Country) VALUES ('06', 'Sanjana',
'9921321141', 'Chennai', 'India');

Example2: When inserting all column values as per their order in the table, you can omit
column names.
INSERT INTO Employee_Info
VALUES('07','Sayantini','9934567654','Pune','India');

ii. DELETE: This statement is used to delete the existing records in a table.

Syntax: DELETE FROM TableName


WHERE Condition;

Example:
DELETE FROM Employee_Info WHERE
EmployeeName='Preeti';

Note: If where condition is not used in DELETE command, then all the rows data will be deleted. If used
only rows which satisfies the condition are deleted.

iii. UPDATE: This statement isused to modify the record values already present in the table.

Syntax: UPDATETableName
SETColumn1=Value1,Column2=Value2,... [WHERE
Condition];

Example:

UPDATE Employee_Info
SET EmployeeName='Jhon',City='Ahmedabad' WHERE
EmployeeID = 1;

Note:If where condition is not used in UPDATE command, then in all the rows Employee Name changes to
'Jhon' and City name changes to 'Ahmedabad'.Ifusedonly rows which satisfies the condition are
updated.

4. DQL COMMAND:The purpose of DQL Command is to get data from one or more
tables based on the query passed to it.
i. SELECT: This statement is used to select data from a database and the data returned
isstored in a result table, called the result-set.

Department of CSE, NRCM Page 46


Syntax: SELECT[DISTINCT]*/Column1,Column2,...ColumN
FROM TableName
[WHEREsearch_condition]
[GROUP BY column_names
[HAVINGsearch_condition_for_GROUP_BY]
[ORDER BY column_name ASC/DESC] ;

Example1:SELECT * FROM table_name;

Example2:
SELECT EmployeeID,EmployeeName
FROM Employee_Info;

The ‘SELECT with DISTINCT’ Statement: This statement is used to display only
different unique values. It mean it will not display duplicate values.

Example: SELECT DISTINCT PhoneNumber FROM Employee_Info;

The ‘ORDER BY’ Statement: The ‘ORDER BY’ statement is used to sort the required
results in ascending or descending order. The results are sorted in ascending order by
default. Yet, if you wish to get the required results in descending order, you have to use
the DESC keyword.

Example

/*Selectallemployeesfromthe'Employee_Info'tablesortedby City*/

SELECT*FROMEmployee_Info
ORDER BY City;

/*Selectallemployeesfromthe'Employee_Info'tablesortedby City in Descending


order */

SELECT*FROMEmployee_Info
ORDER BY City DESC;

/*Selectallemployeesfromthe'Employee_Info'tablesortedby City and


EmployeeName. First it sort the rows as per city, then sort by employee name */

SELECT * FROM Employee_Info


ORDERBYCity,EmployeeName;

/*Selectallemployeesfromthe'Employee_Info'tablesortedby
CityinDescendingorderandEmployeeNameinAscendingorder:*/

SELECT*FROMEmployee_Info
ORDERBYCityASC,EmployeeNameDESC;

Department of CSE, NRCM Page 47


AGGREGATE FUNCTIONS:
The SQL allows summarizing data through a set of functions called aggregate functions. The
commonly used aggregate functions are: MIN(), MAX(), COUNT(), SUM(),AVG().

MIN() Function: The MIN function returns the smallest value of the selected column in a table.

Syntax:SELECT MIN(ColumnName)
FROM TableName
WHERECondition;

Example: SELECT MIN(EmployeeID) FROM


Employee_Info;

MAX( ) Function:The MAX function returns thelargest value of the selected column in a table.

Syntax:SELECT MAX(ColumnName)
FROM TableName
WHERECondition;
Example:
SELECT MAX(Salary)ASLargestFees
FROM Employee_Salary;

COUNT()Function:The COUNT function returns the number of rows which match the specified criteria.

Syntax: SELECT COUNT(ColumnName) FROM


TableName WHERECondition;
Example:
SELECT COUNT(EmployeeID)
FROM Employee_Info;

SUM() Function:The SUM function returns the total sum of a numeric column that you
choose.

Syntax: SELECT SUM(ColumnName)


FROM TableName
WHERECondition;
Example:
SELECT SUM(Salary)
FROMEmployee_Salary;

Department of CSE, NRCM Page 48


AVG() Function:The AVG function returns the average value of a numeric column that you choose.

Syntax: SELECT
AVG(ColumnName)
FROM TableName
WHERECondition;
Example:
SELECT AVG(Salary)
FROMEmployee_Salary;

The‘GROUPBY’Statement: This ‘GROUPBY ’statement is used with the aggregate functions to


group the result-set by one or more columns.

Example:

--Tolistthenumberofemployeesfromeachcity.

SELECT COUNT(EmployeeID),City
FROM Employee_Info
GROUPBYCity;

The‘HAVING’Clause:The‘HAVING’ clause must be used SQL along with GROUPBY clause only. It
is similar to the WHERE clause.

Example

/*&nbsp;&nbsp;To list the number of employees in each city. The employees should be sorted
high to low and only those cities must be included who have more than 5 employees:*/

SELECTCOUNT(EmployeeID),City
FROM Employee_Info
GROUPBY City
HAVINGCOUNT(EmployeeID)>2;

Operators in SQL:

The different setof operators available in SQL are as follows:

 Arithmetic operators
 Bitwise operators
 Comparison operator
 Compound operator
 Logicaloperator

Letus look into each one of them, one by one.

Department of CSE, NRCM Page 49


Arithmetic Operators:
Operator Description
% Modulus [A %B]
/ Division [A /B]
* Multiplication[A * B]
– Subtraction[A– B]
+ Addition [A +B]

Bitwise Operators:
Operator Description
^ BitwiseExclusiveOR(XOR) [A^ B]
| BitwiseOR [A|B]
& BitwiseAND[A &B]

Comparison Operators:
Operator Description
<> NotEqual to [A <>B]
<= Lessthan or equalto [A<= B]
>= Greaterthanor equalto[A>= B]
< Lessthan [A<B]
> Greaterthan[A>B]
= Equalto [A =B]

Compound Operators:
Operator Description
|*= BitwiseORequals [A |*=B]
^-= BitwiseExclusiveequals [A^-=B]
&= BitwiseAND equals[A&= B]
%= Moduloequals [A%=B]
/= Divideequals [A /=B]
*= Multiplyequals[A*= B]
-= Subtractequals[A-= B]
+= Addequals[A+= B]

Logical Operators: The Logical operators present in SQL are as follows:AND,OR,NOT, BETWEEN, LIKE, IN,
EXISTS, ALL, ANY.

Department of CSE, NRCM Page 50


AND Operator: This operator is used to filter records that rely on more than one condition. This
operator displays the records, which satisfy all the conditions separated by AND, and give the output
TRUE.

Syntax:

SELECTColumn1,Column2,...,ColumnN
FROM TableName
WHERECondition1ANDCondition2ANDCondition3...;

Example:

SELECT*FROMEmployee_Info
WHERECity='Mumbai'ANDCity='Hyderabad';</pre>

OR Operator: This operator displays all those records which satisfy any of the conditions separated by
OR and give the output TRUE.

Syntax:SELECTColumn1,Column2,...,ColumnN
FROM TableName
WHERECondition1ORCondition2ORCondition3...;

Example:

SELECT*FROMEmployee_Info
WHERECity='Mumbai'ORCity='Hyderabad';

NOT Operator: The NOT operator is used, when you want to display the records which do not satisfy a
condition.

Syntax: SELECTColumn1,Column2,...,ColumnN
FROM TableName
WHERENOTCondition;

Example:

SELECT * FROM Employee_Info


WHERE NOT City='Mumbai';

NOTE:You can also combine the above three operators and write a query as follows:
SELECT * FROM Employee_Info
WHERE NOT Country='India' AND (City='Bangalore 'OR City='Hyderabad');

BETWEEN Operator: The BETWEEN operator is used, when you want to select values within a given
range. Since this is an inclusive operator, both the starting and ending values are considered.
Syntax:
Department of CSE, NRCM Page 51
SELECTColumnN
ame(s) FROM
TableName
WHEREColumnNameBETWEENValue1ANDValue2;

Example:
SELECT * FROM Employee_Salary
WHERE Salary BETWEEN 40000 AND 50000;

LIKE Operator

The LIKE operator is used in a WHERE clause to search for a specified pattern in a column of a table.
There are mainly two wildcards that are used in conjunction with the LIKE operator:

 % :Itisused tomatches 0ormorecharacter.


 _:Itis usedtomatchesexactlyonecharacter.

Syntax

SELECTColumnName(s
) FROM TableName
WHEREColumnNameLIKEpattern;

Refer to the following table for the various patterns that you can mention with the LIKE operator.

LikeOperatorCondition Description
WHERE CustomerName LIKE ‘v% Finds any values that start with “v”
WHERE CustomerName LIKE ‘%v’ Finds any values that end with“v”
WHERE CustomerName LIKE ‘%and%’ Finds any values that have“and”in any position
Finds any values that h0ave“q”in the second
WHERE CustomerName LIKE ‘_q%’
position.
Finds any values that start with“u” and are at
WHERE CustomerName LIKE ‘u_%_%’
least 3 characters in length
Finds any values that start with“m”and end
WHERE ContactName LIKE ‘m%a’
with “a”

Example:
SELECT * FROM Employee_Info
WHEREEmployeeNameLIKE'S%';

IN Operator:This operator is used for multiple OR [Link] allows you to specify multiple values
in a WHERE clause.

Syntax:SELECTColumnName(s) FROM
TableName
WHERE ColumnNameIN(Value1,Value2...);
Department of CSE, NRCM Page 52
Example:
SELECT * FROM Employee_Info
WHERE City IN('Mumbai','Bangalore','Hyderabad');
NOTE:You can also use IN while writing Nested Queries.

EXISTS Operator: The EXISTS operator is used to test if a record exists or not.
Syntax:
SELECTColumn
Name(s) FROM
TableName WHERE
EXISTS
(SELECT ColumnName FROM TableNameWHERE condition);

Example:
SELECT City
FROM Employee_Info
WHERE EXISTS (SELECTCity
FROM Employee_Info
WHERE EmployeeId=05 AND City='Kolkata');

ALL Operator: The ALL operator is used with a WHERE or HAVING clause and returns TRUEif all
of the subquery values meet the condition.

Syntax:SELECTColumnName(s)
FROM TableName
WHERE ColumnName operator ALL
(SELECT ColumnName FROM TableName WHEREcondition);

Example:
SELECTEmployeeName
FROM Employee_Info
WHEREEmployeeID=ALL(SELECTEmployeeID
FROMEmployee_Info
WHERECity='Hyderabad');

ANYOperator:Similar to the ALL operator,the ANY operator is also used withaWHEREor HAVING
clause and returns true if any of the subquery values meet the condition.

Syntax:
SELECTColumnN
ame(s) FROM
TableName
WHERE ColumnNameoperator ANY
(SELECTColumnNameFROMTableNameWHEREcondition);

Example:
SELECTEmployeeName
FROM Employee_Info
Department of CSE, NRCM Page 53
Aliases Statement:Aliases are used to give a column /table a temporary name and only exists for
duration of the query.

Syntax:/*[Link] used in the table, it display


alias name. */

SELECTColumnNameASAliasName
FROM TableName;
Example:

SELECTEmployeeIDASID,EmployeeNameASEmpName
FROM Employee_Info;

5. NESTED QUERIES
Nested queries are those queries which have an outer query and inner [Link], basically,
the subquery is a query which is nested within another query.

OUTERQUERY SUBQUERYorINNERQUERY

SELECT EmployeeName,PhoneNumber
FROM Employee_Info
WHERE City IN( SELECT City
FROM Office
WHERE County=‘INDIA’);
First the inner query gets executed and the result will be used to execute the outer query.

6. SETOPERATIONS:UNION,INTERSECT,EXCEPT
There are mainly three set operations:UNION, INTERSECT,[Link] can refer to the image below
to understand the set operations in SQL.

Department of CSE, NRCM Page 54


i. UNION:This operator is used to combine the result-set of two or more SELECT statements.
Syntax: SELECTColumnName(s)FROMTable1WHEREcondition UNION
SELECTColumnName(s)FROMTable2WHEREcondition;

ii. INTERSECT:This clause used to combine two SELECTstatements and return the
intersection of the data-sets of both the SELECT statements.
Syntax: SELECTColumnName(s)FROMTable1WHEREcondition
INTERSECT
SELECTColumnName(s)FROMTable2WHEREcondition;

iii. EXCEPT:This operator returns those tuples that are returned by the first SELECT
operation, and are not returned by the second SELECT operation.

Syntax: SELECT ColumnName(s)FROM Table1 WHERE condition


EXCEPT
SELECT ColumnName(s)FROM Table2 WHERE condition;

Note: UNION, INTERSECT or EXCEPT operations are possible if and only if first SELECT
query and second SELECT query produces same no of columns in same order, same
column names and data type. Otherwise it gives an error.

7. JOINS
JOINS are used to combine rows from two or more tables, based on a related column betweenthose
tables. The following are the types of joins:

 INNER JOIN:This join returns those records which have matching values in both
thetables.
 FULL JOIN:This join returns all those records which either have a match in the left or
the right table.
 LEFT JOIN:This join returns records from the left table, and also those records which
satisfy the condition from the right table.
 RIGHTJOIN:Thisjoinreturnsrecordsfromtherighttable,andalsothoserecords which
satisfy the condition from the left table.

Department of CSE, NRCM Page 55


Refer to the image below.

Department of CSE, NRCM Page 56


Let’sconsider the below Technologies and the Employee_Infotable, to understand the syntax of
joins.

Employee_Info
EmployeeID EmployeeName PhoneNumber City Country
01 Shravya 9898765612 Mumbai India
02 Vijay 9432156783 Delhi India
03 Preeti 9764234519 Bangalore India
04 Vijay 9966442211 Hyderabad India
05 Manasa 9543176246 Kolkata India

Technologies
TechID EmpID TechName ProjectStartDate
1 01 DevOps 04-01-2019
2 03 Blockchain 06-07-2019
3 04 Python 01-03-2019
4 06 Java 10-10-2019

INNER JOIN or EQUI JOIN: This is a simple JOIN in which the result is based on matched data as
per the equality condition specified in the SQL query. This join is used mostly. NATURAL JOIN is a
type INNER JOIN. We can also use it. It also gives same result.
Syntax

SELECT
ColumnName(s) FROM
Table1
INNER JOIN Table2 ON [Link]=[Link];

Example

SELECT T. TechID, [Link], [Link]


FROM Technologies T
INNER JOIN Employee_InfoEONT.EmpID=[Link];

TechID EmployeeID EmployeeName


1 01 Shravya
2 03 Preeti
3 04 Vijay

Department of CSE, NRCM Page 57


FULL OUTER JOIN: The full outer join returns a result-set table with the matched data of two
table then remaining rows of both left table and right table with missing values are filled with NULL
values.

Syntax
SELECTColumnName(s
) FROM Table1
[Link]=[Link];

Example

[Link],[Link],[Link]
FROM Employee_Info E
[Link]=[Link];

EmployeeID EmployeeName TechID


01 Shravya 1
02 Vijay NULL
03 Preeti 2
04 Vijay 3
05 Manasa NULL
06 NULL 4

LEFT JOIN: The left outer join returns a result-set table with the matched data from the two tables
and then the remaining rows of the left table with null for the right table's columns.
Syntax:
SELECTColumnName(s)
FROM Table1
[Link]=[Link];

Example:
[Link],[Link],[Link]
FROM Employee_Info E
[Link]=[Link];

EmployeeID EmployeeName TechID


01 Shravya 1
02 Vijay NULL
03 Preeti 2
04 Vijay 3
05 Manasa NULL

Department of CSE, NRCM Page 58


RIGHT JOIN: The right outer join returns a result-set table with the matched data from the two
tables being joined, then the remaining rows of the right table and null for the remaining left table's
columns.
Syntax:
SELECTColumnName(s)
FROM Table1
[Link]=[Link];

Example:
[Link],[Link],[Link] FROM
Employee_Info E
[Link]=[Link];

EmployeeID EmployeeName TechID


01 Shravya 1
03 Preeti 2
04 Vijay 3
NULL NULL 4

8. TRIGGERS
A trigger is a stored procedure in database which automatically invokes when ever a special
event in the database occurs. For example, a trigger can be invoked when a row is inserted into a
specified table or when certain table columns are being updated. So, a trigger can be invoked
either BEFORE or AFTER the data is changed byINSERT, UPDATE or DELETEstatement.
Refer to the image below.

Department of CSE, NRCM Page 59


Syntax:

CREATE TRIGGER[TriggerName]
[BEFORE | AFTER]
{INSERT|UPDATE|DELETE}
on[TableName]
[FOREACHROW]
[TriggerBody]

Explanation of syntax:

 Create trigger[trigger_name]:Creates or replaces an existing trigger with the


trigger_name.
 [before|after]:This specifies when the trigger will be executed.
 {insert|update|delete}: This specifies the DML operation.
 on[table_name]:This specifies the name of the table associated with the trigger.
 [foreachrow]:This specifies a row-level trigger,i.e.,the trigger will be executed for
each row being affected.
 [trigger_body]:This provides the operation to be performed as trigger is fired.

BEFOREandAFTERofTrigger:
BEFORE triggers run the trigger action before the triggering statement is run.

AFTER triggers run the trigger action after the triggering statement is run.

EXAMPLE:

CREATE TRIGGER nb BEFORE INSERT ON accounts FOR EACH ROW /*Event*/ Begin
IF:[Link]<0THEN /*Condition*/
DBMS_OUTPUT.PUT_LINE('BALANCE IS NAGATIVE..'); /*Action*/
END IF;
End;

A trigger called ‘nb’ is created to alert the user when inserting account details with negative balance
value in to accounts table. Before inserting, the trigger is activated if the condition istrue. When a trigger
activated, the action part of the trigger is get executed.

9. NORMALIZATION
 Normalization is the process of minimizing the redundancy from a relation or set of
relations.
 It is used to eliminate the Insertion,Update and Deletion Anomalies.
 Normalization divides the larger table into the smaller table and links them using
relationship.
 Normalization is done with the help of different normal form.

Department of CSE, NRCM Page 60


The inventor of the relational model Edgar Codd proposed the theory of normalization with the
introduction of the First Normal Form, and he continued to extend theory with Second and Third
Normal Form. Later he joined Raymond F. Boyce to develop the theory of Boyce-Codd Normal
[Link] software industry, they are using only up to third normal form and sometimes Boyce-
Codd Normal Form.

TheProblemofredundancy

Redundancy means having multiple copies of same data in the [Link] problem arises when a
database is not normalized. Redundancy leads the following problems.

 WastageofMemory:Disk spaceis wasted dueto storingsame copymultiple times.


 Storage cost increases: When multiple copies of same data is stored, need more disk
space and storage cost increases.
 Update anomaly: When Address of student is stored at several places; a change in the
address must be made in all the places. Changing the address at some places and leaving
other places leads to inconsistency problem.
 Insertion Anomaly: The nature of a database may be such that it is not possible to add a
required piece of data unless another piece of unavailable data is also added. Forexample,
a librarydatabase cannot store the details of a new student until that student has taken
atleast one book from the library.
 Deletion Anomaly: When some data is deleted, it also deletes other data automatically.
For example, deleting a book details from a library database, it also delete the student
details who have taken the book previously.

10. 1NF(FIRST NORMAL FORM)


A relation (table)is said tobein first normal form if and only if:

 Each table cell contains only atomic values(single value).


 Each record needs to be uniquely identified by the primary key.

Department of CSE, NRCM Page 61


1NF Example:

HTNO FIRSTNAME LASTNAME MOBILE


9999988888
501 Jhansi Rani
7777799999
8888888881
502 Ajay Kumar
7897897897
503 Priya Verma 9898989898

Theabovetableis not in 1NFbecause 501 and 502 is havingtwo valuesin mobilecolumn. If we add a new
column as alternative mobile number to the above table, then for 503 alternative mobile number is
[Link], if a student has ‘n’ mobile numbers, then adding ‘n’ extra column is meaningless. It is
better to add extra rows. If we add extra row for each 501 and 502 then the table looks like

HTNO FIRSTNAME LASTNAME MOBILE


501 Jhansi Rani 9999988888
501 Jhansi Rani 7777799999
502 Ajay Kumar 8888888881
502 Ajay Kumar 7897897897
503 Priya Verma 9898989898

But the above table violates primary key constraint. Therefore instead of adding either columns or rows,
the best solution is to split the table into two tables as shown below. If we do as shown below, if a
student having ‘n’ number of mobile numbers also can be added.
HTNO FIRST LAST HTNO MOBILE
NAME NAME 501 9999988888
501 Jhansi Rani 501 7777799999
502 Ajay Kumar 502 8888888881
502 7897897897
503 Priya Verma 503 9898989898

11. 2NF (SECOND NORMAL FORM)

A relation is said to bein 2-NFifand only if

 It should be in 1-NF(First Normal Form)


 There should not be any partial functional dependencies

2NF Example:

Department of CSE, NRCM Page 62


HTNO Name DOB DeptNo DeptName Location
501 Jhansi 30-10-1998 05 CSE A-Block
502 Ajay 24-12-1999 05 CSE A-Block
410 Priya 12-03-2000 04 ECE B-Block
120 Rahul 30-10-1998 01 CIVIL C-Block
415 Smitha 18-06-1999 04 ECE B-Block

The above table is not in 2NF because there exist partial function dependencies. HTNO is a key attribute
in the above table. If every non-key attribute fully dependent on key attribute, then we say it is fully
functional dependent. Consider the below diagram. {Name, DOB, DeptNo, DeptName, Location}
depends on HTNO. But {DeptName, Location} also depends on DeptNo.

Name

DOB DeptName

DeptNo
DeptNo

HTNO Location
DeptName

Location

It is clear that DeptName and Location not only depends upon HTNO but also on DeptNo. So, there
exists partial function dependency. This partial functional dependency can be removed by splitting the
above table into two tables as follows.

HTNO Name DOB DeptNo DeptNo DeptName Location


501 Jhansi 30-10-1998 05 05 CSE A-Block
502 Ajay 24-12-1999 05
410 Priya 12-03-2000 04 04 ECE B-Block
120 Rahul 30-10-1998 01 01 CIVIL C-Block
415 Smitha 18-06-1999 04

12. 3NF(THIRD NORMAL FORM)


A relation (table)is in third normal form if and only if it satisfies the following conditions:

 It is in second normal form


 There is no transitive functional dependency

Department of CSE, NRCM Page 63


Transitive functional dependency means, we have the following relationships in the table: A is
functionallydependent on B (A→B), and Bis functionallydependent on C (B→C). In this case, C is
transitively dependent on A via B (A→B and B→C mean A→B→C implies A→C).

3NFExample:

Consider the following book details table example:

BOOK_DETAILS
BookID GenreID GenreType Price
1 1 Gardening 250.00
2 2 Sports 149.00
3 1 Gardening 100.00
4 3 Travel 160.00
5 2 Sports 320.00

The above table is not in 3NF because there exist transitive [Link] the table able,
BookIDdeterminesGenreID {BookID→GenreID}
GenreIDdeterminesGenreType. {GenreID→GenreType}
BookIDdeterminesGenreTypeviaGenreID. {BookID→GenreType}
It implies that transitive functional dependency is existing and the structure does not satisfy third
normal [Link] bring this table into third normal form,we split the table into two as follows:
BOOK_DETAILS
BookID GenreID Price
1 1 250.00 GENRE_DETAILS
2 2 149.00 GenreID GenreType
3 1 100.00 1 Gardening
4 3 160.00 2 Sports
5 2 320.00 3 Travel

13. BOYCE CODD NORMAL FORM(BCNF)

A relation (table) is said to be in the BCNF if and only if it satisfy the following conditions:

 It should be in theThird Normal Form.


 For any functional dependencyA→B, A should be as uperkey.

 In simple words, it means, that for a dependency A → B, A cannot be a non-prime


attribute, if B is a prime attribute.

Department of CSE, NRCM Page 64


Example:Below we have a Patient table of a hospital.A patient can go to hospital many times to take
treatment. On a single day many patients can take treatment.
PatientID Name EmailID AdmittedDate Drug Quntity
101 Ram ram@[Link] 30/10/1998 A-10 10
102 Jhon jho@[Link] 30/10/1998 X-90 10
101 Ram ram@[Link] 10/06/2001 X-90 20
103 Sowmya sam@[Link] 05/03/2002 Y-30 15
102 Jhon jho@[Link] 05/03/2002 A-10 15

In the above table,{PateintID,AdmittedDate}acts as [Link] if we know the


EmailIDvalue, wecan find PatientID value.

ThatisEmailID → PatientID.

In the above dependency, EmailId is non-prime attribute and PatientID is a primeattribute. Therefore
the above table is not in BCNF. In order to bring the table into BCNF, we split it into two tables as
shown below.

PatientID Name AdmittedDate Drug Quntity PatientID EmailID


101 Ram 30/10/1998 A-10 10 101 ram@[Link]
102 Jhon 30/10/1998 X-90 10 102 jho@[Link]
101 Ram 10/06/2001 X-90 20 103 sam@[Link]
103 Sowmya 05/03/2002 Y-30 15
102 Jhon 05/03/2002 A-10 15

In other words we can also define BCNF as there should not be any overlapping between candidate keys. If you
consider the original table (before splitting), we can get two candidate keys {PateintID, AdmittedDate}
and{EmailID, AdmittedDate}.

As there exist overlapping in the candidate keys,the table is not in BCNF. To bring it into BCNF, we split into
two tables as shown above.

Department of CSE, NRCM Page 65


14. 4-NF(FOURTH NORMAL FORM)
A relation is said to be in 4-NF if and only if it satisfies the following conditions

 It should be in the ThirdNormalForm.


 The table should not have anyMulti-valuedDependency.

WhatisMulti-valuedDependency?
A table is said to have multi-valued dependency, if the following three conditions are true.

i. A table should have at-least 3 columns for it to have a multi-valued dependency.


ii. For any dependencyA →B,if there exists multiple value of B for a single value of A,
then the table may have multi-valued dependency. It is represented as A →→ B.
iii. In a relation R(A,B,C),if there is a multi-valued dependency between A and B,thenB
And C should be independent of each other.

If all these three conditions are true for any relation (table), then it contains multi-valued
[Link]-valueddependencycanbe explainedwithanexample. LettheRelationR containing
three columns A, B, C and four rows s, t, u, v.

A B C
s a1 b1 c1
t a1 b1 c2
u a1 b2 c1
v a1 b2 c2
If s(A) =t(A)=u(A)=v(A)
s(B)=t(B) ands(B) =v(B)
s(C)=u(C)andt(C)=v(C), then thereexist multi-valueddependency.

Example:Consider the below college enrolment table with columns HTNO,Subject and
Hobby.

Department of CSE, NRCM Page 66


501

Department of CSE, NRCM Page 67


Subject
Java C#

Cricket Dancing
Hobby

Department of CSE, NRCM Page 68


502

Department of CSE, NRCM Page 69


Hobby As shown in the above figure, if501 opted forsubjects likeJavaand C# and hobbies of 501 are Cricket and Dancing.
Similarly, If 502 opted for subjects like Python andAndroid and hobbies of 501 are Chess and Singing, then it can be written
into a table with three columns as follows:
HTNO Subject Hobby
501 Java Cricket
501 Java Dancing
501 C# Cricket
501 C# Dancing
502 Python Chess
502 Python Singing
502 Android Chess
502 Android Singing

Asthereexistmultivalueddependency,theabovetableisdecomposedintotwotablessuch that

HTNO Subject HTNO Hobby


501 Java 501 Cricket
501 C# 501 Dancing
502 Python 502 Chess
502 Android 502 Singing

Now these tables(relations)satisfy the fourth normalform.

15. 5NF
A relation is said to be in 5-NFif and onlyif it satisfies the followingconditions

 ItshouldbeintheFourth NormalForm.
 Thetableshould not haveanyjoin Dependencyand joiningshould belossless.

5NF is also known as Project-join normal form (PJ/NF).

A table is decomposed into multiple small tables to eliminate redundancy, and when we re-join the
decomposed tables, there should not be any loss in the original data or shold not create any new data. In
simple words, joining two or more decomposed table should not lose records nor create new records.

Department of CSE, NRCM Page 70


Example:Consider a table which contains a record of Subject,Professor and Semesterin three
columns. The primarykeyis the combination of all three columns. No column itself is not a candidate
key or a super key.

In the table,DBMS is taught by Ravindar and Uma Rani in semester4, DS by Sindhusha and Venu in
sem 3. In this case, the combination of all these fields required to identify valid data.

So to make the table into 5NF,we can decompose it intot here relations,

Subject Professor Semester


C Srilatha 2
DBMS Ravindar 4
DS Sindhusha 3
DBMS UmaRani 4
CN Srikanth 5
DS Venu 3
WT Srinivas 5

The above table is decomposed into three tables as follows to bringit into 5-NF.

Subject Professor Semester Professor Semester Subject


C Srilatha 2 Srilatha 2 C
DBMS Ravindar 3 Ravindar 4 DBMS
DS Sindhusha 5 Sindhusha 3 DS
DBMS UmaRani 3 UmaRani 5 CN
CN Srikanth 5 Srikanth 5 WT
DS Venu 2 Venu
WT Srinivas 5 Srinivas

16. LOSS LESS JOIN DECOMPOSITION


Decomposition of a relation R into R1 and R2 is lossless-join decomposition if at least one of the
following functional dependencies are in F+ (Closure of functional dependencies)
R1∩R2→R1 OR
R1∩R2→R2

 ConsiderarelationR whichis decomposedinto subrelationsR1 and R2.


 ThisdecompositioniscalledlosslessjoindecompositionwhenwejoinR1andR2and if we
get the same relation R that was decomposed.
 Forlossless joindecomposition, wealways have:R1⋈R2

Department of CSE, NRCM Page 71


Example 1:Consider the following relation R( A , B , C ). Let this relationis decomposed
into two sub relations R1( A , B ) and R2( B , C )

R R1 R2
A B C A B B C
1 2 1 decompose→ 1 2 and 2 1
2 5 3 2 5 5 3
3 3 3 3 3 3 3

Now, let us check whether this decomposition is lossless or not. For lossless decomposition,
we must have:R1⋈ R2 = R .Now, if we perform the natural join ( ⋈) of the sub relations R 1
and R2 , we get
A B C
1 2 1
Thisrelationis sameastheoriginalrelation R.
2 5 3
3 3 3

Thus, we conclude that the above decomposition is lossless join decomposition. This is
because the resultant relation after joining the sub relations is same as the decomposed
relation. No extraneous tuples (rows) appear after joining of the sub-relations.

Example 2:Considerthe following relation R( A , B , C ). Let this relation is decomposed


into two sub relations R1( A , C ) and R2( B , C )

R R1 R2
A B C A C B C
1 2 1 decompose→ 1 1 and 2 1
2 5 3 2 3 5 3
3 3 3 3 3 3 3

Now, let us check whether this decomposition is lossless or not. For lossless decomposition,
we must have:R1⋈ R2 = R .Now, if we perform the natural join ( ⋈) of the sub relations R 1
and R2 , we get
A B C
1 2 1
ThisrelationisnotsameastheoriginalrelationR.
2 5 3
2 3 3
3 5 3
3 3 3

Thus, we conclude that the above decomposition is not lossless join decomposition. This is
because the resultant relation after joining the sub relations is not same as the decomposed
relation. Extraneous tuples (rows) appear after joining of the sub-relations.
Department of CSE, NRCM Page 72
PROBLEMS
ConsiderarelationR isdecomposed intotwo sub relationsR1andR2.

 Ifall the followingconditions satisfy, thenthedecomposition is lossless.


 If anyof theseconditions fail,then thedecomposition is lossy.

Condition-01:Unionofboththesubrelationsmustcontainalltheattributesthatarepresent in the
original relation R.
R1∪R2= R
Condition-02:Intersectionofboththesubrelationsmustnotbenull. Inotherwords,there must
be some common attribute which is present in both the sub relations.
R1∩R2 ≠∅
Condition-03:IntersectionofboththesubrelationsmustbeasuperkeyofeitherR1orR2or both.

************************************************************************

Problem-01:ConsiderarelationschemaR(A,B,C,D)withthefunctionaldependencies A → B
and C → D. Determine whether the decomposition of R into R1 ( A , B ) and R2 ( C , D ) is
lossless or lossy.

Solution:To determine whether the decomposition is lossless or lossy, we will check all the
conditions one by one. If any of the conditions fail, then the decomposition is lossy otherwise
lossless.
Condition-01:Accordingto condition-01,union ofboththe subrelations must containallthe
attributes of relation R. So, we have:
R1( A, B)∪R2( C , D ) = R(A , B, C , D )
Clearly, union of the sub relations contains all the attributes of relation R. Thus, condition-01
satisfies.
Condition-02:According to condition-02, intersection of both the sub relations must not be
null. So, we have-
R1( A, B)∩R2( C , D ) = Φ
Clearly, intersection of the sub relations is [Link], condition-02 fails. Thus, we conclude that
the decomposition is lossy.

************************************************************************
Problem-02:ConsiderarelationschemaR(A,B,C,D)withthefollowingfunctional dependencies
A→ B B→ C C→D D→ B

Department of CSE, NRCM Page 73


Determinewhetherthe decompositionof RintoR1( A , B ), R2(B , C) and R3(B ,D )is lossless or
lossy.
Solution:

StrategytoSolve:Whenagivenrelationisdecomposedintomorethantwosubrelations, then
 Considerany onepossiblewaysinwhichtherelationmighthavebeendecomposed into those
sub relations.
 First,dividethe givenrelationintotwosub relations.
 Then,dividethesubrelationsaccordingtothesubrelationsgiveninthequestion. As a
thumb rule, remember-
Anyrelation can bedecomposed onlyinto two sub relations at a time.

ConsidertheoriginalrelationR wasdecomposedintothe givensubrelationsas shown:

Decomposition ofR(A,B, C,D) intoR'(A, B, C)and R3(B,D)-

Todeterminewhetherthedecompositionis losslessor lossy,


 Wewill check all the conditions onebyone.
 Ifanyof theconditions fail,then thedecompositionis lossyotherwise lossless.

Condition-01:According to condition-01, union of both the sub relations must contain all the
attributes of relation R. So, we have
R‘( A, B, C )∪R3( B, D ) = R (A , B, C , D )
Clearly, union of the sub relations contains all the attributes of relation R. Thus, condition-01
satisfies.

Condition-02:According to condition-02, intersection of both the sub relations must not


benull. So, we have
R‘( A, B, C )∩R3(B,D ) = B
Clearly,intersectionofthesubrelations [Link],condition-02 satisfies.

Condition-03:According to condition-03, intersection of both the sub relations must be


thesuper key of one of the two sub relations or both. So, we have-
R‘( A, B, C )∩R3(B,D ) = B
Department of CSE, NRCM Page 74
Now, the closure of attribute B is: B+={B,C,D} So,
 Attribute‘B’cannotdetermineattribute‘A’ofsubrelation R’.
 Thus,itisnotasuper keyofthe sub relation R’.
 Attribute‘B’ candetermineall theattributesofsubrelation R3.
 Thus,it is a super keyofthe sub relation R3.

Clearly,intersectionofthesubrelationsisasuperkeyofoneofthesubrelations. So,
condition-03 satisfies. Thus, we conclude that the decomposition is lossless.

Decomposition of R'(A, B,C) intoR1(A, B)andR2(B,C)-


Todeterminewhetherthedecompositionis losslessor lossy,
 Wewill check all the conditions onebyone.
 Ifanyof the conditions fail, then thedecomposition is lossyotherwise lossless.

Condition-01: According to condition-01, union of both the sub relations must contain all the
attributes of relation R’. So, we have
R1( A, B)∪R2( B, C ) = R’(A, B, C )
Clearly, union of the sub relations contain all the attributes of relation R’. Thus, condition-01
satisfies.
Condition-02: According to condition-02, intersection of both the sub relations must not
benull.
So, wehave
R1( A, B)∩R2( B, C) = B
Clearly,intersectionofthesubrelations [Link],condition-02 satisfies.

Condition-03: According to condition-03, intersection of both the sub relations must be


thesuper key of one of the two sub relations or both. So, we have
R1( A, B)∩R2( B, C) = B
+
Now, the closure of attribute B is: B ={B,C,D} So,
 Attribute‘B’cannotdetermineattribute‘A’ofsubrelation R1.
 Thus,itis nota superkeyofthe sub relation R1.
 Attribute‘B’ candetermineall theattributesofsubrelation R2.
 Thus,it is a super keyofthe subrelation R2.

Clearly,[Link], condition-03
satisfies. Thus, we conclude that the decomposition is lossless.

Department of CSE, NRCM Page 75


17. CLOSURE OF AN ATTRIBUTE SET:

The set of all those attributes which can be functionally determined from an attribute
set is called as a closure of that attribute set. Closure of attribute set {X} is denoted as {X}+.

StepstoFindClosureofanAttributeSet:Followingstepsarefollowedtofindthe closure of an
attribute set:
Step-01:Addtheattributescontainedintheattributesetforwhichclosureisbeing calculated to the
result set.
Step-02:Recursivelyaddtheattributestotheresultsetwhichcanbefunctionallydetermined from the
attributes already contained in the result set.

Question1: ConsiderarelationR(A,B,C,D,E,F,G)withthefunctional dependencies


A→ BC, BC→DE, D→ F, CF→ G
Findtheclosureof{A}, {D} and{B,C}attributesandattributesets
Solution:
Closureof attributeA:
+
A ={A}
={ A , B, C } (UsingA→BC )
={ A , B, C , D , E} (UsingBC →DE)
={ A , B, C , D , E,F} (UsingD→F)
={ A , B, C , D , E,F,G } (UsingCF→ G)
Thus,
A+={A , B , C , D, E ,F,G}
ClosureofattributeD:
+
D ={D}
={ D , F}( UsingD→ F)
WecannotdetermineanyotherattributeusingattributesDandFcontainedintheresultset. Thus,
D+= {D ,F}
Closureof attributeset{B, C}
+
{ B, C } ={ B, C}
={ B, C , D , E} (UsingBC→DE)
={ B, C , D , E,F} (UsingD→F)
={ B, C , D , E,F, G } ( UsingCF→ G)
Thus,
{B , C }+= {B , C , D,E , F,G}

Department of CSE, NRCM Page 76


Question2:Considerthegivenfunctionaldependencies
AB→ CD AF→ D DE→F C→G F→ E G→ A
Findtheclosureof {A,B}, {A, F},{B, G}and {C, F}

Solution:
Closureof {A, B}:
+
{AB} ={ A ,B }
={ A , B, C , D } (UsingAB→ CD)
={ A , B, C , D ,G } (UsingC→ G)
Thus,{AB}+={ A , B, C , D , G}

Closureof {C, F}:


+
{CF} ={ C , F }
={ C , F, G } (UsingC→ G)
={ C , E , F, G } (Using F→ E )
={ A , C , E , E , F} (UsingG→A)
={ A , C , D , E , F,G } (UsingAF→ D)
Thus,{ CF}+={ A, C , D , E, F,G }

Closureof {B, G}:


+
{BG } ={B, G }
={ A , B,G } (UsingG→A)
={ A , B, C , D ,G } (UsingAB→ CD)
Thus,{ BG }+={ A,B , C , D , G}

Closureof {B, G}:


+
{AF} ={ A ,F }
={ A , D, F} (UsingAF→ D )
={ A , D, E ,F} (Using F→ E )
+
Thus,{ AF} ={ A ,D , E, F}

18. FINDING THE KEYS USING CLOSURE

Super Key:
Iftheclosureresultofanattributesetcontainsalltheattributesoftherelation, then that attribute set is
called as a super key of that relation.
 Thus,wecansay,“Theclosureofasuperkeyistheentirerelationschema.”
Example:In the above example (Question 1),
 Theclosureof attribute A is theentirerelation schema.
 Thus,attributeA is asuper keyfor that relation.

Department of CSE, NRCM Page 77


CandidateKey:
If there exists no subset of an attribute set whose closure contains all the attributes of
the relation, then that attribute set is called as a candidate key of that relation.
Example:Intheaboveexample(Question1),
 Nosubsetof attributeA contains alltheattributesofthe relation.
 Thus,attributeA is also a candidatekeyfor that relation.

Finding Candidate Keys From a Relation:


Wecan det erminethecandidatekeys ofa givenrelation usingthe followingsteps-

Step-01:Determineallessentialattributesofthegiven relation

 EssentialattributesarethoseattributeswhicharenotpresentonRHSofanyfunctional
dependency.
 [Link] cannot
be determined by other attributes.

Example:LetR(A,B,C,D,E,F)bearelationschemewiththefollowingfunctional
dependencies: A → B,C → DandD → E.

The RHS of all the above functional dependencies contain only B, D and E. The
attributeswhicharenotpresentonRHSofanyfunctionaldependencyareA,CandF. So,
essential attributes are: A, C and F.

Step-02:Determiningallnon-essentialattributesusingessentialattributes

 [Link] can be
determined by using essential attributes.
 Now,followingtwocasesarepossible-
 Case-01:Ifallessentialattributestogethercandetermineallremainingnon-essential
attributes, then
o Thecombinationofessentialattributesis thecandidatekey.
o Itis the onlypossible candidatekey.
 Case-02:Ifallessentialattributestogethercannotdetermineallremainingnon-
essential attributes, then-

 Thesetofessentialattributesandsomenon-essentialattributeswillbethecandidate key(s).
 Inthiscase,multiplecandidatekeysarepossible.
 Tofindthecandidatekeys,wecheckdifferentcombinationsofessentialandnon-
essential attributes.

Department of CSE, NRCM Page 78


PROBLEMS ON FINDING CANDIDATE KEYS

Problem-01:LetR=(A,B,C,D,E,F)bearelationschemewiththefollowing
dependencies:C→F,E→A,EC→DandA→B. [Link], determine the
total number of candidate keys and super keys.

Solution:Wewillfindcandidatekeysof thegivenrelationinthefollowingsteps-
Step-01:Determineallessentialattributesofthegiven relation.

 Essentialattributesofthe relationare: C andE.


 So,attributes C and Ewill definitelybeapart ofeverycandidate key.

Step-02:Now,wewillcheckiftheessentialattributestogethercandetermineallremaining non-
essential attributes. To check, we find the closure of CE. So,

{ CE }+ ={ C , E }
={ C, E , F} (UsingC→F)
={ A , C , E , F} (UsingE→A)
={ A , C , D , E , F} (UsingEC→ D)
={ A , B, C , D , E,F} (UsingA→B)
[Link],CEistheonly possible
candidate key of the relation.

TotalNumberofSuper Keys-

Therearetotal6attributes inthegiven relationofwhich-

 Thereare2essentialattributes-Cand E.
 Remaining4 attributesarenon-essential attributes.
 Essential attributes will be definitelypresent in every key.
 Non-essential attributesmayor maynot bepresent ineverysuperkey.

C E ABDF

Essential attributes Non-Essentialattributes

So,numberofsuperkeyspossible=2x 2x 2x 2=[Link],totalnumberofsuperkeys possible = 16.

Problem-02:ConsidertherelationschemeR(E,F,G,H,I,J,K,L,M,N)andtheset
offunctionaldependencies: {E, F} → { G}, { F} → {I, J}, { E, H} → { K,L},
{ K} → { M} and {L}→ {N}.Determine thecandidatekey(s).

Department of CSE, NRCM Page 79


Solution:Wewillfind candidatekeys ofthegivenrelation inthe following steps-
Step-01:Determineallessentialattributesofthegiven relation.

 Essentialattributesof the relationare-E,Fand H.


 So,attributes E, Fand Hwill definitelybeapart of everycandidatekey.

Step-02:

 We will check if the essential attributes together can determine all remaining on-
essential attributes.
 To check,we find the closure of EFH.

So, we have-

{EFH }+={E , F,H}


={ E , F, G ,H } (UsingEF → G)
={ E , F, G ,H ,I, J} (UsingF→IJ)
={ E , F, G ,H ,I, J, K ,L} (UsingEH→ KL)
={ E , F, G ,H ,I, J, K ,L, M } (Using K→ M)
={ E , F, G,H,I, J,K,L, M, N}( UsingL→ N)
We conclude that EFH can determine all the attributes of the given relation. So, EFH is the
only possible candidate key of the relation.

*****************************ALLTHE BEST *******************************

Department of CSE, NRCM Page 80


UNIT– IV
Transaction Management: Transaction Concept, Transaction State, Implementation of
Atomicity and Durability, Concurrent Executions, Serializability, Recoverability,
Implementation of Isolation, Testing for serializability, Lock Based Protocols, Timestamp Based
Protocols, Validation- Based Protocols, Multiple Granularity, Recovery and Atomicity, Log–
Based Recovery, Recovery with Concurrent Transactions.

1. TRANSACTION
Definition: A transaction is a single logical unit consisting of one or more database access
operation.
Example: Withdrawing 1000 rupees fromATM.
The following set of operations are performed to withdraw1000 rupees from database

i. Read current balance from Database (Let say 5000 rupees)


ii. Deduct1000 from current balance (5000 – 1000 =4000) oneTransaction
iii. Update current balance in Database (4000rupees)

 Every transaction is executed as a single unit.


 If the database operations do not update the database but only retrieve data, this type of
transaction is called a read-only transaction.
 A successful transaction can change the database from one consistent state to another
Consistent state.
 DBMS transactions must satisfy ACID properties (atomic, consistent, isolatedand durable).

2. ACID PROPERTIES
ACID properties are used for maintaining the integrityof database during transaction processing.
ACID stands for Atomicity, Consistency, Isolation, and Durability.

 Atomicity: This property ensure that either all of the tasks of a transaction are performed or
none of them. In simple words it is referred as “all or nothing rule”.

Each transaction is said to be atomic if when one part of the transaction fails, the entire

Department of CSE, NRCM Page 81


transaction fails. When all parts of the transaction completed successfully, then the transaction
said to be success. (“all or nothing rule” )

Example:Transferring $100from accountAtoaccount B.


(Assumeinitially, accountAbalance =$400andaccountBbalance=700$.)
Transferring$100from account A to account B has two operations
a) Debiting100$fromA’s balance ($400-$100 =$300)
b) Crediting100$to B’sbalance ($700+$100= $800)
Let’s say first operation (a) passed successfully while second (b) failed, in this case A’s balance
would be300$ while B would behaving700$ instead of 800$. This is unacceptablein abanking
[Link]
process both the operations. TheAtomicitypropertyensuresthat.

ii. Consistency: The consistency property ensures that the database must be in consistent state
before and after the transaction. There must not be any possibility that some data is incorrectly
affected by the execution of a transaction.
For example, transfering funds from one account to another, the consistency property ensures
that the total values of funds in both the accounts is the same before and end of the transaction.
i.e., Assume initially, A balance = $400 and B balance = 700$.
ThetotalbalanceofA +B =1100$(Beforetransferring100$fromA toB) The total
balance of A + B = 1100$(After transferring 100$ from A to B)

iii. Isolation:Foreverypairoftransactions,oneofthetransactionsshouldnotstartexecution
before the other transaction execution completed, if they use some common data variable. That
is, if the transaction T1 is executing and using the data item X, then transaction T2 should not
start until the transaction T1 ends, if T2 also use same data item X.
Forexample,TransactionT1: Transfer 100$from accountAto accountB
TransactionT2:Transfer150$ fromaccountBtoaccountC
Assumeinitially,A balance=Bbalance=Cbalance=$1000

TransactionT1 TransactionT2
10:00AM ReadA’sbalance ($1000) ReadB’s balance ($1000)
10:01AM Abalance=ABalance –100$ (1000-100=900$) Bbalance=B Balance –150$ (1000-150= 850$)
10:02AM ReadB’s balance ($1000) ReadC’sbalance ($1000)
10:03AM Bbalance=B Balance+100$ (1000+100= 1100$) Cbalance=C Balance+150$ (1000+150=1150$)

Department of CSE, NRCM Page 82


10:04AM WriteA’sbalance (900$) WriteB’sbalance (850$)
10:05AM WriteB’sbalance (1100$) WriteC’sbalance (1150$)
10:06AM COMMIT COMMIT

After completion of Transaction T1andT2,Abalance=900$,Bbalance=1100$,Cbalance


=1150$. But B balance should be 950$. The B balance is wrong due to execution of T1 and T2
parallel and in both the transactions, Account B is common. The last write in account B is at
10:05 AM, so that B balance is 1100$ (write in account B at 10:04 AM is overwritten).

iv. Durability: Once a transaction completes successfully, the changes it has made into the
database should be permanent even if there is a system failure. The recovery-management
[Link],assume
account A balance = 1000$. If A withdraw 100$ today, then the A balance = 900$. After two
days or a month, A balance should be 900$, if no other transactions done on A.

3. STATES OF TRANSACTION
A transaction goes through many different states throughout its [Link] states are called
as transaction states. They are:

Active State:
 Thisis the firststate in the lifecycleof a transaction.
 Oncethetransactionstarts executing,thenit issaidto beinactivestate.
 During this state it performs operations like READ and WRITE on some data items. All
the changes made by the transaction are now stored in the buffer in main memory. They
are not updated in database.

Department of CSE, NRCM Page 83


 From active state, a transaction can go into either a partially committed state or a failed
state.

Partially Committed State:


 When the transaction executes its last statement,then the transaction is said to be in
partially committed state.
 Still,all the changes made by the transaction are stored in the buffer in main memory,but
they are not updated in the database.
 From partially committed state, a transaction can go into one of two states, a committed
state or a failed state.

Committed State:
 After all the changes made by the transaction have been successfully updated in the
database, it enters into a committed state and the transaction is considered to be fully
committed.
 After a transaction has entered the committed state, it is not possible to roll back (undo)
the transaction. This is because the system is updated into a new consistent state and the
changes are made permanent.
 Theonlywaytoundothechangesisbycarryingoutanothertransactioncalled as compensating
transaction that performs the reverse operations.

Failed State:
 When a transaction is getting executed in the active state or partiallycommitted state and
some failure occurs due to which it becomes impossible to continue the execution, it
enters into a failed state.

Aborted State:
 After the transaction has failed and entered into a failed state, all the changes made by it
have to be undone.
 To undo the changes made by the transaction, it becomes necessary to rollback the
transaction.
 After the transaction has rolledback completely,it enters into an abortedstate.

Terminated State:
 Thisis thelast state inthelifecycleof a transaction.
 After entering the committed state or aborted state,the transaction finally enters into a
Department of CSE, NRCM Page 84
Terminated state where its lifecycle finally comes to an end.

4. TYPES OF SCHEDULES–SERIALIZABILITY
In DBMS,schedules may be classified as

i. Serial Schedules:
 All the transactions execute serially one after the other.
 When one transaction executes,no other transaction is allowed to execute.
Examples:
Schedule-1 Schedule-2
T1 T2 T1 T2
Read(A) Read(A)
A=A-100 A=A+500
Write(A) Write(A)
Read(B) COMMIT
B=B+100 Read(A)
Write(B) A=A-100
COMMIT Write(A)
Read(A) Read(B)
A=A+500 B=B+100
Write(A) Write(B)
COMMIT COMMIT

In schedule 1, after T1 completes its execution, transaction T2 executes. So, schedule-1 is a Serial
Schedule. Similarly, in schedule-2, after T2 completes its execution, transaction T1 executes. So,
schedule -2 is also an example of a Serial Schedule.

ii. Non-Serial Schedules:


 In non-serial schedules,multiple transactions execute concurrently.
 Operationsofall/someofthetransactionsareinter-leaved ormixedwitheachother.
 Somenon-serialschedulesmayleadtoinconsistencyofthedatabase andmayproduce

Department of CSE, NRCM Page 85


wrong results.

Examples:
Schedule-1 Schedule-2
T1 T2 T1 T2
Read(A) Read(A)
A=A-100 Read(A)
Write(A) A=A-100
Read(A) Write(A)
A=A+500 A=A+500
Read(B) Read(B)
B=B+100 B=B+100
Write(B) Write(B)
COMMIT COMMIT
Write(A) Write(A)
COMMIT COMMIT

In schedule-1 and schedule-2, the two transactions T1andT2 executing [Link] operations of
T1 and T2 are interleaved. So, these schedules are Non-Serial Schedule.

iii. SerializableSchedules:

 A non-serial schedule of ‘n’ transactions is equivalent to some serial schedule of ‘n’


transactions, then it is called as a serializable schedule.
 In other words, the results produced bythe transactions in a serial schedule areequal to
theresultproducedbythesametransactionsinsomenon-serialschedule,thenthatnon- serial
schedule is called as serializability.
 Serializableschedules behaveexactlysameas serial schedules.
 Even though, Serial Schedule and Serializable Schedule produce same result, there are
some differences they are
Serial Schedules SerializableSchedules
Concurrency is not allowed. Thus, all the
[Link],multiple
transactionsnecessarily execute serially one
transactions can execute concurrently.
after the other.
It leads to less resource utilization and CPU Itimprovesbothresourceutilizationand CPU
throughput. throughput.
Serial Schedules are less efficient as SerializableSchedulesarealwaysbetter than
compared to serializable schedules. serial schedules.

Serializability is mainly of two types. They are:


 Conflict Serializability
Department of CSE, NRCM Page 86
 View Serializability
Conflict Serializability:If a given non-serial schedule can be converted into a serial schedule by
swapping its non-conflicting operations, then it is called as a conflict serializable schedule.
Two operations are called as conflicting operations if all the following conditions hold true
(1) Both the operations belong to different transactions
(2) Both the operations are on the same data item
(3) Atleast one of the two operations is a write operation

Schedule–1 Schedule–2 Schedule- 3 Schedule- 4


T1 T2 T1 T2 T1 T2 T1 T2
Read(A) Read(A) Write(B) Write(B)
Read(A) Write(A) Read(A) Write(B)

InSchedule-1,onlyrule(1)&(2)aretrue,butrule(3)[Link],theoperationsarenotconflict. In Schedule -2, rule


(1), (2) & (3) are true. So, the operations are conflict.
InSchedule-3,onlyrule(1)&(3)aretrue,butrule(2)[Link],theoperationsarenotconflict. In Schedule -4, rule
(1), (2) & (3) are true. So, the operations are conflict.

Testing of Conflict Serializability:Precedence Graph is used to test the Conflict Serializability of


a schedule. The algorithm to draw precedence graph is
(1) Draw anodeforeachtransactioninScheduleS.
(2) IfTa readsX valuewritten byT b, thendraw arrowfrom T b → Ta.
(3) If Tb writesXvalue afterithasbeenreadbyTa,thendraw arrowfromTa→Tb.
(4) IfTawritesXafterTb writesX,thendraw arrow from Tb →Ta.
Iftheprecedencegraphhasnocycle,[Link] precedence
graph contains a cycle, then S is not conflict serializable.

Problem-01:Checkwhetherthe givenschedule Sisconflictserializableornot.


S:R1(A),R2(A),R1(B),R2(B),R3(B),W1(A),W2(B)
Solution:
Giventhat S: R1(A), R2(A) ,R1(B) , R2(B),R3(B) ,W1(A) ,W2(B).
Theschedule fortheaboveoperations is

Department of CSE, NRCM Page 87


Schedule-1
T1 T2 T3
Read(A)
Read(A)
Read(B)
Read(B)
Read(B)
Write(A)
Write(B)
List all the conflicting operations and determine the dependency between the transactions
(Thumbruletofindconflictoperations:ForeachWrite(X)inTa,makeapairwitheachRead(X)andWrite(X)inTb.
Theorderisimportantineachpairi.e.,forexample,Read after WriteonX orwriteafterreadonX in thegiven schedule.)

 R2(A), W1(A) (T2→T1)


 R1(B), W2(B) (T1→T2)
 R3(B), W2(B) (T3→T2)
Drawtheprecedence graph:

There exists a cycle in the above [Link],the schedule S is not conflict serializable.

Problem-02:Check whether the given schedule S is conflict serializable schedule.

Schedule– S
T1 T2 T3 T4
Read(X)
Write(X)
COMMIT
Write(X)
COMMIT
Write(Y)
Read(Z)
COMMIT
Read(X)
Read(Y)
COMMIT

Solution: List all the conflicting operations to determine the dependency between transactions.

R2(X),W3(X) (T2→T3)
W3(X),W1(X) (T3→T1)
W3(X),R4(X) (T3→T4)
R2(X),W1(X) (T2→T1)
W1(X),R4(X) (T1→T4)

Department of CSE, NRCM Page 88


W2(Y), R4(Y) (T2→T4)

Draw the precedence graph:

There exists nocycle in the precedence [Link], the schedule S is conflict serializable.

View Serializability:Two schedules S1andS2 are said to be view equivalent if both of them
satisfy the following three rules:

(1) InitialRead:The first read operation on each data item inboth the schedule must be same.
 For each data itemX, If first read onXisdonebytransactionTainscheduleS1,thenin schedule2 also
the first read on X must be done by transaction T a only.
(2) UpdatedRead:Itshouldbesameinboththe schedules.
 IfRead(X)ofTafollowedby Write(X)ofT bin scheduleS1,thenin scheduleS2also,Read(X) of T a must
follow Write(X) of Tb ..
(3) Finalwrite:Thefinal writeoperationon each dataitem inboth theschedulemust besame.
 ForeachdataitemX,ifXhasbeenupdatedatlastbytransaction T iinscheduleS1,thenin schedule S2 also,
X must be updated at last by transaction T i.

ViewSerializabilityDefinition:If agivenscheduleisviewequivalenttosomeserial
schedule,thenitiscalled asaviewserializableschedule.
Note:Everyconflictserializablescheduleis also viewserializableschedulebut not vice-versa

Problem03:Checkwhetherthe givenscheduleSisviewserializableornot
Schedule–1
T1 T2
Read(A)
Write(A)
Read(A)
Write(A)
Read(B)
Write(B)
Read(B)
Write(B)

Solution:

Department of CSE, NRCM Page 89


F
o
r
t
h
e

S
c
h
e
d
u
l
e
-
1
(
S
1
)

S
c
h
e
d
u
l
e
-
2
(
S
2
)

Department of CSE, NRCM Page 90


T1 T2 T1 T2
Read(A) Read(A)
Write(A) Write(A)
Read(A) Read(B)
Write(A) Write(B)
Read(B) Read(A)
Write(B) Write(A)
Read(B) Read(B)
Write(B) Write(B)
Nowlet us check whether thethreerules of view-equivalent satisfyor not.
Schedule-1(S1) Schedule-2(S2)
T1 T2 T1 T2
1 1
Read(A) Read(A)
Write(A) Write(A)
2 Read(A) Read(B) 1
3
Write(A) Write(B)
Read(B) 1 2 Read(A) 3
Write(B) Write(A)
2 Read(B) 3
2 Read(B) 3
Write(B) Write(B)
Rule1: Initial Read
First Read(A)isbyT1inS1andinS2alsothefirstRead(A) isbyT1only.
First Read(B)isbyT1in S1andinS2alsothef irstRead(B) isbyT1only.
Rule2:UpdatedRead
Write(A)ofT1isreadbyT2inS1 andinS2alsoWrite(A)ofT1isreadbyT2 Write(A)of
T1 is read byT2 in S1and in S2alsoWrite(A)of T1 is read byT2
Rule3:Final Write
Thefinal Write(A)is byT2in S1andin S2 alsothefinal Write(A)is byT2 only
The final Write(B) is by T2 in S1 and in S2 also the final Write(B) is by T2 only
Conclusion: Hence, all the threerules are satisfied in this example, which means Schedule S1and
S2 are view equivalent. Also, it is proved that schedule S2 is the serial schedule of S1. Thus we
can say that the S1 schedule is a view serializable schedule.

Note:Other way of solving it is, if weare able to prove that S1 is conflict serializable, then S1 isalso view serializable. (Refer
conflict serializable problems. Every conflict serializable schedule is also view serializable but not vice-versa.)

5. IMPLEMENTATION OF ATOMICITY AND DURABILITY


The recovery-management component of a DBMS supports atomicityand durabilitybya variety
of schemes. The simplest scheme to implement it is Shadow copy.

Department of CSE, NRCM Page 91


Shadowcopy: Inshadow-copyscheme,
 Atransactionthatwantstoupdatethedatabasefirstcreatesacompletecopyofthe database.
 Allupdates aredoneonthenewdatabasecopy,leavingtheoriginal copy,untouched.
 If at any point the transaction has to be aborted, the system simply deletes the new copy.
The old copy of the database has not been affected.

 If the transaction complete successfully, then the database system updates the pointer db-
pointer to point to the new copy of the database; the new copy then becomes the original
copy of the database. The old copy of the database is then deleted. Figure below depicts
the scheme, showing the database state before and after the update.

Figure:Shadowcopytechniqueforatomicityand durability

6. RECOVERABILITY
During execution, if any of the transaction in a schedule is aborted, then this may leads
the database into inconsistence state. If anything goes wrong, then the completed operations in
the schedule needs to be undone. Sometimes, these undone operations may not possible. The
recoverability of schedule depends on undone operations.

If a transaction reads a data value that is updated by an uncommitted transaction,


thenthis type of read is called as a dirty read.

Irrecoverable Schedule:In a schedule, if a transaction Ta performs a dirty read operation


fromothertransactionTbandTacommitsbeforeTbthensuchascheduleisknownas an Irrecoverable
Schedule.
Example:Considerthefollowingschedule
T1 T2
Read(A)
Write(A)
|

Department of CSE, NRCM Page 92


| Read(A) //DirtyRead
| Write(A)
| COMMIT
|
ROLLBACK
Here,
 T2performsa dirtyreadoperation.
 T2commitsbeforeT1.
 T1failslaterandrollbacks.
 Thevalue that T2 read now stands to beincorrect.
 T2cannot recoversinceithas alreadycommitted.

Recoverable Schedules:In a schedule, if a transaction Taperforms a dirty read operation


from other transaction Tband Tacommit operation delayed till Tbcommit, then such a schedule is
known as an Irrecoverable Schedule.

Example:Considerthefollowingschedule-

T1 T2
Read(A)
Write(A)
|
| Read(A) //DirtyRead
| Write(A)
|
COMMIT

COMMIT//Delayed

Here,
 T2performsa dirtyreadoperation.
 Thecommitoperationof T2is delayedtillT1commitsorroll backs.
 T1commitslater.
 T2isnow allowedto commit.
 Incase,T1wouldhavefailed,T2hasachancetorecoverbyrollingback.

Checking Whether a Schedule is Recoverable or


Irrecoverable:Check if there exists any dirty read operation.
 Iftheredoes not exist anydirtyread operation, then theschedule is surelyrecoverable.
 Ifthereexists anydirtyread operation, then
If the commit operation of thetransaction performingthe dirtyread occurs before
thecommitorabortoperationofthetransactionwhichupdatedthevalue,thenthe schedule is
Department of CSE, NRCM Page 93
irrecoverable.
If the commit operation of the transaction performing the dirty read is delayed till the commit
or abort operation of the transaction which updated the value, then the schedule is recoverable.

7. IMPLEMENTATION OF ISOLATION
Isolation determines how transactions integrityis visible to otherusers and systems. It means
that a transaction should take place in a system in such a way that it is the only one transaction
that is accessing the resources in a database system.

Isolation level defines the degree to which a transaction must be isolated from the data
modifications made by any other transactions in the database system. The phenomena’s used to
define levels of isolation are:

a) Dirty Read
b) Non-repeatableRead
c) PhantomRead

Dirty Read: If a transaction reads a data value updated by an uncommitted transaction, then
this type of read is called as dirty read.

T1 T2
Read(A)
Write(A)
|
| Read(A) //DirtyRead
| Write(A)
| COMMIT
|
ROLLBACK

As T1 aborted, the results produced by T2 become wrong. This is because T2 read A (Dirty
Read) which is updated by T1.
Non-RepeatableRead:Non repeatableread occurs when atransaction read same data value
twice and get a different value each time. It happens when a transaction reads once before and
once after committed UPDATES from another transaction.

Department of CSE, NRCM Page 94


TableinDatabase
ReadA=10 T1 T2
Read(A)
A
WriteA=20
Write(A)
ReadA=20
Read(A)
First,T1readsdataitemAandgetA=10
Next, T2 writes data item A as A = 20
Last,T1readsdataitemAandgetA=20

Other example for Non-repeatable read:


Table:STUDENT_DATAbeforeT2 Table:STUDENT_DATAafter T2
A B C A B C
100 5 10 100 5 15
101 5 20 101 5 20
102 6 30 102 6 30

T1:SELECTSUM(C)FROMSTUDENT_DATAWHEREB=5; Answeris(10+20)=30
T2: UPDATE STUDENT_DATA SET C = 15 WHEREA=100; Answer,inFirstrowCchangesto15 T1:
SELECT SUM(C ) FROM STUDENT_DATA WHEREB=5; Answer is (15+20) = 35

Phantom reads: Phantom reads occurs when atransaction read same datavalue twiceand get a
different value each time. It happens when a transaction reads once before and once after
committed INSERTS and/or DELETES from another transaction.
Non-repeatableread Phantomread
WhenT1performsecondread,thereisno WhenT1performsecondread,thenoofrows
changeinnoof rowsinthegiven table eitherincreaseordecrease.
T2performUPDATEoperationonthe T2 perform INSERT and/or DELETE
giventable operationonthegiventable

ExampleforPhantomread:
Table:STUDENT_DATAbeforeT2 Table:STUDENT_DATAafter T2
A B C A B C
100 5 10 100 5 10
101 5 20 101 5 20
102 6 30 102 6 30
103 5 25

T1:SELECTSUM(C)FROMSTUDENT_DATAWHEREB=5; Answeris(10+20)=30
T2:INSERT INTOSTUDENT_DATAVALUES(103,5,25); Answer,inFirstrowCchangesto15
Department of CSE, NRCM Page 95
T1:SELECTSUM(C)FROMSTUDENT_DATAWHEREB=5; Answeris(10+20+25)=55

Basedonthesethreephenomena, SQLdefine fourisolation [Link]:


(1) Read uncommitted: This is the lowest level of [Link] this level, one transaction
mayreadthedataitemmodifiedbyothertransactionwhichisnotcommitted. Itmeandirty read is
allowed. In this level, transactions are not isolated from each other.

(2) Read Committed: This isolation level guarantees that any data read is committed at the
moment it is read. Thus, it does not allow dirty read. The transaction holds a read/write lock
on the data object, and thus prevents other transactions from reading, updating or deleting it.
(3) Repeatable Read: This is the most restrictive isolation level. The transaction holds read
locks on all rows it references and writes locks on all rows it inserts, updates, or deletes.
Since other transaction cannot read, update or delete these rows, consequently it avoids non-
repeatable read. So other transactions cannot read, update or delete these data items.
(4) Serializable: This is the highest isolation level. A serializable execution is guaranteed to
be a serial schedule. Serializable execution is defined to be an execution of operations in
which concurrently executing transactions appears to be serially executing.
The table given below clearly depicts the relationship between isolation levels and the read
phenomena and locks.
Isolation Level DirtyRead Non-repeatableread PhantomRead
ReadUncommitted Mayoccur Mayoccur Mayoccur
ReadCommitted Don’toccur Mayoccur Mayoccur
RepeatableRead Don’toccur Don’toccur Mayoccur
Serializable Don’toccur Don’toccur Don’toccur
Fromtheabovetable, itisclear thatserializableisolationlevel isbetterthan others.

8. CONCURRENCY CONTROL
 Concurrencyis theabilityof adatabasetoexecutemultiple transactions simultaneously.
 Concurrency controlisamechanismtomanagethesimultaneously executingmultiple
transactions such that no transaction interfere with other transaction.
 Executingmultipletransactionsconcurrentlyimprovesthesystem performance.
 Concurrencycontrolincreasesthethroughputand reduces waitingtimeoftransactions.
 If Concurrency Control is not done, then it may leads to problems like lost updates, dirty
read, non-repeatable read, phantom read etc. (Refer section7 for more details)
 Lost Updates: It occur when two transactions update same data item at thesame time. In
Department of CSE, NRCM Page 96
this the first write is lost and only the second write is visible.
ConcurrencycontrolProtocols:
Theconcurrencycan be controlled with thehelpofthe followingProtocols
(1) Lock-BasedProtocol
(2) Timestamp-BasedProtocol
(3) Validation-BasedProtocol
9. LOCK-BASED PROTOCOL
 Lock assures that one transaction should not retrieve or update a record which another
transaction is updating.
 For example, traffic at junction, there aresignals which indicate stop and go. When one side
signal is green (vehicles allowed passing), then other side signals are red (locked. Vehicles
not allowed passing). Similarly, in database transaction when one transaction operations are
under execution, the other transactions are locked.
 If at a junction, green signal is given to more than one side, then there may be chances of
accidents. Similarly, in database transactions, if the lockingis not done properly, then it will
display the inconsistent and corrupt data.
Therearetwolock modes: (1).SharedLock (2).ExclusiveLock
[Link],thenTi
can only read A but not write into A. Shared lock is requested using lock-S [Link]
Locksare represented by X. If a transaction Tiapply exclusive lock on data item
A,[Link]-X instruction.

LockCompatibilityMatrix:
 LockCompatibilityMatrixcontrolswhethermultipletransactionscanacquirelockson the
same resource at the same time.
TransactionTiapplied
Shared Exclusive
TransactionTj Shared √ X
request for Exclusive X X
 If a transaction Tiapplied shared lock on data item A, then Tjcan also be allowed to apply
shared lock on A.
 If a transaction Tiapplied shared lock on data item A, then Tjis not allowed to apply
exclusive lock on A.
 If a transaction Tiapplied exclusive lock on data item A, then Tjis not allowed to apply
shared lock on A.
 If a transaction Tiapplied exclusive lock on data item A, then Tjis not allowed to apply
Department of CSE, NRCM Page 97
exclusive lock on it.
 Any number of transactions can hold shared locks on a data item, but if any transaction
holdsanexclusivelockonadata item,thenothertransactions arenot allowedtoholdany lock on
that data [Link] a transaction wants to read a data item, it should applyshared
lock and when a transaction wants to write it should apply exclusive lock. If the lock is
not applied, then the transaction is not allowed to perform the operation.

There are four types of lock protocols [Link] are:

(1) Simplistic lockprotocol

 Itisthesimplestlockingprotocol.
 Itconsiderseachread/writeoperationofatransactionasindividual.
 Itallowstransactionstoperformwrite/readoperationonadataitemonlyafterobtaining a lock
on that data item.
 Transactionsunlock thedataitem immediatelyaftercompletingthewrite/read operation.
 When atransaction needs to perform manyread and writeoperations, foreach operation
lockisappliedbeforeperformingitandreleasethelockimmediatelyaftercompletionof the
operation.
(2) Pre-claiming LockProtocol
 Inpre-claimingLockProtocol,foreachtransactionalistispreparedconsistingofthe data items
and type of lock required on each of the data item.
 Beforeinitiatinganexecutionofthetransaction,itrequestsDBMStoissuealltherequired locks
as per the list.
 Ifallthelocksare [Link] transaction
is completed then it releases all the lock.
 If all the locks are not granted then this protocol allows the transaction to rolls back and
waits until all the locks are granted.
Nooflocks

Beginof Endof
Transaction Transaction

Department of CSE, NRCM Page 98


(3) Two-phase locking(2PL)protocol
 Everytransaction execution starts byacquiringfew locks or zero locks. Duringexecution it
acquire all other required locks one after the other.

 When a transaction releases any of the acquired locks then it cannot acquire any more
newlocks. But,itcanonlyreleasetheacquiredlocksoneaftertheotherduringremaining
execution of that transaction.
Nooflocks

GrowingPhase ShrinkingPhase

Beginof Endof
Transaction Transaction

TheTwoPhaseLocking(2PL)hastwo [Link]:
Growing phase:Inthegrowingphase,anewlockonthedataitemmaybeacquiredbythe transaction, but
none can be released.(Only get new locks but no release of locks).

Shrinking phase:Intheshrinkingphase,existinglockheldbythetransactionmaybereleased, but no


new locks can be acquired. (Only release locks but no more getting new locks).

Example:
Time T1 T2
0 LOCK-S(A)
1 LOCK-S(A)
2 Read(A)
3 Read(A)
4 LOCK-X(B)
5 --
6 Read(B)
7 B=B +100
8 Write(B)
9 UNLOCK(A)
10 LOCK-X(C)
11 UNLOCK(B) --
12 Read(C)
13 C=C+500
14 Write(C)
15 COMMIT
16 UNLOCK(A)
17 UNLOCK(C)
18 COMMIT

Thefollowingwayshowshow unlockingandlockingworkwith2-PL.
Department of CSE, NRCM Page 99
TransactionT1: TransactionT2:
 Growingphase:fromstep1-5 (Afterfirstlockonwards)  Growingphase:fromstep2-11 (Afterfirstlockonwards)
 Shrinkingphase:fromstep10-12(Afterfirstunlockonwards)  Shrinkingphase:fromstep17-18(Afterfirstunlockonwards)
 Lockpoint:at5 (Nomorenewlocks)  Lockpoint:at11 (Nomorenewlocks)

Department of CSE, NRCM Page 100


(4) Strict Two-phase locking(Strict-2PL)protocol
 The firstphase of Strict-2PL issimilar [Link] the firstphase,after acquiring allthe locks,
the transaction continues to execute normally.
 The onlydifference between 2PL and strict 2PL is that Strict-2PL does not release a lock
after using it.
 Strict-2PLwaitsuntilthewholetransactionto commit,andthenitreleasesallthelocksat a time.
 Strict-2PLprotocol doesnot haveshrinkingphaseoflockrelease.
Nooflocks

GrowingPhase

BeginofTra Endof
nsaction Transaction

Strict-2PLdoes not havecascadingabort as 2PLdoes.

10. TIME STAMP BASED PROTOCOL


 Atimestampisissued toeachtransactionwhen itentersintothe system.
 Ituseseithersystemtimeorlogicalcounter asatimestamp.
 Itis most commonlyused concurrencyprotocol.
 Thetimestampoftransaction Tis denotedas TS(T).
 The system order the transactions based on their arrival time. For example, let the arrival
timesoftransactionsT1,T2andT3be9:00AM,9:01AMand9:[Link] TS(T1)
< TS(T2) < TS(T3).(9:00AM < 9:01AM < 9:02AM)
 Byusingtimestamp, thesystem preparesthe serializabilityorder.i.e., T1→T2→T3
 Theread timestampofdata item Xis denotedbyR–timestamp(X).
 R–timestamp(X):Itisthe time stamp of theyoungesttransaction that performedread
operation on X.
T1:Read(X)
X T2: Read(X) R-Timestamp(X)=TS(T3)

T3:Read(X)

Department of CSE, NRCM Page 101


 Thewritetimestampofdata item X is denoted byW–timestamp(X).
 W–timestamp(X): It is the time stamp of the youngest transaction that performed write
operation on X.
T1:Write(X)

X R-Timestamp(X)=TS(T2)
T2:Write(X)

There aremainlytwo Timestamp OrderingAlgorithmsin DBMS. Theyare:


 BasicTimestampOrdering
 ThomasWrite rule

(1). BasicTimestampOrdering

 CheckthefollowingconditionwheneveratransactionTiissuesaRead(X)operation:
o IfW_timestamp(X)>TS(Ti)thentheoperationisrejected.
o IfW_timestamp(X)<= TS(Ti)thentheoperationisexecuted.
(ReadisnotallowedbyTi, ifanyyoungertransactionsthanTiwriteX)

 Checkthefollowingconditionwhenever atransaction Tiissues aWrite(X)operation:


o IfTS(Ti)<R_timestamp(X)thentheoperationisrejected. (WriteisnotallowedbyTi,if any
younger transactions than Ti read X)

o If TS(Ti) < W_ timestamp(X) then the operation is rejected and Ti is rolled back
otherwisetheoperationisexecuted.(WriteisnotallowedbyTi,ifanyyoungertransactionsthan Ti write
X and also Ti should be rolled back and restarted later)

(2) Thomas'sWriteRule
ThomasWriteRuleisatimestamp-basedconcurrencycontrolprotocolwhichignores outdated writes.
It follows the following steps:

(i). IfR_TS(X)>TS(Ta),thenabortandrollbackTaandrejectthe operation.


Transaction:T1 Transaction:T2 VariableA
Arrival=9:00 AM Arrival=9:02AM InitialA=100
TS(T1)=9:00 AM TS(T1)=9:02AM
| |
| Read(A)(A=100) A=100 (R_TS(A)=9:02AM)
| | :
Write(A) (A=200) | (A=100) A=200100

RejectandRollbackT1

Department of CSE, NRCM Page 102


(ii). If W_TS(X)>TS(Ta), thendon’texecutetheWriteOperation ofTabut continueTa
[Link] OutdatedorObsoleteWrites.
Transaction:T1 Transaction:T2 VariableA
Arrival=9:00 AM Arrival=9:02AM InitialA=100
TS(T1)=9:00 AM TS(T1)=9:02AM
| |
| Write(A)(A=400) A=400 (W_TS(A)=9:02AM)
| | :
Write(A) (A=500) | (A=400) A=500 (Outdatedwrite)
|
RejectbutcontinueT1 |
| |

(iii). Iftheconditionin(i)or(ii)isnotsatisfied,thenexecuteWrite(X)ofTaandset W_TS(X) to


TS(Ta).
Outdatedwritesarerejectedbutthetransactioniscontinuedin ThomasWriteRulebutinBasic TO
protocol will reject write operation and terminate such a Transaction.

11. VALIDATION BASED PROTOCOL


In this technique, no concurrency control checking is done while the transaction is under
execution. After transaction execution is completed, then only whether concurrency violated or
not is [Link] is based on timestamp based protocol. Validation Based Protocol has three
phases:
1. Readphase: Inthisphase,thetransaction Tareadthevalueofvariousdata items thatare
required by Taand stores them in temporary local variables. It can perform all the write
operations on temporary variables without an update to the actual database.
1. Validation phase: After Transaction Taexecution completed, Taperform a validation test
to determine whether it can copy the temporary local variable values to actual database
without causing a violation of serializability.
2. Write phase: If the validation of the transaction is successful (valid), then the temporary
results are written to the database. Otherwise the temporary local variable values of Tais
ignored and Tais rolled back.

To perform the validation test, we need to know when the various phases of transaction Tatook
place. We shall therefore associate three different timestamps with transaction Ta.
(i). Start(Ta): thetime whenTa,started its execution.
(ii). Validation (Ta): the time when Tafinished its execution and started its
validation phase.
Department of CSE, NRCM Page 103
(iii). Finish (Ta):thetime whenTafinished itswrite phase.
Theserializabilityorderisdetermined bychangingthetimestampof Tas TS(T)=Validation(T).
Hence the serializability is determined at the validation process and cannot be decided in
advance. Therefore it ensures greater degree of concurrency while executing the transactions.

12. MULTIPLE GRANULARITY


Thesizeofdataitemsis oftencalledthe [Link] levels in
the DBMS. They are:
 Database
 Table
 Record/row
 Cell/field value

TableA TableB TableD


RecordR2
RecordR4
TableE
TableC
Cell/FieldValue

Cell/FieldValue

A database contains multiple tables. Each table contains multiple records. Each record contains
multiple field values. It is shown in the above figure. For example, consider Table D and Record
R2. These two are not mutually exclusive. R2 is a part of D. So granularity means differentlevels
ofdatawhere as smallerlevels arenestedinsidethehigherlevels. Insidedatabase wehave tables.
Inside table we have records. Inside record we have field values. This can be represented with a
tree as shown below.

DB

A B C D Tables

... .. . ... . .. Records


r1 r2 r3 r4

d1 d2 d3 d4 . .. Data values
... .. . ...

Department of CSE, NRCM Page 104


A lock can be applied at a node, if and only if there does not exist any locks on the decedents
(childs and grand childs) of that node. Otherwise lock cannot be applied. If lock is applied on
table A, it implies that the lack is also applicable to sub-tree from node A. If lock is applied on
database (at root node), it implies the lack is also applicable to all the nodes in the tree.

The larger the object size on which lock is applied, the lower the degree of concurrency
permitted. Onthe other hand, thesmallerthe object sizeon which lockis applied, thesystem has to
maintain larger number of locks. More locks cause a higher overhead and needs more disk space.
So, what is the best object size on which lock can be applied? It depends on the types of
transactions involved. If a typical transaction accesses data values from a record, it is
advantageous to have the lock to that one record. On the other hand, if a transaction typically
accesses many records in the same table, it may be better to have lock at that table.

Locking at higher levels needs lock details at lower levels. This information is provided by
additional types of locks called intention locks. The idea behind intention locks is for a
transaction to indicate, along the path from the root to the desired node, what type of lock(shared
or exclusive) it will require from one of the node’s descendants. There are three types of
intention locks:
(1) Intention-shared(IS):Itindicatesthatoneormoresharedlockswillberequestedon some
descendant node(s).
(2) Intention-exclusive (IX): It indicates that one or more exclusive locks will be requested
on some descendant node(s).
(3) Shared-intention-exclusive (SIX): It indicates that the current node is locked in shared
modebutthatoneormoreexclusivelockswillberequestedonsomedescendantnode(s).
The compatibility table of the three intention locks, the shared and exclusive locks, is shown in
Figure.
Mode IS IX S SIX X
IS Yes Yes Yes Yes No
IX Yes Yes No No No
S Yes No Yes No No
SIX Yes No No No No
X No No No No No

It uses the intention lock modes to ensure serializability. It requires that if a transaction attempts
to lock a node, then that node must follow these protocols:
Department of CSE, NRCM Page 105
 TransactionT1shouldfollowthelock-compatibilitymatrix.
 TransactionT1 firstlylockstheroot [Link] lockit in anymode.
 IfT1currentlyhastheparentofthenodelockedineitherIXorISmode,thenthe transaction T1
will lock a node in S or IS mode only.
 IfT1currentlyhastheparentofthenodelockedineitherIXorSIXmodes,thenthe transaction T1
will lock a node in X, SIX, or IX mode only.
 IfT1hasnotpreviouslyunlockedanynodeonly,thentheTransactionT1canlocka node.
 If T1currentlyhasnoneof thechildrenofthenode-lockedonly,thenTransactionT1will unlock
a node.
Note:Inmultiple-granularity,thelocksareacquiredintop-downorder,andlocksmustbe released in
bottom-up order.

13. RECOVERY AND ATOMICITY


Databaseneedstoberecovered, whenthefollowingfailuresoccur.
(1) Transactionfailure

(2) System crash

(3) Disk failure

 Transaction failure: During transaction execution, if it cannot proceed further, then it needs
to abort. This is known as transaction failure. A single transaction failure may influencemany
transactions or processes. The reasons for transaction failure are:
 Logicalerrors:Itoccurs duetosomecodeerrororaninternalconditionerror.
 Systemerror: Itoccurs whentheDBMSitselfterminatesanactivetransactiondueto
deadlock or resource unavailability.
 System crash: The system may crash due to the external factors such as interruptions in
power supply, hardware or software failure. Example:Operating System errors.
 Disk failure: In early days of technology evolution, hard-disk drives or storage drives usedto
fail frequently. Disk failure occurs due to the formation of bad sectors, disk head crash,un-
reachable to the disk or any other failure which destroys all or part of disk storage.

When a system crashes, it may have many transactions being executed and many files may be
opened for them. When a DBMS recovers from a crash, it must maintain the following:
 Itmustcheck thestatesofall thetransactionsthat werebeing executed.

Department of CSE, NRCM Page 106


 Few transactions may be within the middle of some operation; the DBMS should
makesure the atomicity of the transaction during this case.
 Itmustcheck foreach transactionwhetheritsexecutionacceptedorto berolled back.
 Notransactionisallowed tobein aninconsistentstate.

The following techniques facilitate a DBMS in recovering as well as maintaining the


atomicityof a transaction:
 Logbased recovery
 Check point
 Shadowpaging

14. LOGBASED RECOVERY


The log file contains information about the start and end of each transaction and any updates
done by the transaction on database items. The log file is saved onto some stable storage so that if
any failure occurs, then it can be used to recover the database. The results of all the operation of
transaction are first saved in the log and latter updated on the database. The log information is used to
recover from system failures.

[Link].
 WhenatransactionTistartsexecution, thelogstores:<Ti, Start>
 WhenatransactionTimodifiesanitemXfromoldvalueV1tonewvalueV2,thelog stores:<Ti,
X, V1, V2>
 WhenthetransactionTiexecutioncompleted, thelogstores: <Ti,commit>
 WhenthetransactionTiexecution aborted,the logstores:<Ti,abort>
RecoveryusingLogrecords
Whenthesystemiscrashed,thentheDBMSchecksthelogtofindwhichtransactionsneedsto be undo
and which need to be redo. There are two major techniques for recovery from non- catastrophic
transaction failures. They are deferred updates and immediate updates.
i. Deferred database modification:In this technique, all the changes done by the
transaction are saved in the system log without modifying the actual [Link] the
transaction committed, then onlythe changes are updated in the database. If a transaction
failsbeforereachingitscommitpoint,ithasnotchangedthedatabaseinanywayso

Department of CSE, NRCM Page 107


UNDO is not needed. It may be necessary to REDO the effect of the operations that are
recorded in the system log, because their effect not yet written in the database.
ii. Immediate database modification: In this technique, the database is modified
immediately after every operation. However, these operations are recorded in the log file
before they are applied to the database, making recovery still possible. If a transaction
fails to reach its commit point, the effect of its operation must be undone i.e. the
transaction must be rolled back hence we require both undo and redo.

15. CHECKPOINT–(Recovery with Concurrent Transactions)


 Inordertorecoverdatabasefromsystemcrashes, allthetransactionoperationsarefirstsaved in the
log file and latter updated on the database. The log file is saved in remote location so that it
can be used to recover the database. As time passes, the entries in the log file maygrow too
[Link] the time of recovery, searching the entire log file is very time consuming and an
inefficient method. To ease this situation, the concept of 'checkpoint' is introduced.
 Checkpoint is a mechanism where all the previous log entries are removed from the log file
and their results are updated in the database. The checkpoint is like a bookmark.
 During the execution of the transactions, after executing few operations, a check point is
created and saved in the log file. Now the log file contains only entries after checkpoint
related to new step of transaction till next checkpoint and so on.
 Thecheckpoint is used to declare apoint before which theDBMS was in theconsistent state,
and all transactions were committed.
RecoveryusingCheckpoint
Inthe followingmanner, arecoverysystemrecovers thedatabasefromthisfailure:
Checkpoint Failure

T1

T2
T3
T4
Time
 Therecoverysystem reads thelogs backwards from theend to thelast checkpoint.
 Itmaintainstwolists,anundo-listandaredo-list.

Department of CSE, NRCM Page 108


 If the recovery system sees a log with <Ti, Start> and <Ti, Commit> or just <Ti,
Commit>, it puts the transaction Tiin the redo-list.
Forexample: Inthelogfile,transactionT1haveonly<Ti,commit>andthetransactions T2 and
T3 have <Ti, Start> and <Ti, Commit>. Therefore T1, T2 and T3 transaction are added to
the redo list.
 If the recovery system finds a log with <Ti, Start> but no commit or abort, then it putsthe
transaction Tiin undo-list.
For example: Transaction T4 will have <Ti, Start>. So T4 will be put into undo listsince
this transaction is not yet complete and failed in the middle.
 Allthetransactionsin theundo-list arethen undoneand theirlogs areremoved.
 All the transactions in the redo-list and their previous logs are removed and then redone
before saving their logs.

16. ARIES ALGORITHM (Algorithm for Recovery and Isolation Exploiting Semantics)
AlgorithmforRecoveryandIsolationExploitingSemantics(ARIES)isoneofthelogbased recovery
method. It uses the Write Ahead Log (WAL) protocol.

Write-ahead logging (WAL): In computer science, write-ahead logging (WAL) is a


family of techniques for providing atomicity and durability (two of the ACID properties) in
database systems. The change done by the transactions are first recorded in the log file and
written to stable storage at remote location, before the changes are written to the database.
Therecoveryprocess ofARIES algorithmhas 3 phases. Theyare:
(1) Analysisphase
(2) RedoPhase
(3) Undo Phase

Startofoldestin-progress SmallestLSNassociated
transaction with dirty page Lastcheckpoint EndofLog Log Time

AnalysisPhase
Redo Phase

Undophase

Department of CSE, NRCM Page 109


(1) Analysis phase: The recovery subsystem scans the log file forward from the last checkpoint
up to the end. The purpose of the scan is to obtain information about the following:
 Thestartingpoint fromwheretheredopassshould start.
 Thelistof transactionstoberolled backin theundo pass.
 Thelistof dirtypages.
(2) Redo: In this phase, the log file is read forward starting from smallest LSN of a dirtypage to
the end and each update found in the log file is redone. The purpose of this redo pass is to
repeat the historyto reconstruct the database to the state present at the time of system failure.
(3) Undo:[Link] ‘loser
transaction’ updates are rolled back in reverse chronological order. If any of the aborted
transaction operations are undone, then skip them, no need to undo once again.

17. DATABASE BACKUP


The process of creating duplicate copy of database is called database backup. Backup
helps to recover against failure of media, hardware or software failures or any other kind of
failures that cause a serious data crash.

Database copy is created and stored in the remote area with the help of network. This
database is periodically updated with the current database so that it will be in sync with data and
other details. This remote database can be updated manually called offline backup. It can be
backed up online where the data is updated at current and remote database simultaneously. Inthis
case, as soon as there is a failure of current database, system automatically switches to the
remote database and starts functioning. The user will not know that there was a failure.

Primarysite Network Backup

Department of CSE, NRCM Page 110


Someofthe backuptechniques areas follows:

 Full backup or Normal backup: Full backup is also known as Normal backup. In this,
an exact duplicate copy of the original database is created and stored every time the
backup made. The advantage of this type of backup is that restoring the lost data is very
fast as compared to other. The disadvantage of this method is that it takes more time to
backup.
 Incremental Backup:Instead of backup entire database every time, backup only the
files that have been updated since the last full [Link] this at least weekly once
normal backup has to be done. While incremental database backups do run faster, the
recovery process is a bit more complicated.
 Differential backup: Differential is similar to incremental backup but the difference is
that the recovery process is simplified by not clear the archive bit. So a file that isupdated
after a normal backup will be archived every time a differential backup is run until the
next normal backup runs and clears the archive bit.

Department of CSE, NRCM Page 111


UNIT– V
Data on External Storage, File Organization and Indexing, Cluster Indexes, Primary and
Secondary Indexes, Index data Structures, Hash Based Indexing, Tree base Indexing,
Comparison of File Organizations, Indexes and Performance Tuning, Intuitions for tree Indexes,
Indexed Sequential Access Methods (ISAM), B+ Trees: A Dynamic Index Structure.

1. DATA ON EXTERNAL STORAGE


Primarymemoryhas limitedstoragecapacityand is volatile. Toovercomethis limitation,
secondary memory is also termed as external storage devices are used. External storage devices
such as disks and tapes are used to store data permanently.

The Secondary storage devices can be fixed or removable. Fixed Storage device is an
internal storage device like hard disk that is fixed inside the computer. Storage devices that are
portable and can be taken outside the computer are termed as removable storage devices such as
CD, DVD, external hard disk, etc.

Magnetic/optical Disk:It supports random and sequential [Link] takes less access time.
Magnetic Tapes:It supports only sequential [Link] takes more access time.
In DBMS, processing a query and getting output need accessing random pages. So, disks
are preferable than magnetic tapes.

2. FILE ORGANIZATION
The database is stored as a collection of files. Each file contains a set of records. Each record
is a collection of fields. For example, a student table (or file) contains many records and each
record belongs to one student with fields (attributes) such as Name, Date of birth, class,
department, address, etc.

File organization defines how file records are mapped onto disk blocks.

The records of a file are stored in the disk blocks because a block is the unit of data transfer
between disk and memory. When the block size is larger than the record size, each block will
contain more than one record. Sometimes, some of the files may have large records that cannot
fit in one [Link] this case,we can store part of a record on one block and the rest on another. A
Department of CSE, NRCM Page 112
pointer at the end of the first block points to the block containing the remainder of the record.

The different types of file organization are given below:


HeapFile SequentialFile
Organization Organization

File
Organization

Hash File ClusteredFile


Organization Organization

Heap File Organization: When a file is created using Heap File Organization mechanism, the
records are stored in the file in the order in which they are inserted. So the new records are
inserted at the end of the file. In this type of organization inserting new records is more efficient.
It uses linear search to search records.

Sequential File Organization: When a file is created using Sequential File Organization
mechanism, all the records are ordered (sorted) as per the primary key value and placed in the
file. In this type of organization inserting new records is more difficult because the records need
to be sorted after inserting every new [Link] uses binary search to search records.

Hash File Organization: When a file is created using Hash File Organization mechanism, ahash
function is applied on some field of the records to calculate hash value. Based on the hash value,
the corresponding record is placed in the file.

Clustered File Organization: Clustered file organization is not considered good for large
databases. In this mechanism, related records from one or more relations are kept in a same disk
block, that is, the ordering of records is not based on primary key or search key.

3. INDEXING
If the records in the file are in sorted order, then searching will become very fast. But, in
most of the cases, records are placed in the file in the order in which they are inserted, so new
records are inserted at the end of the file. It indicates, the records are not in sorted order. In
order to make searching faster in the files with unsorted records, indexing is used.

Indexing is a data structure technique which allows you to quickly retrieve records from a
database file. An Index is a small table having only two columns. The first column contains a
copy of the primary or candidate key of a table. The second column contains a set of disk block
addresses where the record with that specific key value stored.
Department of CSE, NRCM Page 113
Indexing in DBMScan be of the following types:

Indexing

PrimaryIndexing SecondaryIndexing ClusteringIndexing

Dense Indexing SparseIndexing

i. Primary Index
 If the index is created by using the primary key of the table, then it is known as primary
indexing.
 As primary keys are unique and are stored ina sorted manner,the performance of the
searching operation is quite efficient.
 The primary index can be classified into two types: dense index and sparse index.

Dense index
 If every record in the table has one index entry in the index table, then it is called
denseindex.
 In this, the number of records (rows) in the index table is same as the number of records
(rows) in the main table.
 Aseveryrecord hasoneindexentry, searchingbecomes faster.

TS TS Hyderabad KCR
AP AP Amaravathi Jagan
TN TN Madras PalaniSwamy
MH MH Bombay Thackray

Sparse index
 Ifonlyfewrecordsinthetablehaveindexentriesintheindextable,thenitiscalled sparse index.
 In this, the number of records (rows) in the index table is less than the number of records
(rows) in the main table.
 Asnotalltherecordhaveindex entries,searchingbecomes slowforrecordsthatdoesnot have
index entries.

Department of CSE, NRCM Page 114


TS TS Hyderabad KCR
TN AP Amaravathi Jagan
MH TN Madras PalaniSwamy
MH Bombay Thackray

ii. Secondary Index


When the sizeofthe main table grows, then sizeofindex tablealso grows. If theindex tablesize
grows then fetching the address itself becomes slower. To overcome this problem, secondary
indexing is introduced.

 Insecondaryindexing, toreducethesizeofmapping, another levelof indexingis introduced.


 Itcontainstwolevels. Inthefirstleveleachrecordinthemaintablehasoneentryinthefirst- level
index table.
 The index entries in the fisrt level index table are divided into different groups. For each
group, one index entry is created and added in the second level index table.

Multi-levelIndex:Whenthemaintablesizebecomestoolarge,creatingsecondarylevelindex
improves the search [Link] if the search process is slow; we can add one more level of
indexing and so on. This type of indexing is called multi-level index.

Department of CSE, NRCM Page 115


iii. Clustering Index
 Sometimestheindexiscreatedonnon-primarykeycolumnswhichmaynotbeunique for
each record.
 Inthiscase,toidentifytherecordfaster,wewillgrouptwoormorecolumnstogetthe unique
value and create index out of them. This method is called a clustering index.
 Therecordswhichhavesimilarcharacteristicsaregrouped,andindexesare createdfor these
group.

Example:Consider acollegecontainsmanystudentsineachdepartment. Allthestudentsbelong to the


same Dept_ID are grouped together and treated as a single cluster. One index pointers point to
the one cluster as a whole. The idex pointer points to the first record in each [Link]
Dept_ID is a non-unique key.

IndexFile Recordsof tableinmemory


CSE 501 Ajay BCD
ECE 502 Ramya BCA
EEE … … …
… 560 Fathima BCE
401 VijayReddy OC
… … …
460 Mallesh ST
201 Jhon SC
… … …
260 Sowmya BCC
… … …
… … …

In above diagram we can see that, indexes are created for each department in the index
file. In the data block, the students of each department are grouped together to form the cluster.
The address in the index file points to the beginning of each cluster.

4. HASHBASED INDEXING
Hashing is a technique to directly search the location of desired data on the disk without
using index structure. Hash function is a function which takes a piece of data ( key) as input and
produces a hash valueas output which maps the data to a particular location in the hash table.

Department of CSE, NRCM Page 116


Theconcept of hashingandhash table is shownin thebelow figure

There aremainlytwo typesof hashing methods:


i. Static Hashing
ii. DynamicHashing
 Extendedhashing
 Linearhashing

5. STATIC HASHING
Instatichashing,[Link] example
consider the hash function
f(x)= x mod 7
Foranyvalueofx, theabovefunctionproducesoneof thehashvaluefrom{0,1,2, 3,4, 5,6}. It means
static hashing maps search-key values to a fixed set of bucket addresses.

Example:Inserting10,21,16and12inhash table.

HashValue DataRecord
f(10)=10 mod7=3 0 21*
f(21)=21 mod7=0 1
f(16)=16mod 7=2 2 16*
f(12)=12 mod7=5 3 10*
4
5 12*
6

Figure5.1:Statichashing

Department of CSE, NRCM Page 117


Suppose, latter if we want to insert 23, it produce hash value as 2 ( 23 mod 7 = 2 ). But, in the
above hash table, the location with hash value 2 is not empty (it contains 16*). So, a collision
occurs. To resolve this collision, the following techniques are used.
o Open addressing
o SeparateChainingorClosed addressing

i. Open Addressing:

Openaddressingisacollisionresolvingtechniquewhichstoresallthekeysinsidethe hash table. No


key is stored outside the hash table. Techniques used for open addressing are:

o LinearProbing
o QuadraticProbing
o DoubleHashing

 LinearProbing:

In linear probing, when there is a collision, we scan forwards for the next empty slot to
fill the key’s record. If you reach last slot, then start from beginning.

Example: Consider figure 1. When we try to insert 23, its hash value is 2. But the slotwith
2 is not empty. Then move to next slot (hash value 3), even it is also full, then move once
again to next slot with hash value 4. As it is empty store 23 there. This is shown in the
below diagram.

HashValue DataRecord

0 21*

f(23)=23 mod7=2 2 16*

3 10*

4 23*

5 12*

Figure5.2:LinearProbing

Department of CSE, NRCM Page 118


 QuadraticProbing:

In quadratic probing, when collision occurs, it compute new hash value by taking the
original hash value and adding successive values of quadratic polynomial until an open
slot is found. If here is a collision, it use the following hash function: h(x) = ( f(x) + i2 )
mod n ,where I = 1, 2, 3, 4,….. and f(x) is initial hash value.

Example: Consider figure 1. When we try to insert 23, its hash value is 2. But the slot
with hash value 2 is not empty. Then compute new hash value as (2 +1 2) mod 7 =3, even
it is also full, and then once again compute new hash value as (2 +2 2) mod 7 = 6. As it is
empty store 23 there. This is shown in the below diagram.

Hash
DataRecord
Value

0 21*

f(23)=23 mod7=2 2 16*

3 10*

5 12*

6 23*

Figure5.3:Quadratic Probing

 DoubleHashing

In double hashing, there are two hash functions. The second hash function is used to
provide an offset value in case the first function causes a collision. The followingfunction
is an example of double hashing: (firstHash(key) + i * secondHash(key)) % tableSize.
Use i = 1, 2, 3, …

Apopular secondhash functionis : secondHash(key)=PRIME–(key%PRIME)


wherePRIMEisaprimesmallerthantheTABLE_SIZE.

Department of CSE, NRCM Page 119


Example:[Link] toinsert23,[Link] with hash
value 2 is not empty. Then compute double hashing value as
secondHash (key)=PRIME–(key%PRIME)→ secondHash(23)=5–(23 %5)=2
Doublehashing:(firstHash(key)+i*secondHash(key))%tableSize→ (2+1*2))%7 =4
Astheslotwithhashvalue4isempty,[Link] diagram.
HashValue DataRecord
0 21*
1
f(23)=23 mod7=2 2 16*
3 10*
4 23*
5 12*
6

Figure5.4:Double Probing

ii. Separate Chaining or Closed addressing:


To handle the collision, This technique creates a linked list to the slot for which collision
occurs. The new key is then inserted in the linked list. These linked lists to the slots appear like
chains. So, this technique is called as separate chaining. It is also called as closed addressing.

Example:Inserting10, 21,16, 12, 23,19, 28, 30in hash table.

HashValue DataRecord
f(10)=10 mod7=3 0 21*

f(21)=21 mod7=0 1

f(16)=16mod 7=2 2 16*


23* 30*
f(12)=12 mod7=5 3 10*

f(23)=23 mod7=2 4

f(19)=19 mod7=5 5 12* 19*


f(30)=30 mod7=2 6

Figure5.5:Separatechainingexample

Department of CSE, NRCM Page 120


6. DYNAMIC HASHING
The problem with static hashing is that it does not expand or shrink dynamically as the
size of the database grows or shrinks. Dynamic hashing provides a mechanism in which data
buckets are added and removed dynamically and on-demand. Dynamic hashing can be
implemented using two techniques. They are:

o Extendedhashing
o LinearHashing

i. Extendable hashing
In extendable hashing, a separate directory of pointers to buckets is used. The number
bitsusedindirectoryiscalledglobaldepth(gd)andnumberentriesindirectory=2 [Link] bits used
for locating the record in the buckets is called local depth(ld) and each bucket canstores up to 2ld
entries. The hash function use last few binarybits of the keyto find the bucket. If
abucketoverflows,itsplits,andiflocaldepth greaterthanglobaldepth,thenthetabledoublesin size. It is
one form of dynamic hashing.
Example: Let global depth (gd) = 2. It means the directory contains four entries. Let the local
depth (ld) of each bucket = 2. It means each bucket need two bits to perform search operation.
Let each Bucket capacity is four. Let us insert 21, 15, 28, 17, 16, 13, 19, 12, 10, 24, 25 and 11.
21 = 10101 19 = 10011
15 = 01111 12 = 01100
28 = 11100 10 = 01010
17 = 10001 24 = 11000
16 = 10000 25 = 11101
13 = 01101 11 = 01011

Local depth 2
Globaldepth 28* 16* 12* 24* Bucket A
2 2
00 21* 17* 25* 13* Bucket B
01
2
10 BucketC
11 10*
Directory 2
15* 19* 11* BucketD
Figure6.1:Extendible hashing example
Now, let us consider insertion of data entry 20* (binary 10100). Looking at directory
element00,weareledtobucketA,[Link],wehavetosplitthebucketby

Department of CSE, NRCM Page 121


allocating a new bucket and redistributing the contents (including the new entry to be inserted)
across the old bucket and its 'split image (new bucket). To redistribute entries across the old
bucket and its split image, we consider the last three bits; the last two bits are 00, indicating a
data entrythat belongs to one of these two buckets, and the third bit discriminates between these
buckets. That is if a key’s last three bits are 000, then it belongs to bucket A and if the last three
bits are 100, then the key belongs to Bucket [Link] we are using three bits for A and A2, so the
local depth of these buckets becomes 3. This is illustrated in the below Figure 6.2.

Local depth 3
16* 24* BucketA
Globaldepth 3
28* 12* 20* Bucket A2
2
00 2
01 21* 17* 25* 13* Bucket B
10
2
11 BucketC
10*
Directory 2
BucketD
15*19* 11*
Figure6.2:Afterinserting20andsplittingBucket A

Aftersplit, BucketAandA2havelocaldepthgreaterthanglobaldepth(3>2),so double the


directory and use three bits as global depth. As Bucket A and A2 has local depth 3, so they have
separate pointers from the directory. But, Buckets B, C and D use only local depth 2, so they
have two pointers each. This is shown in the below diagram.

Local depth 3
16* 24* BucketA
Globaldepth 3
28* 12* 20* BucketA2
3
000
001 2 BucketB
010 21* 17* 25* 13*
011
100 2 BucketC
101 10*
110
111 BucketD
2
Directory 15*19* 11*

Figure6.3: After inserting 20and doubling the directory

Department of CSE, NRCM Page 122


An important point that arises is whether splitting a bucket necessitates a directory
doubling. Consider our example, as shown in Figure 6.3. If we now insert 9* (01001), it belongs
inbucketB;[Link] dealwiththissituationbysplittingthebucket and using
directory elements 001 and 101to point to the bucket and its split image. This is shown in the
below diagram. As Bucket B and its split image now have local depth three and it is not
greaterthan global depth. Hence, abucket split does not necessarilyrequire adirectorydoubling.
However, if either bucket A or A2 grows full and an insert then forces a bucket split, we are
forced to double the directory once again.
Local depth 3
16* 24* BucketA
Globaldepth 3
28* 12* 20* BucketA2
3
000
3
001 BucketB
010 17* 9*
011 3
100 BucketB2
21* 25* 13*
101
110 2
BucketC
111 10*

2
Directory BucketD
15* 19* 11*
Figure6.4:Afterinserting 9

KeyObservations:
 ABucketwillhavemorethanonepointerspointingtoitifitslocaldepthislessthan the global
depth.
 Whenoverflowconditionoccursinabucket,alltheentriesinthebucketarerehashed with a
new local depth.
 IfnewLocalDepthoftheoverflowingbucketisequaltotheglobaldepth,onlythenthe
directories are doubled and the global depth is incremented by 1.
 Thesizeof abucketcannot bechangedafterthedata insertionprocess begins.

ii. LinearHashing
Linear hashing is a dynamic hashing technique that linearly grows or shrinks number of
buckets in a hash file without a directory as used in Extendible Hashing. It uses a family of hash
functions instead of single hash function.

Department of CSE, NRCM Page 123


This scheme utilizes a family of hash functions h0 , h1,h2, ... , with the property that each
function's range is twice that of its predecessor. That is, if himaps a data entry into one of N
buckets, hi+1 maps a data entry into one of 2N buckets. One example of such hash functionfamily
can be obtained by: hi+1(key)=keymod (2iN) where N is the initial number of
buckets and i = 0,1,2,….
Initiallyit use N buckets labelled 0 through N–1 and an initial hashingfunction h0(key) =
key% N is used to map anykeyinto one of the N [Link] each overflow bucket, one of the
buckets in serial order will be splited and its content is redistributed between it and its split
image. That is, for first time overflow in any bucket, bucket 0 will be splited, for second time
overflow in any bucket; bucket 1 will be splited and so on. When number of buckets becomes
2N, then this marks the end of splitting round [Link] function h0 is no longer needed as all
2N buckets can be addressed by hashing function h1. In new round namely splitting-round 1,
[Link] repeated
when the hash file grows.

Example: Let N = 4, so we use 4 buckets and hash function h0(key) = key % 4 is used to map
any key into one of the four buckets. Let us initially insert 4, 13, 19, 25, 14, 24, 15, 18, 23, 11,
16, 12 and [Link] is shown in the below figure.

Bucket#h1h0 Primarypages Overflowpages


0 000 00 4* 24* 16* 12*

1 001 01 13* 25*

2 010 10 14* 18* 10*

3 011 11 19* 15* 23* 11*

Next,when27isinserted,[Link],bucket0(firstbucket)issplited
anditscontent isdistributed betweenbucket 0andnew bucket. This isshown inbelow figure.

Bucket#h1h0 Primarypages Overflowpages


0 000 00 24* 16*

1 001 01 13* 25*

2 010 10 14* 18* 10*

3 011 11 19* 15* 23* 11* 27*

4 100 00 4* 12*

Department of CSE, NRCM Page 124


Next,when30,31and34isinserted,[Link],bucket1issplited and its content
is distributed between bucket 1 and new bucket. This is shown in below figure.
Bucket# h1 h0 Primarypages Overflowpages
0 000 00 24* 16*
1 001 01 13*
2 010 10 14* 18* 10* 30* 34*
3 011 11 19* 15* 23* 11* 27* 31*
4 100 00 4* 12*
5 101 01 25*

When32,35,40and48isinserted,[Link],bucket2issplited andits content is


distributed between bucket 2 and new [Link] is shown in below figure.
Bucket# h1 h0 Primarypages Overflowpages
0 000 00 24* 16* 32* 40* 48*
1 001 01 13*
2 010 10 18* 10* 34*
3 011 11 19* 15* 23* 11* 27* 31* 35*
4 100 00 4* 12*

5 101 01 25*
6 110 10 14* 30*

When26,20and42isinserted,[Link],bucket3issplited andits content is


distributed between bucket 3 and new [Link] is shown in below figure.
Bucket# h1 h0 Primarypages Overflowpages
0 000 00 24* 16* 32* 40* 48*

1 001 01 13*

2 010 10 18* 10* 34* 26* 42

3 011 11 19* 11* 27* 35*

4 100 00 4* 12* 20*

5 101 01 25*

6 110 10 14* 30*

7 111 11 15* 23* 31*

Department of CSE, NRCM Page 125


This marks the end of splitting [Link] function h0is no longer needed as all 2N
buckets can be addressed by hashing function h1. In new round namely splitting-round 1, bucket
split once again starts from bucket 0. A new hash function h2will be [Link] process is
repeated.

7. INTUITIONS FOR TREE INDEXES


We can use tree-like structures as index as well. For example, a binary search tree can
also be used as an index. If we want to find out a particular record from a binary search tree, we
have the added advantage of binary search procedure, that makes searching be performed even
faster. A binary tree can be considered as a 2-way Search Tree, because it has two pointers in
each of its nodes, thereby it can guide you to two distinct ways. Remember that for every node
storing 2 pointers, the number of value to be stored in each node is one less than the number of
pointers, i.e. each node would contain 1 value each.

The abovementioned concept can be further expanded with the notion of the m-Way
Search Tree, where m represents the number of pointers in a particular node. If m = 3, then each
nodeofthesearchtreecontains3pointers,[Link] use mainly two
tree structure indexes in DBMS. They are:

 IndexedSequentialAccessMethods(ISAM)
 B+Tree

8. INDEXED SEQUENTIAL ACCESS METHODS(ISAM)


ISAMisatreestructuredatathatallowstheDBMStolocateparticularrecordusingindex without
having to search the entire data set.
 Therecords in a filearesorted accordingto theprimarykeyand saved in the disk.
 Foreachprimarykey,[Link] index is
nothing but the address of record.
 Asorted data file accordingto primaryindexiscalled anindexed sequential file.
 TheprocessofaccessingindexedsequentialfileiscalledISAM.
 [Link] primary key
has to be selected to make ISAM efficient.

Department of CSE, NRCM Page 126


 ISAM gives flexibility to generate index on other fields also in addition to primary
keyfields.
ISAMcontainthreetypesofnodes:
 Non-leafnodes: Theystorethe search indexkeyvalues.
 Leafnodes: Theystoretheindexof records.
 Overflownodes: Theyalsostorethe indexofrecordsbut aftertheleafnodeis full.

OnISAM, wecanperform search,insertionanddeletion operations.


Search Operation: It follows binarysearch process. The record to be searched will be available
in the leaf nodes or in overflow nodes only. The non-leaf nodes are used to search the value.
Insertion operation: First locate a leaf node where the insertion to be take place (use binary
search). After finding leaf node, insert it in that leaf node if space is available, else create an
overflow node and insert the record index in it, and link the overflow node to the leaf node.
Deletion operation: First locate a leaf node where the deletion to be take place (use binary
search). After finding leaf node, if the value to be deleted is in leaf node or in overflow node,
remove it. If the overflow node is empty after removing the deleted value, then delete overflow
node also.
Example: Insert10, 23, 31, 20,68, 35,42,61, 27,71, 46and 59
31

23 68 42 59

10 20 23 27 68 71 31 35 42 46 59 61

Department of CSE, NRCM Page 127


Afterinserting24, 33, 36, and 39 in theabovetree, it looks like

31

23 68 42 59

10 20 23 27 68 71 31 35 42 46 59 61

24 33 36

39

Deletion:From the above figure, after deleting 42, 71,24 and 36

31

23 68 42 59

10 20 23 27 68 31 35 46 59 61

33

39

9. B+ TREE
B+ Tree is an extension of Binary Tree which allows efficient insertion, deletion and search
operations. It is used to implement indexing in DBMS. In B+ tree, data can be stored onlyon the
leaf nodes while internal nodes can store the search key values.

1. B+treeof anordermcanstoremaxm-1valuesateach node.


2. Eachnode canhaveamaximumofmchildrenandatleastm/2children(except root).
3. Thevaluesin eachnode areinsorted order.
4. Allthenodesmust containatleasthalf fullexcepttheroot node.
5. Onlyleafnodescontainvaluesandnon-leaf nodescontainsearch keys.

Department of CSE, NRCM Page 128


B+ Search:
SearchingforavalueintheB+-Treealwaysstartsattherootnodeandmovesdownwards until it
reaches a leaf node. The search procedure follows binary tree search procedure.

1. Readthe valueto [Link] ussaythis valueasX.


2. Startthesearchprocessfromroot node
3. Ateachnon-leafnode(includingroot node),
a. Ifall thevalues inthenon-leafnodeare greaterthanX, thenmove toits first child
b. Ifallthevaluesinthenon-leafnodearelessthanorequaltoX,thenmovetoits last child
c. Ifforanytwoconsecutivevaluesinthenon-leafnode,leftvalueislessandright value
greaterthan or equal to X, then moveto the child nodewhosepointer is in between
these two consecutive values.
4. Repeatstep-3untilaleaf nodereaches.
5. Atleafnodecomparethekeywiththevaluesinthatnodefromlefttoright. Ifthekey value is
found then display found. Otherwise display it is not found.

Example:Searchingfor35in thebelowgiven B+ tree. Thesearchpath isshown inred color.

18

11 31 64

8 15 23 42 59 68

2 5 8 9 11 12 15 16 18 20 23 27 31 35 42 46 59 61 64 66 68 71

B+ Insertion:

1. Applysearch operation on B+treeandfind aleafnodewherethenew value has to insert.


2. Iftheleaf nodeis notfull, theninsertthevaluein theleaf node.
3. Iftheleafnodeisfull, then

Department of CSE, NRCM Page 129


a. Split that leaf node including newly inserted value into two nodes such that each
contains half of the values (In case of odd, 2nd node contains extra value).
b. Insertsmallestvaluefromnewrightleafnode(2ndnode)intotheparentnode. Add
pointers from these new leaf nodes to their parent node.
c. If theparent is full, split it too. Addthemiddlekey(In caseof even,1 st valuefrom 2nd
part)of this parent node to its parent node.
d. Repeatuntilaparentis foundthatneednot split.
4. If therootsplits, createanewroot whichhasonekeyandtwo pointers.

Example:Insert1,5,3,7,9,2,4,6,8,10intoB+treeofanorder4.

B+treeoforder4indicatestherearemaximum3valuesinanode. Initially

Afterinserting1
1

Afterinserting5
1 5

Afterinserting3
1 3 5

5
Afterinserting7
1 3 5 7
1 3 5 7

Afterinserting9
5

1 3 5 7 9

Afterinserting2
5

1 2 3 5 7 9

Department of CSE, NRCM Page 130


Afterinserting4
5 3 5

1 2 3 4 5 7 9 1 2 3 4 5 7 9

Afterinserting6
3 5

1 2 3 4 5 7 9 6

3 5 7

1 2 3 4 5 6 7 9

Afterinserting8
3 5 7

1 2 3 4 5 6 7 8 9

Afterinserting10
3 5 7

1 2 3 4 5 6 7 8 9 10

3 5 7 9

1 2 3 4 5 6 7 8 9 10

3 5 9

1 2 3 4 5 6 7 8 9 10

Department of CSE, NRCM Page 131


B+Deletion

 Identifythe leaf nodeLfromwheredeletion should takeplace.


 Removethedatavaluetobe deletedfrom theleafnodeL
 IfLmeetsthe"half full"criteria, thenitsdone.
 IfLdoesnotmeetsthe "halffull"criteria,then
o If L's right sibling can give a data value, then move smallest value in right sibling
to L (After giving a data value, the right sibling should satisfy the half fullcriteria.
Otherwise it should not give)
o Else, if L's left sibling can give a data value, then move largest value in leftsibling
to L (After giving a data value, the left sibling should satisfy the half full criteria.
Otherwise it does not give)
o Else,merge Landasibling
o If any internal nodes (including root) contain key value same as deleted value,
then delete those values and replace with it successor. This deletion may
propagate up to root. (If the changes propagate up to root then tree height
decreases).

Example:Considerthegivenbelowtree anddelete19,
19

5 14 24 33

2 3 5 7 14 16 19 20 22 24 27 29 33 34 38 39

Delete19:Half fullcriteriaissatisfied evenafter deleting19,so justdelete19 fromleaf node

19

5 14 24 33

2 3 5 7 14 16 20 22 24 27 29 33 34 38 39

Department of CSE, NRCM Page 132


Delete 20: Half full criteria is not satisfied after deleting 20, so bring 24 from its right siblingand
change key values in the internal nodes.
19

5 14 27 33

2 3 5 7 14 16 22 24 27 29 33 34 38 39

Delete 24: Half full criteria is not satisfied after deleting 24, bringing a value from its siblings
also not possible. Therefore merge it with its right sibling and change key values in the internal
nodes.
19

5 14 33

2 3 5 7 14 16 22 27 29 33 34 38 39

Delete 5: Half full criteria is not satisfied after deleting 5, bringing a value from its siblings also
not possible. Therefore merge it with its left sibling (you can also merge with right) and change
key values in the internal nodes.
19

14 33

2 3 7 14 16 22 27 29 33 34 38 39

Delete7:Halffullcriteriaissatisfiedevenafterdeleting7,sojustdelete7fromleafnode.
17

14 33

2 3 14 16 22 27 29 33 34 38 39

Department of CSE, NRCM Page 133


Delete 2: Half full criteria is not satisfied after deleting 2, bringing a value from its siblings also
not possible. Therefore merge it with its right sibling and change key values in the internalnodes.
22 33

3 14 16 22 27 29 33 34 38 39

9. INDEXES AND PERFORMANCE TUNING


Indexing is very important to execute DBMS query more efficiently. Adding indexes to
important tables is a regular part of performance tuning. When we identify a frequently executed
querythat is scanningatableorcausingan expensivekeylookup, then first consideration is ifan
index can solve this problem. If yes add index for that table.

While indexes can improve query execution speed, the price we pay is on index
maintenance. Update and insert operations need to update the index with new data. This means
that writes will slow down slightly with each index we add to a table. We also need to monitor
index usage and identify when an existing index is no longer needed. This allows us to keep our
indexing relevant and trim enough to ensure that we don’t waste disk space and I/O on write
operations to any unnecessary [Link] improve performance of the system,we need to do the
following:

 Identifytheunusedindexesand removethem.
 Identifytheminimallyused indexesand remove them.
 An index that is scanned more frequently, but rarely findsthe required answer. Modifythe
index to reach the answer.
 Identifytheindexes that are verysimilarand combinethem.

-oO0Oo–

Department of CSE, NRCM Page 134

You might also like