Database Management Systems Overview
Database Management Systems Overview
Tech: AI & ML
onwards (UGC Autonomous) II Year – I Sem
Topics Covered:
Database System Applications:
The different applications of DBMS are as follows −
departure of the train and the status of arrival and give updates to the passengers
1
Library Management System − There will be so many numbers of books in the
library and it is very hard to keep a record of all the books in a register or a copy.
So, DBMS is necessary to keep track of all the book records, issue dates, name of
Banking − We are doing a lot of transactions daily without directly going to the
banks. The only reason is the usage of databases and it manages all the data of
Educational Institutions − All the examinations and the data related to the
students maintained over the internet with the help of a database management
system. It contains registration details of the student, results, grades and courses
available. All these works can be done online without visiting an institution.
Social Media Websites − By filling the required details we are able to access
social media platforms. Many users daily sign up for social websites such as
Facebook, Pinterest and Instagram. All the information related to the users are
.
2
Database Systems Vs File Systems
[Link].
File System DBMS
File system is software that DBMS is software to create and manage the
manages the data file and the user database file, it provides more advantages
has to write the programs for than a file system. DBMS is a collection of
managing the database. Handling data. In DBMS the user is not required to
a file system is easier than DBMS, write the programs.
File system stored data in the form Data stored in a tabular form i.e., structured
of unstructured format. format.
Exampe: Data stores in a tabular form
Exampe: In a Notepad file data
2 stores like Empno Name Salary
101 xyz 20000
1. Abc 20000 manager 102 Abc 40000
hyderabad 103 Sam 35000
104 Smith 45000
3
2. Xyz 4000 director pune
Data redundancy and Data In DBMS, the related data resides in the
inconsistency exist in file system. same storage location or the same
File processing system has more information may not be duplicated as a result
data redundancy minimizing data redundancy and reduced
3 data inconsistency. In DBMS there is no
redundancy and no inconsistency.
The file system offers lesser Database Management System offers high
6. security. security.
them.
7.
4
Unauthorized access is not Unauthorized access is restricted in DBMS
restricted in file system
8. Multiple user access to the Allow multiple users to access the database
information is difficult to provide at the same time
.
5
View of Data-Data Abstraction
Levels of Abstraction in a DBMS
There are three levels of data abstraction in DBMS which reduce the complexity
of the database. They are
6
.1. Internal Level:
Physical Schema or Internal level is the lowest level of abstraction in the DBMS which
describes how the data is actually stored in the database and it also describes complex
low level data structure and access methods used by database. This internal level deals
with the storage of the data for the whole database system. This is the first or lowest
level of abstraction which describes how a record is actually stored in the system
memory. It is a low-level representation of the database
2. Logical Level:
Logical level or Conceptual Schema is the intermediate level, next in higher level and
also known as conceptual level which describes what data is stored and reveals the
relationships that exists among the stored data.
It tried to describe the entire data. It means what tables to be created and what are the
links are between these tables are mentioned in this level. This is less complicated than
the physical level. Little bit of complexity over here as well. But, it is not that much like at
the physical level. This level is used by database administrators or developer.
Example: Let us take an example where we use the relational model for storing the
data. We have to store the data of a student, the columns in the student table will be
student_name, age, mail_id, roll_no etc. We have to define all these at this level while
we are creating the database.
It is less complex than the physical level. So, overall, the logical level contains tables
(fields and attributes) and relationships among table attributes.
7
3. View Level:
View level or External Schema: When coming to the third level that is the view level and
it is the highest level. This is also called the external level. There are different levels of
views in view level and each view defines only a part of a entire data. It also interacts
with the user since it provides the different views of the same database. It also provides
multiple views for the same database. View level can be used by all the users of the
database.
Example: If the student has a login-id and password in a university system, then as
a student, he can view his marks, attendance, fee structure, etc. But the faculty of the
university will have a different view. He will have options like salary, edit marks of a
student, enter attendance of the students, etc. So, both the student and the faculty
have a different view.
Example:
Let’s say a table teacher in our database whose name is School, suppose the table
has 50 records so the instance of the database has 50 records for now and tomorrow
we are going to add another fifty records so tomorrow the instance has a total of 100
records. This is called an instance.
Schema
Schema is the overall description of the database. The basic structure of how the
data will be stored in the database is called schema.
Example:
Let’s say a table teacher in our database named school, the teacher table requires
the name, dob, and doj in their table so we design a structure as:
Teacher table
name: String
doj: date
dob: date
8
Data Models
Data Model:
Data model means to model the data i.e., to give a shape to the data and to give a
figure to the stored data. A data model makes it easier to understand the meaning of
the data by its figure.
There are various types of data model but the relational model is the most widely used
model.
1. Hierarchical Model
2. Network Model
3. Relational Model
4. Object-Oriented Model
5. Object-Relational Model
6. Entity-Relationship Model
1. Hierarchical Model
Hierarchical Model was the first DBMS model and one of the oldest Database Model
The general shape of this model is like an Organizational chart (Example-2). A node
on the chart represents a particular entity. The terms parent and child are used in
describing a hierarchical model. This model organizes the data in the hierarchical tree
structure. The hierarchy starts from the root which has root data and then it expands
in the form of a tree adding child node to the parent node.
.
9
Example: Organization Chart
2. Network Model
This model is the same as the hierarchical model; the only difference is that a record
can have more than one parent. The network model was developed to overcome the
limited scope of hierarchical model. It replaces the hierarchical tree with a graph.
In Network Model, multiple parent-child relationships are used. The network model
uses a network structure, which is a data structure of nodes and branches.
Example-2:
10
The network model for ‘UNIVERSITY’ system is shown above figure, the Mathematics
Department node is associated with ‘Computer Department’ node. Similarly ‘Computer
Lab; and ‘Library’ nodes are associated with both the ‘Mathematics Department’ and
‘Computer Department’ nodes.
3. Relational Model
Relational Model is the most widely used model. In this model, the data is maintained
in the form of a two-dimensional table. All the information is stored in the form of rows
and columns. The basic structure of a relational model is tables. So, the tables are also
called relations in the relational model.
For example student roll number, student name, student course etc., and columns
also known as attributes. An attribute represents the characteristics of a person, place
or thing, for example Salary attribute in a given below example.
11
Popular examples of standard relational databases include Microsoft SQL Server,
Oracle Database, MySQL and IBM DB2.
4. Object-Oriented Model
An object-oriented data model is one of the most developed data models which contain
video, graphical files, and audio.
5. Object-Relational Model
An Object relational model is a combination of a Object oriented model and a Relational
model. So, it supports objects, classes, inheritance etc. just like Object Oriented models
and has support for data types, tabular structures etc. like Relational data model.
One of the major goals of Object relational data model is to close the gap between
relational database and the object oriented database frequently used in many
programming languages such as C++, C#, Java etc.
6. Entity-Relationship Model
Entity-Relationship Model or simply ER Model is a high-level data model diagram. In
this model, we represent the real-world problem in the pictorial form to make it easy to
understand. It is also very easy for the developers to understand the system by just
12
looking at the ER diagram. We use the ER diagram as a visual tool to represent an ER
Model.
1. Entity
2. Attribute
3. Relationship
Example:
.
13
Database Languages
The DDL Commands in Structured Query Language are used to create and modify the
schema of the database and its objects. The syntax of DDL commands is predefined for
describing the data. The commands of Data Definition Language deal with how the data
should exist in the database.
1. CREATE Command
2. DROP Command
3. ALTER Command
4. TRUNCATE Command
5. RENAME Command
1. CREATE Command
CREATE is a DDL command used to create databases, tables, triggers and other
database objects.
Example 1: This example describes how to create a new database using the CREATE
DDL command.
14
Example 2: This example describes how to create a new table using the CREATE DDL
command.
Example 3: This example describes how to create a new index using the CREATE DDL
command.
2. DROP Command
DROP command used to delete/remove the database objects from the SQL database.
We can easily remove the entire table, view, or index from the database using this
DDL command.
3. ALTER Command
ALTER command which changes or modifies the existing structure of the database, and
it also changes the schema of database objects.
Example: This example shows how to add a new field to the existing table.
15
4. TRUNCATE Command
TRUNCATE command which deletes or removes all the records from the table but
without structure.
This command also removes the space allocated for storing the table records.
5. RENAME Command
RENAME command which is used to change the name of the database table.
Example
This query changes the name of the table from Student to Student_Details.
.
16
DML:
We can easily access, store, modify, update and delete the existing records from the database using DML
commands.
1. SELECT Command
2. INSERT Command
3. UPDATE Command
4. DELETE Command
1. SELECT Command
SELECT is the most important data manipulation command in Structured Query
Language. The SELECT command shows the records of the specified table. It also
shows the particular record of a particular column by using the WHERE clause.
Example 1: This example shows all the values of every column from the table.
This SQL statement displays the following values of the student table:
Example 2: This example shows all the values of a specific column from the table.
17
This SELECT statement displays all the values of Emp_Salary and Emp_Id column
of Employee table:
2. INSERT Command
INSERT command allows users to insert data in database tables.
Example:
3. UPDATE Command
UPDATE command which allows users to update or modify the existing data in
database tables.
Here, 'UPDATE', 'SET', and 'WHERE' are the SQL keywords, and 'Table_name' is the
name of the table whose values you want to update.
Suppose, you want to update the Product_Price of the product whose Product_Id is
P102. To do this, you have to write the following DML UPDATE command:
UPDATE Student SET Stu_Marks = 80, Stu_Age = 21 WHERE Stu_Id = 103 AND Stu
_Id = 202;
4. DELETE Command
DELETE is a DML command which allows SQL users to remove single or multiple
existing records from the database tables.
18
We use the WHERE clause with the DELETE command to select specific rows from the
table.
A user issues an access request, using some application program or data manipulation
language. The application program determines what data are needed and
communicates the need to the database management system. The DBMS intercepts
the request and interprets it.
The database management system is a bridge between the application program, (that
determines what data are needed and how they are processed), and the operating
system of the computer, which is responsible for placing data on the magnetic storage
devices.
To retrieve data from the database, the following operations are performed internally:
2. The application program determines what data are needed and communicates the
need to the database management system.
There are 2 ways application programs can access the database through ODBC and
JDBC.
19
Database Users and Administrator:
There are two types of database users, Users, and Administrators.
Database Users
Database users are the ones who really use and take the benefits of the database. There
will be different types of users depending on their needs and way of accessing the
database
3. Stand-alone Users – These users will have a stand-alone database for their
personal use. These kinds of the database will have readymade database
packages which will have menus and graphical interfaces.
4. Naive Users – these are the users who use the existing application to interact with
the database. For example, online library system, ticket booking systems, ATMs
etc
For examples, Railway’s ticket booking users are naive users. Clerks in any bank
is a naive user because they don’t have any DBMS knowledge but they still use
the database and perform their given task.
Database Administrators
A DBA has many responsibilities. A good-performing database is in the hands of DBA.
Database Administrator (DBA) is a person/team who defines the schema and also
controls the levels of database.
The DBA will then create a new account id and password for the user if he/she need to
access the data base. DBA is also responsible for providing security to the data base
and he allows only the authorized users to access/modify the data base.
20
DBA also monitors the recovery and back up and provide technical support.
DBA repairs damage caused due to hardware and/or software failures.
Transaction Management:
Transactions are a set of operations used to perform a logical set of work. It is all the
instructions of a logical operation. A transaction usually means that the data in the
database has changed. One of the major uses of DBMS is to protect the user’s data
from system failures. It is done by ensuring that all the data is restored to a consistent
state when the computer is restarted after a crash. The transaction is any one
execution of the user program in a DBMS. One of the important properties of the
transaction is that it contains a finite number of steps. Executing the same program
multiple times will generate multiple transactions.
1. Transaction Start.
2. Insert your ATM card.
3. Select a language for your transaction.
4. Select the Savings Account option.
5. Enter the amount you want to withdraw.
6. Enter your secret pin.
7. Wait for some time for processing.
8. Collect your Cash.
9. Transaction Completed.
.
21
Data Base System Structure:
Storage Manager:
Storage Manager: Storage Manager is a program that provides an interface
between the data stored in the database and the queries received. It is also known
as Database Control System. It maintains the consistency and integrity of the
database by applying the constraints and executing the DCL statements. It is
22
responsible for updating, storing, deleting, and retrieving data in the database.
It contains the following components –
Authorization Manager: It ensures role-based access control, i.e,. checks whether
the particular person is privileged to perform the requested operation or not.
File Manager: It manages the file space and the data structure used to represent
information in the database.
Buffer Manager: It is responsible for cache memory and the transfer of data
between the secondary storage and main memory.
Data Dictionary: It contains the information about the structure of any database
object. It is the repository of information that governs the metadata.
Indices: It provides faster retrieval of data item.
23
The Query Processor:
Query Processor: It interprets the requests (queries) received from end user via an
application program into instructions. It also executes the user request which is
received from the DML compiler.
Query Processor contains the following components –
DML Compiler: It processes the DML statements into low level instruction (machine
language), so that they can be executed.
DDL Interpreter: It processes the DDL statements into a set of table containing meta
data (data about data).
Embedded DML Pre-compiler: It processes DML statements embedded in an
application program into procedural calls.
Query Optimizer: It executes the instruction generated by DML Compiler.
.
24
MODULE-1 UNIT-2 : NOTES
Syllabus: Beyond ER Design, Entities, Attributes and Entity Sets,
Relationships and Relationship Sets, Additional Features of ER Model
odels
E-R diagrams:
Attributes and Entity Sets:
Attributes:
In DBMS, we have entities, and each entity contains some property about their behavior
which is also called the attribute. In relational databases, we have tables, and each
column contains some entity that has some attributes, so all the entries for that column
should strictly follow the attribute of the entity. Entities define the characteristic property
of the attributes.
25
.
1. Simple Attribute:
It is also known as atomic attributes. When an attribute cannot be divided further, then it is
called a simple attribute.
For example, in a student table, the branch attribute cannot be further divided. It is called a
simple or atomic attribute because it contains only a single value that cannot be broken further.
2. Composite Attribute:
Composite attributes are those that are made up of the composition of more than one
attribute. When any attribute can be divided further into more sub-attributes, then that
attribute is called a composite attribute.
26
For example, in a student table, we have attributes of student names that can be further
broken down into first name, middle name, and last name. So the student name will be
a composite attribute.
Another example from a personal detail table would be the attribute of address. The
address can be divided into a street, area, district, and state.
3. Single-valued Attribute:
Those attributes which can have exactly one value are known as single valued
attributes. They contain singular values, so more than one value is not allowed.
For example, the DOB of a student can be a single valued attribute. Another example is
gender because one person can have only one gender.
4. Multi-valued Attribute:
Those attributes which can have more than one entry or which contain more than one
value are called multi valued attributes.
27
In the Entity Relationship (ER) diagram, we represent the multi valued attribute by
double oval representation.
For example, one person can have more than one phone number, so that it would be a
multi valued attribute. Another example is the hobbies of a person because one can
have more than one hobby.
Derived attributes are also called stored attributes. When one attribute can be derived
from the other attribute, then it is called a derived attribute. We can do some calculations
on normal attributes and create derived attributes.
For example, the age of a student can be a derived attribute because we can get it by
the DOB of the student.
Another example can be of working experience, which can be obtained by the date of
joining of an employee.
Entity Sets:
An entity set is a group of similar entities and these entities can have attributes.
In terms of DBMS, an entity is a table or attribute of a table in database, so by showing
relationship among tables and their attributes, ER diagram shows the complete logical
structure of a database.
28
An entity set is a collection or set of all entities of a particular entity type at any point in
time. The type of all the entities should be the same.
Example:
The collection of all the students from the student table at a particular instant of
time is an example of an entity set.
The collection of all the employees from the employee table at a particular instant
of time is an example of an entity set.
1. One-to-One Relationship
According to this relationship, a single record in Table A is related to a single record in
Table B, and vice-versa is also true.
29
Example: Consider 2 entities -
Person - ID, name, age, address
Passport - passport_ID, passport_no
Only one passport is allowed per person, and each passport belongs to only one person.
Therefore it is a one-to-one relationship.
3. Many-to-Many Relationship
According to this relationship, each record of Table A can be related to one or
more than one record of table B and vice versa is also true, i.e., each record of
Table B can be connected to one or more than one record of table A.
30
Example: Consider 2 entities -
Each customer can buy more than 1 product, and many customers can buy one.
Therefore it is a many-to-many relationship.
Relationship sets
A relationship set is a set of relationships of same type. The number of entity sets that
participate in a relationship set is termed as the degree of that relationship set.
31
Additional Features of the ER Model in DBMS:
Three new concepts were added to the existing ER Model, they were:
1. Generalization
2. Specialization
3. Aggregration
1. Generalization:
Generalization is a process of extracting common properties from a set of entities and
creating a generalized entity from it. It is a bottom-up approach, and it helps to reduce
the size and complexity of the schema.
Example: Let us take two low-level entities as Car and Bus, and these two will have
many common attributes and some specific attributes. And We will generalize and link
the common attributes to the newly formed high-level entity named Vehicle.
2. Specialization
Specialization is opposite to Generalization. In this, entity is divided into subentities bases on
their charactertics(distingvishing features). It breaks an entity into multiple entities from higher
level to lower level. It is a top down approach.
32
3. Aggregration
Aggregation refers to the process by which entities are combined to form a single
meaningful entity. The specific entities are combined because they do not make sense
on their own. To establish a single entity, aggregation creates a relationship that
combines these entities. The resulting entity makes sense because it enables the
system to function well.
33
Module-2
Unit-1
SYLLABUS:
Unit-1: The Relational Model
Notes:
Topic-1: Introduction to the Relational Model
The relational model represents how data is stored in Relational Databases. A
relational database consists of a collection of tables, each of which is assigned a
unique name. Consider a relation STUDENT with attributes ROLL_NO, NAME,
ADDRESS, PHONE, and AGE shown in the table.
Table Student
ROLL_NO NAME ADDRESS PHONE AGE
34
ROLL_NO NAME ADDRESS PHONE AGE
4 SURESH DELHI 18
Properties of Relations
o Name of the relation is distinct from all other relations.
o Each relation cell contains exactly one atomic (single) value
o Each attribute contains a distinct name
o Attribute domain has no significance
o tuple has no duplicate value
o Order of tuple can have a different sequence
Domain Constraint
Entity Integer Constraint
35
Referential Integrity Constraint
Key Constraints
Domain Constraint:
A domain constraint is a restriction on the values that can be stored in a column. Strings,
character, time, integer, currency, date etc. Are examples of the data type of domain
constraints.
Entity integrity:
Entity integrity constraints would ensure that null values are not entered into any required
columns. It states that primary key value can't be null. This is because the primary key value is
used to identify individual rows in relation and if the primary key has a null value, then we can't
identify those rows
For example, if you have a column for "roll_number" an entity integrity constraint in DBMS would
ensure that this column cannot contain any null values.
Key Constraints:
A key constraint is a rule that defines how data in a column(s) can be stored in a table. A key is
composed of one or more columns whose values uniquely identify each row in the table. There
are several different types of key constraints in DBMS, each with its own specific purpose.
36
SQL provides several alternative ways to handle foreign key violations. We must consider three
basic questions:
1. If an Enrolled row is inserted, with a student_id column value that does not appear in
any row of the Students table
In this case, the INSERT command is simply rejected.
The symbol ´*´ means that we retain all fields of selected tuples in the result.
We can retrieve rows corresponding to students who are younger than 18 withthe following SQL
query:
Example
SELECT * FROM Students WHERE age < 18;
The condition age < 18 in the WHERE clause specifies that we want to select only tuples in
which the age field has a value less than 18.
37
In addition to selecting a subset of tuples, a query can extract a subset of the fields of each
selected tuple. we can compute the student_id and First_name of students who are younger
than 18 with the following query:
Example:
SELECT ID,FirstName FROM Students WHERE age < 18;
Syntax
SELECT table1.column1, table2.column2
FROM table1, table2
WHERE table1.column1 = table2.column1;
Logical database design is the process of transforming (or mapping) a conceptual schema of
the application domain into a schema for the data model underlying a particular DBMS, such
as the relational or object-oriented data model.
Logical database design is the process of deciding how to arrange the attributes of the entities
in a given business environment into database structures, such as the tables of a relational
database. The goal of logical database design is to create well structured tables that properly
reflect the company's business environment. The tables will be able to store data about the
company's entities in a non-redundant manner and foreign keys will be placed in the tables so
that all the relationships among the entities will be supported.
38
Logical database design helps you define and communicate your business' information
requirements. When you create a logical database design, you describe each piece of
information you need to track and the relationships among, or the business rules that govern,
those pieces of information.
Once you create a logical database design, you can verify with users and management that the
design is complete (that is, it contains all of the data that must be tracked) and accurate (that
is, it reflects the correct table relationships and enforces the business rules).
A view of data in DBMS is created to display the specific information that a user needs, rather
than revealing sensitive portions of the schema that should remain confidential to the owner.
Essentially, the data view in a DBMS presents a selective representation instead of exposing
the complete database. It’s important to note that any modifications made to the database will
also be reflected in the corresponding view.
Acquiring a view in DBMS involves formulating a set of statements designed to extract specific
elements from the database. These statements then construct a table on the user’s device,
organizing the outcomes in a tabular format. The pseudocode to obtain a data view can be
outlined as follows:
create
view view
as
from tableName
where condition;
Example:
The ALTER VIEW statement modifies an existing view by altering a reference type column to
add a scope. The ALTER VIEW statement also enables or disables a view for use in query
optimization.
The ALTER VIEW command modifies views created using the CREATE VIEW command or a
view projected from a persistent class. The altered view replaces the existing view, so you
cannot modify specific columns in a view.
After dropping a table if you try to use the table then compiler shows an error as ” table or view
does not exist
1. Alter command in SQL is used to make modifications to the columns in the existing table.
2. It is used to add columns, delete columns, drop constraints, renaming the columns, changing
the data type and data type size of the column existing in the table.
40
.
MODULE-2: Unit-2
Topic-1: Relational Algebra:
Relational algebra in dbms helps in the design, implementation, and optimization of queries and
provides a foundation for query languages such as SQL. It also helps to ensure that queries
are correct, efficient, and scalable.
Select operation chooses the subset of tuples from the relation that satisfies the
given condition mentioned in the syntax of selection. The selection operation is also
known as horizontal partitioning since it partitions the table or relation horizontally.
Notation:
σ c(R)
where ‘c’ is selection condition which is a boolean expression(condition), we can
have a single condition like Roll= 3 or combination of condition like X>2 AND Y<1,
symbol ‘σ (sigma)’ is used to denote select(choose) operator,
b) Projection:
This operation selects certain required attributes, while discarding other attributes.
41
Notation –
πA (R)
where ‘A’ is the attribute list, it is the desired set of attributes from the attributes of
relation(R),
symbol ‘π(pi)’ is used to denote the Project operator,
R is generally a relational algebra expression, which results in a relation.
1. Union
o The SQL Union operation is used to combine the result of two or more SQL SELECT
queries.
o In the union operation, all the number of datatype and columns must be same in both
the tables on which UNION operation is being applied.
o The union operation eliminates the duplicate rows from its resultset.
Example:
2. Union All
Union All operation is equal to the Union operation. It returns the set without removing
duplication and sorting the data.
42
Example:
Union All query will be like:
3. Intersect
o It is used to combine two SELECT statements. The Intersect operation returns the
common rows from both the SELECT statements.
o In the Intersect operation, the number of datatype and columns must be the same.
o It has no duplicates and it arranges the data in ascending order by default.
Example:
4. Minus
o It combines the result of two SELECT statements. Minus operator is used to display the
rows which are present in the first query but absent in the second query.
o It has no duplicates and data arranged in ascending order by default.
Example
43
Topic-3 Renaming:
The rename operator ρ is one of the unary operators in relational algebra and is used to
rename relations in a DBMS. In other words, some relations or attributes may have complex
names and can be changed to make writing queries easier.
Types
Three following types of renaming can be done:
Renaming a relation
Renaming an attribute
Renaming both
Rename a relation
Suppose we have a relation named Students and we want to change it
to FinalYrStudents, the rename operation works as follows:
a base for SQL (Structured Query Language) and is used to rename relations.
Syntax
ρnew_relation_name(new_attribute_name_list)(R)
The new_relation_name contains the name we want to replace with the original relation R and
the new_attribute_name_list contains the changed attribute names of R.
ρFinalYrStudents(Students)
Rename an attribute
Suppose we have a relation named Students and we want to change its
attributes StudentID, StudentName to SID and SName, the rename operation works as follows:
ρ(SID,SName)(Students)
The attributes will be ordered as same as the tables and must be renamed in the
same order. If only selective attributes are to be changed, rewrite the original attribute
names of those that are supposed to be unchanged.
44
Rename both
Next, we'll change both the relation name and attributes of
the Students class:
()()ρFinalYrStudents(SID,SName)(Students)
Topic-4: Joins:
As the name shows, JOIN means to combine something. In case of SQL, JOIN means "to
combine two or more tables".
In SQL, JOIN clause is used to combine the records from two or more tables in a database.
1. INNER JOIN
In SQL, INNER JOIN selects records that have matching values in both tables as long as the
condition is satisfied. It returns the combination of all rows from both the tables where the
condition satisfies.
Syntax
2. LEFT JOIN
The SQL left join returns all the values from left table and the matching values from the right
table. If there is no matching join value, it will return NULL.
45
Syntax
In SQL, RIGHT JOIN returns all the values from the values from the rows of right table and the
matched values from the left table. If there is no matching in both tables, it will return NULL.
Syntax
4. FULL JOIN
In SQL, FULL JOIN is the result of a combination of both left and right outer join. Join tables
have all the records from both tables. It puts NULL on the place of matches not found.
Syntax
46
the function result in an expression called a proposition. It can be either true or false. It is a
tailored version of a subset of the Predicate Calculus to communicate with the relational
database.
For example:
Output: This query selects the tuples from the AUTHOR relation. It returns a tuple with 'name' from
Author who has written an article on 'database'.
47
For example:
Output: This query will yield the article, page, and subject from the relational javatpoint, where the
subject is a database.
Module-3
Unit-1:
Syllabus:
The SELECT statement indicates that you which to query and retrieve information from
a database. The select_list specifies the type of information (or column names) to
48
retrieve. The keyword ALL or the wildcard character asterisk (*) could be used to signify
all columns in the database. Also, the keyword DISTINCT could be used to discard
duplicate records and retrieve only the unique records for the specified columns.
The FROM clause is the only required clause in the SELECT statement. The FROM
clause specifies the specific database tables to retrieve data from.
The WHERE clause limits the results to those records (or rows) that meet some particular
conditions (optional).
The GROUP BY clause specifies the format of the output. The expression specifies a
column listing such that all rows contained in the specified columns will be aggregated
together (optional).
The HAVING clause specifies the specific conditions to group by (optional).
The ORDER BY clause specifies whether to output the query result in ascending or
descending order. The expression specifies a column listing to order by (optional).
Example:
the command would be:
or
49
Example 2:
The following query displays the values of particular column from the
above Student_Record table:
Example 3:
The following query shows the record of those employees from the above table whose
Emp_Panelty is 500:
Example 4:
The following SELECT with GROUP BY query lists the number of cars of the same price:
50
Example 5:
The following query sorts the salary of employees in descending order from the above
Employee_Order table:
UNION:
The UNION operator is used to combine the result-set of two or
more SELECT statements.
Every SELECT statement within UNION must have the same number of columns
The columns must also have similar data types
The columns in every SELECT statement must also be in the same order
UNION Syntax
SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;
Example:
SELECT City FROM Customers
UNION
SELECT City FROM Suppliers
ORDER BY City;
51
UNION ALL Syntax
The UNION operator selects only distinct values by default. To allow duplicate values,
use UNION ALL:
Example:
SELECT City FROM Customers
UNION ALL
SELECT City FROM Suppliers
ORDER BY City;
INTERSECT:
The INTERSECT operator in SQL is used to retrieve the records that are
identical/common between the result sets of two or more tables.
52
EXCEPT:
The EXCEPT operator in SQL is used to retrieve all the unique records
from the left operand (query), except the records that are present in the
result set of the right operand (query).
53
which are not in common with the records of the second table.
54
for every row of the outer query.
A correlated subquery is evaluated once for each row processed by the parent statement.
The parent statement can be a SELECT, UPDATE, or DELETE statement.
55
Set Comparison operators: IN, ANY, ALL, EXISTS
Both the ANY and ALL operators can be used with various comparison operators such as =,
<>, !=, >, <, >=, <=.
1. IN
The IN operator allows the specification of a set of expressions (or constants) that are
subsequently used for the query search. This operator can be applied to a subquery
for the same reason—that is, when the result of an inner query contains a set of
values. Example 6.52 shows the use of the IN operator in a subquery.
Get full details of all employees whose department is located in Dallas:
SELECT *
FROM employee
WHERE dept_no IN
(SELECT dept_no
FROM department
WHERE location = 'Dallas');
2. <> ANY
The <> ANY operator returns true if the value is not equal to any value in the list.
SELECT * FROM Orders
WHERE Amount <> ANY (SELECT Amount FROM Orders WHERE Amount IN (100,
200));
This query will return all orders where the amount is not equal to either
100 or 200.
3. = ALL
The = ALL operator returns true if the value is equal to all values in the list.
SELECT * FROM Orders
WHERE Amount = ALL (SELECT Amount FROM OrdersWHERE Amount IN (500,
500));
This query will return all orders where the amount is equal to both 500 and 500 (which
essentially means equal to 500).
when using ALL, the condition must be true when compared to every single result in
the set.
4. EXISTS:
The EXISTS operator is used to test for the existence of any record in a subquery.
56
The EXISTS operator returns TRUE if the subquery returns one or more records.
EXISTS Syntax
SELECT column_name(s)
FROM table_name
WHERE EXISTS
(SELECT column_name FROM table_name WHERE condition);
EXAMPLE:
SELECT SupplierName
FROM Suppliers
WHERE EXISTS (SELECT ProductName FROM Products WHERE [Link] =
[Link] AND Price = 22);
Aggregate Operators:
o Aggregation operators is used to perform the calculations on multiple rows of a single
column of a table. It returns a single value.
o It is also used to summarize the data.
1. COUNT:
o COUNT function is used to Count the number of rows in a database table. It can work on both
numeric and non-numeric data types.
o COUNT function uses the COUNT(*) that returns the count of all the rows in a specified table.
COUNT(*) considers duplicate and Null.
57
Syntax
COUNT(*)
or
COUNT( [ALL|DISTINCT] expression )
Example:
SELECT COUNT(*)
FROM PRODUCT_MAST;
Example:
SELECT COUNT(*)
FROM PRODUCT_MAST;
WHERE RATE>=20;
Example:
SELECT COUNT(DISTINCT COMPANY)
FROM PRODUCT_MAST;
Example:
SELECT COMPANY, COUNT(*)
FROM PRODUCT_MAST
GROUP BY COMPANY;
2. SUM:
Sum function is used to calculate the sum of all selected columns. It works on numeric fields
only.
Syntax
SUM()
or
SUM( [ALL|DISTINCT] expression )
58
SELECT SUM(COST)
FROM PRODUCT_MAST;
3. AVG:
The AVG function is used to calculate the average value of the numeric type. AVG function
returns the average of all non-Null values.
Syntax
AVG()
or
AVG( [ALL|DISTINCT] expression )
Example:
SELECT AVG(COST)
FROM PRODUCT_MAST;
4. MAX:
MAX function is used to find the maximum value of a certain column. This function determines
the largest value of all selected values of a column.
Syntax
MAX()
or
MAX( [ALL|DISTINCT] expression )
Example:
SELECT MAX(RATE)
FROM PRODUCT_MAST;
59
5. MIN:
MIN function is used to find the minimum value of a certain column. This function determines
the smallest value of all selected values of a column.
Syntax
MIN()
or
MIN( [ALL|DISTINCT] expression )
Example:
SELECT MIN(RATE)
FROM PRODUCT_MAST;
Null Values::
Null values are special values in DBMS that represent values which are
unknown and are always different from zero value.
NULL values could be possible because at the time of data entry information is not
available. So SQL supports a special value known as NULL which is used to represent
the values of attributes that may be unknown or not apply to a tuple. SQL places a
NULL value in the field in the absence of a user-defined value.
60
IS NOT NULL – equals the operation '<> NULL' (records without NULL values)
IS NULL:
It is not possible to test for NULL values with comparison operators, such as =, <, or
<>.
We will have to use the IS NULL and IS NOT NULL operators instead. The IS NULL operator
is used to test for empty values (NULL values).
IS NULL Syntax
SELECT column_names
FROM table_name
WHERE column_name IS NULL;
The following SQL lists all customers with a NULL value in the "Address" field:
Example:
SELECT CustomerName, ContactName, Address
FROM Customers
WHERE Address IS NULL;
IS NOT NULL:
The IS NOT NULL operator is used to test for non-empty values (NOT NULL values).
SELECT column_names
FROM table_name
WHERE column_name IS NOT NULL;
The following SQL lists all customers with a value in the "Address" field:
Example
SELECT CustomerName, ContactName, Address
FROM Customers
WHERE Address IS NOT NULL;
61
are also used to serve as conjunctions for multiple conditions in a statement.
AND − Returns TRUE if both component conditions are TRUE. Returns FALSE if either
is FALSE; otherwise returns UNKNOWN.
For example,
OR − Return TRUE if either component condition is TRUE. Return FALSE if both are
FALSE. Otherwise, return UNKNOWN.
For example,
For example,
Outer Joins:
Outer joins are joins that return matched values and unmatched values from either or both
tables. There are a few types of outer joins: LEFT JOIN returns only unmatched rows from the
left table, as well as matched rows in both tables
Outer joins are joins that return matched values and unmatched values from either or both
tables. There are a few types of outer joins:
LEFT JOIN returns only unmatched rows from the left table, as well as matched rows
in both tables.
RIGHT JOIN returns only unmatched rows from the right table , as well as matched
rows in both tables.
62
FULL OUTER JOIN returns unmatched rows from both tables,as well as matched rows
in both tables.
Note: LEFT JOIN is also refered to as OUTER LEFT JOIN. RIGHT JOIN is also refered to
as OUTER RIGHT JOIN. FULL OUTER JOIN is also refered to as OUTER JOIN.
o In the SQL outer JOIN, all the content from both the tables is integrated together.
o Even though the records from both the tables are matched or not, the matching and non-
matching records from both the tables will be considered an output of the outer join in
SQL.
63
Triggers and Active Databases:
Triggers:
Use of trigger
To implement any complex business rule, that cannot be implemented using integrity constraints.
Triggers will be used to audit the process. For example, to keep track of changes made to a table.
Trigger is used to perform automatic action when another concerned action takes place.
Active Databases:
64
Features of Active Database:
MODULE-3: UNIT-2
Topic-1: Problem Caused by Redundancy
Redundancy in DBMS is having several copies of the same data in the database. Redundancy
in DBMS occurs when the database is not normalized. Redundancy causes insertion, deletion,
and updation anomalies. Redundancy can be avoided by normalizing the database, maintaining
master data
Insertion Anomaly
If a student detail has to be inserted whose course is not being decided yet then insertion will
not be possible till the time course is decided for the student.
This problem happens when the insertion of a data record is not possible without
adding some additional unrelated data to the record.
Deletion Anomaly:
If the details of students in this table are deleted then the details of the college will also get
deleted which should not occur by common sense. This anomaly happens when the deletion
of a data record results in losing some unrelated information that was stored as part of the
record that was deleted from a table.
It is not possible to delete some information without losing some other information in the table
as well.
65
Updation Anomaly
Suppose the rank of the college changes then changes will have to be all over the database
which will be time-consuming and computationally costly.
Topic-2: Decompositions
The process of breaking up or dividing a single relation into two or more sub relations is called
as decomposition of a relation. No information is lost from the original relation during
decomposition. When the sub relations are joined back, the same relation is obtained that was
decomposed.
Types of Decomposition
Decomposition is of two major types in DBMS:
Lossless
Lossy
1. Lossless Decomposition
A decomposition is said to be lossless when it is feasible to reconstruct the original relation R
using joins from the decomposed tables. It is the most preferred choice. This way, the
information will not be lost from the relation when we decompose it. A lossless join would
eventually result in the original relation that is very similar.
For example,
Let us take ‘A’ as the Relational Schema, having an instance of ‘a’. Consider that it is
decomposed into: A1, A2, A3, . . . . An; with instance: a1, a2, a3, . . .. an, If a1 ⋈ a2 ⋈ a3 . . .
. ⋈ an, then it is known as ‘Lossless Join Decomposition’.
Just like the name suggests, whenever we decompose a relation into multiple relational
schemas, then the loss of data/information is unavoidable whenever we try to retrieve the
original relation.
2. Lossy Decomposition
Just like the name suggests, whenever we decompose a relation into multiple relational
schemas, then the loss of data/information is unavoidable whenever we try to retrieve the
original relation.
66
Properties of Decomposition
Decomposition must have the following properties:
2. Dependency Preservation
1. Loss of Information.
2. Loss of Functional Dependency. ...
3. Increased Complexity. ...
4. Redundancy.
5. Performance Overhead.
1. Loss of Information
Non-loss decomposition: When a relation is decomposed into two or more smaller relations,
and the original relation can be perfectly reconstructed by taking the natural join of the
decomposed relations, then it is termed as lossless decomposition. If not, it is termed "lossy
decomposition."
Example: Let's consider a table `R(A, B, C)` with a dependency `A → B`. If you decompose it
into `R1(A, B)` and `R2(B, C)`, it would be lossy because you can't recreate the original table
using natural joins.
Once tables are decomposed, certain functional dependencies might not be preserved, which
can lead to the inability to enforce specific integrity constraints.
Example: If you have the functional dependency `A → B` in the original table, but in the
decomposed tables, there is no table with both `A` and `B`, this functional dependency can't be
preserved.
3. Increased Complexity
67
Decomposition leads to an increase in the number of tables, which can complicate
queries and maintenance tasks. While tools and ORM (Object-Relational Mapping)
libraries can mitigate this to some extent, it still adds complexity.
4. Redundancy
Incorrect decomposition might not eliminate redundancy, and in some cases, can
even introduce new redundancies.
5. Performance Overhead
An increased number of tables, while aiding normalization, can also lead to more
complex SQL queries involving multiple joins, which can introduce performance
overheads.
Functional dependencies play a vital role in the normalization process in relational database
design. They help in defining the relationships between attributes in a relation and are used
to formalize the properties of the relation and drive the process of decomposition.
1. Trivial Dependency
- If Y is a subset of X, then the dependency X -> Y is trivial.
- For example, in {A, B} -> {A}, the dependency is trivial because A is part of {A, B}.
Example: For attributes A,B,C:
68
2. Full Functional Dependency
- An attribute functionally depends on a set of attributes, X, and does not functionally depend
on any proper subset of X.
Example: Consider a relation StudentCourses that has the following attributes:
3. Transitive Dependency
- If A -> B and B -> C, then A has a transitive dependency on C through B.
Example: Consider a relation Employees with the following attributes:
EmployeeID (unique identifier for each employee)
EmployeeName
Department (department in which the employee works)
DepartmentLocation (location of the department)
Now, let's consider the following functional dependencies:
EmployeeID → Department
Department → DepartmentLocation
From the above functional dependencies:
An EmployeeID determines the Department an employee works in.
A Department determines its DepartmentLocation.
However, the DepartmentLocation is also dependent on the EmployeeID through Department.
This means the DepartmentLocation has a transitive dependency on EmployeeID via
Department.
69
.
Topic-5: Normal Forms
First Normal Form (1NF): This is the most basic level of normalization. In 1NF,
each table cell should contain only a single value, and each column should have a
unique name. The first normal form helps to eliminate duplicate data and simplify
queries.
The first normal form (1NF) is the first step in normalizing a table by reducing
confusion and redundancy. In 1NF, we remove the redundant columns (columns
with the same name and/or data) and redundant fields (such as a full name field
when we already have first and last names), and add a primary key.
Example 1:
Relation STUDENT in table 1 is not in 1NF because of multi-valued attribute
STUD_PHONE. Its decomposition into 1NF has been shown in table 2.
70
Second Normal Form (2NF): 2NF eliminates redundant data by requiring that
each non-key attribute be dependent on the primary key. This means that each
column should be directly related to the primary key, and not to other columns.
A relation is said to be in the 2nd Normal Form in DBMS (or 2NF) when it is in the
First Normal Form but has no non-prime attribute functionally dependent on any
candidate key's proper subset in a relation. A relation's non-prime attribute refers
to that attribute that isn't a part of a relation's candidate key.
Example:
Look at the table given below: NOT IN SECOND NORMAL FORM :
111 S1 1000
222 S2 1500
111 S4 2000
444 S3 1000
444 S1 1000
222 S5 2000
71
Table 1:
CAND_NO SUBJECT_NO
111 S1
222 S2
111 S4
444 S3
444 S1
222 S5
Table 2:
SUBJECT_NO SUBJECT_FEE
S1 1000
S2 1500
S3 1000
S4 2000
S5 2000
Third Normal Form (3NF): 3NF builds on 2NF by requiring that all non-key
attributes are independent of each other. This means that each column should be
directly related to the primary key, and not to any other columns in the same table.
A given relation is said to be in its third normal form when it's in 2NF but has no
transitive partial dependency. Meaning, when no transitive dependency exists for
the attributes that are non-prime, then the relation can be said to be in 3NF.
Example:
EMPLOYEE_DETAIL table : NOT IN THIRD NORMAL FORM
72
EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY
EMPLOYEE table:
EMPLOYEE_ZIP table:
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
73
462007 MP Bhopal
Boyce-Codd Normal Form (BCNF): BCNF is a stricter form of 3NF that ensures
that each determinant in a table is a candidate key. In other words, BCNF ensures
that each non-key attribute is dependent only on the candidate key.
BCNF is a normal form that applies to relational databases that have a primary key,
which is a column or a combination of columns that uniquely identifies each row in
a table. BCNF states that every non-key column in a table should depend on the
whole primary key, and not on any subset of it.
Computer
101 CSE 1 202
Networks
VLSI
102 ECE 3 401
Technology
Mobile
102 ECE Communicati 3 402
on
IN BCNF:
Stu_Branch Table
Stu_ID Stu_Branch
101 CSE
102 ECE
74
Stu_Course Table
Stu_ID Stu_Course_No
101 201
101 202
102 401
102 402
When the sub relations combine again then the new relation must be the same as the
original relation was before decomposition.
75
The decomposition is lossless when it satisfies the following statement −
If we union the sub Relation R1 and R2 then it must contain all the attributes that are available in the
original relation R before decomposition.
Intersections of R1 and R2 cannot be Null. The sub relation must contain a common attribute. The
common attribute must contain unique data.
EXAMPLE:
R (A, B, C)
A B C
12 25 34
10 36 09
12 42 30
R1 (A, B)
A B
12 25
10 36
12 42
R2 (B, C)
B C
25 34
36 09
76
B C
42 30
R1U R2 = R
A B C
12 25 34
10 36 09
12 42 30
The relation is the same as the original relation R. Hence, the above decomposition is
Lossless-join decomposition.
77
R1 = (A, B, C) with FDs F1 = {A -> B, A -> C}, and
R2 = (C, D) with FDs F2 = {C -> D}.
F' = F1 ∪ F2 = {A -> B, A -> C, C -> D}
so, F' = F.
And so, F'+ = F+.
.
Topic-8: Schema Refinement in Database design
The Schema Refinement refers to refine the schema by using some technique. The best
technique of schema refinement is decomposition.
Identifying and clearing the future problems in the database is called schema refinement.
In this refinement main problem is data redundancy. It is avoided by normalization
technique.
The Basic Goal of Normalization is used to eliminate redundancy.
Redundancy refers to repetition of same data or duplicate copies of same data stored in
different locations.
Multivalued Dependency (MVD) is a form of data dependency where two or more attributes,
other than the key attribute, are functionally dependent on each other, but not on the key itself.
Data errors and redundancies may result from Multivalued Dependency.
o Multivalued dependency occurs when two attributes in a table are independent of each
other but, both depend on a third attribute.
o A multivalued dependency consists of at least two attributes that are dependent on a
third attribute that's why it always requires at least three attributes.
78
Example: Suppose there is a bike manufacturer company which produces two colors(white
and black) of each model every year.
Here columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and independent
of each other.
In this case, these two columns can be called as multivalued dependent on BIKE_MODEL.
The representation of these dependencies is shown below:
1. BIKE_MODEL → → MANUF_YEAR
2. BIKE_MODEL → → COLOR
The Fourth Normal Form (4NF) is a level of database normalization where there are no non-
trivial multivalued dependencies other than a candidate key. It builds on the first three normal
forms (1NF, 2NF, and 3NF) and the Boyce-Codd Normal Form (BCNF)
o A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued
dependency.
o For a dependency A → B, if for a single value of A, multiple values of B exists, then the
relation will be a multi-valued dependency.
79
A table with a multivalued dependency violates the normalization standard of the Fourth
Normal Form (4NF) because it creates unnecessary redundancies and can contribute to
inconsistent data. To bring this up to 4NF, it is necessary to break this information into two
tables.
.
Example
STUDENT
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
The given STUDENT table is in 3NF, but the COURSE and HOBBY are two independent entity.
Hence, there is no relationship between COURSE and HOBBY.
So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE
80
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
Topics Covered:
81
Transaction Concept:
Transaction in Database Management Systems (DBMS) can be defined as a set of
logically related operations. It is the result of a request made by the user to access the
contents of the database and perform operations on it. It consists of various operations
and has various states in its completion journey. It also has some specific properties
that must be followed to keep the database consistent.
Operations of Transaction
A user can make different types of requests to access and modify the contents of a
database. So, we have different types of operations relating to a transaction. They are
discussed as follows:
i) Read(X)
A read operation is used to read the value of X from the database and store it in a
buffer in the main memory for further actions such as displaying that value.
ii) Write(X)
A write operation is used to write the value to the database from the buffer in the main
memory. For a write operation to be performed, first a read operation is performed to
bring its value in buffer, and then some changes are made to it,
iii) Commit
This operation in transactions is used to maintain integrity in the database. Due to
some failure of power, hardware, or software, etc., a transaction might get interrupted
before all its operations are completed. a commit operation is performed to the
changes made by a transaction permanently to the database.
iv) Rollback
This operation is performed to bring the database to the last saved state when any
transaction is interrupted in between due to any power, hardware, or software failure.
Transaction State:
82
These are different types of Transaction States :
Active State –
When the instructions of the transaction are running then the transaction is in active
state. If all the ‘read and write’ operations are performed without any error then it
goes to the “partially committed state”; if any instruction fails, it goes to the “failed
state”.
1. Partially Committed –
After completion of all the read and write operation the changes are made in main
memory or local buffer.
2. Failed State –
When any instruction of the transaction fails, it goes to the “failed state” or if
failure occurs in making a permanent change of data on Data Base.
3. Aborted State –
After having any type of failure the transaction goes from “failed state” to “aborted
state” and since in previous states, the changes are only made to local buffer or
main memory and hence these changes are deleted or rolled-back.
4. Committed State –
It is the state when the changes are made permanent on the Data Base and the
transaction is complete and therefore terminated in the “terminated state”.
5. Terminated State –
If there isn’t any roll-back or the transaction comes from the “committed state”,
then the system is consistent and ready for new transaction and the old
transaction is terminated.
This scheme is based on making copies of the Database. These copies of Databases are also
known as Shadow Copies. Suppose there is an active transaction T1. Now, A pointer called
Db_pointer is maintained on the disk, which points to the current copy of the Database. The
transaction T1 that want to update the Database will first create a complete copy of the
Database. All further operations are done on new copy and leave that original copy
untouched.
If at any point transaction T1 is aborted, then system delete new copy and the old copy
is not affected. It remain untouched.
if Transaction T1 success then OS (Operating System) will make sure that all the pages
of new copy of Database are written in the disk. Database system update Db_pointer
to point to the new copy of Database. Now, new copy will become current copy of
Database. Also old copy of Database will be deleted. Transaction T1 is said to
committed only if updated Db-pointer is written on disk.
Concurrent Executions:
In a multi-user system, multiple users can access and use the same database at one time,
which is known as the concurrent execution of the database. It means that the same database
is executed simultaneously on a multi-user system by different users. While working on the
database transactions, there occurs the requirement of using the database by multiple users
for performing different operations, and in that case, concurrent execution of the database is
performed.
The thing is that the simultaneous execution that is performed should be done in an
interleaved manner, and no operation should affect the other executing operations, thus
maintaining the consistency of the database. Thus, on making the concurrent execution of the
transaction operations, there occur several challenging problems that need to be solved.
Serializability:
Serializability of schedules ensures that a non-serial schedule is equivalent to a serial schedule.
It helps in maintaining the transactions to execute simultaneously without interleaving one
another. In simple words, serializability is a way to check if the execution of two or more
transactions are maintaining the database consistency or not.
84
Types of Serializability
In DBMS, there are different types of serializable. Each type of serializable has some
advantages and disadvantages. The two most common types of serializable are view
serializability and conflict serializability.
1. Conflict Serializability
Conflict serializability is a type of conflict operation in serializability that operates the same data
item that should be executed in a particular order and maintains the consistency of the
database. In DBMS, each transaction has some unique value, and every transaction of the
database is based on that unique value of the database.
2. View Serializability
View serializability is a type of operation in the serializable in which each transaction should
produce some result and these results are the output of proper sequential execution of the data
item. Unlike conflict serialized, the view serializability focuses on preventing inconsistency in
the database. In DBMS, the view serializability provides the user to view the database in a
conflicting way.
Recoverability:
Recoverability is a property of database systems that ensures that, in the event of a failure or
error, the system can recover the database to a consistent state. Recoverability guarantees
that all committed transactions are durable and that their effects are permanently stored in
the database, while the effects of uncommitted transactions are undone to maintain data
consistency.
The recoverability property is enforced through the use of transaction logs, which record all
changes made to the database during transaction processing. When a failure occurs, the
system uses the log to recover the database to a consistent state, which involves either
undoing the effects of uncommitted transactions or redoing the effects of committed
transactions.
Implementation of Isolation:
Isolation is one of the core ACID properties of a database transaction, ensuring that
the operations of one transaction remain hidden from other transactions until
completion. It means that no two transactions should interfere with each other and
affect the other's intermediate state.
Implementing isolation typically involves concurrency control mechanisms. Here are
common mechanisms used:
85
1. Locking Mechanisms
Locking ensures exclusive access to a data item for a transaction. This means that while one
transaction holds a lock on a data item, no other transaction can access that item.
Shared Lock (S-lock): Allows a transaction to read an item but not write to it.
Exclusive Lock (X-lock): Allows a transaction to read and write an item. No other
transaction can read or write until the lock is released.
Two-phase Locking (2PL): This protocol ensures that a transaction acquires all the locks
before it releases any. This results in a growing phase (acquiring locks and not releasing
any) and a shrinking phase (releasing locks and not acquiring any).
2. Timestamp-based Protocols
Every transaction is assigned a unique timestamp when it starts. This timestamp
determines the order of transactions. Transactions can only access the database if they
respect the timestamp order, ensuring older transactions get priority.
Precedence Graph
It can be described as a Graph G(V, E) with vertices V = “V1, V2, V3,…, Vn” and
directed edges E = “E1, E2, E3,…, En”. One of the two operations—READ or WRITE—
performed by a certain transaction is contained in the collection of edges. Where Ti ->
Tj, means Transaction-Ti is either performing read or write before the transaction-Tj.
86
Lock Based Protocols:
In this type of protocol, any transaction cannot read or write data until it acquires an appropriate
lock on it. There are two types of lock:
1. Shared lock:
o It is also known as a Read-only lock. In a shared lock, the data item can only read by the
transaction.
o It can be shared between the transactions because when the transaction holds a lock,
then it can't update the data on the data item.
2. Exclusive lock:
o In the exclusive lock, the data item can be both reads as well as written by the transaction.
o This lock is exclusive, and in this lock, multiple transactions do not modify the same data
simultaneously.
It is the simplest way of locking the data while transaction. Simplistic lock-based
protocols allow all the transactions to get the lock on the data before insert or delete or
update on it. It will unlock the data item after completing the transaction.
87
3. Two-phase locking (2PL)
o The two-phase locking protocol divides the execution phase of the transaction into three
parts.
o In the first part, when the execution of the transaction starts, it seeks permission for the
lock it requires.
o In the second part, the transaction acquires all the locks. The third phase is started as
soon as the transaction releases its first lock.
o In the third phase, the transaction cannot demand any new locks. It only releases the
acquired locks.
88
4. Strict Two-phase locking (Strict-2PL)
o The first phase of Strict-2PL is similar to 2PL. In the first phase, after acquiring all the locks, the
transaction continues to execute normally.
o The only difference between 2PL and strict 2PL is that Strict-2PL does not release a lock after
using it.
o Strict-2PL waits until the whole transaction to commit, and then it releases all the locks at a time.
Timestamp-based protocols in dbms are used to order the transaction in ascending order of
their creation time. The creation time is the system time or a logical counter.
The transaction which is created first or you can say older transactions are given high priority
over new transactions.
For example, if there are two transactions T1 and T2. T1 enters the system at 008 and T2 enters
the system at 009 then T1 is given priority over T2.
89
Timestamp-based protocols in dbms order the transaction according to their transaction
timestamps. A schedule that is ordered in the serial order of their transaction timestamp is the
only serializable schedule equivalent to the timestamp-ordered-based transaction schedule.
In order to ensure that the conflicting operation occurring in the schedule does not violate the
timestamp ordering two-time stamp values relating to each database item (X) are used :
W_TS(X) (write timestamp) is the largest timestamp of any transaction that executed
write(X) successfully.
R_TS(X) (read timestamp) is the largest timestamp of any transaction that executed
read(X) successfully.
In this technique, no checking is done while the transaction is been executed. Until the
transaction end is reached updates in the transaction are not applied directly to the database.
All updates are applied to local copies of data items kept for the transaction. At the end of
transaction execution, while execution of the transaction, a validation phase checks whether
any of transaction updates violate serializability. If there is no violation of serializability the
transaction is committed and the database is updated; or else, the transaction is updated and
then restarted.
1. Read Phase:
Values of committed data items from the database can be read by a transaction.
Updates are only applied to local data versions.
2. Validation Phase:
Checking is performed to make sure that there is no violation of serializability
90
when the transaction updates are applied to the database.
3. Write Phase:
On the success of the validation phase, the transaction updates are applied to the
database, otherwise, the updates are discarded and the transaction is slowed
down.
Multiple Granularity:
o It can be defined as hierarchically breaking up the database into blocks which can be
locked.
o The Multiple Granularity protocol enhances concurrency and reduces lock overhead.
o It maintains the track of what to lock and how to lock.
o It makes easy to decide either to lock a data item or to unlock a data item. This type of
hierarchy can be graphically represented as a tree.
Hence, the levels of the tree starting from the top level are as follows:
1. Database
2. Area
3. File
4. Record
91
MODULE-4: UNIT-2
UNIT-2: Syllabus: Recovery System
Failure Classification, Storage Structure, Recovery and Atomicity, Log Based Recovery,
Recovery and Concurrent Transactions, Buffer Management, Failure and Loss of Non-
volatile Storage, Advance Recovery Techniques, Remote Backup Systems.
Failure Classification:
To find that where the problem has occurred, we generalize a failure into the following
categories:
1. Transaction failure
2. System crash
3. Disk failure
92
1. Transaction failure
The transaction failure occurs when it fails to execute or when it reaches a point from
where it can't go any further. If a few transaction or process is hurt, then this is called as
transaction failure.
1. Logical errors: If a transaction cannot complete due to some code error or an internal
error condition, then the logical error occurs.
2. Syntax error: It occurs where the DBMS itself terminates an active transaction because
the database system is not able to execute it. For example, The system aborts an active
transaction, in case of deadlock or resource unavailability.
2. System Crash
o System failure can occur due to power failure or other hardware or software
failure. Example: Operating system error.
3. Disk Failure
o It occurs where hard-disk drives or storage drives used to fail frequently. It was a common
problem in the early days of technology evolution.
o Disk failure occurs due to the formation of bad sectors, disk head crash, and
unreachability to the disk or any other failure, which destroy all or part of disk storage.
Storage Structure:
The electromagnetic devices used in database systems for data storage are classified as
follows:
1. Primary Memory
2. Secondary Memory
3. Tertiary Memory
93
1. Primary Memory
The primary memory of a server is the type of data storage that is directly accessible by the
central processing unit. The primary memory must, in general, function. ..
The size of these devices is considerably smaller and they are volatile.
According to performance and speed, the primary memory devices are the fastest
devices, and this feature is in direct correlation with their capacity.
These primary memory devices are usually more expensive due to their increased speed
and performance.
2. Secondary Memory
Data storage devices known as secondary storage, as the name suggests, are devices that
can be accessed for storing data that will be needed at a later point in time for various
purposes or database actions. ,
Example:
Flash Memory: Flash memory, also known as flash storage, is a type of nonvolatile
memory that erases data in units called blocks and rewrites data at the byte level. Flash
memory is widely used for storage and data transfer in consumer devices, enterprise
systems, and industrial applications.
94
Magnetic Disk Storage: A Magnetic Disk is a type of secondary memory that is a flat disc
covered with a magnetic coating to hold information. It is used to store various programs
and files.
3. Tertiary Memory
For data storage, Tertiary Memory refers to devices that can hold a large amount of data
without being constantly connected to the server or the peripherals. A device of this type is
connected either to a server or to a device where the database is stored from the outside.
Due to the fact that tertiary storage provides more space than other types of device
memory but is most slowly performing, the cost of tertiary storage is lower than primary
and secondary. .
The ability to use secondary devices and to delete the contents of the tertiary devices is
similar.
Example:
Optical Storage: Typically data written on CDs and DVDs are examples of Optical
Storage.
Tape Storage: Tape Storage is a type of storage data where we use magnetic tape to
store data. It is used to store data for a long time and also helps in the backup of data in
case of data loss.
The recovery procedures in DBMS ensure the database's atomicity and durability. If a system
crashes in the middle of a transaction and all of its data is lost, it is not regarded as durable. If
just a portion of the data is updated during the transaction, it is not considered atomic. Data
recovery procedures in DBMS make sure that the data is always recoverable to protect the
durability property and that its state is retained to protect the atomic property. The procedures
listed below are used to recover data from a DBMS,
95
The atomicity attribute of DBMS safeguards the data state. If a data modification is performed,
the operation must be completed entirely, or the data's state must be maintained as if the
manipulation never occurred. This characteristic may be impacted by DBMS failure brought on
by transactions, but DBMS recovery methods will protect it.
Log is nothing but a file which contains a sequence of records, each log record refers
to a write operation. All the log records are recorded step by step in the log file. We
can say, log files store the history of all updates activities.
Log contains start of transaction, transaction number, record number, old value, new
value, end of transaction etc. For example, mini statements in bank ATMs.
Redo(Ti) − All data items updated by the transaction Ti are set to a new value.
Undo(Ti) − All data items updated by the transaction Ti, are set to old value.
Redo(Ti) − All data items updated by the transaction Ti are set to a new value.
96
Recovery and Concurrent Transactions:
Whenever more than one transaction is being executed, then the interleaved of logs occur.
During recovery, it would become difficult for the recovery system to backtrack all logs and then
start recovering.
Recovery with concurrent transactions can be done in the following four ways.
1. Interaction with concurrency control
2. Transaction rollback
3. Checkpoints
4. Restart recovery
Transaction Rollback
Sometimes, instead of recovering the whole system, it's more efficient to just rollback a
particular transaction that has caused inconsistency or when a deadlock occurs.
When errors are detected during transaction execution (like constraint violations) or if a user
issues a rollback command, the system uses the logs to undo the actions of that specific
transaction, ensuring the database remains consistent.
Checkpoints
Checkpointing is a technique where the DBMS periodically takes a snapshot of the current state
of the database and writes all changes to the disk. This reduces the amount of work during
recovery.
During the checkpoint, ongoing transactions might be temporarily suspended, or their logs
might be force-written to the stable storage, depending on the implementation.
97
Restart Recovery
In the case of a system crash, the recovery manager uses the logs to restore the database to
the most recent consistent state. This process is called restart recovery.
The redo phase then ensures that all logged updates of committed transactions are reflected in
the database. Lastly, the undo phase rolls back the transactions that were active during the
crash to ensure atomicity.
Buffer Management:
Data is read and written to and from the disc by a DBMS. In general, disc I/O operations
take longer than memory operations. Data that is often accessed is kept in memory by
the database buffer, which improves speed by lowering the need for disc I/O
operations.
The database buffer reduces the amount of time it takes to retrieve frequently used
data by storing it in memory.
Data consistency is managed by the database buffer as well. When an application alters
data in the buffer, the buffer labels the modified data as "dirty." When there is enough
room in the buffer or when the data is no longer required, the buffer sends the
modifications back to the disc.
98
Supporting Concurrency Control
Concurrency management is also supported by the database buffer. The buffer makes
sure that the data is consistent and that all apps may access it without interfering with
one another when several applications are using the same data at once.
Once this restoration has been accomplished, the system uses the log to bring the database
system to the most recent consistent state.
More precisely, no transaction may be active during the dump procedure, and a procedure
similar to checkpointing must take place:
1. Output all log records currently residing in main memory onto stable storage.
To recover from the loss of nonvolatile storage, the system restores the database to disk by
using the most recent dump. Then, it consults the log and redoes all the transactions that
have committed since the most recent dump occurred. Notice that no undo operations need to
be executed.
A dump of the database contents is also referred to as an archival dump, since we can
archive the dumps and use them later to examine old states of the database. Dumps of a
database and checkpointing of buffers are similar. .
99
Remote backup systems provide a wide range of availability, allowing the transaction
processing to continue even if the primary site is destroyed by a fire, flood or
earthquake.
Data and log records from a primary site are continuously backed up into a remote
backup site.
One can achieve ‘wide range availability’ of data by performing transaction processing
at one site, called the ‘primary site’, and having a ‘remote backup’ site where all the
data from the primary site are duplicated.
The remote site is also called ‘secondary site’.
The remote site must be synchronized with the primary site, as updates are performed
at the primary.
In designing a remote backup system, the following points are important.
a) Detection of failure: It is important for the remote backup system to detect when the
primary has failed.
b) Transfer of control: When the primary site fails, the backup site takes over the
processing and becomes the new primary site.
c) Time to recover: If the log at the remote backup becomes large, recovery will take a
long time.
MODULE-5: UNIT-1
UNIT-1: Syllabus: Storage and Indexing
Data on External Storage, File Organization and indexing, Cluster indexes, Primary and
Secondary indexes, Index Data Structures, Hash Based indexing, Tree Based indexing,
Comparison of File Organizations, Indexes and Performance.
Topics:
Data on External Storage:
100
An external storage device, also referred to as auxiliary storage and secondary storage, is a
device that contains all the addressable data storage that is not inside a computer's main
storage or memory.
External storage enables users to store data separately from a computer's main or primary
storage and memory at a relatively low cost. It increases storage capacity without having to
open up a system.
External storage often stores information that's accessed less frequently by applications running
on a desktop, laptop, server or mobile device, such as an Android or iOS smartphone or tablet.
For PCs, an external storage device often consists of stationary or portable hard disk drives
(HDDs) or solid-state drives (SSDs), possibly attached through a USB or wirelessly.
For enterprises, an external storage device can serve as primary storage connected to servers
through Ethernet or Fibre Channel switches, or as secondary storage for backup and archiving
purposes. External storage offers HDD, all-flash and hybrid storage arrays for block-based, file-
based or object storage, or a mix of these three protocols known as unified storage. Storage
area networks (SANs) for block-level storage and network-attached storage (NAS) devices for
file-based storage are examples of external storage.
File organization and indexing are fundamental concepts in database management systems
(DBMS) that deal with efficient storage, retrieval, and management of data.
File Organization
File organization refers to the arrangement of data on storage devices. The method chosen can
have a profound effect on the efficiency of various database operations.
101
Common methods of file organization include:
Ordered Records: Records in a sequential file are stored based on a key field.
Continuous Memory Allocation: The records are stored in contiguous memory locations.
No Direct Access: To access a record, you have to traverse from the first record until you find
the desired one.
Less Overhead: There's no need for complex algorithms or mechanisms to store records.
Hash Function: A hash function converts a record's key value into an address.
Buckets: A bucket typically stores one or more records. A hash function might map multiple
keys to the same bucket.
No Ordering of Records: Records are not stored in any specific logical order.
Primary Data File: The actual database file where records are stored.
Index: An auxiliary file that contains key values and pointers to the corresponding records in the
data file.
102
Multi-level Index: Sometimes, if the index becomes large, a secondary (or even tertiary) index
can be created on the primary index to expedite searching further.
Indexing
Indexing involves creating an auxiliary structure (an index) to improve data retrieval times. Just
like the index in the back of a book, a database index provides pointers to the locations of
records.
Structure of Index
We can create indices using some columns of the database.
|-------------|----------------|
| Search Key | Data Reference |
|-------------|----------------|
The search key is the database’s first column, and it contains a duplicate or copy of the table’s
candidate key or primary key. The primary key values are saved in sorted order so that the
related data can be quickly accessible.
The data reference is the database’s second column. It contains a group of pointers that point to
the disk block where the value of a specific key can be found.
Cluster indexes
Clustering indexing is a database indexing technique that is used to physically arrange
the data in a table based on the values of the clustered index key. This means that the
rows in the table are stored on disk in the same order as the clustered index key. With
a clustered index, the database can more efficiently retrieve data because it doesn’t
have to scan the entire table to find the data it needs. Instead, it can use the clustered
index to quickly locate the data, resulting in faster query execution times and improved
overall performance.
103
Advantages
Improved Query Performance: Clustering indexing results in faster query
performance, as the data is stored in a way that makes it easier to retrieve the
desired information.
Reduced Disk Space Usage: Clustering indexing reduces the amount of disk
space required to store the index. This is because the index contains only the
information necessary to retrieve the data, rather than storing a copy of the data
itself.
Better Handling of Complex Queries: Clustering indexing provides better
performance for complex queries that involve multiple columns.
Improved Insert Performance: Clustering indexing can result in improved insert
performance, as the database does not have to update the index every time a new
record is inserted.
Improved Data Retrieval: Clustering indexing can also improve the efficiency of
data retrieval operations. In a clustered index, the data is stored in a logical order,
which makes it easier to locate and retrieve the data.
.
Primary and Secondary indexes
Primary Index
o If the index is created on the basis of the primary key of the table, then it is known as primary
indexing. These primary keys are unique to each record and contain 1:1 relation between the
records.
o As primary keys are stored in sorted order, the performance of the searching operation is quite
efficient.
Secondary indexes
For example:
104
o If you want to find the record of roll 111 in the diagram, then it will search the highest
entry which is smaller than or equal to 111 in the first level index. It will get 100 at this
level.
o Then in the second index level, again it does max (111) <= 111 and gets 110. Now using
the address 110, it goes to the data block and starts searching each record till it gets 111.
o The first column of the database is the search key that contains a copy of the primary key or
candidate key of the table. The values of the primary key are stored in sorted order so that the
corresponding data can be accessed easily.
o The second column of the database is the data reference. It contains a set of pointers holding
the address of the disk block where the value of the particular key can be found.
105
Hashing in DBMS is a technique to quickly locate a data record in a database
irrespective of the size of the database. For larger databases containing thousands
and millions of records, the indexing data structure technique becomes very inefficient
because searching a specific record through indexing will consume more time. This
doesn’t align with the goals of DBMS, especially when performance and date retrieval
time are minimized. So, to counter this problem hashing technique is used. In this
article, we will learn about various hashing techniques.
Types of Hashing in DBMS
1. Static Hashing
In static hashing, the hash function always generates the same bucket’s address. For
example, if we have a data record for employee_id = 107, the hash function is mod-5
which is – H(x) % 5, where x = id. Then the operation will take place like this:
2. Dynamic Hashing
Dynamic hashing is also known as extendible hashing, used to handle database that
frequently changes data sets. This method offers us a way to add and remove data
buckets on demand dynamically. This way as the number of data records varies, the
buckets will also grow and shrink in size periodically whenever a change is made.
106
Comparison of File Organizations
In sequential file organization, records are stored in sequence, one after the other, based on a
key field. This key field is a unique identifier for records, ensuring that they have some order.
The records are inserted at the end of the file, ensuring the sequence is maintained.
Heap File
Heap File Organization works with data blocks. In this method records are inserted at the
end ofthe file, into the data blocks. No Sorting or Ordering is required in this method. If a data
block isfull, the new record is stored in some other block, Here the other data block need not
be the verynext data block, but it can be any block in the memory. It is the responsibility of
DBMS to storeand manage the new records.
In hash file organization, a hash function is used to compute the address of a block (or bucket)
where the record is stored. The value returned by the hash function using a record's key value
is its address in the database.
107
Indexed File Organization
Indexed file organization is a method used to store and retrieve data in databases. It is designed
to provide quick random access to records based on key values. In this organization, an index
is created which helps in achieving faster search and access times.
Overflow areas are designated for insertion of new records, which keeps the main file in sequence.
Periodically, the overflow area can be merged back into the main file.
.
UNIT-2: Syllabus: Tree Structured Indexing
108
Indexed Sequential Access Method s (ISAM), B+ Trees, A Dynamic Index Structure
If any record has to be retrieved based on its index value, then the address of the data block is
fetched and the record is retrieved from the memory.
Advantages:
o In this method, each record has the address of its data block, searching a record in a
huge database is quick and easy.
o This method supports range retrieval and partial retrieval of records. Since the index is
based on the primary key values, we can retrieve the data for the given range of value.
In the same way, the partial value can also be easily searched, i.e., the student name
starting with 'JA' can be easily searched.
Disadvantages:
o This method requires extra space in the disk to store the index value.
109
o When the new records are inserted, then these files have to be reconstructed to maintain the
sequence.
o When the record is deleted, then the space used by it needs to be released. Otherwise, the
performance of the database will slow down.
B+ Trees:
A B+ Tree is a type of self-balancing tree structure commonly used in databases and file
systems to maintain sorted data in a way that allows for efficient insertion, deletion, and search
operations. B+ Trees are an extension of B-Trees but differ mainly in the way they handle leaf
nodes, which contain all the key values and point to the actual records.
3. The tree is perfectly balanced, meaning that all leaf nodes are at the same level.
4. All keys are stored in the leaf nodes, and the internal nodes act as 'guides' to locate the leaf nodes
faster.
Operations on B+ Trees:
1. Search: Starts at the root and traverses down the tree, guided by the key values in each node,
until it reaches the appropriate leaf node.
2. Insert: Inserts a new key-value pair and then reorganizes the tree as needed to maintain its
properties.
3. Delete: Removes a key-value pair and then reorganizes the tree, again to maintain its properties.
110
The purpose of indexing is to provide an efficient way to search from a large collection of the
database. In order to generate meaningful retrieval results, recent retrieval systems have
incorporated users' relevance feedback to modify the retrieval process.
This makes them more flexible and scalable, capable of handling varying workloads and data
distributions efficiently.
Dynamic index structures are crucial for databases that are expected to evolve over time,
especially when it comes to insertion, deletion, and updating of records. They maintain balanced
tree structures, self-adjust, and optimize, ensuring that the database operations remain efficient
even as the data grows or changes.
B-Trees and B+-Trees: These are perhaps the most well-known examples of dynamic index
structures. B-Trees and B+-Trees automatically split or merge nodes to maintain a balanced
tree, ensuring that search, insert, and delete operations take logarithmic time. They are widely
used in various types of databases and filesystems.
111
The UNION operator combines the results of two or more SELECT statements into a single result set, with only distinct values being included by default . It requires the same number of columns and compatible data types across all SELECT statements . In contrast, the INTERSECT operator returns only the common rows found in both result sets, effectively an intersection . While UNION eliminates duplicates and allows ordering, INTERSECT inherently has no duplicates and also arranges data in ascending order by default. The choice between these operators impacts query results by either combining distinct datasets or focusing only on mutual elements .
A hash function is used in hash file organization to convert a record's key value into an address, which determines the data block where the record is stored . This method allows for quick access to records by directly calculating the data location without needing sequential searching. However, it means that the data is not stored in a specific order, possibly causing collisions where multiple records are mapped to the same bucket. This necessitates mechanisms to handle such collisions, influencing retrieval efficiency .
ISAM's main limitations include its static nature and the need for periodic maintenance of overflow areas . Since ISAM uses a static primary index that does not adjust dynamically with data changes, it can lead to inefficient data access patterns if the data distribution significantly alters. Overflow areas temporarily house new records, which disrupts the sequential ordering, requiring periodic merging back into the main file to restore order and performance. These factors can degrade retrieval performance over time, requiring careful management and maintenance to uphold optimal access speeds .
Indexed and hashed file organizations differ significantly in structure and performance. Indexed file organization uses an auxiliary file (index) that contains key values and pointers to database records, allowing quick random access based on key values . This structure is beneficial for range and partial retrievals due to orderly data storage . In contrast, hashed file organization uses a hash function to compute a record's storage address, offering fast data retrieval for exact queries but often leading to unordered data storage. Hashed organization is typically faster for individual record access but less efficient for range queries due to unordered storage .
SQL JOIN operations are used to selectively combine data from multiple tables. An INNER JOIN returns records with matching values in both tables, typically used when ensuring data inclusion from both datasets . LEFT JOIN returns all records from the left table and matched records from the right table, useful for retaining full data from a primary dataset while including related data from another . RIGHT JOIN is similar, but focuses on preserving all records from the right table. FULL JOIN combines results of both LEFT and RIGHT JOIN operations, including all records from both tables with matches where available, addressing comprehensive data integration scenarios .
In timestamp-based concurrency control protocols, each transaction is given a unique timestamp upon initiation, determining its priority and order in the schedule . The timestamps ensure that operations comply with their temporal order, maintaining consistency. Two-time stamp values, R_TS (read) and W_TS (write), track the timestamps of successful read and write operations on database items, respectively. These timestamps prevent conflicting transactions from violating the set order, thus enforcing serializability by giving priority to older transactions, ensuring isolation and consistency in transaction execution .
View serializability is used to prevent database inconsistency by ensuring that the database reflects a sequence of transactions as if they were executed in isolation, even if they were interleaved in practice . It maintains consistency by allowing the user to perceive transactions in a serial order that could produce the same results as if the transactions were processed one after the other without interference. The primary focus is on ensuring that the final state of the database and the views drawn from it remain consistent, avoiding conflicts between transactions .
Multiple granularity enhances lock management by hierarchically breaking down the database into blocks that can be independently locked . This approach reduces lock overhead and improves concurrency control by allowing different levels of locking granularity, which can either narrow down or expand depending on the transaction needs. By distinguishing hierarchically between data items, multiple granularity allows flexible lock control, reducing unnecessary locking and unlocking actions, thereby improving database performance and resource efficiency .
DBMS recoverability ensures data consistency post-failure by recording all changes made during transaction processing in transaction logs . These logs enable the system to differentiate between committed and uncommitted transactions. Upon failure, the system reverts uncommitted changes to maintain consistency or redoes committed changes to ensure durability. This process guarantees that the database returns to a consistent state where only durable transactions persist, safeguarding against data inconsistency caused by incomplete or erroneous transactions .
Optimistic concurrency control techniques differ from traditional locking methods by assuming low conflict rates and deferring consistency checks until the transaction's end . Instead of acquiring locks, these techniques use phases: read phase, where data is read and local copies are updated; validation phase, which checks for serializability violations before committing changes; and a write phase to apply updates if validation succeeds. Traditional methods like locking involve acquiring locks during the transaction, potentially causing wait times and deadlocks. Optimistic techniques minimize such contention but risk rollback if validation fails, trading immediate consistency for increased concurrency .