Overview of Database Management Systems
Overview of Database Management Systems
UNIT– I
Database System Applications: A Historical Perspective, File Systems versus a DBMS, the
Data Model, Levels of Abstraction in a DBMS, Data Independence, Structure of a DBMS
Introduction to Database Design: Database Design and ER Diagrams, Entities, Attributes,
and Entity Sets, Relationships and Relationship Sets, Additional Features of the ER Model,
Conceptual Design With the ER Model
1. INTRODUCTION
Data: Data is a piece of information. Data can exist in a variety of forms:
As numbers or text on pieces of paper
As bits and bytes stored in computer memory
As facts stored in a person's mind.
Data is raw information and it does not give correct meaning. The processed data becomes
Information and it gives correct meaning.
2. A HISTORICAL PERSPECTIVE
From the earliest days of computers, storing and manipulating data have been a major application
focus. The first general-purpose DBMS called the Integrated Data Store (IDS) was designed by
Charles Bachman in the early1960s. It formed the basis for the network data model.
In the late 1960s, IBM developed the Information Management System (IMS). This formed
the basis for an alternative data representation framework called the hierarchical data model.
In 1970, Edgar Codd, proposed a new data representation framework called the relational data
model. In a relational data model, the data is stored in the form of table containing rows and
columns. This became very famous database model. The SQL (Structured Query Language) is
the standard query language used to access relational databases.
Several vendors (e.g., IBM's DB2, Oracle 8, etc) developed data warehouses. A Data
warehouse collects data from several databases and this data is used for carrying out specialized
analysis.
In mid 90s, DBMSs have entered the Internet Age. All the database vendors are added
features to their DBMS aimed at making it more suitable for deployment over the Internet.
Database management continues to gain more popularity and more data is brought online to
access through computer networking.
Today the field is being driven by exciting visions such streaming data (youtube, vimeo, etc)
as interactive video (flash, wire wax etc), multimedia databases (facebook, instagram, gaana etc),
digital libraries (DELNET, Shodh ganga, etc). Thus the study of database systems could prove to
be richly rewarding in more.
1. Telecom: A database is required to keep track of the information regarding calls history,
network usage, customer details, generating monthly bills, maintaining balances on
prepaid calling cards etc. Without the database systems it is hard to maintain that huge
amount of data that keeps updating every millisecond. Ex: Airtel, IDEA, Jio, etc
2. Banking System: A database stores bank customer’s information, maintain day to day
credit and debit transactions, generate bank statements etc. Ex: SBI, HDFC, etc
3. Online shopping: The online shopping websites store the product information, your
addresses and preferences, credit details and provide you the relevant list of products
based on your query. Ex: Amazon, Flipkart etc.
4. Airlines: Passenger details, reservation in formation along with flight schedule is stored
in database. Eg: Air India, Indigo, etc
5. Education sector: Database systems are used in schools, colleges and universities tostore
and retrieve the data regarding student details, staff details, course details, exam details,
attendance details, fees details etc. Ex: JNTUH, IITB, etc
6. Sales: To store customer information, stock details and invoice details a database is
needed. Ex: Reliance Fresh, D-Mart etc.
7. Human resources: For information about employees, salaries, payroll taxes, and benefits
and for generation of paychecks a database is required.
8. Credit card transactions: For purchases on credit cards and generation of monthly
statements.
9. Stock market: For storing information about holdings, sales, and purchases of stocks;
also for storing real-time market data to enable online trading.
Backup and Backup and recovery process is not DBMS has a sophisticated backup and
Recovery efficient in files system. recovery techniques.
Process
Concurrent Concurrent access to the data in the DBMS takes care of Concurrent access
access file system has many problems using some form of locking.
User can locates the physical
Physical In DBMS, user is unaware of physical
address of the files to access data in
address address where data is stored.
File System.
Security File system provides less security to DBMS provides more security to the
the data as compared to DBMS. data.
FAT, NTFS and Ext are some MySQL, MS-Access, Oracle, and DB2
Example examples of file systems. are some examples of DBMS.
Figure:NetworkModel
Hierarchical Model
Hierarchical database model organizes data into a tree-like-structure, with a single root, to which
all the other data is linked. In this model, a child node will only have a single parent node.
Figure:HierarchicalData Model
Entity-Relationship Model
Entity-Relationship (ER) Model is based on the notion of real-world entities and relationships
among them. ER Model is best used for the conceptual design of a database. While formulating
real-world scenario into the database model, it depend on two important things. They are:
Entity and their attributes
Relationships among entities
Relational Model
The most popular data model in DBMS is the Relational Model. The relational model contains a
set of tables (relations). Each table has a specified number of columns but can have any number
of rows.
Admission No Name Age Class
1001 Ram 15 9
1002 Ajay 14 9
1003 Jhon 14 9
1004 Akbar 15 10
View Level
LogicalLevel
PhysicalLevel
10. Physical level: This is the lowest level of data abstraction. It describes how data is actually
stored in database. It deals with physical memory storage details of records. These details are
often hidden from the programmers.
11. Logical level: This is the middle level of 3-level data abstraction architecture. It describes
what data is stored in database. This level gives details about each attribute data type and size,
the relationship among attributes and defined constraints (Primary key, foreign key etc) on the
table. The programmers generally work at this level because they are aware of such things
about database systems.
12. View level: This is the highest level of data abstraction. This level describes the user
interaction with database system. At this level, user enters the query to get the answer. Many
users may require different sets of fields from a table. Therefore there exist many view levels.
7. DATA INDEPENDENCE
Data Independence is defined as a property of DBMS that helps you to change the Database
schema at one level without requiring changing the schema at the next higher level. Data
independence helps you to keep data separated from all programs that make use of it.
Physical Data Independence: Physical data independence is the ability to change the
internal schema without having to change the conceptual schema. That is, if we do any changes
in the storage side of the database system server, then the Conceptual structure of the database
will not be affected. For example, in case we want to change or upgrade the storage system itself
−suppose we want to replace hard-disks with SSD−it should not have any impact on the logical
data or schemas.
Logical Data Independence: Logical data independence is the ability to change the
conceptual schema without having to change the external schema. That is, if we do any changes
in the logical view of the data, then the user view/ external view of the data should not be
affected.
8. STRUCTUREOFADBMS
The DBMS accepts SQL commands generated from a variety of user interfaces such as web
forms, applications, SQL interface and etc. When a user issues a query, the parsed query is
presented to a query optimizer, which uses information about how the data is stored to produce
an efficient execution plan for evaluating the query. An execution plan is a blueprint for
evaluating a query. It executes these plans against the database, and returns the answers to the
user.
QueryEval
uationEngi
ne
Figure:DBMSStructure
DBMS consists of a Query Evaluation engine which accepts commands from the front end
applications like web forms, SQL interfaces and evaluates the query to retrieve the requested
data.
Query Evaluation engine consists of the following components
o Parser: It parses the received SQL commands.
o Operator evaluator: It evaluates the operators used in the query.
o Plan executor: It designs a plan to obtain the result.
o Optimizer: It optimizes the query to improve the process of retrieving the result ant
data.
File and access methods: It is responsible for the abstraction of file structures stored and for
creating indexes on the files for faster access.
Buffer Manager: The purpose of buffer manager is to move pages in and out from a disk to
main memory.
Disk-Space Manager: It manages space on the disk by providing empty space for new
requests, deleting space allocated for existing files which are deleted by the user.
Transaction Manager and lock manager: It is responsible for maintaining concurrency of
the data, when accessed by multiple users.
Recovery manager: It is responsible for maintaining log files and supports crash recovery.
When a system crashes recovery manager is responsible for bringing the system to a safe
state.
9. DATABASE DESIGN
The database design process can be divided into six steps.
i. Requirements Analysis: The very first step in designing a database application is to gather
information from different stake holders such as management, employees and end users. The
development team conducts discussions with different user groups, study the current
operating environment, analyze any available documentation on existing applications and
gather all of the types of information that to be recorded in the database. The gathered
information is documented properly.
ii. Conceptual Database Design: The information gathered in the requirements analysis step is
used to develop Entity Relationship (ER) model. The ER model facilitates discussion among
all the people involved in the design process, even those who have no technical background.
iii. Logical Database Design: The task in this stage is to convert the ER model into relational
schemas. Each entity and each relationship is converted into a relation or a table.
iv. Schema Refinement: The fourth step in database design is to analyze the collection of
relations in our relational database schema to identify potential problems, and to refine it.
This process is called normalization.
v. Physical Database Design: In this step, the database design is refined to ensure that it meets
desired performance criteria and satisfies the expected workload. This step may simply
involve building indexes on some tables and clustering some tables, or it may involve a
substantial redesign of parts of the database schema obtained from the earlier design steps.
vi. Application and Security Design: Foreachrole(manager/accountant/clerk),some part of the
database is accessible and other part of the database is not accessible. The software developer
should enforce these accessing rules while developing the applications (using application
languages like java) to access data using DBMS.
Realistically, all above six design steps are repeated until the design is satisfactory to complete
database design.
10. ERDIAGRAMS
An entity-relationship (ER) diagram is a graphical representation of entities and their
relationships to each other, typically used to the organization of data within databases. An entity-
relationship (ER) diagram is also called as an entity relationship model.
Component of ER Diagram
The cardinality of a relationship is the number of instances of entity B that can be associated
with entity A. Based on the cardinality; the relationships are classified into four types. They are:
a. One-to-One Relationship: When only one instance of an entity is associated with the
relationship, then it is known as one to one relationship.
Example: A female can marry to one male, and a male can marry to one female.
b. One-to-many relationship: When only one instance of the entity on the left, and more than
one instance of an entity on the right associates with the relationship then this is known as a one-
to-many relationship.
Example: Scientist can invent many inventions, but the invention is done by the only specific
scientist.
c. Many-to-one relationship: When more than one instance of the entity on the left, and only
one instance of an entity on the right associates with the relationship then it is known as a many-
to-one relationship.
Example: Student enrolls for only one course, but a course can have many students.
d. Many-to-many relationship: When more than one instance of the entity on the left, and more
than one instance of an entity on the right associates with the relationship then it is known as a
many-to-many relationship.
Example: Employee can assign by many projects and project can have many employees.
Entity Set: An Entity set is a set of entities of the same type that share the same properties.
The above diagram contains two entities; Instructor and Student. In the below figure, Instructor
entity contains six different instructor values (rows) called as Instructor entity set and Student
entity contain seven different student values (rows) called as Student entity set.
Relationship Set: A relationship set is a set of relationships of the same type. In the below
figure one instructor can advice many students but every student is advised by only one
instructor. This relationship is called as many-to-one relationship.
For example, an employee can supervise multiple employees. The role of one employee is HOD
and the role and other employees is faculty. That is, one HOD supervises many faculties.
Binary Relationship: When there is a relationship between two different entities, it is known as
a binary relationship.
Each employee only has a single ID card. Hence this is a one to one binary relationship where 1
employee has 1 ID card.
Ternary Relationship: When there is a relationship between three different entities, it is known
as a ternary relationship. An example of a ternary relationship can be shown as follows:
In this example, there is a ternary relationship between Doctor, Patient and Medicine.
There can be an employee without a dependent in the Company but there will be no record of the
Dependent in the company systems without any association with an Employee .
Generalization
Generalization is a bottom-up approach in which two or more lower-level entities combines to
form a new higher-level entity. In generalization, the generalized entity of higher level can also
combinewithentitiesofthelower-leveltomakefurtherhigher-levelentity. It is like a super class and
subclass system, but the only difference is that it uses the bottom-up approach. In this process,
the common attributes of two or more lower level entities are given to higher level entity
Aggregation
Aggregation is a process when relation between two entities is treated as a single entity.
In the diagram above, the relationship between Center and Course together, is acting as an
Entity, which is in relationship with another entity Visitor. Now in real world, if a Visitor or a
Student visits a Coaching Center, he/she will never enquire about the center only or just aboutthe
course, rather he/she will ask enquire about both.
SOLUTION
Step1: Identify theEntities
1. DEPARTMENT
2. COURSE
3. INSTRUCTOR
4. STUDENT
1 N
Department Offers Course
2. Onecourseisenrolledbymultiplestudentsandalsoonestudentenrollsformultiple courses. So
the relationship is many to many.
M Enrolled N
Course Student
by
3. One department has multiple instructors and also one instructor belongs to one and only
one department. So the relationship is one to many.
1 Has N
Department Instructor
4. Each department has one "Head of Department" and one Instructor is Department" for
only one department, hence the relationship is one to one.
1 Headed N
Department Instructor
by
5. One course is taught by only one instructor but one instructor teaches many courses,
hence the relationship between course and instructor is many to one.
N Taught 1
Course Instructor
by
The relationship between instructor and student need NOT be defined in the [Link]
reasons are as follows:
1. There is no business significance of this relationship.
2. We can always derive this relationship indirectly through course and instructor, and
course and students.
Instructor
1 1 1
N 1
N
Course#
Instructor N Taught 1 Student
by
Duration
N
Course Name InstructorName Room#
Enrolled Telephone#
by
Student
1. An entity set that does not have sufficient attributes to form a primary key is a
a) Strong entity set b)Variant setc)Weak entity set d)Variable set
2. In the relational model,cardinality is termed as:
a) [Link] tuples b)no. of attributes c)[Link] tables d) [Link] constraints
3. In a relational model,relations are termed as
a) Tuples. b) Attributes c)Tables. d) Rows.
4. In an E-R diagram attributes are represented by
a) rectangle b) square c)ellipse d) triangle.
5. An abstraction concept for building composite object from their component to bject is
called:
a).Specializationb).Normalization c).Generalization d).Aggregation
1. RELATIONAL MODEL
Relational data model is the most popular data model used widely around the world for data
storage. In this model data is stored in the form of tables.
Table is also called Relation. Let the below table name be SUDENT_DATA
Degree=No ofcolumns=4
Table: In relational model the data is saved in the form of tables. A table has two properties
rows and columns. Rows represent records and columns represent attributes.
Attribute: Each column in a Table is an attribute. Attributes are the properties that define a
relation. e.g., HTNO, NAME, AGE, CITY in the above relation.
Relation Schema: It represents the name of the relation (Table) with its attributes.
Domain Constraint: These are attribute level constraints. An attribute can only take
values which lie inside the domain [Link]: If a constrain AGE > 0 is applied on
STUDENT relation, inserting negative value of AGE will result in failure. If the domain of
AGE is defined as integer, inserting an alphabet in age column is not accepted.
Example:
ID NAME SEMESTER AGE
1001 TOM I 18
1002 JHONSON IV 20
1003 KATE VI 21
1004 JHON II 19
1005 MORGAN II A
Not allowed. Because AGE is an integer attribute
Entity integrity constraints: The entity integrity constraint states that primary key
value can't be null. This is because the primary key value is used to identify individual rows
in relation. A table can contain a null value other than the primary key field.
Department of CSE, NRCM Page 24
Example: Let ID be the primary key in the below table.
Key Constraints: A Key Constraintis a statement that a certain minimal subset of the
fields of a relation is a unique identifier for a [Link] are 4 types ofkey constraints.
Theyare
i. Candidate key: The candidate keys in a table are defined as the set of keys that is
minimal and can uniquely identify any data row in the table.
ii. Primary key: It can uniquely identify any data row of the table. The primary key is
one of the selected candidate key.
iii. Super key: Super Keyis the superset of primary key. The super key contains a set of
attributes, including the primarykey, which can uniquelyidentify any data row in the
table.
CompositeKey: If any single attribute of a table is not capable of being the key i.e it
cannot identify a row uniquely, then we combine two or more attributes to form akey.
This is known as a composite key.
Secondary Key: Only one of the candidate keys is selected as the primary key. The
rest of them are known as secondary keys.
1. Data type constraint: This defines the type of data, data length, and a few other
attributes which are specifically associated with the type of data in a column.
2. Default constraint: This defines what value the column should use when no value has
been supplied explicitly when inserting a record in the table.
3. Nullability constraint: This defines that if a column is NOT NULL or allow NULL
values to be stored in it.
4. Primary key constraint: This is the unique identifier of the table. Each row must have a
distinct value. The primary key can be either a sequentially incremented integer number
or a natural selection of data that represents what is happening in the real world (e.g.
Social Security Number). NULL values are not allowed in primary key values.
5. Unique constraint: This defines that the values in a column must be unique and no
duplicates should be stored. Sometimes the data in a column must be unique even though
the column does not act as Primary Key of the table. Only one of the values can be
NULL.
6. Foreign key constraint: This defines how referential integrity is enforced between two
tables.
7. Check constraint: This defines a validation rule for the data values in a column so it is a
user-defined data integrity constraint. This rule is defined bythe user when designing the
column in a table.
Employee Department
EmpID EName salary DeptID DName Location
CREATETABLEEmployee ( CREATETABLEDepartment (
EmpID NUMBER(3), DeptID NUMBER(3),
ENameVARCHAR(20), DNameVARCHAR(15),
Salary NUMBER(5), LocationVARCHAR(15),
PRIMARYKEY(EmpID) PRIMARYKEY(DeptID)
); );
2. Each relationship in the ER model will become a table. Key attributes of participating entities
in the relationship will become columns of the table. If the relationship has any attributes,
then they also will become columns of the table.
Example: From the above ER diagram, theWorks_In relationship converted as
Employee Department
EmpID EName Salary DeptID DName Location
Works_In
EmpID DeptID since
CREATETABLEWorks_In (
EmpIDNUMBER(3),
DeptIDNUMBER(3),
Since DATE,
PRIMARYKEY(EmpID, DeptID),
FOREIGNKEY(EmpID)REFERENCESEmployee(EmpID),
FOREIGNKEY(DeptID)REFERENCESDepartment(DeptID),
);
city
address
HTNO Student state
Student Address
HTNO name age HTNO houseNo street city state
CREATETABLEStudent ( CREATETABLEAddress
HTNOCHAR(10),name (
VARCHAR(20), HTNOCHAR(10),
ageNUMBER(2), houseNo NUMBER(3),
PRIMARYKEY(HTNO) street VARCHAR(20),
); cityVARCHAR(15), state
VARCHAR(15),
PRIMARYKEY(HTNO),
FOREIGNKEY(HTNO)REFERENCESStudent(HTNO),
);
4. Each weak entity is converted into a table with all its attributes as columns and primary key
of the strong entity acts as a foreign key in this table.
Course Section
CourseID CName NOofCredits CourseID SectionNo location
CREATETABLECourse ( CREATETABLESection (
CourseIDNUMBER(2), SectionNoCHAR(2),
CName VARCHAR(20), CourseIDNUMBER(2),
NOofCreditsNUMBER(2), location VARCHAR(15),
PRIMARYKEY(CourseID) PRIMARYKEY(CourseID,SectionNo),
); FOREIGNKEY(CourseID)REFERENCES Course(CourseId)
);
Thereasonsforusingviewsare
If we want to hide the salary column from accessing a group of users, then we can create view on
employees table as follows.
CREATE VIEWemp AS
SELECTeid,ename,experience FROM employees;
emp
eid Name Experience
101 Jhon 2
105 Sam 2
108 Ram 4
DROPVIEW view_name;
Example:DROPVIEWemp;
6. RELATIONAL ALGEBRA
Relational Algebra is procedural query language, which takes Relation as input and
generates relation as output. Relational algebra mainly provides theoretical foundation for
relational databases and SQL.
Department of CSE, NRCM Page 30
OperatorSymbol OperatorName Explanation
Projection Selectcolumnnames
Selection Selectrow values
i. Select Operation(σ):Itselectstuplesthatsatisfythegivenpredicatefromarelation.
Notation:σp(r)
where σ stands for selecting tuples (rows) and r stands for relation (table) name. p is
prepositional logic formula which may use connectors like and, or, and not. These termsmay
use relational operators like = ,≠ ,≥ ,<,> ,≤ .
Example1:σsubject="database"(Books)
Output:Selects rowswhosesubjectis'database'frombooks table.
Example2:σsubject="database"andprice="450"(Books)
Output:Selects rowsfrombookswheresubjectis'database'and'price'is450.
Example3:σsubject="database"andprice="450"oryear>"2010"(Books)
Output:Selectsrowsfrombookswheresubjectis'database'and'price'is450orthose books
published after 2010.
Example:∏author(Books)∪∏author(Articles)
Output: Projects the names of the authors who have either written a book or an article or
both.
R ∩ S returns a relation instance containing all tuples that occur in both R and S. The
relations R and S must be union-compatible, and the schema of the result is defined to be
identical to the schema of R.
∏author(Books)∩∏author(Articles)
Output: Projects the names of the authors who have written bothbook and an article.
v. Set Difference (−): It finds tuples which are present in one relation but not in the second
relation.
Notation: r−s
Finds all the tuples that are present in r but not ins.
Example: ∏author(Books)−∏author(Articles)
Output− Provides the name of authors who have written books but not articles.
vi. Cartesian Product (Χ): It returns a relation instance whose schema contains all the
fields of table-1 (in the same order as they appear in table-1) followed by all the fields of
table-2. It combines every row in first table with every row in the second table.
Notation: r Χ s
Where rand sare relations and their output will be defined as :r Χs={ qt|q∈r and t ∈s}
vii. Natural join( ):The most general version of the join operation accepts a join condition
C and a pair of relation instances as arguments and returns a relation instance. The join condition is
identical to a selection condition in form. The operation is defined as follows:
viii. Natural Join( ): In this case, we can simply omit the join condition; the default is that
the join condition is a collection of equalities on all common fields. We call this special case
as natural join, and it has the nice propertythat the result is guaranteed not to have two fields
with the same name.
ix. Rename Operation (ρ): The results of relational algebra are also relations but without
any name. The rename operation allows us to rename the output relation. 'rename' operation
is denoted with small Greek letter rho ρ.
Notation:ρ(temp,E)
x. Division (/): Consider two relation instances A and B in which A has (exactly) two fields x
and y and B has just one field y, with the same domain as in A. We define the division
operation A / B as the set of all x values (in the form of unary tuples) such that for every y
value in (a tuple of) B, there is a tuple (x,y) in A.
Example:
S2 P2 SNO
PNO A/B2
S3 P2 S1
B3 P2 S4
S4 P2
P4
S4 P4 SNO
A/B3
S1
(Q5)Find the names of sailors who have reserved a red or a green boat.
ρ(Tempboats,(σcolor=′red′Boats)U(σcolor=′green′Boats))
πsname(Tempboats Reserves Sailors)
(Q6)Find the names of sailors who have reserved a red and a green boat
ρ(Tempboats2,(σcolor=′red′Boats)∩(σcolor=′green′Boats))
πsname(Tempboats2 Reserves Sailors)
However, this solution is incorrect-it instead tries to compute sailors who have reserved a boat
[Link];thisquerywillalwaysreturnanempty answer set.
The right answer is
ρ(Reservationpairs(1→sid1,2→sname1,3→bid1,4→
sid2,5→sname2,6→bid2),Reservations×Reservations)
πsname1σ(sid1=sid2)∩ (bid1=bid2)Reservationpairs
First, we compute tuples of the form (sid, sname, bid), where sailor sid has made a reservationfor
boat bid; this set of tuples is the temporary relation Reservations. Next we find all pairs of
Reservations tuples where the same sailor has made both reservations and the boats involved are
distinct. Here is the central idea: To show that a sailor has reserved two boats, we must find two
Reservations tuples involving the same sailor but distinct boats. Finally, we project the names of
such sailors.
(Q8)Find the sids of sailors with age over 20 who have not reserved ared boat.
πsid(σage>20Sailors)−πsid((σcolor=′red′Boats) Reserves Sailors)
This query illustrates the use of the set-difference operator. Again, we use the fact that sid is
The use of the word all(orevery)is a good indication that the division operation might be
applicable:
ρ(Tempsids,(πsid,bidReserves)/(πbidBoats))
πsname(Tempsids Sailors)
(Q10)Find the names of sailors who have reserved all boats called Interlake.
ρ(Tempsids, (πsid,bid Reserves)/(πbid(σbname=′Interlake′ Boats)))
πsname(Tempsids Sailors)
7. RELATIONAL CALCULUS
Relational calculus is an alternative to relational algebra. In contrast to the algebra, which is
procedural, the calculus is nonprocedural, or declarative, in that it allows us to describe the set
of answers without being explicit about how they should be computed.
TupleRelationalCalculus
Tuple Relational Calculus is a non-procedural query language unlike relational [Link]
Calculus provides only the description of the query but it does not provide the methods to solve
it. Thus, it explains what to do but not how to do.
Where t=resulting tuples,P(t)=known as Predicate and these are the conditions that are used to
fetch t. Thus, it generates set of all tuples t, such that Predicate P(t) is true for t.
(Q13)Findthesailorname,boatid,andreservationdateforeachreservation
{P|∃R∈Reserves S∈Sailors
(Q1)Findthenamesofsailorswhohavereservedboat103.(similarquestionQ1from relational
algebra)
{P|∃S∈Sailors∃R∈Reserves([Link]=[Link]∧[Link]=103∧[Link]∧[Link])}
This query can be read as follows: “Retrieve all sailor tuples for which there exists a tuple in
Reserves, having the same value in the sid field, and with bid = 103.”
∧∃B∈Boats([Link]=[Link]∧[Link]=′red′))}
{P|∃S∈Sailors∃R1∈Reserves∃R2∈Reserves([Link]=[Link]
[Link] =[Link] ∧[Link]≠ [Link]∧[Link]= [Link])}
A formula is recursively defined to be one of the following, where P and q are themselves
formulas and p(X) denotes a formula in which the variable X appears:
anyatomic formula
┐p,P/\q, P V q, orp=>q
∃X(p(X)), whereX isadomain variable
∀X(p(X)),whereX isa domain variable
(Q7)Find the names of sailors who have reserved at least two boats.
{〈N〉 |∃ I,T, A(〈I,N,T,A〉∈ Sailors∧
∀ B,BN,C(¬(〈B,BN,C〉∈ Boats) V
1. SQLCOMMANDS
Structured Query Language (SQL) is the database language used to create a database and
to perform operations on the existing database. SQL commands are instructions used to
communicate with the database to perform specific tasks and queries with data. These SQL
commands are categorized into five categories as:
SQL commands
DROP SAVEPOINT
UPDATE
TRUNCATE
CREATE: It is used to create the database or its objects (like table, index, function,
views, store procedure and triggers).
DROP: It is used to delete objects from the database.
ALTER: It is used to alter the structure of the database.
TRUNCATE: It is used to remove all records from a table, including all
spaces allocated for the records are removed.
ii. DQL (Data Query Language): DML statements are used for performing queries on the
data within schema objects. The purpose of DQL Command is to get data from some schema
relation based on the query passed to [Link] DQL commands are:
SELECT–is used to retrieve data from the database.
iii. DML (Data Manipulation Language):The SQL commands that deals with the
manipulation of data present in the database belong to DML or Data Manipulation Language
and this includes most of the SQL statements. The DML commands are:
2. DDL COMMANDS
DDL or Data Definition Language consists of the SQL commands that can be used to define
CREATE: It is used to create the database or its objects like table, index, function, views,
store procedure and triggers.
Note: The content in the squarebrackets indicates it is [Link] not required,you can skip it.
Column constraints
o PRIMARYKEY //Use only,If one column name as primarykey.
o NOTNULL //It does not accept NULLvalue in that column.
o DEFAULTvalue //It storede fault value in that column,if no value is inserted
o UNIQUE //It allows to store only unique values in the column
Table constraints
o PRIMARY KEY(column_name1,column_name2,…)
Use it,If one column name or multiple column names acts as primarykey.
o UNIQUE(column_name1,column_name2,…)
Use it,if one column name o rmultiple column names should contain unique values.
If multiple column names are used, then for each row,it consider values from all the columns
mentioned to decide the uniqueness, but not column wise.
o FOREIGN KEY(column_name1) REFERENCES other_table_name(column_name2)
It is used to link data from one table to other table.
o CHECK(condition)
It does not allow inserting value(s), if the condition is not satisfied. The condition may
alsocontain multiple column names.
c) The ‘CREATETABLEAS’ Statement:You can also create a table from another existing
table. The newly created table also contains data of existing table.
Syntax: CREATE TABLE NewTableName AS(SELECT Column1,column2,...,ColumnN
FROM ExistingTableName
WHERE[condition]);
iii. TRUNCATE: This command is used to delete the information present in the table but
does not delete the table. So, once you use this command, your information will be lost, but
not the table.
Syntax: TRUNCATE TABLE TableName; .
iv. ALTER: This command is used to add, delete or modifycolumn(s) in an existing table. It
can also be used to rename the existing table and also to rename the existing column name.
a) The‘ALTERTABLE’ with ADD column:You can use this command to add a new
column to the existing table.
Syntax: ALTER TABLE TableName ADD
Column Name Datatype;
Example 2: Changing the data type of column ‘EmployeeID’ in the table ‘Employee_info’
from int to char(10).
d) The‘ALTER TABLE’ with CHANGE column name: This statement is used to change
the column name of an existing column in a table.
e) The ‘ALTER TABLE’ with RENAME table name: This statement is used to change
the table name in the database.
3. DML COMMANDS: The SQL commands that deals with the manipulation of data
present in the database belong to DML or Data Manipulation Language and this includes
most of the SQL statements. The DML commands are:
i. INSERT: This statement is used to insert new record (row) into the table.
Example1:
INSERT INTO Employee_Info(EmployeeID,EmployeeName,PhoneNumber,City,Country) VALUES ('06', 'Sanjana',
'9921321141', 'Chennai', 'India');
Example2: When inserting all column values as per their order in the table, you can omit
column names.
INSERT INTO Employee_Info
VALUES('07','Sayantini','9934567654','Pune','India');
ii. DELETE: This statement is used to delete the existing records in a table.
Example:
DELETE FROM Employee_Info WHERE
EmployeeName='Preeti';
Note: If where condition is not used in DELETE command, then all the rows data will be deleted. If used
only rows which satisfies the condition are deleted.
iii. UPDATE: This statement isused to modify the record values already present in the table.
Syntax: UPDATETableName
SETColumn1=Value1,Column2=Value2,... [WHERE
Condition];
Example:
UPDATE Employee_Info
SET EmployeeName='Jhon',City='Ahmedabad' WHERE
EmployeeID = 1;
Note:If where condition is not used in UPDATE command, then in all the rows Employee Name changes to
'Jhon' and City name changes to 'Ahmedabad'.Ifusedonly rows which satisfies the condition are
updated.
4. DQL COMMAND:The purpose of DQL Command is to get data from one or more
tables based on the query passed to it.
i. SELECT: This statement is used to select data from a database and the data returned
isstored in a result table, called the result-set.
Example2:
SELECT EmployeeID,EmployeeName
FROM Employee_Info;
The ‘SELECT with DISTINCT’ Statement: This statement is used to display only
different unique values. It mean it will not display duplicate values.
The ‘ORDER BY’ Statement: The ‘ORDER BY’ statement is used to sort the required
results in ascending or descending order. The results are sorted in ascending order by
default. Yet, if you wish to get the required results in descending order, you have to use
the DESC keyword.
Example
/*Selectallemployeesfromthe'Employee_Info'tablesortedby City*/
SELECT*FROMEmployee_Info
ORDER BY City;
SELECT*FROMEmployee_Info
ORDER BY City DESC;
/*Selectallemployeesfromthe'Employee_Info'tablesortedby
CityinDescendingorderandEmployeeNameinAscendingorder:*/
SELECT*FROMEmployee_Info
ORDERBYCityASC,EmployeeNameDESC;
MIN() Function: The MIN function returns the smallest value of the selected column in a table.
Syntax:SELECT MIN(ColumnName)
FROM TableName
WHERECondition;
MAX( ) Function:The MAX function returns thelargest value of the selected column in a table.
Syntax:SELECT MAX(ColumnName)
FROM TableName
WHERECondition;
Example:
SELECT MAX(Salary)ASLargestFees
FROM Employee_Salary;
COUNT()Function:The COUNT function returns the number of rows which match the specified criteria.
SUM() Function:The SUM function returns the total sum of a numeric column that you
choose.
Syntax: SELECT
AVG(ColumnName)
FROM TableName
WHERECondition;
Example:
SELECT AVG(Salary)
FROMEmployee_Salary;
Example:
--Tolistthenumberofemployeesfromeachcity.
SELECT COUNT(EmployeeID),City
FROM Employee_Info
GROUPBYCity;
The‘HAVING’Clause:The‘HAVING’ clause must be used SQL along with GROUPBY clause only. It
is similar to the WHERE clause.
Example
/* To list the number of employees in each city. The employees should be sorted
high to low and only those cities must be included who have more than 5 employees:*/
SELECTCOUNT(EmployeeID),City
FROM Employee_Info
GROUPBY City
HAVINGCOUNT(EmployeeID)>2;
Operators in SQL:
Arithmetic operators
Bitwise operators
Comparison operator
Compound operator
Logicaloperator
Bitwise Operators:
Operator Description
^ BitwiseExclusiveOR(XOR) [A^ B]
| BitwiseOR [A|B]
& BitwiseAND[A &B]
Comparison Operators:
Operator Description
<> NotEqual to [A <>B]
<= Lessthan or equalto [A<= B]
>= Greaterthanor equalto[A>= B]
< Lessthan [A<B]
> Greaterthan[A>B]
= Equalto [A =B]
Compound Operators:
Operator Description
|*= BitwiseORequals [A |*=B]
^-= BitwiseExclusiveequals [A^-=B]
&= BitwiseAND equals[A&= B]
%= Moduloequals [A%=B]
/= Divideequals [A /=B]
*= Multiplyequals[A*= B]
-= Subtractequals[A-= B]
+= Addequals[A+= B]
Logical Operators: The Logical operators present in SQL are as follows:AND,OR,NOT, BETWEEN, LIKE, IN,
EXISTS, ALL, ANY.
Syntax:
SELECTColumn1,Column2,...,ColumnN
FROM TableName
WHERECondition1ANDCondition2ANDCondition3...;
Example:
SELECT*FROMEmployee_Info
WHERECity='Mumbai'ANDCity='Hyderabad';</pre>
OR Operator: This operator displays all those records which satisfy any of the conditions separated by
OR and give the output TRUE.
Syntax:SELECTColumn1,Column2,...,ColumnN
FROM TableName
WHERECondition1ORCondition2ORCondition3...;
Example:
SELECT*FROMEmployee_Info
WHERECity='Mumbai'ORCity='Hyderabad';
NOT Operator: The NOT operator is used, when you want to display the records which do not satisfy a
condition.
Syntax: SELECTColumn1,Column2,...,ColumnN
FROM TableName
WHERENOTCondition;
Example:
NOTE:You can also combine the above three operators and write a query as follows:
SELECT * FROM Employee_Info
WHERE NOT Country='India' AND (City='Bangalore 'OR City='Hyderabad');
BETWEEN Operator: The BETWEEN operator is used, when you want to select values within a given
range. Since this is an inclusive operator, both the starting and ending values are considered.
Syntax:
Department of CSE, NRCM Page 51
SELECTColumnN
ame(s) FROM
TableName
WHEREColumnNameBETWEENValue1ANDValue2;
Example:
SELECT * FROM Employee_Salary
WHERE Salary BETWEEN 40000 AND 50000;
LIKE Operator
The LIKE operator is used in a WHERE clause to search for a specified pattern in a column of a table.
There are mainly two wildcards that are used in conjunction with the LIKE operator:
Syntax
SELECTColumnName(s
) FROM TableName
WHEREColumnNameLIKEpattern;
Refer to the following table for the various patterns that you can mention with the LIKE operator.
LikeOperatorCondition Description
WHERE CustomerName LIKE ‘v% Finds any values that start with “v”
WHERE CustomerName LIKE ‘%v’ Finds any values that end with“v”
WHERE CustomerName LIKE ‘%and%’ Finds any values that have“and”in any position
Finds any values that h0ave“q”in the second
WHERE CustomerName LIKE ‘_q%’
position.
Finds any values that start with“u” and are at
WHERE CustomerName LIKE ‘u_%_%’
least 3 characters in length
Finds any values that start with“m”and end
WHERE ContactName LIKE ‘m%a’
with “a”
Example:
SELECT * FROM Employee_Info
WHEREEmployeeNameLIKE'S%';
IN Operator:This operator is used for multiple OR [Link] allows you to specify multiple values
in a WHERE clause.
Syntax:SELECTColumnName(s) FROM
TableName
WHERE ColumnNameIN(Value1,Value2...);
Department of CSE, NRCM Page 52
Example:
SELECT * FROM Employee_Info
WHERE City IN('Mumbai','Bangalore','Hyderabad');
NOTE:You can also use IN while writing Nested Queries.
EXISTS Operator: The EXISTS operator is used to test if a record exists or not.
Syntax:
SELECTColumn
Name(s) FROM
TableName WHERE
EXISTS
(SELECT ColumnName FROM TableNameWHERE condition);
Example:
SELECT City
FROM Employee_Info
WHERE EXISTS (SELECTCity
FROM Employee_Info
WHERE EmployeeId=05 AND City='Kolkata');
ALL Operator: The ALL operator is used with a WHERE or HAVING clause and returns TRUEif all
of the subquery values meet the condition.
Syntax:SELECTColumnName(s)
FROM TableName
WHERE ColumnName operator ALL
(SELECT ColumnName FROM TableName WHEREcondition);
Example:
SELECTEmployeeName
FROM Employee_Info
WHEREEmployeeID=ALL(SELECTEmployeeID
FROMEmployee_Info
WHERECity='Hyderabad');
ANYOperator:Similar to the ALL operator,the ANY operator is also used withaWHEREor HAVING
clause and returns true if any of the subquery values meet the condition.
Syntax:
SELECTColumnN
ame(s) FROM
TableName
WHERE ColumnNameoperator ANY
(SELECTColumnNameFROMTableNameWHEREcondition);
Example:
SELECTEmployeeName
FROM Employee_Info
Department of CSE, NRCM Page 53
Aliases Statement:Aliases are used to give a column /table a temporary name and only exists for
duration of the query.
SELECTColumnNameASAliasName
FROM TableName;
Example:
SELECTEmployeeIDASID,EmployeeNameASEmpName
FROM Employee_Info;
5. NESTED QUERIES
Nested queries are those queries which have an outer query and inner [Link], basically,
the subquery is a query which is nested within another query.
OUTERQUERY SUBQUERYorINNERQUERY
SELECT EmployeeName,PhoneNumber
FROM Employee_Info
WHERE City IN( SELECT City
FROM Office
WHERE County=‘INDIA’);
First the inner query gets executed and the result will be used to execute the outer query.
6. SETOPERATIONS:UNION,INTERSECT,EXCEPT
There are mainly three set operations:UNION, INTERSECT,[Link] can refer to the image below
to understand the set operations in SQL.
ii. INTERSECT:This clause used to combine two SELECTstatements and return the
intersection of the data-sets of both the SELECT statements.
Syntax: SELECTColumnName(s)FROMTable1WHEREcondition
INTERSECT
SELECTColumnName(s)FROMTable2WHEREcondition;
iii. EXCEPT:This operator returns those tuples that are returned by the first SELECT
operation, and are not returned by the second SELECT operation.
Note: UNION, INTERSECT or EXCEPT operations are possible if and only if first SELECT
query and second SELECT query produces same no of columns in same order, same
column names and data type. Otherwise it gives an error.
7. JOINS
JOINS are used to combine rows from two or more tables, based on a related column betweenthose
tables. The following are the types of joins:
INNER JOIN:This join returns those records which have matching values in both
thetables.
FULL JOIN:This join returns all those records which either have a match in the left or
the right table.
LEFT JOIN:This join returns records from the left table, and also those records which
satisfy the condition from the right table.
RIGHTJOIN:Thisjoinreturnsrecordsfromtherighttable,andalsothoserecords which
satisfy the condition from the left table.
Employee_Info
EmployeeID EmployeeName PhoneNumber City Country
01 Shravya 9898765612 Mumbai India
02 Vijay 9432156783 Delhi India
03 Preeti 9764234519 Bangalore India
04 Vijay 9966442211 Hyderabad India
05 Manasa 9543176246 Kolkata India
Technologies
TechID EmpID TechName ProjectStartDate
1 01 DevOps 04-01-2019
2 03 Blockchain 06-07-2019
3 04 Python 01-03-2019
4 06 Java 10-10-2019
INNER JOIN or EQUI JOIN: This is a simple JOIN in which the result is based on matched data as
per the equality condition specified in the SQL query. This join is used mostly. NATURAL JOIN is a
type INNER JOIN. We can also use it. It also gives same result.
Syntax
SELECT
ColumnName(s) FROM
Table1
INNER JOIN Table2 ON [Link]=[Link];
Example
Syntax
SELECTColumnName(s
) FROM Table1
[Link]=[Link];
Example
[Link],[Link],[Link]
FROM Employee_Info E
[Link]=[Link];
LEFT JOIN: The left outer join returns a result-set table with the matched data from the two tables
and then the remaining rows of the left table with null for the right table's columns.
Syntax:
SELECTColumnName(s)
FROM Table1
[Link]=[Link];
Example:
[Link],[Link],[Link]
FROM Employee_Info E
[Link]=[Link];
Example:
[Link],[Link],[Link] FROM
Employee_Info E
[Link]=[Link];
8. TRIGGERS
A trigger is a stored procedure in database which automatically invokes when ever a special
event in the database occurs. For example, a trigger can be invoked when a row is inserted into a
specified table or when certain table columns are being updated. So, a trigger can be invoked
either BEFORE or AFTER the data is changed byINSERT, UPDATE or DELETEstatement.
Refer to the image below.
CREATE TRIGGER[TriggerName]
[BEFORE | AFTER]
{INSERT|UPDATE|DELETE}
on[TableName]
[FOREACHROW]
[TriggerBody]
Explanation of syntax:
BEFOREandAFTERofTrigger:
BEFORE triggers run the trigger action before the triggering statement is run.
AFTER triggers run the trigger action after the triggering statement is run.
EXAMPLE:
CREATE TRIGGER nb BEFORE INSERT ON accounts FOR EACH ROW /*Event*/ Begin
IF:[Link]<0THEN /*Condition*/
DBMS_OUTPUT.PUT_LINE('BALANCE IS NAGATIVE..'); /*Action*/
END IF;
End;
A trigger called ‘nb’ is created to alert the user when inserting account details with negative balance
value in to accounts table. Before inserting, the trigger is activated if the condition istrue. When a trigger
activated, the action part of the trigger is get executed.
9. NORMALIZATION
Normalization is the process of minimizing the redundancy from a relation or set of
relations.
It is used to eliminate the Insertion,Update and Deletion Anomalies.
Normalization divides the larger table into the smaller table and links them using
relationship.
Normalization is done with the help of different normal form.
TheProblemofredundancy
Redundancy means having multiple copies of same data in the [Link] problem arises when a
database is not normalized. Redundancy leads the following problems.
Theabovetableis not in 1NFbecause 501 and 502 is havingtwo valuesin mobilecolumn. If we add a new
column as alternative mobile number to the above table, then for 503 alternative mobile number is
[Link], if a student has ‘n’ mobile numbers, then adding ‘n’ extra column is meaningless. It is
better to add extra rows. If we add extra row for each 501 and 502 then the table looks like
But the above table violates primary key constraint. Therefore instead of adding either columns or rows,
the best solution is to split the table into two tables as shown below. If we do as shown below, if a
student having ‘n’ number of mobile numbers also can be added.
HTNO FIRST LAST HTNO MOBILE
NAME NAME 501 9999988888
501 Jhansi Rani 501 7777799999
502 Ajay Kumar 502 8888888881
502 7897897897
503 Priya Verma 503 9898989898
2NF Example:
The above table is not in 2NF because there exist partial function dependencies. HTNO is a key attribute
in the above table. If every non-key attribute fully dependent on key attribute, then we say it is fully
functional dependent. Consider the below diagram. {Name, DOB, DeptNo, DeptName, Location}
depends on HTNO. But {DeptName, Location} also depends on DeptNo.
Name
DOB DeptName
DeptNo
DeptNo
HTNO Location
DeptName
Location
It is clear that DeptName and Location not only depends upon HTNO but also on DeptNo. So, there
exists partial function dependency. This partial functional dependency can be removed by splitting the
above table into two tables as follows.
3NFExample:
BOOK_DETAILS
BookID GenreID GenreType Price
1 1 Gardening 250.00
2 2 Sports 149.00
3 1 Gardening 100.00
4 3 Travel 160.00
5 2 Sports 320.00
The above table is not in 3NF because there exist transitive [Link] the table able,
BookIDdeterminesGenreID {BookID→GenreID}
GenreIDdeterminesGenreType. {GenreID→GenreType}
BookIDdeterminesGenreTypeviaGenreID. {BookID→GenreType}
It implies that transitive functional dependency is existing and the structure does not satisfy third
normal [Link] bring this table into third normal form,we split the table into two as follows:
BOOK_DETAILS
BookID GenreID Price
1 1 250.00 GENRE_DETAILS
2 2 149.00 GenreID GenreType
3 1 100.00 1 Gardening
4 3 160.00 2 Sports
5 2 320.00 3 Travel
A relation (table) is said to be in the BCNF if and only if it satisfy the following conditions:
ThatisEmailID → PatientID.
In the above dependency, EmailId is non-prime attribute and PatientID is a primeattribute. Therefore
the above table is not in BCNF. In order to bring the table into BCNF, we split it into two tables as
shown below.
In other words we can also define BCNF as there should not be any overlapping between candidate keys. If you
consider the original table (before splitting), we can get two candidate keys {PateintID, AdmittedDate}
and{EmailID, AdmittedDate}.
As there exist overlapping in the candidate keys,the table is not in BCNF. To bring it into BCNF, we split into
two tables as shown above.
WhatisMulti-valuedDependency?
A table is said to have multi-valued dependency, if the following three conditions are true.
If all these three conditions are true for any relation (table), then it contains multi-valued
[Link]-valueddependencycanbe explainedwithanexample. LettheRelationR containing
three columns A, B, C and four rows s, t, u, v.
A B C
s a1 b1 c1
t a1 b1 c2
u a1 b2 c1
v a1 b2 c2
If s(A) =t(A)=u(A)=v(A)
s(B)=t(B) ands(B) =v(B)
s(C)=u(C)andt(C)=v(C), then thereexist multi-valueddependency.
Example:Consider the below college enrolment table with columns HTNO,Subject and
Hobby.
Cricket Dancing
Hobby
Asthereexistmultivalueddependency,theabovetableisdecomposedintotwotablessuch that
15. 5NF
A relation is said to be in 5-NFif and onlyif it satisfies the followingconditions
ItshouldbeintheFourth NormalForm.
Thetableshould not haveanyjoin Dependencyand joiningshould belossless.
A table is decomposed into multiple small tables to eliminate redundancy, and when we re-join the
decomposed tables, there should not be any loss in the original data or shold not create any new data. In
simple words, joining two or more decomposed table should not lose records nor create new records.
In the table,DBMS is taught by Ravindar and Uma Rani in semester4, DS by Sindhusha and Venu in
sem 3. In this case, the combination of all these fields required to identify valid data.
So to make the table into 5NF,we can decompose it intot here relations,
The above table is decomposed into three tables as follows to bringit into 5-NF.
R R1 R2
A B C A B B C
1 2 1 decompose→ 1 2 and 2 1
2 5 3 2 5 5 3
3 3 3 3 3 3 3
Now, let us check whether this decomposition is lossless or not. For lossless decomposition,
we must have:R1⋈ R2 = R .Now, if we perform the natural join ( ⋈) of the sub relations R 1
and R2 , we get
A B C
1 2 1
Thisrelationis sameastheoriginalrelation R.
2 5 3
3 3 3
Thus, we conclude that the above decomposition is lossless join decomposition. This is
because the resultant relation after joining the sub relations is same as the decomposed
relation. No extraneous tuples (rows) appear after joining of the sub-relations.
R R1 R2
A B C A C B C
1 2 1 decompose→ 1 1 and 2 1
2 5 3 2 3 5 3
3 3 3 3 3 3 3
Now, let us check whether this decomposition is lossless or not. For lossless decomposition,
we must have:R1⋈ R2 = R .Now, if we perform the natural join ( ⋈) of the sub relations R 1
and R2 , we get
A B C
1 2 1
ThisrelationisnotsameastheoriginalrelationR.
2 5 3
2 3 3
3 5 3
3 3 3
Thus, we conclude that the above decomposition is not lossless join decomposition. This is
because the resultant relation after joining the sub relations is not same as the decomposed
relation. Extraneous tuples (rows) appear after joining of the sub-relations.
Department of CSE, NRCM Page 72
PROBLEMS
ConsiderarelationR isdecomposed intotwo sub relationsR1andR2.
Condition-01:Unionofboththesubrelationsmustcontainalltheattributesthatarepresent in the
original relation R.
R1∪R2= R
Condition-02:Intersectionofboththesubrelationsmustnotbenull. Inotherwords,there must
be some common attribute which is present in both the sub relations.
R1∩R2 ≠∅
Condition-03:IntersectionofboththesubrelationsmustbeasuperkeyofeitherR1orR2or both.
************************************************************************
Problem-01:ConsiderarelationschemaR(A,B,C,D)withthefunctionaldependencies A → B
and C → D. Determine whether the decomposition of R into R1 ( A , B ) and R2 ( C , D ) is
lossless or lossy.
Solution:To determine whether the decomposition is lossless or lossy, we will check all the
conditions one by one. If any of the conditions fail, then the decomposition is lossy otherwise
lossless.
Condition-01:Accordingto condition-01,union ofboththe subrelations must containallthe
attributes of relation R. So, we have:
R1( A, B)∪R2( C , D ) = R(A , B, C , D )
Clearly, union of the sub relations contains all the attributes of relation R. Thus, condition-01
satisfies.
Condition-02:According to condition-02, intersection of both the sub relations must not be
null. So, we have-
R1( A, B)∩R2( C , D ) = Φ
Clearly, intersection of the sub relations is [Link], condition-02 fails. Thus, we conclude that
the decomposition is lossy.
************************************************************************
Problem-02:ConsiderarelationschemaR(A,B,C,D)withthefollowingfunctional dependencies
A→ B B→ C C→D D→ B
StrategytoSolve:Whenagivenrelationisdecomposedintomorethantwosubrelations, then
Considerany onepossiblewaysinwhichtherelationmighthavebeendecomposed into those
sub relations.
First,dividethe givenrelationintotwosub relations.
Then,dividethesubrelationsaccordingtothesubrelationsgiveninthequestion. As a
thumb rule, remember-
Anyrelation can bedecomposed onlyinto two sub relations at a time.
Condition-01:According to condition-01, union of both the sub relations must contain all the
attributes of relation R. So, we have
R‘( A, B, C )∪R3( B, D ) = R (A , B, C , D )
Clearly, union of the sub relations contains all the attributes of relation R. Thus, condition-01
satisfies.
Clearly,intersectionofthesubrelationsisasuperkeyofoneofthesubrelations. So,
condition-03 satisfies. Thus, we conclude that the decomposition is lossless.
Condition-01: According to condition-01, union of both the sub relations must contain all the
attributes of relation R’. So, we have
R1( A, B)∪R2( B, C ) = R’(A, B, C )
Clearly, union of the sub relations contain all the attributes of relation R’. Thus, condition-01
satisfies.
Condition-02: According to condition-02, intersection of both the sub relations must not
benull.
So, wehave
R1( A, B)∩R2( B, C) = B
Clearly,intersectionofthesubrelations [Link],condition-02 satisfies.
Clearly,[Link], condition-03
satisfies. Thus, we conclude that the decomposition is lossless.
The set of all those attributes which can be functionally determined from an attribute
set is called as a closure of that attribute set. Closure of attribute set {X} is denoted as {X}+.
StepstoFindClosureofanAttributeSet:Followingstepsarefollowedtofindthe closure of an
attribute set:
Step-01:Addtheattributescontainedintheattributesetforwhichclosureisbeing calculated to the
result set.
Step-02:Recursivelyaddtheattributestotheresultsetwhichcanbefunctionallydetermined from the
attributes already contained in the result set.
Solution:
Closureof {A, B}:
+
{AB} ={ A ,B }
={ A , B, C , D } (UsingAB→ CD)
={ A , B, C , D ,G } (UsingC→ G)
Thus,{AB}+={ A , B, C , D , G}
Super Key:
Iftheclosureresultofanattributesetcontainsalltheattributesoftherelation, then that attribute set is
called as a super key of that relation.
Thus,wecansay,“Theclosureofasuperkeyistheentirerelationschema.”
Example:In the above example (Question 1),
Theclosureof attribute A is theentirerelation schema.
Thus,attributeA is asuper keyfor that relation.
Step-01:Determineallessentialattributesofthegiven relation
EssentialattributesarethoseattributeswhicharenotpresentonRHSofanyfunctional
dependency.
[Link] cannot
be determined by other attributes.
Example:LetR(A,B,C,D,E,F)bearelationschemewiththefollowingfunctional
dependencies: A → B,C → DandD → E.
The RHS of all the above functional dependencies contain only B, D and E. The
attributeswhicharenotpresentonRHSofanyfunctionaldependencyareA,CandF. So,
essential attributes are: A, C and F.
Step-02:Determiningallnon-essentialattributesusingessentialattributes
[Link] can be
determined by using essential attributes.
Now,followingtwocasesarepossible-
Case-01:Ifallessentialattributestogethercandetermineallremainingnon-essential
attributes, then
o Thecombinationofessentialattributesis thecandidatekey.
o Itis the onlypossible candidatekey.
Case-02:Ifallessentialattributestogethercannotdetermineallremainingnon-
essential attributes, then-
Thesetofessentialattributesandsomenon-essentialattributeswillbethecandidate key(s).
Inthiscase,multiplecandidatekeysarepossible.
Tofindthecandidatekeys,wecheckdifferentcombinationsofessentialandnon-
essential attributes.
Problem-01:LetR=(A,B,C,D,E,F)bearelationschemewiththefollowing
dependencies:C→F,E→A,EC→DandA→B. [Link], determine the
total number of candidate keys and super keys.
Solution:Wewillfindcandidatekeysof thegivenrelationinthefollowingsteps-
Step-01:Determineallessentialattributesofthegiven relation.
Step-02:Now,wewillcheckiftheessentialattributestogethercandetermineallremaining non-
essential attributes. To check, we find the closure of CE. So,
{ CE }+ ={ C , E }
={ C, E , F} (UsingC→F)
={ A , C , E , F} (UsingE→A)
={ A , C , D , E , F} (UsingEC→ D)
={ A , B, C , D , E,F} (UsingA→B)
[Link],CEistheonly possible
candidate key of the relation.
TotalNumberofSuper Keys-
Thereare2essentialattributes-Cand E.
Remaining4 attributesarenon-essential attributes.
Essential attributes will be definitelypresent in every key.
Non-essential attributesmayor maynot bepresent ineverysuperkey.
C E ABDF
Problem-02:ConsidertherelationschemeR(E,F,G,H,I,J,K,L,M,N)andtheset
offunctionaldependencies: {E, F} → { G}, { F} → {I, J}, { E, H} → { K,L},
{ K} → { M} and {L}→ {N}.Determine thecandidatekey(s).
Step-02:
We will check if the essential attributes together can determine all remaining on-
essential attributes.
To check,we find the closure of EFH.
So, we have-
1. TRANSACTION
Definition: A transaction is a single logical unit consisting of one or more database access
operation.
Example: Withdrawing 1000 rupees fromATM.
The following set of operations are performed to withdraw1000 rupees from database
2. ACID PROPERTIES
ACID properties are used for maintaining the integrityof database during transaction processing.
ACID stands for Atomicity, Consistency, Isolation, and Durability.
Atomicity: This property ensure that either all of the tasks of a transaction are performed or
none of them. In simple words it is referred as “all or nothing rule”.
Each transaction is said to be atomic if when one part of the transaction fails, the entire
ii. Consistency: The consistency property ensures that the database must be in consistent state
before and after the transaction. There must not be any possibility that some data is incorrectly
affected by the execution of a transaction.
For example, transfering funds from one account to another, the consistency property ensures
that the total values of funds in both the accounts is the same before and end of the transaction.
i.e., Assume initially, A balance = $400 and B balance = 700$.
ThetotalbalanceofA +B =1100$(Beforetransferring100$fromA toB) The total
balance of A + B = 1100$(After transferring 100$ from A to B)
iii. Isolation:Foreverypairoftransactions,oneofthetransactionsshouldnotstartexecution
before the other transaction execution completed, if they use some common data variable. That
is, if the transaction T1 is executing and using the data item X, then transaction T2 should not
start until the transaction T1 ends, if T2 also use same data item X.
Forexample,TransactionT1: Transfer 100$from accountAto accountB
TransactionT2:Transfer150$ fromaccountBtoaccountC
Assumeinitially,A balance=Bbalance=Cbalance=$1000
TransactionT1 TransactionT2
10:00AM ReadA’sbalance ($1000) ReadB’s balance ($1000)
10:01AM Abalance=ABalance –100$ (1000-100=900$) Bbalance=B Balance –150$ (1000-150= 850$)
10:02AM ReadB’s balance ($1000) ReadC’sbalance ($1000)
10:03AM Bbalance=B Balance+100$ (1000+100= 1100$) Cbalance=C Balance+150$ (1000+150=1150$)
iv. Durability: Once a transaction completes successfully, the changes it has made into the
database should be permanent even if there is a system failure. The recovery-management
[Link],assume
account A balance = 1000$. If A withdraw 100$ today, then the A balance = 900$. After two
days or a month, A balance should be 900$, if no other transactions done on A.
3. STATES OF TRANSACTION
A transaction goes through many different states throughout its [Link] states are called
as transaction states. They are:
Active State:
Thisis the firststate in the lifecycleof a transaction.
Oncethetransactionstarts executing,thenit issaidto beinactivestate.
During this state it performs operations like READ and WRITE on some data items. All
the changes made by the transaction are now stored in the buffer in main memory. They
are not updated in database.
Committed State:
After all the changes made by the transaction have been successfully updated in the
database, it enters into a committed state and the transaction is considered to be fully
committed.
After a transaction has entered the committed state, it is not possible to roll back (undo)
the transaction. This is because the system is updated into a new consistent state and the
changes are made permanent.
Theonlywaytoundothechangesisbycarryingoutanothertransactioncalled as compensating
transaction that performs the reverse operations.
Failed State:
When a transaction is getting executed in the active state or partiallycommitted state and
some failure occurs due to which it becomes impossible to continue the execution, it
enters into a failed state.
Aborted State:
After the transaction has failed and entered into a failed state, all the changes made by it
have to be undone.
To undo the changes made by the transaction, it becomes necessary to rollback the
transaction.
After the transaction has rolledback completely,it enters into an abortedstate.
Terminated State:
Thisis thelast state inthelifecycleof a transaction.
After entering the committed state or aborted state,the transaction finally enters into a
Department of CSE, NRCM Page 84
Terminated state where its lifecycle finally comes to an end.
4. TYPES OF SCHEDULES–SERIALIZABILITY
In DBMS,schedules may be classified as
i. Serial Schedules:
All the transactions execute serially one after the other.
When one transaction executes,no other transaction is allowed to execute.
Examples:
Schedule-1 Schedule-2
T1 T2 T1 T2
Read(A) Read(A)
A=A-100 A=A+500
Write(A) Write(A)
Read(B) COMMIT
B=B+100 Read(A)
Write(B) A=A-100
COMMIT Write(A)
Read(A) Read(B)
A=A+500 B=B+100
Write(A) Write(B)
COMMIT COMMIT
In schedule 1, after T1 completes its execution, transaction T2 executes. So, schedule-1 is a Serial
Schedule. Similarly, in schedule-2, after T2 completes its execution, transaction T1 executes. So,
schedule -2 is also an example of a Serial Schedule.
Examples:
Schedule-1 Schedule-2
T1 T2 T1 T2
Read(A) Read(A)
A=A-100 Read(A)
Write(A) A=A-100
Read(A) Write(A)
A=A+500 A=A+500
Read(B) Read(B)
B=B+100 B=B+100
Write(B) Write(B)
COMMIT COMMIT
Write(A) Write(A)
COMMIT COMMIT
In schedule-1 and schedule-2, the two transactions T1andT2 executing [Link] operations of
T1 and T2 are interleaved. So, these schedules are Non-Serial Schedule.
iii. SerializableSchedules:
There exists a cycle in the above [Link],the schedule S is not conflict serializable.
Schedule– S
T1 T2 T3 T4
Read(X)
Write(X)
COMMIT
Write(X)
COMMIT
Write(Y)
Read(Z)
COMMIT
Read(X)
Read(Y)
COMMIT
Solution: List all the conflicting operations to determine the dependency between transactions.
R2(X),W3(X) (T2→T3)
W3(X),W1(X) (T3→T1)
W3(X),R4(X) (T3→T4)
R2(X),W1(X) (T2→T1)
W1(X),R4(X) (T1→T4)
There exists nocycle in the precedence [Link], the schedule S is conflict serializable.
View Serializability:Two schedules S1andS2 are said to be view equivalent if both of them
satisfy the following three rules:
(1) InitialRead:The first read operation on each data item inboth the schedule must be same.
For each data itemX, If first read onXisdonebytransactionTainscheduleS1,thenin schedule2 also
the first read on X must be done by transaction T a only.
(2) UpdatedRead:Itshouldbesameinboththe schedules.
IfRead(X)ofTafollowedby Write(X)ofT bin scheduleS1,thenin scheduleS2also,Read(X) of T a must
follow Write(X) of Tb ..
(3) Finalwrite:Thefinal writeoperationon each dataitem inboth theschedulemust besame.
ForeachdataitemX,ifXhasbeenupdatedatlastbytransaction T iinscheduleS1,thenin schedule S2 also,
X must be updated at last by transaction T i.
ViewSerializabilityDefinition:If agivenscheduleisviewequivalenttosomeserial
schedule,thenitiscalled asaviewserializableschedule.
Note:Everyconflictserializablescheduleis also viewserializableschedulebut not vice-versa
Problem03:Checkwhetherthe givenscheduleSisviewserializableornot
Schedule–1
T1 T2
Read(A)
Write(A)
Read(A)
Write(A)
Read(B)
Write(B)
Read(B)
Write(B)
Solution:
S
c
h
e
d
u
l
e
-
1
(
S
1
)
S
c
h
e
d
u
l
e
-
2
(
S
2
)
Note:Other way of solving it is, if weare able to prove that S1 is conflict serializable, then S1 isalso view serializable. (Refer
conflict serializable problems. Every conflict serializable schedule is also view serializable but not vice-versa.)
If the transaction complete successfully, then the database system updates the pointer db-
pointer to point to the new copy of the database; the new copy then becomes the original
copy of the database. The old copy of the database is then deleted. Figure below depicts
the scheme, showing the database state before and after the update.
Figure:Shadowcopytechniqueforatomicityand durability
6. RECOVERABILITY
During execution, if any of the transaction in a schedule is aborted, then this may leads
the database into inconsistence state. If anything goes wrong, then the completed operations in
the schedule needs to be undone. Sometimes, these undone operations may not possible. The
recoverability of schedule depends on undone operations.
Example:Considerthefollowingschedule-
T1 T2
Read(A)
Write(A)
|
| Read(A) //DirtyRead
| Write(A)
|
COMMIT
COMMIT//Delayed
Here,
T2performsa dirtyreadoperation.
Thecommitoperationof T2is delayedtillT1commitsorroll backs.
T1commitslater.
T2isnow allowedto commit.
Incase,T1wouldhavefailed,T2hasachancetorecoverbyrollingback.
7. IMPLEMENTATION OF ISOLATION
Isolation determines how transactions integrityis visible to otherusers and systems. It means
that a transaction should take place in a system in such a way that it is the only one transaction
that is accessing the resources in a database system.
Isolation level defines the degree to which a transaction must be isolated from the data
modifications made by any other transactions in the database system. The phenomena’s used to
define levels of isolation are:
a) Dirty Read
b) Non-repeatableRead
c) PhantomRead
Dirty Read: If a transaction reads a data value updated by an uncommitted transaction, then
this type of read is called as dirty read.
T1 T2
Read(A)
Write(A)
|
| Read(A) //DirtyRead
| Write(A)
| COMMIT
|
ROLLBACK
As T1 aborted, the results produced by T2 become wrong. This is because T2 read A (Dirty
Read) which is updated by T1.
Non-RepeatableRead:Non repeatableread occurs when atransaction read same data value
twice and get a different value each time. It happens when a transaction reads once before and
once after committed UPDATES from another transaction.
T1:SELECTSUM(C)FROMSTUDENT_DATAWHEREB=5; Answeris(10+20)=30
T2: UPDATE STUDENT_DATA SET C = 15 WHEREA=100; Answer,inFirstrowCchangesto15 T1:
SELECT SUM(C ) FROM STUDENT_DATA WHEREB=5; Answer is (15+20) = 35
Phantom reads: Phantom reads occurs when atransaction read same datavalue twiceand get a
different value each time. It happens when a transaction reads once before and once after
committed INSERTS and/or DELETES from another transaction.
Non-repeatableread Phantomread
WhenT1performsecondread,thereisno WhenT1performsecondread,thenoofrows
changeinnoof rowsinthegiven table eitherincreaseordecrease.
T2performUPDATEoperationonthe T2 perform INSERT and/or DELETE
giventable operationonthegiventable
ExampleforPhantomread:
Table:STUDENT_DATAbeforeT2 Table:STUDENT_DATAafter T2
A B C A B C
100 5 10 100 5 10
101 5 20 101 5 20
102 6 30 102 6 30
103 5 25
T1:SELECTSUM(C)FROMSTUDENT_DATAWHEREB=5; Answeris(10+20)=30
T2:INSERT INTOSTUDENT_DATAVALUES(103,5,25); Answer,inFirstrowCchangesto15
Department of CSE, NRCM Page 95
T1:SELECTSUM(C)FROMSTUDENT_DATAWHEREB=5; Answeris(10+20+25)=55
(2) Read Committed: This isolation level guarantees that any data read is committed at the
moment it is read. Thus, it does not allow dirty read. The transaction holds a read/write lock
on the data object, and thus prevents other transactions from reading, updating or deleting it.
(3) Repeatable Read: This is the most restrictive isolation level. The transaction holds read
locks on all rows it references and writes locks on all rows it inserts, updates, or deletes.
Since other transaction cannot read, update or delete these rows, consequently it avoids non-
repeatable read. So other transactions cannot read, update or delete these data items.
(4) Serializable: This is the highest isolation level. A serializable execution is guaranteed to
be a serial schedule. Serializable execution is defined to be an execution of operations in
which concurrently executing transactions appears to be serially executing.
The table given below clearly depicts the relationship between isolation levels and the read
phenomena and locks.
Isolation Level DirtyRead Non-repeatableread PhantomRead
ReadUncommitted Mayoccur Mayoccur Mayoccur
ReadCommitted Don’toccur Mayoccur Mayoccur
RepeatableRead Don’toccur Don’toccur Mayoccur
Serializable Don’toccur Don’toccur Don’toccur
Fromtheabovetable, itisclear thatserializableisolationlevel isbetterthan others.
8. CONCURRENCY CONTROL
Concurrencyis theabilityof adatabasetoexecutemultiple transactions simultaneously.
Concurrency controlisamechanismtomanagethesimultaneously executingmultiple
transactions such that no transaction interfere with other transaction.
Executingmultipletransactionsconcurrentlyimprovesthesystem performance.
Concurrencycontrolincreasesthethroughputand reduces waitingtimeoftransactions.
If Concurrency Control is not done, then it may leads to problems like lost updates, dirty
read, non-repeatable read, phantom read etc. (Refer section7 for more details)
Lost Updates: It occur when two transactions update same data item at thesame time. In
Department of CSE, NRCM Page 96
this the first write is lost and only the second write is visible.
ConcurrencycontrolProtocols:
Theconcurrencycan be controlled with thehelpofthe followingProtocols
(1) Lock-BasedProtocol
(2) Timestamp-BasedProtocol
(3) Validation-BasedProtocol
9. LOCK-BASED PROTOCOL
Lock assures that one transaction should not retrieve or update a record which another
transaction is updating.
For example, traffic at junction, there aresignals which indicate stop and go. When one side
signal is green (vehicles allowed passing), then other side signals are red (locked. Vehicles
not allowed passing). Similarly, in database transaction when one transaction operations are
under execution, the other transactions are locked.
If at a junction, green signal is given to more than one side, then there may be chances of
accidents. Similarly, in database transactions, if the lockingis not done properly, then it will
display the inconsistent and corrupt data.
Therearetwolock modes: (1).SharedLock (2).ExclusiveLock
[Link],thenTi
can only read A but not write into A. Shared lock is requested using lock-S [Link]
Locksare represented by X. If a transaction Tiapply exclusive lock on data item
A,[Link]-X instruction.
LockCompatibilityMatrix:
LockCompatibilityMatrixcontrolswhethermultipletransactionscanacquirelockson the
same resource at the same time.
TransactionTiapplied
Shared Exclusive
TransactionTj Shared √ X
request for Exclusive X X
If a transaction Tiapplied shared lock on data item A, then Tjcan also be allowed to apply
shared lock on A.
If a transaction Tiapplied shared lock on data item A, then Tjis not allowed to apply
exclusive lock on A.
If a transaction Tiapplied exclusive lock on data item A, then Tjis not allowed to apply
shared lock on A.
If a transaction Tiapplied exclusive lock on data item A, then Tjis not allowed to apply
Department of CSE, NRCM Page 97
exclusive lock on it.
Any number of transactions can hold shared locks on a data item, but if any transaction
holdsanexclusivelockonadata item,thenothertransactions arenot allowedtoholdany lock on
that data [Link] a transaction wants to read a data item, it should applyshared
lock and when a transaction wants to write it should apply exclusive lock. If the lock is
not applied, then the transaction is not allowed to perform the operation.
Itisthesimplestlockingprotocol.
Itconsiderseachread/writeoperationofatransactionasindividual.
Itallowstransactionstoperformwrite/readoperationonadataitemonlyafterobtaining a lock
on that data item.
Transactionsunlock thedataitem immediatelyaftercompletingthewrite/read operation.
When atransaction needs to perform manyread and writeoperations, foreach operation
lockisappliedbeforeperformingitandreleasethelockimmediatelyaftercompletionof the
operation.
(2) Pre-claiming LockProtocol
Inpre-claimingLockProtocol,foreachtransactionalistispreparedconsistingofthe data items
and type of lock required on each of the data item.
Beforeinitiatinganexecutionofthetransaction,itrequestsDBMStoissuealltherequired locks
as per the list.
Ifallthelocksare [Link] transaction
is completed then it releases all the lock.
If all the locks are not granted then this protocol allows the transaction to rolls back and
waits until all the locks are granted.
Nooflocks
Beginof Endof
Transaction Transaction
When a transaction releases any of the acquired locks then it cannot acquire any more
newlocks. But,itcanonlyreleasetheacquiredlocksoneaftertheotherduringremaining
execution of that transaction.
Nooflocks
GrowingPhase ShrinkingPhase
Beginof Endof
Transaction Transaction
TheTwoPhaseLocking(2PL)hastwo [Link]:
Growing phase:Inthegrowingphase,anewlockonthedataitemmaybeacquiredbythe transaction, but
none can be released.(Only get new locks but no release of locks).
Example:
Time T1 T2
0 LOCK-S(A)
1 LOCK-S(A)
2 Read(A)
3 Read(A)
4 LOCK-X(B)
5 --
6 Read(B)
7 B=B +100
8 Write(B)
9 UNLOCK(A)
10 LOCK-X(C)
11 UNLOCK(B) --
12 Read(C)
13 C=C+500
14 Write(C)
15 COMMIT
16 UNLOCK(A)
17 UNLOCK(C)
18 COMMIT
Thefollowingwayshowshow unlockingandlockingworkwith2-PL.
Department of CSE, NRCM Page 99
TransactionT1: TransactionT2:
Growingphase:fromstep1-5 (Afterfirstlockonwards) Growingphase:fromstep2-11 (Afterfirstlockonwards)
Shrinkingphase:fromstep10-12(Afterfirstunlockonwards) Shrinkingphase:fromstep17-18(Afterfirstunlockonwards)
Lockpoint:at5 (Nomorenewlocks) Lockpoint:at11 (Nomorenewlocks)
GrowingPhase
BeginofTra Endof
nsaction Transaction
T3:Read(X)
X R-Timestamp(X)=TS(T2)
T2:Write(X)
(1). BasicTimestampOrdering
CheckthefollowingconditionwheneveratransactionTiissuesaRead(X)operation:
o IfW_timestamp(X)>TS(Ti)thentheoperationisrejected.
o IfW_timestamp(X)<= TS(Ti)thentheoperationisexecuted.
(ReadisnotallowedbyTi, ifanyyoungertransactionsthanTiwriteX)
o If TS(Ti) < W_ timestamp(X) then the operation is rejected and Ti is rolled back
otherwisetheoperationisexecuted.(WriteisnotallowedbyTi,ifanyyoungertransactionsthan Ti write
X and also Ti should be rolled back and restarted later)
(2) Thomas'sWriteRule
ThomasWriteRuleisatimestamp-basedconcurrencycontrolprotocolwhichignores outdated writes.
It follows the following steps:
RejectandRollbackT1
To perform the validation test, we need to know when the various phases of transaction Tatook
place. We shall therefore associate three different timestamps with transaction Ta.
(i). Start(Ta): thetime whenTa,started its execution.
(ii). Validation (Ta): the time when Tafinished its execution and started its
validation phase.
Department of CSE, NRCM Page 103
(iii). Finish (Ta):thetime whenTafinished itswrite phase.
Theserializabilityorderisdetermined bychangingthetimestampof Tas TS(T)=Validation(T).
Hence the serializability is determined at the validation process and cannot be decided in
advance. Therefore it ensures greater degree of concurrency while executing the transactions.
Cell/FieldValue
A database contains multiple tables. Each table contains multiple records. Each record contains
multiple field values. It is shown in the above figure. For example, consider Table D and Record
R2. These two are not mutually exclusive. R2 is a part of D. So granularity means differentlevels
ofdatawhere as smallerlevels arenestedinsidethehigherlevels. Insidedatabase wehave tables.
Inside table we have records. Inside record we have field values. This can be represented with a
tree as shown below.
DB
A B C D Tables
d1 d2 d3 d4 . .. Data values
... .. . ...
The larger the object size on which lock is applied, the lower the degree of concurrency
permitted. Onthe other hand, thesmallerthe object sizeon which lockis applied, thesystem has to
maintain larger number of locks. More locks cause a higher overhead and needs more disk space.
So, what is the best object size on which lock can be applied? It depends on the types of
transactions involved. If a typical transaction accesses data values from a record, it is
advantageous to have the lock to that one record. On the other hand, if a transaction typically
accesses many records in the same table, it may be better to have lock at that table.
Locking at higher levels needs lock details at lower levels. This information is provided by
additional types of locks called intention locks. The idea behind intention locks is for a
transaction to indicate, along the path from the root to the desired node, what type of lock(shared
or exclusive) it will require from one of the node’s descendants. There are three types of
intention locks:
(1) Intention-shared(IS):Itindicatesthatoneormoresharedlockswillberequestedon some
descendant node(s).
(2) Intention-exclusive (IX): It indicates that one or more exclusive locks will be requested
on some descendant node(s).
(3) Shared-intention-exclusive (SIX): It indicates that the current node is locked in shared
modebutthatoneormoreexclusivelockswillberequestedonsomedescendantnode(s).
The compatibility table of the three intention locks, the shared and exclusive locks, is shown in
Figure.
Mode IS IX S SIX X
IS Yes Yes Yes Yes No
IX Yes Yes No No No
S Yes No Yes No No
SIX Yes No No No No
X No No No No No
It uses the intention lock modes to ensure serializability. It requires that if a transaction attempts
to lock a node, then that node must follow these protocols:
Department of CSE, NRCM Page 105
TransactionT1shouldfollowthelock-compatibilitymatrix.
TransactionT1 firstlylockstheroot [Link] lockit in anymode.
IfT1currentlyhastheparentofthenodelockedineitherIXorISmode,thenthe transaction T1
will lock a node in S or IS mode only.
IfT1currentlyhastheparentofthenodelockedineitherIXorSIXmodes,thenthe transaction T1
will lock a node in X, SIX, or IX mode only.
IfT1hasnotpreviouslyunlockedanynodeonly,thentheTransactionT1canlocka node.
If T1currentlyhasnoneof thechildrenofthenode-lockedonly,thenTransactionT1will unlock
a node.
Note:Inmultiple-granularity,thelocksareacquiredintop-downorder,andlocksmustbe released in
bottom-up order.
Transaction failure: During transaction execution, if it cannot proceed further, then it needs
to abort. This is known as transaction failure. A single transaction failure may influencemany
transactions or processes. The reasons for transaction failure are:
Logicalerrors:Itoccurs duetosomecodeerrororaninternalconditionerror.
Systemerror: Itoccurs whentheDBMSitselfterminatesanactivetransactiondueto
deadlock or resource unavailability.
System crash: The system may crash due to the external factors such as interruptions in
power supply, hardware or software failure. Example:Operating System errors.
Disk failure: In early days of technology evolution, hard-disk drives or storage drives usedto
fail frequently. Disk failure occurs due to the formation of bad sectors, disk head crash,un-
reachable to the disk or any other failure which destroys all or part of disk storage.
When a system crashes, it may have many transactions being executed and many files may be
opened for them. When a DBMS recovers from a crash, it must maintain the following:
Itmustcheck thestatesofall thetransactionsthat werebeing executed.
[Link].
WhenatransactionTistartsexecution, thelogstores:<Ti, Start>
WhenatransactionTimodifiesanitemXfromoldvalueV1tonewvalueV2,thelog stores:<Ti,
X, V1, V2>
WhenthetransactionTiexecutioncompleted, thelogstores: <Ti,commit>
WhenthetransactionTiexecution aborted,the logstores:<Ti,abort>
RecoveryusingLogrecords
Whenthesystemiscrashed,thentheDBMSchecksthelogtofindwhichtransactionsneedsto be undo
and which need to be redo. There are two major techniques for recovery from non- catastrophic
transaction failures. They are deferred updates and immediate updates.
i. Deferred database modification:In this technique, all the changes done by the
transaction are saved in the system log without modifying the actual [Link] the
transaction committed, then onlythe changes are updated in the database. If a transaction
failsbeforereachingitscommitpoint,ithasnotchangedthedatabaseinanywayso
T1
T2
T3
T4
Time
Therecoverysystem reads thelogs backwards from theend to thelast checkpoint.
Itmaintainstwolists,anundo-listandaredo-list.
16. ARIES ALGORITHM (Algorithm for Recovery and Isolation Exploiting Semantics)
AlgorithmforRecoveryandIsolationExploitingSemantics(ARIES)isoneofthelogbased recovery
method. It uses the Write Ahead Log (WAL) protocol.
Startofoldestin-progress SmallestLSNassociated
transaction with dirty page Lastcheckpoint EndofLog Log Time
AnalysisPhase
Redo Phase
Undophase
Database copy is created and stored in the remote area with the help of network. This
database is periodically updated with the current database so that it will be in sync with data and
other details. This remote database can be updated manually called offline backup. It can be
backed up online where the data is updated at current and remote database simultaneously. Inthis
case, as soon as there is a failure of current database, system automatically switches to the
remote database and starts functioning. The user will not know that there was a failure.
Full backup or Normal backup: Full backup is also known as Normal backup. In this,
an exact duplicate copy of the original database is created and stored every time the
backup made. The advantage of this type of backup is that restoring the lost data is very
fast as compared to other. The disadvantage of this method is that it takes more time to
backup.
Incremental Backup:Instead of backup entire database every time, backup only the
files that have been updated since the last full [Link] this at least weekly once
normal backup has to be done. While incremental database backups do run faster, the
recovery process is a bit more complicated.
Differential backup: Differential is similar to incremental backup but the difference is
that the recovery process is simplified by not clear the archive bit. So a file that isupdated
after a normal backup will be archived every time a differential backup is run until the
next normal backup runs and clears the archive bit.
The Secondary storage devices can be fixed or removable. Fixed Storage device is an
internal storage device like hard disk that is fixed inside the computer. Storage devices that are
portable and can be taken outside the computer are termed as removable storage devices such as
CD, DVD, external hard disk, etc.
Magnetic/optical Disk:It supports random and sequential [Link] takes less access time.
Magnetic Tapes:It supports only sequential [Link] takes more access time.
In DBMS, processing a query and getting output need accessing random pages. So, disks
are preferable than magnetic tapes.
2. FILE ORGANIZATION
The database is stored as a collection of files. Each file contains a set of records. Each record
is a collection of fields. For example, a student table (or file) contains many records and each
record belongs to one student with fields (attributes) such as Name, Date of birth, class,
department, address, etc.
File organization defines how file records are mapped onto disk blocks.
The records of a file are stored in the disk blocks because a block is the unit of data transfer
between disk and memory. When the block size is larger than the record size, each block will
contain more than one record. Sometimes, some of the files may have large records that cannot
fit in one [Link] this case,we can store part of a record on one block and the rest on another. A
Department of CSE, NRCM Page 112
pointer at the end of the first block points to the block containing the remainder of the record.
File
Organization
Heap File Organization: When a file is created using Heap File Organization mechanism, the
records are stored in the file in the order in which they are inserted. So the new records are
inserted at the end of the file. In this type of organization inserting new records is more efficient.
It uses linear search to search records.
Sequential File Organization: When a file is created using Sequential File Organization
mechanism, all the records are ordered (sorted) as per the primary key value and placed in the
file. In this type of organization inserting new records is more difficult because the records need
to be sorted after inserting every new [Link] uses binary search to search records.
Hash File Organization: When a file is created using Hash File Organization mechanism, ahash
function is applied on some field of the records to calculate hash value. Based on the hash value,
the corresponding record is placed in the file.
Clustered File Organization: Clustered file organization is not considered good for large
databases. In this mechanism, related records from one or more relations are kept in a same disk
block, that is, the ordering of records is not based on primary key or search key.
3. INDEXING
If the records in the file are in sorted order, then searching will become very fast. But, in
most of the cases, records are placed in the file in the order in which they are inserted, so new
records are inserted at the end of the file. It indicates, the records are not in sorted order. In
order to make searching faster in the files with unsorted records, indexing is used.
Indexing is a data structure technique which allows you to quickly retrieve records from a
database file. An Index is a small table having only two columns. The first column contains a
copy of the primary or candidate key of a table. The second column contains a set of disk block
addresses where the record with that specific key value stored.
Department of CSE, NRCM Page 113
Indexing in DBMScan be of the following types:
Indexing
i. Primary Index
If the index is created by using the primary key of the table, then it is known as primary
indexing.
As primary keys are unique and are stored ina sorted manner,the performance of the
searching operation is quite efficient.
The primary index can be classified into two types: dense index and sparse index.
Dense index
If every record in the table has one index entry in the index table, then it is called
denseindex.
In this, the number of records (rows) in the index table is same as the number of records
(rows) in the main table.
Aseveryrecord hasoneindexentry, searchingbecomes faster.
TS TS Hyderabad KCR
AP AP Amaravathi Jagan
TN TN Madras PalaniSwamy
MH MH Bombay Thackray
Sparse index
Ifonlyfewrecordsinthetablehaveindexentriesintheindextable,thenitiscalled sparse index.
In this, the number of records (rows) in the index table is less than the number of records
(rows) in the main table.
Asnotalltherecordhaveindex entries,searchingbecomes slowforrecordsthatdoesnot have
index entries.
Multi-levelIndex:Whenthemaintablesizebecomestoolarge,creatingsecondarylevelindex
improves the search [Link] if the search process is slow; we can add one more level of
indexing and so on. This type of indexing is called multi-level index.
In above diagram we can see that, indexes are created for each department in the index
file. In the data block, the students of each department are grouped together to form the cluster.
The address in the index file points to the beginning of each cluster.
4. HASHBASED INDEXING
Hashing is a technique to directly search the location of desired data on the disk without
using index structure. Hash function is a function which takes a piece of data ( key) as input and
produces a hash valueas output which maps the data to a particular location in the hash table.
5. STATIC HASHING
Instatichashing,[Link] example
consider the hash function
f(x)= x mod 7
Foranyvalueofx, theabovefunctionproducesoneof thehashvaluefrom{0,1,2, 3,4, 5,6}. It means
static hashing maps search-key values to a fixed set of bucket addresses.
Example:Inserting10,21,16and12inhash table.
HashValue DataRecord
f(10)=10 mod7=3 0 21*
f(21)=21 mod7=0 1
f(16)=16mod 7=2 2 16*
f(12)=12 mod7=5 3 10*
4
5 12*
6
Figure5.1:Statichashing
i. Open Addressing:
o LinearProbing
o QuadraticProbing
o DoubleHashing
LinearProbing:
In linear probing, when there is a collision, we scan forwards for the next empty slot to
fill the key’s record. If you reach last slot, then start from beginning.
Example: Consider figure 1. When we try to insert 23, its hash value is 2. But the slotwith
2 is not empty. Then move to next slot (hash value 3), even it is also full, then move once
again to next slot with hash value 4. As it is empty store 23 there. This is shown in the
below diagram.
HashValue DataRecord
0 21*
3 10*
4 23*
5 12*
Figure5.2:LinearProbing
In quadratic probing, when collision occurs, it compute new hash value by taking the
original hash value and adding successive values of quadratic polynomial until an open
slot is found. If here is a collision, it use the following hash function: h(x) = ( f(x) + i2 )
mod n ,where I = 1, 2, 3, 4,….. and f(x) is initial hash value.
Example: Consider figure 1. When we try to insert 23, its hash value is 2. But the slot
with hash value 2 is not empty. Then compute new hash value as (2 +1 2) mod 7 =3, even
it is also full, and then once again compute new hash value as (2 +2 2) mod 7 = 6. As it is
empty store 23 there. This is shown in the below diagram.
Hash
DataRecord
Value
0 21*
3 10*
5 12*
6 23*
Figure5.3:Quadratic Probing
DoubleHashing
In double hashing, there are two hash functions. The second hash function is used to
provide an offset value in case the first function causes a collision. The followingfunction
is an example of double hashing: (firstHash(key) + i * secondHash(key)) % tableSize.
Use i = 1, 2, 3, …
Figure5.4:Double Probing
HashValue DataRecord
f(10)=10 mod7=3 0 21*
f(21)=21 mod7=0 1
f(23)=23 mod7=2 4
Figure5.5:Separatechainingexample
o Extendedhashing
o LinearHashing
i. Extendable hashing
In extendable hashing, a separate directory of pointers to buckets is used. The number
bitsusedindirectoryiscalledglobaldepth(gd)andnumberentriesindirectory=2 [Link] bits used
for locating the record in the buckets is called local depth(ld) and each bucket canstores up to 2ld
entries. The hash function use last few binarybits of the keyto find the bucket. If
abucketoverflows,itsplits,andiflocaldepth greaterthanglobaldepth,thenthetabledoublesin size. It is
one form of dynamic hashing.
Example: Let global depth (gd) = 2. It means the directory contains four entries. Let the local
depth (ld) of each bucket = 2. It means each bucket need two bits to perform search operation.
Let each Bucket capacity is four. Let us insert 21, 15, 28, 17, 16, 13, 19, 12, 10, 24, 25 and 11.
21 = 10101 19 = 10011
15 = 01111 12 = 01100
28 = 11100 10 = 01010
17 = 10001 24 = 11000
16 = 10000 25 = 11101
13 = 01101 11 = 01011
Local depth 2
Globaldepth 28* 16* 12* 24* Bucket A
2 2
00 21* 17* 25* 13* Bucket B
01
2
10 BucketC
11 10*
Directory 2
15* 19* 11* BucketD
Figure6.1:Extendible hashing example
Now, let us consider insertion of data entry 20* (binary 10100). Looking at directory
element00,weareledtobucketA,[Link],wehavetosplitthebucketby
Local depth 3
16* 24* BucketA
Globaldepth 3
28* 12* 20* Bucket A2
2
00 2
01 21* 17* 25* 13* Bucket B
10
2
11 BucketC
10*
Directory 2
BucketD
15*19* 11*
Figure6.2:Afterinserting20andsplittingBucket A
Local depth 3
16* 24* BucketA
Globaldepth 3
28* 12* 20* BucketA2
3
000
001 2 BucketB
010 21* 17* 25* 13*
011
100 2 BucketC
101 10*
110
111 BucketD
2
Directory 15*19* 11*
2
Directory BucketD
15* 19* 11*
Figure6.4:Afterinserting 9
KeyObservations:
ABucketwillhavemorethanonepointerspointingtoitifitslocaldepthislessthan the global
depth.
Whenoverflowconditionoccursinabucket,alltheentriesinthebucketarerehashed with a
new local depth.
IfnewLocalDepthoftheoverflowingbucketisequaltotheglobaldepth,onlythenthe
directories are doubled and the global depth is incremented by 1.
Thesizeof abucketcannot bechangedafterthedata insertionprocess begins.
ii. LinearHashing
Linear hashing is a dynamic hashing technique that linearly grows or shrinks number of
buckets in a hash file without a directory as used in Extendible Hashing. It uses a family of hash
functions instead of single hash function.
Example: Let N = 4, so we use 4 buckets and hash function h0(key) = key % 4 is used to map
any key into one of the four buckets. Let us initially insert 4, 13, 19, 25, 14, 24, 15, 18, 23, 11,
16, 12 and [Link] is shown in the below figure.
Next,when27isinserted,[Link],bucket0(firstbucket)issplited
anditscontent isdistributed betweenbucket 0andnew bucket. This isshown inbelow figure.
4 100 00 4* 12*
5 101 01 25*
6 110 10 14* 30*
1 001 01 13*
5 101 01 25*
The abovementioned concept can be further expanded with the notion of the m-Way
Search Tree, where m represents the number of pointers in a particular node. If m = 3, then each
nodeofthesearchtreecontains3pointers,[Link] use mainly two
tree structure indexes in DBMS. They are:
IndexedSequentialAccessMethods(ISAM)
B+Tree
23 68 42 59
10 20 23 27 68 71 31 35 42 46 59 61
31
23 68 42 59
10 20 23 27 68 71 31 35 42 46 59 61
24 33 36
39
31
23 68 42 59
10 20 23 27 68 31 35 46 59 61
33
39
9. B+ TREE
B+ Tree is an extension of Binary Tree which allows efficient insertion, deletion and search
operations. It is used to implement indexing in DBMS. In B+ tree, data can be stored onlyon the
leaf nodes while internal nodes can store the search key values.
18
11 31 64
8 15 23 42 59 68
2 5 8 9 11 12 15 16 18 20 23 27 31 35 42 46 59 61 64 66 68 71
B+ Insertion:
Example:Insert1,5,3,7,9,2,4,6,8,10intoB+treeofanorder4.
B+treeoforder4indicatestherearemaximum3valuesinanode. Initially
Afterinserting1
1
Afterinserting5
1 5
Afterinserting3
1 3 5
5
Afterinserting7
1 3 5 7
1 3 5 7
Afterinserting9
5
1 3 5 7 9
Afterinserting2
5
1 2 3 5 7 9
1 2 3 4 5 7 9 1 2 3 4 5 7 9
Afterinserting6
3 5
1 2 3 4 5 7 9 6
3 5 7
1 2 3 4 5 6 7 9
Afterinserting8
3 5 7
1 2 3 4 5 6 7 8 9
Afterinserting10
3 5 7
1 2 3 4 5 6 7 8 9 10
3 5 7 9
1 2 3 4 5 6 7 8 9 10
3 5 9
1 2 3 4 5 6 7 8 9 10
Example:Considerthegivenbelowtree anddelete19,
19
5 14 24 33
2 3 5 7 14 16 19 20 22 24 27 29 33 34 38 39
19
5 14 24 33
2 3 5 7 14 16 20 22 24 27 29 33 34 38 39
5 14 27 33
2 3 5 7 14 16 22 24 27 29 33 34 38 39
Delete 24: Half full criteria is not satisfied after deleting 24, bringing a value from its siblings
also not possible. Therefore merge it with its right sibling and change key values in the internal
nodes.
19
5 14 33
2 3 5 7 14 16 22 27 29 33 34 38 39
Delete 5: Half full criteria is not satisfied after deleting 5, bringing a value from its siblings also
not possible. Therefore merge it with its left sibling (you can also merge with right) and change
key values in the internal nodes.
19
14 33
2 3 7 14 16 22 27 29 33 34 38 39
Delete7:Halffullcriteriaissatisfiedevenafterdeleting7,sojustdelete7fromleafnode.
17
14 33
2 3 14 16 22 27 29 33 34 38 39
3 14 16 22 27 29 33 34 38 39
While indexes can improve query execution speed, the price we pay is on index
maintenance. Update and insert operations need to update the index with new data. This means
that writes will slow down slightly with each index we add to a table. We also need to monitor
index usage and identify when an existing index is no longer needed. This allows us to keep our
indexing relevant and trim enough to ensure that we don’t waste disk space and I/O on write
operations to any unnecessary [Link] improve performance of the system,we need to do the
following:
Identifytheunusedindexesand removethem.
Identifytheminimallyused indexesand remove them.
An index that is scanned more frequently, but rarely findsthe required answer. Modifythe
index to reach the answer.
Identifytheindexes that are verysimilarand combinethem.
-oO0Oo–