0% found this document useful (0 votes)
8 views45 pages

DBMS - Module 5 Notes

Module 5 covers advanced SQL queries, focusing on handling NULL values, nested queries, and aggregate functions. It explains the three-valued logic in SQL, the use of EXISTS and UNIQUE functions, and the importance of JOIN operations, including outer joins. Additionally, it discusses grouping and aggregation techniques to summarize data based on specific attributes.

Uploaded by

praveenmb0501
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views45 pages

DBMS - Module 5 Notes

Module 5 covers advanced SQL queries, focusing on handling NULL values, nested queries, and aggregate functions. It explains the three-valued logic in SQL, the use of EXISTS and UNIQUE functions, and the importance of JOIN operations, including outer joins. Additionally, it discusses grouping and aggregation techniques to summarize data based on specific attributes.

Uploaded by

praveenmb0501
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd

MODULE 5

SQL: ADVANCED QUERIES

MORE COMPLEX SQL RETRIEVAL QUERIES

COMPARISON INVOLVING NULL AND THREE VALUED LOGIC

SQL has various rules for dealing with NULL values. Consider the following examples to illustrate each of
the meanings of NULL.
 Unknown value: A person’s date of birth is not known, so it is represented by NULL in the
database. An example of the other case of unknown would be NULL for a person’s home phone
because it is not known whether or not the person has a home phone.
 Unavailable or withheld value. A person has a home phone but does not want it to be listed, so it is
withheld and represented as NULL in the database.
 Not applicable attribute. An attribute LastCollegeDegree would be NULL for a person who has no
college degrees because it does not apply to that person.
It is often not possible to determine which of the meanings is intended;
In general, each individual NULL value is considered to be different from every other NULL value in the
various database records. When a record with NULL in one of its attributes is involved in a comparison
operation, the result is considered to be UNKNOWN (it may be TRUE or it may be FALSE). Hence, SQL
uses a three-valued logic with values TRUE, FALSE, and UNKNOWN instead of the standard two-valued
(Boolean) logic with values TRUE or FALSE.
The Logic Connectives in Three Valued Logic is shown in the figure below.

The rows and columns represent the values of the results of comparison conditions, which would typically
appear in the WHERE clause of an SQL query. Each expression result would have a value of TRUE,
FALSE, or UNKNOWN. The result of combining the two values using the AND logical connective is
shown by the entries in Table 7.1(a). Table 7.1(b) shows the result of using the OR logical connective. For
example, the result of (FALSE AND UNKNOWN) is FALSE, whereas the result of (FALSE OR
UNKNOWN) is UNKNOWN. Table 7.1(c) shows the result of the NOT logical operation.
In select-project-join queries, the general rule is that only those combinations of tuples that evaluate the
logical expression in the WHERE clause of the query to TRUE are selected. Tuple combinations that
evaluate to FALSE or UNKNOWN are not selected. However, there are exceptions to that rule for certain
operations, such as outer joins.
SQL allows queries that check whether an attribute value is NULL. Rather than using = or <> to compare an
attribute value to NULL, SQL uses the comparison operators IS or IS NOT. This is because SQL considers
each NULL value as being distinct from every other NULL value, so equality comparison is not appropriate.
It follows that when a join condition is specified, tuples with NULL values for the join attributes are not
included in the result.

NESTED QUERIES, TUPLES AND SET/MULTISET COMPARISONS


 Some queries require that existing values in the database be fetched and then used in a comparison
condition. Such queries can be conveniently formulated by using nested queries.
 These nested queries can also appear in the WHERE clause or the FROM clause or the SELECT
clause or other SQL clauses as needed.
 Query 4 is formulated in Q4 without a nested query, but it can be rephrased to use nested queries as
shown in Q4A. Q4A introduces the com parison operator IN, which compares a value v with a set (or
multiset) of values V and evaluates to TRUE if v is one of the elements in V.
 In Q4A, the first nested query selects the project numbers of projects that have an employee with last
name ‘Smith’ involved as manager, whereas the second nested query selects the project numbers of
projects that have an employee with last name ‘Smith’ involved as worker. In the outer query, we use
the OR logical connective to retrieve a PROJECT tuple if the PNUMBER value of that tuple is in the
result of either nested query.
 If a nested query returns a single attribute and a single tuple, the query result will be a single (scalar)
value. In such cases, it is permissible to use = instead of IN for the comparison operator.
 SQL allows the use of tuples of values in comparisons by placing them within parentheses. To
illustrate this, consider the following query:

This query will select the Essns of all employees who work the same (project, hours) combination on
some project that employee ‘John Smith’ (whose Ssn = ‘123456789’) works on.
 In addition to the IN operator, a number of other comparison operators can be used to compare a
single value v (typically an attribute name) to a set or multiset v (typically a nested query).
 The = ANY (or = SOME) operator returns TRUE if the value v is equal to some value in the set V
and is hence equivalent to IN. The two keywords ANY and SOME have the same effect. Other
operators that can be combined with ANY (or SOME) include >, >=, <=, and <>. The keyword ALL
can also be combined with each of these operators.
 For example, the comparison condition (v > ALL V) returns TRUE if the value v is greater than all
the values in the set (or multiset) V. An example is the following query, which returns the names of
employees whose salary is greater than the salary of all the employees in department 5:

 For example, in the SELECT clause and WHERE clause of the first nested query of Q4A, a
reference to any unqualified attribute of the PROJECT relation refers to the PROJECT relation
specified in the FROM clause of the nested query. To refer to an attribute of the PROJECT relation
specified in the outer query, we specify and refer to an alias (tuple variable) for that relation. These
rules are similar to scope rules for program variables in most programming languages that allow
nested procedures and functions. To illustrate the potential ambiguity of attribute names in nested
queries, consider Query 16.

CORRELATED NESTED QUERIES


 Whenever a condition in the WHERE clause of a nested query references some attribute of a relation
declared in the outer query, the two queries are said to be correlated.
 For each EMPLOYEE tuple, evaluate the nested query, which retrieves the Essn values for all
DEPENDENT tuples with the same sex and name as that EMPLOYEE tuple; if the Ssn value of the
EMPLOYEE tuple is in the result of the nested query, then select that EMPLOYEE tuple.
Q16A is another way of writing Q16. Q16 is correlated nested query.

THE EXISTS AND UNIQUE FUNCTIONS IN SQL


 EXISTS and UNIQUE are Boolean functions that return TRUE or FALSE; hence, they can be used
in a WHERE clause condition.
 The EXISTS function in SQL is used to check whether the result of a nested query is empty
(contains no tuples) or not. The result of EXISTS is a Boolean value TRUE if the nested query result
contains at least one tuple, or FALSE if the nested query result contains no tuples.
 Query 16 in an alternative form that uses EXISTS as in Q16B:
 We can think of Q16B as follows: For each EMPLOYEE tuple, evaluate the nested query, which
retrieves all DEPENDENT tuples with the same Essn, Sex, and Dependent_name as the
EMPLOYEE tuple; if at least one tuple EXISTS in the result of the nested query, then select that
EMPLOYEE tuple. EXISTS(Q) returns TRUE if there is at least one tuple in the result of the nested
query Q, and returns FALSE otherwise. On the other hand, NOT EXISTS(Q) returns TRUE if there
are no tuples in the result of nested query Q, and returns FALSE other wise.
 Let’s illustrate the use of NOT_EXISTS

For each EMPLOYEE tuple, the correlated nested query selects all DEPENDENT tuples whose Essn
value matches the EMPLOYEE Ssn; if the result is empty, no dependents are related to the
employee, so we select that EMPLOYEE tuple and retrieve its Fname and Lname.
 Example

 The query Q3: Retrieve the name of each employee who works on all the projects controlled by
department number 5 can be written using EXISTS and NOT EXISTS in SQL systems. One way to
write this query is to use the construct (S2 EXCEPT S1). This is shown in query 3A.
In Q3A, the first subquery (which is not correlated with the outer query) selects all projects
controlled by department 5, and the second subquery (which is correlated) selects all projects that the
particular employee being considered works on. If the set difference of the first subquery result
MINUS (EXCEPT) the second sub query result is empty, it means that the employee works on all the
projects and is therefore selected.
 The second option is shown as Q3B. Notice that we need two-level nesting in Q3B and that this
formulation is quite a bit more complex than Q3A.

In Q3B, the outer nested query selects any WORKS_ON (B) tuples whose Pno is of a project
controlled by department 5, if there is not a WORKS_ON (C) tuple with the same Pno and the same
Ssn as that of the EMPLOYEE tuple under consideration in the outer query. If no such tuple exists,
we select the EMPLOYEE tuple.
EXPLICIT SETS AND RENAMING IN SQL
It is also possible to use an explicit set of values in the WHERE clause, rather than a nested query. Such a
set is enclosed in parentheses in SQL.
 In SQL, it is possible to rename any attribute that appears in the result of a query by adding the
qualifier AS followed by the desired new name. Hence, the AS con struct can be used to alias both
attribute and relation names in general, and it can be used in appropriate parts of a query.
 For example, Q8A shows how query Q8 from Section 4.3.2 can be slightly changed to retrieve the
last name of each employee and his or her supervisor while renaming the resulting attribute names as
Employee_name and Supervisor_name.

JOINED TABLES IN SQL AND OUTER JOINS


 The concept of a joined table (or joined relation) was incorporated into SQL to permit users to
specify a table resulting from a join operation in the FROM clause of a query. This construct may be
easier to comprehend than mixing together all the select and join conditions in the WHERE clause.
 For example, consider query Q1, which retrieves the name and address of every employee who
works for the ‘Research’ department. It may be easier to specify the join of the EMPLOYEE and
DEPARTMENT relations in the WHERE clause, and then to select the desired tuples and attributes.
This can be written in SQL as in Q1A:

 The FROM clause in Q1A contains a single joined table. The attributes of such a table are all the
attributes of the first table, EMPLOYEE, followed by all the attributes of the second table,
DEPARTMENT. The concept of a joined table also allows the user to specify different types of join,
such as NATURAL JOIN and various types of OUTER JOIN. In a NATURAL JOIN on two
relations R and S, no join condition is specified.
 If the names of the join attributes are not the same in the base relations, it is possible to rename the
attributes so that they match, and then to apply NATURAL JOIN. In this case, the AS construct can
be used to rename a relation and all its attributes in the FROM clause. This is illustrated in Q1B,
where the DEPARTMENT relation is renamed as DEPT and its attributes are renamed as Dname,
Dno (to match the name of the desired join attribute Dno in the EMPLOYEE table), Mssn, and
Msdate. The implied join condition for this NATURAL JOIN is [Link] = [Link],
because this is the only pair of attributes with the same name after renaming:
 If the user requires that all employees be included, a different type of join called OUTER JOIN must
be used explicitly. There are several variations of OUTER JOIN. In the SQL standard, this is handled
by explicitly specifying the keyword OUTER JOIN in a joined table, as illustrated in Q8B:

 LEFT OUTER JOIN (every tuple in the left table must appear in the result; if it does not have a
matching tuple, it is padded with NULL values for the attributes of the right table), RIGHT OUTER
JOIN (every tuple in the right table must appear in the result; if it does not have a matching tuple, it
is padded with NULL values for the attributes of the left table), and FULL OUTER JOIN. In the
latter three options, the keyword OUTER may be omitted. If the join attributes have the same name,
one can also specify the natural join variation of outer joins by using the keyword NATURAL before
the operation (for example, NATURAL LEFT OUTER JOIN).
 The keyword CROSS JOIN is used to specify the CARTESIAN PRODUCT operation
 It is possible that one of the tables in a join may itself be a joined table. This allows the specification
of the join of three or more tables as a single joined table, which is called a multiway join. This is
illustrated in query Q2A.

 Not all SQL implementations have implemented the new syntax of joined tables. In some systems, a
different syntax was used to specify outer joins by using the comparison operators + =, = +, and + =
+ for left, right, and full outer join, respectively.

AGGREAGATE FUNCTIONS IN SQL

Grouping is used to create subgroups of tuples before summarization. Grouping and aggregation are
required in many database applications.
 A number of built-in aggregate functions exist: COUNT, SUM, MAX, MIN, and AVG. The COUNT
function returns the number of tuples or values as specified in a query. The functions SUM, MAX,
MIN, and AVG can be applied to a set or multiset of numeric values and return, respectively, the
sum, maximum value, minimum value, and average (mean) of those values. These functions can be
used in the SELECT clause or in a HAVING clause.
 The functions MAX and MIN can also be used with attributes that have nonnumeric domains if the
domain values have a total ordering among one another.

 We could use AS to rename the column names in the resulting single-row table; for example, as in
Q19A.

 If we want to get the preceding aggregate function values for employees of a specific department—
say, the ‘Research’ department—we can write Query 20, where the EMPLOYEE tuples are restricted
by the WHERE clause to those employees who work for the ‘Research’ department.

 Queries 21 and 22. Retrieve the total number of employees in the company (Q21) and the number of
employees in the ‘Research’ department (Q22).

 Here the asterisk (*) refers to the rows (tuples), so COUNT (*) returns the number of rows in the
result of the query. We may also use the COUNT function to count val ues in a column rather than
tuples, as in the next example.

 If we write COUNT(SALARY) instead of COUNT(DISTINCT SALARY) in Q23, then duplicate


values will not be eliminated. However, any tuples with NULL for SALARY will not be counted.
 to retrieve the names of all employees who have two or more dependents (Query 5), we can write the
following:
The correlated nested query counts the number of dependents that each employee has; if this is
greater than or equal to two, the employee tuple is selected.
 SQL also has aggregate functions SOME and ALL that can be applied to a col lection of Boolean
values; SOME returns TRUE if at least one element in the collection is TRUE, whereas ALL returns
TRUE if all elements in the collection are TRUE.
GROUPING: GROUP BY AND HAVING CLAUSES
In many cases we want to apply the aggregate functions to subgroups of tuples in a relation, where the
subgroups are based on some attribute values. For example, we may want to find the average salary of
employees in each department or the number of employees who work on each project. In these cases we
need to partition the relation into nonoverlapping subsets (or groups) of tuples. Each group (partition) will
consist of the tuples that have the same value of some attribute(s), called the grouping attribute(s). We can
then apply the function to each such group independently to produce summary information about each
group.
 SQL has a GROUP BY clause for this purpose. The GROUP BY clause specifies the grouping
attributes, which should also appear in the SELECT clause, so that the value resulting from applying
each aggregate function to a group of tuples appears along with the value of the grouping attribute(s).

 In Q24, the EMPLOYEE tuples are partitioned into groups—each group having the same value for
the GROUP BY attribute Dno. Hence, each group contains the employees who work in the same
department. The COUNT and AVG functions are applied to each such group of tuples. Notice that
the SELECT clause includes only the grouping attribute and the aggregate functions to be applied on
each group of tuples. Figure below illustrates how grouping works and shows the result of Q24.
 If NULLs exist in the grouping attribute, then a separate group is created for all tuples with a NULL
value in the grouping attribute. For example, if the EMPLOYEE table had some tuples that had
NULL for the grouping attribute Dno, there would be a separate group for those tuples in the result
of Q24.

Q25 shows how we can use a join condition in conjunction with GROUP BY. In this case, the
grouping and functions are applied after the joining of the two relations in the WHERE clause.
 One more clause in SQL is HAVING. HAVING provides a condition on the summary information
regarding the group of tuples associated with each value of the grouping attributes. Only the groups
that satisfy the condition are retrieved in the result of the query. This is illustrated by Query 26.

Notice that although selection conditions in the WHERE clause limit the tuples to which functions
are applied, the HAVING clause serves to choose whole groups. Figure below illustrates the use of
HAVING and displays the result of Q26.
 In Q27, we restrict the tuples in the relation (and hence the tuples in each group) to those that satisfy
the condition specified in the WHERE clause—namely, that they work in department number 5.
 For example, suppose that we want to count the total number of employees whose salaries exceed
$40,000 in each department, but only for departments where more than five employees work. Here,
the condition (SALARY > 40000) applies only to the COUNT function in the SELECT clause.
Suppose that we write the following incorrect query:

This is incorrect because it will select only departments that have more than five employees who
each earn more than $40,000.
One way to write this query correctly is to use a nested query, as shown in Query 28.
OTHER SQL CONSTRUCTS: WITH AND CASE

The WITH clause in SQL (Structured Query Language) is a powerful feature that allows you to define a
temporary, named result set that you can reference within a single SQL statement. This temporary result set
is often referred to as a Common Table Expression (CTE).
The WITH clause allows a user to define a table that will only be used in a particular query; it is some what
similar to creating a view that will be used only in one query and then dropped. For example, we can
rewrite Q28 as Q28′:
The query performs two main steps: first, it identifies the "big" departments using the WITH clause, and
then it uses that result to filter and aggregate the main employee data.
 The WITH clause defines a temporary, named result set called BIGDEPTS
Component Code Purpose
Name Definition WITH BIGDEPTS (Dno) AS Defines a CTE named
BIGDEPTS. The column it
returns is explicitly named
Dno.
CTE Query SELECT Dno FROM Selects the department number
EMPLOYEE from the EMPLOYEE table.
Grouping GROUP BY Dno Groups the employees by their
department number.
Filtering (Big Depts) HAVING COUNT (*) > 5 Filters these groups, keeping
only those departments where
the count of employees
(COUNT(*)) is strictly greater
than 5.

The result of the BIGDEPTS CTE is a list of department numbers (Dno) that have more than five
employees.
 The main SELECT statement immediately follows the CTE definition and uses the temporary
BIGDEPTS result to perform the final calculation.

Component Code Purpose

SELECT Dno, Selects the department number and the count of


Selection
COUNT (*) employees in that final group.

Source FROM EMPLOYEE Retrieves data from the main EMPLOYEE table.
Component Code Purpose

Filters the employees based on two conditions: 1.


Filtering WHERE Salary > The employee's Salary is greater than $40,000. 2.
(Salary & 40000 AND Dno IN The employee's Dno must be present in the list of
Department) BIGDEPTS "big" departments generated by the BIGDEPTS
CTE.

Groups the filtered employees by their department


Final
GROUP BY Dno; number to calculate the final COUNT(*) for each
Aggregation
department.

The query returns a list of department numbers (Dno) and the number of employees in those departments
who satisfy both of the following conditions:
1. They earn a salary greater than $40,000.
2. They belong to a department that has more than 5 total employees.
CASE CONSTRUCT
We illustrate this with an example. Suppose we want to give employees different raise amounts depending
on which department they work for; for example, employees in department 5 get a $2,000 raise, those in
department 4 get $1,500 and those in department 1 get $3,000. Then we could re-write the update operation
as

RECURSIVE QUERIES IN SQL


An example of a recursive relationship between tuples of the same type is the relationship between an
employee and a supervisor. This relationship is described by the foreign key Super_ssn of the EMPLOYEE
relation. It relates each employee tuple (in the role of supervisee) to another employee tuple (in the role of
supervisor). An example of a recursive operation is to retrieve all supervisees of a supervisory employee e at
all levels—that is, all employees e′ directly supervised by e, all employees e′ directly supervised by each
employee e′, all employees e″′ directly supervised by each employee e″, and so on. This query can be
written as follows:
In Q29, we are defining a view SUP_EMP that will hold the result of the recursive query. The view is
initially empty. It is first loaded with the first level (supervisor, supervisee) Ssn combinations via the first
part (SELECT SupervisorS, Ssn FROM EMPLOYEE), which is called the base query. This will be
combined via UNION with each successive level of supervisees through the second part, where the view
contents are joined again with the base values to get the second level combinations, which are UNIONed
with the first level. This is repeated with successive levels until a fixed point is reached, where no more
tuples are added to the view. At this point, the result of the recursive query is in the view SUP_EMP.
SPECIFYING CONSTRAINTS AS ASSERTIONS AND ACTIONS AS TRIGGERS

In this section, we introduce two additional features of SQL: the CREATE ASSERTION statement and the
CREATE TRIGGER statement.
SPECIFYING GENERAL CONSTRAINTS AS ASSERTIONS IN SQL
The CREATE ASSERTION statement defines a condition that must always be true for the entire database.
If any database transaction attempts to violate this condition, the transaction is rolled back (rejected). For
example, to specify the constraint that the salary of an employee must not be greater than the salary of the
manager of the department that the employee works for in SQL, we can write the following assertion:

 The constraint name SALARY_CONSTRAINT is followed by the keyword CHECK, which is


followed by a condition in parentheses that must hold true on every data base state for the assertion
to be satisfied.
 Any WHERE clause condition can be used, but many constraints can be specified using the EXISTS
and NOT EXISTS style of SQL conditions. The assertion uses the NOT EXISTS clause, which is a
standard technique to enforce negative constraints (i.e., "there should not exist any case where...").
 The constraint is violated if and only if this SELECT query returns at least one row.
 The query joins three tables, treating them as separate entities:
 EMPLOYEE E: An employee whose salary is being checked (the subordinate).
 EMPLOYEE M: The employee who acts as a manager.
 DEPARTMENT D: The department table used to link the employee to their manager.
 The WHERE clause defines the conditions that, if met, constitute a violation of the rule:

Condition SQL Code Meaning

Salary [Link] > The employee's salary ([Link]) is greater than


Violation [Link] the manager's salary ([Link]).

Employee-
[Link] =
Department Links the employee (E) to their department (D).
[Link]
Link

Manager- Links the manager (M) to the department they


D.Mgr_ssn =
Department manage (D), establishing that M is the manager of
[Link]
Link the department E works in.

 Since the constraint checks that the described invalid situation does NOT exist, it successfully
enforces the rule that no employee can have a salary greater than the salary of their department
manager.
INTRODUCTION TO TRIGGERS IN SQL
Another important statement in SQL is CREATE TRIGGER. In many cases it is convenient to specify the
type of action to be taken when certain events occur and when certain conditions are satisfied. For example,
it may be useful to specify a condition that, if violated, causes some user to be informed of the violation.
A manager may want to be informed if an employee’s travel expenses exceed a certain limit by receiving a
message whenever this occurs. The action that the DBMS must take in this case is to send an appropriate
message to that user. The condition is thus used to monitor the database.
The CREATE TRIGGER statement is used to implement such actions in SQL.
Example: Suppose we want to check whenever an employee’s salary is greater than the salary of his or her
direct supervisor in the COMPANY database.
Several events can trigger this rule: inserting a new employee record, changing an employee’s salary, or
changing an employee’s supervisor. Suppose that the action to take would be to call an external stored
procedure SALARY_VIOLATION,5 which will notify the supervisor. The trigger could then be written as
in R5 below.
 The trigger is given the name SALARY_VIOLATION.
 A typical trigger which is regarded as an ECA (Event, Condition, Action) rule has three components:
o The event(s): These are usually database update operations that are explicitly applied to the
database. In this example the events are: inserting a new employee record, changing an
employee’s salary, or changing an employee’s supervisor. The person who writes the trigger
must make sure that all possible events are accounted for. In some cases, it may be necessary to
write more than one trigger to cover all possible cases. These events are specified after the
keyword BEFORE in our example, which means that the trigger should be executed before the
triggering operation is executed. An alternative is to use the keyword AFTER, which specifies
that the trigger should be executed after the operation specified in the event is completed.
o The condition that determines whether the rule action should be executed: Once the triggering
event has occurred, an optional condition may be evaluated. If no condition is specified, the
action will be executed once the event occurs. If a condition is specified, it is first evaluated, and
only if it evaluates to true will the rule action be executed. The condition is specified in the
WHEN clause of the trigger.
o The action to be taken: The action is usually a sequence of SQL statements, but it could also be a
database transaction or an external program that will be automatically executed. In this example,
the action is to execute the stored procedure INFORM_SUPERVISOR.
VIEWS (VIRTUAL TABLES IN SQL)
In this section, the concept of a view in SQL is described
CONCEPT OF A VIEW IN SQL
A view in SQL terminology is a single table that is derived from other tables.6 These other tables can be
base tables or previously defined views. A view does not necessarily exist in physical form; it is considered
to be a virtual table, in contrast to base tables, whose tuples are always physically stored in the database.
We can think of a view as a way of specifying a table that we need to reference frequently, even though it
may not exist physically. For example, referring to the COMPANY database, we may frequently issue
queries that retrieve the employee name and the project names that the employee works on. Rather than
having to specify the join of the three tables EMPLOYEE, WORKS_ON, and PROJECT every time we
issue this query, we can define a view that is specified as the result of these joins. Then we can issue queries
on the view, which are specified as single table retrievals rather than as retrievals involving two joins on
three tables. We call the EMPLOYEE, WORKS_ON, and PROJECT tables the defining tables of the view.
SPECIFICATION OF VIEWS IN SQL
In SQL, the command to specify a view is CREATE VIEW.
The view is given a (virtual) table name (or view name), a list of attribute names, and a query to specify the
contents of the view. If none of the view attributes results from applying functions or arithmetic operations,
we do not have to specify new attribute names for the view, since they would be the same as the names of
the attributes of the defining tables in the default case. The views in V1 and V2 create virtual tables whose
schemas are illustrated in Figure below.

 In V1, we did not specify any new attribute names for the view WORKS_ON1 (although we
could have); in this case, WORKS_ON1 inherits the names of the view attributes from the
defining tables EMPLOYEE, PROJECT, and WORKS_ON.
 View V2 explicitly specifies new attribute names for the view DEPT_INFO, using a one-to-one
correspondence between the attributes specified in the CREATE VIEW clause and those
specified in the SELECT clause of the query that defines the view.
 We can now specify SQL queries on a view—or virtual table—in the same way we specify
queries involving base tables. For example, to retrieve the last name and first name of all
employees who work on the ‘ProductX’ project, we can utilize the WORKS_ON1 view and
specify the query as in QV1
 A view is supposed to be always up-to-date; if we modify the tuples in the base tables on which
the view is defined, the view must automatically reflect these changes. Hence, the view does not
have to be realized or materialized at the time of view definition but rather at the time when we
specify a query on the view. It is the responsibility of the DBMS and not the user to make sure
that the view is kept up to-date.
 If we do not need a view anymore, we can use the DROP VIEW command to dispose of it. For
example, to get rid of the view V1, we can use the SQL statement in V1A:

VIEW IMPLEMENTATION, VIEW UPDATES AND INLINE VIEWS


 The problem of how a DBMS can efficiently implement a view for efficient querying is complex.
Two main approaches have been suggested.
 One strategy, called query modification, involves modifying or transforming the view query
(submitted by the user) into a query on the underlying base tables. For example, the query QV1
would be automatically modified to the following query by the DBMS:

The disadvantages of this approach are:


o Every time a user queries the view, the DBMS must substitute the view's entire definition
into the user's query, resulting in a single, very large, and potentially time-consuming
query that must be executed against the base tables.
o If the same view is queried multiple times in quick succession (or if the view is used
many times within a larger transaction), the DBMS will re-execute the entire complex
definition of the view each time. This results in redundant computation.
o The resulting expanded query is often much larger and more complex than the original
simple query against the view.
o The view itself is easy to maintain but the application relying on that view inherits the
performance cost of the complex, newly defined logic.
 The second strategy, called view materialization, involves physically creating a temporary or
permanent view table when the view is first queried or created and keeping that table on the
assumption that other queries on the view will follow. In this case, an efficient strategy for
automatically updating the view table when the base tables are updated must be developed in
order to keep the view up-to-date.
 The immediate update strategy updates a view as soon as the base tables are changed; the lazy
update strategy updates the view when needed by a view query; and the periodic update
strategy updates the view periodically.
 Issuing an INSERT, DELETE, or UPDATE command on a view table is in many cases not
possible. To illustrate potential problems with updating a view defined on multiple tables,
consider the WORKS_ON1 view, and suppose that we issue the command to update the PNAME
attribute of ‘John Smith’ from ‘ProductX’ to ‘ProductY’. This view update is shown in UV1:

These types of updates may create problems in the base relation. Here are two possible updates,
(a) and (b), on the base relations corresponding to the view update operation in UV1:

Update (a) relates ‘John Smith’ to the ‘ProductY’ PROJECT tuple instead of the ‘ProductX’
PROJECT tuple and is the most likely desired update. However, (b) would also give the desired
update effect on the view, but it accomplishes this by changing the name of the ‘ProductX’ tuple
in the PROJECT relation to ‘ProductY’.
 Some view updates may not make much sense; for example, modifying the Total_sal attribute of
the DEPT_INFO view does not make sense because Total_sal is defined to be the sum of the
individual employee salaries. This incorrect request is shown as UV2:

 A view with a single defining table is updatable if the view attributes contain the primary key of
the base relation, as well as all attributes with the NOT NULL constraint that do not have default
values specified.
 Views defined on multiple tables using joins are generally not updatable.
 Views defined using grouping and aggregate functions are not updatable.
 In SQL, the clause WITH CHECK OPTION should be added at the end of the view definition if
a view is to be updated by INSERT, DELETE, or UPDATE statements. This allows the system
to reject operations that violate the SQL rules for view updates.
VIEWS ON AUTHORIZATION MECHANISMS
Here, we will just give a couple of simple examples to illustrate how views can be used to hide certain
attributes or tuples from unauthorized users. Suppose a certain user is only allowed to see employee
information for employees who work for department 5; then we can create the following view DEPT5EMP
and grant the user the privilege to query the view but not the base table EMPLOYEE itself. This user will
only be able to retrieve employee information for employee tuples whose Dno = 5, and will not be able to
see other employee tuples when the view is queried.

In a similar manner, a view can restrict a user to only see certain columns; for example, only the first name,
last name, and address of an employee may be visible as follows:
CONCURRENCY CONTROL IN DATABASES
Concurrency control in a DBMS (Database Management System) is the mechanism used to manage
simultaneous access to the database by multiple transactions, ensuring that these operations do not interfere
with each other and that the database remains in a consistent and correct state.

TWO PHASE LOCKING TECHNIQUES FOR CONCURRENCY CONTROL

Some of the main techniques used to control concurrent execution of transactions are based on the concept
of locking data items. A lock is an essentially a temporary flag or signal associated with a piece of data (like
a row, page, or table) that indicates its current state of usage.

TYPES OF LOCKS AND SYSTEM LOCK TABLES

Several types of locks are used in concurrency control.


1. Binary Locks.
 A binary lock can have two states or values: locked and unlocked (or 1 and 0, for simplicity). A
distinct lock is associated with each database item X. If the value of the lock on X is 1, item X
cannot be accessed by a database operation that requests the item. If the value of the lock on X is 0,
the item can be accessed when requested, and the lock value is changed to 1. We refer to the current
value (or state) of the lock associated with item X as lock(X).
 Two operations, lock_item and unlock_item, are used with binary locking. A transaction requests
access to an item X by first issuing a lock_item(X) operation. If LOCK(X) = 1, the transaction is
forced to wait. If LOCK(X) = 0, it is set to 1 (the transaction locks the item) and the transaction is
allowed to access item X. When the transaction is through using the item, it issues an
unlock_item(X) operation, which sets LOCK(X) back to 0 (unlocks the item) so that X may be
accessed by other transactions. Hence, a binary lock enforces mutual exclusion on the data item. A
description of the lock_item(X) and unlock_item(X) operations is shown in Figure below.
 It is simple to implement a binary lock; all that is needed is a binary-valued variable, LOCK,
associated with each data item X in the database. In its simplest form, each lock can be a record with
three fields: <Data_item_name, LOCK, Locking_transaction> plus a queue for transactions that are
waiting to access the item. The system needs to maintain only these records for the items that are
currently locked in a lock table, which could be organized as a hash file on the item name. Items not
in the lock table are considered to be unlocked. The DBMS has a lock manager subsystem to keep
track of and control access to locks.
 Every transaction must obey the following rules:
o A transaction T must issue the operation lock_item(X) before any read_item(X) or
write_item(X) operations are performed in T.
o A transaction T must issue the operation unlock_item(X) after all read_item(X) and
write_item(X) operations are completed in T.
o A transaction T will not issue a lock_item(X) operation if it already holds the lock on item X.
o A transaction T will not issue an unlock_item(X) operation unless it already holds the lock on
item X.
2. Shared/Exclusive (or Read/Write) Locks.
 The preceding binary locking scheme is too restrictive for database items because at most one
transaction can hold a lock on a given item. We should allow several transactions to access the
same item X if they all access X for reading purposes only. This is because read operations on
the same item by different transactions are not conflicting. However, if a transaction is to write
an item X, it must have exclusive access to X.
 For this purpose, a different type of lock, called a multiple-mode lock, is used. In this scheme—
called shared/exclusive or read/write locks—there are three locking operations: read_lock(X),
write_lock(X), and unlock(X). A lock associated with an item X, LOCK(X), now has three
possible states: read-locked, write-locked, or unlocked. A read-locked item is also called
share-locked because other transactions are allowed to read the item, whereas a write-locked
item is called exclusive-locked because a single transaction exclusively holds the lock on the
item.
 Each record in the lock table will have four fields. The system needs to maintain lock records
only for locked items in the lock table. The value (state) of LOCK is either read locked or write-
locked, suitably coded (if we assume no records are kept in the lock table for unlocked items). If
LOCK(X) = write-locked, the value of locking_transaction(s) is a single transaction that holds
the exclusive (write) lock on X. If LOCK(X)=read-locked, the value of locking transaction(s) is
a list of one or more transactions that hold the shared (read) lock on X. The three operations
read_lock(X), write_lock(X), and unlock(X) are described in Figure below
 When we use the shared/exclusive locking scheme, the system must enforce the following rules:
o A transaction T must issue the operation read_lock(X) or write_lock(X) before any
read_item(X) operation is performed in T.
o A transaction T must issue the operation write_lock(X) before any write_item(X)
operation is performed in T.
o A transaction T must issue the operation unlock(X) after all read_item(X) and
write_item(X) operations are completed in T.
o A transaction T will not issue a read_lock(X) operation if it already holds a read (shared)
lock or a write (exclusive) lock on item X.
o A transaction T will not issue a write_lock(X) operation if it already holds a read
(shared) lock or write (exclusive) lock on item X.
o A transaction T will not issue an unlock(X) operation unless it already holds a read
(shared) lock or a write (exclusive) lock on item X.
3. Conversion (Upgrading, Downgrading) of Locks.
 Lock Conversion: Definition: Lock conversion is the process where a transaction that already
holds a lock on a data item can change the lock from one state to another (e.g., from Shared to
Exclusive). The standard, strict locking rules often prohibit a transaction from altering a lock it
already holds. Relaxing these rules allows for more dynamic transaction behavior.
 There are two types of conversion:
o Upgrading a Lock (S-Lock to X-Lock): Transaction T initially needs to read item X, so it
acquires a Shared (S) lock via a read_lock(X) operation. Later in its execution, T
decides it must write to X. T issues a write_lock(X) to upgrade the lock to Exclusive
(X). The upgrade is only granted immediately if T is the only transaction currently
holding an S-lock on X. If other transactions also hold S-locks, T must wait until they
release their locks.
o Downgrading a Lock (X-lock to S-lock): Transaction T initially modifies item X and
holds an Exclusive lock via a write_lock(X). After the modification is complete, T
might still need to read the item but no longer needs to prevent others from reading it. T
issues a read_lock(X) to downgrade the lock to S. This can increase concurrency by
immediately allowing other transactions to read X while T holds the less restrictive lock.
 If a transaction releases its lock on an item too early (i.e., before the transaction is finished), it
opens a "window of vulnerability" where a non-serializable schedule can occur. Figure below
illustrates this:
 The solution to this is Two Phase Locking. 2PL enforces a simple yet powerful rule: a
transaction cannot release any lock until it has acquired all the locks it will ever need. This
prevents the "unlocking too early" problem and ensures that once a transaction releases a lock, it
will not request any more, guaranteeing serializable execution.
GUARANTEEING SERIALIZABILITY BY TWO PHASE LOCKING
Definition: A transaction is said to follow the 2PL protocol if all locking operations (read_lock,
write_lock) precede the first unlock operation in that transaction.
This rule logically divides the transaction's lifetime into two distinct phases:
 Phase 1: Expanding or Growing Phase (Lock Acquisition): In this the transaction can acquire new
locks on data items. It cannot release any existing locks. If lock conversion is allowed, upgrading a
lock (e.g., S-lock to X-lock) must happen during this phase.
 Phase 2: Shrinking Phase (Lock Release): In this the transaction can release existing locks. It
cannot acquire any new locks. If lock conversion is allowed, downgrading a lock (X-lock to S-
lock) must happen during this phase.
Why 2PL is necessary:
 T1 and T2 in the figure above are shown to violate 2PL because a write_lock(X) or
write_lock(Y) operation occurs after an unlock(Y) or unlock(X) operation, respectively. This
interleaving creates an invalid schedule.
 The solution is shown in the figure below. By enforcing 2PL and rewriting the transactions (as T1’
and T2’ in Figure below), the locking and unlocking operations are reordered to comply with the two
phases. This forces one transaction to wait for the other, preventing the nonserializable schedule
from occurring.

Problems with 2PL


While 2PL guarantees correctness, it does so at the expense of concurrency and introduces the risk of
deadlock.
 Reduced Concurrency: The requirement to hold locks during the expanding phase can significantly
limit how much work can be performed concurrently. A transaction T might finish using an item X
early, but if T needs to lock another item Y later in its expanding phase, X must remain locked by T
until after Y is also locked. A second transaction seeking to access X is forced to wait, even though
T is completely done with X. Conversely, Y might be locked earlier than needed, forcing others to
wait unnecessarily. This waiting is the price paid for guaranteeing serializability.
 Deadlock: Forcing transactions to wait on each other due to the strict lock-holding rules of 2PL
creates the potential for deadlock, where a set of transactions are all waiting for locks held by others
in the set, resulting in a circular dependency. Deadlock requires separate detection and recovery
mechanisms.

Transaction T1 Transaction T2
Step Time Resulting Locks Held
Action Action

1 t1 LOCK_X(A) T1 holds exclusive lock on A.

2 t2 LOCK_X(B) T1 holds X(A)$, T2 holds X(B).

T1 requests X(B). T1 is blocked


3 t3 LOCK_X(B)
(waiting for T2 to release B).

T2 requests X(A). T2 is blocked


4 t4 LOCK_X(A)
(waiting for T1 to release A).

Variations of Two-Phase Locking


1. Basic 2PL: The standard protocol where locking and unlocking operations are simply separated into
expanding and shrinking phases. It guarantees serializability but not necessarily recoverability.
2. Conservative 2PL (Static 2PL): It requires a transaction to predeclare its read-set and write-set and
acquire all necessary locks before the transaction begins execution. If any needed item cannot be
immediately locked, the transaction does not lock any item and waits until all are available. This is
a deadlock-free protocol because the circular waiting condition cannot arise (a transaction either
gets all its locks or none). It is Difficult to use in practice because knowing the exact read-set and
write-set upfront is often impossible for complex, data-dependent transactions. It also limits
concurrency due to early lock acquisition.
3. Strict 2PL: A transaction T does not release any of its exclusive (write) locks (X-lock) until after it
commits or aborts. This guarantees strict schedules. This ensures that if a transaction T writes an
item, no other transaction can read or write that item until T has committed. This is vital for
recoverability (ensuring that if a transaction aborts, its changes can be undone cleanly). But it is not
deadlock-free.
4. Rigorous 2PL: A transaction T does not release any of its locks until after it commits or aborts.
This Guarantees strict schedules and is the easiest to implement because the DBMS simply
releases all locks at the transaction termination point.

When are locks Deadlock-


Variation When are locks released?
acquired? Free?

Throughout expanding
Basic 2PL Throughout shrinking phase No
phase

Conservative
All at the start Throughout shrinking phase Yes
2PL

Throughout expanding S-locks in shrinking; X-locks only at


Strict 2PL No
phase commit/abort

Throughout expanding
Rigorous 2PL ALL locks only at commit/abort No
phase

Automatic Lock Management by DBMS


The DBMS's concurrency control subsystem is responsible for automatically generating the low-level
read_lock and write_lock requests on behalf of the transaction. This is hidden from the programmer.
Process Example (Enforcing Strict 2PL)
1. If T issues read_item(X):
o The system calls read_lock(X).
o If X is write_locked by another T, T is placed in a waiting queue.
o Otherwise, the read_lock is granted, and read_item(X) executes.
2. If T issues write_item(X):
o The system calls write_lock(X).
o Case 1: X is write_locked or read_locked by another T': T is placed in a waiting queue.
o Case 2: X is read_locked only by T itself: The system upgrades the lock from S to X and
permits write_item(X).
o Case 3: X is unlocked: The write_lock is granted, and write_item(X) executes.
Problems Introduced by Locking
The reliance on locking, especially the strict variations, has two main associated costs and problems:
1. High Overhead: Every single read or write operation must be preceded by a system lock request and
lock table update, increasing execution time.
2. Deadlock: The most severe problem, where transactions get into a circular wait state.
3. Starvation: A transaction may continually be bypassed by other transactions for the resources it
needs, leading to it waiting indefinitely.
DEALING WITH DEADLOCK AND STARVATION
Deadlock: Deadlock is a state where a set of two or more transactions are in a circular waiting condition.
Each transaction is waiting for a resource (data item) that is locked by another transaction in the set, and
none can proceed. In the example shown in the figure below transaction T1' waits for item X (locked by T2',
while T2' waits for item Y (locked by T1'). Since both are waiting, neither will release its lock, and the items
become permanently inaccessible to the deadlocked transactions.

Deadlock Prevention Protocols


These methods use rules to ensure that a deadlock cycle can never form
1. Non-Timestamp Based Methods
o Conservative Two-Phase Locking (2PL): A transaction must lock all items it needs in
advance (before execution). If any item is unavailable, the transaction locks none and waits to
retry later. Deadlock-free because a transaction never waits for a lock after having acquired
others. But this is not practical, as transactions often cannot predeclare all the items they
need, and it reduces concurrency due to long waits.
o Item Ordering: All items in the database are assigned a global order, and transactions must
lock them strictly following that sequence. This method is Deadlock-free. This is impractical
for large databases, as the system or programmer must know and adhere to the global
ordering.
o No Waiting (NW): If a transaction is unable to obtain a lock, it is immediately aborted and
restarted after a delay. This is Deadlock-free because no transaction ever waits. But it can
cause excessive and needless aborts/restarts.
o Cautious Waiting (CW): Ti wants a lock held by Tj. If Tj is not blocked (not waiting for
another item), Ti is allowed to wait (blocked). If Tj is blocked, Ti aborts. This is Deadlock-
free because no transaction ever waits for a transaction that is already waiting, ensuring no
cycle can form.
2. Timestamp-Based Methods: These use a unique timestamp (TS) assigned at the start of the
transaction, where the older transaction has the smaller timestamp. They strategically choose
between blocking (waiting) and aborting to prevent cycles. An older transaction (smaller TS) has
higher priority.

Deadlock Detection:
This approach allows transactions to wait and enter a deadlock state, but the system actively checks for the
condition.
1. Wait-For Graph (WFG): A graph is maintained where each node is an executing transaction. A
directed edge (Ti -> Tj) is drawn if Ti is waiting for a lock held by Tj. A state of deadlock exists if
and only if the wait-for graph has a cycle. The system must decide how often to check for cycles
(e.g., every time a new edge is added, periodically, or when a certain number of transactions are
waiting). Once a deadlock is detected, the system must choose a victim (a transaction to abort) to
break the cycle.
2. Timeouts: If a transaction waits for a lock longer than a system-defined time limit, the system
assumes it is deadlocked and aborts it. It has Low overhead and is simple. It Can cause needless
aborts, as the transaction might not have been deadlocked.

Starvation:
Starvation occurs when a transaction is repeatedly denied access to a resource and is unable to proceed
indefinitely, while other transactions continue normally.
 This causes unfair waiting schemes (prioritizing some transactions over others) or a flawed victim
selection algorithm that repeatedly chooses the same transaction to abort.
 This problem can be solved by the following three ways
o Fair Waiting Schemes: Using a First-Come-First-Served (FCFS) queue for lock requests.
o Priority Escalation (Aging): Increasing a waiting transaction's priority over time until it
eventually gets the highest priority and proceeds.
o Timestamp Persistence: Schemes like Wait-Die and Wound-Wait prevent starvation because
an aborted transaction restarts with its original, older timestamp. This ensures its priority
against newer transactions continually increases, guaranteeing it will eventually complete.
CONCURRENCY CONTROL BASED ON TIMESTAMP ORDERING
Timestamp Ordering (TO) is an alternative to locking protocols for concurrency control.
Timestamps: A timestamp (TS) is a unique identifier generated by the DBMS for each transaction (T). It
acts as the transaction's unique ID and represents its conceptual start time.
 Timestamps are typically generated using an incrementing counter or the system's current clock time.
The key is to ensure that no two transactions receive the exact same timestamp and that the order of
the timestamps reflects the order in which the transactions were submitted.
 If transaction T1 starts before T2, then TS((T1) <TS(T2). This means the older transaction always
has the smaller timestamp.
 Concurrency control techniques based on timestamp ordering do not use locks. Consequently, they
are inherently deadlock-free, eliminating the overhead of deadlock detection and recovery.
 Unlike Two-Phase Locking (2PL), where a schedule is equivalent to some serial schedule, the TO
algorithm strictly enforces a serial schedule that follows the exact order of the transaction
timestamps.
Timestamp Ordering (TO) Algorithm:
The basic TO algorithm ensures that for any pair of conflicting operations (read/write or write/write), the
operations are executed in the order of their transaction timestamps. To achieve this, the system associates
two timestamp values with every data item X:
Basic Timestamp Ordering
The Basic Timestamp Ordering (TO) algorithm's goal is to ensure that a schedule is serializable by making it
equivalent to the particular serial schedule defined by the transactions' timestamps. For every pair of
conflicting operations (read/write), the order in which the operations are executed must follow the
timestamp order TS(T). If a transaction $T$'s operation violates this order, $T$ is immediately aborted and
rolled back. T is then resubmitted to the system as a new transaction, receiving a new, later timestamp. The
algorithm relies on two timestamps associated with every data item X:
The algorithm checks conflicting operations against the item's timestamps.
1. Write Operation Check (write_item(X)): T attempts to write X. This operation is rejected if a
younger transaction has already read or written X, as T's write would violate the time sequence.

2. Read Operation Check (read_item(X)): T attempts to read X. This operation is rejected if a younger
transaction has already written a new value, meaning T is trying to read a value that chronologically
shouldn't exist yet for T.
Problems and Limitations of this method:
1. The schedules produced by basic TO are conflict serializable.
2. The protocol does not use locks, so deadlock cannot occur.
3. Cascading Rollback (Major Problem): If T is aborted, any other transaction (T1, T2, etc.) that
subsequently read a value written by T must also be rolled back. This cascading rollback means that
the schedules produced by basic TO are not guaranteed to be recoverable, requiring additional
protocols (like Strict TO) to be enforced.
4. Although deadlock is avoided, cyclic restart can still occur if a transaction is continually chosen to be
the victim of a timestamp conflict, leading to starvation.
Strict Timestamp Ordering (Strict TO)
Strict TO is a variation designed to overcome the major drawback of Basic TO. i.e To ensure that the
schedules are both conflict serializable (correct order) and strict (easily recoverable). A strict schedule
means a transaction T cannot read or write an item X that was written by another transaction T' until T'
commits or aborts.
 The rule is that a transaction T that issues a read or write operation that would violate the timestamp
order (specifically, an operation where TS(T) > write_TS(X) is delayed.
 The operation is delayed until the transaction T' that performed the last write on X (where TS(T') =
write_TS(X)) has definitively committed or aborted.
 To achieve this delay, the system must temporarily simulate the locking of item X. This "lock"
prevents any younger transaction from reading or writing X until the writing transaction T' has
finished.
Advantages of this algorithm
 This algorithm does not cause deadlock.
 The reason is that a transaction T only waits for T' if TS(T) > TS(T'). In other words, a younger
transaction is waiting for an older transaction. Since a younger transaction never waits for another
younger transaction, a circular waiting condition (deadlock) cannot form.
Thomas’s Write Rule (TWR)
Thomas’s Write Rule is a modification to Basic TO aimed at improving concurrency by avoiding
unnecessary transaction aborts.
When transaction T attempts a write_item(X), the rules are modified, primarily dealing with the case where
T's write is already obsolete.
MULTIVERSION CONCURRENCY CONTROL TECHNIQUES
 MVCC protocols are concurrency control techniques that keep multiple versions (values) of a data
item when it is updated. When a transaction writes an item, it writes a new version of that item, and
the old version(s) are retained. When a transaction reads an item, the system chooses the appropriate
version to read.
Advantages of MVCC
The primary benefit of MVCC stems from its ability to retain old, consistent values. The biggest advantage
is that read operations that would normally be rejected or blocked in single-version systems (due to conflicts
with an ongoing write operation) can instead be accepted. The reading transaction is simply directed to read
an older, committed version of the data item, avoiding any interference with the writing transaction. This
drastically reduces read/write contention.
Storage Drawbacks and Mitigations
MVCC requires more storage space, but the necessity depends on the application:
 The obvious drawback is the need for more storage to maintain multiple versions of the data items.
 Mitigation:
o Older versions can sometimes be kept in a temporary store and discarded after they are no
longer needed by any active transaction.
o If the database already needs to maintain older versions for recovery purposes (undoing
aborted transactions) or to maintain a history of changes (e.g., in a temporal database), the
storage penalty for MVCC is minimal or nonexistent.
MVCC Protocol Schemes
MVCC is not a single protocol but a concurrency framework that can be implemented in conjunction with
other methods:
1. Multiversion Timestamp Ordering (MVTO): An extension of the Timestamp Ordering protocol that
utilizes multiple versions.
2. Multiversion Two-Phase Locking (MV2PL): An extension of the 2PL protocol that incorporates
versioning to allow readers access while a writer holds an exclusive lock.
3. Validation (Optimistic) Protocols: Often implemented by using multiple versions, where transactions
read from a consistent snapshot until validation.
4. Snapshot Isolation (SI): A widely used commercial technique that uses multiple versions (stored in a
temporary version store) to provide each transaction with a consistent snapshot of the data at its start
time.
MULTIVERSION TECHNIQUE BASED ON TIMESTAMP ORDERING (MVTO)
The MVTO protocol maintains multiple versions of each data item to allow read operations to always
succeed by finding a consistent, older version, thereby eliminating read/write conflicts.
 Data Item Structure: Instead of a single item X, the system maintains several versions: X1, X2,…..
Xk. Each version Xi stores its value and two specific timestamps:
o write_TS(Xi): The timestamp TS of the transaction that wrote the value of version Xi.
o read_TS(Xi): The largest TS of any transaction that has successfully read version Xi.
 Rules for Write Operations (write_item(X)):
o When a transaction T wants to write item X, the system checks the latest relevant version of
X to see if T's write would violate the timestamp order.
o The system first finds the version Xi that has the highest write_TS(Xi) ≤ TS(T) (i.e., the
version written by the oldest transaction that is still younger than or the same age as T).
o If read_TS(Xi) > TS(T), the transaction T is aborted and rolled back as a younger transaction
has already read the value of version Xi. If T were allowed to write now, it would be as if T's
write happened before that younger transaction's read, violating the serial order defined by the
timestamps.
o If the abort condition is not met, a new version Xj is created. Both write_TS(Xj) and
read_TS(Xj) are set to TS(T).
 Rules for Read Operations (read_item(X))
o The goal of MVTO is to ensure that a read operation always succeeds by finding the
appropriate committed version.
o The system finds the version Xi that has the highest write_TS(Xi) ≤ TS}(T). This is the
version written by the youngest transaction that is still chronologically older than T. This
ensures T sees the value that it should see based on the timestamp order. The value of Xi is
returned to T.
o read_TS(Xi) is updated to the larger of its current value and TS(T).
Recoverability and Cascading Rollback
While MVTO significantly improves concurrency, it does not inherently solve the problem of recoverability:
 Cascading Rollback: If T is rolled back after its abort, cascading rollback may still occur (any
transaction that subsequently read a version written by T must also be rolled back).
 Recoverability Protocol: To ensure the schedule is recoverable, an additional protocol must be
enforced: a transaction T should not be allowed to commit until after all transactions that wrote any
version T has read have themselves committed. This ensures that T only reads committed data,
preventing the cascading effect.
MULTIVERSION TWO PHASE LOCKING USING CERTIFY LOCKS
MV2PL modifies standard 2PL to allow reads to proceed concurrently with a single write operation,
drastically improving throughput.
MV2PL introduces a third lock mode: Certify. The state of an item X can be:
1. Read-Locked (S)
2. Write-Locked (X)
3. Certify-Locked (C)
4. Unlocked
To allow concurrent reading and writing, the system maintains two versions for each item X:
1. Committed Version (X): The committed version is the last value written by a committed transaction.
Other transactions read from this version.
2. Local Version (X'): This version is created when a transaction T acquires a Write lock (X-lock) on
X. Transaction T performs all of its write operations on this local version X', without affecting the
committed version X.
The core of MV2PL lies in the compatibility matrix , which reflects the new goals:
 A transaction (T') can request a Read lock (S) on X even if another transaction (T) holds a Write lock
(X) on X. T' is simply allowed to read the committed version $X$.
 The Certify (C) lock is not compatible with the Read lock (S). This is the crucial point for
commitment.
Certify Lock
When the writing transaction T is ready to commit its changes, it must perform a lock upgrade:
1. Lock Upgrade: T attempts to obtain a Certify lock (C) on all items it currently holds X-locks on.
2. Waiting for Reads: Since the C-lock is not compatible with the S-lock, T may be forced to delay its
commit if any other transaction is currently reading the committed version X (and thus holds an S-
lock).
3. Final Write: Once all S-locks are released and T successfully obtains the exclusive C-locks:
o The committed version X is updated with the value of the local version X'.
o The local version X' is discarded.
o The C-locks are released.
Advantages and costs

VALIDATION (OPTIMISTIC) TECHNIQUES AND SNAPSHOT CONCURRENCY


CONTROL

VALIDATION BASED (OPTIMISTIC) CONCURRENCY CONTROL

 Optimistic concurrency control is named "optimistic" because it assumes that little interference
(conflicts) will occur among concurrent transactions. Transactions are allowed to execute freely
without any checks or locks until they are ready to commit. This minimizes overhead during the
execution phase. If interference is high, many transactions will be aborted and restarted, meaning
optimistic control works poorly under heavy contention but excels when interference is low.
 The Three Phases of an Optimistic Transaction are
o Read Phase (Execution): The transaction executes all its logic, reading values from the
database. Updates are applied only to local copies (versions) of the data items kept in a private
workspace. The database is not affected yet. The system records the transaction's read-set (items
read) and write-set (items written).
o Validation Phase (Checking): The crucial phase where the system performs checks to ensure
that applying the transaction's updates will not violate serializability. The protocol checks the
transaction (Ti) against all other recently committed transactions (Tj) or concurrent transactions
that are also in their validation phase. If validation is successful, Ti is committed. If validation
fails, Ti is aborted.
o Write Phase (Commit): If Validation Succeeds, the updates from the local copies are applied to
the actual database items on disk. If Validation Fails, the updates in the local copies are
discarded, and the transaction is aborted and restarted later.
 Validation Logic: The validation phase for a transaction Ti checks three conditions against every
relevant concurrent transaction Tj to ensure that Ti's read and write operations maintain the required
serial order. If any one of the three conditions holds for all Tj's, Ti is successfully validated. The
system uses start and end times for the three phases (Read, Validation, Write) and the read_set and
write_set for each transaction to perform these checks. For transaction Ti to validate successfully
against Tj, one of the following must be true:

CONCURRENCY CONTROL BASED ON SNAPSHOT ISOLATION


Snapshot Isolation operates on the principle of providing each transaction with a consistent, committed view
(snapshot) of the database as it existed when the transaction started.
 When a transaction starts, it receives a database snapshot. All subsequent read operations for that
transaction only see the values of items that were committed in that snapshot.
 Isolation Guarantee:
o A transaction never reads uncommitted changes from another active transaction.
o A transaction always reads the same value for an item, because its view (the snapshot) is
fixed.
o A transaction won't see records inserted or deleted after its snapshot time, effectively solving
the phantom problem.
Implementation with Multiversioning
SI achieves this by utilizing a form of Multiversion Concurrency Control (MVCC):
 Write operations require write locks (X-locks) to prevent concurrent transactions from writing the
same item.
 When an item is written, the system keeps track of older versions in a temporary version store
(tempstore).
 If a transaction T starts before item X is written by T', T needs to read the older version of X. The
temporary version store allows T to access the version that existed at its start time, even if X has
been updated many times since.
 Read operations do not require read locks (S-locks). This significantly reduces the overhead
associated with Two-Phase Locking (2PL), leading to better performance, especially for read-heavy
transactions.
Anomalies and Trade-offs
While SI offers strong isolation and performance, it is not guaranteed to enforce full serializability in all
cases.
 Anomalies: SI is susceptible to rare anomalies that violate serializability. The most common is a
Write Skew, where two transactions read overlapping data and then write different, non-overlapping
data, and both commit, resulting in a final state that could not have occurred serially. These
anomalies are difficult to detect and can potentially lead to an inconsistent database state. Because
the DBMS does not guarantee full serializability, developers using basic SI must sometimes add
custom checks to their transaction logic to prevent known anomalies.
Serializable Snapshot Isolation (SSI)
To resolve the serializability issue while retaining the performance benefits of SI, a stronger variant was
developed:
 Serializable Snapshot Isolation (SSI): This is a variation that adds extra checks and restrictions to
basic SI to guarantee full serializability.
 Trade-off: DBMSs like PostgreSQL offer users a choice: use basic SI for high performance with a
risk of rare anomalies, or use SSI to ensure full serializability, trading a slight performance hit for
total correctness guarantee.
GRANULARITY OF DATA ITEMS AND MULTIPLE GRANULARITY LOCKING

GRANULARITY LEVEL CONSIDERATIONS FOR LOCKING

Definition: Data item granularity refers to the size of the data unit that the concurrency control protocol
(e.g., locking, timestamping) treats as a single, lockable item.
 Fine Granularity: Small item sizes (e.g., a single field, a record, or a tuple).
 Coarse Granularity: Large item sizes (e.g., a disk block, a page, a file, or the entire database).
The optimal choice of granularity involves balancing concurrency and system overhead.
1. The Benefit of Fine Granularity (Small Items)
o Higher Degree of Concurrency: If the data item size is small (e.g., a record), two transactions
(T and S) that need different records can proceed simultaneously, even if those records
happen to reside in the same larger physical block.
Disadvantages of Fine Granularity
 Higher System Overhead: When item size is small, the total number of items in the database is
large.
 Lock Management: The system's lock manager must handle a larger number of active locks,
requiring more storage space for the lock table.
 Processing: More lock and unlock operations are performed, increasing the processing
overhead for both locking and timestamp protocols (which must store read_TS and write_TS
for every item).
2. The Benefit of Coarse Granularity (Large Items)
 Lower System Overhead: Fewer data items mean fewer locks to manage and fewer lock/unlock
operations, leading to reduced processing and storage overhead.
3. The Cost of Coarse Granularity (Large Items)
 Lower Degree of Concurrency: Locking a large item (like an entire file) forces all other
transactions to wait, even if they only need a small, non-conflicting part of that item.
MULTIPLE GRANULARITY LEVEL LOCKING
The goal of MGL is to optimize performance for a mix of transactions—those that are short and need only a
few records, and those that are long and need entire files.
The Need for Multiple Granularity
The optimal size for a data item lock (granularity) depends on the transaction.
 Small (Short) Transactions: Need to lock only a few individual records (fine granularity).
 Large (Long) Transactions: Need to access many records, perhaps an entire file (coarse granularity).
The Efficiency Problem in Checking Locks
While MGL allows mixed-granularity locking, it creates an efficiency challenge for the lock manager:
 If a short transaction (T2) locks a single record (r1nj), and later a long transaction (T1) requests a large
file-level lock (f1), the lock manager must efficiently check all descendant nodes of f1 (pages and
records) to ensure no conflicting lock (like T2's lock on r1nj) is held.
 Traversing and checking thousands of descendant nodes to verify a single high-level lock request is
extremely time-consuming and defeats the purpose of MGL.
The Solution: Intention Locks
To make MGL practical, special intention locks are used. A transaction places these locks on ancestor nodes
(moving down the tree from the root) to declare the type of lock it intends to set on a descendant node.
Intention Lock
Abbreviation Meaning
Type

Intention- The transaction will set Shared (S) locks on one or more descendant
IS
Shared nodes.

Intention- The transaction will set Exclusive (X) locks on one or more
IX
Exclusive descendant nodes (indicating an update/write).

The transaction is locking the current node in Shared (S) mode


Shared-
(reading the entire file/page) AND it intends to set Exclusive (X)
Intention- SIX
locks on some descendant nodes (e.g., updating a few records
Exclusive
within the file).

Multiple Granularity Locking Protocol Rules


The multiple granularity locking (MGL) protocol consists of the following rules:
To illustrate the MGL protocol with the database hierarchy in Figure above, consider the following three
transactions:

Figure below shows a possible serializable schedule for these three transactions. Only the lock and unlock
operations are shown.

You might also like