DBMS - Module 5 Notes
DBMS - Module 5 Notes
SQL has various rules for dealing with NULL values. Consider the following examples to illustrate each of
the meanings of NULL.
Unknown value: A person’s date of birth is not known, so it is represented by NULL in the
database. An example of the other case of unknown would be NULL for a person’s home phone
because it is not known whether or not the person has a home phone.
Unavailable or withheld value. A person has a home phone but does not want it to be listed, so it is
withheld and represented as NULL in the database.
Not applicable attribute. An attribute LastCollegeDegree would be NULL for a person who has no
college degrees because it does not apply to that person.
It is often not possible to determine which of the meanings is intended;
In general, each individual NULL value is considered to be different from every other NULL value in the
various database records. When a record with NULL in one of its attributes is involved in a comparison
operation, the result is considered to be UNKNOWN (it may be TRUE or it may be FALSE). Hence, SQL
uses a three-valued logic with values TRUE, FALSE, and UNKNOWN instead of the standard two-valued
(Boolean) logic with values TRUE or FALSE.
The Logic Connectives in Three Valued Logic is shown in the figure below.
The rows and columns represent the values of the results of comparison conditions, which would typically
appear in the WHERE clause of an SQL query. Each expression result would have a value of TRUE,
FALSE, or UNKNOWN. The result of combining the two values using the AND logical connective is
shown by the entries in Table 7.1(a). Table 7.1(b) shows the result of using the OR logical connective. For
example, the result of (FALSE AND UNKNOWN) is FALSE, whereas the result of (FALSE OR
UNKNOWN) is UNKNOWN. Table 7.1(c) shows the result of the NOT logical operation.
In select-project-join queries, the general rule is that only those combinations of tuples that evaluate the
logical expression in the WHERE clause of the query to TRUE are selected. Tuple combinations that
evaluate to FALSE or UNKNOWN are not selected. However, there are exceptions to that rule for certain
operations, such as outer joins.
SQL allows queries that check whether an attribute value is NULL. Rather than using = or <> to compare an
attribute value to NULL, SQL uses the comparison operators IS or IS NOT. This is because SQL considers
each NULL value as being distinct from every other NULL value, so equality comparison is not appropriate.
It follows that when a join condition is specified, tuples with NULL values for the join attributes are not
included in the result.
This query will select the Essns of all employees who work the same (project, hours) combination on
some project that employee ‘John Smith’ (whose Ssn = ‘123456789’) works on.
In addition to the IN operator, a number of other comparison operators can be used to compare a
single value v (typically an attribute name) to a set or multiset v (typically a nested query).
The = ANY (or = SOME) operator returns TRUE if the value v is equal to some value in the set V
and is hence equivalent to IN. The two keywords ANY and SOME have the same effect. Other
operators that can be combined with ANY (or SOME) include >, >=, <=, and <>. The keyword ALL
can also be combined with each of these operators.
For example, the comparison condition (v > ALL V) returns TRUE if the value v is greater than all
the values in the set (or multiset) V. An example is the following query, which returns the names of
employees whose salary is greater than the salary of all the employees in department 5:
For example, in the SELECT clause and WHERE clause of the first nested query of Q4A, a
reference to any unqualified attribute of the PROJECT relation refers to the PROJECT relation
specified in the FROM clause of the nested query. To refer to an attribute of the PROJECT relation
specified in the outer query, we specify and refer to an alias (tuple variable) for that relation. These
rules are similar to scope rules for program variables in most programming languages that allow
nested procedures and functions. To illustrate the potential ambiguity of attribute names in nested
queries, consider Query 16.
For each EMPLOYEE tuple, the correlated nested query selects all DEPENDENT tuples whose Essn
value matches the EMPLOYEE Ssn; if the result is empty, no dependents are related to the
employee, so we select that EMPLOYEE tuple and retrieve its Fname and Lname.
Example
The query Q3: Retrieve the name of each employee who works on all the projects controlled by
department number 5 can be written using EXISTS and NOT EXISTS in SQL systems. One way to
write this query is to use the construct (S2 EXCEPT S1). This is shown in query 3A.
In Q3A, the first subquery (which is not correlated with the outer query) selects all projects
controlled by department 5, and the second subquery (which is correlated) selects all projects that the
particular employee being considered works on. If the set difference of the first subquery result
MINUS (EXCEPT) the second sub query result is empty, it means that the employee works on all the
projects and is therefore selected.
The second option is shown as Q3B. Notice that we need two-level nesting in Q3B and that this
formulation is quite a bit more complex than Q3A.
In Q3B, the outer nested query selects any WORKS_ON (B) tuples whose Pno is of a project
controlled by department 5, if there is not a WORKS_ON (C) tuple with the same Pno and the same
Ssn as that of the EMPLOYEE tuple under consideration in the outer query. If no such tuple exists,
we select the EMPLOYEE tuple.
EXPLICIT SETS AND RENAMING IN SQL
It is also possible to use an explicit set of values in the WHERE clause, rather than a nested query. Such a
set is enclosed in parentheses in SQL.
In SQL, it is possible to rename any attribute that appears in the result of a query by adding the
qualifier AS followed by the desired new name. Hence, the AS con struct can be used to alias both
attribute and relation names in general, and it can be used in appropriate parts of a query.
For example, Q8A shows how query Q8 from Section 4.3.2 can be slightly changed to retrieve the
last name of each employee and his or her supervisor while renaming the resulting attribute names as
Employee_name and Supervisor_name.
The FROM clause in Q1A contains a single joined table. The attributes of such a table are all the
attributes of the first table, EMPLOYEE, followed by all the attributes of the second table,
DEPARTMENT. The concept of a joined table also allows the user to specify different types of join,
such as NATURAL JOIN and various types of OUTER JOIN. In a NATURAL JOIN on two
relations R and S, no join condition is specified.
If the names of the join attributes are not the same in the base relations, it is possible to rename the
attributes so that they match, and then to apply NATURAL JOIN. In this case, the AS construct can
be used to rename a relation and all its attributes in the FROM clause. This is illustrated in Q1B,
where the DEPARTMENT relation is renamed as DEPT and its attributes are renamed as Dname,
Dno (to match the name of the desired join attribute Dno in the EMPLOYEE table), Mssn, and
Msdate. The implied join condition for this NATURAL JOIN is [Link] = [Link],
because this is the only pair of attributes with the same name after renaming:
If the user requires that all employees be included, a different type of join called OUTER JOIN must
be used explicitly. There are several variations of OUTER JOIN. In the SQL standard, this is handled
by explicitly specifying the keyword OUTER JOIN in a joined table, as illustrated in Q8B:
LEFT OUTER JOIN (every tuple in the left table must appear in the result; if it does not have a
matching tuple, it is padded with NULL values for the attributes of the right table), RIGHT OUTER
JOIN (every tuple in the right table must appear in the result; if it does not have a matching tuple, it
is padded with NULL values for the attributes of the left table), and FULL OUTER JOIN. In the
latter three options, the keyword OUTER may be omitted. If the join attributes have the same name,
one can also specify the natural join variation of outer joins by using the keyword NATURAL before
the operation (for example, NATURAL LEFT OUTER JOIN).
The keyword CROSS JOIN is used to specify the CARTESIAN PRODUCT operation
It is possible that one of the tables in a join may itself be a joined table. This allows the specification
of the join of three or more tables as a single joined table, which is called a multiway join. This is
illustrated in query Q2A.
Not all SQL implementations have implemented the new syntax of joined tables. In some systems, a
different syntax was used to specify outer joins by using the comparison operators + =, = +, and + =
+ for left, right, and full outer join, respectively.
Grouping is used to create subgroups of tuples before summarization. Grouping and aggregation are
required in many database applications.
A number of built-in aggregate functions exist: COUNT, SUM, MAX, MIN, and AVG. The COUNT
function returns the number of tuples or values as specified in a query. The functions SUM, MAX,
MIN, and AVG can be applied to a set or multiset of numeric values and return, respectively, the
sum, maximum value, minimum value, and average (mean) of those values. These functions can be
used in the SELECT clause or in a HAVING clause.
The functions MAX and MIN can also be used with attributes that have nonnumeric domains if the
domain values have a total ordering among one another.
We could use AS to rename the column names in the resulting single-row table; for example, as in
Q19A.
If we want to get the preceding aggregate function values for employees of a specific department—
say, the ‘Research’ department—we can write Query 20, where the EMPLOYEE tuples are restricted
by the WHERE clause to those employees who work for the ‘Research’ department.
Queries 21 and 22. Retrieve the total number of employees in the company (Q21) and the number of
employees in the ‘Research’ department (Q22).
Here the asterisk (*) refers to the rows (tuples), so COUNT (*) returns the number of rows in the
result of the query. We may also use the COUNT function to count val ues in a column rather than
tuples, as in the next example.
In Q24, the EMPLOYEE tuples are partitioned into groups—each group having the same value for
the GROUP BY attribute Dno. Hence, each group contains the employees who work in the same
department. The COUNT and AVG functions are applied to each such group of tuples. Notice that
the SELECT clause includes only the grouping attribute and the aggregate functions to be applied on
each group of tuples. Figure below illustrates how grouping works and shows the result of Q24.
If NULLs exist in the grouping attribute, then a separate group is created for all tuples with a NULL
value in the grouping attribute. For example, if the EMPLOYEE table had some tuples that had
NULL for the grouping attribute Dno, there would be a separate group for those tuples in the result
of Q24.
Q25 shows how we can use a join condition in conjunction with GROUP BY. In this case, the
grouping and functions are applied after the joining of the two relations in the WHERE clause.
One more clause in SQL is HAVING. HAVING provides a condition on the summary information
regarding the group of tuples associated with each value of the grouping attributes. Only the groups
that satisfy the condition are retrieved in the result of the query. This is illustrated by Query 26.
Notice that although selection conditions in the WHERE clause limit the tuples to which functions
are applied, the HAVING clause serves to choose whole groups. Figure below illustrates the use of
HAVING and displays the result of Q26.
In Q27, we restrict the tuples in the relation (and hence the tuples in each group) to those that satisfy
the condition specified in the WHERE clause—namely, that they work in department number 5.
For example, suppose that we want to count the total number of employees whose salaries exceed
$40,000 in each department, but only for departments where more than five employees work. Here,
the condition (SALARY > 40000) applies only to the COUNT function in the SELECT clause.
Suppose that we write the following incorrect query:
This is incorrect because it will select only departments that have more than five employees who
each earn more than $40,000.
One way to write this query correctly is to use a nested query, as shown in Query 28.
OTHER SQL CONSTRUCTS: WITH AND CASE
The WITH clause in SQL (Structured Query Language) is a powerful feature that allows you to define a
temporary, named result set that you can reference within a single SQL statement. This temporary result set
is often referred to as a Common Table Expression (CTE).
The WITH clause allows a user to define a table that will only be used in a particular query; it is some what
similar to creating a view that will be used only in one query and then dropped. For example, we can
rewrite Q28 as Q28′:
The query performs two main steps: first, it identifies the "big" departments using the WITH clause, and
then it uses that result to filter and aggregate the main employee data.
The WITH clause defines a temporary, named result set called BIGDEPTS
Component Code Purpose
Name Definition WITH BIGDEPTS (Dno) AS Defines a CTE named
BIGDEPTS. The column it
returns is explicitly named
Dno.
CTE Query SELECT Dno FROM Selects the department number
EMPLOYEE from the EMPLOYEE table.
Grouping GROUP BY Dno Groups the employees by their
department number.
Filtering (Big Depts) HAVING COUNT (*) > 5 Filters these groups, keeping
only those departments where
the count of employees
(COUNT(*)) is strictly greater
than 5.
The result of the BIGDEPTS CTE is a list of department numbers (Dno) that have more than five
employees.
The main SELECT statement immediately follows the CTE definition and uses the temporary
BIGDEPTS result to perform the final calculation.
Source FROM EMPLOYEE Retrieves data from the main EMPLOYEE table.
Component Code Purpose
The query returns a list of department numbers (Dno) and the number of employees in those departments
who satisfy both of the following conditions:
1. They earn a salary greater than $40,000.
2. They belong to a department that has more than 5 total employees.
CASE CONSTRUCT
We illustrate this with an example. Suppose we want to give employees different raise amounts depending
on which department they work for; for example, employees in department 5 get a $2,000 raise, those in
department 4 get $1,500 and those in department 1 get $3,000. Then we could re-write the update operation
as
In this section, we introduce two additional features of SQL: the CREATE ASSERTION statement and the
CREATE TRIGGER statement.
SPECIFYING GENERAL CONSTRAINTS AS ASSERTIONS IN SQL
The CREATE ASSERTION statement defines a condition that must always be true for the entire database.
If any database transaction attempts to violate this condition, the transaction is rolled back (rejected). For
example, to specify the constraint that the salary of an employee must not be greater than the salary of the
manager of the department that the employee works for in SQL, we can write the following assertion:
Employee-
[Link] =
Department Links the employee (E) to their department (D).
[Link]
Link
Since the constraint checks that the described invalid situation does NOT exist, it successfully
enforces the rule that no employee can have a salary greater than the salary of their department
manager.
INTRODUCTION TO TRIGGERS IN SQL
Another important statement in SQL is CREATE TRIGGER. In many cases it is convenient to specify the
type of action to be taken when certain events occur and when certain conditions are satisfied. For example,
it may be useful to specify a condition that, if violated, causes some user to be informed of the violation.
A manager may want to be informed if an employee’s travel expenses exceed a certain limit by receiving a
message whenever this occurs. The action that the DBMS must take in this case is to send an appropriate
message to that user. The condition is thus used to monitor the database.
The CREATE TRIGGER statement is used to implement such actions in SQL.
Example: Suppose we want to check whenever an employee’s salary is greater than the salary of his or her
direct supervisor in the COMPANY database.
Several events can trigger this rule: inserting a new employee record, changing an employee’s salary, or
changing an employee’s supervisor. Suppose that the action to take would be to call an external stored
procedure SALARY_VIOLATION,5 which will notify the supervisor. The trigger could then be written as
in R5 below.
The trigger is given the name SALARY_VIOLATION.
A typical trigger which is regarded as an ECA (Event, Condition, Action) rule has three components:
o The event(s): These are usually database update operations that are explicitly applied to the
database. In this example the events are: inserting a new employee record, changing an
employee’s salary, or changing an employee’s supervisor. The person who writes the trigger
must make sure that all possible events are accounted for. In some cases, it may be necessary to
write more than one trigger to cover all possible cases. These events are specified after the
keyword BEFORE in our example, which means that the trigger should be executed before the
triggering operation is executed. An alternative is to use the keyword AFTER, which specifies
that the trigger should be executed after the operation specified in the event is completed.
o The condition that determines whether the rule action should be executed: Once the triggering
event has occurred, an optional condition may be evaluated. If no condition is specified, the
action will be executed once the event occurs. If a condition is specified, it is first evaluated, and
only if it evaluates to true will the rule action be executed. The condition is specified in the
WHEN clause of the trigger.
o The action to be taken: The action is usually a sequence of SQL statements, but it could also be a
database transaction or an external program that will be automatically executed. In this example,
the action is to execute the stored procedure INFORM_SUPERVISOR.
VIEWS (VIRTUAL TABLES IN SQL)
In this section, the concept of a view in SQL is described
CONCEPT OF A VIEW IN SQL
A view in SQL terminology is a single table that is derived from other tables.6 These other tables can be
base tables or previously defined views. A view does not necessarily exist in physical form; it is considered
to be a virtual table, in contrast to base tables, whose tuples are always physically stored in the database.
We can think of a view as a way of specifying a table that we need to reference frequently, even though it
may not exist physically. For example, referring to the COMPANY database, we may frequently issue
queries that retrieve the employee name and the project names that the employee works on. Rather than
having to specify the join of the three tables EMPLOYEE, WORKS_ON, and PROJECT every time we
issue this query, we can define a view that is specified as the result of these joins. Then we can issue queries
on the view, which are specified as single table retrievals rather than as retrievals involving two joins on
three tables. We call the EMPLOYEE, WORKS_ON, and PROJECT tables the defining tables of the view.
SPECIFICATION OF VIEWS IN SQL
In SQL, the command to specify a view is CREATE VIEW.
The view is given a (virtual) table name (or view name), a list of attribute names, and a query to specify the
contents of the view. If none of the view attributes results from applying functions or arithmetic operations,
we do not have to specify new attribute names for the view, since they would be the same as the names of
the attributes of the defining tables in the default case. The views in V1 and V2 create virtual tables whose
schemas are illustrated in Figure below.
In V1, we did not specify any new attribute names for the view WORKS_ON1 (although we
could have); in this case, WORKS_ON1 inherits the names of the view attributes from the
defining tables EMPLOYEE, PROJECT, and WORKS_ON.
View V2 explicitly specifies new attribute names for the view DEPT_INFO, using a one-to-one
correspondence between the attributes specified in the CREATE VIEW clause and those
specified in the SELECT clause of the query that defines the view.
We can now specify SQL queries on a view—or virtual table—in the same way we specify
queries involving base tables. For example, to retrieve the last name and first name of all
employees who work on the ‘ProductX’ project, we can utilize the WORKS_ON1 view and
specify the query as in QV1
A view is supposed to be always up-to-date; if we modify the tuples in the base tables on which
the view is defined, the view must automatically reflect these changes. Hence, the view does not
have to be realized or materialized at the time of view definition but rather at the time when we
specify a query on the view. It is the responsibility of the DBMS and not the user to make sure
that the view is kept up to-date.
If we do not need a view anymore, we can use the DROP VIEW command to dispose of it. For
example, to get rid of the view V1, we can use the SQL statement in V1A:
These types of updates may create problems in the base relation. Here are two possible updates,
(a) and (b), on the base relations corresponding to the view update operation in UV1:
Update (a) relates ‘John Smith’ to the ‘ProductY’ PROJECT tuple instead of the ‘ProductX’
PROJECT tuple and is the most likely desired update. However, (b) would also give the desired
update effect on the view, but it accomplishes this by changing the name of the ‘ProductX’ tuple
in the PROJECT relation to ‘ProductY’.
Some view updates may not make much sense; for example, modifying the Total_sal attribute of
the DEPT_INFO view does not make sense because Total_sal is defined to be the sum of the
individual employee salaries. This incorrect request is shown as UV2:
A view with a single defining table is updatable if the view attributes contain the primary key of
the base relation, as well as all attributes with the NOT NULL constraint that do not have default
values specified.
Views defined on multiple tables using joins are generally not updatable.
Views defined using grouping and aggregate functions are not updatable.
In SQL, the clause WITH CHECK OPTION should be added at the end of the view definition if
a view is to be updated by INSERT, DELETE, or UPDATE statements. This allows the system
to reject operations that violate the SQL rules for view updates.
VIEWS ON AUTHORIZATION MECHANISMS
Here, we will just give a couple of simple examples to illustrate how views can be used to hide certain
attributes or tuples from unauthorized users. Suppose a certain user is only allowed to see employee
information for employees who work for department 5; then we can create the following view DEPT5EMP
and grant the user the privilege to query the view but not the base table EMPLOYEE itself. This user will
only be able to retrieve employee information for employee tuples whose Dno = 5, and will not be able to
see other employee tuples when the view is queried.
In a similar manner, a view can restrict a user to only see certain columns; for example, only the first name,
last name, and address of an employee may be visible as follows:
CONCURRENCY CONTROL IN DATABASES
Concurrency control in a DBMS (Database Management System) is the mechanism used to manage
simultaneous access to the database by multiple transactions, ensuring that these operations do not interfere
with each other and that the database remains in a consistent and correct state.
Some of the main techniques used to control concurrent execution of transactions are based on the concept
of locking data items. A lock is an essentially a temporary flag or signal associated with a piece of data (like
a row, page, or table) that indicates its current state of usage.
Transaction T1 Transaction T2
Step Time Resulting Locks Held
Action Action
Throughout expanding
Basic 2PL Throughout shrinking phase No
phase
Conservative
All at the start Throughout shrinking phase Yes
2PL
Throughout expanding
Rigorous 2PL ALL locks only at commit/abort No
phase
Deadlock Detection:
This approach allows transactions to wait and enter a deadlock state, but the system actively checks for the
condition.
1. Wait-For Graph (WFG): A graph is maintained where each node is an executing transaction. A
directed edge (Ti -> Tj) is drawn if Ti is waiting for a lock held by Tj. A state of deadlock exists if
and only if the wait-for graph has a cycle. The system must decide how often to check for cycles
(e.g., every time a new edge is added, periodically, or when a certain number of transactions are
waiting). Once a deadlock is detected, the system must choose a victim (a transaction to abort) to
break the cycle.
2. Timeouts: If a transaction waits for a lock longer than a system-defined time limit, the system
assumes it is deadlocked and aborts it. It has Low overhead and is simple. It Can cause needless
aborts, as the transaction might not have been deadlocked.
Starvation:
Starvation occurs when a transaction is repeatedly denied access to a resource and is unable to proceed
indefinitely, while other transactions continue normally.
This causes unfair waiting schemes (prioritizing some transactions over others) or a flawed victim
selection algorithm that repeatedly chooses the same transaction to abort.
This problem can be solved by the following three ways
o Fair Waiting Schemes: Using a First-Come-First-Served (FCFS) queue for lock requests.
o Priority Escalation (Aging): Increasing a waiting transaction's priority over time until it
eventually gets the highest priority and proceeds.
o Timestamp Persistence: Schemes like Wait-Die and Wound-Wait prevent starvation because
an aborted transaction restarts with its original, older timestamp. This ensures its priority
against newer transactions continually increases, guaranteeing it will eventually complete.
CONCURRENCY CONTROL BASED ON TIMESTAMP ORDERING
Timestamp Ordering (TO) is an alternative to locking protocols for concurrency control.
Timestamps: A timestamp (TS) is a unique identifier generated by the DBMS for each transaction (T). It
acts as the transaction's unique ID and represents its conceptual start time.
Timestamps are typically generated using an incrementing counter or the system's current clock time.
The key is to ensure that no two transactions receive the exact same timestamp and that the order of
the timestamps reflects the order in which the transactions were submitted.
If transaction T1 starts before T2, then TS((T1) <TS(T2). This means the older transaction always
has the smaller timestamp.
Concurrency control techniques based on timestamp ordering do not use locks. Consequently, they
are inherently deadlock-free, eliminating the overhead of deadlock detection and recovery.
Unlike Two-Phase Locking (2PL), where a schedule is equivalent to some serial schedule, the TO
algorithm strictly enforces a serial schedule that follows the exact order of the transaction
timestamps.
Timestamp Ordering (TO) Algorithm:
The basic TO algorithm ensures that for any pair of conflicting operations (read/write or write/write), the
operations are executed in the order of their transaction timestamps. To achieve this, the system associates
two timestamp values with every data item X:
Basic Timestamp Ordering
The Basic Timestamp Ordering (TO) algorithm's goal is to ensure that a schedule is serializable by making it
equivalent to the particular serial schedule defined by the transactions' timestamps. For every pair of
conflicting operations (read/write), the order in which the operations are executed must follow the
timestamp order TS(T). If a transaction $T$'s operation violates this order, $T$ is immediately aborted and
rolled back. T is then resubmitted to the system as a new transaction, receiving a new, later timestamp. The
algorithm relies on two timestamps associated with every data item X:
The algorithm checks conflicting operations against the item's timestamps.
1. Write Operation Check (write_item(X)): T attempts to write X. This operation is rejected if a
younger transaction has already read or written X, as T's write would violate the time sequence.
2. Read Operation Check (read_item(X)): T attempts to read X. This operation is rejected if a younger
transaction has already written a new value, meaning T is trying to read a value that chronologically
shouldn't exist yet for T.
Problems and Limitations of this method:
1. The schedules produced by basic TO are conflict serializable.
2. The protocol does not use locks, so deadlock cannot occur.
3. Cascading Rollback (Major Problem): If T is aborted, any other transaction (T1, T2, etc.) that
subsequently read a value written by T must also be rolled back. This cascading rollback means that
the schedules produced by basic TO are not guaranteed to be recoverable, requiring additional
protocols (like Strict TO) to be enforced.
4. Although deadlock is avoided, cyclic restart can still occur if a transaction is continually chosen to be
the victim of a timestamp conflict, leading to starvation.
Strict Timestamp Ordering (Strict TO)
Strict TO is a variation designed to overcome the major drawback of Basic TO. i.e To ensure that the
schedules are both conflict serializable (correct order) and strict (easily recoverable). A strict schedule
means a transaction T cannot read or write an item X that was written by another transaction T' until T'
commits or aborts.
The rule is that a transaction T that issues a read or write operation that would violate the timestamp
order (specifically, an operation where TS(T) > write_TS(X) is delayed.
The operation is delayed until the transaction T' that performed the last write on X (where TS(T') =
write_TS(X)) has definitively committed or aborted.
To achieve this delay, the system must temporarily simulate the locking of item X. This "lock"
prevents any younger transaction from reading or writing X until the writing transaction T' has
finished.
Advantages of this algorithm
This algorithm does not cause deadlock.
The reason is that a transaction T only waits for T' if TS(T) > TS(T'). In other words, a younger
transaction is waiting for an older transaction. Since a younger transaction never waits for another
younger transaction, a circular waiting condition (deadlock) cannot form.
Thomas’s Write Rule (TWR)
Thomas’s Write Rule is a modification to Basic TO aimed at improving concurrency by avoiding
unnecessary transaction aborts.
When transaction T attempts a write_item(X), the rules are modified, primarily dealing with the case where
T's write is already obsolete.
MULTIVERSION CONCURRENCY CONTROL TECHNIQUES
MVCC protocols are concurrency control techniques that keep multiple versions (values) of a data
item when it is updated. When a transaction writes an item, it writes a new version of that item, and
the old version(s) are retained. When a transaction reads an item, the system chooses the appropriate
version to read.
Advantages of MVCC
The primary benefit of MVCC stems from its ability to retain old, consistent values. The biggest advantage
is that read operations that would normally be rejected or blocked in single-version systems (due to conflicts
with an ongoing write operation) can instead be accepted. The reading transaction is simply directed to read
an older, committed version of the data item, avoiding any interference with the writing transaction. This
drastically reduces read/write contention.
Storage Drawbacks and Mitigations
MVCC requires more storage space, but the necessity depends on the application:
The obvious drawback is the need for more storage to maintain multiple versions of the data items.
Mitigation:
o Older versions can sometimes be kept in a temporary store and discarded after they are no
longer needed by any active transaction.
o If the database already needs to maintain older versions for recovery purposes (undoing
aborted transactions) or to maintain a history of changes (e.g., in a temporal database), the
storage penalty for MVCC is minimal or nonexistent.
MVCC Protocol Schemes
MVCC is not a single protocol but a concurrency framework that can be implemented in conjunction with
other methods:
1. Multiversion Timestamp Ordering (MVTO): An extension of the Timestamp Ordering protocol that
utilizes multiple versions.
2. Multiversion Two-Phase Locking (MV2PL): An extension of the 2PL protocol that incorporates
versioning to allow readers access while a writer holds an exclusive lock.
3. Validation (Optimistic) Protocols: Often implemented by using multiple versions, where transactions
read from a consistent snapshot until validation.
4. Snapshot Isolation (SI): A widely used commercial technique that uses multiple versions (stored in a
temporary version store) to provide each transaction with a consistent snapshot of the data at its start
time.
MULTIVERSION TECHNIQUE BASED ON TIMESTAMP ORDERING (MVTO)
The MVTO protocol maintains multiple versions of each data item to allow read operations to always
succeed by finding a consistent, older version, thereby eliminating read/write conflicts.
Data Item Structure: Instead of a single item X, the system maintains several versions: X1, X2,…..
Xk. Each version Xi stores its value and two specific timestamps:
o write_TS(Xi): The timestamp TS of the transaction that wrote the value of version Xi.
o read_TS(Xi): The largest TS of any transaction that has successfully read version Xi.
Rules for Write Operations (write_item(X)):
o When a transaction T wants to write item X, the system checks the latest relevant version of
X to see if T's write would violate the timestamp order.
o The system first finds the version Xi that has the highest write_TS(Xi) ≤ TS(T) (i.e., the
version written by the oldest transaction that is still younger than or the same age as T).
o If read_TS(Xi) > TS(T), the transaction T is aborted and rolled back as a younger transaction
has already read the value of version Xi. If T were allowed to write now, it would be as if T's
write happened before that younger transaction's read, violating the serial order defined by the
timestamps.
o If the abort condition is not met, a new version Xj is created. Both write_TS(Xj) and
read_TS(Xj) are set to TS(T).
Rules for Read Operations (read_item(X))
o The goal of MVTO is to ensure that a read operation always succeeds by finding the
appropriate committed version.
o The system finds the version Xi that has the highest write_TS(Xi) ≤ TS}(T). This is the
version written by the youngest transaction that is still chronologically older than T. This
ensures T sees the value that it should see based on the timestamp order. The value of Xi is
returned to T.
o read_TS(Xi) is updated to the larger of its current value and TS(T).
Recoverability and Cascading Rollback
While MVTO significantly improves concurrency, it does not inherently solve the problem of recoverability:
Cascading Rollback: If T is rolled back after its abort, cascading rollback may still occur (any
transaction that subsequently read a version written by T must also be rolled back).
Recoverability Protocol: To ensure the schedule is recoverable, an additional protocol must be
enforced: a transaction T should not be allowed to commit until after all transactions that wrote any
version T has read have themselves committed. This ensures that T only reads committed data,
preventing the cascading effect.
MULTIVERSION TWO PHASE LOCKING USING CERTIFY LOCKS
MV2PL modifies standard 2PL to allow reads to proceed concurrently with a single write operation,
drastically improving throughput.
MV2PL introduces a third lock mode: Certify. The state of an item X can be:
1. Read-Locked (S)
2. Write-Locked (X)
3. Certify-Locked (C)
4. Unlocked
To allow concurrent reading and writing, the system maintains two versions for each item X:
1. Committed Version (X): The committed version is the last value written by a committed transaction.
Other transactions read from this version.
2. Local Version (X'): This version is created when a transaction T acquires a Write lock (X-lock) on
X. Transaction T performs all of its write operations on this local version X', without affecting the
committed version X.
The core of MV2PL lies in the compatibility matrix , which reflects the new goals:
A transaction (T') can request a Read lock (S) on X even if another transaction (T) holds a Write lock
(X) on X. T' is simply allowed to read the committed version $X$.
The Certify (C) lock is not compatible with the Read lock (S). This is the crucial point for
commitment.
Certify Lock
When the writing transaction T is ready to commit its changes, it must perform a lock upgrade:
1. Lock Upgrade: T attempts to obtain a Certify lock (C) on all items it currently holds X-locks on.
2. Waiting for Reads: Since the C-lock is not compatible with the S-lock, T may be forced to delay its
commit if any other transaction is currently reading the committed version X (and thus holds an S-
lock).
3. Final Write: Once all S-locks are released and T successfully obtains the exclusive C-locks:
o The committed version X is updated with the value of the local version X'.
o The local version X' is discarded.
o The C-locks are released.
Advantages and costs
Optimistic concurrency control is named "optimistic" because it assumes that little interference
(conflicts) will occur among concurrent transactions. Transactions are allowed to execute freely
without any checks or locks until they are ready to commit. This minimizes overhead during the
execution phase. If interference is high, many transactions will be aborted and restarted, meaning
optimistic control works poorly under heavy contention but excels when interference is low.
The Three Phases of an Optimistic Transaction are
o Read Phase (Execution): The transaction executes all its logic, reading values from the
database. Updates are applied only to local copies (versions) of the data items kept in a private
workspace. The database is not affected yet. The system records the transaction's read-set (items
read) and write-set (items written).
o Validation Phase (Checking): The crucial phase where the system performs checks to ensure
that applying the transaction's updates will not violate serializability. The protocol checks the
transaction (Ti) against all other recently committed transactions (Tj) or concurrent transactions
that are also in their validation phase. If validation is successful, Ti is committed. If validation
fails, Ti is aborted.
o Write Phase (Commit): If Validation Succeeds, the updates from the local copies are applied to
the actual database items on disk. If Validation Fails, the updates in the local copies are
discarded, and the transaction is aborted and restarted later.
Validation Logic: The validation phase for a transaction Ti checks three conditions against every
relevant concurrent transaction Tj to ensure that Ti's read and write operations maintain the required
serial order. If any one of the three conditions holds for all Tj's, Ti is successfully validated. The
system uses start and end times for the three phases (Read, Validation, Write) and the read_set and
write_set for each transaction to perform these checks. For transaction Ti to validate successfully
against Tj, one of the following must be true:
Definition: Data item granularity refers to the size of the data unit that the concurrency control protocol
(e.g., locking, timestamping) treats as a single, lockable item.
Fine Granularity: Small item sizes (e.g., a single field, a record, or a tuple).
Coarse Granularity: Large item sizes (e.g., a disk block, a page, a file, or the entire database).
The optimal choice of granularity involves balancing concurrency and system overhead.
1. The Benefit of Fine Granularity (Small Items)
o Higher Degree of Concurrency: If the data item size is small (e.g., a record), two transactions
(T and S) that need different records can proceed simultaneously, even if those records
happen to reside in the same larger physical block.
Disadvantages of Fine Granularity
Higher System Overhead: When item size is small, the total number of items in the database is
large.
Lock Management: The system's lock manager must handle a larger number of active locks,
requiring more storage space for the lock table.
Processing: More lock and unlock operations are performed, increasing the processing
overhead for both locking and timestamp protocols (which must store read_TS and write_TS
for every item).
2. The Benefit of Coarse Granularity (Large Items)
Lower System Overhead: Fewer data items mean fewer locks to manage and fewer lock/unlock
operations, leading to reduced processing and storage overhead.
3. The Cost of Coarse Granularity (Large Items)
Lower Degree of Concurrency: Locking a large item (like an entire file) forces all other
transactions to wait, even if they only need a small, non-conflicting part of that item.
MULTIPLE GRANULARITY LEVEL LOCKING
The goal of MGL is to optimize performance for a mix of transactions—those that are short and need only a
few records, and those that are long and need entire files.
The Need for Multiple Granularity
The optimal size for a data item lock (granularity) depends on the transaction.
Small (Short) Transactions: Need to lock only a few individual records (fine granularity).
Large (Long) Transactions: Need to access many records, perhaps an entire file (coarse granularity).
The Efficiency Problem in Checking Locks
While MGL allows mixed-granularity locking, it creates an efficiency challenge for the lock manager:
If a short transaction (T2) locks a single record (r1nj), and later a long transaction (T1) requests a large
file-level lock (f1), the lock manager must efficiently check all descendant nodes of f1 (pages and
records) to ensure no conflicting lock (like T2's lock on r1nj) is held.
Traversing and checking thousands of descendant nodes to verify a single high-level lock request is
extremely time-consuming and defeats the purpose of MGL.
The Solution: Intention Locks
To make MGL practical, special intention locks are used. A transaction places these locks on ancestor nodes
(moving down the tree from the root) to declare the type of lock it intends to set on a descendant node.
Intention Lock
Abbreviation Meaning
Type
Intention- The transaction will set Shared (S) locks on one or more descendant
IS
Shared nodes.
Intention- The transaction will set Exclusive (X) locks on one or more
IX
Exclusive descendant nodes (indicating an update/write).
Figure below shows a possible serializable schedule for these three transactions. Only the lock and unlock
operations are shown.