Adanced Database Systems (Modified)
Adanced Database Systems (Modified)
1.1. Transaction
A transaction can be defined as a group of tasks. A single task is the minimum
processing unit which cannot be divided further. It is an action, or a series of
actions, carried out by a single user or an application program, which reads or
updates the contents of a database.
A transaction is a collection of operations that performs a single logical
function in a database application. The transaction-management component
ensures that the database remains in a consistent (correct) state despite
system failures (e.g., power failures and operating system crashes) and
transaction failures.
Page 1 of 57
Advanced Database Management System
Page 2 of 57
Advanced Database Management System
Example of transaction
Transfer 50 birr from account A to account B This transaction can be described according
to the four properties as follows:
Atomicity - shouldn’t take money from A
Read(A)
without giving it to B
A = A - 50
Consistency - money isn’t lost or gained
Write(A)
A single transaction Isolation - other queries shouldn’t see A or B
Read(B)
change until completion
B = B+50
Durability - the money does not go back to A
Write(B)
Page 3 of 57
Advanced Database Management System
Page 4 of 57
Advanced Database Management System
Both transactions, T1 and T2, read X. T1 subtracts 5 from X and updates the
database and T1 commits. T2 also adds 5 to X (which is read before T1 updates
its value) and again updates the database and commits. At the end of both
transactions, the value of X is increased by 5.
As you can observe, the update T1 has made to X is meaningless and this is how
a lost update problem may occur.
The change made by T1 is undone because it rolls back. The expected result is a
final addition of 5 to the old value of X because already what T1 is made is
undone. But, according to the above scenario, no change is made to the value of
X though T1 is rolled back. This kind of problem is called uncommitted update
problem.
Incorrect Analysis
Observe the following transaction processing, T1 and T2.
T1 takes 5 from X and adds it to Y. But, before T1 updates Y, T2 reads X and Y.
Page 5 of 57
Advanced Database Management System
What would you expect on the sum, X+Y? According to T1, it has got no change;
but, according to T2, the sum X+Y is lesser by 5 than the sum T1 expects. This
kind of problem is called incorrect analysis problem.
Page 6 of 57
Advanced Database Management System
Page 7 of 57
Advanced Database Management System
Two-phase locking has two phases, one is growing, where all the locks are
being acquired by the transaction; and the second phase is shrinking, where
the locks held by the transaction are being released.
Page 8 of 57
Advanced Database Management System
Page 9 of 57
Advanced Database Management System
In addition, every data item is given the latest read and write-timestamp. This
lets the system know when the last ‘read and write’ operation was performed
on the data item.
The timestamp-ordering protocol ensures serializability among transactions in
their conflicting read and writes operations. This is the responsibility of the
protocol system that the conflicting pair of tasks should be executed according
to the timestamp values of the transactions.
The timestamp of transaction Ti is denoted as TS(Ti).
Read time-stamp of data-item X is denoted by R-timestamp(X).
Write time-stamp of data-item X is denoted by W-timestamp(X).
Timestamp ordering protocol works as follows −
If a transaction Ti issues a read(X) operation −
o If TS(Ti) < W-timestamp(X)
Operation rejected.
o If TS(Ti) >= W-timestamp(X)
Operation executed.
o All data-item timestamps updated.
If a transaction Ti issues a write(X) operation −
o If TS(Ti) < R-timestamp(X)
Operation rejected.
o If TS(Ti) < W-timestamp(X)
Operation rejected and Ti rolled back.
o Otherwise, operation executed.
Page 10 of 57
Advanced Database Management System
Page 11 of 57
Advanced Database Management System
Schedule
Recoverable Non-Recoverable
Page 12 of 57
Advanced Database Management System
Page 13 of 57
Advanced Database Management System
Schedule S
T1 T2
Read1(A)
...
A=A+500
...
Write1(A)
...
...
Read2(A)
...
A=A-300
...
Write2(A)
In the above schedule, the two operations Write1(A) and Read2(A) are
conflicting operations because they fulfill all the above three conditions.
Steps on how to check if a serializable schedule is conflict serializable or not:
i) Identify all conflicting operations
ii) Start creating a precedence graph, also called serialization graph, by
drawing one node for each transaction
iii)Draw an edge for each conflicting pair directing from the first operation
to the second operation in the conflicting pair and this ensures the order
of execution of the operations
iv) Checking the graph, if there is cycle then the schedule is NOT conflict
serialiazable because a cycle between transactions shows that one is
dependent on the other which means the schedule is NOT serializable
Page 14 of 57
Advanced Database Management System
Page 15 of 57
Advanced Database Management System
Schedule S
T1 T2 T3
Read1(D)
… …
… Write2(D) …
Write1(D) … …
… … Write3(D)
The above schedule S is view serializable because it is view equivalent with the
serial schedule T1->T2 -> T3. It is NOT conflict serializable because there exists
a cycle between T1 and T2 in its precedence graph as shown below:
T1 T2 T3
Page 16 of 57
Advanced Database Management System
Schedule S
T1 T2
Read1(A) ...
A=A+500 ...
Write1(A) ...
... Read2(A)
... A=A-300
... Write2(A)
...
COMMIT
COMMIT
Page 17 of 57
Advanced Database Management System
Schedule S
T1 T2
Read1(A) ...
A=A+500 ...
Write1(A) ...
COMMIT
... Read2(A)
... A=A-300
... Write2(A)
COMMIT
Schedule S
T1 T2
Read1(A)
A=A+500 ...
Write1(A) ...
Write2(A)
...
...
COMMIT
... COMMIT
Page 18 of 57
Advanced Database Management System
Schedule S
T1 T2
Read1(A) ...
A=A+500 ...
Write1(A) ...
COMMIT ...
... Write2(A)
...
...
COMMIT
Page 19 of 57
Advanced Database Management System
The differences are the level of locking they use; setting those flags on and off
costs time and resources. If you lock the whole database, then you have a serial
batch processing system since only one transaction at a time is active. In
practice, you would do this only for system maintenance work—there are no
other transactions that involve the whole database.
If you lock at the table level, then performance can suffer because users must
wait for the most common tables to become available. However, there are
transactions that do involve the whole table and this will use only one flag.
If you lock the table at the row level, then other users can get to the rest of the
table and you will have the best possible shared access. You will also have a
huge number of flags to process and performance will suffer. This approach is
generally not practical.
Page locking is in between table and row locking. This approach puts a lock on
subsets of rows within the table that include the desired values. The name
comes from the fact that this is usually implemented with pages of physical
disk storage. Performance depends on the statistical distribution of data in
physical storage, but it is generally the best compromise.
Page 20 of 57
Advanced Database Management System
The transaction log is used to redo any changes made since the last backup.
But if the transaction log file is also damaged, there is no means to recover the
database.
To reduce the risk of losing both the log file and your data, it is preferable to
backup the log file and the data on a separate backup device.
Page 21 of 57
Advanced Database Management System
And there are different techniques that prevent failures. Following are some
common techniques:
• Installing reliable operating system
• Implementing strong security systems
• Using UPS and surge protectors
• RAID arrays
Page 22 of 57
Advanced Database Management System
After the system failure, the DBMS should be able to recover the transactions
follows:
• Any transaction that was running at the time of failure needs to be
undone and restarted.
• Any transaction that committed since the last checkpoint need to be
redone.
Using the above logic, the following can be done to the above five transaction
conditions:
• Transactions of type T1 need no recovery, because it is a committed
transaction.
• Transactions of type T3 or T5 need to be undone and restarted, because
they are running transactions at the time of failure.
• Transactions of type T2 and T4 need to be redone because they are
committed after the last checkpoint and before the system fails.
Page 23 of 57
Advanced Database Management System
They use two lists of transactions, UNDO transaction lists and REDO
transaction lists.
In the beginning of the recovery process, both lists are initialized. The recovery
algorithm is given below:
Page 24 of 57
Advanced Database Management System
The next entry in the log says ‘T4’ begins, and therefore it should be added to
the UNDO list:
And again, the next entry in the log says ‘T5’ begins, and therefore it should be
added to the UNDO list:
Page 25 of 57
Advanced Database Management System
Accordingly, the next entry says ‘T2 committed’, and therefore it should be
transferred from the UNDO list to the REDO list.
Accordingly, the next entry says ‘T4 committed’, and therefore it should be
transferred from the UNDO list to the REDO list.
Page 26 of 57
Advanced Database Management System
2.1. Overview
Relational database systems are expected to be equipped with a query
language that can assist its users to query the database instances. One type of
such query language is relational algebra.
Relational algebra is a procedural query language, which takes instances of
relations as input and yields instances of relations as output. It is used
internally in a DBMS to represent a query evaluation plan. It uses operators to
perform queries. An operator can be either unary or binary. They accept
relations as their input and yield relations as their output.
Relational algebra is performed recursively on a relation and intermediate
results are also considered relations.
The fundamental operations of relational algebra are as follows −
Select
Project
Page 27 of 57
Advanced Database Management System
Union
Set different
Cartesian product
Rename
σpr
Where σ stands for selection predicate and r stands for relation. p is a logic
formula which may use connectors like and, or, and not. These terms may use
relational operators like − =, ≠, ≥, <, >, ≤.
Look at the examples in the table below.
SELECT *
FROM student Selects all students whose
σfname=”Abebe” AND dep
WHERE first name is ‘Abebe’ from
= “IT”(student)
fname=”Abebe” AND IT department.
dep=”IT”;
Page 28 of 57
Advanced Database Management System
ΠA1, A2… An r
Relational
An equivalent
algebra What it does:
sql statement:
expression:
Page 29 of 57
Advanced Database Management System
It performs binary union between two given relations. The union operation is
defined as:
r ∪ s = { t | t ∈ r or t ∈ s}
r-s
It finds all the tuples that are present in r but not in s.
Example: Suppose we have two relations Books and Articles, and they have
‘a_Name’ as a common attribute that represents the name of the author.
Page 30 of 57
Advanced Database Management System
rΧs
Where r and s are relations and their output will be defined as −
r Χ s = { q t | q ∈ r and t ∈ s}
r Χ s is given by
Page 31 of 57
Advanced Database Management System
R1 ⋈a1 = a2 R2
Where, R1 and R2 are relations and a1 and a2 are attributes of R1 and R2,
respectively.
The above join operation is exactly equivalent to the following combination of
select and Cartesian product operations:
Example:
Let ‘Teacher’ and ‘Course’ be two relations defined in the following way:
T_Id ….. teacher ID
C_Num … course number
CT_Id … teacher Id, which is a foreign key in the course relation
Teacher: Course:
Page 32 of 57
Advanced Database Management System
Any query with a join can always be rewritten into cross product followed by
selection.
Page 33 of 57
Advanced Database Management System
Page 34 of 57
Advanced Database Management System
Page 35 of 57
Advanced Database Management System
(1) SELECT *
FROM student
WHERE Batch_Year>1 ANDF_Name=”Abebe” ;
(2) SELECT F_Name, L_Name
FROM student
WHERE Batch_Year>1 AND Department=”IT” ;
2.5. Pipelining
Page 36 of 57
Advanced Database Management System
department
Dep_code Dep_title Dep_faculty
The pipelining technique selects the first student that scored 4.0, then it passes
the tuple of that student to the join operation, i.e. prior to selecting the rest
students that scored 4.0, and then the join operation is performed on that
student and the result is sent to the projection operator. The selection for the
next students that have CGPA of 4.0 also continues in parallel to the join and
projection operations.
Pipelining may not always be applicable. For example, if the final output of the
expression tree is to be sorted in some order pipelining can’t be applied.
Dep_title
CGPA=4.0
Page 37 of 57
department
Advanced Database Management System
3.1. Integrity
Tables − in relational data model, relations are saved in the format of Tables.
This format stores the relation among entities. A table has rows and columns,
where rows represent records and columns represent the attributes.
Tuple − A single row of a table, which contains a single record for that relation
is called a tuple.
Relation schema − A relation schema describes the relation name table name,
attributes, and their names.
Page 38 of 57
Advanced Database Management System
Relation key − each row has one or more attributes, known as relation key,
which can identify the row in the relation table uniquely.
Attribute domain − every attribute has some pre-defined value scope, known
as attribute domain.
Keys and Candidate Keys - There must be at least one minimal subset of
attributes in the relation, which can identify a tuple uniquely. This minimal
subset of attributes is called key for that relation. If there are more than one
such minimal subset, these are called candidate keys.
Key Constraints:
Key constraints force the following two conditions:
In a relation with a key attribute, no two tuples/rows can have identical
values for key attributes.
A key attribute cannot have NULL values.
Key constraints are also referred to as Entity Constraints.
Domain constraints:
Attributes have specific values in real-world scenario. For example, age can
only be a positive integer. Every attribute is bound to have a specific range of
Page 39 of 57
Advanced Database Management System
values. For example, age cannot be less than zero and telephone numbers
cannot contain a digit outside 0-9.
Referential integrity constraints:
Referential integrity constraints work on the concept of Foreign Keys.
Referential integrity constraint states that if a relation refers to a key attribute
of a different or same relation, then that key element must exist.
It ensures the integrity of referential relationships between tables as defined
by primary and foreign keys. In a relation between two tables, one table has a
primary key and the other a foreign key. The primary key uniquely identifies
each record in the first table. In other words, there can be only one record in
the first table with the same primary key value.
The foreign key is placed into the second table in the relationship such that the
foreign key contains a copy of the primary key value from the record in the
related table.
So, referential Integrity ensures the integrity of relationships between primary
and foreign key values in related tables. Most relational database engines use
what are often called constraints. Primary and foreign keys are both
constraints.
There are some specific circumstances to consider in terms of how Referential
Integrity is generally enforced:
A primary key table is assumed to be a parent table and a foreign key table a
child table.
When adding a new record to a child table, if a foreign key value is
entered, it must exist in the related primary key field of the parent table.
Foreign key fields can contain NULL values. Primary key field values can never
contain NULL values as they are required to be unique.
When changing a record in a parent table if the primary key is changed,
the change must be cascaded to all foreign key valued records in any
related child tables. Otherwise, the change to the parent table must be
prohibited.
Page 40 of 57
Advanced Database Management System
3.2. Security
When you think of securing your database, the following issues should be
considered:
Who gets the DBA role?
How many users will need access to the database?
Which users will need which privileges and which roles?
How will you remove users who no longer need access to the database?
Page 41 of 57
Advanced Database Management System
Database security can be formed from two levels: system level and object level.
System privileges
System privileges are general permissions to perform functions in managing
the server and the database(s). Hundreds of permissions are supported by
each database vendor, with most of those being system privileges.
Permit the grantee to perform a general database function, such as creating
new user accounts or connecting to the database.
Here are some commonly used Microsoft SQL Server system privileges:
• CREATE DATABASE: Provides the ability to create new databases on the SQL
server
Object Privileges
Object privileges are granted to users with the SQL GRANT statement and
revoked with the REVOKE statement. The database user (login) who receives
the privileges is called the grantee.
Permit the grantee to perform specific actions on specific objects, such as
selecting from the EMPLOYEES table or updating the DEPARTMENTS table.
To reduce the burden of managing privileges, most RDBMSs support storing a
group of privilege definitions as a single named object called a role. Roles may
then be granted to individual users, who then inherit all the privileges
contained in the role. RDBMSs that support roles also typically come with a
number of predefined roles. Oracle, for example, has a role called DBA that
contains all the high-powered system and object privileges a database user
needs in administering a database.
Page 42 of 57
Advanced Database Management System
Page 43 of 57
Advanced Database Management System
Encryption is the translation of data into a secret code that cannot be read with
the use of a password or secret key. Unencrypted data is called plain text,
whereas encrypted data is called cipher text.
Some encryption schemes use a symmetric key, which means that a single key
is used to both encrypt plain text and to decrypt cipher text. This form is
considered less secure compared with the use of asymmetric keys, where a
pair of keys is used— one called the public key and the other the private key.
What the public key encrypts, the private key can decrypt, and vice versa. The
names come from the expected use of the keys—the public key is given to
anyone with whom an enterprise does business, and the private key remains
confidential and internal to the enterprise.
Page 44 of 57
Advanced Database Management System
processing. The different sites may not be aware of each other and may provide
only limited facilities for cooperation in transaction processing.
Generally, in heterogeneous distributed databases, difference in the following
issues can exist:
Data models (relational, object, hierarchical, network etc)
Transaction commit protocols may be incompatible
Concurrency control techniques may vary (locking and timestamp based)
System level issues such as character set (ASCII and EBCDIC), data types,
precisions etc may vary
Replication:
Page 45 of 57
Advanced Database Management System
Fragmentation:
Fragmentation of relation r are divisions such as r1, r2… rn which contain
sufficient information to reconstruct the original relation r.
Relation fragmentation can be horizontal or vertical fragmentation.
In horizontal fragmentation, each tuple of a relation r is assigned to one or
more fragments.
And in vertical fragmentation, the schema for relation r is split into several
smaller schemas. All schemas must contain a common candidate key (or
superkey) to ensure lossless join property. A special attribute, the tuple-id
attribute may be added to each schema to serve as a candidate key.
Following is an example student relation schema to illustrate how a horizontal
and vertical schema can be formed.
Student schema:
Page 46 of 57
Advanced Database Management System
Two possible horizontal fragmentation of the student relation can be: (let’s
name them stdent1 and student2)
Two possible vertical fragmentation of the student relation can be: (let’s name
them list1 and list2)
1 Abera Tadese
2 Zinash Demere
Page 47 of 57
Advanced Database Management System
3 Mohammed Ahmed
4 Adem Ali
5 Hilina Sitota
6 Zahra Muktar
1 IT Regular
2 COTM Weekend
3 CS Regular
4 Accounting Extension
5 IT Extension
6 CS Weekend
Page 48 of 57
Advanced Database Management System
S1 S3 Page 49 of 57
course
department
Advanced Database Management System
For a query issued at site S1, the system needs to produce the result at site S1.
Possible processing strategies:
Strategy 1:
Transfer copies of all three relations to site S1 and choose a strategy for
processing the entire locally at site S1.
Strategy 2:
Transfer a copy of the course relation to site S2 and compute
at S2.
Strategy 3:
Devise similar strategies, exchanging the roles S1, S2, S3
Page 51 of 57
Advanced Database Management System
Page 52 of 57
Advanced Database Management System
class employee {
/*Variables */
string name;
string address; Page 53 of 57
date start-date;
int salary;
/* Messages */
Advanced Database Management System
Methods to read and set the other variables (start-date, salary…) can also be
added with strict encapsulation.
Methods are defined separately, i.e. outside the class definition.
Following are, for example, two of the method definitions of the above class.
int employment-length() {
return today() – start-date;
}
Inheritance:
Inheritance is the concept when a subclass inherits the definition of another
general class. This meant that an object of a subclass need not carry its own
definition of data and methods that are generic to the class of which it is a part;
it can use/inherit the general class’s data and methods. Doing so has
Page 54 of 57
Advanced Database Management System
Company Class
Department Class
Person Class
Page 56 of 57
Advanced Database Management System
Page 57 of 57