0% found this document useful (0 votes)
23 views69 pages

DBMS vs File System: Key Differences

Uploaded by

pooja pande
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
23 views69 pages

DBMS vs File System: Key Differences

Uploaded by

pooja pande
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd

UNIT No 1

Sr. Uni Text of Question


no t No

1 Unit Differentiate:
1 I . GeneralfilesystemVSDBMS.

File system DBMS

File system is a collection of data. Any DBMS is a collection of data and user is not
management with the file system, user has to required to write the procedures for
write the procedures. managing the database.

File system gives the details of the data DBMS provides an abstract view of data that
representation and Storage of data. hides the details.

DBMS is efficient to use since there are


In File system storing and retrieving of data
wide varieties of sophisticated techniques to
cannot be done efficiently.
store and retrieve the data.

Concurrent access to the data in the file system


has many problems like : Reading the file while DBMS takes care of Concurrent access
other form of locking. deleting some information, using some form of locking.
updating some information

File system doesn’t provide crash recovery


DBMS has crash recovery mechanism,
mechanism.
DBMS protects user from the effects of
Eg. While we are entering some data into the file
system failures.
if System crashes then content of the file is lost

1
Protecting a file under file system is very
DBMS has a good protection mechanism.
difficult.

II. DDLandDML.

DDL DML

It stands for Data Manipulation


It stands for Data Definition Language.
Language.

It is used to create database schema and can be It is used to add, retrieve or update the
used to define some constraints as well. data.

It basically defines the column (Attributes) of It add or update the row of the table.
the table. These rows are called as tuple.

It is further classified into Procedural


It doesn’t have any further classification.
and Non-Procedural DML.

Basic command present in DDL are CREATE, BASIC command present in DML are
DROP, RENAME, ALTER etc. UPDATE, INSERT, MERGE etc.

III. ExplainDCL.
DCL(Data Control Language) : DCL includes commands such as GRANT and REVOKE
which mainly deals with the rights, permissions and other controls of the database system.

Examples of DCL commands:

 GRANT-gives user’s access privileges to database.


 REVOKE-withdraw user’s access privileges given by using the GRANT command.

2 Unit Givetheconceptofdataabstraction?
1 Also,Explainthedifferentlevelsofabstractionwithdiagraminadatabasesystem.
Data abstraction refers to providing only essential information about the data to the outside
world, hiding the background details or implementation. The are three levels of abstraction are
as follows: Physical level: The lowest level of abstraction describes how data are stored.
Logical level: The next higher level of abstraction, describes what data are stored in database
and what relationship among those data

Physical level: describes how a record (e.g., customer) is stored.

Logical level: describes data stored in database, and the relationships among the data.

type customer = record

customer_id : string;
customer_name : string;
customer_street : string;
customer_city : integer;

end;

View level: application programs hide details of data types. Views can also hide information
(such as an employee’s salary) for security purposes.
3 Unit Whatarethedifferentsymbolsused forE-Rdiagram? State their usage.
1

.
4 Unit List fourdutiesofDBA.
1 -Designing the logical scheme
-Creating the structure of the entire database
-Monitor usage and create necessary index structures to speed
up query execution
-Grant / Revoke data access permissions to other users etc

5 Unit What is participation role? When it is necessary to role names in the description of relationship
1 type?

Participation role is defined as the part of relationship in which each entity participates in a
relationship. It is necessary to use role name in the description of relationship type when the same
entity type participates more than once in a relationship type in different roles.

6 Unit Defineentity,entityset,andattribute,key.
1
An entity is an ‘object’ in the ‘actual world’, and that can be identified from the other
[Link] entity in a database is represented as a set of attributes. Each entity has a different set
of values for some set of attributes.

An entity set is a group of entities that posses the same set of attributes.

Attributes are the properties of entities. Attributes are represented by means of ellipses.

A DBMS key is an attribute or set of an attribute which helps you to identify a row(tuple) in a
relation(table). They allow you to find the relation between two tables. Keys help you uniquely
identify a row in a table by a combination of one or more columns in that table.

7 Unit Explaindifferencebetweenprimarykeyandforeign key.


1

The PRIMARY KEY constraint uniquely identifies each record in a table.


Primary keys must contain UNIQUE values, and cannot contain NULL values.

A table can have only ONE primary key; and in the table, this primary key can consist of single
or multiple columns (fields).

A FOREIGN KEY is a key used to link two tables together.

A FOREIGN KEY is a field (or collection of fields) in one table that refers to the PRIMARY
KEY in another table.

The table containing the foreign key is called the child table, and the table containing the
candidate key is called the referenced or parent table.

8 Unit WhatisERDiagram?Explainwith anexample.


1 An E-R model is normally expressed as an entity-relationship diagram (called E-R diagram). E-R
diagram is graphical representation of an E-R model.
In the following diagram we have two entities Student and College and their relationship. The
relationship between Student and College is many to one as a college can have many students
however a student cannot study in multiple colleges at the same time. Student entity has
attributes such as Stu_Id, Stu_Name & Stu_Addr and College entity has attributes such as
Col_ID & Col_Name.
9 Unit Definethefollowingterms:
1 (a).Instances- A database instance is a state of operational database with data at any given
time. It contains a snapshot of the database. Database instances tend to change with time. A
DBMS ensures that its every instance (state) is in a valid state, by diligently following all the
validations, constraints, and conditions that the database designers have imposed.
(b)schema- The database schema of a database is its structure described in a formal language
supported by the database management system (DBMS). The term "schema" refers to the
organization of data as a blueprint of how the database is constructed (divided into database
tables in the case of relational databases).
(c) logicalschema-This schema defines all the logical constraints that need to be applied on
the data stored. It defines tables, views, and integrity constraints.
(d)physical schema-This schema pertains to the actual storage of data and its form of storage
like files, indices, etc. It defines how the data will be stored in a secondary storage.

10 Unit Whatismeantbydataindependence?Explainitstypes.
1 Data Independence is defined as a property of DBMS that helps you to change the Database
schema at one level of a database system without requiring to change the schema at the next
higher level. Data independence helps you to keep data separated from all programs that
make use of [Link] are:
Physical data independence - when the physical schema is altered without having the need to
rewrite application programs for the alteration.
Logical data independence -the ability to modify the logical schema without having to rewrite
the application program.

11 Unit [Link].
1
Types of Attributes Description

Simple attributes can't be


divided any further. For
Simple attribute example, a student's contact
number. It is also called an
atomic value.

It is possible to break down


composite attribute. For
example, a student's full
Composite attribute
name may be further divided
into first name, second
name, and last name.

Derived attribute This type of attribute does


not include in the physical
database. However, their
values are derived from
other attributes present in
the database. For example,
age should not be stored
directly. Instead, it should be
derived from the DOB of
that employee.

Multivalued attributes can


have more than one values.
Multivalued attribute For example, a student can
have more than one mobile
number, email address, etc.

12 Unit [Link]
1 witheachpatientalogofthevarioustestsandexaminationconducted.

13 Unit Consider a university database for the scheduling of classrooms for -final exams. This database
1 could be modeled as the single entity set exam, with attributes course-name, section-number,
room-number, and time. Alternatively, one or more additional entity sets could be defined, along
with relationship sets to replace some of the attributes of the exam entity set, as
 course with attributes name, department, and c-number
 section with attributes s-number and enrollment, and dependent as a weak entity set on
course
 room with attributes r-number, capacity, and building
Show an E-R diagram illustrating the use of all three additional entity sets listed.

UNIT 2
1 Unit DescribetheconceptandusefulnessofAssertions.
2
An assertion is a predicate expressing a condition that we wish the database always
to [Link] assertion in SQL takes the form
create assertion <assertion-name> check <predicate>
When an assertion is made, the system tests it for validity, and tests it again on every
update that may violate the assertion. This testing may introduce a significant
amount of overhead; hence assertions should be used with great care.
Asserting
for all X, P(X)
is achieved in a round-about fashion using
not exists X such that not P(X)

Example : Every loan has at least one borrower who maintains an account with a
minimum balance or $1000.00
create assertion balance_constraint check
(not exists (
select *
from loan
where not exists (
select *
from borrower, depositor, account
where loan.loan_number = borrower.loan_number
and borrower.customer_name =
depositor.customer_name
and depositor.account_number =
account.account_number
and [Link] >= 1000)))

2 Unit Elaboratetheconcept ofJoin?ExplaindifferenttypesofJoinswith an example.


2
Join operations take two relations and return as a result another relation.
These additional operations are typically used as subquery expressions in the from clause
Join condition – defines which tuples in the two relations match, and what attributes are
present in the result of the join.
Join type – defines how tuples in each relation that do not match any tuple in the other
relation (based on the join condition) are treated.

Example : consider two relations loan and borrower :

loan inner join borrower on


loan.loan_number = borrower.loan_number
loan left outer join borrower on
loan.loan_number = borrower.loan_number

loan natural inner join borrower

loan natural right outer join borrower

loan full outer join borrower using (loan_number)

3 Unit Whatareaggregate functions? enlisttheaggregatefunctionssupportedbySQL giving


2 appropriate example.
Aggregate functions operate on the multiset of values of a column of a relation, and
return a value. AggregatefunctionssupportedbySQL are:
avg: average value
min: minimum value
max: maximum value
sum: sum of values
count: number of values
Example :
[Link] the total, maximum, minimum and average account balance at the Perryridge
branch.

select sum(balance) ,max(balance), min(balance), avg (balance)from


accountwhere branch_name = 'Perryridge'
2. Find the number of tuples in the customer relation.
select count (*) from customer

3. Find the number of depositors in the bank.

select count (distinct customer_name) from depositor

4 Unit What is meant by weak entities? Give example


2
In a relational database, a weak entity is an entity that cannot be uniquely identified by its
attributes alone; therefore, it must use a foreign key in conjunction with its attributes to create a
primary key. The foreign key is typically a primary key of an entity it is related to.

In entity relationship diagrams (ER diagrams), a weak entity set is indicated by a bold (or double-
lined) rectangle (the entity) connected by a bold (or double-lined) type arrow to a bold (or
double-lined) diamond (the relationship).In the below ER Diagram, ‘Payment’ is the weak entity.
‘Loan Payment’ is the identifying relationship and ‘Payment Number’ is the partial key. Primary
Key of the Loan along with the partial key would be used to identify the records.

5 Unit What isthedifferencebetweencandidatekeyandsuperkey?


2
6 Unit What is the difference between composite and simple attributes?
2
Simple attribute − Simple attributes are atomic values, which cannot be divided further. For
example, a student's phone number is an atomic value of 10 digits.

Composite attribute − Composite attributes are made of more than one simple attribute. For
example, a student's complete name may have first_name and last_name.

7 Unit What isderivedattribute?Giveanexample?


2 Derived attribute − Derived attributes are the attributes that do not exist in the physical database,
but their values are derived from other attributes present in the database. For example,
average_salary in a department should not be saved directly in the database, instead it can be
derived. For another example, age can be derived from data_of_birth.

8 Unit WhyDataControlLanguage(DCL) is used?Explain.


2 1) A data control language (DCL) is a syntax similar to a computer programming
language used to control access to data stored in a database (Authorization). In particular, it is a
component of Structured Query Language (SQL).DCL used in SQL to:

 Monitor database transactions


 Solving deadlock situations
 Control access to data stored in a database
 Control database size and archiving schedule

9 Unit Considerthefollowingrelationaldatabase:
2 employee(emp_no,ename,salary,dept_no,job_title)
department(dept_no,dname,location )
Foreach ofthequery givenbelow, giveanexpressioninSQL.
a) Listtheemployeenameanddepartmentname for employeewhosesalaryisgreaterthan1500$.
Select [Link], [Link] from emploee e, department d where e.dept_no=d.dept_no and
e,salary> 1500.
b) For each department number find how many employees are working

Select count(emp_no), dept_no from employees group by dept_no.

c) Calculatetheremainderofsalaryifitisdividedby500fortheemployeewhosejobtitleis manager.
Select mod(salary,500) from employee where job_title=’manager’
10 Unit EMP (Employee-no. Dept-no, Employee-name, Salary)
2 DEPT (Dept-no. Dept-name, Location)

Write an SQL query to:


(a) Find all employee names who work in departments located at ‘Calcutta’ and whose salary
is greater than Rs.50,000.

Select e.employee_name from Emp e, dept d where d. dept_no= e.dept_no and [Link] !
=’Calcutta’ and [Link]>50000

(b) Calculate, for each department number, the number of employees with a salary greater
than Rs.1,00,000.
Select count( employee_no) , dept_no from Emp group by dept_no having salary>100000

11 Unit DifferentiatebetweenSQLcommandsDROPTABLEandDROPVIEW.
2
Drop table command removes all the data in the table and the table [Link] you DROP anything
using the SQL DROP command, something is removed permanently from the database. However, with
by dropping a view, the original data is unaffected. This is why the View is such a great tool for end users in
the first place the table behind the scenes is safe.

12 Unit What isthedifferencebetweenselectandprojectoperation?Giveexample.


2
Select Operation : This operation is used to select rows from a table (relation) that specifies a
given logic, which is called as a predicate. The predicate is a user defined condition to select rows
of user's choice.

Project Operation : If the user is interested in selecting the values of a few attributes, rather than
selection all attributes of the Table (Relation), then one should go for PROJECT Operation.

Selection is for rows σsubject = "database"(Books) retrieve all database book details

Projection is for columns ∏subject, author (Books)retrieve subject and author of each book

13 Unit What is the difference between the WHERE and HAVING clause
2

14 Unit Consider a relation REPAYMENT with the following schema;


2

REPAYMENT(BORROWER_ID, NAME, ADDRESS, LOANAMOUNT, REQUESTDATE,


REPAYMENT_DATE, REPAYMENT_AMOUNT)

Assume that this table records the repayment of loans by the borrowers. A borrower may have
multiple entries if he/she has paid multiple installments.

Write SQL statements (queries) to achieve the following;

Question (a)

Find all the records with information on repayments from the borrower with id equal to 42, and
where the lent amount exceeds 1000.

Answer (a)

SELECT *
FROM Repayment
WHERE borrower_id=42 AND loanamount>1000;

Question (b)

Find the total amount repaid for every address in the repayment table.

Answer (b)
SELECT address, SUM(repayment_amount)
FROM Repayment
GROUP BY address;

Question (c)

Delete all information on the completed loans. (Note: you can find the status of the loan by
summing the total repaid amount. If the total repaid amount is equal to the loan amount, then you
would say that the loan is ended.)

Answer (c)

DELETE FROM Repayment A


WHERE loanamount=
(SELECT SUM(repayment_amount)
FROM Repayment B
WHERE B.borrower_id=A.borrower_id AND [Link]=[Link]);

Question (d)

Find all the borrower names who has unique address. (ie., you should not count the borrowers
who are from the same address)
Answer (d)

SELECT name
FROM Repayment A
WHERE 1=
(SELECT COUNT(DISTINCT name)
FROM Repayment B
WHERE [Link]=[Link]);

Question (e)

Find the total number of repayments made by every borrower.

Answer (e)

SELECT borrower_id, count(*)


FROM repayment
GROUP BY borrower_id;

UNIT 3
1 Unit Explain in detail the concept of functional dependency and state significance of
3 normalization

A functional dependency (FD) is a relationship between two attributes, typically between


the PK and other non-key attributes within a table. For any relation R, attribute Y is
functionally dependent on attribute X (usually the PK), if for every valid instance of X,
that value of X uniquely determines the value of Y. This relationship is indicated by the
representation below :
X –> Y
The left side of the above FD diagram is called the determinant, and the right side is the
dependent.
In the following example,SIN determines Name, Address and Birthdate. Given SIN, we
can determine any of the other attributes within the table.
SIN -> Name, Address, Birthdate.

Normalization is a process to eliminate the flaws of a database with bad design. A poorly
designed database is inconsistent and create issues while adding, deleting or updating
[Link] following makes Database Normalization a crucial step in database
design process −
Resolving the database anomalies
The forms of Normalization i.e. 1NF, 2NF, 3NF, BCF, 4NF and 5NF remove all the
Insert, Update and Delete anomalies.
Insertion Anomaly occurs when you try to insert data in a record that does not exist.
Deletion Anomaly is when a data is to be deleted and due to the poor deign of database,
other record also deletes.
Eliminate Redundancy of Data
Storing same data item multiple times is known as Data Redundancy. A normalized table
do not have the issue of redundancy of data.
Data Dependency
The data gets stored in the correct table and ensures normalization.
Isolation of Data
A good designed database states that the changes in one table or field do not affect other.
This is achieved through Normalization.
Data Consistency
While updating if a record is left, it can led to inconsistent data, Normalization resolves it
and ensures Data Consistency.

2 Unit What is multivalued dependencies? How it is represented? And how to identify which given
3 MVD is trivial or non trivial?
Multi-valued dependency (MVD) represents a dependency between attributes (for example, A,
B and C) in a relation, such that for each value of A there is a set of values for B and a set of
value for C. However, the set of values for B and C are independent of each other.
multi-valued dependency can be further defined as being trivial or nontrivial. A MVD A --> B
in relation R is defined as being trivial if B is a subset of Aor A U B = R
A MVD is defined as being nontrivial if neither of the above twoconditions is satisfied.
3 Unit Differentiate between 3NF and BCNF
3
Difference between 3NF and BCNF :

[Link]. 3NF BCNF

In 3NF there should be no transitive


In BCNF for any relation A->B,
dependency that is no non prime attribute
1. A should be a super key of
should be transitively dependent on the
relation.
candidate key.

It is comparatively more
2. It is less stronger than BCNF.
stronger than 3NF.

In BCNF the functional


In 3NF the functional dependencies are already
3. dependencies are already in
in 1NF and 2NF.
1NF, 2NF and 3NF.

The redundancy is
4. The redundancy is high in 3NF.
comparatively low in BCNF.

In BCNF there may or may not


In 3NF there is preservation of all functional
5. be preservation of all functional
dependencies.
dependencies.

6. It is comparatively easier to achieve. It is difficult to achieve.

Lossless decomposition can be achieved by Lossless decomposition is hard


7.
3NF. to achieve in BCNF.

4 Unit Which normalformis consideredadequatefornormalrelationaldatabasedesign?


3 Justifyyouranswerwithappropriateexample.
(a) 2NF(b) BCNF(c) 4NF (d) 3NF

3NF is sufficient because because most of the 3NF tables are free of insertion, update, and deletion
anomalies. Moreover, 3NF always ensures functional dependency preserving and lossless.
5 Unit ConsideraschemaR(A,B, C, D)andfunctionaldependenciesA->BandC->D. C O M M E N T
3 O N decompositionofRintoR1(A,B)andR2(C,D)is?

Decomposition of R into R1 and R2 is a dependency preserving decomposition if closure of


functional dependencies after decomposition is same as closure of of FDs before decomposition.
A simple way is to just check whether we can derive all the original FDs from the FDs present
after decomposition.

In the above question R(A, B, C, D) is decomposed into R1 (A, B) and R2(C, D) and there are
only two FDs A -> B and C -> D. So, the decomposition is dependency preserving
6 Unit Discuss insertion,deletion and modification anomalies. Why are they considered bad? Illustrate
3 with examples.

The Problems resulting from data redundancy in an unnormalized database table are collectively
known as update anomalies. So any database insertion, deletion or modification that leaves the
database in an inconsistent state is said to have caused an update anomaly. They are classified as:

• Insertion anomalies:To insert the details of a new member of staff located at branch B1 into the
Tbl_Staff_Branch Table shown above, we must enter the correct details of branch numner B1 so
that the branch details are consistent with the values for branch B1 in other rows. To insert the
details of a new branch that currently has no members of staff into the Tbl_Staff_Branch table, it is
necessory to enter nulls for the staff details which is not allowed as staffID is the primary key. But
if you normalize Tbl_Staff_Branch, which is in Second Normal Form (2NF) to Third Normal
Dorm (3NF), you end up with Tbl_Staff and Tbl_Branch and you shouldn't have the problems
mentioned above.

• Deletion anomalies: If we delete a row from the Tbl_Staff_Branch table that represents the last
member of staff located at that branch, (for e.g. row with Branch numbers B", B3 or B4) the
details about that branch are also lost from the Database.

Modification anomalies: Should we need to change the address of a perticular branch in the
Tbl_Staff_Branch table, we must update the rows of all staff located at that branch. If this
modification is not carried out on all the relevent rows, the database will become inconsistent.
They are considered bad because, they lead to:

• It is difficult to maintain consistency of data in the database

• It leads to redundant data

• It causes unnecessary updates of data

• Memory space will be wasted at the storage level

7 Unit STATE Armstrong’sAxioms?


3 The term Armstrong axioms refer to the sound and complete set of inference rules or axioms,
introduced by William W. Armstrong, that is used to test the logical implication of functional
dependencies. If F is a set of functional dependencies then the closure of F, denoted as , is
the set of all functional dependencies logically implied by F. Armstrong’s Axioms are a set of
rules, that when applied repeatedly, generates a closure of functional dependencies.

Armstrong’s Inference Rules :


1. Reflexive rule
F ⊨ {X → Y | Y ⊆ X} for any X. Trivial FDs

{X → Y} ⊨ {XZ → YZ}, Z ⊆ R. Here XZ denotes X ⋃ Z


2. Augmentation rule

3. Transitive rule
{X → Y, Y → Z} ⊨ {X → Z}
4. Decomposition or Projective rule
{X → YZ} ⊨ {X → Y}
5. Union or Additive rule
{X → Y, X → Z} ⊨ {X → YZ}
6. Pseudo transitive rule
{X → Y, WY → Z} ⊨ {WX → Z}

8 Unit ExplainPJNF :
3
PJNF-Fifth normal form (5NF), is also known as project-join normal form (PJNF). It is a level of
database normalization designed to reduce redundancy in relational databases. A relation is said
to be in 5NF if and only if it satisfies 4NF and no join dependency exists. A relation is said to
have join dependency if it can be recreated by joining multiple sub relations and each of these sub
relations has a subset of the attributes of the original [Link] for join dependency:
If the join of R1 and R2 over Q is equal to relation R then we can say that a join dependency
exists, where R1 and R2 are the decomposition R1 (P, Q) and R2 (Q, S) of a given relation R (P,
Q, S). R1 and R2 are a lossless decomposition of R.

Properties of 5NF:
A relation R is in 5NF if and only if it satisfies following conditions:
R should be in 4NF (no multi-valued dependency exists).
It cannot undergo lossless decomposition (join dependency)
9 Unit Consider a relation R with five attributes ABCDE. You are given the following dependencies: A
3 → B, BC → E, and ED → A.

1. List all keys for R.

2.

2. Is R in 3NF?

3. Is R in BCNF?

Answer

1. CDE, ACD, BCD

2. R is in 3NF because B, E and A are all parts of keys.

3. R is not in BCNF because none of A, BC and ED contain a key.

10 Unit What do you understand by attribute closure, give an example.


3
Attribute Closure: The set of attributes that are functionally dependent on the attribute A is
called Attribute Closure of A and it can be represented as A+.

Steps to Find the Attribute Closure of A

Given FD set of a Relation R, The attribute closure set S be the set of A

1. Add A to S.
2. Recursively add attributes which can be functionally determined from attributes of the set
S until done.

Given R(E-ID, E-NAME, E-CITY, E-STATE)


FDs = { E-ID->E-NAME, E-ID->E-CITY, E-ID->E-STATE, E-CITY->E-STATE }

The attribute closure of E-ID can be calculated as:

1. Add E-ID to the set {E-ID}


2. Add Attributes which can be derived from any attribute of set. In this case, E-NAME and
E-CITY, E-STATE can be derived from E-ID. So these are also a part of closure.
3. As there is one other attribute remaining in relation to be derived from E-ID. So result is:

(E-ID)+ = {E-ID, E-NAME, E-CITY, E-STATE }

Similarly,

(E-NAME)+ = {E-NAME}(E-CITY)+ = {E-CITY, E_STATE}

11 Unit GiveasetofFD’sfortherelationschemaR(A,B,C, D)withPrimarykeyAB underwhichRisin2NFbut


3 notin3NF.

Consider the set of FD: AB → CD and C → D. AB is obviously a key for this

relation since AB → CD implies AB → ABCD. It is a primary key since there are

no smaller subsets of keys that hold over R(A,B,C,D). The FD: C → D violates

3NF but not 2NF since:

D ∈ C is false; that is, it is not a trivial FD

C is not a superkey

D is not part of some key for R

UNIT 4
1. Unit Draw a state diagram and discuss in detail the typical states that a query processing goes
4 through during execution.
[Link] and translation
[Link]
[Link]

Parsing and translation

translate the query into its internal form. This is then translated into relational algebra.
Parser checks syntax, verifies relations

Evaluation
The query-execution engine takes a query-evaluation plan, executes that plan, and returns the
answers to the query.

A relational algebra expression may have many equivalent expressions

E.g., ssalary<75000(Õsalary(instructor)) is equivalent to


Õsalary(ssalary<75000(instructor))
Each relational algebra operation can be evaluated using one of several different algorithms
Correspondingly, a relational-algebra expression can be evaluated in many ways.
Annotated expression specifying detailed evaluation strategy is called an evaluation-plan.
E.g., can use an index on salary to find instructors with salary < 75000,
or can perform complete relation scan and discard instructors with salary ³ 75000

Query Optimization:
Amongst all equivalent evaluation plans choose the one with lowest cost.
Cost is estimated using statistical information from the
database catalog

e.g. number of tuples in each relation, size of tuples, etc.

2. Unit What is meant by the term Query Processing?


4 What are the various steps involved in this process?

Query Processing is the activity performed in extracting data from the database. In query
processing, it takes various steps for fetching the data from the database. The steps involved are:
Parsing and translation. Optimization.
Query Processing
Query Processing would mean the entire process or activity which involves query translation
into low level instructions, query optimization to save resources, cost estimation or evaluation of
query, and extraction of data from the database.
Goal: To find an efficient Query Execution Plan for a given SQL query which would minimize
the cost considerably, especially time.
Cost Factors: Disk accesses [which typically consumes time], read/write operations [which
typically needs resources such as memory/RAM].
The major steps involved in query processing are depicted in the figure below;
Figure 1 - Steps in Database Query Processing

Let us discuss the whole process with an example. Let us consider the following two relations as
the example tables for our discussion;

Example:

Employee(Eno, Ename, Phone)


Proj_Assigned(Eno, Proj_No, Role, DOP)
where,
Eno is Employee number,
Ename is Employee name,
Proj_No is Project Number in which an employee is assigned,
Role is the role of an employee in a project,
DOP is duration of the project in months.
With this information, let us write a query to find the list of all employees who are working in a
project which is more than 10 months old.
SELECT Ename
FROM Employee, Proj_Assigned
WHERE [Link] = Proj_Assigned.Eno AND DOP > 10;
Input:
A query written in SQL is given as input to the query processor. For our case, let us consider the
SQL query written above.

Step 1: Parsing

In this step, the parser of the query processor module checks the syntax of the query, the user’s
privileges to execute the query, the table names and attribute names, etc. The correct table
names, attribute names and the privilege of the users can be taken from the system catalog (data
dictionary).

Step 2: Translation

If we have written a valid query, then it is converted from high level language SQL to low level
instruction in Relational Algebra.
For example, our SQL query can be converted into a Relational Algebra equivalent as follows;
πEname(σDOP>10 Λ [Link]=Proj_Assigned.Eno(Employee X Prof_Assigned))

Step 3: Optimizer
Optimizer uses the statistical data stored as part of data dictionary. The statistical data are
information about the size of the table, the length of records, the indexes created on the table,
etc. Optimizer also checks for the conditions and conditional attributes which are parts of the
query.

Step 4: Execution Plan


A query can be expressed in many ways. The query processor module, at this stage, using the
information collected in step 3 to find different relational algebra expressions that are equivalent
and return the result of the one which we have written already.
For our example, the query written in Relational algebra can also be written as the one given

πEname(Employee ⋈Eno (σDOP>10 (Prof_Assigned)))


below;

So far, we have got two execution plans. Only condition is that both plans should give the same
result.

Step 5: Evaluation
Though we got many execution plans constructed through statistical data, though they return
same result (obvious), they differ in terms of Time consumption to execute the query, or the
Space required executing the query. Hence, it is mandatory choose one plan which obviously
consumes less cost.

At this stage, we choose one execution plan of the several we have developed. This Execution
plan accesses data from the database to give the final result.
In our example, the second plan may be good. In the first plan, we join two relations (costly
operation) then apply the condition (conditions are considered as filters) on the joined relation.
This consumes more time as well as space.
In the second plan, we filter one of the tables (Proj_Assigned) and the result is joined with the
Employee table. This join may need to compare less number of records. Hence, the second plan
is the best (with the information known, not always).

3. Give the flow of processing of following query with the help of query processing steps
SELECT Ename
FROM Employee, Proj_Assigned
WHERE [Link] = Proj_Assigned.Eno AND DOP > 10;

Answer:
4. Describe three techniques commonly used when developing algorithms for relational
operators. Explain how these techniques can be used to design algorithms for the selection,
projection, and join operators.

1. The three techniques commonly used are indexing, iteration, and partitioning:
Indexing: If a selection or join condition is specified, use an index to examine just the tuples that
satisfy the condition.

 Iteration: Examine all tuples in an input table, one after the other. If we need only
a few fields from each tuple and there is an index whose key contains all these fields,
instead of examining data tuples, we can scan all index data entries.

 Partitioning: By partitioing tuples on a sort key, we can often decompose an operation


into a less expensive collection of operations on partitions. Sorting and hashing are two
commonly used partitioning techniques.
 They can be used to design algorithms for selection, projection, and join operators as
follows:

 Selection: For a selection with more than one tuple matching the query (in general, at
least 5%), indexing like B+ Trees are very useful. This comes into play often with range
queries. It allows us to not only find the first qualifying tuple quickly, but also the other
qualifying tuples soon after (especially if the index is clustered). For a selection
condition with an equality query where there are only a few (usually 1) matching tuple,
partitioning using hash indexing is often appropriate. It allows us to find the exact tuple
we are searching for with a cost of only a few (typically one or two) I/Os.

 Projection: The projection operation requires us to drop certain fields of the input,
which can result in duplicates appearing in the result set. If we do not need to remove
these duplicates, then the iteration technique can efficiently handle this problem. On ther
other hand, if we do need to elim- inate duplicates, partitioning the data and applying a
sort key is typically performed.

 In the case that there are appropriate indexes available, this can lead to less expensive
plans for sorting the tuples during duplicate elimination since the data may all ready be
sorted on the index (in that case we simply compare adjacent entries to check for
duplicates)
 Join: The most expensive database operation, joins, can combinations of all three
techniques. A join operation typically has multiple selection and projection elements
built into it, so the importance of having appropriate indexes or of partitioning the data is
just as above, if not more so. When possible, the individual selections and projections
are applied to two relations before they are joined, so as to decrease the size of the
intermediate table.

 As an example consider joining two relations with 100,000 tuples each and only 5 % of
qualifying tuples in each table. Joining before applying the selection conditions, would
result in a huge intermediate table size that would then have to be searched for matching
selections. Alternatively, consider applying parts of the selection first. We can then
perform a join of the 5,000 qualifying tuples found after applying the selection to each
table, that can then be searched and handled significantly faster.

5. What is the goal of query optimization? Why is optimization important?

The goal of query optimization is to avoid the worst plans and find a good plan. The goal is
usually not to find the optimal plan. The deference in cost between a good plan and a bad plan
can be several orders of magnitude: a good query plan can evaluate the query in seconds,
whereas a bad query plan might take days

6. Unit What is hash-table overflow? Explain.


4
An overflow occurs when the home bucket for a new pair (key, element) is full. • We may
handle overflows by: ƒ Search the hash table in some systematic fashion for a bucket that is not
full.
7. Explain following Query:

Π[Link] name((Πbranch name, assets(ρT (branch))) _T.assets>[Link]


(Πassets (σ(branch city =_Brooklyn_)(ρS(branch)))))

This expression performs the theta join on the smallest amount of data possible.
It does this by restricting the right hand side operand of the join to only
those branches in Brooklyn, and also eliminating the unneeded attributes from both the
operands.
8. Unit What is the difference between pipe lining and materialization?
4

Materialization v/s Pipelining


Materialization
 Materialized evaluation walks the parse or expression tree of the relational algebra
operation, and performs the innermost or leaf-level operations first
 The intermediate result of each operation is materialized — an actual, but temporary,
relation —and becomes input for subsequent operations.
 The cost of materialization is the sum of the individual operations plus the cost of writing
the intermediate results to disk — a function of the blocking factor (number of records per
block) of the temporaries.
 The problem with materialization is that — lots of temporary files, lots of I/O.

Pipelining
 With pipelined evaluation, operations form a queue, and results are passed from one
operation to another as they are calculated, hence the technique’s name.
 With pipelined evaluation, operations form a queue, and results are passed from one
operation to another as
they are calculated, hence the technique’s name.
 General approach: restructure the individual operation algorithms so that they take streams
of tuples as both input and output.

9. How does pipe lining improve query evaluation efficiency in dbms?

 Pipelined evaluation : evaluate several operations simultaneously, passing the results of


one operation on to the next.
 E.g., in previous expression tree, don’t store result of
o instead, pass tuples directly to the join.. Similarly, don’t store result of join, pass
tuples directly to projection.
 Much cheaper than materialization: no need to store a temporary relation to disk.
 Pipelining may not always be possible – e.g., sort, hash-join.
 For pipelining to be effective, use evaluation algorithms that generate output tuples even as
tuples are received for inputs to the operation.
 Pipelines can be executed in two ways: demand driven and producer driven

10. How to choose the best evaluation plan for query? Explain

 A relational algebra expression may have many equivalent expressions


o E.g., ssalary<75000(Õsalary(instructor)) is equivalent to
Õsalary(ssalary<75000(instructor))
 Each relational algebra operation can be evaluated using one of several different algorithms
o Correspondingly, a relational-algebra expression can be evaluated in many ways.
 Annotated expression specifying detailed evaluation strategy is called an evaluation-plan.
o E.g., can use an index on salary to find instructors with salary < 75000,
o or can perform complete relation scan and discard instructors with salary ³ 7500

11. Unit Explain following algorithms in query processing


4 (a) Sort-merge join

(b) Nested Loop Join


UNIT 5

1. Unit Explain desirable properties of transaction.


5 Desirable Properties of Transactions
Transactions should possess several properties, often called the ACID properties; they should be
enforced by the concurrency control and recovery methods of the DBMS. The following are the
ACID properties:
 Atomicity. A transaction is an atomic unit of processing; it should either
be performed in its entirety or not performed at all.

 Consistency preservation. A transaction should be consistency preserving, meaning


that if it is completely executed from beginning to end without interference from
other transactions, it should take the database from one consistent state to another.

 Isolation. A transaction should appear as though it is being executed in isolation


from other transactions, even though many transactions are executing concurrently.
That is, the execution of a transaction should not be interfered with by any other
transactions executing concurrently.

 Durability or permanency. The changes applied to the database by a com-mitted


transaction must persist in the database. These changes must not be lost because of
any failure.
The atomicity property requires that we execute a transaction to completion. It is the
responsibility of the transaction recovery subsystem of a DBMS to ensure atomicity. If a
transaction fails to complete for some reason, such as a system crash in the midst of transaction
execution, the recovery technique must undo any effects of the transaction on the database. On
the other hand, write operations of a committed transaction must be eventually written to disk.

The preservation of consistency is generally considered to be the responsibility of the


programmers who write the database programs or of the DBMS module that enforces integrity
constraints. Recall that a database state is a collection of all the stored data items (values) in the
database at a given point in time. A consistent state of the database satisfies the constraints
specified in the schema as well as any other constraints on the database that should hold. A
database program should be written in a way that guarantees that, if the database is in a consistent
state before executing the transaction, it will be in a consistent state after the complete execution
of the transaction, assuming that no interference with other transactions occurs.

The isolation property is enforced by the concurrency control subsystem of the DBMS.8 If every
transaction does not make its updates (write operations) visible to other transactions until it is
committed, one form of isolation is enforced that solves the temporary update problem and
eliminates cascading rollbacks (see Chapter 23) but does not eliminate all other problems. There
have been attempts to define the level of isolation of a transaction. A transaction is said to have
level 0 (zero) isola-tion if it does not overwrite the dirty reads of higher-level transactions. Level
1 (one) isolation has no lost updates, and level 2 isolation has no lost updates and no dirty reads.
Finally, level 3 isolation (also called true isolation) has, in addition to level 2 properties,
repeatable reads.

2. Unit How Share, exclusive and update locks different? Explain.


5
Shared Lock (S)

A Shared Lock is basically a read-only lock for a row-level. Any number of resources can fetch
the data to read when the shared lock is present on the resource. That means that many process
IDs can have a shared lock on the same resource to read the respective data.

Exclusive Lock (X)


The Exclusive Lock is used and valid on a single transaction, that locks either row or a page
depending on the data. The mechanism for understanding is simple, where an exclusive lock can
be applied only on a single resource. There cannot be more than one exclusive lock on the same
resource. Either Insert, Update or Delete commands happen over with the Exclusive lock, and
these commands will not be in effect until the exclusive lock is released from the resource.

Update Lock (U)

An Update Lock is used and valid when there is a shared lock applicable for a resource. In other
words, the update lock cannot be placed until there are no other offending exclusive or update
locks for the fetching resource. Additionally, the update lock happens to be acquiring an
exclusive when all other locks are released from a resource.

3. Unit How precedence graph can be used to detect deadlock?


5
Precedence Graph or Serialization Graph is used commonly to test Conflict Serializability of a
schedule.
It is a directed Graph (V, E) consisting of a set of nodes V = {T1, T2, T3……….Tn} and a set of
directed edges E = {e1, e2, e3………………em}.
The graph contains one node for each Transaction Ti. An edge ei is of the form Tj –> Tk where
Tj is the starting node of ei and Tk is the ending node of ei. An edge ei is constructed between
nodes Tj to Tk if one of the operations in Tj appears in the schedule before some conflicting
operation in Tk .
The Algorithm can be written as:
1. Create a node T in the graph for each participating transaction in the schedule.
2. For the conflicting operation read_item(X) and write_item(X) – If a Transaction T j executes
a read_item (X) after Ti executes a write_item (X), draw an edge from Ti to Tj in the graph.
3. For the conflicting operation write_item(X) and read_item(X) – If a Transaction T j executes
a write_item (X) after Ti executes a read_item (X), draw an edge from Ti to Tj in the graph.
4. For the conflicting operation write_item(X) and write_item(X) – If a Transaction
Tj executes a write_item (X) after Ti executes a write_item (X), draw an edge from Ti to
Tj in the graph.
5. The Schedule S is serializable if there is no cycle in the precedence graph.
If there is no cycle in the precedence graph, it means we can construct a serial schedule S’ which
is conflict equivalent to the schedule S.
The serial schedule S’ can be found by Topological Sorting of the acyclic precedence graph. Such
schedules can be more than 1.

S : r1(x) r1(y) w2(x) w1(x) r2(y)


Creating Precedence graph:
1. Make two nodes corresponding to Transaction T1 and T2.

2. For the conflicting pair r1(x) w2(x), where r1(x) happens before w2(x), draw an edge from
T1 to T2.

3. For the conflicting pair w2(x) w1(x), where w2(x) happens before w1(x), draw an edge
from T2 to T1.

Since the graph is cyclic, we can conclude that it is not conflict serializable to any schedule
serial schedule.
Let us try to infer a serial schedule from this graph using topological ordering.
The edge T1–>T2 tells that T1 should come before T2 in the linear ordering.
The edge T2 –> T1 tells that T2 should come before T1 in the linear ordering.
So, we can not predict any particular order (when the graph is cyclic). Therefore, no serial
schedule can be obtained from this graph.

Consider the another schedule S1 :


S1: r1(x) r3(y) w1(x) w2(y) r3(x) w2(x)
The graph for this schedule is :
Since the graph is acyclic, the schedule is conflict serializable. Performing Topological Sort on
this graph would give us a possible serial schedule which is conflict equivalent to schedule S1.
In Topological Sort, we first select the node with indegree 0, which is T1. This would be
followed by T3 and T2.
So, S1 is conflict serializable since it is conflict equivalent to the serial schedule T1 T3 T2

4. Unit Describe the concept of following “two phase”


5 locking protocol in brief:

1) Strict 2-PL –

This requires that in addition to the lock being 2-Phase all Exclusive(X) Locks held by the
transaction be released until after the Transaction Commits.
Following Strict 2-PL ensures that our schedule is:
 Recoverable
 Cascadeless
Hence it gives us freedom from Cascading Abort which was still there in Basic 2-PL and
moreover guarantee Strict Schedules but still Deadlocks are possible!

2) Rigorous 2-PL –

This requires that in addition to the lock being 2-Phase all Exclusive(X) and Shared(S)
Locks held by the transaction be released until after the Transaction Commits.
Following Rigorous 2-PL ensures that our schedule is:
 Recoverable
 Cascadeless
Hence it gives us freedom from Cascading Abort which was still there in Basic 2-PL and
moreover guarantee Strict Schedules but still Deadlocks are possible!
Note the difference between Strict 2-PL and Rigorous 2-PL is that Rigorous is more restrictive, it
requires both Exclusive and Shared locks to be held until after the Transaction commits and this
is what makes the implementation of Rigorous 2-PL more easy.

3) Conservative 2-PL –

Conservative 2-PL is Deadlock free and but it does not ensure Strict schedule(More
about this here!). However, it is difficult to use in practice because of need to predeclare the
read-set and the write-set which is not possible in many situations. In practice, the most
popular variation of 2-PL is Strict 2-PL.

Static 2-PL, this protocol requires the transaction to lock all the items it access before the
Transaction begins execution by predeclaring its read-set and write-set. If any of the
predeclared items needed cannot be locked, the transaction does not lock any of the items,
instead it waits until all the items are available for locking.

The Venn Diagram below shows the classification of schedules which are rigorous and
strict. The universe represents the schedules which can be serialized as 2-PL. Now as the
diagram suggests, and it can also be logically concluded, if a schedule is Rigorous then it is
Strict. We can also think in another way, say we put a restriction on a schedule which makes
it strict, adding another to the list of restrictions make it Rigorous. Take a moment to again
analyze the diagram and you’ll definitely get it.
5. Describe Lock Based Concurrency Control Protocol in DBMS.

Concurrency control techniques are used to ensure that the Isolation (or non-interference)
property of concurrently executing transactions is maintained.
.
Concurrency-control protocols: allow concurrent schedules, but ensure that the schedules are
conflict/view serializable, and are recoverable and maybe even cascadeless.
These protocols do not examine the precedence graph as it is being created, instead a protocol
imposes a discipline that avoids non-seralizable schedules.

Different concurrency control protocols provide different advantages between the amount of
concurrency they allow and the amount of overhead that they impose.:

Different categories of protocols:


 Lock Based Protocol
 Basic 2-PL
 Conservative 2-PL
 Strict 2-PL
 Rigorous 2-PL
 Graph Based Protocol
 Time-Stamp Ordering Protocol
 Multiple Granularity Protocol
 Multi-version Protocol

6. Unit Describe the concept of multiple Granularity.


5 In the various Concurrency Control schemes have used different methods and every individual
Data item as the unit on which synchronization is performed. A certain drawback of this
technique is if a transaction Ti needs to access the entire database, and a locking protocol is used,
then Ti must lock each item in the database. It is less efficient, it would be more simpler if
Ti could use a single lock to lock the entire database. But, if it consider the second proposal, this
should not in fact overlook the certain flaw in the proposed method. Suppose another transaction
just needs to access a few data items from a database, so locking the entire database seems to be
unnecessary moreover it may cost us loss of Concurrency, which was our primary goal in the first
place. To bargain between Efficiency and Concurrency. Use Granularity.
Let’s start by understanding what is meant by Granularity.

Granularity – It is the size of data item allowed to lock. Now Multiple Granularity means
hierarchically breaking up the database into blocks which can be locked and can be track what
need to lock and in what fashion. Such a hierarchy can be represented graphically as a tree.
For example, consider the tree, which consists of four levels of nodes. The highest level
represents the entire database. Below it are nodes of type area; the database consists of exactly
these areas. Area has children nodes which are called files. Every
area has those files that are its child nodes. No file can span more than one area.
Finally, each file has child nodes called records. As before, the file consists of exactly those
records that are its child nodes, and no record can be present in more than one file. Hence, the
levels starting from the top level are:
 database
 area
 file
 record

Figure – Multi Granularity tree Hierarchy

7. Unit How time stamps based protocol can be implemented?


5
Timestamp based Concurrency Control

Concurrency Control can be implemented in different ways. One way to implement it is by


using Locks.

Now, lets discuss about Time Stamp Ordering Protocol.

As earlier introduced, Timestamp is a unique identifier created by the DBMS to identify a


transaction. They are usually assigned in the order in which they are submitted to the system.
Refer to the timestamp of a transaction T as TS(T).

Timestamp Ordering Protocol –


The main idea for this protocol is to order the transactions based on their Timestamps. A schedule
in which the transactions participate is then serializable and the only equivalent serial schedule
permitted has the transactions in the order of their Timestamp Values. Stating simply, the
schedule is equivalent to the particular Serial Order corresponding to the order of the
Transaction timestamps.

Algorithm must ensure that, for each items accessed by Conflicting Operations in the schedule,
the order in which the item is accessed does not violate the ordering. To ensure this, use two
Timestamp Values relating to each database item X.

 W_TS(X) is the largest timestamp of any transaction that executed write(X) successfully.
 R_TS(X) is the largest timestamp of any transaction that executed read(X) successfully.

Basic Timestamp Ordering –


Every transaction is issued a timestamp based on when it enters the system. Suppose, if an old
transaction Ti has timestamp TS(Ti), a new transaction Tj is assigned timestamp TS(Tj) such
that TS(Ti) < TS(Tj).The protocol manages concurrent execution such that the timestamps
determine the serializability order. The timestamp ordering protocol ensures that any conflicting
read and write operations are executed in timestamp order. Whenever some Transaction T tries to
issue a R_item(X) or a W_item(X), the Basic TO algorithm compares the timestamp
of T with R_TS(X) & W_TS(X) to ensure that the Timestamp order is not violated. This describe
the Basic TO protocol in following two cases.
1. Whenever a Transaction T issues a W_item(X) operation, check the following conditions:

 If R_TS(X) > TS(T) or if W_TS(X) > TS(T), then abort and rollback T and reject the
operation. else,
 Execute W_item(X) operation of T and set W_TS(X) to TS(T).

2. Whenever a Transaction T issues a R_item(X) operation, check the following conditions:

 If W_TS(X) > TS(T), then abort and reject T and reject the operation, else
 If W_TS(X) <= TS(T), then execute the R_item(X) operation of T and set R_TS(X) to
the larger of TS(T) and current R_TS(X).
Whenever the Basic TO algorithm detects twp conflicting operation that occur in incorrect order,
it rejects the later of the two operation by aborting the Transaction that issued it. Schedules
produced by Basic TO are guaranteed to be conflict serializable. Already discussed that using
Timestamp, can ensure that our schedule will be deadlock free.
One drawback of Basic TO protocol is that it Cascading Rollback is still possible. Suppose we
have a Transaction T1 and T2 has used a value written by T1. If T1 is aborted and resubmitted to
the system then, T must also be aborted and rolled back. So the problem of Cascading aborts still
prevails.

Let’s gist the Advantages and Disadvantages of Basic TO protocol:


 Timestamp Ordering protocol ensures serializablity since the precedence graph will be of
the form:

8. Unit Elaborate different types of database schedules with example.


5
TYPES OF SCHEDULES IN DBMS

Schedule, as the name suggests, is a process of lining the transactions and executing them one by
one. When there are multiple transactions that are running in a concurrent manner and the order
of operation is needed to be set so that the operations do not overlap each other, Scheduling is
brought into play and the transactions are timed accordingly.
Here we will discuss various types of schedules.

9. Unit Describe :
5 1) Serial Schedule
2) Non- Serial Schedule

1. Serial Schedules:
Schedules in which the transactions are executed non-interleaved, i.e., a serial schedule is
one in which no transaction starts until a running transaction has ended are called serial
schedules.

Example: Consider the following schedule involving two transactions T1 and T2.
T1 T2
R(A)

W(A)

R(B)

W(B)

R(A)

R(B)

where R(A) denotes that a read operation is performed on some data item ‘A’
This is a serial schedule since the transactions perform serially in the order T1 —> T2

2. Non-Serial Schedule:
This is a type of Scheduling where the operations of multiple transactions are interleaved.
This might lead to a rise in the concurrency problem. The transactions are executed in a
non-serial manner, keeping the end result correct and same as the serial schedule. Unlike the
serial schedule where one transaction must wait for another to complete all its operation, in
the non-serial schedule, the other transaction proceeds without waiting for the previous
transaction to complete. This sort of schedule does not provide any benefit of the concurrent
transaction. It can be of two types namely, Serializable and Non-Serializable Schedule.

The Non-Serial Schedule can be divided further into Serializable and Non-Serializable.
a. Serializable:
This is used to maintain the consistency of the database. It is mainly used in the Non-Serial
scheduling to verify whether the scheduling will lead to any inconsistency or not. On the
other hand, a serial schedule does not need the serializability because it follows a
transaction only when the previous transaction is complete. The non-serial schedule is said
to be in a serializable schedule only when it is equivalent to the serial schedules, for an n
number of transactions. Since concurrency is allowed in this case thus, multiple transactions
can execute concurrently. A serializable schedule helps in improving both resource
utilization and CPU throughput.
These are of two types:

1. Conflict Serializable:

A schedule is called conflict serializable if it can be transformed into a serial schedule


by swapping non-conflicting operations. Two operations are said to be conflicting if
all conditions satisfy:
 They belong to different transactions
 They operate on the same data item
 At Least one of them is a write operation

2. View Serializable:

A Schedule is called view serializable if it is view equal to a serial schedule (no


overlapping transactions). A conflict schedule is a view serializable but if the
serializability contains blind writes, then the view serializable does not conflict
serializable.

b. Non-Serializable:
The non-serializable schedule is divided into two types, Recoverable and Non-recoverable
Schedule.

1. Recoverable Schedule:
Schedules in which transactions commit only after all transactions whose changes they
read commit are called recoverable schedules. In other words, if some transaction T j is
reading value updated or written by some other transaction Ti, then the commit of
Tj must occur after the commit of Ti.
Example – Consider the following schedule involving two transactions T1 and T2.
T1 T2

R(A)
T1 T2

W(A)

W(A)

R(A)

commit

commit

This is a recoverable schedule since T1 commits before T2, that makes the value read
by T2 correct.

10. Unit Briefly Give the working and importance of View Serializibility in Transaction.
5
VIEW SERIALIZABILITY IN DBMS TRANSACTIONS

Two schedules S1 and S2 are said to be view equal if below conditions are satisfied :

1) Initial Read
If a transaction T1 reading data item A from initial database in S1 then in S2 also T1 should read
A from initial database.
T1 T2 T3

-------------------

R(A)
W(A)

R(A)

R(B)

Transaction T2 is reading A from initial database.

2)Updated Read
If Ti is reading A which is updated by Tj in S1 then in S2 also Ti should read A which is updated
by Tj.
T1 T2 T3 T1 T2 T3
------------------- ----------------
W(A) W(A)
W(A) R(A)
R(A) W(A)
Above two schedule are not view equal as in S1 :T3 is reading A updated by T2, in S2 T3 is
reading A updated by T1.
3)Final Write operation
If a transaction T1 updated A at last in S1, then in S2 also T1 should perform final write
operations.
T1 T2 T1 T2
------------ ---------------
R(A) R(A)
W(A) W(A)
W(A) W(A)
Above two schedule are not view as Final write operation in S1 is done by T1 while in S2 done
by T2.
View Serializability: A Schedule is called view serializable if it is view equal to a serial schedule
(no overlapping transactions).

11. Unit Give the difference between Conflict and View Serializability.
5
Serializable Schedule :
A transaction schedule is serializable if its outcome is equal to the outcome of its transactions
executed serially i.e. sequentially without overlapping in time. A serializable schedule always
leaves the database in consistent state. A serial schedule is always a serializiable schedule
because a new transcation only starts when the older one has finished execution.
Example –
Let us consider the following schedule and see if it is serializable.
T1 T2

R(X)

R(X)

R(Y)

W(Y)

R(Y)

W(X)

Now, let us figure out if the above schedule is serializable.


1. Swapping R(X) of T1 and R(Y) of T2.
2. Swapping R(X) of T1 and W(Y) of T2.
T1 T2

R(X)
T1 T2

R(Y)

W(Y)

R(X)

R(Y)

W(X)

Thus, after changing the conflicting operations in the initial schedule we get a serial schedule.
Hence, this schedule is serializable.
Difference between Conflict and View Serializability :
S.N CONFLICT

O. SERIALIZABILITY VIEW SERIALIZABILITY

1. Two schedules are said to be Two schedules are said to be view

conflict equivalent if all the equivalent if the order of initial

conflicting operations in both read, final write and update

the schedule get executed in the operations is the same in both the

same order. If a schedule is a schedules. If a schedule is view

conflict equivalent to its serial equivalent to its serial schedule

schedule then it is called then it is called View Serializable


S.N CONFLICT

O. SERIALIZABILITY VIEW SERIALIZABILITY

Conflict Serializable Schedule. Schedule.

If a schedule is view serializable If a schedule is conflict serializable

then it may or may not be then it is also view serializable

2. conflict serializable. schedule.

View equivalence is rather difficult

Conflict equivalence can be to achieve as both transactions

easily achieved by reordering should perform similar actions in a

the operations of two similar manner. Thus, View

transactions therefore, Conflict Serializability is difficult to

3. Serializability is easy to achieve. achieve.

4. For a transaction T1 writing a If a transaction T1 writes a value A

value A that no one else reads that no other transaction reads

but later some other transactions (because later some other

say T2 write its own value of A, transactions say T2 writes its own

W(A) cannot be placed under value of A) W(A) can be placed in

positions where it is never read. positions of the schedule where it is


S.N CONFLICT

O. SERIALIZABILITY VIEW SERIALIZABILITY

never read.

UNIT-06
1. Unit List and explain commonly accepted threats to database security.
6
Database Security Threats:
Database security begins with physical security for the systems that host the database management
system (DBMS). Database Management system is not safe from intrusion, corruption, or destruction by
people who have physical access to the computers. Once physical security has been established,
database must be protected from unauthorized access by authorized users as well as unauthorized users.
There are three main objects when designing a secure database system, and anything prevents from a
database management system to achieve these goals would be consider a threat to database security.
There are many internal and external threats to database systems.

Some of threats are as follows:

Integrity:
Database integrity refers that information be protected from improper modification. Modification
includes creation, insertion, modification, changing the status of data, and deletion. Integrity is lost if
unauthorized changes are made intentionally or through accidental acts. For example, Students cannot
be allowed to modify their grades.
Availability:
Authorized user or program should not be denied access. For example, an instructor who wishes to
change a student grade should be allowed to do so.
Secrecy:
Data should not be disclosed to unauthorized users. For example, a student should not be allowed to see
and change other student grades.
Denial of service attack:
This attack makes a database server greatly slower or even not available to user at all. DoS attack does
not result in the disclosure or loss of the database information; it can cost the victims much time and
money.
Sniff attack:
To accommodate the e-commerce and advantage of distributed systems, database is designed in a client-
server mode. Attackers can use sniffer software to monitor data streams, and acquire some confidential
information. For example, the credit card number of a customer.
Spoofing attack:
Attackers forge a legal web application to access the database, and then retrieve data from the database
and use it for bad transactions. The most common spoofing attacks are TCP used to get the IP addresses
and DNS spoofing used to get the mapping between IP address and DNS name.
Trojan Horse:
It is a malicious program that embeds into the system. It can modify the database and reside in operating
system.
To achieve these objectives, a clear and consistent security policy should be developed to define what
security measure must be enforced. We must determine what part of data is to be protected and which
users get access to which part of the information. The security mechanisms of the underlying database
management system, as well as external mechanism, such as securing access to buildings, must be
utilized to enforce the policy.

2. Unit Explain types of Privileges?


6
A privilege is a right to execute a particular type of SQL statement or to access another user's object.
Some examples of privileges include the right to:

 Connect to the database (create a session)


 Create a table
 Select rows from another user's table
 Execute another user's stored procedure

You grant privileges to users so these users can accomplish tasks required for their job. You should grant
a privilege only to a user who absolutely requires the privilege to accomplish necessary work. Excessive
granting of unnecessary privileges can compromise security. A user can receive a privilege in two
different ways:

 You can grant privileges to users explicitly. For example, you can explicitly grant the privilege to
insert records into the employees table to the user SCOTT.
 You can also grant privileges to a role (a named group of privileges), and then grant the role to
one or more users. For example, you can grant the privileges to select, insert, update, and delete
records from the employees table to the role named clerk, which in turn you can grant to the
users scott and brian.

Because roles allow for easier and better management of privileges, you should normally grant privileges
to roles and not to specific users.

There are two distinct categories of privileges:

1) SYSTEM PRIVILEGES

2) SCHEMA OBJECT PRIVILEGES

3. Unit Mention and briefly explain the control measures that are used to provide security of database.
6
B. Database Security Countermeasures:
To protect the database system from the above mentioned threats. Here are some
countermeasures which are as follows:

C. Access Control:
A database for an organization contains a great deal of information and usually has several users.
Most of them need to access only a small part of the database. A policy defines the requirements
that are to be implemented within hardware and software and those that are external to the
system, including physical, personal, and procedural controls.

D. Flow Control:
Flow control provides the flow of information among accessible objects. Flow controls check that
information contained in objects does not flow explicitly or implicitly into less protected objects.

E. Encryption:
An encryption algorithm should be applied to the data, using a user-specified encryption key. The
output of the algorithm is the encrypted version. There is also a decryption algorithm, which takes
the encrypted data and a decryption key as input and then returns the original data.
F. RAID:
Redundant Array of Independent Disks which protect against data loss due to disk failure.

G. Authentication:
Access to the database is a matter of authentication. It provides the guidelines how the database is
accessed. Every access should be monitored.

H. Backup:
At every instant, backup should be done. In case of any disaster, Organizations can retrieve their
data.

4. Unit How is the view mechanism used as an authorization mechanism?


6
Any SEQUEL query whose result, is a relation may be used to define a view. Therefore, a view may be: l
a row and column subset of a relation. For example, to define a view consisting of the names and salaries
of the employees who work in the toy department, one can issue the SEQUEL statement DEFINE VIEW
VEMP AS: SELECT NAME, SALARY FROM EMPLOYEE WHERE DEPARTMENT = ‘TOY’ l a
summary of the information in a relation. For example, to define a view consisting of the average salary of
each department, one can issue the SEQUEL statement DEFINE VIEW AVGSAL AS: SELECT AVG
(SALARY) FROM EMPLOYEE GROUP BY DEPARTMENT l a join [5] of the information in two or
more relations.

For example, if the DEPT relation contains the number of the floor on which the department is located,
one may define a view of employee names together with the floors on which they work:

DEFINE VIEW LOCEMP AS: SELECT NAME, FLOOR FROM EMPLOYEE, DEPT WHERE
[Link] = [Link]

It is to be emphasized that views are dynamic windows and not static copies. As the information stored in
a base relation changes, the information visible through views defined on that relation changes with it. The
view mechanism is our means of granting access to row and column subsets, granting “statistical” access,
and granting access to other transformations of relations. In the first part of the paper we describe our
authorization mechanism as it applies to all objects in the system, regardless of whether they are views or
relations. We refer to both views and relations by the collective name “tables.” Later we describe the
extensions necessary to accommodate a dynamic view mechanism.
5. Unit What is meant by granting a privilege?
6
 It’s used to grant the privileges to the users. For e.g. GRANT create table, create session to amit,
Raj with GRANT OPTION;
 It’s also possible to cancel the privileges by using Revoke.

Revoke create table, create session from amit, Raj

 Grant and Revoke commandos are known as data control language commandos DCL granting of
privileges. The grant statement is used to give authorization to any particular user.

Syntax:-

Grant <Privilege list>

On <Table Name> To <User>

 Grant select on Stud-Into 70,41,42,43…

 Grant delete, select on Stud-into 70, 41.]


 Update (amount) on account 70,42;
 Grant all stud-into
 If we want to grant a privilege & to allow the user to pass the privilege to author user we have to
with grant option.
 Grant select on stud-into to all with grant option with this user you can grant select privilege to
any other user.

Revoke:-

 Revoke <privilege list=""> on from <user>

Revoke select on stud-into from 41;

o The revocation of a privilege from a user may cause other user to lose that privilege this
behaviour is called cascading of Revoke. In most database system cascading is default
behaviour.
o Revoke select on stud-into from 41 restrict, with this system returns an error if there are
cascading revoke & revocation with stop.
o Revoke grant option for select on stud-into from 4.
o Due to this command only grant option will be revoke from user 4, that is no, he can no
grant privilege to any other user.

6. Unit Explain:
6
Triggers and its types with significance and example

Triggers can be defined on the table, view, schema, or database with which the event is
associated.
Benefits of Triggers
Triggers can be written for the following purposes −

 Generating some derived column values automatically


 Enforcing referential integrity
 Event logging and storing information on table access
 Auditing
 Synchronous replication of tables
 Imposing security authorizations
 Preventing invalid transactions
I. Creating Triggers
The syntax for creating a trigger is −

CREATE [OR REPLACE ] TRIGGER trigger_name

{BEFORE | AFTER | INSTEAD OF }

{INSERT [OR] | UPDATE [OR] | DELETE}

[OF col_name]

ON table_name
[REFERENCING OLD AS o NEW AS n]

[FOR EACH ROW]

WHEN (condition)

DECLARE

Declaration-statements

BEGIN

Executable-statements

EXCEPTION

Exception-handling-statements

END;

Where,
 CREATE [OR REPLACE] TRIGGER trigger_name − Creates or replaces an existing
trigger with the trigger_name.
 {BEFORE | AFTER | INSTEAD OF} − This specifies when the trigger will be executed.
The INSTEAD OF clause is used for creating trigger on a view.
 {INSERT [OR] | UPDATE [OR] | DELETE} − This specifies the DML operation.
 [OF col_name] − This specifies the column name that will be updated.
 [ON table_name] − This specifies the name of the table associated with the trigger.
 [REFERENCING OLD AS o NEW AS n] − This allows you to refer new and old values
for various DML statements, such as INSERT, UPDATE, and DELETE.
 [FOR EACH ROW] − This specifies a row-level trigger, i.e., the trigger will be executed
for each row being affected. Otherwise the trigger will execute just once when the SQL
statement is executed, which is called a table level trigger.
 WHEN (condition) − This provides a condition for rows for which the trigger would fire.
This clause is valid only for row-level triggers.
Example
To start with, we will be using the CUSTOMERS table we had created and used in the previous
chapters −
Select * from customers;

+----+----------+-----+-----------+----------+

| ID | NAME | AGE | ADDRESS | SALARY |

+----+----------+-----+-----------+----------+

| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |

| 2 | Khilan | 25 | Delhi | 1500.00 |

| 3 | kaushik | 23 | Kota | 2000.00 |

| 4 | Chaitali | 25 | Mumbai | 6500.00 |

| 5 | Hardik | 27 | Bhopal | 8500.00 |

| 6 | Komal | 22 | MP | 4500.00 |

+----+----------+-----+-----------+----------+

The following program creates a row-level trigger for the customers table that would fire for
INSERT or UPDATE or DELETE operations performed on the CUSTOMERS table. This
trigger will display the salary difference between the old values and new values −

CREATE OR REPLACE TRIGGER display_salary_changes

BEFORE DELETE OR INSERT OR UPDATE ON customers

FOR EACH ROW

WHEN ([Link] > 0)

DECLARE
sal_diff number;

BEGIN

sal_diff := :[Link] - :[Link];

dbms_output.put_line('Old salary: ' || :[Link]);

dbms_output.put_line('New salary: ' || :[Link]);

dbms_output.put_line('Salary difference: ' || sal_diff);

END;

When the above code is executed at the SQL prompt, it produces the following result −
Trigger created.

The following points need to be considered here −


 OLD and NEW references are not available for table-level triggers, rather you can use
them for record-level triggers.
 If you want to query the table in the same trigger, then you should use the AFTER
keyword, because triggers can query the table or change it again only after the initial
changes are applied and the table is back in a consistent state.
 The above trigger has been written in such a way that it will fire before any DELETE or
INSERT or UPDATE operation on the table, but you can write your trigger on a single
or multiple operations, for example BEFORE DELETE, which will fire whenever a
record will be deleted using the DELETE operation on the table.
J. Triggering a Trigger
Let us perform some DML operations on the CUSTOMERS table. Here is one INSERT
statement, which will create a new record in the table −

INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)


VALUES (7, 'Kriti', 22, 'HP', 7500.00 );

When a record is created in the CUSTOMERS table, the above create


trigger, display_salary_changes will be fired and it will display the following result −
Old salary:

New salary: 7500

Salary difference:

Because this is a new record, old salary is not available and the above result comes as null. Let
us now perform one more DML operation on the CUSTOMERS table. The UPDATE statement
will update an existing record in the table −

UPDATE customers

SET salary = salary + 500

WHERE id = 2;

When a record is updated in the CUSTOMERS table, the above create


trigger, display_salary_changes will be fired and it will display the following result −
Old salary: 1500

New salary: 2000

7. Unit Give four possible benefits of “Views” in databases.


6
1. Views can represent a subset of the data contained in a table. ...
2. Views can join and simplify multiple tables into a single virtual table.
3. Views can act as aggregated tables, where the database engine aggregates data (sum, average,
etc.) ...
4. Views can hide the complexity of data.
8. Unit What is the difference between discretionary and mandatory access control?
6

Discretionary Access Control vs Mandatory Access Control


Discretionary Access Control

In discretionary access control (DAC), the owner of the object specifies which subjects can access the object. Thi
the discretion of the owner.

Most operating systems such as all Windows, Linux, and Macintosh and most flavors of Unix are based on DAC m

In these operating systems, when you create a file, you decide what access privileges you want to give to other
access control decision based on the access privileges you created.

Mandatory Access Control

In mandatory access control (MAC), the system (and not the users) specifies which subjects can access specific d

The MAC model is based on security labels. Subjects are given a security clearance (secret, top secret, confidenti
secret, confidential, etc.). The clearance and classification data are stored in the security labels, which are bound

When the system is making an access control decision, it tries to match the clearance of the subject with the cla
of secret, and he requests a data object with a security classification of top secret, then the user will be denied a
object.

The MAC model is usually used in environments where confidentiality is of utmost importance, such as a militar

Examples of the MAC-based commercial systems are SE Linux and Trusted Solaris.

9. Unit Describe the purpose of performing auditing with its different levels in detail.
6
Auditing is the monitoring and recording of selected user database actions. It can be based on individual
actions, such as the type of SQL statement executed, or on combinations of factors that can include user
name, application, time, and so on. Security policies can trigger auditing when specified elements in an
Oracle database are accessed or altered, including the contents within a specified object.

The present chapter describes the different types of auditing, what it involves, and the resulting audit
trails and records.

Auditing is typically used to:

 Enable future accountability for current actions taken in a particular schema, table, or row, or
affecting specific content
 Deter users (or others) from inappropriate actions based on that accountability
 Investigate suspicious activity

For example, if some user is deleting data from tables, then the security administrator might
decide to audit all connections to the database and all successful and unsuccessful deletions of
rows from all tables in the database.

 Notify an auditor that an unauthorized user is manipulating or deleting data and that the user
has more privileges than expected which can lead to reassessing user authorizations
 Monitor and gather data about specific database activities

For example, the database administrator can gather statistics about which tables are being
updated, how many logical I/Os are performed, or how many concurrent users connect at peak
times.

 Detect problems with an authorization or access control implementation

For example, you can create audit policies that you expect will never generate an audit record
because the data is protected in other ways. However, if these policies do generate audit
records, then you will know the other security controls are not properly implemented.

The first technique is trace-based auditing. This technique is usually built directly into the native
capabilities of the DBMS. Commands or parameters are set to turn on auditing and the DBMS
begins to cut trace records when activity occurs against audited objects. Although each DBMS
offers different auditing capabilities, some common items that can be audited by DBMS audit
facilities
include:

1) Login and logoff attempts (both successful and unsuccessful attempts)


2) Database server restarts
3) Commands issued by users with system administrator privileges
4) Attempted integrity violations (where changed or inserted data does not match a
referential, unique, or check constraint)
5) Select, insert, update, and delete operations
6) Stored procedure executions
7) Unsuccessful attempts to access a database or a table (authorization failures)
8) Changes to system catalog tables
9) Row level operations

[Link] Backups of the database should be taken to protect data. Describe five measures that can be taken to
6 ensure the security and effectiveness of database backups.

Data backups are crucial when it comes to protecting your data; it’s even more important to ensure that
your backups are kept secure and available.

For many people, having data backed up is enough. However, what they fail to understand is that a
backup is not infallible. In fact, there are a whole range of issues that could affect the stability of a
backup. And, if your organization is in the midst of a data disaster, the one factor above all else that you
should be concentrating on is security. Your backup, after all, is your lifeline to resuming productivity, so
it should be as secure as possible.

To get you started, we’re going to look at the five best ways to secure your backups:

1. Password Protect Your Backups


At the very least, you have to make sure that your backups are password protected. While, yes, it’s
possible for a password to be cracked, it still acts as a deterrent and guarantees some level of security.
The stronger the password, of course, the more protection you provide to your backup, so make sure
that it’s unique and contains a mixture of uppercase, lowercase characters and numbers.

2. Restrict Backup Access Rights


The more people that have access to your backups, the more risk there is that they could be
compromised or damaged. Therefore, you should only ever assign access rights to the backup software
to those members of staff who genuinely need it. Nominating those with access in advance will help to
not only protect your backups, but also ensure that restores are completed quickly if the need arises.

3. Integrate Encryption
There’s every chance that your entire backup could be snatched in the case of a data disaster, so it’s vital
that it’s protected. After all, your backup is likely to contain data pertaining to your customers/staff and
this could be highly sensitive. However, by encrypting the data contained within your backup, you’re
rendering it next to useless in the hands of external parties. Along with password protection, it’s a
simple yet highly effective layer of defense.

4. Store Physical Backups in a Safe


Although we live in an age where cloud backups are grabbing all the headlines, it’s still important that
physical backups are also maintained. These can include: DVDs, optical disks and data tapes. Naturally,
due to the data contained on them, these storage methods represent a high security risk and can’t just
be stored on a shelf. The best solution is to invest in a safe, but make sure it’s fireproof as high levels of
heat can easily distort and damage physical storage devices.

5. Log all Backups


Most backup software will log details of the backups carried out, but when it comes to working with
physical backups it’s a little different. As your physical backups will be stored somewhere, there needs to
be a logging process of what is going where. It’s very easy for a single DVD to go missing, but, with a
logging system in place, you should discover this sooner rather than later.

[Link] What are some of the current outstanding challenges for database security?
6

1. Deployment Failures. The most common cause of database vulnerabilities is a lack of due care at the
moment they are deployed. ...
2. Broken databases. ...
3. Data leaks. ...
4. Stolen database backups. ...
5. The abuse of database features. ...
6. A lack of segregation. ...
7. Hopscotch. ...
8. SQL injections.

References

“Fundamentals of Database System” Elmasri & Navathe 5th Edition,


Addison Wesely

“Database System Concepts” Abraham Silberschatz, Henry 6th Edition, (2010)


F. Korth and S. Sudarshan;

You might also like