DBMS vs File System: Key Differences
DBMS vs File System: Key Differences
1 Unit Differentiate:
1 I . GeneralfilesystemVSDBMS.
File system is a collection of data. Any DBMS is a collection of data and user is not
management with the file system, user has to required to write the procedures for
write the procedures. managing the database.
File system gives the details of the data DBMS provides an abstract view of data that
representation and Storage of data. hides the details.
1
Protecting a file under file system is very
DBMS has a good protection mechanism.
difficult.
II. DDLandDML.
DDL DML
It is used to create database schema and can be It is used to add, retrieve or update the
used to define some constraints as well. data.
It basically defines the column (Attributes) of It add or update the row of the table.
the table. These rows are called as tuple.
Basic command present in DDL are CREATE, BASIC command present in DML are
DROP, RENAME, ALTER etc. UPDATE, INSERT, MERGE etc.
III. ExplainDCL.
DCL(Data Control Language) : DCL includes commands such as GRANT and REVOKE
which mainly deals with the rights, permissions and other controls of the database system.
2 Unit Givetheconceptofdataabstraction?
1 Also,Explainthedifferentlevelsofabstractionwithdiagraminadatabasesystem.
Data abstraction refers to providing only essential information about the data to the outside
world, hiding the background details or implementation. The are three levels of abstraction are
as follows: Physical level: The lowest level of abstraction describes how data are stored.
Logical level: The next higher level of abstraction, describes what data are stored in database
and what relationship among those data
Logical level: describes data stored in database, and the relationships among the data.
customer_id : string;
customer_name : string;
customer_street : string;
customer_city : integer;
end;
View level: application programs hide details of data types. Views can also hide information
(such as an employee’s salary) for security purposes.
3 Unit Whatarethedifferentsymbolsused forE-Rdiagram? State their usage.
1
.
4 Unit List fourdutiesofDBA.
1 -Designing the logical scheme
-Creating the structure of the entire database
-Monitor usage and create necessary index structures to speed
up query execution
-Grant / Revoke data access permissions to other users etc
5 Unit What is participation role? When it is necessary to role names in the description of relationship
1 type?
Participation role is defined as the part of relationship in which each entity participates in a
relationship. It is necessary to use role name in the description of relationship type when the same
entity type participates more than once in a relationship type in different roles.
6 Unit Defineentity,entityset,andattribute,key.
1
An entity is an ‘object’ in the ‘actual world’, and that can be identified from the other
[Link] entity in a database is represented as a set of attributes. Each entity has a different set
of values for some set of attributes.
An entity set is a group of entities that posses the same set of attributes.
Attributes are the properties of entities. Attributes are represented by means of ellipses.
A DBMS key is an attribute or set of an attribute which helps you to identify a row(tuple) in a
relation(table). They allow you to find the relation between two tables. Keys help you uniquely
identify a row in a table by a combination of one or more columns in that table.
A table can have only ONE primary key; and in the table, this primary key can consist of single
or multiple columns (fields).
A FOREIGN KEY is a field (or collection of fields) in one table that refers to the PRIMARY
KEY in another table.
The table containing the foreign key is called the child table, and the table containing the
candidate key is called the referenced or parent table.
10 Unit Whatismeantbydataindependence?Explainitstypes.
1 Data Independence is defined as a property of DBMS that helps you to change the Database
schema at one level of a database system without requiring to change the schema at the next
higher level. Data independence helps you to keep data separated from all programs that
make use of [Link] are:
Physical data independence - when the physical schema is altered without having the need to
rewrite application programs for the alteration.
Logical data independence -the ability to modify the logical schema without having to rewrite
the application program.
11 Unit [Link].
1
Types of Attributes Description
12 Unit [Link]
1 witheachpatientalogofthevarioustestsandexaminationconducted.
13 Unit Consider a university database for the scheduling of classrooms for -final exams. This database
1 could be modeled as the single entity set exam, with attributes course-name, section-number,
room-number, and time. Alternatively, one or more additional entity sets could be defined, along
with relationship sets to replace some of the attributes of the exam entity set, as
course with attributes name, department, and c-number
section with attributes s-number and enrollment, and dependent as a weak entity set on
course
room with attributes r-number, capacity, and building
Show an E-R diagram illustrating the use of all three additional entity sets listed.
UNIT 2
1 Unit DescribetheconceptandusefulnessofAssertions.
2
An assertion is a predicate expressing a condition that we wish the database always
to [Link] assertion in SQL takes the form
create assertion <assertion-name> check <predicate>
When an assertion is made, the system tests it for validity, and tests it again on every
update that may violate the assertion. This testing may introduce a significant
amount of overhead; hence assertions should be used with great care.
Asserting
for all X, P(X)
is achieved in a round-about fashion using
not exists X such that not P(X)
Example : Every loan has at least one borrower who maintains an account with a
minimum balance or $1000.00
create assertion balance_constraint check
(not exists (
select *
from loan
where not exists (
select *
from borrower, depositor, account
where loan.loan_number = borrower.loan_number
and borrower.customer_name =
depositor.customer_name
and depositor.account_number =
account.account_number
and [Link] >= 1000)))
In entity relationship diagrams (ER diagrams), a weak entity set is indicated by a bold (or double-
lined) rectangle (the entity) connected by a bold (or double-lined) type arrow to a bold (or
double-lined) diamond (the relationship).In the below ER Diagram, ‘Payment’ is the weak entity.
‘Loan Payment’ is the identifying relationship and ‘Payment Number’ is the partial key. Primary
Key of the Loan along with the partial key would be used to identify the records.
Composite attribute − Composite attributes are made of more than one simple attribute. For
example, a student's complete name may have first_name and last_name.
9 Unit Considerthefollowingrelationaldatabase:
2 employee(emp_no,ename,salary,dept_no,job_title)
department(dept_no,dname,location )
Foreach ofthequery givenbelow, giveanexpressioninSQL.
a) Listtheemployeenameanddepartmentname for employeewhosesalaryisgreaterthan1500$.
Select [Link], [Link] from emploee e, department d where e.dept_no=d.dept_no and
e,salary> 1500.
b) For each department number find how many employees are working
c) Calculatetheremainderofsalaryifitisdividedby500fortheemployeewhosejobtitleis manager.
Select mod(salary,500) from employee where job_title=’manager’
10 Unit EMP (Employee-no. Dept-no, Employee-name, Salary)
2 DEPT (Dept-no. Dept-name, Location)
Select e.employee_name from Emp e, dept d where d. dept_no= e.dept_no and [Link] !
=’Calcutta’ and [Link]>50000
(b) Calculate, for each department number, the number of employees with a salary greater
than Rs.1,00,000.
Select count( employee_no) , dept_no from Emp group by dept_no having salary>100000
11 Unit DifferentiatebetweenSQLcommandsDROPTABLEandDROPVIEW.
2
Drop table command removes all the data in the table and the table [Link] you DROP anything
using the SQL DROP command, something is removed permanently from the database. However, with
by dropping a view, the original data is unaffected. This is why the View is such a great tool for end users in
the first place the table behind the scenes is safe.
Project Operation : If the user is interested in selecting the values of a few attributes, rather than
selection all attributes of the Table (Relation), then one should go for PROJECT Operation.
Selection is for rows σsubject = "database"(Books) retrieve all database book details
Projection is for columns ∏subject, author (Books)retrieve subject and author of each book
13 Unit What is the difference between the WHERE and HAVING clause
2
Assume that this table records the repayment of loans by the borrowers. A borrower may have
multiple entries if he/she has paid multiple installments.
Question (a)
Find all the records with information on repayments from the borrower with id equal to 42, and
where the lent amount exceeds 1000.
Answer (a)
SELECT *
FROM Repayment
WHERE borrower_id=42 AND loanamount>1000;
Question (b)
Find the total amount repaid for every address in the repayment table.
Answer (b)
SELECT address, SUM(repayment_amount)
FROM Repayment
GROUP BY address;
Question (c)
Delete all information on the completed loans. (Note: you can find the status of the loan by
summing the total repaid amount. If the total repaid amount is equal to the loan amount, then you
would say that the loan is ended.)
Answer (c)
Question (d)
Find all the borrower names who has unique address. (ie., you should not count the borrowers
who are from the same address)
Answer (d)
SELECT name
FROM Repayment A
WHERE 1=
(SELECT COUNT(DISTINCT name)
FROM Repayment B
WHERE [Link]=[Link]);
Question (e)
Answer (e)
UNIT 3
1 Unit Explain in detail the concept of functional dependency and state significance of
3 normalization
Normalization is a process to eliminate the flaws of a database with bad design. A poorly
designed database is inconsistent and create issues while adding, deleting or updating
[Link] following makes Database Normalization a crucial step in database
design process −
Resolving the database anomalies
The forms of Normalization i.e. 1NF, 2NF, 3NF, BCF, 4NF and 5NF remove all the
Insert, Update and Delete anomalies.
Insertion Anomaly occurs when you try to insert data in a record that does not exist.
Deletion Anomaly is when a data is to be deleted and due to the poor deign of database,
other record also deletes.
Eliminate Redundancy of Data
Storing same data item multiple times is known as Data Redundancy. A normalized table
do not have the issue of redundancy of data.
Data Dependency
The data gets stored in the correct table and ensures normalization.
Isolation of Data
A good designed database states that the changes in one table or field do not affect other.
This is achieved through Normalization.
Data Consistency
While updating if a record is left, it can led to inconsistent data, Normalization resolves it
and ensures Data Consistency.
2 Unit What is multivalued dependencies? How it is represented? And how to identify which given
3 MVD is trivial or non trivial?
Multi-valued dependency (MVD) represents a dependency between attributes (for example, A,
B and C) in a relation, such that for each value of A there is a set of values for B and a set of
value for C. However, the set of values for B and C are independent of each other.
multi-valued dependency can be further defined as being trivial or nontrivial. A MVD A --> B
in relation R is defined as being trivial if B is a subset of Aor A U B = R
A MVD is defined as being nontrivial if neither of the above twoconditions is satisfied.
3 Unit Differentiate between 3NF and BCNF
3
Difference between 3NF and BCNF :
It is comparatively more
2. It is less stronger than BCNF.
stronger than 3NF.
The redundancy is
4. The redundancy is high in 3NF.
comparatively low in BCNF.
3NF is sufficient because because most of the 3NF tables are free of insertion, update, and deletion
anomalies. Moreover, 3NF always ensures functional dependency preserving and lossless.
5 Unit ConsideraschemaR(A,B, C, D)andfunctionaldependenciesA->BandC->D. C O M M E N T
3 O N decompositionofRintoR1(A,B)andR2(C,D)is?
In the above question R(A, B, C, D) is decomposed into R1 (A, B) and R2(C, D) and there are
only two FDs A -> B and C -> D. So, the decomposition is dependency preserving
6 Unit Discuss insertion,deletion and modification anomalies. Why are they considered bad? Illustrate
3 with examples.
The Problems resulting from data redundancy in an unnormalized database table are collectively
known as update anomalies. So any database insertion, deletion or modification that leaves the
database in an inconsistent state is said to have caused an update anomaly. They are classified as:
• Insertion anomalies:To insert the details of a new member of staff located at branch B1 into the
Tbl_Staff_Branch Table shown above, we must enter the correct details of branch numner B1 so
that the branch details are consistent with the values for branch B1 in other rows. To insert the
details of a new branch that currently has no members of staff into the Tbl_Staff_Branch table, it is
necessory to enter nulls for the staff details which is not allowed as staffID is the primary key. But
if you normalize Tbl_Staff_Branch, which is in Second Normal Form (2NF) to Third Normal
Dorm (3NF), you end up with Tbl_Staff and Tbl_Branch and you shouldn't have the problems
mentioned above.
• Deletion anomalies: If we delete a row from the Tbl_Staff_Branch table that represents the last
member of staff located at that branch, (for e.g. row with Branch numbers B", B3 or B4) the
details about that branch are also lost from the Database.
Modification anomalies: Should we need to change the address of a perticular branch in the
Tbl_Staff_Branch table, we must update the rows of all staff located at that branch. If this
modification is not carried out on all the relevent rows, the database will become inconsistent.
They are considered bad because, they lead to:
3. Transitive rule
{X → Y, Y → Z} ⊨ {X → Z}
4. Decomposition or Projective rule
{X → YZ} ⊨ {X → Y}
5. Union or Additive rule
{X → Y, X → Z} ⊨ {X → YZ}
6. Pseudo transitive rule
{X → Y, WY → Z} ⊨ {WX → Z}
8 Unit ExplainPJNF :
3
PJNF-Fifth normal form (5NF), is also known as project-join normal form (PJNF). It is a level of
database normalization designed to reduce redundancy in relational databases. A relation is said
to be in 5NF if and only if it satisfies 4NF and no join dependency exists. A relation is said to
have join dependency if it can be recreated by joining multiple sub relations and each of these sub
relations has a subset of the attributes of the original [Link] for join dependency:
If the join of R1 and R2 over Q is equal to relation R then we can say that a join dependency
exists, where R1 and R2 are the decomposition R1 (P, Q) and R2 (Q, S) of a given relation R (P,
Q, S). R1 and R2 are a lossless decomposition of R.
Properties of 5NF:
A relation R is in 5NF if and only if it satisfies following conditions:
R should be in 4NF (no multi-valued dependency exists).
It cannot undergo lossless decomposition (join dependency)
9 Unit Consider a relation R with five attributes ABCDE. You are given the following dependencies: A
3 → B, BC → E, and ED → A.
2.
2. Is R in 3NF?
3. Is R in BCNF?
Answer
1. Add A to S.
2. Recursively add attributes which can be functionally determined from attributes of the set
S until done.
Similarly,
no smaller subsets of keys that hold over R(A,B,C,D). The FD: C → D violates
C is not a superkey
UNIT 4
1. Unit Draw a state diagram and discuss in detail the typical states that a query processing goes
4 through during execution.
[Link] and translation
[Link]
[Link]
translate the query into its internal form. This is then translated into relational algebra.
Parser checks syntax, verifies relations
Evaluation
The query-execution engine takes a query-evaluation plan, executes that plan, and returns the
answers to the query.
Query Optimization:
Amongst all equivalent evaluation plans choose the one with lowest cost.
Cost is estimated using statistical information from the
database catalog
Query Processing is the activity performed in extracting data from the database. In query
processing, it takes various steps for fetching the data from the database. The steps involved are:
Parsing and translation. Optimization.
Query Processing
Query Processing would mean the entire process or activity which involves query translation
into low level instructions, query optimization to save resources, cost estimation or evaluation of
query, and extraction of data from the database.
Goal: To find an efficient Query Execution Plan for a given SQL query which would minimize
the cost considerably, especially time.
Cost Factors: Disk accesses [which typically consumes time], read/write operations [which
typically needs resources such as memory/RAM].
The major steps involved in query processing are depicted in the figure below;
Figure 1 - Steps in Database Query Processing
Let us discuss the whole process with an example. Let us consider the following two relations as
the example tables for our discussion;
Example:
Step 1: Parsing
In this step, the parser of the query processor module checks the syntax of the query, the user’s
privileges to execute the query, the table names and attribute names, etc. The correct table
names, attribute names and the privilege of the users can be taken from the system catalog (data
dictionary).
Step 2: Translation
If we have written a valid query, then it is converted from high level language SQL to low level
instruction in Relational Algebra.
For example, our SQL query can be converted into a Relational Algebra equivalent as follows;
πEname(σDOP>10 Λ [Link]=Proj_Assigned.Eno(Employee X Prof_Assigned))
Step 3: Optimizer
Optimizer uses the statistical data stored as part of data dictionary. The statistical data are
information about the size of the table, the length of records, the indexes created on the table,
etc. Optimizer also checks for the conditions and conditional attributes which are parts of the
query.
So far, we have got two execution plans. Only condition is that both plans should give the same
result.
Step 5: Evaluation
Though we got many execution plans constructed through statistical data, though they return
same result (obvious), they differ in terms of Time consumption to execute the query, or the
Space required executing the query. Hence, it is mandatory choose one plan which obviously
consumes less cost.
At this stage, we choose one execution plan of the several we have developed. This Execution
plan accesses data from the database to give the final result.
In our example, the second plan may be good. In the first plan, we join two relations (costly
operation) then apply the condition (conditions are considered as filters) on the joined relation.
This consumes more time as well as space.
In the second plan, we filter one of the tables (Proj_Assigned) and the result is joined with the
Employee table. This join may need to compare less number of records. Hence, the second plan
is the best (with the information known, not always).
3. Give the flow of processing of following query with the help of query processing steps
SELECT Ename
FROM Employee, Proj_Assigned
WHERE [Link] = Proj_Assigned.Eno AND DOP > 10;
Answer:
4. Describe three techniques commonly used when developing algorithms for relational
operators. Explain how these techniques can be used to design algorithms for the selection,
projection, and join operators.
1. The three techniques commonly used are indexing, iteration, and partitioning:
Indexing: If a selection or join condition is specified, use an index to examine just the tuples that
satisfy the condition.
Iteration: Examine all tuples in an input table, one after the other. If we need only
a few fields from each tuple and there is an index whose key contains all these fields,
instead of examining data tuples, we can scan all index data entries.
Selection: For a selection with more than one tuple matching the query (in general, at
least 5%), indexing like B+ Trees are very useful. This comes into play often with range
queries. It allows us to not only find the first qualifying tuple quickly, but also the other
qualifying tuples soon after (especially if the index is clustered). For a selection
condition with an equality query where there are only a few (usually 1) matching tuple,
partitioning using hash indexing is often appropriate. It allows us to find the exact tuple
we are searching for with a cost of only a few (typically one or two) I/Os.
Projection: The projection operation requires us to drop certain fields of the input,
which can result in duplicates appearing in the result set. If we do not need to remove
these duplicates, then the iteration technique can efficiently handle this problem. On ther
other hand, if we do need to elim- inate duplicates, partitioning the data and applying a
sort key is typically performed.
In the case that there are appropriate indexes available, this can lead to less expensive
plans for sorting the tuples during duplicate elimination since the data may all ready be
sorted on the index (in that case we simply compare adjacent entries to check for
duplicates)
Join: The most expensive database operation, joins, can combinations of all three
techniques. A join operation typically has multiple selection and projection elements
built into it, so the importance of having appropriate indexes or of partitioning the data is
just as above, if not more so. When possible, the individual selections and projections
are applied to two relations before they are joined, so as to decrease the size of the
intermediate table.
As an example consider joining two relations with 100,000 tuples each and only 5 % of
qualifying tuples in each table. Joining before applying the selection conditions, would
result in a huge intermediate table size that would then have to be searched for matching
selections. Alternatively, consider applying parts of the selection first. We can then
perform a join of the 5,000 qualifying tuples found after applying the selection to each
table, that can then be searched and handled significantly faster.
The goal of query optimization is to avoid the worst plans and find a good plan. The goal is
usually not to find the optimal plan. The deference in cost between a good plan and a bad plan
can be several orders of magnitude: a good query plan can evaluate the query in seconds,
whereas a bad query plan might take days
This expression performs the theta join on the smallest amount of data possible.
It does this by restricting the right hand side operand of the join to only
those branches in Brooklyn, and also eliminating the unneeded attributes from both the
operands.
8. Unit What is the difference between pipe lining and materialization?
4
Pipelining
With pipelined evaluation, operations form a queue, and results are passed from one
operation to another as they are calculated, hence the technique’s name.
With pipelined evaluation, operations form a queue, and results are passed from one
operation to another as
they are calculated, hence the technique’s name.
General approach: restructure the individual operation algorithms so that they take streams
of tuples as both input and output.
10. How to choose the best evaluation plan for query? Explain
The isolation property is enforced by the concurrency control subsystem of the DBMS.8 If every
transaction does not make its updates (write operations) visible to other transactions until it is
committed, one form of isolation is enforced that solves the temporary update problem and
eliminates cascading rollbacks (see Chapter 23) but does not eliminate all other problems. There
have been attempts to define the level of isolation of a transaction. A transaction is said to have
level 0 (zero) isola-tion if it does not overwrite the dirty reads of higher-level transactions. Level
1 (one) isolation has no lost updates, and level 2 isolation has no lost updates and no dirty reads.
Finally, level 3 isolation (also called true isolation) has, in addition to level 2 properties,
repeatable reads.
A Shared Lock is basically a read-only lock for a row-level. Any number of resources can fetch
the data to read when the shared lock is present on the resource. That means that many process
IDs can have a shared lock on the same resource to read the respective data.
An Update Lock is used and valid when there is a shared lock applicable for a resource. In other
words, the update lock cannot be placed until there are no other offending exclusive or update
locks for the fetching resource. Additionally, the update lock happens to be acquiring an
exclusive when all other locks are released from a resource.
2. For the conflicting pair r1(x) w2(x), where r1(x) happens before w2(x), draw an edge from
T1 to T2.
3. For the conflicting pair w2(x) w1(x), where w2(x) happens before w1(x), draw an edge
from T2 to T1.
Since the graph is cyclic, we can conclude that it is not conflict serializable to any schedule
serial schedule.
Let us try to infer a serial schedule from this graph using topological ordering.
The edge T1–>T2 tells that T1 should come before T2 in the linear ordering.
The edge T2 –> T1 tells that T2 should come before T1 in the linear ordering.
So, we can not predict any particular order (when the graph is cyclic). Therefore, no serial
schedule can be obtained from this graph.
1) Strict 2-PL –
This requires that in addition to the lock being 2-Phase all Exclusive(X) Locks held by the
transaction be released until after the Transaction Commits.
Following Strict 2-PL ensures that our schedule is:
Recoverable
Cascadeless
Hence it gives us freedom from Cascading Abort which was still there in Basic 2-PL and
moreover guarantee Strict Schedules but still Deadlocks are possible!
2) Rigorous 2-PL –
This requires that in addition to the lock being 2-Phase all Exclusive(X) and Shared(S)
Locks held by the transaction be released until after the Transaction Commits.
Following Rigorous 2-PL ensures that our schedule is:
Recoverable
Cascadeless
Hence it gives us freedom from Cascading Abort which was still there in Basic 2-PL and
moreover guarantee Strict Schedules but still Deadlocks are possible!
Note the difference between Strict 2-PL and Rigorous 2-PL is that Rigorous is more restrictive, it
requires both Exclusive and Shared locks to be held until after the Transaction commits and this
is what makes the implementation of Rigorous 2-PL more easy.
3) Conservative 2-PL –
Conservative 2-PL is Deadlock free and but it does not ensure Strict schedule(More
about this here!). However, it is difficult to use in practice because of need to predeclare the
read-set and the write-set which is not possible in many situations. In practice, the most
popular variation of 2-PL is Strict 2-PL.
Static 2-PL, this protocol requires the transaction to lock all the items it access before the
Transaction begins execution by predeclaring its read-set and write-set. If any of the
predeclared items needed cannot be locked, the transaction does not lock any of the items,
instead it waits until all the items are available for locking.
The Venn Diagram below shows the classification of schedules which are rigorous and
strict. The universe represents the schedules which can be serialized as 2-PL. Now as the
diagram suggests, and it can also be logically concluded, if a schedule is Rigorous then it is
Strict. We can also think in another way, say we put a restriction on a schedule which makes
it strict, adding another to the list of restrictions make it Rigorous. Take a moment to again
analyze the diagram and you’ll definitely get it.
5. Describe Lock Based Concurrency Control Protocol in DBMS.
Concurrency control techniques are used to ensure that the Isolation (or non-interference)
property of concurrently executing transactions is maintained.
.
Concurrency-control protocols: allow concurrent schedules, but ensure that the schedules are
conflict/view serializable, and are recoverable and maybe even cascadeless.
These protocols do not examine the precedence graph as it is being created, instead a protocol
imposes a discipline that avoids non-seralizable schedules.
Different concurrency control protocols provide different advantages between the amount of
concurrency they allow and the amount of overhead that they impose.:
Granularity – It is the size of data item allowed to lock. Now Multiple Granularity means
hierarchically breaking up the database into blocks which can be locked and can be track what
need to lock and in what fashion. Such a hierarchy can be represented graphically as a tree.
For example, consider the tree, which consists of four levels of nodes. The highest level
represents the entire database. Below it are nodes of type area; the database consists of exactly
these areas. Area has children nodes which are called files. Every
area has those files that are its child nodes. No file can span more than one area.
Finally, each file has child nodes called records. As before, the file consists of exactly those
records that are its child nodes, and no record can be present in more than one file. Hence, the
levels starting from the top level are:
database
area
file
record
Algorithm must ensure that, for each items accessed by Conflicting Operations in the schedule,
the order in which the item is accessed does not violate the ordering. To ensure this, use two
Timestamp Values relating to each database item X.
W_TS(X) is the largest timestamp of any transaction that executed write(X) successfully.
R_TS(X) is the largest timestamp of any transaction that executed read(X) successfully.
If R_TS(X) > TS(T) or if W_TS(X) > TS(T), then abort and rollback T and reject the
operation. else,
Execute W_item(X) operation of T and set W_TS(X) to TS(T).
If W_TS(X) > TS(T), then abort and reject T and reject the operation, else
If W_TS(X) <= TS(T), then execute the R_item(X) operation of T and set R_TS(X) to
the larger of TS(T) and current R_TS(X).
Whenever the Basic TO algorithm detects twp conflicting operation that occur in incorrect order,
it rejects the later of the two operation by aborting the Transaction that issued it. Schedules
produced by Basic TO are guaranteed to be conflict serializable. Already discussed that using
Timestamp, can ensure that our schedule will be deadlock free.
One drawback of Basic TO protocol is that it Cascading Rollback is still possible. Suppose we
have a Transaction T1 and T2 has used a value written by T1. If T1 is aborted and resubmitted to
the system then, T must also be aborted and rolled back. So the problem of Cascading aborts still
prevails.
Schedule, as the name suggests, is a process of lining the transactions and executing them one by
one. When there are multiple transactions that are running in a concurrent manner and the order
of operation is needed to be set so that the operations do not overlap each other, Scheduling is
brought into play and the transactions are timed accordingly.
Here we will discuss various types of schedules.
9. Unit Describe :
5 1) Serial Schedule
2) Non- Serial Schedule
1. Serial Schedules:
Schedules in which the transactions are executed non-interleaved, i.e., a serial schedule is
one in which no transaction starts until a running transaction has ended are called serial
schedules.
Example: Consider the following schedule involving two transactions T1 and T2.
T1 T2
R(A)
W(A)
R(B)
W(B)
R(A)
R(B)
where R(A) denotes that a read operation is performed on some data item ‘A’
This is a serial schedule since the transactions perform serially in the order T1 —> T2
2. Non-Serial Schedule:
This is a type of Scheduling where the operations of multiple transactions are interleaved.
This might lead to a rise in the concurrency problem. The transactions are executed in a
non-serial manner, keeping the end result correct and same as the serial schedule. Unlike the
serial schedule where one transaction must wait for another to complete all its operation, in
the non-serial schedule, the other transaction proceeds without waiting for the previous
transaction to complete. This sort of schedule does not provide any benefit of the concurrent
transaction. It can be of two types namely, Serializable and Non-Serializable Schedule.
The Non-Serial Schedule can be divided further into Serializable and Non-Serializable.
a. Serializable:
This is used to maintain the consistency of the database. It is mainly used in the Non-Serial
scheduling to verify whether the scheduling will lead to any inconsistency or not. On the
other hand, a serial schedule does not need the serializability because it follows a
transaction only when the previous transaction is complete. The non-serial schedule is said
to be in a serializable schedule only when it is equivalent to the serial schedules, for an n
number of transactions. Since concurrency is allowed in this case thus, multiple transactions
can execute concurrently. A serializable schedule helps in improving both resource
utilization and CPU throughput.
These are of two types:
1. Conflict Serializable:
2. View Serializable:
b. Non-Serializable:
The non-serializable schedule is divided into two types, Recoverable and Non-recoverable
Schedule.
1. Recoverable Schedule:
Schedules in which transactions commit only after all transactions whose changes they
read commit are called recoverable schedules. In other words, if some transaction T j is
reading value updated or written by some other transaction Ti, then the commit of
Tj must occur after the commit of Ti.
Example – Consider the following schedule involving two transactions T1 and T2.
T1 T2
R(A)
T1 T2
W(A)
W(A)
R(A)
commit
commit
This is a recoverable schedule since T1 commits before T2, that makes the value read
by T2 correct.
10. Unit Briefly Give the working and importance of View Serializibility in Transaction.
5
VIEW SERIALIZABILITY IN DBMS TRANSACTIONS
Two schedules S1 and S2 are said to be view equal if below conditions are satisfied :
1) Initial Read
If a transaction T1 reading data item A from initial database in S1 then in S2 also T1 should read
A from initial database.
T1 T2 T3
-------------------
R(A)
W(A)
R(A)
R(B)
2)Updated Read
If Ti is reading A which is updated by Tj in S1 then in S2 also Ti should read A which is updated
by Tj.
T1 T2 T3 T1 T2 T3
------------------- ----------------
W(A) W(A)
W(A) R(A)
R(A) W(A)
Above two schedule are not view equal as in S1 :T3 is reading A updated by T2, in S2 T3 is
reading A updated by T1.
3)Final Write operation
If a transaction T1 updated A at last in S1, then in S2 also T1 should perform final write
operations.
T1 T2 T1 T2
------------ ---------------
R(A) R(A)
W(A) W(A)
W(A) W(A)
Above two schedule are not view as Final write operation in S1 is done by T1 while in S2 done
by T2.
View Serializability: A Schedule is called view serializable if it is view equal to a serial schedule
(no overlapping transactions).
11. Unit Give the difference between Conflict and View Serializability.
5
Serializable Schedule :
A transaction schedule is serializable if its outcome is equal to the outcome of its transactions
executed serially i.e. sequentially without overlapping in time. A serializable schedule always
leaves the database in consistent state. A serial schedule is always a serializiable schedule
because a new transcation only starts when the older one has finished execution.
Example –
Let us consider the following schedule and see if it is serializable.
T1 T2
R(X)
R(X)
R(Y)
W(Y)
R(Y)
W(X)
R(X)
T1 T2
R(Y)
W(Y)
R(X)
R(Y)
W(X)
Thus, after changing the conflicting operations in the initial schedule we get a serial schedule.
Hence, this schedule is serializable.
Difference between Conflict and View Serializability :
S.N CONFLICT
the schedule get executed in the operations is the same in both the
say T2 write its own value of A, transactions say T2 writes its own
never read.
UNIT-06
1. Unit List and explain commonly accepted threats to database security.
6
Database Security Threats:
Database security begins with physical security for the systems that host the database management
system (DBMS). Database Management system is not safe from intrusion, corruption, or destruction by
people who have physical access to the computers. Once physical security has been established,
database must be protected from unauthorized access by authorized users as well as unauthorized users.
There are three main objects when designing a secure database system, and anything prevents from a
database management system to achieve these goals would be consider a threat to database security.
There are many internal and external threats to database systems.
Integrity:
Database integrity refers that information be protected from improper modification. Modification
includes creation, insertion, modification, changing the status of data, and deletion. Integrity is lost if
unauthorized changes are made intentionally or through accidental acts. For example, Students cannot
be allowed to modify their grades.
Availability:
Authorized user or program should not be denied access. For example, an instructor who wishes to
change a student grade should be allowed to do so.
Secrecy:
Data should not be disclosed to unauthorized users. For example, a student should not be allowed to see
and change other student grades.
Denial of service attack:
This attack makes a database server greatly slower or even not available to user at all. DoS attack does
not result in the disclosure or loss of the database information; it can cost the victims much time and
money.
Sniff attack:
To accommodate the e-commerce and advantage of distributed systems, database is designed in a client-
server mode. Attackers can use sniffer software to monitor data streams, and acquire some confidential
information. For example, the credit card number of a customer.
Spoofing attack:
Attackers forge a legal web application to access the database, and then retrieve data from the database
and use it for bad transactions. The most common spoofing attacks are TCP used to get the IP addresses
and DNS spoofing used to get the mapping between IP address and DNS name.
Trojan Horse:
It is a malicious program that embeds into the system. It can modify the database and reside in operating
system.
To achieve these objectives, a clear and consistent security policy should be developed to define what
security measure must be enforced. We must determine what part of data is to be protected and which
users get access to which part of the information. The security mechanisms of the underlying database
management system, as well as external mechanism, such as securing access to buildings, must be
utilized to enforce the policy.
You grant privileges to users so these users can accomplish tasks required for their job. You should grant
a privilege only to a user who absolutely requires the privilege to accomplish necessary work. Excessive
granting of unnecessary privileges can compromise security. A user can receive a privilege in two
different ways:
You can grant privileges to users explicitly. For example, you can explicitly grant the privilege to
insert records into the employees table to the user SCOTT.
You can also grant privileges to a role (a named group of privileges), and then grant the role to
one or more users. For example, you can grant the privileges to select, insert, update, and delete
records from the employees table to the role named clerk, which in turn you can grant to the
users scott and brian.
Because roles allow for easier and better management of privileges, you should normally grant privileges
to roles and not to specific users.
1) SYSTEM PRIVILEGES
3. Unit Mention and briefly explain the control measures that are used to provide security of database.
6
B. Database Security Countermeasures:
To protect the database system from the above mentioned threats. Here are some
countermeasures which are as follows:
C. Access Control:
A database for an organization contains a great deal of information and usually has several users.
Most of them need to access only a small part of the database. A policy defines the requirements
that are to be implemented within hardware and software and those that are external to the
system, including physical, personal, and procedural controls.
D. Flow Control:
Flow control provides the flow of information among accessible objects. Flow controls check that
information contained in objects does not flow explicitly or implicitly into less protected objects.
E. Encryption:
An encryption algorithm should be applied to the data, using a user-specified encryption key. The
output of the algorithm is the encrypted version. There is also a decryption algorithm, which takes
the encrypted data and a decryption key as input and then returns the original data.
F. RAID:
Redundant Array of Independent Disks which protect against data loss due to disk failure.
G. Authentication:
Access to the database is a matter of authentication. It provides the guidelines how the database is
accessed. Every access should be monitored.
H. Backup:
At every instant, backup should be done. In case of any disaster, Organizations can retrieve their
data.
For example, if the DEPT relation contains the number of the floor on which the department is located,
one may define a view of employee names together with the floors on which they work:
DEFINE VIEW LOCEMP AS: SELECT NAME, FLOOR FROM EMPLOYEE, DEPT WHERE
[Link] = [Link]
It is to be emphasized that views are dynamic windows and not static copies. As the information stored in
a base relation changes, the information visible through views defined on that relation changes with it. The
view mechanism is our means of granting access to row and column subsets, granting “statistical” access,
and granting access to other transformations of relations. In the first part of the paper we describe our
authorization mechanism as it applies to all objects in the system, regardless of whether they are views or
relations. We refer to both views and relations by the collective name “tables.” Later we describe the
extensions necessary to accommodate a dynamic view mechanism.
5. Unit What is meant by granting a privilege?
6
It’s used to grant the privileges to the users. For e.g. GRANT create table, create session to amit,
Raj with GRANT OPTION;
It’s also possible to cancel the privileges by using Revoke.
Grant and Revoke commandos are known as data control language commandos DCL granting of
privileges. The grant statement is used to give authorization to any particular user.
Syntax:-
Revoke:-
o The revocation of a privilege from a user may cause other user to lose that privilege this
behaviour is called cascading of Revoke. In most database system cascading is default
behaviour.
o Revoke select on stud-into from 41 restrict, with this system returns an error if there are
cascading revoke & revocation with stop.
o Revoke grant option for select on stud-into from 4.
o Due to this command only grant option will be revoke from user 4, that is no, he can no
grant privilege to any other user.
6. Unit Explain:
6
Triggers and its types with significance and example
Triggers can be defined on the table, view, schema, or database with which the event is
associated.
Benefits of Triggers
Triggers can be written for the following purposes −
[OF col_name]
ON table_name
[REFERENCING OLD AS o NEW AS n]
WHEN (condition)
DECLARE
Declaration-statements
BEGIN
Executable-statements
EXCEPTION
Exception-handling-statements
END;
Where,
CREATE [OR REPLACE] TRIGGER trigger_name − Creates or replaces an existing
trigger with the trigger_name.
{BEFORE | AFTER | INSTEAD OF} − This specifies when the trigger will be executed.
The INSTEAD OF clause is used for creating trigger on a view.
{INSERT [OR] | UPDATE [OR] | DELETE} − This specifies the DML operation.
[OF col_name] − This specifies the column name that will be updated.
[ON table_name] − This specifies the name of the table associated with the trigger.
[REFERENCING OLD AS o NEW AS n] − This allows you to refer new and old values
for various DML statements, such as INSERT, UPDATE, and DELETE.
[FOR EACH ROW] − This specifies a row-level trigger, i.e., the trigger will be executed
for each row being affected. Otherwise the trigger will execute just once when the SQL
statement is executed, which is called a table level trigger.
WHEN (condition) − This provides a condition for rows for which the trigger would fire.
This clause is valid only for row-level triggers.
Example
To start with, we will be using the CUSTOMERS table we had created and used in the previous
chapters −
Select * from customers;
+----+----------+-----+-----------+----------+
+----+----------+-----+-----------+----------+
| 6 | Komal | 22 | MP | 4500.00 |
+----+----------+-----+-----------+----------+
The following program creates a row-level trigger for the customers table that would fire for
INSERT or UPDATE or DELETE operations performed on the CUSTOMERS table. This
trigger will display the salary difference between the old values and new values −
DECLARE
sal_diff number;
BEGIN
END;
When the above code is executed at the SQL prompt, it produces the following result −
Trigger created.
Salary difference:
Because this is a new record, old salary is not available and the above result comes as null. Let
us now perform one more DML operation on the CUSTOMERS table. The UPDATE statement
will update an existing record in the table −
UPDATE customers
WHERE id = 2;
In discretionary access control (DAC), the owner of the object specifies which subjects can access the object. Thi
the discretion of the owner.
Most operating systems such as all Windows, Linux, and Macintosh and most flavors of Unix are based on DAC m
In these operating systems, when you create a file, you decide what access privileges you want to give to other
access control decision based on the access privileges you created.
In mandatory access control (MAC), the system (and not the users) specifies which subjects can access specific d
The MAC model is based on security labels. Subjects are given a security clearance (secret, top secret, confidenti
secret, confidential, etc.). The clearance and classification data are stored in the security labels, which are bound
When the system is making an access control decision, it tries to match the clearance of the subject with the cla
of secret, and he requests a data object with a security classification of top secret, then the user will be denied a
object.
The MAC model is usually used in environments where confidentiality is of utmost importance, such as a militar
Examples of the MAC-based commercial systems are SE Linux and Trusted Solaris.
9. Unit Describe the purpose of performing auditing with its different levels in detail.
6
Auditing is the monitoring and recording of selected user database actions. It can be based on individual
actions, such as the type of SQL statement executed, or on combinations of factors that can include user
name, application, time, and so on. Security policies can trigger auditing when specified elements in an
Oracle database are accessed or altered, including the contents within a specified object.
The present chapter describes the different types of auditing, what it involves, and the resulting audit
trails and records.
Enable future accountability for current actions taken in a particular schema, table, or row, or
affecting specific content
Deter users (or others) from inappropriate actions based on that accountability
Investigate suspicious activity
For example, if some user is deleting data from tables, then the security administrator might
decide to audit all connections to the database and all successful and unsuccessful deletions of
rows from all tables in the database.
Notify an auditor that an unauthorized user is manipulating or deleting data and that the user
has more privileges than expected which can lead to reassessing user authorizations
Monitor and gather data about specific database activities
For example, the database administrator can gather statistics about which tables are being
updated, how many logical I/Os are performed, or how many concurrent users connect at peak
times.
For example, you can create audit policies that you expect will never generate an audit record
because the data is protected in other ways. However, if these policies do generate audit
records, then you will know the other security controls are not properly implemented.
The first technique is trace-based auditing. This technique is usually built directly into the native
capabilities of the DBMS. Commands or parameters are set to turn on auditing and the DBMS
begins to cut trace records when activity occurs against audited objects. Although each DBMS
offers different auditing capabilities, some common items that can be audited by DBMS audit
facilities
include:
[Link] Backups of the database should be taken to protect data. Describe five measures that can be taken to
6 ensure the security and effectiveness of database backups.
Data backups are crucial when it comes to protecting your data; it’s even more important to ensure that
your backups are kept secure and available.
For many people, having data backed up is enough. However, what they fail to understand is that a
backup is not infallible. In fact, there are a whole range of issues that could affect the stability of a
backup. And, if your organization is in the midst of a data disaster, the one factor above all else that you
should be concentrating on is security. Your backup, after all, is your lifeline to resuming productivity, so
it should be as secure as possible.
To get you started, we’re going to look at the five best ways to secure your backups:
3. Integrate Encryption
There’s every chance that your entire backup could be snatched in the case of a data disaster, so it’s vital
that it’s protected. After all, your backup is likely to contain data pertaining to your customers/staff and
this could be highly sensitive. However, by encrypting the data contained within your backup, you’re
rendering it next to useless in the hands of external parties. Along with password protection, it’s a
simple yet highly effective layer of defense.
[Link] What are some of the current outstanding challenges for database security?
6
1. Deployment Failures. The most common cause of database vulnerabilities is a lack of due care at the
moment they are deployed. ...
2. Broken databases. ...
3. Data leaks. ...
4. Stolen database backups. ...
5. The abuse of database features. ...
6. A lack of segregation. ...
7. Hopscotch. ...
8. SQL injections.
References