PL/SQL Cursors and Triggers Explained

Q: Explain how the Isolation property affects concurrency control in database systems.

Isolation ensures that transactions do not interfere with each other when executed concurrently, which is essential for maintaining data consistency. It prevents other transactions from accessing data that is being modified by a current transaction until the modification is complete, often enforced by the database's concurrency control subsystem. This helps in maintaining integrity and consistency within the database.

Q: Evaluate the advantages and limitations of using XML in database systems, especially in terms of data exchange and storage.

XML's flexibility and platform-independent nature make it ideal for data exchange between disparate systems and for integrating legacy data. However, while XML supports complex data models better than traditional relational databases, it can be verbose and may require significant storage space. Furthermore, processing XML to extract information can be computationally intensive, necessitating powerful systems for efficient management.

Q: What are the ACID properties and how do they collectively ensure reliability in database transactions?

The ACID properties (Atomicity, Consistency, Isolation, and Durability) ensure that database transactions are processed reliably. Atomicity guarantees that a transaction is either fully completed or not executed at all. Consistency ensures that transactions transform the database from one valid state to another. Isolation ensures that concurrent execution of transactions results in a system state that would be obtained if transactions were executed serially. Durability ensures that once a transaction has been committed, it remains so, even in the case of a system failure.

Q: Discuss the significance of the 'Failed' state in transaction processing and its impact on data consistency.

The 'Failed' state in transaction processing indicates that an error occurred, preventing the transaction from completing successfully. In such a case, to maintain data consistency and integrity, the system rolls the transaction back to revert the database to its prior state. This prevents the database from ending up in an inconsistent state caused by partial completion of transactions.

Q: Analyze how a partially committed state differs from a committed state in a transaction lifecycle.

A partially committed state occurs when a transaction has executed its final operation but has not yet saved the data permanently to the database. It represents a transitional phase before the database reflects the changes made by the transaction. In contrast, a committed state signifies that all operations have been successfully executed, and all changes are now permanently recorded in the database system.

Q: In what ways does the Rollback operation support transaction management in database systems?

The Rollback operation supports transaction management by undoing all operations of a transaction when it fails, which ensures that the database is returned to its previous consistent state. This is critical to maintaining database integrity, as it prevents the application of incomplete or erroneous changes caused by transaction failures.

Q: How does the concept of Serializability play a critical role in ensuring the correctness of concurrent database transactions?

Serializability ensures that the outcome of executing transactions concurrently is the same as if the transactions were executed serially in some order. It is a fundamental correctness criterion in database systems: it maintains consistency under concurrency by ruling out anomalies and ensuring that concurrent execution does not lead to inconsistent system states.

Q: Describe how 'Shared Nothing Architecture' applies to parallel database systems and why it is beneficial.

'Shared Nothing Architecture' in parallel databases implies that each node operates independently with its own resources, including CPU, memory, and storage. This architecture facilitates high scalability and fault isolation, as nodes do not share resources and communicate only via a network when needed. It supports efficient query execution by distributing processing loads and mitigates bottlenecks, improving the overall performance.

Q: How does the concept of Atomicity contribute to database transaction integrity, and what are the implications if Atomicity is not maintained?

Atomicity ensures that either all operations within a transaction are completed or none are; if even a single operation fails, the entire transaction is aborted, thereby maintaining the database's consistency. Without Atomicity, partial transactions would lead to data inconsistencies, such as the case where funds are deducted from one account but not credited to another, which would leave the system in an incorrect state after a failure.

Q: What challenges can arise with deductive databases in terms of scalability and performance, and how do these challenges affect their application in real-world scenarios?

Deductive databases face challenges in scalability due to the computational intensity of logical inferences and complex rule processing across large datasets. This can lead to performance bottlenecks and slow response times, especially with recursive queries that are not optimized. These limitations may hinder the adoption of deductive databases in real-world scenarios where data volumes are vast and real-time processing is required.

The document covers key concepts in PL/SQL, including cursors, triggers, and normalization in databases. Cursors are pointers for retrieving data row-by-row, while triggers automatically execute code in response to certain events. The document also discusses various normal forms in database design, including 4NF, 5NF, and DKNF, emphasizing the importance of eliminating redundancy and ensuring data integrity.

Uploaded by

Ramneet Kaur

UNIT 2:

PL/SQL Cursors
A cursor in PL/SQL is a pointer to a context area that stores the result set of a query.

The cursor is used to retrieve data one row at a time from the result set, unlike other SQL
commands that operate on all rows at once.

Cursors update table records in a singleton or row-by-row manner.

The data stored in the cursor is called the Active Data Set. Oracle DBMS has a
predefined area in main memory within which cursors are opened. Hence the size of
the cursor is limited by the size of this predefined area.

Cursor Actions
Key actions involved in working with cursors in PL/SQL are:

1. Declare Cursor: A cursor is declared by defining the SQL statement that returns a result
set.
2. Open: A Cursor is opened and populated by executing the SQL statement defined by the
cursor.
3. Fetch: When the cursor is opened, rows can be fetched from the cursor one by one or in a
block to perform data manipulation.
4. Close: After data manipulation, close the cursor explicitly.
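5. Deallocate: Finally, delete the cursor definition and release all the system resources
associated with the cursor. (PL/SQL has no separate DEALLOCATE statement; closing the
cursor releases its resources. An explicit deallocate step exists in some other systems,
such as SQL Server.)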
Types of Cursors in PL/SQL
Cursors are classified depending on the circumstances in which they are opened.

• Implicit Cursor: If the Oracle engine opened a cursor for its internal processing it is
known as an Implicit Cursor. It is created “automatically” for the user by Oracle when a
query is executed and is simpler to code.
• Explicit Cursor: A Cursor can also be opened for processing data through a PL/SQL
block, on demand. Such a user-defined cursor is known as an Explicit Cursor.
Explicit cursor
An explicit cursor is defined in the declaration section of the PL/SQL Block. It is created on a
SELECT Statement which returns more than one row.

Syntax for creating cursor

CURSOR cursor_name IS select_statement;

Where,

• cursor_name: A suitable name for the cursor.

• select_statement: A select query which returns multiple rows

How to use Explicit Cursor?

There are four steps in using an Explicit Cursor.

1. DECLARE the cursor in the Declaration section.

2. OPEN the cursor in the Execution Section.
3. FETCH the data from the cursor into PL/SQL variables or records in the Execution
Section.
4. CLOSE the cursor in the Execution Section before you end the PL/SQL Block.

Syntax

General Syntax of using an explicit cursor in PL/SQL is:

DECLARE
   variables;
   records;
   CURSOR cursor_name IS select_statement;
BEGIN
   OPEN cursor_name;
   LOOP
      FETCH cursor_name INTO variables OR records;
      EXIT WHEN cursor_name%NOTFOUND;
      -- process the records
   END LOOP;
   CLOSE cursor_name;
END;
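
The same open, fetch, close cycle can be tried outside Oracle. The sketch below is an illustration only: it uses Python's built-in sqlite3 module rather than PL/SQL, and the employee table and its rows are made up for the example. Each step is labelled with the PL/SQL step it roughly corresponds to.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Asha", 12000), ("Ravi", 9000), ("Meena", 15000)],
)

fetched = []
cur = conn.cursor()  # roughly DECLARE: create the cursor object
# Roughly OPEN: run the query so the result set becomes available.
cur.execute("SELECT emp_name, salary FROM employee ORDER BY emp_name")
while True:
    row = cur.fetchone()      # FETCH: one row at a time
    if row is None:           # plays the role of cursor_name%NOTFOUND
        break
    fetched.append(row)       # process the record
cur.close()                   # CLOSE
conn.close()
```

After the loop, fetched holds the rows in the order the cursor returned them, one tuple per fetched row.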

1. What is a Trigger in PL/SQL?

A trigger is like a watchdog or an automatic alarm in your database.

It automatically runs some code when something happens to a table — like when you insert,
update, or delete a record.

You don't have to call it manually — the database "triggers" it by itself when the event happens.

2. Simple Example

Imagine you have a table called Bank_Account.

You want to:

• Automatically record whenever someone updates the account balance.

Instead of manually writing code every time, you can create a trigger that says:

"Hey database, whenever someone updates the Bank_Account table, automatically save the old
and new balance in a history table."

The trigger will do it every time without you worrying about it!
3. Real Life Analogy

Think about a motion sensor light:

• When someone walks into a room (event),

• The light automatically turns on (action).

You don't flip a switch — the sensor (trigger) handles it.

Triggers in PL/SQL work exactly the same way.

4. When Do Triggers Work?

Triggers can happen:

• Before the action (like before inserting or updating)

• After the action (like after inserting or deleting)
• Instead of the action (in special cases for views)

5. Basic Structure of a Trigger

Here’s a simple way a trigger looks in PL/SQL:

CREATE OR REPLACE TRIGGER trigger_name
BEFORE INSERT ON table_name
FOR EACH ROW
BEGIN
   -- Code to run automatically goes here
   NULL;  -- placeholder: a PL/SQL block needs at least one statement
END;

Meaning:

• When a new row is inserted into the table,

• The code inside the trigger will automatically run before the insert happens.
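
The Bank_Account idea from the simple example can be tried with a small runnable sketch. This is an assumption-laden illustration, not Oracle code: it uses Python's sqlite3 module, whose trigger syntax is close to PL/SQL but not identical (SQLite writes OLD and NEW, Oracle writes :OLD and :NEW). The Balance_History table, the column names, and the trigger name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Bank_Account (acc_no INTEGER PRIMARY KEY, balance INTEGER);
    CREATE TABLE Balance_History (acc_no INTEGER, old_balance INTEGER, new_balance INTEGER);

    -- Fires automatically after every balance update, once per changed row.
    CREATE TRIGGER log_balance_change
    AFTER UPDATE OF balance ON Bank_Account
    FOR EACH ROW
    BEGIN
        INSERT INTO Balance_History VALUES (OLD.acc_no, OLD.balance, NEW.balance);
    END;

    INSERT INTO Bank_Account VALUES (1, 4000);
""")

# The application only updates the balance; it never touches the history table.
conn.execute("UPDATE Bank_Account SET balance = balance - 500 WHERE acc_no = 1")
history = conn.execute("SELECT * FROM Balance_History").fetchall()
conn.close()
```

The history row appears even though no statement inserted it directly, which is the "watchdog" behaviour described above.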
DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

For BCNF problems, refer to your notebook.

Q1 Consider the relation schema R(A,B,C), which has the FD B → C. If A is a candidate key for R,
is it possible for R to be in BCNF? If so, under what conditions? If not, explain why not.
Sol : The only way R could be in BCNF is if B includes a key, i.e. B is a key for R

Fourth Normal Form (4NF): A relation is said to be in 4NF if it is in Boyce-Codd normal
form and has no non-trivial multi-valued dependency.
✓ For a dependency A → B, if multiple values of B exist for a single value of A, then the
relation has a multi-valued dependency.
✓ Note: Multi-valued dependency: A table is said to have a multi-valued dependency if the
following conditions are true:
1. For a dependency A → B, multiple values of B exist for a single value of A.
2. The table has at least 3 columns.
3. For a relation R (A, B, C), if there is a multi-valued dependency between A and
B, then B and C are independent of each other.
■ If all these conditions are true for a relation (table), it is said to have a multi-valued
dependency.

Example

➢ The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent
entities. Hence, there is no relationship between COURSE and HOBBY. In the STUDENT
relation, the student with STU_ID 21 has two courses, Computer and Math, and two
hobbies, Dancing and Singing. So there is a multi-valued dependency on STU_ID,
which leads to unnecessary repetition of data.
➢ So, to bring the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 24


STUDENT_HOBBY
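
The decomposition can be checked with a short sketch. The rows below are an assumption reconstructed from the description of student 21 (two courses, two hobbies, every course paired with every hobby); the variable names are illustrative.

```python
# Assumed rows for STU_ID 21: every course is paired with every hobby.
student = {
    (21, "Computer", "Dancing"),
    (21, "Computer", "Singing"),
    (21, "Math", "Dancing"),
    (21, "Math", "Singing"),
}

# Decompose into the two 4NF tables.
student_course = {(sid, course) for sid, course, _ in student}
student_hobby = {(sid, hobby) for sid, _, hobby in student}

# Joining the two tables on STU_ID rebuilds the original rows.
rejoined = {
    (sid, course, hobby)
    for sid, course in student_course
    for sid2, hobby in student_hobby
    if sid == sid2
}
```

Four repeated rows shrink to two rows per table, and the join gives back exactly the original relation, so nothing is lost by the split.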

Concept of Surrogate Key:

✓ An alternative to the primary key that still identifies each record uniquely, even when
the data in two records is otherwise duplicated.
✓ A surrogate key is a unique identification key. It acts as an artificial stand-in for the
production key: the production key may be alphanumeric or composite, but the surrogate key is
always a single numeric key.
✓ A surrogate key has the following characteristics:
i. The value is never reused and is unique within the whole system.
ii. It is system generated and an integer.
iii. The value cannot be manipulated by the user or application.
iv. The value is not an amalgam of different values from multiple domains.
✓ Surrogate keys can be generated in a variety of ways, and most databases offer a way to
generate them.
Example: Oracle uses SEQUENCE,
MySQL uses AUTO_INCREMENT,
and SQL Server uses IDENTITY.
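
As a rough illustration of a system-generated key that is never reused, the sketch below uses SQLite's AUTOINCREMENT through Python's sqlite3 module (SQLite is not one of the systems listed above; it is used here only because it runs anywhere). The student table, the roll_no production key, and the names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# student_key is the surrogate key; roll_no is a hypothetical alphanumeric production key.
conn.execute(
    "CREATE TABLE student ("
    " student_key INTEGER PRIMARY KEY AUTOINCREMENT,"
    " roll_no TEXT, name TEXT)"
)
for roll_no, name in [("CS-21-A", "Asha"), ("CS-21-B", "Ravi"), ("EE-07", "Meena")]:
    conn.execute("INSERT INTO student (roll_no, name) VALUES (?, ?)", (roll_no, name))

# Deleting the row with key 3 does not make 3 available again.
conn.execute("DELETE FROM student WHERE name = 'Meena'")
conn.execute("INSERT INTO student (roll_no, name) VALUES ('ME-11', 'Kiran')")
keys = [row[0] for row in conn.execute("SELECT student_key FROM student ORDER BY student_key")]
conn.close()
```

The application never supplies student_key; the database generates it, and the new row gets a fresh value instead of the deleted one.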

Lossless join and Dependency preserving decomposition:


1. What is a Join Dependency?

A join dependency happens when you can split a table into two or more smaller tables,
and then join (combine) them back together to get the original table — exactly the same,
without any missing or wrong data.

Think of it like breaking a big LEGO house into smaller sections and being able to
rebuild the same house perfectly.

2. Why are Join Dependencies important?

In databases, we often split big tables into smaller ones to:

• Remove repetition (redundancy)

• Make storage efficient
• Avoid update mistakes (anomalies)

But if splitting messes up the original information when rejoining, it’s a problem!
Join dependency ensures that after splitting and rejoining, nothing is lost or wrongly
added.

3. Formal Definition

In formal terms:
A relation (a table) R has a join dependency if it can be reconstructed exactly by joining
multiple projections (smaller tables with selected columns) without any extra or missing
rows.

4. Simple Example

Imagine a table:

Student   Course    Teacher
John      Math      Mr. A
John      Science   Mr. B
Mary      Math      Mr. A

Now, split it into:

• Table 1: (Student, Course)

Student   Course
John      Math
John      Science
Mary      Math

• Table 2: (Course, Teacher)

Course    Teacher
Math      Mr. A
Science   Mr. B

Now, if you join Table 1 and Table 2 on the Course column, you get:

Student   Course    Teacher
John      Math      Mr. A
John      Science   Mr. B
Mary      Math      Mr. A

Which is exactly the original table!

This means there is a join dependency.
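
The split-and-rejoin check above can be written as a small sketch. The helper names project and natural_join are illustrative, not part of any library; the data is the Student/Course/Teacher table from the example.

```python
def project(rows, columns):
    """Projection: keep only the named columns and drop duplicate rows."""
    return {tuple((c, row[c]) for c in columns) for row in rows}

def natural_join(left, right):
    """Join two projections on whatever column names they share."""
    result = set()
    for l in left:
        for r in right:
            l_map, r_map = dict(l), dict(r)
            shared = set(l_map) & set(r_map)
            if all(l_map[c] == r_map[c] for c in shared):
                result.add(frozenset({**l_map, **r_map}.items()))
    return result

original = [
    {"Student": "John", "Course": "Math", "Teacher": "Mr. A"},
    {"Student": "John", "Course": "Science", "Teacher": "Mr. B"},
    {"Student": "Mary", "Course": "Math", "Teacher": "Mr. A"},
]

table1 = project(original, ["Student", "Course"])
table2 = project(original, ["Course", "Teacher"])
rejoined = natural_join(table1, table2)

# The decomposition is lossless if the join gives back exactly the original rows.
is_lossless = rejoined == {frozenset(row.items()) for row in original}
```

Here the join on Course returns the three original rows and nothing else, so is_lossless is True.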
Key Points to Remember

• Join dependency = No data loss when splitting and rejoining.

• Important for maintaining data integrity.
• Needed for 5th Normal Form (5NF).
• Deals with splitting a table into two or more parts.

1. What is 5th Normal Form (5NF)?

5th Normal Form (5NF) is a rule for organizing tables in a database.

It says:

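A table is in 5NF if it has been broken down (split into smaller tables) as much as possible
without losing any data, and you can join those smaller tables back to get the original
table exactly. More formally, every join dependency that holds in the table must be implied
by its candidate keys.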

5NF is all about removing very tricky, hidden repetition and making sure the table is
perfectly organized.

2. Why do we need 5NF?

Sometimes, even after splitting tables into smaller parts (like in 3NF or 4NF), there are still
complex hidden connections between the columns.

5NF ensures:

• No unnecessary duplication (no repeated data).

• No confusing relationships.
• Easier updates, inserts, and deletes (fewer mistakes).

It makes the database very clean and efficient.

3. Simple Example of 5NF

Imagine a company keeps track of:

• Which employees work on which projects using which machines.

Original Table:

Employee Project Machine

Alice P1 M1
Alice P1 M2
Bob P2 M1
Bob P2 M2

You notice:

• Alice works on Project P1 using Machine M1 and M2.

• Bob works on Project P2 using Machine M1 and M2.

But there's no direct link between Employee and Machine, only through the Project.

To make this cleaner (5NF), you split the table into:

• Employee-Project Table
• Project-Machine Table

Now you have two smaller tables:

Employee-Project:

Employee Project
Alice P1
Bob P2

Project-Machine:

Project Machine
P1 M1
P1 M2
P2 M1
P2 M2
If you join these two tables on Project, you can recreate the original table!

No extra information.
No missing information.

That’s what 5NF does!
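
A split is only safe when the join dependency really holds. The sketch below rejoins the table from the example and then a hypothetical variant (made up for this illustration) in which the machine depends on both the employee and the project; the function name split_and_rejoin is illustrative.

```python
def split_and_rejoin(rows):
    """Project (employee, project, machine) rows onto two tables and join them on project."""
    emp_proj = {(e, p) for e, p, _ in rows}
    proj_mach = {(p, m) for _, p, m in rows}
    return {(e, p, m) for e, p in emp_proj for p2, m in proj_mach if p == p2}

# The table from the example: the split is lossless.
original = {
    ("Alice", "P1", "M1"), ("Alice", "P1", "M2"),
    ("Bob", "P2", "M1"), ("Bob", "P2", "M2"),
}
lossless = split_and_rejoin(original) == original

# Hypothetical variant: on P2, Bob uses only M1 and Alice uses only M2.
variant = {("Alice", "P1", "M1"), ("Bob", "P2", "M1"), ("Alice", "P2", "M2")}
spurious = split_and_rejoin(variant) - variant
```

For the variant, the join invents two rows that were never in the table (Bob on P2 with M2, Alice on P2 with M1). Such spurious rows are the "wrongly added" data that a valid join dependency rules out.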

4. Real-Life Analogy

Imagine building a burger:

• Buns (Employee)
• Patty (Project)
• Sauce (Machine)

Instead of making a big chart with all burger combinations,

you separately list:

• Which bun goes with which patty.

• Which patty goes with which sauce.

Then, you can create every correct burger combination just by joining the two lists.
No wasted ingredients, no wrong burgers.

That’s 5NF!

5. Key Points to Remember

• 5NF fixes complicated, hidden redundancies.

• 5NF splits a table into smaller tables linked by join dependencies.
• After splitting, joining the tables gives back the original without error.
• It is mostly needed in very complex databases (not always required for
small/simple databases).
1. What is Domain-Key Normal Form (DKNF)?

Domain-Key Normal Form (DKNF) is the highest and most perfect level of
normalization in a database.

A table is in DKNF if every rule (constraint) about the data is based only on domains (the
allowed types of values) and keys (unique identifiers).

In DKNF:

• No other strange or hidden rules are needed to keep the data correct.
• Only the meaning (domain) and uniqueness (key) of data control the table.

2. Breaking it down simply

• Domain = The type or range of values a column can have.

(Example: Age must be a number between 0 and 150.)
• Key = A column or group of columns that uniquely identifies a row.
(Example: Aadhar number, student ID.)

DKNF says:

"If your table is correct just because of domain rules and key rules, and you don’t need any
extra weird conditions, then your table is perfect — it's in DKNF."

3. Why is DKNF important?

• It completely removes anomalies (problems when inserting, deleting, or updating data).
• The data becomes very clean and safe.
• It’s the ultimate goal of normalization — but reaching DKNF is very hard in real life!
4. Example

Imagine you have this table:

Student_ID   Course    Grade
101          Math      A
102          Science   B

• Domain rules:
o Student_ID must be a number.
o Grade must be A, B, C, D, or F.
• Key rules:
o (Student_ID, Course) together must be unique.

If all the data rules are based only on these domain rules and key rules,
and there are no extra rules like "if a student is taking Math, grade must be A or B only",
then the table is in DKNF!
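
The domain rules and the key rule above can be handed to the database itself. The sketch below is one possible way to do that, using SQLite through Python's sqlite3 module; the table name enrollment and the helper rejected are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE enrollment (
        Student_ID INTEGER NOT NULL CHECK (typeof(Student_ID) = 'integer'),  -- domain rule
        Course     TEXT    NOT NULL,
        Grade      TEXT    NOT NULL CHECK (Grade IN ('A','B','C','D','F')),  -- domain rule
        PRIMARY KEY (Student_ID, Course)                                     -- key rule
    )
""")
conn.execute("INSERT INTO enrollment VALUES (101, 'Math', 'A')")
conn.execute("INSERT INTO enrollment VALUES (102, 'Science', 'B')")

def rejected(values):
    """Return True if the database refuses the row."""
    try:
        conn.execute("INSERT INTO enrollment VALUES (?, ?, ?)", values)
        return False
    except sqlite3.IntegrityError:
        return True

bad_grade = rejected((103, "Math", "Z"))      # violates the Grade domain
duplicate_key = rejected((101, "Math", "B"))  # violates the key
bad_id = rejected(("abc", "Math", "A"))       # violates the Student_ID domain
good_row = rejected((103, "Math", "C"))       # satisfies both kinds of rule
conn.close()
```

Every invalid row is stopped by a domain or key constraint alone; no extra application-side check is needed, which is the situation DKNF describes.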

5. Real Life Analogy

Imagine filling a form:

• Domain rule: Name must have only letters.

• Key rule: Each passport number must be unique.

If these two rules are enough to make the form valid, and you don't need extra manual
checks, then your form is like DKNF — clean, perfect, self-managed!

6. Quick Points to Remember

Point        Meaning
Domain       Type or range of allowed values
Key          Unique identifier for rows
DKNF         Only domain and key rules control data correctness
Result       No anomalies, pure clean database
Difficulty   Hard to achieve in practice
Query Processing in DBMS
Query processing is the set of activities involved in extracting data from the
database. Fetching data from the database takes several steps:

1. Parsing and translation

2. Optimization

3. Evaluation

The query processing works in the following way:

Parsing and Translation

Query processing includes several activities for data retrieval. Initially, the
user query, written in a high-level database language such as SQL, is translated
into expressions that can be used at the physical level of the file system. After
this, a variety of query-optimizing transformations and the actual evaluation of
the query take place. Thus, before processing a query, the system needs to translate
it from its human-readable form into an internal form. SQL, or Structured Query
Language, is well suited to humans, but it is not well suited to the internal
representation of the query within the system. Relational algebra is well suited
for the internal representation of a query. The translation process in query
processing is similar to the parser of a compiler. When a user executes a query,
to generate the internal form of the query, the parser in the system checks the
syntax of the query, then verifies the name of the relation in the database, the
tuple, and finally the required attribute value. The parser creates a tree of the
query, known as the 'parse tree', and then translates it into relational algebra.
In doing so, it also replaces every use of a view in the query with the expression
that defines the view.
Thus, we can understand the working of a query processing in the below-
described diagram:

Suppose a user executes a query. As we have learned, there are various
methods of extracting data from the database. Suppose the user wants to
fetch the salaries of the employees whose salary is greater than 10000.
For this, the following SQL query is used:

select salary from Employee where salary>10000;

Thus, to make the system understand the user query, it needs to be
translated into relational algebra. This query can be written in
relational algebra in either of two equivalent forms:

o σsalary>10000 (πsalary (Employee))

o πsalary (σsalary>10000 (Employee))

After translating the given query, we can execute each relational algebra
operation by using different algorithms. In this way, query processing
begins its work.
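
That the two relational algebra forms above give the same answer can be checked with a small sketch. The select and project helpers are toy stand-ins for σ and π, and the Employee rows are hypothetical.

```python
# Hypothetical Employee relation for illustration.
employee = [
    {"emp_name": "Asha", "salary": 12000},
    {"emp_name": "Ravi", "salary": 9000},
    {"emp_name": "Meena", "salary": 15000},
]

def select(rows, predicate):
    """sigma: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def project(rows, attrs):
    """pi: keep the named attributes and remove duplicate rows."""
    seen, out = set(), []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def high(row):
    return row["salary"] > 10000

plan_a = select(project(employee, ["salary"]), high)  # sigma applied to pi(Employee)
plan_b = project(select(employee, high), ["salary"])  # pi applied to sigma(Employee)
```

Both plans return the same rows, but they do different amounts of intermediate work, which is why the system is free to choose between them.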

Evaluation

For evaluation, in addition to the relational algebra translation, the translated
relational algebra expression must be annotated with instructions that specify
how to evaluate each operation. Thus, after translating the
user query, the system executes a query evaluation plan.

Query Evaluation Plan

o In order to fully evaluate a query, the system needs to construct a query
evaluation plan.

o The annotations in the evaluation plan may refer to the algorithms to be used
for a particular index or for specific operations.

o Such relational algebra with annotations is referred to as Evaluation
Primitives. The evaluation primitives carry the instructions needed for the
evaluation of the operation.

o Thus, a query evaluation plan defines a sequence of primitive operations
used for evaluating a query. The query evaluation plan is also referred to
as the query execution plan.

o A query execution engine is responsible for generating the output of the
given query. It takes the query execution plan, executes it, and finally produces
the output for the user query.

Optimization
o The cost of query evaluation can vary for different types of queries.
Since the system is responsible for constructing the evaluation plan, the
user does not need to write the query efficiently.
o Usually, a database system generates an efficient query evaluation plan
that minimizes the cost. This task is performed by the database system
and is known as Query Optimization.

o For optimizing a query, the query optimizer should have an estimated cost
analysis of each operation. It is because the overall operation cost depends
on the memory allocations to several operations, execution costs, and so on.

Finally, after selecting an evaluation plan, the system evaluates the query
and produces the output of the query.
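
Many database systems let you look at the plan the optimizer picked. As one example, SQLite (used here through Python's sqlite3 module, and not necessarily the system these notes have in mind) offers EXPLAIN QUERY PLAN. The index name idx_salary and the empty Employee table are assumptions for this sketch, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (emp_name TEXT, salary INTEGER)")
query = "SELECT salary FROM Employee WHERE salary > 10000"

def plan(sql):
    """Return the human-readable detail column of the plan SQLite chose."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

plan_without_index = plan(query)   # no index: the whole table is scanned
conn.execute("CREATE INDEX idx_salary ON Employee (salary)")
plan_with_index = plan(query)      # same query, but the optimizer can now use the index
conn.close()
```

The query text is identical in both calls; only the available access paths changed, and the optimizer adjusted the evaluation plan on its own.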
ADBMS
Unit 2
Transaction processing and Concurrency control

Transaction

o A transaction is a set of logically related operations. It contains a group
of tasks.
o A transaction is an action or series of actions. It is performed by a single
user to access the contents of the database.

Example: Suppose a bank employee transfers Rs 800 from X's account to
Y's account. This small transaction contains several low-level tasks:

X's Account

1. Open_Account(X)
2. Old_Balance = [Link]
3. New_Balance = Old_Balance - 800
4. [Link] = New_Balance
5. Close_Account(X)

Y's Account

1. Open_Account(Y)
2. Old_Balance = [Link]
3. New_Balance = Old_Balance + 800
4. [Link] = New_Balance
5. Close_Account(Y)

Operations of Transaction:

Read(X): The read operation reads the value of X from the database and
stores it in a buffer in main memory.

Write(X): The write operation writes the value back to the database from
the buffer.
Let's take the example of a debit transaction on an account, which consists of
the following operations:

1. R(X);
2. X = X - 500;
3. W(X);

Let's assume the value of X before the start of the transaction is 4000.

The operations of the transaction work as follows:

o The first operation reads X's value from the database and stores it in a buffer.
o The second operation decreases the value of X by 500. So the buffer will
contain 3500.
o The third operation writes the buffer's value to the database. So X's
final value will be 3500.

But it is possible that, because of a hardware, software, or power failure,
the transaction fails before finishing all the operations in the set.

For example: If the above debit transaction fails after
executing operation 2, then X's value will remain 4000 in the database, which is
not acceptable to the bank.

To solve this problem, we have two important operations:

Commit: It is used to save the work done permanently.

Rollback: It is used to undo the work done.
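
The debit example can be run end to end to see both operations at work. The sketch below is an illustration using Python's sqlite3 module; the account table, the debit function, and the simulated failure are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('X', 4000)")
conn.commit()

def balance():
    return conn.execute("SELECT balance FROM account WHERE name = 'X'").fetchone()[0]

def debit(amount, fail_before_commit):
    """R(X); X = X - amount; W(X); then either commit or roll back."""
    try:
        x = balance()                                    # R(X)
        x = x - amount                                   # X = X - amount
        conn.execute("UPDATE account SET balance = ? WHERE name = 'X'", (x,))  # W(X)
        if fail_before_commit:
            raise RuntimeError("simulated failure")      # hypothetical crash
        conn.commit()                                    # Commit: make the change permanent
    except RuntimeError:
        conn.rollback()                                  # Rollback: undo the work done

debit(500, fail_before_commit=True)
after_failure = balance()    # the failed debit left no trace
debit(500, fail_before_commit=False)
after_success = balance()    # the committed debit is saved
conn.close()
```

The failed attempt leaves X at 4000 and the successful one brings it to 3500, so the database never shows a half-finished debit.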

ACID Properties
A transaction has four properties. These are used to maintain consistency
in a database, before and after the transaction.

Property of Transaction
1. Atomicity
2. Consistency
3. Isolation
4. Durability
Atomicity

o It states that either all operations of the transaction take place or none do;
if they cannot all complete, the transaction is aborted.
o There is no midway, i.e., the transaction cannot occur partially. Each
transaction is treated as one unit and either runs to completion or is not
executed at all.

Atomicity involves the following two operations:

Abort: If a transaction aborts, then none of the changes made are visible.

Commit: If a transaction commits, then all the changes made are visible.

Example: Let's assume that the following transaction T consists of T1 and T2. A
holds Rs 600 and B holds Rs 300. Transfer Rs 100 from account A to
account B.

T1              T2
Read(A)         Read(B)
A := A - 100    B := B + 100
Write(A)        Write(B)

After completion of the transaction, A holds Rs 500 and B holds Rs 400.

If the transaction T fails after the completion of T1 but before the
completion of T2, then the amount will be deducted from A but not
added to B. This leaves the database in an inconsistent state. To ensure the
correctness of the database state, the transaction must be executed in its entirety.

Consistency
o The integrity constraints are maintained so that the database is consistent
before and after the transaction.
o The execution of a transaction will leave a database in either its prior
stable state or a new stable state.
o The consistency property of the database states that every transaction sees a
consistent database instance.
o A transaction transforms the database from one consistent
state to another consistent state.
For example: The total amount must be the same before and after the
transaction.
1. Total before T occurs = 600+300=900
2. Total after T occurs= 500+400=900
Therefore, the database is consistent. If T1 is completed but T2
fails, then inconsistency will occur (the total would be 500+300=800).

Isolation
o It states that data being used by one transaction during its execution
cannot be used by a second transaction until the first one is
completed.
o In isolation, if the transaction T1 is being executed and is using the data
item X, then that data item can't be accessed by any other transaction T2
until the transaction T1 ends.
o The concurrency control subsystem of the DBMS enforces the isolation
property.

Durability
o The durability property indicates the permanence of the
database's consistent state. It states that the changes made by a committed
transaction are permanent.
o They cannot be lost by the erroneous operation of a faulty transaction or
by a system failure. When a transaction is completed, the database
reaches a state known as the consistent state. That consistent state cannot
be lost, even in the event of a system failure.
o The recovery subsystem of the DBMS is responsible for the durability
property.
States of Transaction
The different stages a transaction goes through during its lifecycle are known as
the transaction states. The following is a diagrammatic representation of the
different stages of a transaction.

Active state

o The active state is the first state of every transaction. In this state, the
transaction is being executed.
o For example: Insertion, deletion, or updating of a record is done here. But
the records are not yet saved to the database.

Partially committed

o In the partially committed state, a transaction has executed its final operation,
but the data is still not saved to the database.
o In the total mark calculation example, a final display of the total marks
step is executed in this state.

Committed

A transaction is said to be in a committed state if it executes all its operations
successfully. In this state, all the effects are now permanently saved on the
database system.
Failed state

o If any of the checks made by the database recovery system fails, then the
transaction is said to be in the failed state.
o In the example of total mark calculation, if the database is not able to fire
a query to fetch the marks, then the transaction will fail to execute.

Aborted

o If any of the checks fail and the transaction has reached a failed state, then
the database recovery system makes sure that the database is in its
previous consistent state. If it is not, the system aborts or rolls back the
transaction to bring the database into a consistent state.
o If the transaction fails midway, all the operations it has already
executed are rolled back so that the database returns to
its consistent state.
o After aborting the transaction, the database recovery module will select
one of the two operations:
1. Re-start the transaction
2. Kill the transaction

Example

Let us take a very simple example of Railway ticket booking. Can you think of
the things that need to be retrieved from the database when you initiate the
booking process?

You will need the train details, the already booked ticket details, the platform
details, and many more such things. Now, once these details are retrieved the
transaction of booking a ticket enters the active state.

After the user has completed the entire process of booking a ticket from their
end, the transaction enters the partially committed state. In case any error
occurred during the process, then the transaction will enter the failed state.

Now, say the process was successful and the transaction entered the partially
committed state, now if the saving in the database is completed successfully
then the transaction enters the committed state. In case there is any error while
saving in the database then it enters the failed state.
Anything from the failed state enters the aborted state so that rollbacks can
take place and the database consistency is maintained.

Now, let’s talk about the terminated state. If the booking is permanently saved
in the database, or it has been aborted due to some unforeseen reasons then the
transaction enters the terminated state.
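
The states and the moves between them, as described above, can be captured in a small table. The sketch below is illustrative; the state names follow the text, while TRANSITIONS and run are names chosen for this example.

```python
# Allowed moves between transaction states, as described in the text above.
TRANSITIONS = {
    "active": {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "committed": {"terminated"},
    "failed": {"aborted"},
    "aborted": {"terminated"},
    "terminated": set(),
}

def run(path):
    """Follow a list of states starting from 'active'; raise ValueError on an illegal move."""
    state = "active"
    for nxt in path:
        if nxt not in TRANSITIONS[state]:
            raise ValueError(f"illegal transition: {state} -> {nxt}")
        state = nxt
    return state

# Ticket booking that is saved successfully.
booking_saved = run(["partially committed", "committed", "terminated"])
# Ticket booking where saving to the database fails and is rolled back.
save_failed = run(["partially committed", "failed", "aborted", "terminated"])
```

Both paths end in the terminated state, while a jump such as active directly to committed is rejected because the transaction must pass through the partially committed state first.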

Serializability
o In computer science, serializability is a property of a system that
describes how different processes operate on shared data.
o A system is called serializable if the result of running its operations
concurrently is the same as the result of running them one after another.
o Running one after another here means that there is no overlapping in the
execution on the data. In a DBMS, while data is being written or read,
the DBMS can stop all the other processes from accessing that data.
o A schedule is serializable if it is equivalent to a serial schedule. A
concurrent schedule must give the same result as if its transactions were
executed serially, that is, one after another, with the actions of each
transaction (read, write, abort, commit) performed together.

Schedules in DBMS are of two types:

1. Serial schedule - A schedule in which only one transaction is executed at
a time, i.e., one transaction is executed completely before another
transaction starts.

Example:

Transaction-1    Transaction-2
R(a)
W(a)
R(b)
W(b)
                 R(b)
                 W(b)
                 R(a)
                 W(a)

Here, we can see that Transaction-2 starts its execution after the
completion of Transaction-1.

2. Non-serial schedule - A schedule in which the operations of transactions
T1 and T2 overlap, i.e., they are interleaved.

Example:

Transaction-1 Transaction-2
R(a)
W(a)
R(b)
W(b)
R(b)
R(a)
W(b)
W(a)

We can see that Transaction-2 starts its execution before the completion of
Transaction-1, and they are interchangeably working on the same data, i.e., "a"
and "b".

Types of serializability

There are two types of serializability −

1. Conflict serializability
Conflict serializability is a form of serializability defined in terms of
conflicting operations: operations on the same data item that must be executed
in a particular order to maintain the consistency of the database. A schedule
is conflict serializable if it can be turned into a serial schedule by swapping
adjacent operations that do not conflict. In a DBMS, each transaction has a
unique identifier, and every operation in a schedule is associated with the
identifier of its transaction.

These identifiers make it possible to ensure that no two conflicting operations
are executed concurrently. For example, consider two tables, the order table
and the customer table. One customer can have multiple orders, but each order
belongs to only one customer. Two operations conflict when all of the
following conditions hold:

o The two operations belong to different transactions.
o Both operations access the same data item.
o At least one of the two operations is a write operation.

If two transactions are executed concurrently, for instance one adding an order
for the first customer and the other adding an order for the second customer,
executing their conflicting operations in a fixed order ensures that there is
no inconsistency in the database.

The conflicting pairs are:

1. READ(a) - WRITE(a)
2. WRITE(a) - WRITE(a)
3. WRITE(a) - READ(a)
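The three conditions above can be turned into a test for conflict serializability using a precedence graph, which is the usual textbook technique rather than something these notes spell out. In this sketch a schedule is a list of hypothetical `(transaction, kind, item)` tuples, and all function names are introduced for illustration.

```python
from itertools import combinations


def conflicts(op1, op2):
    """Different transactions, same data item, at least one write."""
    (t1, kind1, item1), (t2, kind2, item2) = op1, op2
    return t1 != t2 and item1 == item2 and "W" in (kind1, kind2)


def precedence_graph(schedule):
    """Edge Ti -> Tj for each conflicting pair in which Ti's operation comes first."""
    edges = set()
    for first, second in combinations(schedule, 2):  # pairs keep schedule order
        if conflicts(first, second):
            edges.add((first[0], second[0]))
    return edges


def has_cycle(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    visiting, done = set(), set()

    def dfs(u):
        visiting.add(u)
        for v in graph[u]:
            if v in visiting or (v not in done and dfs(v)):
                return True
        visiting.discard(u)
        done.add(u)
        return False

    return any(dfs(u) for u in list(graph) if u not in done)


def is_conflict_serializable(schedule):
    """A schedule is conflict serializable if its precedence graph has no cycle."""
    return not has_cycle(precedence_graph(schedule))


# The serial schedule from the earlier example: all of T1, then all of T2.
serial = [("T1", "R", "a"), ("T1", "W", "a"), ("T1", "R", "b"), ("T1", "W", "b"),
          ("T2", "R", "b"), ("T2", "W", "b"), ("T2", "R", "a"), ("T2", "W", "a")]

# A hypothetical interleaving in which T2 writes "a" between T1's read and write.
interleaved = [("T1", "R", "a"), ("T2", "W", "a"), ("T1", "W", "a")]

serial_ok = is_conflict_serializable(serial)
interleaved_ok = is_conflict_serializable(interleaved)
```

The serial schedule only produces the edge T1 -> T2, so it passes. The interleaved schedule produces both T1 -> T2 (READ-WRITE) and T2 -> T1 (WRITE-WRITE), which forms a cycle, so it is not conflict serializable.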

2. View serializability
If a non-serial schedule is view equivalent to some serial schedule, then
the schedule is called a view serializable schedule. View serializability is
used to ensure the consistency of a schedule.

What is view equivalency?

The three conditions needed for schedules S1 and S2 to be view equivalent are:
1. Initial read must be on the same piece of data.
Example: If transaction t1 reads the initial value of "A" from the database in
schedule S1, then in schedule S2, t1 must also read the initial value of A.
2. Final write must be on the same piece of data.
Example: If transaction t1 performs the final write on A in S1, then in S2, t1
should perform the final write on A as well.
3. The mid sequence should also be in the same order.
Example: If t1 reads a value of A that was written by t2 in S1, then in S2, t1
should also read the value of A written by t2.

The process of checking a schedule for view equivalence with a serial schedule
is called view serializability.

Example: We have a schedule "S" having two transactions t1, and t2 working
simultaneously.

t1       t2
R(x)
W(x)
         R(x)
         W(x)
R(y)
W(y)
         R(y)
         W(y)

Let's form its view equivalent schedule (S') by interchanging the mid
read-write operations of both transactions. S':

t1       t2
R(x)
W(x)
R(y)
W(y)
         R(x)
         W(x)
         R(y)
         W(y)

Since a view equivalent schedule is possible, it is a view serializable schedule.
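The three view-equivalence conditions can be checked mechanically. The sketch below assumes that in S the operations alternate in pairs (t1 works on x, then t2 on x, then t1 on y, then t2 on y) and that S' runs all of t1 before t2. The tuple encoding and the function names are hypothetical.

```python
def view_summary(schedule):
    """Summarise a schedule of (txn, kind, item) operations.

    reads_from lists, for every read, the transaction whose write produced the
    value read (None means the initial value in the database). This covers the
    initial-read and mid-sequence conditions. final_writes maps each item to
    the transaction that writes it last.
    """
    last_writer = {}
    reads_from = []
    for txn, kind, item in schedule:
        if kind == "R":
            reads_from.append((txn, item, last_writer.get(item)))
        else:
            last_writer[item] = txn
    return sorted(reads_from, key=repr), last_writer


def view_equivalent(s1, s2):
    return view_summary(s1) == view_summary(s2)


S = [("t1", "R", "x"), ("t1", "W", "x"), ("t2", "R", "x"), ("t2", "W", "x"),
     ("t1", "R", "y"), ("t1", "W", "y"), ("t2", "R", "y"), ("t2", "W", "y")]

S_prime = [("t1", "R", "x"), ("t1", "W", "x"), ("t1", "R", "y"), ("t1", "W", "y"),
           ("t2", "R", "x"), ("t2", "W", "x"), ("t2", "R", "y"), ("t2", "W", "y")]

# The other serial order (t2 first) for comparison.
t2_first = S_prime[4:] + S_prime[:4]

same_view = view_equivalent(S, S_prime)
other_order = view_equivalent(S, t2_first)
```

Under these assumptions S and S' agree on every read source and every final write, so S is view serializable. Running t2 first changes who reads the initial values, so that serial order is not view equivalent to S.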

Prioritization
Prioritization is useful for browsing tasks, and tasks that use a lot of processor
time. Input/Output bound tasks can take the required amount of CPU, and move
on to the next read/write wait. CPU-intensive tasks take higher priority over the
less intensive tasks. Prioritization can be implemented in all CICS® systems. It
is more important in a high-activity system than in a low-activity system. With
careful priority selection, you can improve overall throughput and response
time. Prioritization can minimize resource usage of certain resource-bound
transactions. Prioritization increases the response time for lower-priority tasks,
and can distort the regulating effects of MXT and the MAXACTIVE attribute of
the transaction class definition.

Priorities do not affect the order of servicing terminal input messages and,
therefore, the time they wait to be attached to the transaction manager. Because
prioritization is determined in three sets of definitions (terminal, transaction,
and operator), it can be a time-consuming process for you to track many
transactions in a system. CICS prioritization is not interrupt-driven as is the case
with operating system prioritization, but determines the position on a ready
queue. This means that, after a task is given control of the processor, the task
does not relinquish that control until it issues a CICS command that calls the
CICS dispatcher. After the dispatch of a processor-bound task, CICS can be tied
up for long periods if CICS requests are infrequent. For that reason,
prioritization should be implemented only if MXT and the MAXACTIVE
attribute of the transaction class definition adjustments have proved to be
insufficient.

You should use prioritization sparingly, if at all, and only after you have already
adjusted task levels using MXT and the MAXACTIVE attribute of the
transaction class definition. It is probably best to set all tasks to the same
priority, and then prioritize some transactions either higher or lower on an
exception basis, and according to the specific constraints in a system. Do not
prioritize against slow tasks unless you can accept the longer task life and
greater dispatch overhead; these tasks are slow, in any case, and give up control
each time they have to wait for I/O. Use small priority values and differences
and concentrate on transaction priority. Give priority to control operator tasks
rather than the person, or at least to the control operator's signon ID rather than
to a specific physical terminal (the control operator may move around).

Consider for high priority a task that uses large resources. However, the effects
of this on the overall system need careful monitoring to ensure that loading a
large transaction of this type does not lock out other transactions. Also
consider for high priority those transactions that cause enqueues to system
resources, thus locking out other transactions. As a result, these can process
quickly and then release resources. Here are some examples:

 Using intrapartition transient data with logical recovery

 Updating frequently used records
 Automatic logging
 Tasks needing fast application response time, for example, data entry.

Lower priority should be considered for tasks that:

 Have long browsing activity

 Are process-intensive with minimal I/O activity
 Do not require terminal interaction, for example:
o Auto-initiate tasks (except when you use transient data
intrapartition queues that have a destination of terminal defined
and a trigger level that is greater than zero).
Introduction to Distributed Databases
DDB Technology :

Merger of two technologies:

o database technology, and

o network and data communication technology.

Computer networks allow distributed processing of data.

Traditional databases, on the other hand, focus on providing centralized, controlled access to data.

Distributed databases allow an integration of information and its processing by applications.

• A distributed database (DDB) is a collection of multiple logically interrelated databases
distributed over a computer network.

• A distributed database management system (DDBMS) is a software system that manages a
distributed database while making the distribution transparent to the user.

When is a database called distributed?

• There are multiple computers, called sites or nodes. These sites must be connected by an
underlying communication network to transmit data and commands among sites, as shown

• Logical interrelation of the connected databases. It is essential that the information in the
databases be logically related.

• Absence of homogeneity constraint among connected nodes. It is not necessary that all
nodes be identical in terms of data, hardware, and software.

Transparency

• The internal details of the distribution are hidden from the users (that is, implementation
details are hidden from end users).

Data organization transparency (also known as distribution or network transparency).

This refers to freedom for the user from the operational details of the network and the placement
of the data in the distributed system.

• It may be divided into location transparency and naming transparency.

• Location transparency refers to the fact that the command used to perform a task is
independent of the location of the data and the location of the node where the command was
issued. (The user need not be aware of the physical location of the database.)

• Naming transparency implies that once a name is associated with an object, the named object
can be accessed unambiguously without additional specification as to where the data is located.
(The user need not provide any additional information about the name of the database.)

Replication transparency.

• Copies of the same data objects may be stored at multiple sites for better availability,
performance, and reliability.

• The user is unaware of the existence of these copies.

Fragmentation transparency.

• Two types of fragmentation are possible.

• Horizontal fragmentation distributes a relation (table) into subrelations that are subsets of
the tuples (rows) in the original relation.

• Vertical fragmentation distributes a relation into subrelations where each subrelation is
defined by a subset of the columns of the original relation.

Design transparency and execution transparency.

• These refer to freedom from knowing how the distributed database is designed and where a
transaction executes.

Autonomy

• Autonomy determines the extent to which individual nodes or DBs in a connected DDB can
operate independently

• Design autonomy refers to independence of data model usage and transaction management
techniques among nodes.

• Communication autonomy determines the extent to which each node can decide on sharing of
information with other nodes.

• Execution autonomy refers to independence of users to act as they please.

Reliability and Availability

• Reliability is broadly defined as the probability that a system is running (not down) at a certain
time point,

• Availability is the probability that the system is continuously available during a time interval

Advantages of Distributed Databases

• Improved ease and flexibility of application development.

• Developing and maintaining applications at geographically distributed sites of an organization is
facilitated due to transparency of data distribution and control.

• Improved performance- Data localization reduces the contention for CPU and I/O services and
simultaneously reduces access delays involved in wide area networks.
Distributed Data Storage
Consider a relation r that is to be stored in the database.

There are two approaches to storing this relation in the distributed database:

Replication.

The system maintains several identical replicas (copies) of the relation, and stores each replica at a
different site.

Fragmentation.

The system partitions the relation into several fragments, and stores each fragment at a different site.

Data Replication

• If relation r is replicated, a copy of relation r is stored in two or more sites.

Advantages and disadvantages of replication

1) Availability:

• If a site holding a replica of relation r fails, r can still be read from another site.

2) Increased parallelism:

• Several transactions can read relation r in parallel, each at a different site.

• The more replicas of r there are, the greater the chance that the needed data will be found in
the site where the transaction is executing.

3) Increased overhead on update.

• The system must ensure that all replicas of a relation r are consistent; otherwise, erroneous
computations may result.

• Thus, whenever r is updated, the update must be propagated to all sites containing replicas.
The result is increased overhead.
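The trade-off between cheap reads and costly updates can be illustrated with a toy model. The class below is a hypothetical sketch, not a real replication protocol: a read touches one replica, while a write must reach every replica.

```python
class ReplicatedRelation:
    """Toy model of a relation r replicated at several sites (names are hypothetical)."""

    def __init__(self, sites):
        self.replicas = {site: {} for site in sites}
        self.update_messages = 0

    def read(self, site, key):
        # A read is served by the local replica only.
        return self.replicas[site].get(key)

    def write(self, key, value):
        # An update must be propagated to every site that holds a replica.
        for copy in self.replicas.values():
            copy[key] = value
            self.update_messages += 1


r = ReplicatedRelation(["site1", "site2", "site3"])
r.write("A-101", 500)
local_value = r.read("site2", "A-101")
```

With three replicas, the single update costs three propagation steps, while any of the three sites can answer the read on its own.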

Data Fragmentation

• If relation r is fragmented, r is divided into a number of fragments

r1, r2, ..., rn.

• These fragments contain sufficient information to allow reconstruction of the original relation r.

Horizontal fragmentation

• In horizontal fragmentation, a relation r is partitioned into a number of subsets,

r1, r2, ..., rn.

• Each tuple of relation r must belong to at least one of the fragments, so that the original relation can be
reconstructed, if needed.

• It is usually used to keep tuples at the sites where they are used the most, to minimize data transfer.

• A horizontal fragment is defined as a selection on the global relation r.

That is, we use a predicate Pi to construct fragment ri:

ri = σ_Pi (r)

Example : Horizontal fragmentation

• Consider the account relation

• Account = (acc_no, branch_name, balance)

• If the banking system has only two branches, Hillside and Valley view, then there are two
different fragments:

account1 = σ_branch_name = "Hillside" (account)
account2 = σ_branch_name = "Valley view" (account)
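A minimal sketch of this example, assuming one fragment per branch defined by a selection on branch_name. The sample tuples and the `select` helper are hypothetical; only the schema and the branch names come from the notes.

```python
# Hypothetical sample tuples for Account = (acc_no, branch_name, balance).
account = [
    {"acc_no": "A-1", "branch_name": "Hillside", "balance": 500},
    {"acc_no": "A-2", "branch_name": "Valley view", "balance": 300},
    {"acc_no": "A-3", "branch_name": "Hillside", "balance": 900},
]


def select(relation, predicate):
    """Selection: keep the tuples of the relation that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


# One fragment per branch, each defined by a selection on the global relation.
account1 = select(account, lambda t: t["branch_name"] == "Hillside")
account2 = select(account, lambda t: t["branch_name"] == "Valley view")

# The original relation is reconstructed as the union of the fragments.
reconstructed = account1 + account2
```

Because every tuple satisfies exactly one of the two predicates, the union of the fragments contains the same tuples as the original relation.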

Vertical fragmentation

• Vertical fragmentation splits the relation by decomposing the scheme R of relation r.

• Vertical fragmentation of r(R) involves the definition of several subsets of attributes R1, R2, ...,
Rn of the scheme R so that
R = R1 ∪ R2 ∪ ... ∪ Rn.
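Vertical fragmentation can be sketched as a projection onto each attribute subset. The sketch assumes that every fragment keeps the key acc_no so that the original relation can be rebuilt by a join; that choice, the sample tuples, and the helper names are assumptions made for illustration.

```python
# Hypothetical sample tuples for Account = (acc_no, branch_name, balance).
account = [
    {"acc_no": "A-1", "branch_name": "Hillside", "balance": 500},
    {"acc_no": "A-2", "branch_name": "Valley view", "balance": 300},
]


def project(relation, attributes):
    """Projection: keep only the listed attributes of every tuple."""
    return [{a: t[a] for a in attributes} for t in relation]


def natural_join(left, right, key):
    """Join two fragments on a shared key attribute (assumed unique per tuple)."""
    index = {t[key]: t for t in right}
    return [{**t, **index[t[key]]} for t in left if t[key] in index]


R1 = ["acc_no", "branch_name"]  # attribute subset for fragment r1
R2 = ["acc_no", "balance"]      # attribute subset for fragment r2

r1 = project(account, R1)
r2 = project(account, R2)
rebuilt = natural_join(r1, r2, "acc_no")
```

Together R1 and R2 cover every attribute of the scheme, and the shared key lets the join put each tuple back together.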
UNIT - II

Parallel Databases

A parallel database system seeks to improve performance through parallelization of various operations,
such as loading data, building indexes, and evaluating queries. Although data may be stored in a
distributed fashion in such a system, the distribution is governed solely by performance considerations.

1.2 Parallel Systems :

• Parallel systems improve processing and I/O speeds by using multiple CPUs and disks in parallel.
Parallel machines are becoming increasingly common, making the study of parallel database
systems correspondingly more important.
• The driving force behind parallel database systems is the demands of applications that have to
query extremely large databases or that have to process a large number of transactions per second
(of the order of thousands of transactions per second).
• Centralized and client-server database systems are not powerful enough to handle such
applications.
• In parallel processing, many operations are performed simultaneously, as opposed to serial
processing, in which the computational steps are performed sequentially.
• A coarse-grain parallel machine consists of a small number of powerful processors; a massively
parallel or fine-grain parallel machine uses thousands of smaller processors.
• Most high-end machines today offer some degree of coarse-grain parallelism: machines with two
or more processors are common.
1.2.1 Measures of Performance of Database Systems :

There are two main measures of performance of a database system :

1. Throughput, the number of tasks that can be completed in a given time interval.
2. Response time, the amount of time it takes to complete a single task from the time it is submitted.
A system that processes a large number of small transactions can improve throughput by
processing many transactions in parallel. A system that processes large transactions can improve
response time as well as throughput by performing subtasks of each transaction in parallel.
1.2.2 Speedup and Scaleup :
Two important issues in studying parallelism are :
(1) Speedup :
Running a given task in less time by increasing the degree of parallelism is called speedup.
(2) Scaleup :
Handling larger tasks by increasing the degree of parallelism is scaleup.
Speedup :
• Consider a database application running on a parallel system with a certain number of processors
and disks. Now suppose that we increase the size of the system by increasing the number of
processors, disks, and other components of the system.
• The goal is to process the task in time inversely proportional to the number of processors and disks
allocated.
• Speedup is measured as T_S / T_L, where T_S is the execution time of the task on the smaller
system and T_L is the execution time of the same task on the larger system.
• The parallel system is said to demonstrate linear speedup if the speedup is N when the larger
system has N times the resources (CPU, disk, and so on) of the smaller system.
• If the speedup is less than N, the system is said to demonstrate sublinear speedup.
Fig. 1.1 illustrates linear and sublinear speedup.

Fig. 1.1 : Speedup

Scaleup :
• Scaleup relates to the ability to process larger tasks in the same amount of time by providing more
resources.
• Let Q be a task and Q_N be a task that is N times bigger than Q. Suppose the execution time of task Q on
machine M_S is T_S, and the execution time of task Q_N on a parallel machine M_L, which is N times larger
than M_S, is T_L.
Scaleup is defined as T_S / T_L.
Where,
T_L : Execution time of task Q_N on the larger machine M_L
T_S : Execution time of task Q on the smaller machine M_S
The parallel system M_L is said to demonstrate linear scaleup on task Q if
T_L = T_S.
If T_L > T_S, the system is said to demonstrate sublinear scaleup.

Fig. 1.2 : Scaleup
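A short worked example may help separate the two measures. Both are the ratio T_S / T_L. For speedup the same task runs on both machines, while for scaleup the larger machine runs a task N times bigger. The timings below are made-up numbers chosen only to illustrate the definitions.

```python
def speedup(t_small, t_large):
    """Same task on both machines: T_S / T_L."""
    return t_small / t_large


def scaleup(t_small, t_large):
    """Task Q on the small machine vs. the N-times-bigger task on the large one: T_S / T_L."""
    return t_small / t_large


N = 4  # the larger machine has 4 times the resources (hypothetical)

# Speedup: a task takes 100 s on the small machine.
linear_speedup = speedup(100, 25)     # larger machine needs 25 s
sublinear_speedup = speedup(100, 40)  # larger machine needs 40 s

# Scaleup: Q takes 100 s on the small machine; the larger machine runs a 4x bigger task.
linear_scaleup = scaleup(100, 100)     # bigger task also finishes in 100 s
sublinear_scaleup = scaleup(100, 125)  # bigger task needs 125 s
```

Here a speedup of 4.0 equals N and is linear, while 2.5 is sublinear. A scaleup of 1.0 means T_L = T_S and is linear, while 0.8 corresponds to T_L > T_S and is sublinear.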

1.3 Architectures for Parallel Databases :

There are several architectural models for parallel machines. Among the most prominent ones are
those in Fig. 1.3 (In the Fig. 1.3, M denotes memory, P denotes a processor, and disks are shown as
cylinders)
• Shared memory : All the processors share a common memory (Fig. 1.3(a)).

Fig. 1.3(a) : Shared memory

• Shared disk : All the processors share a common set of disks (Fig. 1.3(b)). Shared-disk systems are
sometimes called clusters.

Fig. 1.3(b) : Shared disk

• Shared nothing : The processors share neither a common memory nor a common disk (Fig.
1.3(c)).

Fig. 1.3(c) : Shared nothing

• Hierarchical : This model is a hybrid of the preceding three architectures (Fig. 1.3(d)).

Techniques used to speed up transaction processing on data-server systems, such as lock caching
and lock de-escalation, can also be used in shared-disk parallel databases as well as in shared-nothing parallel
databases. In fact, they are very important for efficient transaction processing in such systems.

1.3.1 Shared Memory :

In shared memory architecture, the processors and disks have access to a common memory,
typically via a bus or through an interconnection network.

Advantages :

• The benefit of shared memory is extremely efficient communication between processors. Data in
shared memory can be accessed by any processor without being moved with software.
• A processor can send messages to other processors much faster by using memory writes (which
usually take less than a microsecond) than by sending a message through a communication
mechanism.
Disadvantages :

• The downside of shared-memory architecture is that it is not scalable beyond 32 or 64 processors
because the bus or interconnection network becomes a bottleneck (since it is shared by all
processors).
• Adding more processors does not help beyond that point, since they spend most of their time
waiting for their turn on the bus to access memory.
• Shared-memory architectures usually have large memory caches at each processor so that
referencing of the shared memory is avoided whenever possible.
• However, at least some of the data will not be in the cache, and accesses will have to go to the
shared memory. Moreover, the caches need to be kept coherent.
• Maintaining cache coherency becomes an increasing overhead with an increasing number of
processors.
• Consequently, shared-memory machines are not capable of scaling up beyond a point; current
shared-memory machines cannot support more than 64 processors.

1.3.2 Shared Disk :

In the shared-disk model, all processors can access all disks directly via an interconnection
network, but the processors have private memories.

Advantages :

• Since each processor has its own memory, the memory bus is not a bottleneck.
• It offers a cheap way to provide a degree of fault tolerance.
• If a processor (or its memory) fails, the other processors can take over its tasks, since the database is
resident on disks that are accessible from all processors.
• We can make the disk subsystem itself fault tolerant by using a RAID architecture.
• The shared-disk architecture has found acceptance in many applications.
Disadvantages :

• The main problem with a shared-disk system is again scalability.
• Although the memory bus is no longer a bottleneck, the interconnection to the disk subsystem is
now a bottleneck; it is particularly so in a situation where the database makes a large number of
accesses to disks.
• Compared to shared-memory systems, shared-disk systems can scale to a somewhat larger number
of processors, but communication across processors is slower, since it has to go through a
communication network.

Example :

DEC clusters running Rdb were one of the early commercial users of the shared-disk database
architecture. (Rdb is now owned by Oracle, and is called Oracle Rdb. Digital Equipment Corporation
(DEC) is now owned by Compaq.)

1.3.3 Shared Nothing :

• In a shared-nothing system, each node of the machine consists of a processor, memory, and one or
more disks.
• The processors at one node communicate with processors at other nodes through a
high-speed interconnection network.
• A node functions as the server for the data on the disk or disks that the node owns. Local disk
references are serviced by the local disks at each processor.

Advantages :

• The shared-nothing model overcomes the disadvantage of requiring all I/O to go through a single
interconnection network; only queries, accesses to nonlocal disks, and result relations pass through
the network.
• Moreover, the interconnection networks for shared-nothing systems are usually designed to be
scalable, so that their transmission capacity increases as more nodes are added.
• Consequently, shared-nothing architectures are more scalable, and can easily support a large
number of processors.

Disadvantage :

The main drawback of shared-nothing systems is the cost of communication and of nonlocal disk
access, which is higher than in a shared-memory or shared-disk architecture, since sending data involves
software interaction at both ends.
Applications :

• The Teradata database machine was among the earliest commercial systems to use the
shared-nothing database architecture.
• The Grace and the Gamma research prototypes also used shared-nothing architectures.
1.3.4 Hierarchical :

• The hierarchical architecture combines the characteristics of shared-memory, shared-disk, and
shared-nothing architectures.
• At the top level, the system consists of nodes that are connected by an interconnection network and do not
share disks or memory with one another. Thus, the top level is a shared-nothing architecture.
• Each node of the system could actually be a shared-memory system with a few processors.
Alternatively, each node could be a shared-disk system, and each of the systems sharing a set of
disks could be a shared-memory system.
• Thus, a system could be built as a hierarchy, with shared-memory architecture with a few
processors at the base, and a shared-nothing architecture at the top, with possibly shared-disk
architecture in the middle.
• Fig. 1.3(d) illustrates a hierarchical architecture with shared-memory nodes connected together in a
shared-nothing architecture.
• Commercial parallel database systems today run on several of these architectures.
• Attempts to reduce the complexity of programming such systems have yielded distributed virtual
memory architectures, where logically there is a single shared memory, but physically there are
multiple disjoint memory systems; the virtual-memory-mapping hardware, coupled with system
software, allows each processor to view the disjoint memories as a single virtual memory.
• Since access speeds differ, depending on whether the page is available locally or not, such an architecture
is also referred to as a nonuniform memory architecture (NUMA).

Fig. 1.3(d)

1.3.5 Parallel Query Evaluation :

Now we try to understand parallel evaluation of a relational query in a DBMS with a shared-nothing
architecture. While it is possible to consider parallel execution of multiple queries, it is hard to
identify in advance which queries will run concurrently. So the emphasis has been on parallel execution
of a single query.
• A relational query execution plan is a graph of relational algebra operators, and the operators in a
graph can be executed in parallel. If one operator consumes the output of a second operator, we
have pipelined parallelism (the output of the second operator is worked on by the first operator as
soon as it is generated).
• If not, the two operators can proceed essentially independently. An operator is said to block if it
produces no output until it has consumed all its inputs. Pipelined parallelism is limited by the presence
of operators that block.
• In addition to evaluating different operators in parallel, we can evaluate each individual operator in a
query plan in a parallel fashion. The key to evaluating an operator in parallel is to partition the input data; we can
then work on each partition in parallel and combine the results. This approach is called data-partitioned
parallel evaluation.
• An important observation, which explains why shared-nothing parallel database systems have been
very successful, is that database query evaluation is very amenable to data-partitioned parallel
evaluation.
• The goal is to minimize data shipping by partitioning the data and structuring the algorithms to do
most of the processing at individual processors.
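The partition, process, and combine pattern described above can be sketched in a few lines. This is an illustration only: worker threads stand in for the processors of a shared-nothing system, and the sample relation, the predicate, and the function names are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n):
    """Split the input into n partitions (round-robin here, for simplicity)."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def local_sum(rows):
    """Work done at one processor: select balance >= 100, then sum locally."""
    return sum(r["balance"] for r in rows if r["balance"] >= 100)


def parallel_sum(rows, n=4):
    """Evaluate the operator on every partition in parallel, then combine."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        partials = list(pool.map(local_sum, partition(rows, n)))
    return sum(partials)  # only the small partial results are shipped and combined


rows = [{"balance": b} for b in range(0, 300, 50)]  # balances 0, 50, ..., 250
total = parallel_sum(rows)
```

Each worker returns a single number, so very little data has to be shipped to the point where the results are combined, which is the goal stated in the last bullet.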

1.4 I/O Parallelism :

Definition : I/O parallelism refers to reducing the time required to retrieve relations from disk by
partitioning the relations on multiple disks. The most common form of data partitioning in a parallel
database environment is horizontal partitioning.
In horizontal partitioning, the tuples of a relation are divided (or declustered) among many disks,
so that each tuple resides on one disk. Several partitioning strategies have been proposed.
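The notes do not list the strategies. As an assumption, the sketch below shows two that are commonly described in textbooks, hash partitioning and range partitioning, using hypothetical helper names and integer keys.

```python
from bisect import bisect_right


def hash_partition(rows, key, n_disks):
    """Send each tuple to the disk chosen by hashing its partitioning attribute."""
    disks = [[] for _ in range(n_disks)]
    for row in rows:
        disks[row[key] % n_disks].append(row)  # simple hash for integer keys
    return disks


def range_partition(rows, key, boundaries):
    """Send each tuple to the disk whose value range contains its attribute."""
    disks = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        disks[bisect_right(boundaries, row[key])].append(row)
    return disks


rows = [{"acc_no": n} for n in range(1, 7)]  # hypothetical tuples with keys 1..6

by_hash = hash_partition(rows, "acc_no", 3)
by_range = range_partition(rows, "acc_no", [2, 4])  # ranges: <2, 2..3, >=4
```

In both cases every tuple lands on exactly one disk, so a scan of the whole relation can read all the disks at the same time.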
UNIT-IV

XML: Extensible Markup Language

• XML is neither a programming language nor a presentation language.
• It is used to transfer data between applications and databases.
• It describes the data and focuses on what the data is.
• XML tags are not predefined in XML. You must define your own tags.
• XML can use a DTD (Document Type Definition) to formally describe the data.

Features of XML

• XML is heavily used as a format for document storage and processing, both online and offline.

• It enhances searchability, making it possible for search engines to categorize data instead of
wasting processing power on context-based full-text searches.

• XML does not allow references to external data entities. Named character references are not
allowed in XML.

• XML does not allow empty comment declarations.

• XML is extensible, because it only specifies the structural rules of tags. It does not specify the
tags themselves.

• It is excellent for handling data with a complex structure.

• It handles data in a tree structure that has one, and only one, root element.

• It is excellent for long-term data storage and data reusability.

• XML separates data from HTML.

• XML data is stored in plain text format. This provides a software- and hardware-independent
way of storing data.

• This makes it much easier to create data that can be shared by different applications.

• XML simplifies data transport.

• Because XML data is stored in text format, it is easier to expand or upgrade to new operating
systems, new applications, or new browsers, without losing data.

• XML increases data availability.

DTD XML

• A Document Type Definition (DTD) defines the legal building blocks of an XML document.

• It defines the document structure with a list of legal elements and attributes.

• A DTD can be declared inline inside an XML document, or as an external reference.

• Internal DTD Declaration

• If the DTD is declared inside the XML file, it should be wrapped in a DOCTYPE definition with the
following syntax:

• !DOCTYPE note defines that the root element of this document is note

• !ELEMENT note defines that the note element contains four elements: "to, from, heading,
body"

• !ELEMENT to defines the to element to be of type "#PCDATA"

• !ELEMENT from defines the from element to be of type "#PCDATA"

• !ELEMENT heading defines the heading element to be of type "#PCDATA"

• !ELEMENT body defines the body element to be of type "#PCDATA"
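A document matching the declarations described above might look like the string in the sketch below. The element values are hypothetical, and the DTD text is reconstructed from the bullet descriptions rather than copied from the original example. Python's `xml.dom.minidom` is used only to read the document; it records the DOCTYPE but does not validate the content against the DTD.

```python
from xml.dom import minidom

# An XML document with an internal DTD wrapped in a DOCTYPE definition.
NOTE_XML = """<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Alice</to>
<from>Bob</from>
<heading>Reminder</heading>
<body>Meeting at noon</body>
</note>"""

doc = minidom.parseString(NOTE_XML)

# The DOCTYPE names the root element of the document.
doctype_name = doc.doctype.name
root_name = doc.documentElement.tagName

# The root element contains the four child elements listed in the DTD.
children = [node.tagName for node in doc.documentElement.childNodes
            if node.nodeType == node.ELEMENT_NODE]
```

The name in the DOCTYPE and the root element agree, and the children appear in the order given by the note declaration. Checking that agreement automatically would need a validating parser, which is outside this sketch.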

External DTD Declaration

• If the DTD is declared in an external file, it should be wrapped in a DOCTYPE definition with the
following syntax:

• <!DOCTYPE root-element SYSTEM "filename">

• This is the same XML document as above, but with an external DTD
Why Use a DTD?

• With a DTD, each of your XML files can carry a description of its own format.

• With a DTD, independent groups of people can agree to use a standard DTD for interchanging
data.

• Your application can use a standard DTD to verify that the data you receive from the outside
world is valid.

• You can also use a DTD to verify your own data

XML Schema:
• XML Schema is an XML-based alternative to DTD.

• An XML schema describes the structure of an XML document.

• The XML Schema language is also referred to as XML Schema Definition (XSD).

What is an XML Schema?

• The purpose of an XML Schema is to define the legal building blocks of an XML document, just
like a DTD.

An XML Schema:

• defines elements that can appear in a document

• defines attributes that can appear in a document

• defines which elements are child elements

• defines the order of child elements

• defines the number of child elements

• defines whether an element is empty or can include text

• defines data types for elements and attributes

• defines default and fixed values for elements and attributes

XML Schemas are the Successors of DTDs

• We think that very soon XML Schemas will be used in most Web applications as a replacement
for DTDs.

Here are some reasons:

• XML Schemas are extensible to future additions

• XML Schemas are richer and more powerful than DTDs

• XML Schemas are written in XML

• XML Schemas support data types

• XML Schemas support namespaces

• XML Schemas are much more powerful than DTDs

XML Schemas Support Data Types

One of the greatest strengths of XML Schemas is the support for data types.

With support for data types:

• It is easier to describe allowable document content

• It is easier to validate the correctness of data

• It is easier to work with data from a database

• It is easier to define data facets (restrictions on data)

• It is easier to define data patterns (data formats)

• It is easier to convert data between different data types

XML Schemas use XML Syntax

• Another great strength of XML Schemas is that they are written in XML.

Some benefits of XML Schemas being written in XML:

• You don't have to learn a new language

• You can use your XML editor to edit your Schema files

• You can use your XML parser to parse your Schema files

• You can manipulate your Schema with the XML DOM

• You can transform your Schema with XSLT

XML Schemas Secure Data Communication

• When sending data from a sender to a receiver, it is essential that both parties have the same
"expectations" about the content.

• With XML Schemas, the sender can describe the data in a way that the receiver will understand.

• A date like "03-11-2004" will, in some countries, be interpreted as 3 November and in other
countries as 11 March.

• However, an XML element with a data type like this:

<date type="date">2004-03-11</date>

ensures a mutual understanding of the content, because the XML data type "date" requires the
format "YYYY-MM-DD".

XML Schemas are Extensible

• XML Schemas are extensible, because they are written in XML.

With an extensible Schema definition you can:

• Reuse your Schema in other Schemas

• Create your own data types derived from the standard types

• Reference multiple schemas in the same document

What is a Simple Element?

• A simple element is an XML element that can contain only text.

• It cannot contain any other elements or attributes.

• However, the "only text" restriction is quite misleading. The text can be of many different types.
• It can be one of the types included in the XML Schema definition (boolean, string, date, etc.), or
it can be a custom type that you can define yourself.

• You can also add restrictions (facets) to a data type in order to limit its content, or you can
require the data to match a specific pattern.

The syntax for defining a simple element is:

• <xs:element name="xxx" type="yyy"/>

where xxx is the name of the element and yyy is the data type of the element.

• XML Schema has a lot of built-in data types. The most common types are:

• xs:string
• xs:decimal
• xs:integer
• xs:boolean
• xs:date
• xs:time

API for XML

DOM
• The Document Object Model (DOM) treats XML content as a tree, with each element
represented by a node, called a DOMNode.
• Programs may access parts of the document in a navigational fashion, beginning with the
root.

• DOM libraries are available for most common programming languages and are even
present in Web browsers, where they may be used to manipulate the document displayed to
the user.
• The Java DOM API provides an interface called Node, and interfaces Element and
Attribute, which inherit from the Node interface.

• The Node interface provides methods such as getParentNode(), getFirstChild(), and
getNextSibling() to navigate the DOM tree, starting with the root node.
• Subelements of an element can be accessed by name using getElementsByTagName(name),
which returns a list of all descendant elements with the specified tag name.

• Individual members of the list can be accessed by the method item(i), which returns the
ith element in the list.

• Attribute values of an element can be accessed by name, using the method
getAttribute(name). The text value of an element is modeled as a Text node.

• The method getData() on the Text node returns the text contents.

• DOM also provides a variety of functions for updating the document by adding and
deleting attribute and element children of a node, setting node values, and so on.

• DOM can be used to access XML data stored in databases, and an XML database can be
built using DOM as its primary interface for accessing and modifying data.
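
The navigation calls described above can be tried out with Python's standard xml.dom.minidom module, used here only as a convenient stand-in for the Java DOM API. This is a minimal sketch: the bank document and its element names are invented for illustration, and minidom exposes parentNode, firstChild and nextSibling as properties rather than as getter methods.

```python
from xml.dom.minidom import parseString

# Hypothetical document, invented for this illustration.
XML = (
    '<bank>'
    '<customer id="C1"><name>Joe</name></customer>'
    '<customer id="C2"><name>Lisa</name></customer>'
    '</bank>'
)

doc = parseString(XML)
root = doc.documentElement                  # the root element, <bank>

# getElementsByTagName returns a node list; item(i) picks the i-th entry.
customers = root.getElementsByTagName("customer")
first = customers.item(0)

first_id = first.getAttribute("id")         # attribute value by name
name_elem = first.firstChild                # <name>, the first child of <customer>
name_text = name_elem.firstChild.data       # the text is modeled as a Text node
second = first.nextSibling                  # the next <customer> in the tree
parent_tag = first.parentNode.tagName       # back up to <bank>

print(first_id, name_text, second.getAttribute("id"), parent_tag)
```

The whole document is held in memory as a tree, which is what makes this kind of back-and-forth navigation possible.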

SAX
• SAX (Simple API for XML) is an event-driven, sequential-access parser API for XML documents,
developed by the XML-DEV mailing list.

• SAX provides a mechanism for reading data from an XML document that is an alternative to that
provided by the Document Object Model (DOM).

• Where the DOM operates on the document as a whole, SAX parsers operate on each piece of
the XML document sequentially.

• It does not first build any internal structure of the document

• The client does not decide which methods are called, or when

• The client just overrides the handler methods of the API and places its own code inside them
• When the parser encounters a start-tag, end-tag, etc., it treats each of them as an event

• When such an event occurs, the parser automatically calls back the corresponding method
overridden by the client, passing what it has just read as arguments

• A SAX parser is therefore event-based: it works like an event handler in Java (e.g. MouseAdapter)

Advantages:

(1) It is simple

(2) It is memory efficient

(3) It works well in streaming applications

Disadvantage:

• The data is broken into pieces and clients never have all the information as a whole unless they
create their own data structure
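
The callback style described above can be sketched with Python's standard xml.sax module. The handler class name and the sample document are assumptions made for this illustration. Note how the client has to keep its own state (a flag and a buffer) because the parser never hands over the document as a whole.

```python
import xml.sax


class NameCollector(xml.sax.ContentHandler):
    """Collects the text of every <name> element as parser events arrive."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_name = False
        self._buffer = []

    def startElement(self, tag, attrs):
        # Called by the parser for each start-tag.
        if tag == "name":
            self._in_name = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so it is buffered.
        if self._in_name:
            self._buffer.append(content)

    def endElement(self, tag):
        # Called by the parser for each end-tag.
        if tag == "name":
            self.names.append("".join(self._buffer))
            self._in_name = False


# Hypothetical document, invented for this illustration.
XML = (b"<bank><customer><name>Joe</name></customer>"
       b"<customer><name>Lisa</name></customer></bank>")

handler = NameCollector()
xml.sax.parseString(XML, handler)
print(handler.names)
```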

Querying and Transformation

Tools for querying and transformation of XML data are essential to extract
information from large bodies of XML data, and to convert data between different
representations (schemas) in XML.

Just as the output of a relational query is a relation, the output of an XML query can be an XML
document. As a result, querying and transformation can be combined into a single tool.

Several languages provide increasing degrees of querying and transformation capabilities:

• XPath is a language for path expressions, and is actually a building block for the remaining two
query languages.

• XSLT was designed to be a transformation language, as part of the XSL style sheet system,
which is used to control the formatting of XML data into HTML or other print or display
languages. Although designed for formatting, XSLT can generate XML as output, and can
express many interesting queries.

• XQuery has been proposed as a standard for querying of XML data

XPath
XPath addresses parts of an XML document by means of path expressions.

The language can be viewed as an extension of the simple path expressions in object-oriented
and object-relational databases.
A path expression in XPath is a sequence of location steps separated by "/".
The result of a path expression is a set of values.

For instance, on the document in the figure above, the XPath expression

/bank-2/customer/name

would return the name elements of all customers, including their enclosing tags.

The expression
/bank-2/customer/name/text()
would return the same names, but without the enclosing tags.
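
The two path expressions can be approximated with Python's standard xml.etree.ElementTree, which supports only a limited subset of XPath. The sketch below rests on several assumptions: the bank-2 document is a made-up stand-in because the figure is not reproduced here, paths are written relative to the root element, and the text() step is replaced by reading each element's .text attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the bank-2 document.
XML = """
<bank-2>
  <customer><name>Joe</name><city>Madison</city></customer>
  <customer><name>Lisa</name><city>Newark</city></customer>
</bank-2>
"""
root = ET.fromstring(XML)                  # root is the <bank-2> element

# Counterpart of /bank-2/customer/name: the elements, tags included.
name_elems = root.findall("./customer/name")

# Counterpart of /bank-2/customer/name/text(): the contents only.
name_texts = [elem.text for elem in name_elems]

print([ET.tostring(elem, encoding="unicode").strip() for elem in name_elems])
print(name_texts)
```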
XQuery

• The best way to explain XQuery is to say that XQuery is to XML what SQL is to database
tables.

• It is the language for querying XML data.

• XQuery is a language for finding and extracting elements and attributes from XML documents.

• XQuery is designed to query XML data – not just XML files, but anything that can appear as
XML.

Uses of XQuery

• Extract information to use in a Web Service

• Query XML documents

• Read data from databases and generate reports

• Transform XML data

FLWOR
• For – binds a variable to each item returned by the in expression

• Let – allows variable assignments

• Where – used to specify criteria for the result

• Order by – defines the sort order

• Return – specifies what is to be returned

General form of a FLWOR expression:

FOR <for-variable> IN <in-expression>
LET <let-variable> := <let-expression>
[ WHERE <filter-expression> ]
[ ORDER BY <order-specification> ]
RETURN <expression>

Example: retrieve the names of instructors who have a salary higher than 30000

for $x in doc("[Link]")/university/instructor
where $x/salary > 30000
return <instr> {$x/name} </instr>
For/Let Clause:
• for <variable> in <expression>, ...

• Variables begin with $.

• The FOR and LET clauses bind values to one or more variables.

• FOR is used when iteration is required: the variable is bound to each item in turn.

• LET also binds values to one or more variables, but without iteration.

Where Clause:

The where clause is optional. It specifies one or more conditions and restricts the
nodes that are passed on to the return clause.
Return Clause:
The return clause is evaluated once for each binding that satisfies the where clause.
The results produced are concatenated and returned to the user.
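
To make the evaluation order of the clauses concrete, the instructor query can be imitated in Python with xml.etree.ElementTree. This is only an analogy, not an XQuery engine; the university document, the function name high_earners and the instructor names are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical university document, invented for this illustration.
XML = """
<university>
  <instructor><name>Ana</name><salary>45000</salary></instructor>
  <instructor><name>Ben</name><salary>28000</salary></instructor>
  <instructor><name>Cho</name><salary>62000</salary></instructor>
</university>
"""
root = ET.fromstring(XML)


def high_earners(root, threshold):
    results = []
    for x in root.findall("./instructor"):        # for $x in .../instructor
        salary = float(x.findtext("salary"))      # let $s := $x/salary
        if salary > threshold:                    # where $s > threshold
            instr = ET.Element("instr")           # return <instr>{$x/name}</instr>
            name = ET.SubElement(instr, "name")
            name.text = x.findtext("name")
            results.append(instr)
    return results                                # results concatenated in order


output = [ET.tostring(e, encoding="unicode") for e in high_earners(root, 30000)]
print(output)
```

The loop plays the role of FOR (one iteration per binding), the local assignment plays the role of LET (a binding without iteration), and the list of returned elements corresponds to the concatenated result.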

XSL Transformations:

XSLT stands for XSL Transformations.

XSLT is a language for transforming XML documents into XHTML documents or into other XML documents.

With XSLT you can add elements and attributes to, or remove them from, the output file.

You can also rearrange and sort elements, perform tests and make decisions about which elements to
hide and display.

XSLT uses XPath, a language for navigating through elements and attributes in XML documents, to
find information in an XML document.

Figure: XSLT Model
XML applications

• B2B Exchange: XML provides the standards required to exchange B2B data among
organizations. It is inexpensive and flexible.

• Legacy System Integration: XML makes it possible to integrate legacy system data with modern
e-commerce systems. It can also be used to transfer data from multiple heterogeneous databases
to a data warehouse.

• Web Page Development: XML provides many features for the creation of web pages.

• Database Support: A DBMS that supports XML exchange can be integrated with external
systems. Some databases also store XML data in its native format.

• Database Meta-dictionaries: XML can also facilitate the creation of meta-dictionaries and
vocabularies for databases. Such meta-dictionaries do not depend on the DBMS type, because they
use a common language for data description.

• XML Databases: Many XML software products are available to handle large volumes of XML data
exchange and to manage and use the data efficiently. These products range from OODBMSs with
XML interfaces to full XML database engines and servers. XML can represent complex, nested
relationships that are awkward in a traditional DBMS; for example, it can store the contents of
a book, including chapters, paragraphs, headers, etc.

UNIT-V
Why Have a Separate Data Warehouse?
Because operational databases store huge amounts of data, you may wonder, "why not perform
on-line analytical processing directly on such databases instead of spending additional time and
resources to construct a separate data warehouse?"
1. What is a Parallel Database?

A parallel database uses multiple processors (or computers) to work together and
process data faster.
It splits the workload among processors so tasks like query execution, data retrieval, and
storage happen simultaneously.

2. What is Shared Nothing Architecture?

Shared Nothing Architecture (SN) means each processor (also called a node) has:
Its own CPU
Its own memory
Its own disk/storage
Nothing is shared between processors — hence, "shared nothing."
Each node is completely independent and communicates with other nodes only through
a network when necessary.

3. Visual Representation:

Imagine a system like this:

| Node 1  | Node 2  | Node 3  |
| CPU1    | CPU2    | CPU3    |
| Memory1 | Memory2 | Memory3 |
| Disk1   | Disk2   | Disk3   |

No memory or disk is shared between nodes; only the network connects them when needed.

4. How It Works (Simple Example):

Suppose you have a large table of customers.

In Shared Nothing Parallel DBMS:

The table is divided into fragments.

Each node stores only its fragment.
When you run a query (like "find all customers from New York"):
Only relevant nodes process their part.
The results are combined and sent back to the user.
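
The steps above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the Node class, the toy hash function, the choice of city as the partitioning key and the sample customers. A query on the partitioning key is routed to the single node that can hold matching rows, while any other query is sent to all nodes and the partial results are merged.

```python
class Node:
    """One shared-nothing node: it can see only its own fragment."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.rows = []                      # stands in for the node's private disk

    def scan(self, predicate):
        return [row for row in self.rows if predicate(row)]


def stable_hash(value):
    # Toy hash that gives the same result on every run.
    return sum(str(value).encode("utf-8"))


def partition(rows, nodes, key):
    """Hash-partition the table across the nodes on the given key."""
    for row in rows:
        nodes[stable_hash(row[key]) % len(nodes)].rows.append(row)


def find_by_key(nodes, key, value):
    """Lookup on the partitioning key: only the owning node does any work."""
    owner = nodes[stable_hash(value) % len(nodes)]
    return owner.scan(lambda row: row[key] == value)


def scatter_gather(nodes, predicate):
    """Any other query: every node scans its fragment, results are merged."""
    results = []
    for node in nodes:
        results.extend(node.scan(predicate))
    return results


nodes = [Node(i) for i in range(3)]
customers = [
    {"id": 1, "city": "New York"},
    {"id": 2, "city": "Boston"},
    {"id": 3, "city": "New York"},
    {"id": 4, "city": "Chicago"},
]
partition(customers, nodes, "city")

new_yorkers = find_by_key(nodes, "city", "New York")
recent = scatter_gather(nodes, lambda row: row["id"] >= 3)
print(new_yorkers, recent)
```

In a real system each scan would run in parallel on a separate machine; the sequential loop here only shows how the work is divided.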

5. Advantages of Shared Nothing Architecture:

| Advantage       | Explanation |
| Scalability     | You can easily add more nodes to handle more data or users. |
| Fault Isolation | If one node fails, the others continue to work; the failure is localized. |
| Performance     | Because memory and disk are not shared, nodes do not contend for them. Each node works independently and efficiently. |
| Cost Efficiency | Nodes can be cheaper, commodity hardware; there is no need for expensive centralized storage. |

6. Disadvantages:

| Disadvantage               | Explanation |
| Data Skew                  | If data isn't distributed evenly, some nodes get overloaded (hotspots). |
| Complex Query Coordination | Merging results from multiple nodes can be complicated and slow if data is scattered. |
| Network Overhead           | Nodes need to communicate over the network when queries involve data from multiple nodes, which can be slower than local access. |

7. Real-world Examples:

Google BigQuery
Amazon Redshift
Apache Cassandra (for some workloads)
Teradata
All of these rely on shared nothing design principles.

8. Quick Comparison with Other Architectures:

| Architecture   | Sharing Style | Example |
| Shared Memory  | All processors share memory | SMP systems (a single multiprocessor server) |
| Shared Disk    | Processors have separate memory but share a disk | Oracle RAC, IBM GPFS |
| Shared Nothing | Processors have separate memory and separate disks | Google BigQuery, Redshift |

9. Summary Sentence:

In Shared Nothing Architecture, every processor-node is fully independent with its own CPU,
memory, and disk, enabling high scalability, fault tolerance, and parallel performance in a
distributed database system.
Active Databases
What is an Active Database?
An active database enhances the traditional passive database by allowing automatic reactions
to events. Instead of waiting for external programs or users to initiate actions, it responds based
on predefined rules.

It introduces event-driven behavior, commonly through ECA rules (Event-Condition-Action).

Components of Active Databases (ECA Model)

1. Event:
• Represents a specific database operation or a combination of operations.
• Types of Events:
o Primitive Events: INSERT, DELETE, UPDATE.
o Composite Events: Created using operators like AND, OR, SEQUENCE.
Example:
UPDATE of salary — a primitive event.
INSERT into Employees AND UPDATE of salary — a composite event.
2. Condition:
• A Boolean expression evaluated after the event.
• If the condition is TRUE, the action is executed.
• Uses current or old values (e.g., [Link], [Link]).
3. Action:
• The operation that is automatically performed.
• Can be:
o SQL statements (e.g., INSERT, DELETE)
o Procedures or functions
o Triggering other ECA rules

Rule Execution Modes:

1. Immediate Mode – The rule is executed immediately after the event and condition
evaluation.
2. Deferred Mode – Execution is postponed until a certain stage (e.g., end of transaction).
3. Cascading Mode – Rules can trigger other rules, forming rule chains (requires careful
control).
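
The ECA model and the immediate and deferred execution modes can be sketched as a tiny in-memory rule engine. This is an illustration under stated assumptions, not how any particular DBMS implements rules: the Rule and ActiveTable classes, the employee row and both rules are invented here.

```python
class Rule:
    """One ECA rule: on `event`, if `condition` holds, run `action`."""

    def __init__(self, event, condition, action, deferred=False):
        self.event = event              # e.g. "UPDATE"
        self.condition = condition      # (old_row, new_row) -> bool
        self.action = action            # (old_row, new_row) -> None
        self.deferred = deferred        # True: run at commit time


class ActiveTable:
    def __init__(self):
        self.rows = {}
        self.rules = []
        self._pending = []

    def update(self, key, **changes):
        old = dict(self.rows[key])
        new = {**old, **changes}
        self.rows[key] = new
        self._fire("UPDATE", old, new)

    def _fire(self, event, old, new):
        for rule in self.rules:
            if rule.event == event and rule.condition(old, new):
                if rule.deferred:
                    self._pending.append((rule, old, new))   # deferred mode
                else:
                    rule.action(old, new)                    # immediate mode

    def commit(self):
        pending, self._pending = self._pending, []
        for rule, old, new in pending:
            rule.action(old, new)


log = []
table = ActiveTable()
table.rows[1] = {"name": "Ana", "salary": 1000}
table.rules.append(Rule(
    "UPDATE",
    lambda old, new: new["salary"] > 1.5 * old["salary"],
    lambda old, new: log.append("immediate: large raise"),
))
table.rules.append(Rule(
    "UPDATE",
    lambda old, new: new["salary"] != old["salary"],
    lambda old, new: log.append("deferred: salary changed"),
    deferred=True,
))

table.update(1, salary=2000)
log_before_commit = list(log)
table.commit()
print(log_before_commit, log)
```

Cascading would arise if an action itself called update(), which is why real systems limit the depth of rule chains.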

Advanced Features in Active DBMS:

• Rule Prioritization: Define which rules execute first.
• Conflict Resolution: Handle cases where multiple rules trigger conflicting actions.
• Rule Groups: Group related rules to handle complex workflows.

Real-Life Use Cases of Active Databases:

| Use Case         | Description |
| Banking          | Automatically freeze an account if a fraudulent transaction is detected. |
| E-commerce       | Send discount emails when a cart is abandoned for more than 24 hours. |
| Healthcare       | Alert staff if patient vitals cross a threshold. |
| IoT (Smart Home) | Turn off the AC if no motion is detected for 1 hour. |
| Supply Chain     | Reorder stock automatically when inventory drops below a threshold. |

B. Triggers – The Workhorse of Active Databases

Triggers are the most commonly used feature to implement ECA rules in SQL-based
databases like MySQL, PostgreSQL, Oracle, SQL Server.

Types of Triggers (Based on Event Timing)

| Type               | Description |
| BEFORE Trigger     | Executes before the event occurs. Used for validation or modification. |
| AFTER Trigger      | Executes after the event. Used for logging, auditing, or enforcement. |
| INSTEAD OF Trigger | Used on views to override default behavior. Mostly in SQL Server and Oracle. |

Real SQL Trigger Example

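-- MySQL syntax: log every deleted order in Audit_Log
CREATE TRIGGER log_deletion
AFTER DELETE ON Orders
FOR EACH ROW
BEGIN
    -- OLD refers to the row that was just deleted
    INSERT INTO Audit_Log(order_id, action, deleted_at)
    VALUES (OLD.order_id, 'Deleted', NOW());
END;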

Explanation:
• Event: DELETE on Orders.
• Action: Log the deletion to Audit_Log.
• Condition: Not explicitly stated but implicitly always true.
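
To watch the trigger fire, the same logic can be run end to end with Python's standard sqlite3 module. SQLite is swapped in here only because it needs no server; it accepts almost the same CREATE TRIGGER statement but uses datetime('now') instead of NOW(). The table layouts and sample rows are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, item TEXT);
CREATE TABLE Audit_Log (order_id INTEGER, action TEXT, deleted_at TEXT);

-- Event: DELETE on Orders. Action: write one audit row per deleted order.
CREATE TRIGGER log_deletion
AFTER DELETE ON Orders
FOR EACH ROW
BEGIN
    INSERT INTO Audit_Log(order_id, action, deleted_at)
    VALUES (OLD.order_id, 'Deleted', datetime('now'));
END;

INSERT INTO Orders VALUES (1, 'book'), (2, 'lamp');
DELETE FROM Orders WHERE order_id = 2;
""")

audit_rows = conn.execute("SELECT order_id, action FROM Audit_Log").fetchall()
remaining = conn.execute("SELECT order_id FROM Orders").fetchall()
print(audit_rows, remaining)
```

The application code only issues the DELETE; the audit row appears because the database reacts to the event on its own.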

Advanced Use of Triggers

1. Nested Triggers:
o A trigger can cause another trigger to fire.
o Care needed to avoid infinite loops or recursion.
2. Conditional Triggers:
IF [Link] = 'Cancelled' THEN
-- Action
END IF;

3. Accessing OLD and NEW values:

o OLD.column_name – value before update/delete.
o NEW.column_name – value after insert/update.
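
A conditional trigger that reads both OLD and NEW values can be sketched with Python's sqlite3 module. The assumptions are stated plainly: SQLite expresses the condition as a WHEN clause rather than an IF block inside the body, and the status column, the Status_Log table and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE Status_Log (order_id INTEGER, old_status TEXT, new_status TEXT);

-- Condition: fire only when an order changes to 'Cancelled'.
CREATE TRIGGER trg_after_update_order
AFTER UPDATE OF status ON Orders
FOR EACH ROW
WHEN NEW.status = 'Cancelled' AND OLD.status <> 'Cancelled'
BEGIN
    -- OLD holds the row before the update, NEW the row after it.
    INSERT INTO Status_Log VALUES (OLD.order_id, OLD.status, NEW.status);
END;

INSERT INTO Orders VALUES (1, 'Placed'), (2, 'Placed');
UPDATE Orders SET status = 'Shipped'   WHERE order_id = 1;
UPDATE Orders SET status = 'Cancelled' WHERE order_id = 2;
""")

status_log = conn.execute("SELECT * FROM Status_Log").fetchall()
print(status_log)
```

The update to 'Shipped' raises the same event but fails the condition, so only the cancellation is logged.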

Limitations of Triggers
| Issue                    | Explanation |
| Complex Debugging        | Trigger chains can make flow hard to trace. |
| Performance Overhead     | Too many triggers slow down transactions. |
| Portability Issues       | Syntax varies across DBMSs (e.g., MySQL vs. Oracle). |
| Recursive/Infinite Loops | If not properly handled, triggers can cause infinite rule firing. |

Best Practices for Triggers

• Keep trigger logic simple and focused.
• Log all trigger activity during development.
• Avoid using too many triggers on a single table.
• Use naming conventions like trg_after_insert_order.

Summary Table
| Feature         | Description |
| Active Database | DB that can respond to events via rules |
| ECA Rule        | Event → Condition → Action execution model |
| Trigger         | Database code that automatically executes on certain operations |
| Use Cases       | Automation, alerts, logging, policy enforcement |
| Challenge       | Debugging, recursion, DBMS-specific syntax differences |

TEMPORAL DATABASE CONCEPTS

1. Introduction
A Temporal Database is a special type of database that manages time-varying data. Unlike
conventional databases that store only the current data, temporal databases store historical,
current, and sometimes future data with associated timestamps. This is useful for applications
where it is necessary to know the state of the data at any point in time.

2. Time Dimensions in Temporal Databases

There are two major time dimensions, which can also be combined:
a) Valid Time:
• The time when the data is true in the real world.
• Example: An employee worked in the HR department from Jan 2020 to Dec 2022.
b) Transaction Time:
• The time when the data is stored or modified in the database.
• Example: The record about that employee was added to the database in March 2021.
c) Bitemporal Data:
• Combines both valid time and transaction time.
• Tracks both when something happened and when it was recorded.

3. Structure of Temporal Tables

Temporal tables have extra columns:
• valid_from, valid_to
• transaction_start, transaction_end
(Diagram Description)
Imagine a table where each row has:
| Data | Valid_From | Valid_To | Transaction_Start | Transaction_End |
Each data update creates a new version with updated time fields, while old versions are
retained.

4. Example
| Emp_ID | Dept | Valid_From | Valid_To   | Tx_Start   | Tx_End     |
| 101    | HR   | 2020-01-01 | 2022-12-31 | 2021-03-10 | 9999-12-31 |
This shows that the employee was in HR (valid time), and the info was stored in March 2021
(transaction time).

5. Advantages
• Allows historical tracking and time-based queries.
• Useful for auditing, legal compliance, and trend analysis.
• Ensures data integrity over time.
• Supports “time-travel” queries (e.g., "What was true on Jan 1, 2022?").
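
A time-travel query over the example row can be sketched with Python's sqlite3 module. This emulates temporal behavior with ordinary columns rather than using built-in temporal support. The Emp_Dept table name, the dept_on helper and the second (Finance) row are assumptions added for illustration. Dates are stored as ISO strings, so plain string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Emp_Dept (
    emp_id INTEGER, dept TEXT,
    valid_from TEXT, valid_to TEXT,     -- valid time
    tx_start TEXT, tx_end TEXT          -- transaction time
);
INSERT INTO Emp_Dept VALUES
 (101, 'HR',      '2020-01-01', '2022-12-31', '2021-03-10', '9999-12-31'),
 (101, 'Finance', '2023-01-01', '9999-12-31', '2023-01-05', '9999-12-31');
""")


def dept_on(conn, emp_id, as_of_date):
    """Department that was valid on as_of_date, per the currently stored version."""
    row = conn.execute(
        "SELECT dept FROM Emp_Dept "
        "WHERE emp_id = ? AND valid_from <= ? AND ? <= valid_to "
        "AND tx_end = '9999-12-31'",      # only versions not yet superseded
        (emp_id, as_of_date, as_of_date),
    ).fetchone()
    return row[0] if row else None


print(dept_on(conn, 101, "2022-01-01"))
print(dept_on(conn, 101, "2023-06-01"))
print(dept_on(conn, 101, "2019-06-01"))
```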

6. Applications
| Area       | Use Case |
| HR         | Employee role changes |
| Finance    | Account balance history |
| Healthcare | Medical record history |
| E-commerce | Product price changes over time |

7. SQL:2011 Temporal Support

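Modern SQL standards like SQL:2011 offer built-in support through:
• PERIOD FOR definitions, which declare a period from a pair of start and end columns
• SYSTEM_TIME periods (system-versioned tables) for transaction time
• Application-time periods, with a user-chosen period name, for valid time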

8. Use Cases :
• HR systems (track employee roles over time)
• Banking (transaction history)
• Healthcare (track patient diagnosis history)
• Legal (audit trail for case files)
• E-commerce (track price changes of products)

9. Conclusion
Temporal databases are essential for systems where data evolves over time and where historical
accuracy is important. They offer enhanced functionality for real-world applications that
demand time-based reasoning.
A. SPATIAL DATABASES

1. What is a Spatial Database?

A Spatial Database is a type of database system designed to store, manipulate, and query
spatial data — data that represents objects in a geometric space. This includes points, lines, and
polygons that define locations, distances, topologies, and relationships in two or three
dimensions.

2. Types of Spatial Data

1. Vector Data
• Represents spatial features as geometric shapes.
• Types: Points (e.g., ATM), Lines (e.g., roads), Polygons (e.g., buildings).
2. Raster Data
• Represents continuous data like satellite imagery or elevation.
• Stored as a grid of pixels, each with a value (e.g., temperature, color).

3. Spatial Relationships
Spatial databases support spatial predicates such as:
• Contains – A region contains a point.
• Intersects – A road intersects a river.
• Within – A point lies within a city boundary.
• Nearest Neighbor – Find the nearest hospital.

4. Spatial Indexing Methods

Because spatial data is multi-dimensional, traditional indexing (like B-trees) is inefficient.
Instead, spatial databases use:
a. R-Tree (Rectangle Tree)
• Indexes bounding rectangles.
• Efficient for range queries and intersections.
• Example: PostGIS, Oracle Spatial.
b. Quad Tree
• Divides 2D space recursively into 4 quadrants.
• Works well for sparse data.
c. Grid Index
• Divides space into uniform cells.
• Suitable for raster data.
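
Of the three methods, the grid index is the simplest to sketch. The version below indexes points rather than raster cells, which is an assumption made to keep the example short. The GridIndex class, the cell size and the sample points are likewise invented for illustration. A range query inspects only the cells that overlap the query rectangle instead of scanning every object.

```python
from collections import defaultdict


class GridIndex:
    """Uniform grid over 2D space; each cell lists the points that fall in it."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, name, x, y):
        self.cells[self._cell(x, y)].append((name, x, y))

    def range_query(self, xmin, ymin, xmax, ymax):
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                # .get avoids creating empty cells during a query
                for name, x, y in self.cells.get((cx, cy), []):
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(name)
        return hits


index = GridIndex(cell_size=10)
index.insert("A", 3, 4)
index.insert("B", 12, 18)
index.insert("C", 55, 60)
index.insert("D", 9.5, 9.5)

found = index.range_query(0, 0, 15, 20)
print(sorted(found))
```

An R-tree follows the same filter-then-verify idea but groups objects by bounding rectangles instead of fixed cells, which copes better with unevenly distributed data.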

5. Spatial Query Languages

Modern spatial DBMSs extend SQL with spatial functions:
SELECT name FROM Hospitals
WHERE ST_Distance(location, ST_Point(20.5, 78.9)) < 2000;
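ST_Distance() computes the distance between two geometries. The unit of the result depends on the
coordinate system of the data (for example, metres for PostGIS geography values).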

6. Real-World Applications
| Domain         | Example |
| Navigation     | Google Maps, GPS routing |
| Urban Planning | Zoning, traffic simulation |
| Agriculture    | Precision farming (satellite-based field mapping) |
| Defense        | Target tracking, geospatial intelligence |

B. MULTIMEDIA DATABASES

1. What is a Multimedia Database?

A Multimedia Database (MMDB) stores and manages complex media types like images,
audio, video, animation, and text. It allows retrieval, storage, annotation, and indexing of
media objects.
2. Core Components of Multimedia Data
| Component           | Description |
| Media Data          | Actual raw files (e.g., .jpg, .mp4, .mp3) |
| Metadata            | Descriptive tags (e.g., title, duration) |
| Content Descriptors | Extracted features (e.g., color, pitch) |

3. Storage Methods
Multimedia objects are stored using:
• BLOB (Binary Large Objects) for raw media.
• CLOB (Character Large Objects) for text/media descriptions.

4. Multimedia Query Models

a. Keyword-based Retrieval
• Search using tags/metadata:
"Find videos with keyword 'sunset'"
b. Content-Based Retrieval (CBIR/CBVR)
• Based on media features (color, shape, texture, etc.)
• Example:
Find images visually similar to this one.
c. Semantic Retrieval
• Higher-level understanding:
"Find videos of cats playing with yarn"

5. Multimedia Indexing Techniques

| Index Type                                 | Used For |
| Color Histogram                            | Image similarity |
| Mel-Frequency Cepstral Coefficients (MFCC) | Audio fingerprinting |
| Edge Histograms                            | Shape matching |
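
As an illustration of the first technique, a color histogram and a similarity score can be computed in plain Python. This is a minimal sketch under stated assumptions: images are given as short lists of (r, g, b) tuples, each channel is quantized into 4 bins, and similarity is measured by histogram intersection, which is one common choice among several. The sample "images" are invented.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalised RGB histogram; pixels is a list of (r, g, b) values in 0..255."""
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        idx = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        hist[idx] += 1
    total = len(pixels)
    return [count / total for count in hist]


def histogram_intersection(h1, h2):
    """1.0 for identical colour distributions, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Tiny made-up "images": mostly orange, half orange, all green.
sunset_a = [(250, 120, 30)] * 3 + [(20, 20, 60)]
sunset_b = [(240, 110, 20)] * 2 + [(30, 10, 50)] * 2
forest = [(20, 140, 30)] * 4

h_a, h_b, h_f = (color_histogram(img) for img in (sunset_a, sunset_b, forest))
sim_sunsets = histogram_intersection(h_a, h_b)
sim_forest = histogram_intersection(h_a, h_f)
print(sim_sunsets, sim_forest)
```

A content-based query such as "find images visually similar to this one" then reduces to ranking stored histograms by this score.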

6. Multimedia Database Architecture

A multimedia DBMS includes:
• Media Server (for storage)
• Feature Extractor (e.g., extract color, pitch)
• Query Processor
• Media Player Interface

7. Real-World Applications
| Field           | Example |
| Social Media    | Facebook, Instagram image tagging |
| Healthcare      | Medical image analysis (X-rays, MRIs) |
| Education       | E-learning platforms (videos, documents) |
| Law Enforcement | Facial recognition from surveillance feeds |

8. Challenges in Multimedia Databases

• Large file sizes
• Real-time streaming needs
• Complex content retrieval
• High-performance storage systems
• Standardization of formats and retrieval techniques

Conclusion
Spatial and Multimedia Databases represent the evolution of DBMS to handle complex,
real-world data. Spatial DBs empower GIS and location-aware services, while Multimedia DBs
enable storage and intelligent retrieval of rich media content. These systems are crucial for
next-generation applications across almost every domain, from defense to healthcare and
entertainment.

DEDUCTIVE DATABASE
A deductive database is a type of database that combines a traditional relational database with
logic-based reasoning, particularly using deductive logic (such as rules and facts) to derive new
information from stored data.
In a deductive database, you not only store data but also have the ability to define logical rules
and facts that can be applied to the data to infer new knowledge or derive conclusions. This
system uses a set of rules (often in logic programming languages like Prolog) to infer
relationships and new facts from existing ones, making it a powerful tool for complex querying
and problem-solving.
Here are key characteristics of a deductive database:
1. Rules and Facts: You define facts (basic information) and rules (logical inferences) that
the system uses to generate conclusions. For example, if "A is a parent of B" and "B is a
parent of C," the system can deduce that "A is a grandparent of C."
2. Recursive Queries: Deductive databases often support recursive queries, which means
they can ask about relationships that involve multiple levels, such as a family tree or a
network of connections.
3. Inferences: Deductive databases allow for the automatic inference of new facts based on
the existing data and the defined rules. This helps in scenarios like data analysis,
decision-making, and complex problem-solving.
4. Declarative Nature: The user defines what they want to know without necessarily
specifying how to compute it. The database engine then determines the most efficient
way to derive the answer.
5. Logic Programming: Deductive databases often employ a logic programming language
such as Datalog, which is a declarative language closely related to Prolog, to express
rules and queries.
Deductive databases are a sophisticated extension of traditional relational databases, leveraging
logic programming and reasoning to perform complex queries and inference over data. In more
advanced applications, they can provide significant benefits by allowing data-driven conclusions
to be drawn automatically, enhancing decision-making, analysis, and automation processes in
various fields.
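
The grandparent example and a recursive rule can be sketched in Python as a naive bottom-up evaluation, which repeats rule application until no new facts appear. This is a teaching sketch under stated assumptions: the infer function and the sample facts are invented, and real Datalog engines use more efficient strategies such as semi-naive evaluation.

```python
def infer(parent_facts):
    """Derive new facts from parent(X, Y) facts using three rules.

    grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
    ancestor(X, Y)    :- parent(X, Y).
    ancestor(X, Z)    :- parent(X, Y), ancestor(Y, Z).
    """
    parent = set(parent_facts)

    # Non-recursive rule: a single join of parent with itself.
    grandparent = {(x, z) for (x, y) in parent for (y2, z) in parent if y == y2}

    # Recursive rule: apply it repeatedly until a fixpoint is reached.
    ancestor = set(parent)
    while True:
        new = {(x, z) for (x, y) in parent for (y2, z) in ancestor if y == y2}
        if new <= ancestor:
            break
        ancestor |= new
    return grandparent, ancestor


facts = {("A", "B"), ("B", "C"), ("C", "D")}
grandparent, ancestor = infer(facts)
print(sorted(grandparent))
print(sorted(ancestor))
```

Only three facts are stored, yet the rules yield two grandparent facts and six ancestor facts, none of which had to be entered by hand.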

Let’s dive into some advanced applications of deductive databases and how they are used in
real-world systems:
1. Knowledge Representation and Expert Systems
• Domain: Artificial Intelligence, Decision Support
• Application: Deductive databases are ideal for expert systems, which simulate human
expertise in specialized domains (e.g., medical diagnosis, financial analysis). In these
systems, the knowledge about the domain is stored as facts and rules.
• Example: In a medical expert system, rules like "If a patient has a fever and cough, then
they might have a respiratory infection" are stored. Given facts such as "This patient has
a fever and cough," the system can infer potential diagnoses.
• Benefits: The ability to deduce new facts based on existing knowledge, making the
system adaptable and capable of answering complex, context-sensitive queries.
2. Data Integration and Warehousing
• Domain: Big Data, Data Warehousing, Enterprise Integration
• Application: Deductive databases are crucial in integrating data from multiple
heterogeneous sources. By using logical rules, you can transform data from different
formats into a unified schema, infer missing data, or apply business rules for consistency
across systems.
• Example: In a business intelligence system, you can integrate data from different
departments (sales, inventory, HR) and use logical rules to deduce overall company
performance or future trends based on historical data.
• Benefits: The ability to automatically handle inconsistencies, apply complex business
rules, and unify disparate data sources.
3. Recursive Queries and Complex Graph Traversals
• Domain: Network Analysis, Social Networks, Supply Chains
• Application: Deductive databases excel in recursive queries and graph-based problems.
You can query hierarchical data (such as organizational charts, family trees, or network
topologies) and traverse graphs to find relationships or patterns.
• Example: In a social network analysis, rules can be defined to deduce indirect
relationships, such as friends of friends. "If A is friends with B, and B is friends with C,
then A and C are indirectly connected."
• Benefits: Recursive queries allow you to analyze multi-level relationships (e.g.,
grandparent relationships or supply chain dependencies) effortlessly.
4. Semantic Web and Linked Data
• Domain: Web Technologies, Data Interoperability
• Application: The Semantic Web aims to make internet data machine-readable by
structuring data using ontologies. Deductive databases are used to manage these
ontologies, infer new facts based on linked data, and support reasoning about
relationships between data points on the web.
• Example: In a linked-data context, if one dataset contains information about authors and
another about books, a deductive database could infer which author wrote which books,
even if the relationship is not explicitly stored.
• Benefits: Facilitates automated reasoning across interconnected data sources and
supports more advanced querying on the web.
5. Automated Planning and Decision Making
• Domain: Robotics, Process Management, Supply Chain Optimization
• Application: Deductive databases can be used in automated planning systems to
generate step-by-step procedures for achieving specific goals based on available
resources and constraints.
• Example: In a robotics system, rules about the robot's capabilities (e.g., "If the robot has
a gripper, it can pick up objects") and the environment (e.g., "If there is an object in front
of the robot, it can move towards it") allow the system to plan actions dynamically.
• Benefits: Supports intelligent decision-making based on real-time data and predefined
knowledge.
6. Legal and Compliance Systems
• Domain: Law, Compliance, Regulatory Systems
• Application: Deductive databases are used to model and analyze complex legal rules and
regulations, automatically verifying if an action or scenario complies with legal
requirements.
• Example: In a compliance system for financial services, you can define rules about
acceptable financial transactions (e.g., "A transaction over $10,000 must be reported")
and infer whether transactions are in compliance with regulations.
• Benefits: Helps organizations ensure they comply with legal requirements by
automatically detecting potential violations.
7. Complex Event Processing (CEP)
• Domain: Real-Time Systems, Monitoring, and Alerting
• Application: Deductive databases can be used to detect patterns in real-time data streams
by defining rules that trigger actions or alerts when certain conditions are met.
• Example: In a stock market monitoring system, you could define a rule: "If the price of
stock X exceeds a certain threshold and the volume traded increases by 20%, trigger an
alert for potential insider trading."
• Benefits: Provides real-time insights and alerts based on complex patterns and
relationships in dynamic data.
8. Database Query Optimization
• Domain: Database Management Systems (DBMS), Query Processing
• Application: Deductive databases can assist in optimizing queries by leveraging logical
inference to reduce the complexity of queries and avoid redundant computations. The
system can reason about how to combine facts and rules in an optimized way.
• Example: A complex query can be simplified using logical inference rules to avoid
unnecessary table joins or data retrieval steps.
• Benefits: Improves the performance of complex queries by reasoning about relationships
between data and applying optimization techniques.
Technologies Supporting Deductive Databases:
• Datalog: A declarative logic programming language, often used in deductive databases. It
is a subset of Prolog and is commonly used for querying and reasoning over logical data.
• Logic Programming: Systems like Prolog provide the logical framework for
implementing deductive databases, enabling automatic inference.
• Ontology Languages: Languages like RDF (Resource Description Framework) and
OWL (Web Ontology Language) are frequently used to define the rules and relationships
in a semantic web-based deductive database.
• Graph Databases: Technologies like Neo4j or GraphDB, though not strictly deductive
databases, support similar use cases involving complex relationships, recursion, and
pattern recognition.
Challenges and Considerations:
• Complexity: Deductive databases can be computationally expensive, especially with
large datasets and complex rules.
• Scalability: While they provide powerful inference capabilities, scaling deductive
databases for massive amounts of data and complex rules can be challenging.
• Performance: Recursive queries and logical inferences can slow down response times,
especially if the underlying database system is not optimized for such operations.
In conclusion, deductive databases offer a robust framework for solving complex problems
involving logic, reasoning, and inference across large datasets. They enable more advanced
applications such as expert systems, real-time event processing, legal compliance, and more,
offering greater flexibility and intelligence in decision-making and data analysis.

OBJECT-BASED DATABASES
Object-based databases (OBD) are a type of database that integrates object-oriented programming
concepts into the database management system (DBMS). Unlike traditional relational databases,
which use tables to store data in rows and columns, object-based databases store data as objects,
similar to how data is represented in object-oriented programming languages (like Java, C++, or
Python).
The main concepts behind object-based databases are outlined below:
1. Object-Oriented Concepts
Object-based databases implement the following core object-oriented concepts:
• Encapsulation: Data and operations (methods) are bundled together in a single unit
(object).
• Inheritance: Objects can inherit properties and methods from other objects.
• Polymorphism: Objects can be treated as instances of their parent class. When a method
is called, the version defined by the object's actual class runs, so the caller does not
need to know the specific type of object.
• Abstraction: Only essential data is exposed, hiding internal complexity.
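The four concepts above can be sketched with ordinary Python classes. The Account classes and the fee values are hypothetical and only serve to illustrate the ideas.

```python
from abc import ABC, abstractmethod


class Account(ABC):
    """Abstraction: callers see deposit/balance/monthly_fee, not the internals."""

    def __init__(self, owner, balance=0.0):
        self.owner = owner
        self._balance = balance  # encapsulation: changed only through methods

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount

    @property
    def balance(self):
        return self._balance

    @abstractmethod
    def monthly_fee(self):
        """Each subclass supplies its own fee rule."""


class SavingsAccount(Account):  # inheritance: reuses deposit and balance
    def monthly_fee(self):
        return 0.0


class CurrentAccount(Account):
    def monthly_fee(self):
        return 5.0


def total_fees(accounts):
    # Polymorphism: the same call works for every kind of Account.
    return sum(account.monthly_fee() for account in accounts)


accounts = [SavingsAccount("Asha"), CurrentAccount("Ravi")]
accounts[0].deposit(100.0)
print(total_fees(accounts), accounts[0].balance)
```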
2. Objects and Object Identity
• Object: In object-based databases, an object represents a real-world entity, such as a
customer or an order. Each object is self-contained with its own properties (attributes)
and methods (functions).
• Object Identity: Every object in an object database has a unique identifier (OID) that
distinguishes it from other objects.
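The difference between identity and state can be sketched as follows. The counter-based OID is an assumption made for the illustration; a real object DBMS generates and manages OIDs internally.

```python
import itertools
from dataclasses import dataclass, field

_next_oid = itertools.count(1)  # hypothetical OID generator for this sketch


@dataclass
class Customer:
    name: str
    city: str
    # The OID is assigned once and is excluded from value comparison.
    oid: int = field(default_factory=lambda: next(_next_oid), compare=False)


a = Customer("Asha", "Pune")
b = Customer("Asha", "Pune")

same_state = (a == b)             # compares attribute values only
same_object = (a.oid == b.oid)    # compares identity

oid_before = a.oid
a.city = "Mumbai"                 # state changes, identity does not
print(same_state, same_object, a.oid == oid_before)
```

Two objects with identical attribute values are still distinct objects, and an object keeps its OID when its attributes change. A relational primary key, by contrast, is itself part of the stored data.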
3. Persistent Objects
In an object-based DBMS, objects that are created during the execution of a program can be
stored persistently in the database. These objects maintain their state even after the program that
created them has terminated.
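Persistence can be illustrated with Python's standard shelve module, used here only as a stand-in for an object store. An object DBMS would save the object transparently; in this sketch the state is saved and rebuilt explicitly, and the Order class and file name are hypothetical.

```python
import os
import shelve
import tempfile


class Order:
    def __init__(self, order_id, items):
        self.order_id = order_id
        self.items = list(items)

    def to_state(self):
        # State that must outlive the in-memory object.
        return {"order_id": self.order_id, "items": self.items}

    @classmethod
    def from_state(cls, state):
        return cls(state["order_id"], state["items"])


def save_order(path, order):
    with shelve.open(path) as db:
        db[str(order.order_id)] = order.to_state()


def load_order(path, order_id):
    with shelve.open(path) as db:
        return Order.from_state(db[str(order_id)])


store_path = os.path.join(tempfile.mkdtemp(), "orders")
order = Order(101, ["pen", "ink"])
save_order(store_path, order)
del order  # the in-memory object is gone, the stored state remains

restored = load_order(store_path, 101)
print(restored.order_id, restored.items)
```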
4. Complex Data Types
Object-based databases allow complex data types, such as arrays, lists, sets, and even other
objects, to be stored directly in the database, unlike relational databases where data must be
broken down into primitive data types (e.g., integers, strings).
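The contrast can be sketched as follows: the object holds a nested object, a list, and a set directly, while the flatten function shows the separate rows a normalized relational design would need. All class and field names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Address:
    street: str
    city: str


@dataclass
class Employee:
    name: str
    address: Address                              # another object
    phones: list = field(default_factory=list)    # a list
    skills: set = field(default_factory=set)      # a set


def flatten(employee_id, emp):
    """Break the object into rows of primitive values, one group per table."""
    employee_row = (employee_id, emp.name, emp.address.street, emp.address.city)
    phone_rows = [(employee_id, phone) for phone in emp.phones]
    skill_rows = sorted((employee_id, skill) for skill in emp.skills)
    return employee_row, phone_rows, skill_rows


emp = Employee("Asha", Address("MG Road", "Pune"), ["111", "222"], {"SQL", "Java"})
employee_row, phone_rows, skill_rows = flatten(1, emp)
print(employee_row, phone_rows, skill_rows)
```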
5. Object Query Languages (OQL)
Object-based databases use object query languages like OQL, which is a query language
designed to work with objects and their relationships. OQL allows querying and retrieving
objects using object-oriented concepts like classes and inheritance.
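The idea of querying a collection of objects by class can be imitated in plain Python. This is not OQL itself; the select helper, the classes, and the data are assumptions made for the example. The comment shows roughly how the first query might read in OQL.

```python
class Person:
    def __init__(self, name, city):
        self.name = name
        self.city = city


class Student(Person):
    def __init__(self, name, city, course):
        super().__init__(name, city)
        self.course = course


# The collection of stored objects (an "extent" in object database terms).
persons = [
    Person("Asha", "Pune"),
    Student("Ravi", "Pune", "DBMS"),
    Student("Meera", "Delhi", "OS"),
]


def select(extent, cls, where=lambda obj: True):
    """Return objects of class cls (subclasses included) that satisfy where."""
    return [obj for obj in extent if isinstance(obj, cls) and where(obj)]


# Roughly: select p.name from p in Persons where p.city = "Pune"
pune_names = [p.name for p in select(persons, Person, lambda p: p.city == "Pune")]

# Restricting the query to a subclass uses the inheritance hierarchy.
dbms_students = [s.name for s in select(persons, Student, lambda s: s.course == "DBMS")]
print(pune_names, dbms_students)
```

A query on Person also returns Student objects because every Student is a Person.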
6. Mapping Object-Oriented Models to Relational Models
Object-oriented models do not map directly onto relational tables, a problem often called
the object-relational impedance mismatch. When an object-oriented application stores its
data in a relational database, this mapping is usually handled by Object-Relational Mapping
(ORM) tools. An object-based database avoids most of this translation by storing objects
directly. Some products also integrate with relational databases, so that developers can
work in an object-oriented manner without handling relational schemas themselves.
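What an ORM tool automates can be sketched by hand with Python's built-in sqlite3 module. The Invoice class, the table names, and the column names are hypothetical; a real ORM generates this kind of code from class definitions.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Invoice:
    number: int
    customer: str
    lines: list = field(default_factory=list)  # (description, amount) pairs


def create_schema(conn):
    conn.execute("CREATE TABLE invoice (number INTEGER PRIMARY KEY, customer TEXT)")
    conn.execute(
        "CREATE TABLE invoice_line ("
        "invoice_number INTEGER REFERENCES invoice(number), "
        "description TEXT, amount REAL)"
    )


def save(conn, inv):
    # One object becomes one row in invoice plus one row per line item.
    conn.execute("INSERT INTO invoice VALUES (?, ?)", (inv.number, inv.customer))
    conn.executemany(
        "INSERT INTO invoice_line VALUES (?, ?, ?)",
        [(inv.number, desc, amount) for desc, amount in inv.lines],
    )


def load(conn, number):
    # Rebuilding the object needs one query per table.
    row = conn.execute(
        "SELECT number, customer FROM invoice WHERE number = ?", (number,)
    ).fetchone()
    lines = conn.execute(
        "SELECT description, amount FROM invoice_line "
        "WHERE invoice_number = ? ORDER BY rowid",
        (number,),
    ).fetchall()
    return Invoice(row[0], row[1], [tuple(line) for line in lines])


conn = sqlite3.connect(":memory:")
create_schema(conn)
original = Invoice(1, "Asha", [("pen", 10.0), ("ink", 4.5)])
save(conn, original)
loaded = load(conn, 1)
print(loaded == original)
```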
7. Advantages of Object-Based Databases
• Natural representation: Real-world entities can be represented directly as objects,
leading to a more intuitive mapping between the application and the database.
• Support for complex data types: Objects can contain multiple types of data, including
other objects, which makes them suitable for applications that require handling complex
data (e.g., CAD systems, multimedia databases).
• Inheritance and Reusability: The use of inheritance in object-based databases allows
for code and data reuse, making the system more flexible.
8. Disadvantages of Object-Based Databases
• Complexity: Object-based databases can be more complex to design and maintain,
especially if developers are not familiar with object-oriented programming principles.
• Performance: Object databases can sometimes have performance issues due to the need
to map objects to database records, particularly for large or complex datasets.
• Lack of standardization: Relational databases share the SQL standard, but object-based
databases have no widely adopted equivalent (the ODMG standard saw limited uptake), which
can lead to portability issues.
9. Examples of Object-Based Databases
• ObjectDB: A Java-based object database that provides an object-oriented interface for
storing Java objects directly.
• db4o: A database for Java and .NET applications that allows developers to store complex
objects without needing to translate them into relational models.
• Versant Object Database: A database that supports object-oriented programming
concepts, optimized for handling large-scale, complex applications.
10. Object-Relational Hybrid Databases
Some modern databases combine object-oriented features with relational database features.
These are known as Object-Relational Databases (ORDs). They attempt to bridge the gap
between relational and object-oriented databases by supporting both types of data models.
Examples include PostgreSQL (with support for user-defined types and inheritance) and Oracle
Database (with object-relational features).
11. Usage in Real-World Applications
Object-based databases are particularly well-suited for applications that involve complex data
models, such as:
• CAD/CAM systems: These systems require the ability to model objects that have both
attributes and behaviors.
• Multimedia databases: Object databases can store multimedia content (images, video,
audio) as objects together with its metadata and operations, which is harder to express
in a purely tabular model.
• Telecommunications: Telecommunications systems involve complex objects, such as
calls and customer profiles, which are naturally represented in object-oriented models.
