0% found this document useful (0 votes)
29 views44 pages

Distributed Database Overview and Types

There are two main ways that data can be distributed across sites in a distributed database: replication and fragmentation. Replication involves storing complete copies of the entire database or relations at multiple sites, allowing for high availability but increasing overhead of updates. Fragmentation involves dividing relations into fragments and distributing those pieces across sites, avoiding copies but requiring relations to be reconstructed from fragments. Distributed databases can use combinations of these techniques depending on factors like query load and update frequency.
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
29 views44 pages

Distributed Database Overview and Types

There are two main ways that data can be distributed across sites in a distributed database: replication and fragmentation. Replication involves storing complete copies of the entire database or relations at multiple sites, allowing for high availability but increasing overhead of updates. Fragmentation involves dividing relations into fragments and distributing those pieces across sites, avoiding copies but requiring relations to be reconstructed from fragments. Distributed databases can use combinations of these techniques depending on factors like query load and update frequency.
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Unit 2

Distributed Databases,Active Database and Open Database


Connectivity
Distributed databases:
A distributed database is a type of database that has contributions from the common database and
information captured by local computers. In this type of database system, the data is not in one place
and is distributed at various organizations.

Types of Distributed Databases


Distributed databases can be broadly classified into homogeneous and heterogeneous distributed database
environments, each with further sub-divisions, as shown in the following illustration.

Homogeneous Distributed Databases


In a homogeneous distributed database, all the sites use identical DBMS and operating systems. Its properties are

 The sites use very similar software.
 The sites use identical DBMS or DBMS from the same vendor.
 Each site is aware of all other sites and cooperates with other sites to process user requests.
 The database is accessed through a single interface as if it is a single database.

Types of Homogeneous Distributed Database


There are two types of homogeneous distributed database −
 Autonomous − Each database is independent that functions on its own. They are integrated by a
controlling application and use message passing to share data updates.
 Non-autonomous − Data is distributed across the homogeneous nodes and a central or master DBMS co-
ordinates data updates across the sites.

Heterogeneous Distributed Databases


In a heterogeneous distributed database, different sites have different operating systems, DBMS products and data
models. Its properties are −
 Different sites use dissimilar schemas and software.
 The system may be composed of a variety of DBMSs like relational, network, hierarchical or object
oriented.
 Query processing is complex due to dissimilar schemas.
 Transaction processing is complex due to dissimilar software.
 A site may not be aware of other sites and so there is limited co-operation in processing user requests.

Types of Heterogeneous Distributed Databases


 Federated − The heterogeneous database systems are independent in nature and integrated together so that
they function as a single database system.
 Un-federated − The database systems employ a central coordinating module through which the databases
are accessed.

Distributed DBMS Architectures


DDBMS architectures are generally developed depending on three parameters −
 Distribution − It states the physical distribution of data across the different sites.
 Autonomy − It indicates the distribution of control of the database system and the degree to which each
constituent DBMS can operate independently.
 Heterogeneity − It refers to the uniformity or dissimilarity of the data models, system components and
databases.

Architectural Models
Some of the common architectural models are −

 Client - Server Architecture for DDBMS


 Peer - to - Peer Architecture for DDBMS
 Multi - DBMS Architecture

Client - Server Architecture for DDBMS


This is a two-level architecture where the functionality is divided into servers and clients. The server functions
primarily encompass data management, query processing, optimization and transaction management. Client
functions include mainly user interface. However, they have some functions like consistency checking and
transaction management.
The two different client - server architecture are −

 Single Server Multiple Client


 Multiple Server Multiple Client (shown in the following diagram)
Peer- to-Peer Architecture for DDBMS
In these systems, each peer acts both as a client and a server for imparting database services. The peers share their
resource with other peers and co-ordinate their activities.
This architecture generally has four levels of schemas −
 Global Conceptual Schema − Depicts the global logical view of data.
 Local Conceptual Schema − Depicts logical data organization at each site.
 Local Internal Schema − Depicts physical data organization at each site.
 External Schema − Depicts user view of data.
Multi - DBMS Architectures
This is an integrated database system formed by a collection of two or more autonomous database systems.
Multi-DBMS can be expressed through six levels of schemas −
 Multi-database View Level − Depicts multiple user views comprising of subsets of the integrated
distributed database.
 Multi-database Conceptual Level − Depicts integrated multi-database that comprises of global logical
multi-database structure definitions.
 Multi-database Internal Level − Depicts the data distribution across different sites and multi-database to
local data mapping.
 Local database View Level − Depicts public view of local data.
 Local database Conceptual Level − Depicts local data organization at each site.
 Local database Internal Level − Depicts physical data organization at each site.
There are two design alternatives for multi-DBMS −

 Model with multi-database conceptual level.


 Model without multi-database conceptual level.
Design Alternatives
The distribution design alternatives for the tables in a DDBMS are as follows −

 Non-replicated and non-fragmented


 Fully replicated
 Partially replicated
 Fragmented
 Mixed

Non-replicated & Non-fragmented


In this design alternative, different tables are placed at different sites. Data is placed so that it is at a close proximity
to the site where it is used most. It is most suitable for database systems where the percentage of queries needed
to join information in tables placed at different sites is low. If an appropriate distribution strategy is adopted, then
this design alternative helps to reduce the communication cost during data processing.

Fully Replicated
In this design alternative, at each site, one copy of all the database tables is stored. Since, each site has its own
copy of the entire database, queries are very fast requiring negligible communication cost. On the contrary, the
massive redundancy in data requires huge cost during update operations. Hence, this is suitable for systems where
a large number of queries is required to be handled whereas the number of database updates is low.

Partially Replicated
Copies of tables or portions of tables are stored at different sites. The distribution of the tables is done in
accordance to the frequency of access. This takes into consideration the fact that the frequency of accessing the
tables vary considerably from site to site. The number of copies of the tables (or portions) depends on how
frequently the access queries execute and the site which generate the access queries.
Fragmented
In this design, a table is divided into two or more pieces referred to as fragments or partitions, and each fragment
can be stored at different sites. This considers the fact that it seldom happens that all data stored in a table is
required at a given site. Moreover, fragmentation increases parallelism and provides better disaster recovery. Here,
there is only one copy of each fragment in the system, i.e. no redundant data.
The three fragmentation techniques are −

 Vertical fragmentation
 Horizontal fragmentation
 Hybrid fragmentation

Mixed Distribution
This is a combination of fragmentation and partial replications. Here, the tables are initially fragmented in any
form (horizontal or vertical), and then these fragments are partially replicated across the different sites according
to the frequency of accessing the fragments.

Distributed Data Storage

A distributed database is basically a database that is not limited to one system, it is spread over different
sites, i.e, on multiple computers or over a network of computers. A distributed database system is located
on various sites that don’t share physical components. This may be required when a particular database
needs to be accessed by various users globally. It needs to be managed such that for the users it looks
like one single database.
Distributed Data Storage :
There are 2 ways in which data can be stored on different sites. These are:
[Link] –
In this approach, the entire relation is stored redundantly at 2 or more sites. If the entire database is
available at all sites, it is a fully redundant database. Hence, in replication, systems maintain copies of
data.
This is advantageous as it increases the availability of data at different sites. Also, now query requests
can be processed in parallel.
However, it has certain disadvantages as well. Data needs to be constantly updated. Any change made
at one site needs to be recorded at every site that relation is stored or else it may lead to inconsistency.
This is a lot of overhead. Also, concurrency control becomes way more complex as concurrent access
now needs to be checked over a number of sites.
[Link] –
In this approach, the relations are fragmented (i.e., they’re divided into smaller parts) and each of the
fragments is stored in different sites where they’re required. It must be made sure that the fragments are
such that they can be used to reconstruct the original relation (i.e, there isn’t any loss of data).
Fragmentation is advantageous as it doesn’t create copies of data, consistency is not a problem.

Fragmentation of relations can be done in two ways:

 Horizontal fragmentation – Splitting by rows –


The relation is fragmented into groups of tuples so that each tuple is assigned to at least one
fragment.
 Vertical fragmentation – Splitting by columns –
The schema of the relation is divided into smaller schemas. Each fragment must contain a
common candidate key so as to ensure lossless join.

Flat & Nested Distributed Transactions


Introduction:
A transaction is a series of object operations that must be done in an ACID-compliant manner.
Example : In banking transactions
 Atomicity –
The transaction is completed entirely or not at [Link] be commit or abort.
 Consistency –
It is a term that refers to the transition from one consistent state to another.
 Isolation –
It is carried out separately from other transactions.
 Durability–
Once completed, it is long lasting.
Transactions – Commands :
 Begin–
initiate a new transaction.
 Commit–
End a transaction and the changes made during the transaction are saved. Also, it allows other
transactions to see the modifications you’ve made.
 Abort–
End a transaction and all changes made during the transaction will be undone.
Various roles are allocated to running a transaction successfully :
 Client–
The transactions are issued by the clients.
 Coordinator–
The execution of the entire transaction is controlled by it (handles Begin, commit & abort).
 Server–
Every component that accesses or modifies a resource is subject to transaction control. The
coordinator must be known by the transactional server. The transactional server registers its
participation in a transaction with the coordinator.
A flat or nested transaction that accesses objects handled by different servers is referred to as a
distributed transaction.
When a distributed transaction reaches its end, in order to maintain the atomicity property of the
transaction , it is mandatory that all of the servers involved in the transaction either commit the
transaction or abort it.
To do this, one of the servers takes on the job of coordinator, which entails ensuring that the same
outcome is achieved across all servers.
The method by which the coordinator accomplishes this is determined by the protocol selected. The
most widely used protocol is the ‘two-phase commit protocol.’ This protocol enables the servers to
communicate with one another in order to come to a joint decision on whether to commit or abort the
complete transaction.
Flat&NestedDistributed Transactions:
If a client transaction calls actions on multiple servers, it is said to be distributed. Distributed
transactions can be structured in two different ways:
1. Flat transactions
2. Nested transactions
FLAT TRANSACTIONS:
A flat transaction has a single initiating point(Begin) and a single end point(Commit or abort). They are
usually very simple and are generally used for short activities rather than larger ones.
A client makes requests to multiple servers in a flat transaction. Transaction T, for example, is a flat
transaction that performs operations on objects in servers X, Y, and Z.
Before moving on to the next request, a flat client transaction completes the previous one. As a result,
each transaction visits the server object in order.
A transaction can only wait for one object at a time when servers utilize locking.

Flat Transaction

Limitations of a flat Transaction :


 All work is lost in the event of a crash.
 Only one DBMS may be used at a time.
 No partial rollback is possible.
NESTED TRANSACTIONS :
A transaction that includes other transactions within its initiating point and a end point are known as
nested transactions. So the nesting of the transactions is done in a transaction. The nested transactions
here are called sub-transactions.
The top-level transaction in a nested transaction can open sub-transactions, and each sub-transaction
can open more sub-transactions down to any depth of nesting.
A client’s transaction T opens up two sub-transactions, T1 and T2, which access objects on servers X
and Y, as shown in the diagram below.
T1.1, T1.2, T2.1, and T2.2, which access the objects on the servers M,N, and P, are opened by the sub-
transactions T1 and T2.

Nested Transaction

Concurrent Execution of the Sub-transactions is done which are at the same level – in the nested
transaction [Link], in the above diagram, T1 and T2 invoke objects on different servers and
hence they can run in parallel and are therefore concurrent.
T1.1, T1.2, T2.1, and T2.2 are four sub-transactions. These sub-transactions can also run in parallel.
Consider a distributed transaction (T) in which a customer transfers :
 Rs. 105 from account A to account C and
 Subsequently, Rs. 205 from account B to account D.
It can be viewed/ thought of as :
Transaction T :
Start
Transfer Rs 105 from A to C :
Deduct Rs 105 from A(withdraw from A) & Add Rs 105 to C(depopsit to C)
Transfer Rs 205 from B to D :
Deduct Rs 205 from B (withdraw from B)& Add Rs 205 to D(depopsit to D)
End
Assuming :
1. Account A is on server X
2. Account B is on server Y,and
3. Accounts C and D are on server Z.
The transaction T involves four requests – 2 for deposits and 2 for withdrawals. Now they can be treated
as sub transactions (T1, T2, T3, T4) of the transaction T.
As shown in the figure below, transaction T is designed as a set of four nested transactions : T1, T2, T3
and T4.
Advantage:
The performance is higher than a single transaction in which four operations are invoked one after the
other in sequence.

Nested Transaction

So, the Transaction T may be divided into sub-transactions as :


//Start the Transaction
T = open transaction
//T1
openSubtransaction
[Link](105);
//T2
openSubtransaction
[Link](205);
//T3
openSubtransaction
[Link](105);
//T4
openSubtransaction
[Link](205);
//End the trsnaction
close Transaction
Role of coordinator :
When the Distributed Transaction commits, the servers that are involved in the transaction execution,for
proper coordination, must be able to communicate with one another .
When a client initiates a transaction, an “openTransaction” request is sent to any coordinator server. The
contacted coordinator carries out the “openTransaction” and returns the transaction identifier to the
client.
Distributed transaction identifiers must be unique within the distributed system.
A simple way is to generate a TID contains two parts – the ‘server identifier” (example : IP address) of
the server that created it and a number unique to the server.
The coordinator who initiated the transaction becomes the distributed transaction’s coordinator and has
the responsibility of either aborting it or committing it.
Every server that manages an object accessed by a transaction is a participant in the transaction &
provides an object we call the participant. The participants are responsible for working together with
the coordinator to complete the commit process.
The coordinator every time, records the new participant in the participants list. Each participant knows
the coordinator & the coordinator knows all the participants. This enables them to collect the information
that will be needed at the time of commit and hence work in coordination.

Distributed Transactions Concepts

What Are Distributed Transactions?


A distributed transaction includes one or more statements that, individually or as a group, update data
on two or more distinct nodes of a distributed database. For example, assume the database
configuration depicted in Figure 4-1:
Figure 4-1 Distributed System

The following distributed transaction executed by SCOTT updates the local SALES database, the
remote HQ database, and the remote MAINT database:
UPDATE [Link]@[Link]
SET loc = 'REDWOOD SHORES'
WHERE deptno = 10;
UPDATE [Link]
SET deptno = 11
WHERE deptno = 10;
UPDATE [Link]@[Link]
SET room = 1225
WHERE room = 1163;
COMMIT;

Note:

If all statements of a transaction reference only a single remote node, then the transaction is
remote, not distributed.

The section contains the following topics:

 Supported Types of Distributed Transactions


 Session Trees for Distributed Transactions
 Two-Phase Commit Mechanism

Supported Types of Distributed Transactions

This section describes permissible operations in distributed transactions:

 DML and DDL Transactions


 Transaction Control Statements

DML and DDL Transactions

The following list describes DML and DDL operations supported in a distributed transaction:
 CREATE TABLE AS SELECT
 DELETE
 INSERT (default and direct load)
 LOCK TABLE
 SELECT
 SELECT FOR UPDATE

You can execute DML and DDL statements in parallel, and INSERT direct load statements serially,
but note the following restrictions:

 All remote operations must be SELECT statements.


 These statements must not be clauses in another distributed transaction.
 If the table referenced in the table_expression_clause of an INSERT, UPDATE, or DELETE
statement is remote, then execution is serial rather than parallel.
 You cannot perform remote operations after issuing parallel DML/DDL or direct load INSERT.
 If the transaction begins using XA or OCI(oracle call interface), it executes serially.
 No loopback operations can be performed on the transaction originating the parallel operation.
For example, you cannot reference a remote object that is actually a synonym for a local object.
 If you perform a distributed operation other than a SELECT in the transaction, no DML is
parallelized.

Transaction Control Statements

The following list describes supported transaction control statements:

 COMMIT
 ROLLBACK
 SAVEPOINT

Savepoint in SQL
o Savepoint is a command in SQL that is used with the rollback command.
o It is a command in Transaction Control Language that is used to mark the transaction in a table.
o Consider you are making a very long table, and you want to roll back only to a certain position in a table
then; this can be achieved using the savepoint.
o If you made a transaction in a table, you could mark the transaction as a certain name, and later on, if you
want to roll back to that point, you can do it easily by using the transaction's name.
o Savepoint is helpful when we want to roll back only a small part of a table and not the whole table. In
simple words, we can say savepoint is a bookmark in SQL.

Session Trees for Distributed Transactions

Oracle8i defines a session tree of all nodes participating in a distributed transaction. A session tree is
a hierarchical model of the transaction that describes the relationships among the nodes that are
involved. Each node plays a role in the transaction. For example, the node that originates the
transaction is the global coordinator, and the node in charge of initiating a commit or rollback is called
the commit point site.
Two-Phase Commit Mechanism

Unlike a transaction on a local database, a distributed transaction involves altering data on multiple
databases. Consequently, distributed transaction processing is more complicated, because Oracle must
coordinate the committing or rolling back of the changes in a transaction as a self-contained unit. In
other words, the entire transaction commits, or the entire transactions rolls back.

Oracle ensures the integrity of data in a distributed transaction using the two-phase commit
mechanism. In the prepare phase, the initiating node in the transaction asks the other participating
nodes to promise to commit or roll back the transaction. During the commit phase, the initiating node
asks all participating nodes to commit the transaction; if this outcome is not possible, then all nodes
are asked to roll back.

Session Trees for Distributed Transactions


As the statements in a distributed transaction are issued, Oracle8i defines a session tree of all nodes
participating in the transaction. A session tree is a hierarchical model that describes the relationships
among sessions and their roles. Figure 4-2 illustrates a session tree:

Figure 4-2 Example of a Session Tree

All nodes participating in the session tree of a distributed transaction assume one or more of the
following roles:

client A node that references information in a database belonging to a different node.


database server A node that receives a request for information from another node.
global coordinator The node that originates the distributed transaction.
local coordinator A node that is forced to reference data on other nodes to complete its part of the transaction.
commit point site The node that commits or rolls back the transaction as instructed by the global coordinator.

The role a node plays in a distributed transaction is determined by:

 Whether the transaction is local or remote


 The commit point strength of the node ("Commit Point Site")
 Whether all requested data is available at a node, or whether other nodes need to be referenced
to complete the transaction
 Whether the node is read-only

Clients

A node acts as a client when it references information from another node's database. The referenced
node is a database server. In Figure 4-2, the node SALES is a client of the nodes that host the
WAREHOUSE and FINANCE databases.

Database Servers

A database server is a node that hosts a database from which a client requests data.

In Figure 4-2, an application at the SALES node initiates a distributed transaction that accesses data
from the WAREHOUSE and FINANCE nodes. Therefore, [Link] has the role of client
node, and WAREHOUSE and FINANCE are both database servers. In this example, SALES is a
database server and a client because the application also requests a change to the SALES database.

Local Coordinators

A node that must reference data on other nodes to complete its part in the distributed transaction is
called a local coordinator. In Figure 4-2, SALES is a local coordinator because it coordinates the
nodes it directly references: WAREHOUSE and FINANCE. SALES also happens to be the global
coordinator because it coordinates all the nodes involved in the transaction.

A local coordinator is responsible for coordinating the transaction among the nodes it communicates
directly with by:

 Receiving and relaying transaction status information to and from those nodes.
 Passing queries to those nodes.
 Receiving queries from those nodes and passing them on to other nodes.
 Returning the results of queries to the nodes that initiated them.

Global Coordinator

The node where the distributed transaction originates is called the global coordinator. The database
application issuing the distributed transaction is directly connected to the node acting as the global
coordinator. For example, in Figure 4-2, the transaction issued at the node SALES references
information from the database servers WAREHOUSE and FINANCE. Therefore,
[Link] is the global coordinator of this distributed transaction.

The global coordinator becomes the parent or root of the session tree. The global coordinator performs
the following operations during a distributed transaction:

 Sends all of the distributed transaction's SQL statements, remote procedure calls, etc. to the
directly referenced nodes, thus forming the session tree.
 Instructs all directly referenced nodes other than the commit point site to prepare the
transaction.
 Instructs the commit point site to initiate the global commit of the transaction if all nodes
prepare successfully.
 Instructs all nodes to initiate a global rollback of the transaction if there is an abort response.
Commit Point Site

The job of the commit point site is to initiate a commit or roll back operation as instructed by the
global coordinator. The system administrator always designates one node to be the commit point
site in the session tree by assigning all nodes a commit point strength. The node selected as commit
point site should be the node that stores the most critical data.

Figure 4-3 illustrates an example of distributed system, with SALES serving as the commit point site:

Figure 4-3 Commit Point Site

The commit point site is distinct from all other nodes involved in a distributed transaction in these
ways:

 The commit point site never enters the prepared state. Consequently, if the commit point site
stores the most critical data, this data never remains in-doubt, even if a failure occurs. In failure
situations, failed nodes remain in a prepared state, holding necessary locks on data until in-
doubt transactions are resolved.
 The commit point site commits before the other nodes involved in the transaction. In effect, the
outcome of a distributed transaction at the commit point site determines whether the transaction
at all nodes is committed or rolled back: the other nodes follow the lead of the commit point
site. The global coordinator ensures that all nodes complete the transaction in the same manner
as the commit point site.

How a Distributed Transaction Commits

A distributed transaction is considered committed after all non-commit point sites are prepared, and
the transaction has been actually committed at the commit point site. The online redo log at the commit
point site is updated as soon as the distributed transaction is committed at this node.

Because the commit point log contains a record of the commit, the transaction is considered committed
even though some participating nodes may still be only in the prepared state and the transaction not
yet actually committed at these nodes. In the same way, a distributed transaction is
considered not committed if the commit has not been logged at the commit point site.

Commit Point Strength

Every database server must be assigned a commit point strength. If a database server is referenced in
a distributed transaction, the value of its commit point strength determines which role it plays in the
two-phase commit. Specifically, the commit point strength determines whether a given node is the
commit point site in the distributed transaction and thus commits before all of the other nodes. This
value is specified using the initialization parameter COMMIT_POINT_STRENGTH.

How Oracle Determines the Commit Point Site

The commit point site, which is determined at the beginning of the prepare phase, is selected only
from the nodes participating in the transaction. The following sequence of events occurs:

1. Of the nodes directly referenced by the global coordinator, Oracle selects the node with the
highest commit point strength as the commit point site.

2. The initially-selected node determines if any of the nodes from which it has to obtain
information for this transaction has a higher commit point strength.

3. Either the node with the highest commit point strength directly referenced in the transaction or
one of its servers with a higher commit point strength becomes the commit point site.

4. After the final commit point site has been determined, the global coordinator sends prepare
responses to all nodes participating in the transaction.

Figure 4-4 shows in a sample session tree the commit point strengths of each node (in parentheses)
and shows the node chosen as the commit point site:

Figure 4-4 Commit Point Strengths and Determination of the Commit Point Site

The following conditions apply when determining the commit point site:

 A read-only node cannot be the commit point site.


 If multiple nodes directly referenced by the global coordinator have the same commit point
strength, then Oracle designates one of these as the commit point site.
 If a distributed transaction ends with a rollback, then the prepare and commit phases are not
needed. Consequently, Oracle never determines a commit point site. Instead, the global
coordinator sends a ROLLBACK statement to all nodes and ends the processing of the
distributed transaction.
As Figure 4-4 illustrates, the commit point site and the global coordinator can be different nodes of
the session tree. The commit point strength of each node is communicated to the coordinators when
the initial connections are made. The coordinators retain the commit point strengths of each node they
are in direct communication with so that commit point sites can be efficiently selected during two-
phase commits. Therefore, it is not necessary for the commit point strength to be exchanged between
a coordinator and a node each time a commit occurs.

Two-Phase Commit Mechanism


All participating nodes in a distributed transaction should perform the same action: they should either
all commit or all perform a rollback of the transaction. Oracle8i automatically controls and monitors
the commit or rollback of a distributed transaction and maintains the integrity of the global
database (the collection of databases participating in the transaction) using the two-phase commit
mechanism. This mechanism is completely transparent, requiring no programming on the part of the
user or application developer.

The commit mechanism has the following distinct phases, which Oracle performs automatically
whenever a user commits a distributed transaction:

prepare The initiating node, called the global coordinator, asks participating nodes other than the commit point
phase site to promise to commit or roll back the transaction, even if there is a failure. If any node cannot
prepare, the transaction is rolled back.
commit If all participants respond to the coordinator that they are prepared, then the coordinator asks the
phase commit point site to commit. After it commits, the coordinator asks all other nodes to commit the
transaction.
forget The global coordinator forgets about the transaction.
phase

This section contains the following topics:

 Prepare Phase
 Commit Phase
 Forget Phase

Prepare Phase
The first phase in committing a distributed transaction is the prepare phase. In this phase, Oracle does
not actually commit or roll back the transaction. Instead, all nodes referenced in a distributed
transaction (except the commit point site, described in the "Commit Point Site") are told to prepare to
commit. By preparing, a node:

 Records information in the online redo logs so that it can subsequently either commit or roll
back the transaction, regardless of intervening failures.
 Places a distributed lock on modified tables, which prevents reads.

When a node responds to the global coordinator that it is prepared to commit, the prepared
node promises to either commit or roll back the transaction later--but does not make a unilateral
decision on whether to commit or roll back the transaction. The promise means that if an instance
failure occurs at this point, the node can use the redo records in the online log to recover the database
back to the prepare phase.

Note:

Queries that start after a node has prepared cannot access the associated locked data until
all phases complete.

Types of Responses in the Prepare Phase

When a node is told to prepare, it can respond in the following ways:

prepared Data on the node has been modified by a statement in the distributed transaction, and the node has
successfully prepared.
read- No data on the node has been, or can be, modified (only queried), so no preparation is necessary.
only
abort The node cannot successfully prepare.
Prepared Response

When a node has successfully prepared, it issues a prepared message. The message indicates that the
node has records of the changes in the online log, so it is prepared either to commit or perform a
rollback. The message also guarantees that locks held for the transaction can survive a failure.

Read-Only Response

When a node is asked to prepare, and the SQL statements affecting the database do not change the
node's data, the node responds with a read-only message. The message indicates that the node will not
participate in the commit phase.

There are three cases in which all or part of a distributed transaction is read-only:

Case Conditions Consequence

Partially read- Any of the following occurs: The read-only nodes recognize their status when asked
only to prepare. They give their local coordinators a read-
 Only queries are issued at one only response. Thus, the commit phase completes faster
or more nodes. because Oracle eliminates read-only nodes from
 No data is changed. subsequent processing.
Case Conditions Consequence

 Changes rolled back due to


triggers firing or constraint
violations.

Completely read- All of following occur: All nodes recognize that they are read-only during
only with prepare phase, so no commit phase is required. The
prepare phase  No data changes. global coordinator, not knowing whether all nodes are
 Transaction is not started read-only, must still perform the prepare phase.
with SET TRANSACTION
READ ONLY statement.

Completely read- All of following occur: Only queries are allowed in the transaction, so global
only without coordinator does not have to perform two-phase
two-phase  No data changes. commit. Changes by other transactions do not degrade
commit  Transaction is started with global transaction-level read consistency because of
SET TRANSACTION READ global SCN coordination among nodes. The transaction
ONLY statement. does not use rollback segments.

Note that if a distributed transaction is set to read-only, then it does not use rollback segments. If many
users connect to the database and their transactions are not set to READ ONLY, then they allocate
rollback space even if they are only performing queries.

Abort Response

When a node cannot successfully prepare, it performs the following actions:

1. Releases resources currently held by the transaction and rolls back the local portion of the
transaction.

2. Responds to the node that referenced it in the distributed transaction with an abort message.

These actions then propagate to the other nodes involved in the distributed transaction so that they can
roll back the transaction and guarantee the integrity of the data in the global database. This response
enforces the primary rule of a distributed transaction: all nodes involved in the transaction either all
commit or all roll back the transaction at the same logical time.

Steps in the Prepare Phase

To complete the prepare phase, each node excluding the commit point site performs the following
steps:

1. The node requests that its descendants, that is, the nodes subsequently referenced, prepare to
commit.
2. The node checks to see whether the transaction changes data on itself or its descendants. If there
is no change to the data, then the node skips the remaining steps and returns a read-only
response

3. The node allocates the resources it needs to commit the transaction if data is changed.

4. The node saves redo records corresponding to changes made by the transaction to its online
redo log.

5. The node guarantees that locks held for the transaction are able to survive a failure.

6. The node responds to the initiating node with a prepared response , if its attempt or the attempt
of one of its descendents to prepare was unsuccessful, with an abort response.

These actions guarantee that the node can subsequently commit or roll back the transaction on the
node. The prepared nodes then wait until a COMMIT or ROLLBACK request is received from the
global coordinator.

After the nodes are prepared, the distributed transaction is said to be in-doubt . It retains in-doubt
status until all changes are either committed or rolled back.

Commit Phase

The second phase in committing a distributed transaction is the commit phase. Before this phase
occurs, all nodes other than the commit point site referenced in the distributed transaction have
guaranteed that they are prepared, that is, they have the necessary resources to commit the transaction.

Steps in the Commit Phase

The commit phase consists of the following steps:

1. The global coordinator instructs the commit point site to commit.

2. The commit point site commits.

3. The commit point site informs the global coordinator that it has committed.

4. The global and local coordinators send a message to all nodes instructing them to commit the
transaction.

5. At each node, Oracle8i commits the local portion of the distributed transaction and releases
locks.

6. At each node, Oracle8i records an additional redo entry in the local redo log, indicating that the
transaction has committed.

7. The participating nodes notify the global coordinator that they have committed.

When the commit phase is complete, the data on all nodes of the distributed system is consistent with
one another.
Guaranteeing Global Database Consistency

Each committed transaction has an associated system change number (SCN) to uniquely identify the
changes made by the SQL statements within that transaction. The SCN functions as an internal Oracle
timestamp that uniquely identifies a committed version of the database.

In a distributed system, the SCNs of communicating nodes are coordinated when all of the following
actions occur:

 A connection occurs using the path described by one or more database links.
 A distributed SQL statement executes.
 A distributed transaction commits.

Among other benefits, the coordination of SCNs among the nodes of a distributed system ensures
global read-consistency at both the statement and transaction level. If necessary, global time-based
recovery can also be completed.

During the prepare phase, Oracle8i determines the highest SCN at all nodes involved in the
transaction. The transaction then commits with the high SCN at the commit point site. The commit
SCN is then sent to all prepared nodes with the commit decision.

Forget Phase

After the participating nodes notify the commit point site that they have committed, the commit point
site can forget about the transaction. The following steps occur:

1. After receiving notice from the global coordinator that all nodes have committed, the commit
point site erases status information about this transaction.

2. The commit point site informs the global coordinator that it has erased the status information.

3. The global coordinator erases its own information about the transaction.

In-Doubt Transactions
The two-phase commit mechanism ensures that all nodes either commit or perform a rollback together.
What happens if any of the three phases fails because of a system or network error? The transaction
becomes in-doubt.

Distributed transactions can become in-doubt in the following ways:

 A server machine running Oracle software crashes.


 A network connection between two or more Oracle databases involved in distributed processing
is disconnected.
 An unhandled software error occurs.

The RECO (recovery)process automatically resolves in-doubt transactions when the machine,
network, or software problem is resolved. Until RECO can resolve the transaction, the data is locked
for both reads and writes. Oracle blocks reads because it cannot determine which version of the data
to display for a query.
This section contains the following topics:

 Automatic Resolution of In-Doubt Transactions


 Manual Resolution of In-Doubt Transactions
 Relevance of System Change Numbers for In-Doubt Transactions

Automatic Resolution of In-Doubt Transactions

In the majority of cases, Oracle resolves the in-doubt transaction automatically. Assume that there are
two nodes, LOCAL and REMOTE, in the following scenarios. The local node is the commit point
site. User SCOTT connects to LOCAL and executes and commits a distributed transaction that updates
LOCAL and REMOTE.

Failure During the Prepare Phase

Figure 4-5 illustrates the sequence of events when there is a failure during the prepare phase of a
distributed transaction:

Figure 4-5 Failure During Prepare Phase

The following steps occur:

1. Scott connects to LOCAL and executes a distributed transaction.

2. The global coordinator, which in this example is also the commit point site, requests all
databases other than the commit point site to promise to commit or roll back when told to do
so.

3. The REMOTE database crashes before issuing the prepare response back to LOCAL.

4. The transaction is ultimately rolled back on each database by the RECO process when the
remote site is restored.

Failure During the Commit Phase

Figure 4-5 illustrates the sequence of events when there is a failure during the commit phase of a
distributed transaction:
Figure 4-6 Failure During Prepare Phase

The following steps occur:

1. Scott connects to LOCAL and executes a distributed transaction.

2. The global coordinator, which in this case is also the commit point site, requests all databases
other than the commit point site to promise to commit or roll back when told to do so.

3. The commit point site receives a prepare message from REMOTE saying that it will commit.

4. The commit point site commits the transaction locally, then sends a commit message to
REMOTE asking it to commit.

5. The REMOTE database receives the commit message, but cannot respond because of a network
failure.

6. The transaction is ultimately committed on the remote database by the RECO process after the
network is restored.

Manual Resolution of In-Doubt Transactions

You should only need to resolve an in-doubt transaction in the following cases:

 The in-doubt transaction has locks on critical data or rollback segments.


 The cause of the machine, network, or software failure cannot be repaired quickly.

Resolution of in-doubt transactions can be complicated. The procedure requires that you do the
following:

 Identify the transaction identification number for the in-doubt transaction.


 Query the DBA_2PC_PENDING and DBA_2PC_NEIGHBORS views to determine whether
the databases involved in the transaction have committed.
 If necessary, force a commit using the COMMIT FORCE statement or a rollback using the
ROLLBACK FORCE statement.

Relevance of System Change Numbers for In-Doubt Transactions

A system change number (SCN) is an internal timestamp for a committed version of the database. The
Oracle database server uses the SCN clock value to guarantee transaction consistency. For example,
when a user commits a transaction, Oracle records an SCN for this commit in the online redo log.

Oracle uses SCNs to coordinate distributed transactions among different databases. For example,
Oracle uses SCNs in the following way:

1. An application establishes a connection using a database link.

2. The distributed transaction commits with the highest global SCN among all the databases
involved.

3. The commit global SCN is sent to all databases involved in the transaction.

SCNs are important for distributed transactions because they function as a synchronized commit
timestamp of a transaction--even if the transaction fails. If a transaction becomes in-doubt, an
administrator can use this SCN to coordinate changes made to the global database. The global SCN
for the transaction commit can also be used to identify the transaction later, for example, in distributed
recovery.

Distributed Transaction Processing: Case Study


In this scenario, a company has separate Oracle8i database servers, [Link] and
[Link]. As users insert sales records into the SALES database, associated
records are being updated at the WAREHOUSE database.

This case study of distributed processing illustrates:

 The definition of a session tree


 How a commit point site is determined
 When prepare messages are sent
 When a transaction actually commits
 What information is stored locally about the transaction

Stage 1: Client Application Issues DML Statements

At the Sales department, a salesperson uses SQL*Plus to enter a sales order and then commit it. The
application issues a number of SQL statements to enter the order into the SALES database and update
the inventory in the WAREHOUSE database:
CONNECT scott/tiger@[Link] ...;
INSERT INTO orders ...;
UPDATE inventory@[Link] ...;
INSERT INTO orders ...;
UPDATE inventory@[Link] ...;
COMMIT;

These SQL statements are part of a single distributed transaction, guaranteeing that all issued SQL
statements succeed or fail as a unit. Treating the statements as a unit prevents the possibility of an
order being placed and then inventory not being updated to reflect the order. In effect, the transaction
guarantees the consistency of data in the global database.
As each of the SQL statements in the transaction executes, the session tree is defined, as shown
in Figure 4-7.

Figure 4-7 Defining the Session Tree

Note the following aspects of the transaction:

 An order entry application running with the SALES database initiates the transaction.
Therefore, [Link] is the global coordinator for the distributed transaction.
 The order entry application inserts a new sales record into the SALES database and updates the
inventory at the warehouse. Therefore, the nodes [Link] and
[Link] are both database servers.
 Because [Link] updates the inventory, it is a client of
[Link].

This stage completes the definition of the session tree for this distributed transaction. Each node in the
tree has acquired the necessary data locks to execute the SQL statements that reference local data.
These locks remain even after the SQL statements have been executed until the two-phase commit is
completed.

Stage 2: Oracle Determines Commit Point Site

Oracle determines the commit point site immediately following the COMMIT statement.
[Link], the global coordinator, is determined to be the commit point site, as shown
in Figure 4-8.
Figure 4-8 Determining the Commit Point Site

Stage 3: Global Coordinator Sends Prepare Response

The prepare stage involves the following steps:

1. After Oracle determines the commit point site, the global coordinator sends the prepare message
to all directly referenced nodes of the session tree, excluding the commit point site. In this
example, [Link] is the only node asked to prepare.

2. [Link] tries to prepare. If a node can guarantee that it can commit the
locally dependent part of the transaction and can record the commit information in its local redo
log, then the node can successfully prepare. In this example, only [Link]
receives a prepare message because [Link] is the commit point site.

3. [Link] responds to [Link] with a prepared message.

As each node prepares, it sends a message back to the node that asked it to prepare. Depending on the
responses, one of the following can happen:

 If any of the nodes asked to prepare respond with an abort message to the global coordinator,
then the global coordinator tells all nodes to roll back the transaction, and the operation is
completed.
 If all nodes asked to prepare respond with a prepared or a read-only message to the global
coordinator, that is, they have successfully prepared, then the global coordinator asks the
commit point site to commit the transaction.

Figure 4-9 Sending and Acknowledging the Prepare Message


Stage 4: Commit Point Site Commits

The committing of the transaction by the commit point site involves the following steps:

1. [Link], receiving acknowledgment that [Link] is


prepared, instructs the commit point site to commit the transaction.

2. The commit point site now commits the transaction locally and records this fact in its local redo
log.

Even if [Link] has not yet committed, the outcome of this transaction is pre-
determined. In other words, the transaction will be committed at all nodes even if a given node's ability
to commit is delayed.

Stage 5: Commit Point Site Informs Global Coordinator of Commit

This stage involves the following steps:

1. The commit point site tells the global coordinator that the transaction has committed. Because
the commit point site and global coordinator are the same node in this example, no operation is
required. The commit point site knows that the transaction is committed because it recorded
this fact in its online log.

2. The global coordinator confirms that the transaction has been committed on all other nodes
involved in the distributed transaction.

Stage 6: Global and Local Coordinators Tell All Nodes to Commit

The committing of the transaction by all the nodes in the transaction involves the following steps:

1. After the global coordinator has been informed of the commit at the commit point site, it tells
all other directly referenced nodes to commit.

2. In turn, any local coordinators instruct their servers to commit, and so on.

3. Each node, including the global coordinator, commits the transaction and records appropriate
redo log entries locally. As each node commits, the resource locks that were being held locally
for that transaction are released.

In Figure 4-10, [Link], which is both the commit point site and the global coordinator,
has already committed the transaction locally. SALES now instructs [Link] to
commit the transaction.
Figure 4-10 Instructing Nodes to Commit

Stage 7: Global Coordinator and Commit Point Site Complete the Commit

The completion of the commit of the transaction occurs in the following steps:

1. After all referenced nodes and the global coordinator have committed the transaction, the global
coordinator informs the commit point site of this fact.

2. The commit point site, which has been waiting for this message, erases the status information
about this distributed transaction.

3. The commit point site informs the global coordinator that it is finished. In other words, the
commit point site forgets about committing the distributed transaction. This action is
permissible because all nodes involved in the two-phase commit have committed the
transaction successfully, so they will never have to determine its status in the future.

4. The global coordinator finalizes the transaction by forgetting about the transaction itself.

After the completion of the COMMIT phase, the distributed transaction is itself complete. The
steps described above are accomplished automatically and in a fraction of a second.

Query Processing in Distributed DBMS


A Query processing in a distributed database management system requires the transmission of data
between the computers in a network. A distribution strategy for a query is the ordering of data
transmissions and local data processing in a database system. Generally, a query in Distributed DBMS
requires data from multiple sites, and this need for data from different sites is called the transmission of
data that causes communication costs. Query processing in DBMS is different from query processing in
centralized DBMS due to this communication cost of data transfer over the network. The transmission
cost is low when sites are connected through high-speed Networks and is quite significant in other
networks.

1. Costs (Transfer of data) of Distributed Query processing :


In Distributed Query processing, the data transfer cost of distributed query processing means the cost of
transferring intermediate files to other sites for processing and therefore the cost of transferring the
ultimate result files to the location where that result’s required. Let’s say that a user sends a query to
site S1, which requires data from its own and also from another site S2. Now, there are three strategies
to process this query which are given below:
1. We can transfer the data from S2 to S1 and then process the query
2. We can transfer the data from S1 to S2 and then process the query
3. We can transfer the data from S1 and S2 to S3 and then process the query. So the choice
depends on various factors like, the size of relations and the results, the communication cost
between different sites, and at which the site result will be utilized.
Commonly, the data transfer cost is calculated in terms of the size of the messages. By using the below
formula, we can calculate the data transfer cost:
Data transfer cost = C * Size
Where C refers to the cost per byte of data transferring and Size is the no. of bytes transmitted.
Example: Consider the following table EMPLOYEE and DEPARTMENT.
Site1: EMPLOYEE
EID NAME SALARY DID

EID- 10 bytes
SALARY- 20 bytes
DID- 10 bytes
Name- 20 bytes
Total records- 1000
Record Size- 60 bytes
Site2: DEPARTMENT
DID DNAME

DID- 10 bytes
DName- 20 bytes
Total records- 50
Record Size- 30 bytes
Example : Find the name of employees and their department names. Also, find the amount of data
transfer to execute this query when the query is submitted to Site 3.
Answer : Considering the query is submitted at site 3 and neither of the two relations that is an
EMPLOYEE and the DEPARTMENT not available at site 3. So, to execute this query, we have three
strategies:
1. Transfer both the tables that is EMPLOYEE and DEPARTMENT at SITE 3 then join the
tables there. The total cost in this is 1000 * 60 + 50 * 30 = 60,000 + 1500 = 61500 bytes.
2. Transfer the table EMPLOYEE to SITE 2, join the table at SITE 2 and then transfer the result
at SITE 3. The total cost in this is 60 * 1000 + 60 * 1000 = 120000 bytes since we have to
transfer 1000 tuples having NAME and DNAME from site 1,
3. Transfer the table DEPARTMENT to SITE 1, join the table at SITE 2 join the table at site1
and then transfer the result at site3. The total cost is 30 * 50 + 60 * 1000 = 61500 bytes since
we have to transfer 1000 tuples having NAME and DNAME from site 1 to site 3 that is 60
bytes each.
Now, If the Optimisation criteria are to reduce the amount of data transfer, we can choose either 1 or 3
strategies from the above.
2. Using Semi join in Distributed Query processing :
The semi-join operation is used in distributed query processing to reduce the number of tuples in a table
before transmitting it to another site. This reduction in the number of tuples reduces the number and the
total size of the transmission that ultimately reducing the total cost of data transfer. Let’s say that we
have two tables R1, R2 on Site S1, and S2. Now, we will forward the joining column of one table say
R1 to the site where the other table say R2 is located. This column is joined with R2 at that site. The
decision whether to reduce R1 or R2 can only be made after comparing the advantages of reducing R1
with that of reducing R2. Thus, semi-join is a well-organized solution to reduce the transfer of data in
distributed query processing.
Example : Find the amount of data transferred to execute the same query given in the above
example using semi-join operation.
Answer : The following strategy can be used to execute the query.
1. Select all (or Project) the attributes of the EMPLOYEE table at site 1 and then transfer them
to site 3. For this, we will transfer NAME, DID(EMPLOYEE) and the size is 25 * 1000 =
25000 bytes.
2. Transfer the table DEPARTMENT to site 3 and join the projected attributes of EMPLOYEE
with this table. The size of the DEPARTMENT table is 25 * 50 = 1250
Applying the above scheme, the amount of data transferred to execute the query will be 25000 + 1250
= 26250 bytes.

Distributed Transaction Management


Definition
Distributed transaction management deals with the problems of always providing a consistent
distributed database in the presence of a large number of transactions (local and global) and failures
(communication link and/or site failures). This is accomplished through (i) distributed commit protocols
that guarantee atomicity property; (ii) distributed concurrency control techniques to ensure consistency
and isolation properties; and (iii) distributed recovery methods to preserve consistency and durability
when failures occur.

1) Distributed Commit Protocols :


In a local database system, for committing a transaction, the transaction manager has to only convey
the decision to commit to the recovery manager. However, in a distributed system, the transaction
manager should convey the decision to commit to all the servers in the various sites where the
transaction is being executed and uniformly enforce the decision. When processing is complete at each
site, it reaches the partially committed transaction state and waits for all other transactions to reach
their partially committed states. When it receives the message that all the sites are ready to commit, it
starts to commit. In a distributed system, either all sites commit or none of them does.
The different distributed commit protocols are −

 One-phase commit
 Two-phase commit
 Three-phase commit

Distributed One-phase Commit


Distributed one-phase commit is the simplest commit protocol. Let us consider that there is a controlling
site and a number of slave sites where the transaction is being executed. The steps in distributed
commit are −
 After each slave has locally completed its transaction, it sends a “DONE” message to the
controlling site.
 The slaves wait for “Commit” or “Abort” message from the controlling site. This waiting time is
called window of vulnerability.
 When the controlling site receives “DONE” message from each slave, it makes a decision to
commit or abort. This is called the commit point. Then, it sends this message to all the slaves.
 On receiving this message, a slave either commits or aborts and then sends an
acknowledgement message to the controlling site.

Distributed Two-phase Commit


Distributed two-phase commit reduces the vulnerability of one-phase commit protocols. The steps
performed in the two phases are as follows −
Phase 1: Prepare Phase
 After each slave has locally completed its transaction, it sends a “DONE” message to the
controlling site. When the controlling site has received “DONE” message from all slaves, it sends
a “Prepare” message to the slaves.
 The slaves vote on whether they still want to commit or not. If a slave wants to commit, it sends
a “Ready” message.
 A slave that does not want to commit sends a “Not Ready” message. This may happen when the
slave has conflicting concurrent transactions or there is a timeout.
Phase 2: Commit/Abort Phase
 After the controlling site has received “Ready” message from all the slaves −
o The controlling site sends a “Global Commit” message to the slaves.

o The slaves apply the transaction and send a “Commit ACK” message to the controlling
site.
o When the controlling site receives “Commit ACK” message from all the slaves, it considers
the transaction as committed.
 After the controlling site has received the first “Not Ready” message from any slave −
o The controlling site sends a “Global Abort” message to the slaves.

o The slaves abort the transaction and send a “Abort ACK” message to the controlling site.

o When the controlling site receives “Abort ACK” message from all the slaves, it considers
the transaction as aborted.

Distributed Three-phase Commit


The steps in distributed three-phase commit are as follows −
Phase 1: Prepare Phase
The steps are same as in distributed two-phase commit.
Phase 2: Prepare to Commit Phase

 The controlling site issues an “Enter Prepared State” broadcast message.


 The slave sites vote “OK” in response.
Phase 3: Commit / Abort Phase
The steps are same as two-phase commit except that “Commit ACK”/”Abort ACK” message is not
required.

2)Distributed DBMS - Controlling Concurrency

Concurrency controlling techniques ensure that multiple transactions are executed simultaneously
while maintaining the ACID properties of the transactions and serializability in the schedules.
In this chapter, we will study the various approaches for concurrency control.
Locking Based Concurrency Control Protocols
Locking-based concurrency control protocols use the concept of locking data items. A lock is a variable
associated with a data item that determines whether read/write operations can be performed on that
data item. Generally, a lock compatibility matrix is used which states whether a data item can be locked
by two transactions at the same time.
Locking-based concurrency control systems can use either one-phase or two-phase locking protocols.

One-phase Locking Protocol


In this method, each transaction locks an item before use and releases the lock as soon as it has
finished using it. This locking method provides for maximum concurrency but does not always enforce
serializability.

Two-phase Locking Protocol


In this method, all locking operations precede the first lock-release or unlock operation. The transaction
comprise of two phases. In the first phase, a transaction only acquires all the locks it needs and do not
release any lock. This is called the expanding or the growing phase. In the second phase, the
transaction releases the locks and cannot request any new locks. This is called the shrinking phase.
Every transaction that follows two-phase locking protocol is guaranteed to be serializable. However,
this approach provides low parallelism between two conflicting transactions.

Timestamp Concurrency Control Algorithms


Timestamp-based concurrency control algorithms use a transaction’s timestamp to coordinate
concurrent access to a data item to ensure serializability. A timestamp is a unique identifier given by
DBMS to a transaction that represents the transaction’s start time.
These algorithms ensure that transactions commit in the order dictated by their timestamps. An older
transaction should commit before a younger transaction, since the older transaction enters the system
before the younger one.
Timestamp-based concurrency control techniques generate serializable schedules such that the
equivalent serial schedule is arranged in order of the age of the participating transactions.
Some of timestamp based concurrency control algorithms are −

 Basic timestamp ordering algorithm.


 Conservative timestamp ordering algorithm.
 Multiversion algorithm based upon timestamp ordering.
Timestamp based ordering follow three rules to enforce serializability −
 Access Rule − When two transactions try to access the same data item simultaneously, for
conflicting operations, priority is given to the older transaction. This causes the younger
transaction to wait for the older transaction to commit first.
 Late Transaction Rule − If a younger transaction has written a data item, then an older
transaction is not allowed to read or write that data item. This rule prevents the older transaction
from committing after the younger transaction has already committed.
 Younger Transaction Rule − A younger transaction can read or write a data item that has
already been written by an older transaction.

Optimistic Concurrency Control Algorithm


In systems with low conflict rates, the task of validating every transaction for serializability may lower
performance. In these cases, the test for serializability is postponed to just before commit. Since the
conflict rate is low, the probability of aborting transactions which are not serializable is also low. This
approach is called optimistic concurrency control technique.
In this approach, a transaction’s life cycle is divided into the following three phases −
 Execution Phase − A transaction fetches data items to memory and performs operations upon
them.
 Validation Phase − A transaction performs checks to ensure that committing its changes to the
database passes serializability test.
 Commit Phase − A transaction writes back modified data item in memory to the disk.
This algorithm uses three rules to enforce serializability in validation phase −
Rule 1 − Given two transactions Ti and Tj, if Ti is reading the data item which Tj is writing, then Ti’s
execution phase cannot overlap with Tj’s commit phase. Tj can commit only after Ti has finished
execution.
Rule 2 − Given two transactions Ti and Tj, if Ti is writing the data item that Tj is reading, then Ti’s commit
phase cannot overlap with T j’s execution phase. Tj can start executing only after Ti has already
committed.
Rule 3 − Given two transactions Ti and Tj, if Ti is writing the data item which Tj is also writing, then Ti’s
commit phase cannot overlap with T j’s commit phase. Tj can start to commit only after Ti has already
committed.

Concurrency Control in Distributed Systems


In this section, we will see how the above techniques are implemented in a distributed database system.

Distributed Two-phase Locking Algorithm


The basic principle of distributed two-phase locking is same as the basic two-phase locking protocol.
However, in a distributed system there are sites designated as lock managers. A lock manager controls
lock acquisition requests from transaction monitors. In order to enforce co-ordination between the lock
managers in various sites, at least one site is given the authority to see all transactions and detect lock
conflicts.
Depending upon the number of sites who can detect lock conflicts, distributed two-phase locking
approaches can be of three types −
 Centralized two-phase locking − In this approach, one site is designated as the central lock
manager. All the sites in the environment know the location of the central lock manager and
obtain lock from it during transactions.
 Primary copy two-phase locking − In this approach, a number of sites are designated as lock
control centers. Each of these sites has the responsibility of managing a defined set of locks. All
the sites know which lock control center is responsible for managing lock of which data
table/fragment item.
 Distributed two-phase locking − In this approach, there are a number of lock managers, where
each lock manager controls locks of data items stored at its local site. The location of the lock
manager is based upon data distribution and replication.

Distributed Timestamp Concurrency Control


In a centralized system, timestamp of any transaction is determined by the physical clock reading. But,
in a distributed system, any site’s local physical/logical clock readings cannot be used as global
timestamps, since they are not globally unique. So, a timestamp comprises of a combination of site ID
and that site’s clock reading.
For implementing timestamp ordering algorithms, each site has a scheduler that maintains a separate
queue for each transaction manager. During transaction, a transaction manager sends a lock request
to the site’s scheduler. The scheduler puts the request to the corresponding queue in increasing
timestamp order. Requests are processed from the front of the queues in the order of their timestamps,
i.e. the oldest first.

Conflict Graphs
Another method is to create conflict graphs. For this transaction classes are defined. A transaction class
contains two set of data items called read set and write set. A transaction belongs to a particular class
if the transaction’s read set is a subset of the class’ read set and the transaction’s write set is a subset
of the class’ write set. In the read phase, each transaction issues its read requests for the data items in
its read set. In the write phase, each transaction issues its write requests.
A conflict graph is created for the classes to which active transactions belong. This contains a set of
vertical, horizontal, and diagonal edges. A vertical edge connects two nodes within a class and denotes
conflicts within the class. A horizontal edge connects two nodes across two classes and denotes a
write-write conflict among different classes. A diagonal edge connects two nodes across two classes
and denotes a write-read or a read-write conflict among two classes.
The conflict graphs are analyzed to ascertain whether two transactions within the same class or across
two different classes can be run in parallel.

Distributed Optimistic Concurrency Control Algorithm


Distributed optimistic concurrency control algorithm extends optimistic concurrency control algorithm.
For this extension, two rules are applied −
Rule 1 − According to this rule, a transaction must be validated locally at all sites when it executes. If a
transaction is found to be invalid at any site, it is aborted. Local validation guarantees that the
transaction maintains serializability at the sites where it has been executed. After a transaction passes
local validation test, it is globally validated.
Rule 2 − According to this rule, after a transaction passes local validation test, it should be globally
validated. Global validation ensures that if two conflicting transactions run together at more than one
site, they should commit in the same relative order at all the sites they run together. This may require a
transaction to wait for the other conflicting transaction, after validation before commit. This requirement
makes the algorithm less optimistic since a transaction may not be able to commit as soon as it is
validated at a site.

3)Distributed Recovery Methods :


In order to recuperate from database failure, database management systems resort to a number of
recovery management techniques. In this chapter, we will study the different approaches for database
recovery.
The typical strategies for database recovery are −
 In case of soft failures that result in inconsistency of database, recovery strategy includes
transaction undo or rollback. However, sometimes, transaction redo may also be adopted to
recover to a consistent state of the transaction.
 In case of hard failures resulting in extensive damage to database, recovery strategies
encompass restoring a past copy of the database from archival backup. A more current state of
the database is obtained through redoing operations of committed transactions from transaction
log.

Recovery from Power Failure


Power failure causes loss of information in the non-persistent memory. When power is restored, the
operating system and the database management system restart. Recovery manager initiates recovery
from the transaction logs.
In case of immediate update mode, the recovery manager takes the following actions −
 Transactions which are in active list and failed list are undone and written on the abort list.
 Transactions which are in before-commit list are redone.
 No action is taken for transactions in commit or abort lists.
In case of deferred update mode, the recovery manager takes the following actions −
 Transactions which are in the active list and failed list are written onto the abort list. No undo
operations are required since the changes have not been written to the disk yet.
 Transactions which are in before-commit list are redone.
 No action is taken for transactions in commit or abort lists.

Recovery from Disk Failure


A disk failure or hard crash causes a total database loss. To recover from this hard crash, a new disk
is prepared, then the operating system is restored, and finally the database is recovered using the
database backup and transaction log. The recovery method is same for both immediate and deferred
update modes.
The recovery manager takes the following actions −
 The transactions in the commit list and before-commit list are redone and written onto the commit
list in the transaction log.
 The transactions in the active list and failed list are undone and written onto the abort list in the
transaction log.

Checkpointing
Checkpoint is a point of time at which a record is written onto the database from the buffers. As a
consequence, in case of a system crash, the recovery manager does not have to redo the transactions
that have been committed before checkpoint. Periodical checkpointing shortens the recovery process.
The two types of checkpointing techniques are −

 Consistent checkpointing
 Fuzzy checkpointing

Consistent Checkpointing
Consistent checkpointing creates a consistent image of the database at checkpoint. During recovery,
only those transactions which are on the right side of the last checkpoint are undone or redone. The
transactions to the left side of the last consistent checkpoint are already committed and needn’t be
processed again. The actions taken for checkpointing are −

 The active transactions are suspended temporarily.


 All changes in main-memory buffers are written onto the disk.
 A “checkpoint” record is written in the transaction log.
 The transaction log is written to the disk.
 The suspended transactions are resumed.
If in step 4, the transaction log is archived as well, then this checkpointing aids in recovery from disk
failures and power failures, otherwise it aids recovery from only power failures.
Fuzzy Checkpointing
In fuzzy checkpointing, at the time of checkpoint, all the active transactions are written in the log. In
case of power failure, the recovery manager processes only those transactions that were active during
checkpoint and later. The transactions that have been committed before checkpoint are written to the
disk and hence need not be redone.

Example of Checkpointing
Let us consider that in system the time of checkpointing is tcheck and the time of system crash is tfail.
Let there be four transactions T a, Tb, Tc and Td such that −
 Ta commits before checkpoint.
 Tb starts before checkpoint and commits before system crash.
 Tc starts after checkpoint and commits before system crash.
 Td starts after checkpoint and was active at the time of system crash.
The situation is depicted in the following diagram −

The actions that are taken by the recovery manager are −

 Nothing is done with Ta.


 Transaction redo is performed for Tb and Tc.
 Transaction undo is performed for T d.

Transaction Recovery Using UNDO / REDO


Transaction recovery is done to eliminate the adverse effects of faulty transactions rather than to
recover from a failure. Faulty transactions include all transactions that have changed the database into
undesired state and the transactions that have used values written by the faulty transactions.
Transaction recovery in these cases is a two-step process −
 UNDO all faulty transactions and transactions that may be affected by the faulty transactions.
 REDO all transactions that are not faulty but have been undone due to the faulty transactions.
Steps for the UNDO operation are −
 If the faulty transaction has done INSERT, the recovery manager deletes the data item(s)
inserted.
 If the faulty transaction has done DELETE, the recovery manager inserts the deleted data item(s)
from the log.
 If the faulty transaction has done UPDATE, the recovery manager eliminates the value by writing
the before-update value from the log.
Steps for the REDO operation are −
 If the transaction has done INSERT, the recovery manager generates an insert from the log.
 If the transaction has done DELETE, the recovery manager generates a delete from the log.
 If the transaction has done UPDATE, the recovery manager generates an update from the log.

event-condition-action rule (ECA rule)


An event-condition-action rule (ECA rule) is the method underlying event-driven computing, in which
actions are triggered by events, given the existence of specific conditions.

Events with significance to the system are identified within an event-driven program. An event could
be some user action, a transmission of sensor data or a message from some other program or system,
among an almost infinite number of other possibilities. The ECA rule specifies how events drive the
desired program responses. When an event with significance for the system occurs, the conditions are
checked for or evaluated; if the conditions exist or meet pre-established criteria, the appropriate action
is executed.

ECA rules originated in active databases and have since been used in areas
including personalization, big data management and business process automation. The model is being
explored for M2M (machine-to-machine) networking, Internet of Things (IoT), cognitive
computing and the Semantic Web.

Eg : ATM process

Event – ATM card insertion

Condition – ask the Password,whether the password match or not

Action – If Password matches then do the transaction

Active Databases
Active Database is a database consisting of set of triggers. These databases are very difficult to be
maintained because of the complexity that arises in understanding the effect of these triggers. In such
database, DBMS initially verifies whether the particular trigger specified in the statement that modifies
the database) is activated or not, prior to executing the statement.
If the trigger is active then DBMS executes the condition part and then executes the action part only if
the specified condition is evaluated to true. It is possible to activate more than one trigger within a single
statement.
In such situation, DBMS processes each of the trigger randomly. The execution of an action part of a
trigger may either activate other triggers or the same trigger that Initialized this action. Such types of
trigger that activates itself is called as ‘recursive trigger’. The DBMS executes such chains of trigger in
some pre-defined manner but it effects the concept of understanding.

Features of Active Database:


1. It possess all the concepts of a conventional database i.e. data modelling facilities, query
language etc.
2. It supports all the functions of a traditional database like data definition, data manipulation,
storage management etc.
3. It supports definition and management of ECA rules.
4. It detects event occurrence.
5. It must be able to evaluate conditions and to execute actions.
6. It means that it has to implement rule execution.
Advantages :
1. Enhances traditional database functionalities with powerful rule processing capabilities.
2. Enable a uniform and centralized description of the business rules relevant to the information
system.
3. Avoids redundancy of checking and repair operations.
4. Suitable platform for building large and efficient knowledge base and expert systems.

A trigger is a procedure which is automatically invoked by the DBMS in response to changes to the
database, and is specified by the database administrator (DBA). A database with a set of associated
triggers is generally called an active database.

Parts of trigger
A triggers description contains three parts, which are as follows −

 Event − An event is a change to the database which activates the trigger.

 Condition − A query that is run when the trigger is activated is called as a condition.
 Action −A procedure which is executed when the trigger is activated and its condition is true.

Use of trigger
Triggers may be used for any of the following reasons −

 To implement any complex business rule, that cannot be implemented using integrity constraints.

 Triggers will be used to audit the process. For example, to keep track of changes made to a table.

 Trigger is used to perform automatic action when another concerned action takes place.

Types of triggers
The different types of triggers are explained below −

 Statement level trigger − It is fired only once for DML statement irrespective of number of rows
affected by statement. Statement-level triggers are the default type of trigger.

 Before-triggers − At the time of defining a trigger we can specify whether the trigger is to be fired
before a command like INSERT, DELETE, or UPDATE is executed or after the command is
executed. Before triggers are automatically used to check the validity of data before the action is
performed. For instance, we can use before trigger to prevent deletion of rows if deletion should
not be allowed in a given case.

 After-triggers − It is used after the triggering action is completed. For example, if the trigger is
associated with the INSERT command then it is fired after the row is inserted into the table.

 Row-level triggers − It is fired for each row that is affected by DML command. For example, if
an UPDATE command updates 150 rows then a row-level trigger is fired 150 times whereas a
statement-level trigger is fired only for once.

Create database trigger


To create a database trigger, we use the CREATE TRIGGER command. The details to be given at the
time of creating a trigger are as follows −

 Name of the trigger.


 Table to be associated with.
 When trigger is to be fired: before or after.
 Command that invokes the trigger- UPDATE, DELETE, or INSERT.
 Whether row-level triggers or not.
 Condition to filter rows.
 PL/SQL block is to be executed when trigger is fired.
The syntax to create database trigger is as follows −
CREATE [OR REPLACE] TRIGGER triggername
{BEFORE|AFTER}
{DELETE|INSERT|UPDATE[OF COLUMNS]} ON table
[FOR EACH ROW {WHEN condition]]
[REFERENCE [OLD AS old] [NEW AS new]]
BEGIN
PL/SQL BLOCK
END.

Design and Implementation Issues for Active Databases


we discuss some additional issues concerning how rules are designed and implemented. The first issue concerns
activation, deactivation, and grouping of rules. In addition to creating rules, an active database system should allow
users to activate, deactivate, and drop rules by referring to their rule names. A deactivated rule will not be triggered
by the triggering event. This feature allows users to selectively deactivate rules for certain periods of time when
they are not needed. The activate command will make the rule active again. The drop com- mand deletes the rule
from the system. Another option is to group rules into named rule sets , so the whole set of rules can be activated,
deactivated, or dropped. It is also useful to have a command that can trigger a rule or rule set via an explicit
PROCESS RULES command issued by the user.

The second issue concerns whether the triggered action should be executed before, after, instead of, or concurrently
with the triggering event. A before trigger executes the trigger before executing the event that caused the trigger.
It can be used in applications such as checking for constraint violations. An after trigger executes the trig- ger after
executing the event, and it can be used in applications such as maintaining derived data and monitoring for specific
events and conditions. An instead of trig- ger executes the trigger instead of executing the event, and it can be used
in applica- tions such as executing corresponding updates on base relations in response to an event that is an update
of a view.

A related issue is whether the action being executed should be considered as a separate transaction or whether it
should be part of the same transaction that triggered the rule. We will try to categorize the various options. It is
important to note that not all options may be available for a particular active database system. In fact, most com-
mercial systems are limited to one or two of the options that we will now discuss.

Let us assume that the triggering event occurs as part of a transaction execution. We should first consider the
various options for how the triggering event is related to the evaluation of the rule’s condition. The rule condition
evaluation is also known as rule consideration , since the action is to be executed only after considering whether
the condition evaluates to true or false. There are three main possibilities for rule consideration:

1. Immediate consideration. The condition is evaluated as part of the same transaction as the triggering event, and
is evaluated immediately. This case

can be further categorized into three options:

Evaluate the condition before executing the triggering event.

Evaluate the condition after executing the triggering event.

Evaluate the condition instead of executing the triggering event.

2. Deferred consideration. The condition is evaluated at the end of the trans- action that included the triggering
event. In this case, there could be many

3. Detached consideration. The condition is evaluated as a separate transac- tion, spawned from the triggering
transaction.

The next set of options concerns the relationship between evaluating the rule condi- tion and executing the rule
action. Here, again, three options are possible: immediate , deferred, or detached execution. Most active systems
use the first

option. That is, as soon as the condition is evaluated, if it returns true, the action is immediately executed.
The Oracle system uses the immediate consideration model, but it allows the user to specify for each rule whether
the before or after option is to be used with immediate condition evaluation. It also uses the immediate execution
model. The STARBURST system uses the deferred consideration option, meaning that all rules triggered by a
transaction wait until the triggering transaction reaches its end and issues its COMMIT WORK command before
the rule conditions are evaluated.

Another issue concerning active database rules is the distinction between row-level rules and statement-level rules.
Because SQL update statements (which act as trig- gering events) can specify a set of tuples, one has to distinguish
between whether the rule should be considered once for the whole statement or whether it should be con-sidered
separately for each row (that is, tuple) affected by the statement. The SQL-99 standard and the Oracle system
allow the user to choose which of the options is to be used for each rule, whereas STAR- BURST uses statement-
level semantics only.

One of the difficulties that may have limited the widespread use of active rules, in spite of their potential to simplify
database and software development, is that there are no easy-to-use techniques for designing, writing, and verifying
rules. For exam- ple, it is quite difficult to verify that a set of rules is consistent, meaning that two or more rules in
the set do not contradict one another. It is also difficult to guarantee termination of a set of rules under all
circumstances.

An example to illustrate the termination problem for active rules.

R1: CREATE TRIGGER T1

AFTER INSERT ON TABLE1 FOR EACH ROW

UPDATE TABLE2

SET Attribute1 = ... ;

R2: CREATE TRIGGER T2

AFTER UPDATE OF Attribute1 ON TABLE2 FOR EACH ROW

INSERT INTO TABLE1 VALUES ( ... );

To illustrate the termination problem briefly, consider the rules in Figure 26.4. Here, rule R1 is triggered by an
INSERT event on TABLE1 and its action includes an update event on Attribute1 of TABLE2 . However, rule R2
’s triggering event is an UPDATE event on Attribute1 of TABLE2 , and its action includes an INSERT event on
TABLE1 . In this example, it is easy to see that these two rules can trigger one another indefinitely, leading to
non- termination. However, if dozens of rules are written, it is very difficult to determine whether termination is
guaranteed or not.

If active rules are to reach their potential, it is necessary to develop tools for the design, debugging, and monitoring
of active rules that can help users design and debug their rules.

Microsoft Open Database Connectivity (ODBC)

What is ODBC?
Open Database Connectivity (ODBC) is an open standard Application Programming Interface (API) for accessing
a database. In 1992, Microsoft partners with Simba to build the world’s first ODBC driver; [Link], and
standards-based data access was born. By using ODBC statements in a program, you can access files in a number
of different common databases. In addition to the ODBC software, a separate module or driver is needed for each
database to be accessed.

ODBC History
Microsoft introduced the ODBC standard in 1992. ODBC was a standard designed to unify access to SQL
databases. Following the success of ODBC, Microsoft introduced OLE (object linking and embedding)DB which
was to be a broader data access standard. OLE DB was a data access standard that went beyond just SQL databases
and extended to any data source that could deliver data in tabular format. Microsoft’s plan was that OLE DB would
supplant ODBC as the most common data access standard. More recently, Microsoft introduced the ADO data
access standard. ADO was supposed to go further than OLE DB, in that ADO was more object oriented. However,
even with Microsoft’s very significant attempts to replace the ODBC standard with what were felt to be “better”
alternatives, ODBC has continued to be the de facto data access standard for SQL data sources. In fact, today the
ODBC standard is more common than OLE DB and ADO(Active x Data Object) because ODBC is widely
supported (including support from Oracle and IBM) and is a cross platform data access standard. Today, the most
common data access standards for SQL data sources continue to be ODBC and JDBC, and it is very likely that
standards like OLE DB and ADO will fade away over time.

ODBC Overview

ODBC has become the de-facto standard for standards-based data access in both relational and non-relational
database management systems (DBMS). Simba worked closely with Microsoft to co-develop the ODBC standard
back in the early 90’s. The ODBC standard enables maximum interoperability thereby enabling application
developers to write a single application to access data sources from different vendors. ODBC is based on the Call-
Level Interface (CLI) specifications from Open Group and ISO/IEC (International Organization for
Standardization/International Electrotechnical Commission)for database APIs and uses Structured Query
Language (SQL) as its database access language.

ODBC Architecture

The architecture of ODBC-based data connectivity is as follows:


ODBC Enabled Application

This is any ODBC compliant application, such as Microsoft Excel, Tableau, Crystal Reports, Microsoft Power BI,
or similar application (Spreadsheet, Word processor, Data Access & Retrievable Tool, etc.). The ODBC enabled
application performs processing by passing SQL Statements to and receiving results from the ODBC Driver
Manager.

ODBC Driver Manager


The ODBC Driver Manager loads and unloads ODBC drivers on behalf of an application. The Windows platform
comes with a default Driver Manager, while non-windows platforms have the choice to use an open source ODBC
Driver Manager like unixODBC and iODBC(independent ODBC). The ODBC Driver Manager processes ODBC
function calls, or passes them to an ODBC driver and resolves ODBC version conflicts.

ODBC Driver

The ODBC driver processes ODBC function calls, submits SQL requests to a specific data source and returns
results to the application. The ODBC driver may also modify an application’s request so that the request conforms
to syntax supported by the associated database. A framework to easily build an ODBC drivers is available from
Simba Technologies, as are ODBC drivers for many data sources, such as Salesforce, MongoDB, Spark and more.
The Simba SDK is available in C++, Java and C# and supports building drivers for Windows, OSX and many *Nix
distributions.

Data Source

A data source is simply the source of the data. It can be a file, a particular database on a DBMS, or even a live data
feed. The data might be located on the same computer as the program, or on another computer somewhere on a
network.

You might also like