Distributed Database Overview and Types
Distributed Database Overview and Types
Architectural Models
Some of the common architectural models are −
Fully Replicated
In this design alternative, at each site, one copy of all the database tables is stored. Since, each site has its own
copy of the entire database, queries are very fast requiring negligible communication cost. On the contrary, the
massive redundancy in data requires huge cost during update operations. Hence, this is suitable for systems where
a large number of queries is required to be handled whereas the number of database updates is low.
Partially Replicated
Copies of tables or portions of tables are stored at different sites. The distribution of the tables is done in
accordance to the frequency of access. This takes into consideration the fact that the frequency of accessing the
tables vary considerably from site to site. The number of copies of the tables (or portions) depends on how
frequently the access queries execute and the site which generate the access queries.
Fragmented
In this design, a table is divided into two or more pieces referred to as fragments or partitions, and each fragment
can be stored at different sites. This considers the fact that it seldom happens that all data stored in a table is
required at a given site. Moreover, fragmentation increases parallelism and provides better disaster recovery. Here,
there is only one copy of each fragment in the system, i.e. no redundant data.
The three fragmentation techniques are −
Vertical fragmentation
Horizontal fragmentation
Hybrid fragmentation
Mixed Distribution
This is a combination of fragmentation and partial replications. Here, the tables are initially fragmented in any
form (horizontal or vertical), and then these fragments are partially replicated across the different sites according
to the frequency of accessing the fragments.
A distributed database is basically a database that is not limited to one system, it is spread over different
sites, i.e, on multiple computers or over a network of computers. A distributed database system is located
on various sites that don’t share physical components. This may be required when a particular database
needs to be accessed by various users globally. It needs to be managed such that for the users it looks
like one single database.
Distributed Data Storage :
There are 2 ways in which data can be stored on different sites. These are:
[Link] –
In this approach, the entire relation is stored redundantly at 2 or more sites. If the entire database is
available at all sites, it is a fully redundant database. Hence, in replication, systems maintain copies of
data.
This is advantageous as it increases the availability of data at different sites. Also, now query requests
can be processed in parallel.
However, it has certain disadvantages as well. Data needs to be constantly updated. Any change made
at one site needs to be recorded at every site that relation is stored or else it may lead to inconsistency.
This is a lot of overhead. Also, concurrency control becomes way more complex as concurrent access
now needs to be checked over a number of sites.
[Link] –
In this approach, the relations are fragmented (i.e., they’re divided into smaller parts) and each of the
fragments is stored in different sites where they’re required. It must be made sure that the fragments are
such that they can be used to reconstruct the original relation (i.e, there isn’t any loss of data).
Fragmentation is advantageous as it doesn’t create copies of data, consistency is not a problem.
Flat Transaction
Nested Transaction
Concurrent Execution of the Sub-transactions is done which are at the same level – in the nested
transaction [Link], in the above diagram, T1 and T2 invoke objects on different servers and
hence they can run in parallel and are therefore concurrent.
T1.1, T1.2, T2.1, and T2.2 are four sub-transactions. These sub-transactions can also run in parallel.
Consider a distributed transaction (T) in which a customer transfers :
Rs. 105 from account A to account C and
Subsequently, Rs. 205 from account B to account D.
It can be viewed/ thought of as :
Transaction T :
Start
Transfer Rs 105 from A to C :
Deduct Rs 105 from A(withdraw from A) & Add Rs 105 to C(depopsit to C)
Transfer Rs 205 from B to D :
Deduct Rs 205 from B (withdraw from B)& Add Rs 205 to D(depopsit to D)
End
Assuming :
1. Account A is on server X
2. Account B is on server Y,and
3. Accounts C and D are on server Z.
The transaction T involves four requests – 2 for deposits and 2 for withdrawals. Now they can be treated
as sub transactions (T1, T2, T3, T4) of the transaction T.
As shown in the figure below, transaction T is designed as a set of four nested transactions : T1, T2, T3
and T4.
Advantage:
The performance is higher than a single transaction in which four operations are invoked one after the
other in sequence.
Nested Transaction
The following distributed transaction executed by SCOTT updates the local SALES database, the
remote HQ database, and the remote MAINT database:
UPDATE [Link]@[Link]
SET loc = 'REDWOOD SHORES'
WHERE deptno = 10;
UPDATE [Link]
SET deptno = 11
WHERE deptno = 10;
UPDATE [Link]@[Link]
SET room = 1225
WHERE room = 1163;
COMMIT;
Note:
If all statements of a transaction reference only a single remote node, then the transaction is
remote, not distributed.
The following list describes DML and DDL operations supported in a distributed transaction:
CREATE TABLE AS SELECT
DELETE
INSERT (default and direct load)
LOCK TABLE
SELECT
SELECT FOR UPDATE
You can execute DML and DDL statements in parallel, and INSERT direct load statements serially,
but note the following restrictions:
COMMIT
ROLLBACK
SAVEPOINT
Savepoint in SQL
o Savepoint is a command in SQL that is used with the rollback command.
o It is a command in Transaction Control Language that is used to mark the transaction in a table.
o Consider you are making a very long table, and you want to roll back only to a certain position in a table
then; this can be achieved using the savepoint.
o If you made a transaction in a table, you could mark the transaction as a certain name, and later on, if you
want to roll back to that point, you can do it easily by using the transaction's name.
o Savepoint is helpful when we want to roll back only a small part of a table and not the whole table. In
simple words, we can say savepoint is a bookmark in SQL.
Oracle8i defines a session tree of all nodes participating in a distributed transaction. A session tree is
a hierarchical model of the transaction that describes the relationships among the nodes that are
involved. Each node plays a role in the transaction. For example, the node that originates the
transaction is the global coordinator, and the node in charge of initiating a commit or rollback is called
the commit point site.
Two-Phase Commit Mechanism
Unlike a transaction on a local database, a distributed transaction involves altering data on multiple
databases. Consequently, distributed transaction processing is more complicated, because Oracle must
coordinate the committing or rolling back of the changes in a transaction as a self-contained unit. In
other words, the entire transaction commits, or the entire transactions rolls back.
Oracle ensures the integrity of data in a distributed transaction using the two-phase commit
mechanism. In the prepare phase, the initiating node in the transaction asks the other participating
nodes to promise to commit or roll back the transaction. During the commit phase, the initiating node
asks all participating nodes to commit the transaction; if this outcome is not possible, then all nodes
are asked to roll back.
All nodes participating in the session tree of a distributed transaction assume one or more of the
following roles:
Clients
A node acts as a client when it references information from another node's database. The referenced
node is a database server. In Figure 4-2, the node SALES is a client of the nodes that host the
WAREHOUSE and FINANCE databases.
Database Servers
A database server is a node that hosts a database from which a client requests data.
In Figure 4-2, an application at the SALES node initiates a distributed transaction that accesses data
from the WAREHOUSE and FINANCE nodes. Therefore, [Link] has the role of client
node, and WAREHOUSE and FINANCE are both database servers. In this example, SALES is a
database server and a client because the application also requests a change to the SALES database.
Local Coordinators
A node that must reference data on other nodes to complete its part in the distributed transaction is
called a local coordinator. In Figure 4-2, SALES is a local coordinator because it coordinates the
nodes it directly references: WAREHOUSE and FINANCE. SALES also happens to be the global
coordinator because it coordinates all the nodes involved in the transaction.
A local coordinator is responsible for coordinating the transaction among the nodes it communicates
directly with by:
Receiving and relaying transaction status information to and from those nodes.
Passing queries to those nodes.
Receiving queries from those nodes and passing them on to other nodes.
Returning the results of queries to the nodes that initiated them.
Global Coordinator
The node where the distributed transaction originates is called the global coordinator. The database
application issuing the distributed transaction is directly connected to the node acting as the global
coordinator. For example, in Figure 4-2, the transaction issued at the node SALES references
information from the database servers WAREHOUSE and FINANCE. Therefore,
[Link] is the global coordinator of this distributed transaction.
The global coordinator becomes the parent or root of the session tree. The global coordinator performs
the following operations during a distributed transaction:
Sends all of the distributed transaction's SQL statements, remote procedure calls, etc. to the
directly referenced nodes, thus forming the session tree.
Instructs all directly referenced nodes other than the commit point site to prepare the
transaction.
Instructs the commit point site to initiate the global commit of the transaction if all nodes
prepare successfully.
Instructs all nodes to initiate a global rollback of the transaction if there is an abort response.
Commit Point Site
The job of the commit point site is to initiate a commit or roll back operation as instructed by the
global coordinator. The system administrator always designates one node to be the commit point
site in the session tree by assigning all nodes a commit point strength. The node selected as commit
point site should be the node that stores the most critical data.
Figure 4-3 illustrates an example of distributed system, with SALES serving as the commit point site:
The commit point site is distinct from all other nodes involved in a distributed transaction in these
ways:
The commit point site never enters the prepared state. Consequently, if the commit point site
stores the most critical data, this data never remains in-doubt, even if a failure occurs. In failure
situations, failed nodes remain in a prepared state, holding necessary locks on data until in-
doubt transactions are resolved.
The commit point site commits before the other nodes involved in the transaction. In effect, the
outcome of a distributed transaction at the commit point site determines whether the transaction
at all nodes is committed or rolled back: the other nodes follow the lead of the commit point
site. The global coordinator ensures that all nodes complete the transaction in the same manner
as the commit point site.
A distributed transaction is considered committed after all non-commit point sites are prepared, and
the transaction has been actually committed at the commit point site. The online redo log at the commit
point site is updated as soon as the distributed transaction is committed at this node.
Because the commit point log contains a record of the commit, the transaction is considered committed
even though some participating nodes may still be only in the prepared state and the transaction not
yet actually committed at these nodes. In the same way, a distributed transaction is
considered not committed if the commit has not been logged at the commit point site.
Every database server must be assigned a commit point strength. If a database server is referenced in
a distributed transaction, the value of its commit point strength determines which role it plays in the
two-phase commit. Specifically, the commit point strength determines whether a given node is the
commit point site in the distributed transaction and thus commits before all of the other nodes. This
value is specified using the initialization parameter COMMIT_POINT_STRENGTH.
The commit point site, which is determined at the beginning of the prepare phase, is selected only
from the nodes participating in the transaction. The following sequence of events occurs:
1. Of the nodes directly referenced by the global coordinator, Oracle selects the node with the
highest commit point strength as the commit point site.
2. The initially-selected node determines if any of the nodes from which it has to obtain
information for this transaction has a higher commit point strength.
3. Either the node with the highest commit point strength directly referenced in the transaction or
one of its servers with a higher commit point strength becomes the commit point site.
4. After the final commit point site has been determined, the global coordinator sends prepare
responses to all nodes participating in the transaction.
Figure 4-4 shows in a sample session tree the commit point strengths of each node (in parentheses)
and shows the node chosen as the commit point site:
Figure 4-4 Commit Point Strengths and Determination of the Commit Point Site
The following conditions apply when determining the commit point site:
The commit mechanism has the following distinct phases, which Oracle performs automatically
whenever a user commits a distributed transaction:
prepare The initiating node, called the global coordinator, asks participating nodes other than the commit point
phase site to promise to commit or roll back the transaction, even if there is a failure. If any node cannot
prepare, the transaction is rolled back.
commit If all participants respond to the coordinator that they are prepared, then the coordinator asks the
phase commit point site to commit. After it commits, the coordinator asks all other nodes to commit the
transaction.
forget The global coordinator forgets about the transaction.
phase
Prepare Phase
Commit Phase
Forget Phase
Prepare Phase
The first phase in committing a distributed transaction is the prepare phase. In this phase, Oracle does
not actually commit or roll back the transaction. Instead, all nodes referenced in a distributed
transaction (except the commit point site, described in the "Commit Point Site") are told to prepare to
commit. By preparing, a node:
Records information in the online redo logs so that it can subsequently either commit or roll
back the transaction, regardless of intervening failures.
Places a distributed lock on modified tables, which prevents reads.
When a node responds to the global coordinator that it is prepared to commit, the prepared
node promises to either commit or roll back the transaction later--but does not make a unilateral
decision on whether to commit or roll back the transaction. The promise means that if an instance
failure occurs at this point, the node can use the redo records in the online log to recover the database
back to the prepare phase.
Note:
Queries that start after a node has prepared cannot access the associated locked data until
all phases complete.
prepared Data on the node has been modified by a statement in the distributed transaction, and the node has
successfully prepared.
read- No data on the node has been, or can be, modified (only queried), so no preparation is necessary.
only
abort The node cannot successfully prepare.
Prepared Response
When a node has successfully prepared, it issues a prepared message. The message indicates that the
node has records of the changes in the online log, so it is prepared either to commit or perform a
rollback. The message also guarantees that locks held for the transaction can survive a failure.
Read-Only Response
When a node is asked to prepare, and the SQL statements affecting the database do not change the
node's data, the node responds with a read-only message. The message indicates that the node will not
participate in the commit phase.
There are three cases in which all or part of a distributed transaction is read-only:
Partially read- Any of the following occurs: The read-only nodes recognize their status when asked
only to prepare. They give their local coordinators a read-
Only queries are issued at one only response. Thus, the commit phase completes faster
or more nodes. because Oracle eliminates read-only nodes from
No data is changed. subsequent processing.
Case Conditions Consequence
Completely read- All of following occur: All nodes recognize that they are read-only during
only with prepare phase, so no commit phase is required. The
prepare phase No data changes. global coordinator, not knowing whether all nodes are
Transaction is not started read-only, must still perform the prepare phase.
with SET TRANSACTION
READ ONLY statement.
Completely read- All of following occur: Only queries are allowed in the transaction, so global
only without coordinator does not have to perform two-phase
two-phase No data changes. commit. Changes by other transactions do not degrade
commit Transaction is started with global transaction-level read consistency because of
SET TRANSACTION READ global SCN coordination among nodes. The transaction
ONLY statement. does not use rollback segments.
Note that if a distributed transaction is set to read-only, then it does not use rollback segments. If many
users connect to the database and their transactions are not set to READ ONLY, then they allocate
rollback space even if they are only performing queries.
Abort Response
1. Releases resources currently held by the transaction and rolls back the local portion of the
transaction.
2. Responds to the node that referenced it in the distributed transaction with an abort message.
These actions then propagate to the other nodes involved in the distributed transaction so that they can
roll back the transaction and guarantee the integrity of the data in the global database. This response
enforces the primary rule of a distributed transaction: all nodes involved in the transaction either all
commit or all roll back the transaction at the same logical time.
To complete the prepare phase, each node excluding the commit point site performs the following
steps:
1. The node requests that its descendants, that is, the nodes subsequently referenced, prepare to
commit.
2. The node checks to see whether the transaction changes data on itself or its descendants. If there
is no change to the data, then the node skips the remaining steps and returns a read-only
response
3. The node allocates the resources it needs to commit the transaction if data is changed.
4. The node saves redo records corresponding to changes made by the transaction to its online
redo log.
5. The node guarantees that locks held for the transaction are able to survive a failure.
6. The node responds to the initiating node with a prepared response , if its attempt or the attempt
of one of its descendents to prepare was unsuccessful, with an abort response.
These actions guarantee that the node can subsequently commit or roll back the transaction on the
node. The prepared nodes then wait until a COMMIT or ROLLBACK request is received from the
global coordinator.
After the nodes are prepared, the distributed transaction is said to be in-doubt . It retains in-doubt
status until all changes are either committed or rolled back.
Commit Phase
The second phase in committing a distributed transaction is the commit phase. Before this phase
occurs, all nodes other than the commit point site referenced in the distributed transaction have
guaranteed that they are prepared, that is, they have the necessary resources to commit the transaction.
3. The commit point site informs the global coordinator that it has committed.
4. The global and local coordinators send a message to all nodes instructing them to commit the
transaction.
5. At each node, Oracle8i commits the local portion of the distributed transaction and releases
locks.
6. At each node, Oracle8i records an additional redo entry in the local redo log, indicating that the
transaction has committed.
7. The participating nodes notify the global coordinator that they have committed.
When the commit phase is complete, the data on all nodes of the distributed system is consistent with
one another.
Guaranteeing Global Database Consistency
Each committed transaction has an associated system change number (SCN) to uniquely identify the
changes made by the SQL statements within that transaction. The SCN functions as an internal Oracle
timestamp that uniquely identifies a committed version of the database.
In a distributed system, the SCNs of communicating nodes are coordinated when all of the following
actions occur:
A connection occurs using the path described by one or more database links.
A distributed SQL statement executes.
A distributed transaction commits.
Among other benefits, the coordination of SCNs among the nodes of a distributed system ensures
global read-consistency at both the statement and transaction level. If necessary, global time-based
recovery can also be completed.
During the prepare phase, Oracle8i determines the highest SCN at all nodes involved in the
transaction. The transaction then commits with the high SCN at the commit point site. The commit
SCN is then sent to all prepared nodes with the commit decision.
Forget Phase
After the participating nodes notify the commit point site that they have committed, the commit point
site can forget about the transaction. The following steps occur:
1. After receiving notice from the global coordinator that all nodes have committed, the commit
point site erases status information about this transaction.
2. The commit point site informs the global coordinator that it has erased the status information.
3. The global coordinator erases its own information about the transaction.
In-Doubt Transactions
The two-phase commit mechanism ensures that all nodes either commit or perform a rollback together.
What happens if any of the three phases fails because of a system or network error? The transaction
becomes in-doubt.
The RECO (recovery)process automatically resolves in-doubt transactions when the machine,
network, or software problem is resolved. Until RECO can resolve the transaction, the data is locked
for both reads and writes. Oracle blocks reads because it cannot determine which version of the data
to display for a query.
This section contains the following topics:
In the majority of cases, Oracle resolves the in-doubt transaction automatically. Assume that there are
two nodes, LOCAL and REMOTE, in the following scenarios. The local node is the commit point
site. User SCOTT connects to LOCAL and executes and commits a distributed transaction that updates
LOCAL and REMOTE.
Figure 4-5 illustrates the sequence of events when there is a failure during the prepare phase of a
distributed transaction:
2. The global coordinator, which in this example is also the commit point site, requests all
databases other than the commit point site to promise to commit or roll back when told to do
so.
3. The REMOTE database crashes before issuing the prepare response back to LOCAL.
4. The transaction is ultimately rolled back on each database by the RECO process when the
remote site is restored.
Figure 4-5 illustrates the sequence of events when there is a failure during the commit phase of a
distributed transaction:
Figure 4-6 Failure During Prepare Phase
2. The global coordinator, which in this case is also the commit point site, requests all databases
other than the commit point site to promise to commit or roll back when told to do so.
3. The commit point site receives a prepare message from REMOTE saying that it will commit.
4. The commit point site commits the transaction locally, then sends a commit message to
REMOTE asking it to commit.
5. The REMOTE database receives the commit message, but cannot respond because of a network
failure.
6. The transaction is ultimately committed on the remote database by the RECO process after the
network is restored.
You should only need to resolve an in-doubt transaction in the following cases:
Resolution of in-doubt transactions can be complicated. The procedure requires that you do the
following:
A system change number (SCN) is an internal timestamp for a committed version of the database. The
Oracle database server uses the SCN clock value to guarantee transaction consistency. For example,
when a user commits a transaction, Oracle records an SCN for this commit in the online redo log.
Oracle uses SCNs to coordinate distributed transactions among different databases. For example,
Oracle uses SCNs in the following way:
2. The distributed transaction commits with the highest global SCN among all the databases
involved.
3. The commit global SCN is sent to all databases involved in the transaction.
SCNs are important for distributed transactions because they function as a synchronized commit
timestamp of a transaction--even if the transaction fails. If a transaction becomes in-doubt, an
administrator can use this SCN to coordinate changes made to the global database. The global SCN
for the transaction commit can also be used to identify the transaction later, for example, in distributed
recovery.
At the Sales department, a salesperson uses SQL*Plus to enter a sales order and then commit it. The
application issues a number of SQL statements to enter the order into the SALES database and update
the inventory in the WAREHOUSE database:
CONNECT scott/tiger@[Link] ...;
INSERT INTO orders ...;
UPDATE inventory@[Link] ...;
INSERT INTO orders ...;
UPDATE inventory@[Link] ...;
COMMIT;
These SQL statements are part of a single distributed transaction, guaranteeing that all issued SQL
statements succeed or fail as a unit. Treating the statements as a unit prevents the possibility of an
order being placed and then inventory not being updated to reflect the order. In effect, the transaction
guarantees the consistency of data in the global database.
As each of the SQL statements in the transaction executes, the session tree is defined, as shown
in Figure 4-7.
An order entry application running with the SALES database initiates the transaction.
Therefore, [Link] is the global coordinator for the distributed transaction.
The order entry application inserts a new sales record into the SALES database and updates the
inventory at the warehouse. Therefore, the nodes [Link] and
[Link] are both database servers.
Because [Link] updates the inventory, it is a client of
[Link].
This stage completes the definition of the session tree for this distributed transaction. Each node in the
tree has acquired the necessary data locks to execute the SQL statements that reference local data.
These locks remain even after the SQL statements have been executed until the two-phase commit is
completed.
Oracle determines the commit point site immediately following the COMMIT statement.
[Link], the global coordinator, is determined to be the commit point site, as shown
in Figure 4-8.
Figure 4-8 Determining the Commit Point Site
1. After Oracle determines the commit point site, the global coordinator sends the prepare message
to all directly referenced nodes of the session tree, excluding the commit point site. In this
example, [Link] is the only node asked to prepare.
2. [Link] tries to prepare. If a node can guarantee that it can commit the
locally dependent part of the transaction and can record the commit information in its local redo
log, then the node can successfully prepare. In this example, only [Link]
receives a prepare message because [Link] is the commit point site.
As each node prepares, it sends a message back to the node that asked it to prepare. Depending on the
responses, one of the following can happen:
If any of the nodes asked to prepare respond with an abort message to the global coordinator,
then the global coordinator tells all nodes to roll back the transaction, and the operation is
completed.
If all nodes asked to prepare respond with a prepared or a read-only message to the global
coordinator, that is, they have successfully prepared, then the global coordinator asks the
commit point site to commit the transaction.
The committing of the transaction by the commit point site involves the following steps:
2. The commit point site now commits the transaction locally and records this fact in its local redo
log.
Even if [Link] has not yet committed, the outcome of this transaction is pre-
determined. In other words, the transaction will be committed at all nodes even if a given node's ability
to commit is delayed.
1. The commit point site tells the global coordinator that the transaction has committed. Because
the commit point site and global coordinator are the same node in this example, no operation is
required. The commit point site knows that the transaction is committed because it recorded
this fact in its online log.
2. The global coordinator confirms that the transaction has been committed on all other nodes
involved in the distributed transaction.
The committing of the transaction by all the nodes in the transaction involves the following steps:
1. After the global coordinator has been informed of the commit at the commit point site, it tells
all other directly referenced nodes to commit.
2. In turn, any local coordinators instruct their servers to commit, and so on.
3. Each node, including the global coordinator, commits the transaction and records appropriate
redo log entries locally. As each node commits, the resource locks that were being held locally
for that transaction are released.
In Figure 4-10, [Link], which is both the commit point site and the global coordinator,
has already committed the transaction locally. SALES now instructs [Link] to
commit the transaction.
Figure 4-10 Instructing Nodes to Commit
Stage 7: Global Coordinator and Commit Point Site Complete the Commit
The completion of the commit of the transaction occurs in the following steps:
1. After all referenced nodes and the global coordinator have committed the transaction, the global
coordinator informs the commit point site of this fact.
2. The commit point site, which has been waiting for this message, erases the status information
about this distributed transaction.
3. The commit point site informs the global coordinator that it is finished. In other words, the
commit point site forgets about committing the distributed transaction. This action is
permissible because all nodes involved in the two-phase commit have committed the
transaction successfully, so they will never have to determine its status in the future.
4. The global coordinator finalizes the transaction by forgetting about the transaction itself.
After the completion of the COMMIT phase, the distributed transaction is itself complete. The
steps described above are accomplished automatically and in a fraction of a second.
EID- 10 bytes
SALARY- 20 bytes
DID- 10 bytes
Name- 20 bytes
Total records- 1000
Record Size- 60 bytes
Site2: DEPARTMENT
DID DNAME
DID- 10 bytes
DName- 20 bytes
Total records- 50
Record Size- 30 bytes
Example : Find the name of employees and their department names. Also, find the amount of data
transfer to execute this query when the query is submitted to Site 3.
Answer : Considering the query is submitted at site 3 and neither of the two relations that is an
EMPLOYEE and the DEPARTMENT not available at site 3. So, to execute this query, we have three
strategies:
1. Transfer both the tables that is EMPLOYEE and DEPARTMENT at SITE 3 then join the
tables there. The total cost in this is 1000 * 60 + 50 * 30 = 60,000 + 1500 = 61500 bytes.
2. Transfer the table EMPLOYEE to SITE 2, join the table at SITE 2 and then transfer the result
at SITE 3. The total cost in this is 60 * 1000 + 60 * 1000 = 120000 bytes since we have to
transfer 1000 tuples having NAME and DNAME from site 1,
3. Transfer the table DEPARTMENT to SITE 1, join the table at SITE 2 join the table at site1
and then transfer the result at site3. The total cost is 30 * 50 + 60 * 1000 = 61500 bytes since
we have to transfer 1000 tuples having NAME and DNAME from site 1 to site 3 that is 60
bytes each.
Now, If the Optimisation criteria are to reduce the amount of data transfer, we can choose either 1 or 3
strategies from the above.
2. Using Semi join in Distributed Query processing :
The semi-join operation is used in distributed query processing to reduce the number of tuples in a table
before transmitting it to another site. This reduction in the number of tuples reduces the number and the
total size of the transmission that ultimately reducing the total cost of data transfer. Let’s say that we
have two tables R1, R2 on Site S1, and S2. Now, we will forward the joining column of one table say
R1 to the site where the other table say R2 is located. This column is joined with R2 at that site. The
decision whether to reduce R1 or R2 can only be made after comparing the advantages of reducing R1
with that of reducing R2. Thus, semi-join is a well-organized solution to reduce the transfer of data in
distributed query processing.
Example : Find the amount of data transferred to execute the same query given in the above
example using semi-join operation.
Answer : The following strategy can be used to execute the query.
1. Select all (or Project) the attributes of the EMPLOYEE table at site 1 and then transfer them
to site 3. For this, we will transfer NAME, DID(EMPLOYEE) and the size is 25 * 1000 =
25000 bytes.
2. Transfer the table DEPARTMENT to site 3 and join the projected attributes of EMPLOYEE
with this table. The size of the DEPARTMENT table is 25 * 50 = 1250
Applying the above scheme, the amount of data transferred to execute the query will be 25000 + 1250
= 26250 bytes.
One-phase commit
Two-phase commit
Three-phase commit
o The slaves apply the transaction and send a “Commit ACK” message to the controlling
site.
o When the controlling site receives “Commit ACK” message from all the slaves, it considers
the transaction as committed.
After the controlling site has received the first “Not Ready” message from any slave −
o The controlling site sends a “Global Abort” message to the slaves.
o The slaves abort the transaction and send a “Abort ACK” message to the controlling site.
o When the controlling site receives “Abort ACK” message from all the slaves, it considers
the transaction as aborted.
Concurrency controlling techniques ensure that multiple transactions are executed simultaneously
while maintaining the ACID properties of the transactions and serializability in the schedules.
In this chapter, we will study the various approaches for concurrency control.
Locking Based Concurrency Control Protocols
Locking-based concurrency control protocols use the concept of locking data items. A lock is a variable
associated with a data item that determines whether read/write operations can be performed on that
data item. Generally, a lock compatibility matrix is used which states whether a data item can be locked
by two transactions at the same time.
Locking-based concurrency control systems can use either one-phase or two-phase locking protocols.
Conflict Graphs
Another method is to create conflict graphs. For this transaction classes are defined. A transaction class
contains two set of data items called read set and write set. A transaction belongs to a particular class
if the transaction’s read set is a subset of the class’ read set and the transaction’s write set is a subset
of the class’ write set. In the read phase, each transaction issues its read requests for the data items in
its read set. In the write phase, each transaction issues its write requests.
A conflict graph is created for the classes to which active transactions belong. This contains a set of
vertical, horizontal, and diagonal edges. A vertical edge connects two nodes within a class and denotes
conflicts within the class. A horizontal edge connects two nodes across two classes and denotes a
write-write conflict among different classes. A diagonal edge connects two nodes across two classes
and denotes a write-read or a read-write conflict among two classes.
The conflict graphs are analyzed to ascertain whether two transactions within the same class or across
two different classes can be run in parallel.
Checkpointing
Checkpoint is a point of time at which a record is written onto the database from the buffers. As a
consequence, in case of a system crash, the recovery manager does not have to redo the transactions
that have been committed before checkpoint. Periodical checkpointing shortens the recovery process.
The two types of checkpointing techniques are −
Consistent checkpointing
Fuzzy checkpointing
Consistent Checkpointing
Consistent checkpointing creates a consistent image of the database at checkpoint. During recovery,
only those transactions which are on the right side of the last checkpoint are undone or redone. The
transactions to the left side of the last consistent checkpoint are already committed and needn’t be
processed again. The actions taken for checkpointing are −
Example of Checkpointing
Let us consider that in system the time of checkpointing is tcheck and the time of system crash is tfail.
Let there be four transactions T a, Tb, Tc and Td such that −
Ta commits before checkpoint.
Tb starts before checkpoint and commits before system crash.
Tc starts after checkpoint and commits before system crash.
Td starts after checkpoint and was active at the time of system crash.
The situation is depicted in the following diagram −
Events with significance to the system are identified within an event-driven program. An event could
be some user action, a transmission of sensor data or a message from some other program or system,
among an almost infinite number of other possibilities. The ECA rule specifies how events drive the
desired program responses. When an event with significance for the system occurs, the conditions are
checked for or evaluated; if the conditions exist or meet pre-established criteria, the appropriate action
is executed.
ECA rules originated in active databases and have since been used in areas
including personalization, big data management and business process automation. The model is being
explored for M2M (machine-to-machine) networking, Internet of Things (IoT), cognitive
computing and the Semantic Web.
Eg : ATM process
Active Databases
Active Database is a database consisting of set of triggers. These databases are very difficult to be
maintained because of the complexity that arises in understanding the effect of these triggers. In such
database, DBMS initially verifies whether the particular trigger specified in the statement that modifies
the database) is activated or not, prior to executing the statement.
If the trigger is active then DBMS executes the condition part and then executes the action part only if
the specified condition is evaluated to true. It is possible to activate more than one trigger within a single
statement.
In such situation, DBMS processes each of the trigger randomly. The execution of an action part of a
trigger may either activate other triggers or the same trigger that Initialized this action. Such types of
trigger that activates itself is called as ‘recursive trigger’. The DBMS executes such chains of trigger in
some pre-defined manner but it effects the concept of understanding.
A trigger is a procedure which is automatically invoked by the DBMS in response to changes to the
database, and is specified by the database administrator (DBA). A database with a set of associated
triggers is generally called an active database.
Parts of trigger
A triggers description contains three parts, which are as follows −
Condition − A query that is run when the trigger is activated is called as a condition.
Action −A procedure which is executed when the trigger is activated and its condition is true.
Use of trigger
Triggers may be used for any of the following reasons −
To implement any complex business rule, that cannot be implemented using integrity constraints.
Triggers will be used to audit the process. For example, to keep track of changes made to a table.
Trigger is used to perform automatic action when another concerned action takes place.
Types of triggers
The different types of triggers are explained below −
Statement level trigger − It is fired only once for DML statement irrespective of number of rows
affected by statement. Statement-level triggers are the default type of trigger.
Before-triggers − At the time of defining a trigger we can specify whether the trigger is to be fired
before a command like INSERT, DELETE, or UPDATE is executed or after the command is
executed. Before triggers are automatically used to check the validity of data before the action is
performed. For instance, we can use before trigger to prevent deletion of rows if deletion should
not be allowed in a given case.
After-triggers − It is used after the triggering action is completed. For example, if the trigger is
associated with the INSERT command then it is fired after the row is inserted into the table.
Row-level triggers − It is fired for each row that is affected by DML command. For example, if
an UPDATE command updates 150 rows then a row-level trigger is fired 150 times whereas a
statement-level trigger is fired only for once.
The second issue concerns whether the triggered action should be executed before, after, instead of, or concurrently
with the triggering event. A before trigger executes the trigger before executing the event that caused the trigger.
It can be used in applications such as checking for constraint violations. An after trigger executes the trig- ger after
executing the event, and it can be used in applications such as maintaining derived data and monitoring for specific
events and conditions. An instead of trig- ger executes the trigger instead of executing the event, and it can be used
in applica- tions such as executing corresponding updates on base relations in response to an event that is an update
of a view.
A related issue is whether the action being executed should be considered as a separate transaction or whether it
should be part of the same transaction that triggered the rule. We will try to categorize the various options. It is
important to note that not all options may be available for a particular active database system. In fact, most com-
mercial systems are limited to one or two of the options that we will now discuss.
Let us assume that the triggering event occurs as part of a transaction execution. We should first consider the
various options for how the triggering event is related to the evaluation of the rule’s condition. The rule condition
evaluation is also known as rule consideration , since the action is to be executed only after considering whether
the condition evaluates to true or false. There are three main possibilities for rule consideration:
1. Immediate consideration. The condition is evaluated as part of the same transaction as the triggering event, and
is evaluated immediately. This case
2. Deferred consideration. The condition is evaluated at the end of the trans- action that included the triggering
event. In this case, there could be many
3. Detached consideration. The condition is evaluated as a separate transac- tion, spawned from the triggering
transaction.
The next set of options concerns the relationship between evaluating the rule condi- tion and executing the rule
action. Here, again, three options are possible: immediate , deferred, or detached execution. Most active systems
use the first
option. That is, as soon as the condition is evaluated, if it returns true, the action is immediately executed.
The Oracle system uses the immediate consideration model, but it allows the user to specify for each rule whether
the before or after option is to be used with immediate condition evaluation. It also uses the immediate execution
model. The STARBURST system uses the deferred consideration option, meaning that all rules triggered by a
transaction wait until the triggering transaction reaches its end and issues its COMMIT WORK command before
the rule conditions are evaluated.
Another issue concerning active database rules is the distinction between row-level rules and statement-level rules.
Because SQL update statements (which act as trig- gering events) can specify a set of tuples, one has to distinguish
between whether the rule should be considered once for the whole statement or whether it should be con-sidered
separately for each row (that is, tuple) affected by the statement. The SQL-99 standard and the Oracle system
allow the user to choose which of the options is to be used for each rule, whereas STAR- BURST uses statement-
level semantics only.
One of the difficulties that may have limited the widespread use of active rules, in spite of their potential to simplify
database and software development, is that there are no easy-to-use techniques for designing, writing, and verifying
rules. For exam- ple, it is quite difficult to verify that a set of rules is consistent, meaning that two or more rules in
the set do not contradict one another. It is also difficult to guarantee termination of a set of rules under all
circumstances.
UPDATE TABLE2
To illustrate the termination problem briefly, consider the rules in Figure 26.4. Here, rule R1 is triggered by an
INSERT event on TABLE1 and its action includes an update event on Attribute1 of TABLE2 . However, rule R2
’s triggering event is an UPDATE event on Attribute1 of TABLE2 , and its action includes an INSERT event on
TABLE1 . In this example, it is easy to see that these two rules can trigger one another indefinitely, leading to
non- termination. However, if dozens of rules are written, it is very difficult to determine whether termination is
guaranteed or not.
If active rules are to reach their potential, it is necessary to develop tools for the design, debugging, and monitoring
of active rules that can help users design and debug their rules.
What is ODBC?
Open Database Connectivity (ODBC) is an open standard Application Programming Interface (API) for accessing
a database. In 1992, Microsoft partners with Simba to build the world’s first ODBC driver; [Link], and
standards-based data access was born. By using ODBC statements in a program, you can access files in a number
of different common databases. In addition to the ODBC software, a separate module or driver is needed for each
database to be accessed.
ODBC History
Microsoft introduced the ODBC standard in 1992. ODBC was a standard designed to unify access to SQL
databases. Following the success of ODBC, Microsoft introduced OLE (object linking and embedding)DB which
was to be a broader data access standard. OLE DB was a data access standard that went beyond just SQL databases
and extended to any data source that could deliver data in tabular format. Microsoft’s plan was that OLE DB would
supplant ODBC as the most common data access standard. More recently, Microsoft introduced the ADO data
access standard. ADO was supposed to go further than OLE DB, in that ADO was more object oriented. However,
even with Microsoft’s very significant attempts to replace the ODBC standard with what were felt to be “better”
alternatives, ODBC has continued to be the de facto data access standard for SQL data sources. In fact, today the
ODBC standard is more common than OLE DB and ADO(Active x Data Object) because ODBC is widely
supported (including support from Oracle and IBM) and is a cross platform data access standard. Today, the most
common data access standards for SQL data sources continue to be ODBC and JDBC, and it is very likely that
standards like OLE DB and ADO will fade away over time.
ODBC Overview
ODBC has become the de-facto standard for standards-based data access in both relational and non-relational
database management systems (DBMS). Simba worked closely with Microsoft to co-develop the ODBC standard
back in the early 90’s. The ODBC standard enables maximum interoperability thereby enabling application
developers to write a single application to access data sources from different vendors. ODBC is based on the Call-
Level Interface (CLI) specifications from Open Group and ISO/IEC (International Organization for
Standardization/International Electrotechnical Commission)for database APIs and uses Structured Query
Language (SQL) as its database access language.
ODBC Architecture
This is any ODBC compliant application, such as Microsoft Excel, Tableau, Crystal Reports, Microsoft Power BI,
or similar application (Spreadsheet, Word processor, Data Access & Retrievable Tool, etc.). The ODBC enabled
application performs processing by passing SQL Statements to and receiving results from the ODBC Driver
Manager.
ODBC Driver
The ODBC driver processes ODBC function calls, submits SQL requests to a specific data source and returns
results to the application. The ODBC driver may also modify an application’s request so that the request conforms
to syntax supported by the associated database. A framework to easily build an ODBC drivers is available from
Simba Technologies, as are ODBC drivers for many data sources, such as Salesforce, MongoDB, Spark and more.
The Simba SDK is available in C++, Java and C# and supports building drivers for Windows, OSX and many *Nix
distributions.
Data Source
A data source is simply the source of the data. It can be a file, a particular database on a DBMS, or even a live data
feed. The data might be located on the same computer as the program, or on another computer somewhere on a
network.