F Edit Chapter 8

Distributed databases consist of multiple computers storing data independently, communicating through networks, and supporting local and global transactions. They can be homogeneous, with identical database systems across sites, or heterogeneous, with differing schemas and software. Key concepts include data replication, fragmentation, and the importance of transparency for users, along with challenges in transaction processing and workflow management.

CHAPTER - 8

DISTRIBUTED DATABASES

8.1 INTRODUCTION TO DISTRIBUTED DATABASES


In a distributed database system, the database is stored on several computers. The
computers in a distributed system communicate with one another through various
communication media, such as high-speed networks or telephone lines. They do not share
main memory or disk.

The computers in a distributed system are referred to as sites or nodes.

[Figure: Sites 1 to 5, each with its own database (DB), connected through a communication network]

Fig. 8.1 Architecture of Distributed Database System

• A distributed database system consists of loosely coupled sites that share no physical
  component.
• The database systems that run on each site are independent of each other.
• Transactions may access data at one or more sites.
Types of Transactions
A distributed database system supports two types of transactions.
(i) Local transaction: A transaction that accesses data only at the site where it was
initiated.
(ii) Global transaction: A transaction that either accesses data at a site other than the
one where it was initiated, or accesses data at several different sites.

Example: Consider a banking system consisting of four branches in four cities. Each branch
has its own computer, with a database of all the accounts maintained at that branch. There is
also one site that maintains information about all the branches of the bank.

Each branch maintains a relation account(Account_schema), where

Account_schema = (acc_no, branch_name, balance)

The site containing information about all the branches of the bank maintains the
relation branch(Branch_schema), where

Branch_schema = (branch_name, branch_city, assets)

There are also other relations maintained at the various sites.

Example of Local Transaction

Consider a transaction to add $50 to account number A102, which is at the Perryridge
branch. If the transaction is initiated at the Perryridge branch, it is a local transaction;
otherwise, it is a global transaction.

Example of Global Transaction

A transaction that transfers $50 from account number A102 to account number A201,
which is at the Valleyview branch, is a global transaction, since accounts at two
different sites are accessed.
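
The two definitions can be sketched as a small check. This is only an illustration: the function name classify_transaction and the ACCOUNT_SITE mapping are assumptions made for the example, not part of any database system.

```python
def classify_transaction(initiating_site, accessed_sites):
    """Return 'local' if every access is at the initiating site, else 'global'."""
    if set(accessed_sites) <= {initiating_site}:
        return "local"
    return "global"


# Hypothetical placement of accounts at branch sites (taken from the examples above).
ACCOUNT_SITE = {"A102": "Perryridge", "A201": "Valleyview"}

# Deposit into A102, initiated at Perryridge: touches only the initiating site.
deposit = classify_transaction("Perryridge", [ACCOUNT_SITE["A102"]])

# The same deposit initiated at Valleyview touches a remote site.
remote_deposit = classify_transaction("Valleyview", [ACCOUNT_SITE["A102"]])

# Transfer from A102 to A201 touches two different sites.
transfer = classify_transaction(
    "Perryridge", [ACCOUNT_SITE["A102"], ACCOUNT_SITE["A201"]]
)

print(deposit, remote_deposit, transfer)
```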

Types of Distributed Databases

They are classified as:

(a) Homogeneous Distributed Database

• All sites have identical database management system software, are aware of one
  another, and agree to cooperate in processing users' requests.
• Local sites surrender a portion of their autonomy in terms of their right to change
  schemas or database management system software.
• The software must also cooperate with other sites in exchanging information about
  transactions, to make transaction processing possible across multiple sites.
• The system appears to the user as a single system.


(b) Heterogeneous Distributed Databases

• Different sites may use different schemas and different database management
  system software.
• The sites may not be aware of one another, and they may provide only limited
  facilities for cooperation in transaction processing.
• The differences in schemas are often a major problem for query processing,
  while the differences in software are a major problem for transaction
  processing.

8.2 HOMOGENEOUS DISTRIBUTED DATABASES


Consider a relation r that is to be stored in the database. There are two approaches to
storing this relation in a distributed database.
Data Replication
The system maintains multiple copies of the data, stored at different sites, for faster
retrieval and fault tolerance.
Fragmentation
The system partitions the relation into several fragments, and stores each fragment at a
different site.
Replication and fragmentation can be combined: a relation can be partitioned into several
fragments, and there may be several replicas of each fragment.
8.2.1 Data Replication
If relation r is replicated, a copy of relation r is stored at two or more sites. In the
extreme case, a copy is stored at every site in the system, which is called full replication.
Advantages
• Availability: If one of the sites containing relation r fails, the relation can
  be found at another site. Thus, the system can continue to process queries
  involving r despite the failure of one site.
• Increased parallelism: Queries on r may be processed by several nodes in
  parallel.
Disadvantages
• Increased overhead on update: The system must ensure that all replicas of a
  relation r are consistent; otherwise, erroneous computation may result. Thus,
  whenever r is updated, the update must be propagated to all sites containing
  replicas. The result is increased overhead.
• Increased complexity of concurrency control: concurrent updates to distinct
  replicas can leave them inconsistent unless they are coordinated.
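
The trade-off between availability on reads and overhead on updates can be illustrated with a toy model. The class name ReplicatedRelation and the "write to all replicas, read from any live replica" policy are assumptions chosen for this sketch; real systems use more elaborate protocols.

```python
class ReplicatedRelation:
    """Toy model: one full copy of relation r per site (hypothetical names)."""

    def __init__(self, sites):
        self.replicas = {site: {} for site in sites}  # site -> {key: value}
        self.failed = set()                           # sites currently down

    def write(self, key, value):
        """Propagate the update to every replica; return how many copies were written."""
        if self.failed:
            # Assumed policy: refuse the update rather than let replicas diverge.
            raise RuntimeError("cannot update all replicas while a site is down")
        for copy in self.replicas.values():
            copy[key] = value
        return len(self.replicas)

    def read(self, key):
        """Read from the first site that is up; any replica will do."""
        for site, copy in self.replicas.items():
            if site not in self.failed:
                return copy[key]
        raise RuntimeError("no replica available")


r = ReplicatedRelation(["Site1", "Site2", "Site3"])
copies_written = r.write("A102", 500)   # one logical update, three physical writes
r.failed.add("Site1")                   # simulate the failure of one site
balance = r.read("A102")                # still answered, by another replica
```

A single update costs one write per replica, which is the update overhead described above, while a read succeeds as long as at least one replica is reachable.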
8.2.2 Data Fragmentation
If relation ‘r’ is fragmented, ‘r’ is divided into a number of fragments r1, r2, ..., rn.
These fragments contain sufficient information to allow reconstruction of the original
relation ‘r’.
Types
• Vertical fragmentation
• Horizontal fragmentation
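(a) Vertical Fragmentation: Vertical fragmentation splits the relation by
decomposing the schema R of relation r. Vertical fragmentation of r(R)
involves the definition of several subsets of attributes $R_1, R_2, \ldots, R_n$ of the
schema R so that

$R = R_1 \cup R_2 \cup \cdots \cup R_n$

Each fragment $r_i$ of r is defined by

$r_i = \Pi_{R_i}(r)$

We can reconstruct relation r by taking the natural join of all the fragments, i.e.,

$r = r_1 \bowtie r_2 \bowtie \cdots \bowtie r_n$

For the join to recover r exactly, the fragments must share a key; one way to ensure
this is to include the primary-key attributes of R in each $R_i$.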

Example
Consider a university database with a relation employee_info that stores, for each
employee, the employee id, name, designation, and salary.
For privacy reasons, this relation may be fragmented into two relations:
1. employee_private_info = (emp_id, salary)
2. employee_public_info = (emp_id, name, designation)
These fragments may be stored at different sites, again for security reasons.
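
The employee example can be sketched with plain Python lists of dictionaries. The sample rows and the helper names project and natural_join are assumptions made for illustration; both fragments keep emp_id so that the natural join can rebuild the original relation.

```python
def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    seen, out = set(), []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            out.append(dict(zip(attrs, values)))
    return out


def natural_join(left, right):
    """Join two fragments on every attribute name they have in common."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    index = {}
    for r in right:
        index.setdefault(tuple(r[a] for a in common), []).append(r)
    return [
        {**l, **r}
        for l in left
        for r in index.get(tuple(l[a] for a in common), [])
    ]


# Hypothetical sample data for employee_info.
employee_info = [
    {"emp_id": 1, "name": "Asha", "designation": "Professor", "salary": 9000},
    {"emp_id": 2, "name": "Ravi", "designation": "Lecturer", "salary": 6000},
]

employee_private_info = project(employee_info, ["emp_id", "salary"])
employee_public_info = project(employee_info, ["emp_id", "name", "designation"])

# Reconstruction: the natural join on the shared key emp_id.
rebuilt = natural_join(employee_private_info, employee_public_info)
```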
(b) Horizontal Fragmentation: In horizontal fragmentation, a relation r is
partitioned into a number of subsets $r_1, r_2, \ldots, r_n$. Each tuple of relation r must
belong to at least one of the fragments, so that the original relation can be
reconstructed, if needed.
In general, a horizontal fragment can be defined as a selection on the global
relation r. The predicate $P_i$ is used to construct fragment $r_i$:
$r_i = \sigma_{P_i}(r)$
We reconstruct the relation r by taking the union of all fragments.
That is, $r = r_1 \cup r_2 \cup \cdots \cup r_n$
Example
Consider the account relation.
Account = (acc_no, branch_name, balance)
Using horizontal fragmentation, the account relation is divided into several different
fragments, each of which consists of tuples of accounts belonging to a particular branch. If the
banking system has only two branches-Hillside and Perryridge, then there are two different
fragments:
account1 = branch_name = ‘Hillside’ (account)
account2 =branch_name = ‘Perryridge’ (account)
8.2.3 Transparency
The user of a distributed database system should not be required to know either where
the data are physically located or how the data can be accessed at the specific local site. This
characteristic is called data transparency.
It can take several forms.
(a) Fragmentation Transparency: Users are not required to know how a relation
has been fragmented.
(b) Replication Transparency: Users view each data object as logically unique.
The users are not required to know what data objects have been replicated, or
where replicas have been placed.
(c) Location Transparency: Users are not required to know the physical location
of the data.
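
One way to picture the three forms of transparency is a catalog that maps a relation name to its fragments and to the sites holding replicas of each fragment, so that the caller names only the relation. Everything in this sketch (CATALOG, STORAGE, read_relation, and the sample tuples) is hypothetical.

```python
HILLSIDE_ROWS = [("A101", "Hillside", 500), ("A305", "Hillside", 350)]
PERRYRIDGE_ROWS = [("A102", "Perryridge", 400)]

# Hypothetical catalog: relation -> list of (fragment, sites holding a replica of it).
CATALOG = {
    "account": [
        ("account1", ["Hillside", "Valleyview"]),
        ("account2", ["Perryridge"]),
    ],
}

# Hypothetical storage: site -> {fragment name: tuples stored there}.
STORAGE = {
    "Hillside": {"account1": HILLSIDE_ROWS},
    "Valleyview": {"account1": list(HILLSIDE_ROWS)},
    "Perryridge": {"account2": PERRYRIDGE_ROWS},
}


def read_relation(name, down=frozenset()):
    """Return all tuples of a relation; the caller never names a fragment or a site."""
    rows = []
    for fragment, sites in CATALOG[name]:
        # Pick any live site that holds a replica of this fragment.
        site = next((s for s in sites if s not in down), None)
        if site is None:
            raise RuntimeError(f"no live replica of {fragment}")
        rows.extend(STORAGE[site][fragment])
    return rows


all_accounts = read_relation("account")
# The same call still works when the Hillside site is down, using the replica.
accounts_during_failure = read_relation("account", down={"Hillside"})
```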
Advantages of Distributed Databases
1. Sharing data: In a distributed system, users at one site can access the
data residing at other sites.
2. Autonomy: Each site is able to retain a degree of control over data that are
stored locally.
3. Availability: If one site fails in a distributed system, the remaining sites may be
able to continue operating.
Disadvantages of Distributed Databases

1. Software development cost: A distributed system is more costly to implement.

2. Greater potential for bugs: Since the sites that constitute the distributed
system operate in parallel, it is harder to ensure the correctness of algorithms.

3. Increased processing overhead: The exchange of messages and the additional
computation required to achieve intersite coordination are a form of overhead
that does not arise in a centralized system.

8.3 HETEROGENEOUS DISTRIBUTED DATABASES

• Many database applications require data from a variety of pre-existing
  databases located in a heterogeneous collection of hardware and software
  platforms.

• Each local database management system may use a different data model. Some
  may employ the relational model, whereas others may employ the network or
  hierarchical model.

• Transaction commit protocols may be incompatible.

• Concurrency control may be based on different techniques (locking,
  timestamping, etc.).

• Manipulating information located in a heterogeneous distributed database
  requires an additional software layer on top of the existing database systems.

• This software layer is called a multidatabase system.

• Full integration of heterogeneous systems into a homogeneous database is
  often difficult or impossible:

(a) Technical difficulties - The investment in application programs based on
existing database systems may be huge, and the cost of converting these
applications may be prohibitive.

(b) Organizational difficulties - Even if integration is technically possible, it may not
be politically possible, because the existing database systems belong to
different corporations or organizations.

8.4 TRANSACTION PROCESSING


8.4.1 Transaction Processing Monitors

➢ TP monitors were initially developed as multithreaded servers to support large
  numbers of terminals from a single process.
➢ They provide infrastructure for building and administering complex transaction
  processing systems with a large number of clients and multiple servers.
➢ They provide services such as:
   ▪ Presentation facilities to simplify creating user interfaces
   ▪ Routing of client messages to servers
   ▪ Coordination of two-phase commit when transactions access multiple
     servers
   ▪ Persistent queuing of client requests and server responses
➢ Some commercial TP monitors: CICS from IBM, Pathway from Tandem, Top End
  from NCR, and Encina from Transarc

Fig. 8.2 TP Monitor Architectures

➢ Process-per-client model - instead of an individual login session per terminal, a
  server process communicates with the terminal, handles authentication, and
  executes actions.
   ▪ Memory requirements are high
   ▪ Multitasking - high CPU overhead for context switching between processes
➢ Single-process model - all remote terminals connect to a single server process.
   ▪ Used in client-server environments
   ▪ Server process is multithreaded; low cost for thread switching
   ▪ No protection between applications
   ▪ Not suited for parallel or distributed databases
➢ Many-server single-router model - multiple application server processes access a
  common database; clients communicate with the application through a single
  communication process that routes requests.
   ▪ Independent server processes for multiple applications
   ▪ Multithreaded server processes
   ▪ Can run on parallel or distributed databases
➢ Many-server many-router model - multiple processes communicate with clients.
   ▪ Client communication processes interact with router processes that route
     their requests to the appropriate server.
   ▪ A controller process starts up and supervises the other processes.
8.4.2 Detailed Structure of a TP Monitor

Fig. 8.3 TP Monitor Components

➢ The queue manager handles incoming messages.
➢ Some queue managers provide persistent or durable message queueing: the
  contents of the queue are safe even if the system fails.
➢ Durable queueing of outgoing messages is important:
   ▪ The application server writes a message to a durable queue as part of a
     transaction.
   ▪ Once the transaction commits, the TP monitor guarantees that the message is
     eventually delivered, regardless of crashes.
   ▪ ACID properties are thus provided even for messages sent outside the
     database.
➢ Many TP monitors provide locking, logging, and recovery services, to enable
  application servers to implement ACID properties by themselves.
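
The durable-queue behaviour for outgoing messages can be sketched as follows. The class TransactionalOutbox is a hypothetical name, and the in-memory list only stands in for a queue on stable storage; the point is that messages written inside a transaction reach the queue only if that transaction commits.

```python
class TransactionalOutbox:
    """Toy model of durable queueing of outgoing messages (hypothetical names)."""

    def __init__(self):
        self.durable_queue = []  # stands in for a queue kept on stable storage
        self.delivered = []      # messages that have reached their destination

    def run(self, work):
        """Run work(send); its messages are queued only if it completes (commits)."""
        pending = []
        try:
            work(pending.append)
        except Exception:
            return False                    # abort: pending messages are discarded
        self.durable_queue.extend(pending)  # made durable together with the commit
        return True

    def deliver(self):
        """Deliver queued messages; after a crash this would simply be run again."""
        while self.durable_queue:
            self.delivered.append(self.durable_queue.pop(0))


def successful_deposit(send):
    send("receipt: deposited $50 into A102")


def failing_deposit(send):
    send("receipt: deposited $50 into A201")
    raise RuntimeError("transaction aborted")


outbox = TransactionalOutbox()
committed = outbox.run(successful_deposit)
aborted = outbox.run(failing_deposit)
outbox.deliver()
```

Only the receipt written by the committed transaction is ever delivered; the one written by the aborted transaction never leaves the system.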

8.4.3 Application Coordination Using TP Monitors

➢ A TP monitor treats each subsystem as a resource manager that provides
  transactional access to some set of resources.
➢ The interface between the TP monitor and the resource manager is defined by a set
  of transaction primitives.
➢ The resource manager interface is defined by the X/Open Distributed Transaction
  Processing standard.
➢ TP monitor systems provide a transactional remote procedure call
  (transactional RPC) interface to their services.
   ▪ Transactional RPC provides calls to enclose a series of RPC calls within a
     transaction.
   ▪ Updates performed by an RPC are carried out within the scope of the
     transaction, and can be rolled back if there is any failure.
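
The idea of enclosing a series of calls within a transaction can be mimicked in a few lines. This is a toy under strong assumptions: the calls are local method calls rather than real RPCs, rollback is done by restoring snapshots, and ResourceManager and transactional_rpc are hypothetical names that do not correspond to the X/Open interface.

```python
import copy
from contextlib import contextmanager


class ResourceManager:
    """Hypothetical stand-in for a subsystem that would be reached by RPC."""

    def __init__(self, balances):
        self.balances = dict(balances)

    def add(self, account, amount):
        """The 'remote procedure': adjust a balance, refusing overdrafts."""
        if self.balances[account] + amount < 0:
            raise ValueError("insufficient funds")
        self.balances[account] += amount


@contextmanager
def transactional_rpc(*managers):
    """Enclose a series of calls; restore every manager if any call fails."""
    snapshots = [copy.deepcopy(m.balances) for m in managers]
    try:
        yield
    except Exception:
        for manager, snapshot in zip(managers, snapshots):
            manager.balances = snapshot  # roll back updates made inside the block
        raise


perryridge = ResourceManager({"A102": 40})
valleyview = ResourceManager({"A201": 100})

try:
    with transactional_rpc(perryridge, valleyview):
        valleyview.add("A201", 50)    # succeeds
        perryridge.add("A102", -50)   # fails: only $40 is available
except ValueError:
    pass  # the credit to A201 has been rolled back as well
```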

8.5 TRANSACTIONAL WORKFLOWS

➢ Workflows are activities that involve the coordinated execution of multiple tasks
  performed by different processing entities.
➢ With the growth of networks and the existence of multiple autonomous database
  systems, workflows provide a convenient way of carrying out tasks that involve
  multiple systems.
➢ Example of a workflow: delivery of an email message, which goes through several
  mail systems to reach its destination.
   ▪ Each mailer performs a task: forwarding the mail to the next mailer.
   ▪ If a mailer cannot deliver the mail, the failure must be handled semantically
     (e.g., with a delivery failure message).
➢ Workflows usually involve humans, e.g., loan processing or purchase order
  processing.

Fig. 8.4 Examples of Workflows

Fig. 8.5 Workflow in loan processing

➢ In the past, workflows were handled by creating and forwarding paper forms.
➢ Computerized workflows aim to automate many of the tasks, but humans
  still play a role, e.g., in approving loans.
➢ The following issues must be addressed to computerize a workflow:
   ▪ Specification of workflows - detailing the tasks that must be carried out and
     defining the execution requirements.
   ▪ Execution of workflows - executing the transactions specified in the workflow
     while also providing traditional database safeguards related to the
     correctness of computations, data integrity, and durability.
     E.g., a loan application should not get lost even if the system fails.
➢ Extend transaction concepts to the context of workflows.
➢ State of a workflow - consists of the collection of states of its constituent tasks,
  and the states (i.e., values) of all variables in the execution plan.
8.5.1 Workflow Specification

➢ Static specification of task coordination:
   ▪ Tasks and the dependencies among them are defined before the execution of
     the workflow starts.
   ▪ Preconditions can be established for the execution of each task: tasks are
     executed only when their preconditions are satisfied.
   ▪ Preconditions are defined through dependencies on:
      • Execution states of other tasks.
        “task ti cannot start until task tj has ended”
      • Output values of other tasks.
        “task ti can start if task tj returns a value greater than 25”
      • External variables that are modified by external events.
        “task ti must be started within 24 hours of the completion of task tj”
➢ Dynamic task coordination
   E.g., an electronic mail routing system in which the next task to be scheduled for a
   given mail message depends on the destination address and on which intermediate
   routers are functioning.
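
Static preconditions of the first two kinds can be sketched as predicates over the workflow state. The task name tk, the state labels "initial" and "ended", and the function ready_tasks are assumptions made for the illustration.

```python
def ready_tasks(state, outputs, preconditions):
    """Return the not-yet-started tasks whose precondition currently holds."""
    return sorted(
        task
        for task, condition in preconditions.items()
        if state[task] == "initial" and condition(state, outputs)
    )


preconditions = {
    # tj has no dependencies.
    "tj": lambda state, outputs: True,
    # Execution-state dependency: ti cannot start until tj has ended.
    "ti": lambda state, outputs: state["tj"] == "ended",
    # Output-value dependency: tk can start if tj returned a value greater than 25.
    "tk": lambda state, outputs: outputs.get("tj", 0) > 25,
}

state = {"tj": "initial", "ti": "initial", "tk": "initial"}
outputs = {}

before = ready_tasks(state, outputs, preconditions)  # only tj may start

state["tj"] = "ended"   # tj finishes ...
outputs["tj"] = 30      # ... and returns 30

after = ready_tasks(state, outputs, preconditions)   # now ti and tk may start
```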
8.5.2 Failure-Atomicity Requirements of a Workflow

➢ The usual ACID transactional requirements are too strong or unimplementable for
  workflow applications.
➢ However, workflows must satisfy some limited transactional properties that
  guarantee a process is not left in an inconsistent state.
➢ Acceptable termination states - every execution of a workflow will terminate in a
  state that satisfies the failure-atomicity requirements defined by the designer.
   ▪ Committed - the objectives of the workflow have been achieved.
   ▪ Aborted - a valid termination state in which the workflow has failed to achieve
     its objectives.
➢ A workflow must reach an acceptable termination state even in the presence of
  system failures.
8.5.3 Execution of Workflows

Workflow management systems include:

➢ Scheduler - a program that processes workflows by submitting various tasks for
  execution, monitoring various events, and evaluating conditions related to intertask
  dependencies.
➢ Task agents - control the execution of a task by a processing entity.
➢ A mechanism to query the state of the workflow system.
8.5.4 Workflow Management System Architectures

➢ Centralized - a single scheduler schedules the tasks for all concurrently executing
  workflows.
   ▪ Used in workflow systems where the data is stored in a central database.
   ▪ Easier to keep track of the state of a workflow.
➢ Partially distributed - has one (instance of a) scheduler for each workflow.
➢ Fully distributed - has no scheduler, but the task agents coordinate their execution
  by communicating with each other to satisfy task dependencies and other workflow
  execution requirements.
   ▪ Used in the simplest workflow execution systems.
   ▪ Based on electronic mail.
8.5.5 Workflow Scheduler

➢ Ideally, the scheduler should execute a workflow only after ensuring that it will
  terminate in an acceptable state.
➢ Consider a workflow consisting of two tasks S1 and S2. Let the failure-atomicity
  requirement be that either both or neither of the subtransactions should be
  committed.
   ▪ Suppose the systems executing S1 and S2 do not provide prepared-to-commit
     states, and neither S1 nor S2 has a compensating transaction.
   ▪ It is then possible to reach a state where one subtransaction is committed
     and the other aborted. Both cannot then be brought to the same state.
   ▪ Such a workflow specification is unsafe and should be rejected.
➢ Determination of safety by the scheduler is not possible in general, and is usually
  left to the designer of the workflow.
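
For the narrow "both or neither" case above, a designer might apply a simple rule of thumb such as the one sketched below. It is an assumption of this sketch, not a general safety test: it flags a specification when two or more tasks can neither be held in a prepared-to-commit state nor be undone by a compensating transaction. The dictionary keys and the function name are hypothetical.

```python
def may_violate_both_or_neither(tasks):
    """Flag specs where two or more tasks can be neither held prepared nor compensated."""
    unrecoverable = [
        t for t in tasks
        if not t["prepared_to_commit"] and not t["compensatable"]
    ]
    return len(unrecoverable) >= 2


# The situation described above: neither S1 nor S2 offers any help.
unsafe_spec = [
    {"name": "S1", "prepared_to_commit": False, "compensatable": False},
    {"name": "S2", "prepared_to_commit": False, "compensatable": False},
]

# If S1 can be compensated, it can be undone after S2 aborts.
safer_spec = [
    {"name": "S1", "prepared_to_commit": False, "compensatable": True},
    {"name": "S2", "prepared_to_commit": False, "compensatable": False},
]

flagged = may_violate_both_or_neither(unsafe_spec)
not_flagged = may_violate_both_or_neither(safer_spec)
```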
8.5.6 Recovery of a Workflow

➢ Ensure that if a failure occurs in any of the workflow-processing components, the
  workflow eventually reaches an acceptable termination state.
➢ Failure-recovery routines need to restore the state information of the scheduler at
  the time of failure, including the information about the execution states of each
  task. Status information should therefore be logged on stable storage.
➢ Handoff of tasks between agents should occur exactly once in spite of failure.
➢ Problem: Repeating the handoff on recovery may lead to duplicate execution of a
  task; not repeating the handoff may lead to the task not being executed.
➢ Solution: Persistent messaging systems
   ▪ Persistent messages: messages are stored in a permanent message queue and
     are therefore not lost in case of failure.
      • Described in detail in Chapter 19 (Distributed Databases)
   ▪ Before an agent commits, it writes to the persistent message queue whatever
     messages need to be sent out.
   ▪ The persistent message system must make sure the messages get delivered
     eventually if and only if the transaction commits.
   ▪ The message system needs to resend a message when the site recovers, if the
     message is not known to have reached its destination.
   ▪ Messages must be logged in stable storage at the receiving end to detect multiple
     receipts of a message.
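
The resend-and-detect-duplicates idea can be sketched with two small classes. All names here are hypothetical, and ordinary in-memory containers stand in for the persistent queue and the receiver's log on stable storage.

```python
class ReceivingAgent:
    """Logs message ids so that a resent message is recognised as a duplicate."""

    def __init__(self):
        self.received_log = set()  # stands in for a log on stable storage
        self.executed = []         # tasks actually carried out

    def receive(self, message_id, task):
        if message_id in self.received_log:
            return False           # duplicate receipt: do not execute the task again
        self.received_log.add(message_id)
        self.executed.append(task)
        return True


class SendingAgent:
    """Keeps committed messages in a persistent queue until they are acknowledged."""

    def __init__(self, receiver):
        self.receiver = receiver
        self.persistent_queue = {}  # message_id -> task
        self.acknowledged = set()

    def commit(self, message_id, task):
        """Called as part of the sender's commit: the message becomes durable."""
        self.persistent_queue[message_id] = task

    def resend_unacknowledged(self, ack_lost=False):
        """Send every message not known to have reached its destination."""
        for message_id, task in self.persistent_queue.items():
            if message_id not in self.acknowledged:
                self.receiver.receive(message_id, task)
                if not ack_lost:
                    self.acknowledged.add(message_id)


receiver = ReceivingAgent()
sender = SendingAgent(receiver)
sender.commit("m1", "approve loan 17")

sender.resend_unacknowledged(ack_lost=True)  # delivered, but the sender never learns
sender.resend_unacknowledged()               # after recovery the message is resent
```

Although the message is sent twice, the receiving agent executes the task once, which is the exactly-once handoff the section asks for.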
8.5.7 High Performance Transaction Systems

➢ High-performance hardware and parallelism help improve the rate of transaction
  processing, but are insufficient to obtain high performance:
   ▪ Disk I/O is a bottleneck: I/O time (10 milliseconds) has not decreased at a
     rate comparable to the increase in processor speeds.
   ▪ Parallel transactions may attempt to read or write the same data item,
     resulting in data conflicts that reduce effective parallelism.
➢ We can reduce the degree to which a database system is disk bound by increasing
  the size of the database buffer.
IMPORTANT QUESTIONS AND ANSWERS

PART – A

1. What are the two approaches to store a relation in the distributed database?
Replication: The system maintains several identical replicas (copies) of the relation and
stores each replica at a different site.
Fragmentation: The system partitions the relation into several fragments and stores
each fragment at a different site.

2. Define Distributed Database.

In a distributed database system, the database is stored on several computers. The
computers in a distributed system communicate with one another through
various communication media, such as high-speed networks or telephone lines. They
do not share main memory or disk. The computers in a distributed system are referred
to as sites or nodes.

3. What are the types of Transactions?


A distributed database system supports two types of transactions.
• Local transaction: A transaction that accesses data only at the site where it
  was initiated.
• Global transaction: A transaction that either accesses data at a site other than
  the one where it was initiated, or accesses data at several different sites.

4. Differentiate between homogeneous and heterogeneous databases.

S.No  Homogeneous Database                        Heterogeneous Database
1.    Different nodes have the same hardware      Different nodes may have different hardware
      and software.                               and software.
2.    Much easier to design and manage.           Harder to design and manage.
3.    The database application used at each       The database applications used at different
      location must be the same or compatible.    locations may be different or incompatible.

5.

6. List the reasons for the development of distributed databases.

• In a centralized system, data is stored on a single computer. If that computer
  fails, the complete system fails.
• In a client-server system also, the data is stored on the server. If the server fails,
  the complete system fails.

7. Define Transparency
The user of a distributed database system should not be required to know either
where the data are physically located or how the data can be accessed at the specific
local site. This characteristic is called data transparency.
8. Define Vertical Fragmentation
Vertical fragmentation splits the relation by decomposing the schema R of relation
‘r’. Vertical fragmentation of r(R) involves the definition of several subsets of
attributes R1, R2, ..., Rn of the relation R so that
$R = R_1 \cup R_2 \cup \cdots \cup R_n$

9. Define Horizontal Fragmentation


In horizontal fragmentation, a relation ‘r’ is partitioned into a number of subsets r1,
r2, ..., rn. Each tuple of relation ‘r’ must belong to at least one of the fragments, so
that the original relation can be reconstructed, if needed.
In general, a horizontal fragment can be defined as a select on the global relation ‘r’.
The predicate Pi is used to construct fragment ri.
ri = Pi (r)
PART-B

1. Explain distributed databases in detail.

2. Discuss in detail the various types of distributed databases, with suitable
examples.
3. Compare and contrast homogeneous and heterogeneous databases.
4. With a neat diagram, explain the transactional architecture in detail.
5. Describe briefly transaction processing in distributed databases.
