Module:3
Q.1 Define distributed database.
Distributed Database: A distributed database is a collection of multiple interconnected
databases, which are spread physically across various locations that communicate via a
computer network.
Q.2 State the advantages of distributed databases over centralized databases.
Following are the advantages of distributed databases over centralized databases.
1. Modular Development − If the system needs to be expanded to new locations or new
units, in centralized database systems, the action requires substantial efforts and
disruption in the existing functioning. However, in distributed databases, the work
simply requires adding new computers and local data to the new site and finally
connecting them to the distributed system, with no interruption in current functions.
2. More Reliable − When a centralized database fails, the entire system comes to a
halt. In a distributed system, when a component fails, the system continues to
function, possibly at reduced performance. Hence a DDBMS is more reliable.
3. Better Response − If data is distributed in an efficient manner, then user requests can
be met from local data itself, thus providing faster response. On the other hand, in
centralized systems, all queries have to pass through the central computer for
processing, which increases the response time.
4. Lower Communication Cost − In distributed database systems, if data is located locally
where it is mostly used, then the communication costs for data manipulation can be
minimized. This is not feasible in centralized systems.
Q.3 List the features of distributed databases and brief them.
A distributed database is a collection of multiple interconnected databases, which are spread
physically across various locations that communicate via a computer network.
Features
● Databases in the collection are logically interrelated with each other. Often they
represent a single logical database.
● Data is physically stored across multiple sites. Data in each site can be managed
by a DBMS independent of the other sites.
● The processors in the sites are connected via a network. They do not form a
multiprocessor configuration.
● A distributed database is not a loosely connected file system.
● A distributed database incorporates transaction processing, but it is not
synonymous with a transaction processing system.
Q.4 Define Replication.
Replication involves using specialized software that looks for changes in the distributed
database. Once the changes have been identified, the replication process makes all the
databases look the same. Depending on the size and number of the distributed databases,
this process can be complex and can consume considerable time and computing resources.
• If the distributed database is (partially or fully) replicated, it is necessary to implement
protocols that ensure the consistency of the replicas, i.e. copies of the same data item have the
same value.
• These protocols can be eager in that they force the updates to be applied to all the replicas
before the transaction completes, or they may be lazy so that the transaction updates one
copy (called the master) from which updates are propagated to the others after the transaction
completes.
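The difference between eager and lazy propagation can be illustrated with a minimal Python sketch. The class ReplicatedItem and its method names are hypothetical and chosen only for illustration; a real DDBMS would propagate updates over the network and handle failures, which this sketch ignores.

```python
class ReplicatedItem:
    """One data item with a master copy and several replica copies (illustrative only)."""

    def __init__(self, value, replica_count):
        self.master = value
        self.replicas = [value] * replica_count
        self.pending = []  # updates not yet propagated (used by lazy writes only)

    def eager_write(self, value):
        # Eager: every copy is updated before the transaction completes.
        self.master = value
        self.replicas = [value] * len(self.replicas)

    def lazy_write(self, value):
        # Lazy: only the master copy is updated inside the transaction.
        self.master = value
        self.pending.append(value)

    def propagate(self):
        # Runs after the transaction completes and refreshes the replicas in order.
        for value in self.pending:
            self.replicas = [value] * len(self.replicas)
        self.pending.clear()

    def replicas_consistent(self):
        # True when every replica holds the same value as the master.
        return all(copy == self.master for copy in self.replicas)


item = ReplicatedItem(value=10, replica_count=2)

item.eager_write(20)
eager_ok = item.replicas_consistent()   # replicas already match the master

item.lazy_write(30)
stale = not item.replicas_consistent()  # replicas still hold the old value
item.propagate()
converged = item.replicas_consistent()  # replicas have caught up
```

With the lazy write, the replicas are briefly out of date; they agree with the master again only after propagate() has run.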
Q.5 Write short notes on distributed concurrency control.
Distributed Concurrency Control
• Concurrency control involves the synchronization of access to the distributed database, such
that the integrity of the database is maintained. It is, without any doubt, one of the most
extensively studied problems in the DDBS field.
• The concurrency control problem in a distributed context is somewhat different from that in a
centralized framework. One not only has to worry about the integrity of a single database, but
also about the consistency of multiple copies of the database. The condition that requires all
values of multiple copies of every data item to converge to the same value is called mutual
consistency.
• Let us only mention that the two general classes are pessimistic, synchronizing the execution
of the user request before the execution starts, and optimistic, executing requests and then
checking if the execution has compromised the consistency of the database.
• Two fundamental primitives that can be used with both approaches are locking, which is
based on the mutual exclusion of access to data items, and time-stamping, where transaction
executions are ordered based on timestamps.
• There are variations of these schemes as well as hybrid algorithms that attempt to combine
the two basic mechanisms.
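As an illustration of time-stamping, the following Python sketch orders conflicting operations by transaction timestamp. The rules shown follow the commonly taught basic timestamp ordering scheme, which is an assumption here since the notes do not spell out a specific algorithm; the class TimestampOrdering and its methods are hypothetical names.

```python
class TimestampOrdering:
    """Accepts or rejects reads and writes based on transaction timestamps (sketch)."""

    def __init__(self):
        self.read_ts = {}   # item -> largest timestamp that has read it
        self.write_ts = {}  # item -> largest timestamp that has written it

    def read(self, item, ts):
        # Reject the read if a younger transaction has already written the item.
        if ts < self.write_ts.get(item, 0):
            return False
        self.read_ts[item] = max(self.read_ts.get(item, 0), ts)
        return True

    def write(self, item, ts):
        # Reject the write if a younger transaction has already read or written the item.
        if ts < self.read_ts.get(item, 0) or ts < self.write_ts.get(item, 0):
            return False
        self.write_ts[item] = ts
        return True


scheduler = TimestampOrdering()
first_read = scheduler.read("x", ts=2)     # accepted
late_write = scheduler.write("x", ts=1)    # rejected: ts 2 has already read x
new_write = scheduler.write("x", ts=3)     # accepted
late_read = scheduler.read("x", ts=2)      # rejected: ts 3 has already written x
```

A rejected operation would normally cause its transaction to be restarted with a new timestamp; that step is omitted from the sketch.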
Q.6 Explain the design issues of distributed databases.
The following are the design issues related to distributed databases
1. Distributed Database Design
• One of the main questions that is being addressed is how the database and the applications
that run against it should be placed across the sites.
• There are two basic alternatives to placing data: partitioned (or non-replicated) and replicated.
• In the partitioned scheme the database is divided into a number of disjoint partitions each of
which is placed at a different site. Replicated designs can be either fully replicated (also called
fully duplicated) where the entire database is stored at each site, or partially replicated (or
partially duplicated) where each partition of the database is stored at more than one site, but
not at all the sites.
• The two fundamental design issues are fragmentation, the separation of the database into
partitions called fragments, and distribution, the optimum distribution of fragments. The
research in this area mostly involves mathematical programming in order to minimize the
combined cost of storing the database, processing transactions against it, and message
communication among sites.
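Fragmentation and the three placement alternatives can be sketched in Python as follows. The sample accounts table, the site names, and the helper functions fragment_by and classify are all hypothetical; the sketch only labels a given placement and does not attempt the cost optimization mentioned above.

```python
from collections import Counter


def fragment_by(rows, attribute):
    """Horizontal fragmentation: group rows by the value of one attribute."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[attribute], []).append(row)
    return fragments


def classify(allocation):
    """Label a placement given a mapping of site -> names of fragments stored there."""
    copies = Counter(name for names in allocation.values() for name in names)
    if all(count == 1 for count in copies.values()):
        return "partitioned"            # every fragment is stored at exactly one site
    if all(count == len(allocation) for count in copies.values()):
        return "fully replicated"       # every fragment is stored at every site
    return "partially replicated"       # anything in between


accounts = [
    {"id": 1, "branch": "Pune", "balance": 500},
    {"id": 2, "branch": "Delhi", "balance": 900},
    {"id": 3, "branch": "Pune", "balance": 250},
]
fragments = fragment_by(accounts, "branch")  # one disjoint fragment per branch

partitioned = {"site1": ["Pune"], "site2": ["Delhi"]}
partial = {"site1": ["Pune", "Delhi"], "site2": ["Pune"], "site3": ["Delhi"]}
full = {"site1": ["Pune", "Delhi"], "site2": ["Pune", "Delhi"]}

labels = [classify(partitioned), classify(partial), classify(full)]
```

Because each row lands in exactly one fragment, the fragments are disjoint and together reconstruct the original table.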
2. Distributed Directory Management
• A directory contains information (such as descriptions and locations) about data items in the
database. Problems related to directory management are similar in nature to the database
placement problem discussed in the preceding section.
• A directory may be global to the entire DDBS or local to each site; it can be centralized at one
site or distributed over several sites; there can be a single copy or multiple copies.
3. Distributed Query Processing
• Query processing deals with designing algorithms that analyze queries and convert them
into a series of data manipulation operations. The problem is how to decide on a strategy for
executing each query over the network in the most cost-effective way, however cost is defined.
• The factors to be considered are the distribution of data, communication cost, and lack of
sufficient locally-available information. The objective is to exploit the inherent parallelism
to improve the performance of executing the transaction, subject to the above-mentioned
constraints.
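To make the strategy choice concrete, the Python sketch below compares three ways of joining a relation R stored at site 1 with a relation S stored at site 2 when the result is needed at site 3. It assumes that cost is simply the number of bytes shipped over the network; the function cheapest_strategy and the sizes used are hypothetical.

```python
def cheapest_strategy(size_r, size_s, size_result):
    """Return (strategy, bytes shipped) for the cheapest way to join R and S.

    Assumes R is at site 1, S is at site 2, the result is needed at site 3,
    and cost is measured only as bytes transferred between sites.
    """
    strategies = {
        "ship R and S to site 3": size_r + size_s,
        "ship R to site 2, join there, ship result to site 3": size_r + size_result,
        "ship S to site 1, join there, ship result to site 3": size_s + size_result,
    }
    best = min(strategies, key=strategies.get)
    return best, strategies[best]


# Illustrative sizes in bytes: R is large, S and the join result are small.
strategy, cost = cheapest_strategy(size_r=1_000_000, size_s=50_000, size_result=20_000)
```

With these sizes the large relation R stays where it is and only S and the result travel, which is the kind of decision a distributed query optimizer has to make. A realistic cost model would also include local processing and the effect of parallel execution.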
4. Distributed Concurrency Control
• Concurrency control involves the synchronization of access to the distributed database, such
that the integrity of the database is maintained. It is, without any doubt, one of the most
extensively studied problems in the DDBS field.
• The concurrency control problem in a distributed context is somewhat different from that in a
centralized framework. One not only has to worry about the integrity of a single database, but
also about the consistency of multiple copies of the database. The condition that requires all
values of multiple copies of every data item to converge to the same value is called mutual
consistency.
• Let us only mention that the two general classes are pessimistic, synchronizing the execution
of the user request before the execution starts, and optimistic, executing requests and then
checking if the execution has compromised the consistency of the database.
• Two fundamental primitives that can be used with both approaches are locking, which is
based on the mutual exclusion of access to data items, and time-stamping, where transaction
executions are ordered based on timestamps.
• There are variations of these schemes as well as hybrid algorithms that attempt to combine
the two basic mechanisms.
5. Distributed Deadlock Management
• The deadlock problem in DDBSs is similar in nature to that encountered in operating
systems.
• The competition among users for access to a set of resources (data, in this case) can result in
a deadlock if the synchronization mechanism is based on locking. The well-known alternatives
of prevention, avoidance, and detection/recovery also apply to DDBSs.
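Deadlock detection is often explained with a wait-for graph, in which an edge from T1 to T2 means that T1 is waiting for a lock held by T2. The Python sketch below assumes that each site keeps a local wait-for graph and that a detector merges them into a global one; the function names and the two-site example are hypothetical.

```python
def merge(*local_graphs):
    """Combine local wait-for graphs ({txn: set of txns it waits for}) into one."""
    merged = {}
    for graph in local_graphs:
        for txn, waits_for in graph.items():
            merged.setdefault(txn, set()).update(waits_for)
    return merged


def has_deadlock(wait_for):
    """Return True if the wait-for graph contains a cycle (depth-first search)."""
    in_progress, done = set(), set()

    def visit(txn):
        in_progress.add(txn)
        for other in wait_for.get(txn, ()):
            if other in in_progress:
                return True  # reached a transaction already on the current path: cycle
            if other not in done and visit(other):
                return True
        in_progress.discard(txn)
        done.add(txn)
        return False

    return any(txn not in done and visit(txn) for txn in list(wait_for))


site_a = {"T1": {"T2"}}  # at site A, T1 waits for T2
site_b = {"T2": {"T1"}}  # at site B, T2 waits for T1

local_a = has_deadlock(site_a)
local_b = has_deadlock(site_b)
global_deadlock = has_deadlock(merge(site_a, site_b))
```

Neither site sees a cycle on its own, yet the merged graph contains one. This is what makes deadlock detection harder in a distributed setting than in a single system.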
6. Reliability of Distributed DBMS
• It is important that mechanisms be provided to ensure the consistency of the database as
well as to detect failures and recover from them. The implication for DDBSs is that when a
failure occurs and various sites become either inoperable or inaccessible, the databases at the
operational sites remain consistent and up to date.
• Furthermore, when the computer system or network recovers from the failure, the DDBSs
should be able to recover and bring the databases at the failed sites up-to-date. This may be
especially difficult in the case of network partitioning, where the sites are divided into two or
more groups with no communication among them.
7. Replication
• If the distributed database is (partially or fully) replicated, it is necessary to implement
protocols that ensure the consistency of the replicas, i.e. copies of the same data item have the
same value.
• These protocols can be eager in that they force the updates to be applied to all the replicas
before the transaction completes, or they may be lazy so that the transactions update one copy
(called the master) from which updates are propagated to the others after the transaction
completes.
Q.7 What is a homogeneous database?
Homogeneous Distributed Databases
In a homogeneous distributed database, all the sites use identical DBMS and operating
systems. Its properties are −
● The sites use very similar software.
● The sites use identical DBMS or DBMS from the same vendor.
● Each site is aware of all other sites and cooperates with other sites to process user requests.
● The database is accessed through a single interface as if it is a single database.
Q.8 What is a heterogeneous database?
In a heterogeneous distributed database, different sites have different operating systems,
DBMS products and data models. Its properties are −
● Different sites use dissimilar schemas and software.
● The system may be composed of a variety of DBMSs like relational, network, hierarchical or
object oriented.
● Query processing is complex due to dissimilar schemas.
● Transaction processing is complex due to dissimilar software.
● A site may not be aware of other sites and so there is limited co-operation in processing user
requests.
Q.9 Discuss the types of homogeneous and heterogeneous databases.
Types of Homogeneous Distributed Database
There are two types of homogeneous distributed database −
● Autonomous − Each database is independent and functions on its own. The databases are
integrated by a controlling application and use message passing to share data updates.
● Non-autonomous − Data is distributed across the homogeneous nodes and a central or
master DBMS coordinates data updates across the sites.
Types of Heterogeneous Distributed Databases
● Federated − Heterogeneous database systems are independent in nature and integrated
together so that they function as a single database system.
● Un-federated − Database systems employ a central coordinating module through which the
databases are accessed.
Q.10 Define Autonomy.
Autonomy − It indicates the distribution of control of the database system and the degree to
which each constituent DBMS can operate independently.
Q.11 Define Distribution.
Distribution − It describes the physical distribution of data across the different sites.
Q.12 Draw and explain the client-server architecture.
Client - Server Architecture for DDBMS
This is a two-level architecture where the functionality is divided into servers and clients. The
server functions primarily encompass data management, query processing, optimization and
transaction management. Client functions mainly include the user interface. However, clients
also perform some functions such as consistency checking and transaction management.
The two different client - server architectures are −
● Single Server Multiple Client
● Multiple Server Multiple Client (shown in the following diagram)