DISTRIBUTED DBMS
J. Mutai
A distributed database is a collection of
multiple interconnected databases, which
are spread physically across various
locations that communicate via a computer
network.
Features of a Distributed Database
• Databases in the collection are logically interrelated with
each other. Often they represent a single logical database.
• Data is physically stored across multiple sites. Data in each
site can be managed by a DBMS independent of the other
sites.
• The processors in the sites are connected via a network.
They do not have any multiprocessor configuration.
• A distributed database is not a loosely connected file
system.
• A distributed database incorporates transaction
processing, but it is not synonymous with a transaction
processing system.
Distributed Database Management System
A distributed database management
system (DDBMS) is a centralized
software system that manages a
distributed database as if it were
all stored in a single location.
Features Of a Distributed DBMS
• It is used to create, retrieve, update and delete
distributed databases.
• It synchronizes the database periodically and provides
access mechanisms that make the distribution
transparent to the users.
• It ensures that data modified at any site is updated
at every other site that holds it.
• It is used in application areas where large volumes of
data are processed and accessed by numerous users
simultaneously.
• It is designed for heterogeneous database platforms.
• It maintains confidentiality and data integrity of the
databases.
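The transparency feature listed above can be illustrated with a minimal Python sketch. The class name `DistributedStore`, the site names and the hash-based placement rule are all assumptions made for illustration; they are not taken from any particular DDBMS. The caller uses `put` and `get` without ever naming a site.

```python
import zlib


class DistributedStore:
    """Toy facade that hides which site holds each record (hypothetical)."""

    def __init__(self, site_names):
        # Each site is modelled as an independent in-memory dictionary.
        self.sites = {name: {} for name in site_names}
        self._names = sorted(site_names)

    def _site_for(self, key):
        # Deterministic placement: hash the key and pick one site.
        index = zlib.crc32(key.encode("utf-8")) % len(self._names)
        return self._names[index]

    def put(self, key, value):
        self.sites[self._site_for(key)][key] = value

    def get(self, key):
        # Returns None when the key is not stored anywhere.
        return self.sites[self._site_for(key)].get(key)


store = DistributedStore(["nairobi", "mombasa", "kisumu"])
store.put("customer:17", {"name": "Achieng"})
print(store.get("customer:17"))
```

The record lives at exactly one site, but the calling code never needs to know which one.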
Factors Encouraging DDBMS
• Distributed Nature of Organizational Units − Most organizations today are
subdivided into multiple units that are physically distributed over the
globe. Each unit requires its own set of local data. Thus, the overall database of
the organization becomes distributed.
• Need for Sharing of Data − The multiple organizational units often need to
communicate with each other and share their data and resources. This demands
common databases or replicated databases that should be used in a synchronized
manner.
• Support for Both OLTP and OLAP − Online Transaction Processing (OLTP) and
Online Analytical Processing (OLAP) run on different systems that may
share common data. Distributed database systems support both kinds of
processing by providing synchronized data.
• Database Recovery − One of the common techniques used in DDBMS is
replication of data across different sites. Replication helps in data recovery
if the database at any site is damaged. Users can access data from
other sites while the damaged site is being reconstructed. Thus, a database failure
may go almost unnoticed by users.
• Support for Multiple Application Software − Most organizations use a variety of
application software each with its specific database support. DDBMS provides a
uniform functionality for using the same data among different platforms.
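The Database Recovery point above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the classes `Site` and `ReplicatedTable`, the `up` flag and the full-replication policy are hypothetical, and a real DDBMS would also resynchronize a site after it is repaired.

```python
class SiteDown(Exception):
    """Raised when a site cannot serve requests."""


class Site:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True  # Hypothetical health flag for the sketch.

    def read(self, key):
        if not self.up:
            raise SiteDown(self.name)
        return self.data[key]

    def write(self, key, value):
        if not self.up:
            raise SiteDown(self.name)
        self.data[key] = value


class ReplicatedTable:
    def __init__(self, sites):
        self.sites = sites

    def write(self, key, value):
        # Full replication: every available site receives the write.
        # A site that is down misses it and would need resynchronizing.
        for site in self.sites:
            if site.up:
                site.write(key, value)

    def read(self, key):
        # Try each replica in turn and skip sites that are down.
        for site in self.sites:
            try:
                return site.read(key), site.name
            except SiteDown:
                continue
        raise SiteDown("no replica available")


sites = [Site("A"), Site("B")]
table = ReplicatedTable(sites)
table.write("order:1", "paid")
sites[0].up = False                      # Simulate damage at site A.
value, served_by = table.read("order:1")
print(value, served_by)
```

The read still succeeds after site A fails because site B holds a copy of the same data.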
Disadvantages of Distributed Databases
1. Need for complex and expensive software − DDBMS demands
complex and often expensive software to provide data transparency
and co-ordination across the several sites.
2. Processing overhead − Even simple operations may require a large
number of communications and additional calculations to provide
uniformity in data across the sites.
3. Data integrity − The need to update data at multiple sites poses
problems for data integrity.
4. Overheads for improper data distribution − Responsiveness of queries
is largely dependent upon proper data distribution. Improper data
distribution often leads to very slow response to user requests.
Advantages of Distributed Databases
1. Modular Development − If the system needs to be expanded to new
locations or new units, in centralized database systems, the action
requires substantial effort and disrupts existing operations.
However, in distributed databases, the work simply requires adding
new computers and local data to the new site and finally connecting
them to the distributed system, with no interruption in current
functions.
2. More Reliable − In case of database failures, the total system of
centralized databases comes to a halt. However, in distributed systems,
when a component fails, the system continues to function, possibly
at reduced performance. Hence DDBMS is more reliable.
3. Better Response − If data is distributed in an efficient manner, then
user requests can be met from local data itself, thus providing faster
response. On the other hand, in centralized systems, all queries have to
pass through the central computer for processing, which increases the
response time.
4. Lower Communication Cost − In distributed database systems, if data
is located locally where it is mostly used, then the communication
costs for data manipulation can be minimized. This is not feasible in
centralized systems.
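The Better Response and Lower Communication Cost points can be made concrete with a small Python sketch. It assumes a single data fragment and a made-up table of access counts per site; the function names `remote_accesses` and `best_site` are hypothetical. Placing the fragment at the site that uses it most minimizes the accesses that must cross the network.

```python
def remote_accesses(placement_site, accesses_by_site):
    """Count accesses that must cross the network for a given placement."""
    return sum(
        count
        for site, count in accesses_by_site.items()
        if site != placement_site
    )


def best_site(accesses_by_site):
    """Pick the site that minimises remote accesses for one fragment."""
    return min(
        accesses_by_site,
        key=lambda site: remote_accesses(site, accesses_by_site),
    )


# Illustrative access counts only, not measured data.
accesses = {"nairobi": 900, "mombasa": 80, "kisumu": 20}

print(best_site(accesses))
print(remote_accesses("nairobi", accesses))
print(remote_accesses("kisumu", accesses))
```

With these figures, storing the fragment in "nairobi" leaves 100 remote accesses, whereas storing it in "kisumu" would leave 980.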
Types of Distributed Databases
Homogeneous Distributed Databases
In a homogeneous distributed database, all the sites use identical DBMS and
operating systems. Its properties are −
1. The sites use very similar software.
2. The sites use identical DBMS or DBMS from the same vendor.
3. Each site is aware of all other sites and cooperates with other sites to
process user requests.
4. The database is accessed through a single interface as if it were a single
database.
Types of Homogeneous Distributed Database
There are two types of homogeneous distributed database −
Autonomous − Each database is independent and functions on its own. The
databases are integrated by a controlling application and use message passing
to share data updates.
Non-autonomous − Data is distributed across the homogeneous nodes and a
central or master DBMS co-ordinates data updates across the sites.
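The difference between the two types can be sketched in Python. This is a toy model under stated assumptions: `MasterCoordinator` and `AutonomousNode` are hypothetical names, the controlling application is reduced to a plain list of peers, and messages are delivered through an in-memory queue rather than a network.

```python
from collections import deque


class MasterCoordinator:
    """Non-autonomous style: one master pushes every update to all nodes."""

    def __init__(self, node_names):
        self.nodes = {name: {} for name in node_names}

    def update(self, key, value):
        # The master applies the change at every node before returning.
        for data in self.nodes.values():
            data[key] = value


class AutonomousNode:
    """Autonomous style: a node updates itself, then notifies its peers."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.inbox = deque()  # Pending update messages from other nodes.

    def local_update(self, key, value, peers):
        self.data[key] = value
        for peer in peers:
            if peer is not self:
                peer.inbox.append((key, value))

    def process_inbox(self):
        # Each node decides when to apply the messages it has received.
        while self.inbox:
            key, value = self.inbox.popleft()
            self.data[key] = value


master = MasterCoordinator(["site1", "site2"])
master.update("price", 100)

a, b = AutonomousNode("a"), AutonomousNode("b")
a.local_update("price", 120, peers=[a, b])
stale = b.data.get("price")   # b has not processed its inbox yet
b.process_inbox()
fresh = b.data.get("price")
print(master.nodes, stale, fresh)
```

In the master-coordinated case all nodes agree as soon as `update` returns; in the autonomous case node `b` sees the new value only after it processes its own inbox.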
Heterogeneous Distributed Databases
In a heterogeneous distributed database, different sites have different operating
systems, DBMS products and data models. Its properties are −
1. Different sites use dissimilar schemas and software.
2. The system may be composed of a variety of DBMSs like relational, network,
hierarchical or object oriented.
3. Query processing is complex due to dissimilar schemas.
4. Transaction processing is complex due to dissimilar software.
5. A site may not be aware of other sites and so there is limited co-operation in
processing user requests.
Types of Heterogeneous Distributed Databases
• Federated − The heterogeneous database systems are independent in nature
and integrated together so that they function as a single database system.
• Un-federated − The database systems employ a central coordinating module
through which the databases are accessed.
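One way to picture a federated system is as a set of independent sites, each translating its own schema into a shared global schema. The Python sketch below assumes two hypothetical sites, `RelationalSite` and `DocumentSite`, with invented sample records and a global schema of `id` and `name`; it is an illustration of the idea, not a description of a specific product.

```python
class RelationalSite:
    """Stores rows as tuples: (emp_id, full_name)."""

    def __init__(self):
        self.rows = [(1, "Wanjiru Kamau"), (2, "Otieno Odhiambo")]

    def to_global(self):
        # Translate the local tuple layout into the global schema.
        return [{"id": emp_id, "name": name} for emp_id, name in self.rows]


class DocumentSite:
    """Stores nested documents with different field names."""

    def __init__(self):
        self.docs = [
            {"staffNo": 3, "person": {"first": "Amina", "last": "Hassan"}},
        ]

    def to_global(self):
        # Rename fields and join the name parts to match the global schema.
        return [
            {
                "id": doc["staffNo"],
                "name": doc["person"]["first"] + " " + doc["person"]["last"],
            }
            for doc in self.docs
        ]


def federated_query(sites, predicate):
    """Run one predicate over every site's data in the global schema."""
    results = []
    for site in sites:
        results.extend(row for row in site.to_global() if predicate(row))
    return results


matches = federated_query(
    [RelationalSite(), DocumentSite()],
    lambda row: row["id"] >= 2,
)
print(matches)
```

Each site keeps its own storage format, and only the `to_global` translation layer has to know about the differences, which is where the query-processing complexity of heterogeneous systems comes from.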