Distributed Database Essentials

Distributed databases have several advantages over centralized databases including modular development, higher reliability, and better response times. Key features of distributed databases include logical and physical separation of data across multiple interconnected sites managed by independent DBMS. Replication involves using software to identify changes across distributed databases and ensure consistency by making all databases match. Design issues for distributed databases include data placement, directory management, query processing, concurrency control, and reliability.

Module:3

Q.1 Define distributed database.


Distributed Database: A distributed database is a collection of multiple interconnected
databases, which are spread physically across various locations that communicate via a
computer network.

Q.2 State the advantages of distributed databases over centralized databases.


Following are the advantages of distributed databases over centralized databases.

1. Modular Development − If the system needs to be expanded to new locations or new

units, in centralized database systems, the action requires substantial efforts and

disruption in the existing functioning. However, in distributed databases, the work

simply requires adding new computers and local data to the new site and finally

connecting them to the distributed system, with no interruption in current functions.

2. More Reliable − In case of database failures, the total system of centralized databases

comes to a halt. However, in distributed systems, when a component fails, the

functioning of the system continues, though possibly at reduced performance. Hence a DDBMS

is more reliable.

3. Better Response − If data is distributed in an efficient manner, then user requests can

be met from local data itself, thus providing faster response. On the other hand, in

centralized systems, all queries have to pass through the central computer for

processing, which increases the response time.

4. Lower Communication Cost − In distributed database systems, if data is located locally

where it is mostly used, then the communication costs for data manipulation can be

minimized. This is not feasible in centralized systems.

Q.3 List the features of distributed databases and brief them.


A distributed database is a collection of multiple interconnected databases, which are spread

physically across various locations that communicate via a computer network.

Features
● Databases in the collection are logically interrelated with each other. Often they

represent a single logical database.

● Data is physically stored across multiple sites. Data in each site can be managed

by a DBMS independent of the other sites.

● The processors in the sites are connected via a network. They do not have any

multiprocessor configuration.

● A distributed database is not a loosely connected file system.

● A distributed database incorporates transaction processing, but it is not

synonymous with a transaction processing system.

Q.4 Define Replication.

Replication involves using specialized software that looks for changes in the distributed

database. Once the changes have been identified, the replication process makes all the

databases look the same. The replication process can be complex and time-consuming,

depending on the size and number of the distributed databases, and it can consume

substantial computing resources.

• If the distributed database is (partially or fully) replicated, it is necessary to implement

protocols that ensure the consistency of the replicas, i.e. copies of the same data item have the

same value.

• These protocols can be eager in that they force the updates to be applied to all the replicas

before the transaction completes, or they may be lazy so that the transaction updates one

copy (called the master) from which updates are propagated to the others after the transaction

completes.
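
The difference between eager and lazy protocols can be illustrated with a minimal sketch. The class names (Replica, EagerGroup, LazyGroup) and the in-memory dictionaries are assumptions made for illustration only; a real DDBMS would also need atomic commitment and failure handling, which are omitted here.

```python
class Replica:
    """One copy of the data at a site (a plain in-memory dict, hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class EagerGroup:
    """Eager protocol: a write reaches every replica before it returns."""

    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value):
        for replica in self.replicas:
            replica.data[key] = value


class LazyGroup:
    """Lazy protocol: a write updates the master; the others catch up later."""

    def __init__(self, master, secondaries):
        self.master = master
        self.secondaries = secondaries
        self.pending = []  # updates applied at the master but not yet propagated

    def write(self, key, value):
        self.master.data[key] = value
        self.pending.append((key, value))

    def propagate(self):
        # Runs after the transaction has completed.
        for key, value in self.pending:
            for replica in self.secondaries:
                replica.data[key] = value
        self.pending.clear()


# Eager: both copies agree as soon as the write returns.
a, b = Replica("A"), Replica("B")
EagerGroup([a, b]).write("x", 1)
eager_consistent = a.data == b.data

# Lazy: the secondary is stale until propagation runs.
m, s = Replica("M"), Replica("S")
lazy = LazyGroup(m, [s])
lazy.write("x", 1)
stale_before = "x" not in s.data
lazy.propagate()
consistent_after = m.data == s.data
```

In this sketch the lazy group answers the write sooner, but a reader at the secondary site can see an old value until propagate() has run.
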
Q.5 Write short notes on distributed concurrency control.

Distributed Concurrency Control

• Concurrency control involves the synchronization of access to the distributed database, such

that the integrity of the database is maintained. It is, without any doubt, one of the most

extensively studied problems in the DDBS field.

• The concurrency control problem in a distributed context is somewhat different than in a

centralized framework. One not only has to worry about the integrity of a single database, but

also about the consistency of multiple copies of the database. The condition that requires all

values of multiple copies of every data item to converge to the same value is called mutual

consistency.

• The two general classes of approaches are pessimistic, synchronizing the execution

of the user request before the execution starts, and optimistic, executing requests and then

checking if the execution has compromised the consistency of the database.

• Two fundamental primitives that can be used with both approaches are locking, which is

based on the mutual exclusion of access to data items, and time-stamping, where transaction

executions are ordered based on timestamps.

• There are variations of these schemes as well as hybrid algorithms that attempt to combine

the two basic mechanisms.
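
As an illustration of the time-stamping primitive, the following is a minimal sketch of basic timestamp ordering on a single data item. The Item class and the read and write functions are hypothetical names, and the rules shown are the textbook basic variant, assumed here rather than taken from these notes.

```python
class Item:
    """A data item with the largest read/write timestamps seen so far."""

    def __init__(self, value=None):
        self.value = value
        self.read_ts = 0
        self.write_ts = 0


def read(item, ts):
    """Return the value, or abort if a younger transaction already wrote it."""
    if ts < item.write_ts:
        raise RuntimeError("abort: read arrived too late")
    item.read_ts = max(item.read_ts, ts)
    return item.value


def write(item, ts, value):
    """Apply the write, or abort if a younger transaction read or wrote it."""
    if ts < item.read_ts or ts < item.write_ts:
        raise RuntimeError("abort: write arrived too late")
    item.value = value
    item.write_ts = ts


x = Item(10)
write(x, ts=1, value=20)   # transaction with timestamp 1 writes
seen = read(x, ts=3)       # transaction with timestamp 3 reads the new value
try:
    write(x, ts=2, value=30)  # timestamp 2 is older than the read at 3
    late_write_rejected = False
except RuntimeError:
    late_write_rejected = True
```

The late write is rejected because accepting it would contradict the order fixed by the timestamps: the transaction with timestamp 3 has already read the item.
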

Q.6 Explain the design issues of distributed databases.

The following are the design issues related to distributed databases

1. Distributed Database Design

• One of the main questions being addressed is how the database and the applications that

run against it should be placed across the sites.


• There are two basic alternatives to placing data: partitioned (or non-replicated) and replicated.

• In the partitioned scheme the database is divided into a number of disjoint partitions each of

which is placed at a different site. Replicated designs can be either fully replicated (also called

fully duplicated) where the entire database is stored at each site, or partially replicated (or

partially duplicated) where each partition of the database is stored at more than one site, but

not at all the sites.

• The two fundamental design issues are fragmentation, the separation of the database into

partitions called fragments, and distribution, the optimum distribution of fragments. The

research in this area mostly involves mathematical programming in order to minimize the

combined cost of storing the database, processing transactions against it, and message

communication among sites.
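
Fragmentation and distribution can be sketched as follows. The example assumes one common kind of fragmentation, splitting a table row-wise on an attribute value; the employees data, the fragment_by helper and the site names are all made up for illustration.

```python
# Hypothetical table: each row is a dict.
employees = [
    {"id": 1, "name": "Asha", "city": "Pune"},
    {"id": 2, "name": "Ravi", "city": "Delhi"},
    {"id": 3, "name": "Meena", "city": "Pune"},
]


def fragment_by(rows, attribute):
    """Split rows into disjoint fragments, one per value of the attribute."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[attribute], []).append(row)
    return fragments


# Fragmentation: one fragment per city.
fragments = fragment_by(employees, "city")

# Distribution: place each fragment at the site where it is mostly used
# (assumed here to be a site named after the city).
allocation = {city: f"site_{city.lower()}" for city in fragments}

# The original table can be rebuilt as the union of the fragments.
reconstructed = sorted(
    (row for fragment in fragments.values() for row in fragment),
    key=lambda row: row["id"],
)
```

Because the fragments are disjoint and together cover every row, no data is lost or duplicated by the split.
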

2. Distributed Directory Management

• A directory contains information (such as descriptions and locations) about data items in the

database. Problems related to directory management are similar in nature to the database

placement problem discussed in the preceding section.

• A directory may be global to the entire DDBS or local to each site; it can be centralized at one

site or distributed over several sites; there can be a single copy or multiple copies.
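
A directory can be pictured as a lookup table from data items to their descriptions and locations. The sketch below assumes a single global directory held in memory; the item names, site names and the locate function are hypothetical.

```python
# Hypothetical global directory: data item -> description and storing sites.
directory = {
    "employees_pune": {
        "description": "employee rows for the Pune office",
        "sites": ["site_a", "site_b"],
    },
    "employees_delhi": {
        "description": "employee rows for the Delhi office",
        "sites": ["site_c"],
    },
}


def locate(item, local_site):
    """Pick a site holding the item, preferring the local site if it has a copy."""
    sites = directory[item]["sites"]
    return local_site if local_site in sites else sites[0]


local_hit = locate("employees_pune", "site_b")
remote_hit = locate("employees_delhi", "site_a")
```
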

3. Distributed Query Processing

• Query processing deals with designing algorithms that analyze queries and convert them

into a series of data manipulation operations. The problem is how to decide on a strategy for

executing each query over the network in the most cost-effective way, however cost is defined.

• The factors to be considered are the distribution of data, communication cost, and lack of

sufficient locally available information. The objective is to use the inherent

parallelism to improve the performance of executing the transaction, subject to the

above-mentioned constraints.
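
The choice of an execution strategy can be illustrated with a small cost comparison. The sketch assumes that cost is simply the number of bytes sent over the network and that the result of a join between relation R (at site 1) and relation S (at site 2) is needed at site 1; all sizes are invented numbers.

```python
def transfer_cost(num_rows, row_bytes, cost_per_byte=1.0):
    """Communication cost of shipping a set of rows between two sites."""
    return num_rows * row_bytes * cost_per_byte


# Assumed sizes (illustrative only).
r_rows, r_bytes = 10_000, 100         # R is stored at site 1
s_rows, s_bytes = 100, 50             # S is stored at site 2
result_rows, result_bytes = 100, 150  # size of the join result

strategies = {
    # Strategy 1: move S to site 1 and join there.
    "ship S to site 1": transfer_cost(s_rows, s_bytes),
    # Strategy 2: move R to site 2, join there, send the result back.
    "ship R to site 2, ship result back": (
        transfer_cost(r_rows, r_bytes)
        + transfer_cost(result_rows, result_bytes)
    ),
}

# Choose the strategy with the lowest total communication cost.
best = min(strategies, key=strategies.get)
```

With these numbers, shipping the small relation S is far cheaper than shipping R. A real optimizer would also weigh local processing cost and the parallelism available across sites.
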


4. Distributed Concurrency Control

• Concurrency control involves the synchronization of access to the distributed database, such

that the integrity of the database is maintained. It is, without any doubt, one of the most

extensively studied problems in the DDBS field.

• The concurrency control problem in a distributed context is somewhat different than in a

centralized framework. One not only has to worry about the integrity of a single database, but

also about the consistency of multiple copies of the database. The condition that requires all

values of multiple copies of every data item to converge to the same value is called mutual

consistency.

• The two general classes of approaches are pessimistic, synchronizing the execution

of the user request before the execution starts, and optimistic, executing requests and then

checking if the execution has compromised the consistency of the database.

• Two fundamental primitives that can be used with both approaches are locking, which is

based on the mutual exclusion of access to data items, and time-stamping, where transaction

executions are ordered based on timestamps.

• There are variations of these schemes as well as hybrid algorithms that attempt to combine

the two basic mechanisms.
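
The optimistic class can be sketched with version numbers: a transaction reads freely, and at commit time the system checks whether anything it read has changed in the meantime. The VersionedStore class and its methods are hypothetical, and the sketch covers a single site only.

```python
class VersionedStore:
    """Key-value store that keeps a version counter per key."""

    def __init__(self):
        self.values = {}
        self.versions = {}

    def read(self, key):
        """Return the current value and the version that was read."""
        return self.values.get(key), self.versions.get(key, 0)

    def commit(self, read_versions, writes):
        """Validate the reads, then apply the writes; False means abort."""
        for key, version in read_versions.items():
            if self.versions.get(key, 0) != version:
                return False  # someone else committed a change since the read
        for key, value in writes.items():
            self.values[key] = value
            self.versions[key] = self.versions.get(key, 0) + 1
        return True


store = VersionedStore()
store.commit({}, {"balance": 100})

# Two transactions read the same version of the balance.
value1, version1 = store.read("balance")
value2, version2 = store.read("balance")

# The first commit passes validation; the second fails and must be retried.
first_ok = store.commit({"balance": version1}, {"balance": value1 - 30})
second_ok = store.commit({"balance": version2}, {"balance": value2 - 50})
```

A pessimistic scheme would instead make the second transaction wait for a lock before reading, so no work would have to be redone.
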

5. Distributed Deadlock Management

• The deadlock problem in DDBSs is similar in nature to that encountered in operating

systems.

• The competition among users for access to a set of resources (data, in this case) can result in

a deadlock if the synchronization mechanism is based on locking. The well-known alternatives

of prevention, avoidance, and detection/recovery also apply to DDBSs.
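
Deadlock detection is often described in terms of a wait-for graph, where an edge from T1 to T2 means that T1 waits for a lock held by T2. The sketch below assumes each site reports its local graph to a detector that merges them; the function names and the example transactions are made up.

```python
def merge_graphs(*local_graphs):
    """Combine the local wait-for graphs of several sites into a global one."""
    merged = {}
    for graph in local_graphs:
        for txn, waits_for in graph.items():
            merged.setdefault(txn, set()).update(waits_for)
    return merged


def has_cycle(graph):
    """Depth-first search; a cycle in the wait-for graph means a deadlock."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    colour = {}

    def visit(node):
        colour[node] = GREY
        for nxt in graph.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:
                return True  # reached a node already on the current path
            if state == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(
        colour.get(node, WHITE) == WHITE and visit(node)
        for node in list(graph)
    )


# T1 waits for T2 at site 1, and T2 waits for T1 at site 2.
site1 = {"T1": {"T2"}}
site2 = {"T2": {"T1"}}

local_deadlock = has_cycle(site1) or has_cycle(site2)
global_deadlock = has_cycle(merge_graphs(site1, site2))
```

Neither site sees a cycle on its own, but the merged graph does, which is why detection in a DDBS needs information from more than one site.
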

6. Reliability of Distributed DBMS


• It is important that mechanisms be provided to ensure the consistency of the database as

well as to detect failures and recover from them. The implication for DDBSs is that when a

failure occurs and various sites become either inoperable or inaccessible, the databases at the

operational sites remain consistent and up to date.

• Furthermore, when the computer system or network recovers from the failure, the DDBSs

should be able to recover and bring the databases at the failed sites up-to-date. This may be

especially difficult in the case of network partitioning, where the sites are divided into two or

more groups with no communication among them.

7. Replication

• If the distributed database is (partially or fully) replicated, it is necessary to implement

protocols that ensure the consistency of the replicas, i.e. copies of the same data item have the

same value.

• These protocols can be eager in that they force the updates to be applied to all the replicas

before the transaction completes, or they may be lazy so that the transaction updates one copy

(called the master) from which updates are propagated to the others after the transaction

completes.

Q.7 What is a homogeneous database?

Homogeneous Distributed Databases

In a homogeneous distributed database, all the sites use identical DBMS and operating

systems. Its properties are −

● The sites use very similar software.

● The sites use identical DBMS or DBMS from the same vendor.

● Each site is aware of all other sites and cooperates with other sites to process user requests.

● The database is accessed through a single interface as if it were a single database.


Q.8 What is a heterogeneous database?

In a heterogeneous distributed database, different sites have different operating systems,

DBMS products and data models. Its properties are −

● Different sites use dissimilar schemas and software.

● The system may be composed of a variety of DBMSs like relational, network, hierarchical or

object oriented.

● Query processing is complex due to dissimilar schemas.

● Transaction processing is complex due to dissimilar software.

● A site may not be aware of other sites and so there is limited co-operation in processing user

requests.

Q.9 Discuss the types of homogeneous and heterogeneous databases.

Types of Homogeneous Distributed Database

There are two types of homogeneous distributed database −

● Autonomous − Each database is independent and functions on its own. The databases are integrated

by a controlling application and use message passing to share data updates.

● Non-autonomous − Data is distributed across the homogeneous nodes and a central or

master DBMS coordinates data updates across the sites.

Types of Heterogeneous Distributed Databases

● Federated − Heterogeneous database systems are independent in nature and integrated

together so that they function as a single database system.

● Un-federated − Database systems employ a central coordinating module through which the

databases are accessed.


Q.10 Define Autonomy.

Autonomy − It indicates the distribution of control of the database system and the degree to

which each constituent DBMS can operate independently.

Q.11 Define Distribution.

Distribution − It states the physical distribution of data across the different sites.

Q.12 Draw and explain the client-server architecture.

Client - Server Architecture for DDBMS

This is a two-level architecture where the functionality is divided into servers and clients. The

server functions primarily encompass data management, query processing, optimization and

transaction management. Client functions mainly include the user interface. However, clients also have

some functions like consistency checking and transaction management.

The two different client - server architectures are −

● Single Server Multiple Client

● Multiple Server Multiple Client (shown in the following diagram)
