DDB Notes 1 11

The document provides an overview of distributed databases, detailing their architecture, design strategies, and advantages over centralized systems. It covers types of distributed databases, storage methods like replication and fragmentation, and the importance of distributed data processing. Additionally, it discusses the benefits and challenges associated with distributed databases and processing, along with examples and applications in various fields.

Uploaded by PujaSinha


UNIT – I

Introduction : Distributed Data Processing, Distributed Database System, Promises of DDBSs, Problem areas.
Distributed DBMS Architecture : Architectural Models for Distributed DBMS, DDBMS Architecture.
Distributed Database Design : Alternative Design Strategies, Distribution Design issues, Fragmentation,
Allocation.

Introduction – Distributed Databases:

• A distributed database is a database that runs and stores data across multiple computers, as
opposed to doing everything on a single machine.
• Typically, distributed databases operate on two or more interconnected servers on a computer
network.
• Each location where a version of the database is running is often called an instance or a node.
• A distributed database is not limited to one system; it is spread over different sites, i.e., over
multiple computers or over a network of computers.
• The sites of a distributed database system do not share physical components.
• This may be required when a particular database needs to be accessed by users around the world.
It must be managed so that, to its users, it looks like one single database.

Fig: Distributed Database System

A distributed database system is a type of database management system that stores data across
multiple computers or sites that are connected by a network. In a distributed database system, each
site has its own database, and the databases are connected to each other to form a single, integrated
system.
The main advantage of a distributed database system is that it can provide higher availability and
reliability than a centralized database system. Because the data is stored across multiple sites, the
system can continue to function even if one or more sites fail. In addition, a distributed database
system can provide better performance by distributing the data and processing load across multiple
sites.
Distributed Database Features

Some general features of distributed databases are:

• Location independence - Data is physically stored at multiple sites and managed by an
independent DDBMS; users can access it without knowing where it is stored.
• Distributed query processing - Distributed databases answer queries in a distributed
environment that manages data at multiple sites. High-level queries are transformed into a
query execution plan for simpler management.
• Distributed transaction management - Provides a consistent distributed database through
commit protocols, distributed concurrency control techniques, and distributed recovery
methods in the presence of many concurrent transactions and failures.
• Seamless integration - Databases in a collection usually represent a single logical database,
and they are interconnected.
• Network linking - All databases in a collection are linked by a network and communicate
with each other.
• Transaction processing - Distributed databases support transaction processing. A transaction
is a program unit consisting of one or more database operations, and it is atomic: it is either
executed entirely or not at all.
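To make the role of commit protocols concrete, here is a minimal in-process sketch of two-phase commit (2PC), a standard commit protocol for distributed transactions. It assumes a single coordinator and in-memory sites; the `Participant` class and `two_phase_commit` function are hypothetical names, and real systems also need durable logs, timeouts, and crash recovery, all of which are omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """A hypothetical site that holds part of the distributed database."""
    name: str
    can_commit: bool = True   # simulated result of the site's local checks
    data: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)

    def prepare(self, updates: dict) -> bool:
        # Phase 1 (voting): stage the updates locally, then vote YES/NO.
        if not self.can_commit:
            return False
        self.pending = dict(updates)
        return True

    def commit(self) -> None:
        # Phase 2, global commit: make the staged updates permanent.
        self.data.update(self.pending)
        self.pending = {}

    def abort(self) -> None:
        # Phase 2, global abort: throw the staged updates away.
        self.pending = {}


def two_phase_commit(plan: list) -> bool:
    """Coordinator; plan is a list of (Participant, updates) pairs."""
    # Phase 1: ask every participant to prepare and collect the votes.
    votes = [site.prepare(updates) for site, updates in plan]
    decision = all(votes)  # commit only if every site voted YES
    # Phase 2: send the same global decision to every participant.
    for site, _ in plan:
        if decision:
            site.commit()
        else:
            site.abort()
    return decision
```

If any site votes NO, the coordinator tells every site to abort, so the transaction's changes appear either at all sites or at none.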

Applications / Uses of Distributed Database

• It is used in corporate management information systems.
• It is used in multimedia applications.
• It is used in military control systems, hotel chains, etc.
• It is also used in manufacturing control systems.

Architectures for distributed database systems

There are several different architectures for distributed database systems, including:
• Client-server architecture: In this architecture, clients connect to a central server,
which manages the distributed database system. The server is responsible for
coordinating transactions, managing data storage, and providing access control.

• Peer-to-peer architecture: In this architecture, each site in the distributed database
system is connected to all other sites. Each site is responsible for managing its own data
and coordinating transactions with other sites.

• Federated architecture: In this architecture, each site in the distributed database system
maintains its own independent database, but the databases are integrated through a
middleware layer that provides a common interface for accessing and querying the data.

• Distributed database systems can be used in a variety of applications, including
e-commerce, financial services, and telecommunications. However, designing and
managing a distributed database system can be complex and requires careful
consideration of factors such as data distribution, replication, and consistency.
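The federated idea, and the translation it needs between differently structured sites, can be sketched as a small mediator. This is an illustrative sketch only: the two site formats (`SITE1_ROWS`, `SITE2_DOCS`), the wrapper functions, and the common schema are hypothetical, and plain Python lists stand in for the autonomous local databases.

```python
# Site 1 keeps customers as (id, name, city) tuples (hypothetical schema).
SITE1_ROWS = [(1, "Asha", "Pune"), (2, "Ravi", "Delhi")]

# Site 2 is an independent system with its own field names (hypothetical).
SITE2_DOCS = [{"cust_no": 7, "full_name": "Meera", "town": "Delhi"}]


def from_site1(rows: list[tuple]) -> list[dict]:
    # Wrapper: translate Site 1's local format into the common schema.
    return [{"id": i, "name": n, "city": c} for i, n, c in rows]


def from_site2(docs: list[dict]) -> list[dict]:
    # Wrapper: translate Site 2's local format into the common schema.
    return [{"id": d["cust_no"], "name": d["full_name"], "city": d["town"]}
            for d in docs]


def customers_in(city: str) -> list[dict]:
    """Middleware query: one interface over both autonomous sites."""
    combined = from_site1(SITE1_ROWS) + from_site2(SITE2_DOCS)
    return [r for r in combined if r["city"] == city]
```

Each site keeps its own format and autonomy; only the middleware wrappers know how to translate it, so adding a site means adding a wrapper rather than changing the existing sites.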
Distributed Database Types

There are two types of distributed databases:

• Homogeneous
• Heterogeneous

Homogeneous

• A homogeneous distributed database is a network of identical databases stored on multiple
sites. The sites have the same operating system, DDBMS, and data structure, making them
easy to manage.
• Homogeneous databases allow users to access data from each of the databases seamlessly.
• The following diagram shows an example of a homogeneous database:

Heterogeneous

• A heterogeneous distributed database uses different schemas, operating systems, DDBMS,
and data models.
• In a heterogeneous distributed database, a particular site can be completely unaware of
other sites, which limits cooperation in processing user requests. This is why translations
are required to establish communication between sites.
• The following diagram shows an example of a heterogeneous database:
Distributed Database Storage

Distributed database storage is managed in two ways:

• Replication
• Fragmentation

Replication

• In database replication, the system stores copies of data at different sites. If an entire
database is available at multiple sites, it is a fully redundant database.
• The advantage of database replication is that it increases data availability at different sites
and allows query requests to be processed in parallel.
• However, replication means that data requires constant updates and synchronization
with other sites to keep the copies identical. Any change made at one site must be
recorded at the other sites, or else inconsistencies occur.
• Constant updates cause a lot of server overhead and complicate concurrency control, because
concurrent updates must be coordinated across all sites that hold a copy.
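A minimal sketch of full replication under a read-one/write-all policy, assuming in-memory dictionaries stand in for the copies at each site (`ReplicatedTable` and the site names are hypothetical). It shows why every change must be propagated: a write applied at only one site leaves the copies inconsistent. Concurrency control and site failures are not modeled.

```python
class ReplicatedTable:
    """Fully replicated table: every site keeps a complete copy (sketch)."""

    def __init__(self, sites: list[str]):
        self.copies = {site: {} for site in sites}

    def write(self, key, value) -> None:
        # Write-all: the update is applied to every copy, so all sites
        # stay identical, at the cost of extra work on every write.
        for copy in self.copies.values():
            copy[key] = value

    def write_local_only(self, site: str, key, value) -> None:
        # What happens when a change is NOT propagated: copies diverge.
        self.copies[site][key] = value

    def read(self, site: str, key):
        # Read-one: any site answers a read from its own local copy.
        return self.copies[site].get(key)

    def is_consistent(self) -> bool:
        first, *rest = self.copies.values()
        return all(copy == first for copy in rest)
```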

Fragmentation

• In fragmentation, relations are split into smaller parts (fragments), and each fragment is
stored at the site where it is required.
• The prerequisite for fragmentation is that the fragments can later be reconstructed into the
original relation without losing data.
• The advantage of fragmentation is that there are no data copies, which prevents data
inconsistency.

There are two types of fragmentation:

• Horizontal fragmentation - The relation is split into groups of rows (tuples), and each
tuple is assigned to exactly one fragment. The original relation is the union of the fragments.
• Vertical fragmentation - The relation schema is split into smaller schemas (groups of
columns), and each fragment contains a common candidate key to guarantee a lossless join.
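The reconstruction requirement can be checked on a toy relation. The sketch below assumes a hypothetical `EMP` relation held as a list of dictionaries, with `id` as the candidate key; horizontal fragments are rebuilt by union and vertical fragments by a join on the key.

```python
EMP = [
    {"id": 1, "name": "Asha", "city": "Pune", "salary": 50000},
    {"id": 2, "name": "Ravi", "city": "Delhi", "salary": 42000},
    {"id": 3, "name": "Meera", "city": "Pune", "salary": 61000},
]


def horizontal(rows, predicate):
    # Each tuple goes to exactly one fragment: the rows satisfying the
    # predicate, and the rows that do not.
    return ([r for r in rows if predicate(r)],
            [r for r in rows if not predicate(r)])


def vertical(rows, key, attrs):
    # Project onto a subset of attributes, always keeping the key.
    return [{key: r[key], **{a: r[a] for a in attrs}} for r in rows]


def join_on(key, left, right):
    # Rebuild the relation by joining vertical fragments on the key.
    index = {r[key]: r for r in right}
    return [{**left_row, **index[left_row[key]]}
            for left_row in left if left_row[key] in index]
```

Because each vertical fragment keeps `id`, the join matches every row back to its other attributes; without the key, the fragments could not be matched back together reliably.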

Note: In some cases, a mix of fragmentation and replication is possible.

Distributed Database Advantages and Disadvantages

Below are some key advantages and disadvantages of distributed databases:

Advantages                  | Disadvantages
--------------------------- | --------------------------
Modular development         | Costly software
Reliability                 | Large overhead
Lower communication costs   | Data integrity
Better response             | Improper data distribution

The advantages and disadvantages are explained in detail in the following sections.

Advantages / Benefits of Distributed Databases:

• Modular Development. Modular development of a distributed database means that a system
can be expanded to new locations or units by adding new servers and data to the existing
setup and connecting them to the distributed system. This type of expansion causes no
interruptions in the functioning of the distributed database.

• Reliability. Distributed databases offer greater reliability than centralized databases.
If a centralized database fails, the whole system comes to a complete stop. In a distributed
database, the system keeps functioning when failures occur, only delivering reduced
performance until the issue is resolved.

• Lower Communication Cost. Storing data locally, at the sites where it is used, reduces the
communication cost of data manipulation in distributed databases. In a centralized database,
data cannot be stored locally, so every remote site must access it over the network.

• Better Response. Efficient data distribution in a distributed database system provides a faster
response when user requests are met locally. In centralized databases, all user requests pass
through the central machine, which processes every request. The result is longer response
times, especially under a heavy query load.
Disadvantages / Issues of Distributed Databases:

• Costly Software. Ensuring data transparency and coordination across multiple sites often
requires expensive software in a distributed database system.

• Large Overhead. Operations on multiple sites require numerous calculations and, when
database replication is used, constant synchronization, which causes a lot of processing
overhead.

• Data Integrity. A possible issue when using database replication is data integrity, which can
be compromised when the same data is updated at multiple sites.

• Improper Data Distribution. Responsiveness to user requests largely depends on proper
data distribution, so responsiveness can drop if data is not correctly distributed
across multiple sites.

Centralized Database Vs Distributed Database

Centralized DBMS | Distributed DBMS
---------------- | ----------------
The database is stored at only one site. | The database is stored at different sites and accessed with the help of a network.
Data is stored at a single computer site, which can be used by multiple users. | The database and DBMS software are distributed over many sites connected by a computer network.
The database is maintained at one site. | The database is maintained at a number of different sites.
If the centralized system fails, the entire system is halted. | If one site fails, the system continues to work with the other sites.
It is less reliable. | It is more reliable.

Fig: Centralized database

Fig: Distributed database

Fig: Types of Distributed Databases

Examples of distributed databases

Some common examples of distributed databases include:

• Apache Ignite
• Apache Cassandra
• Apache HBase
• Couchbase Server
• Amazon SimpleDB
• Clusterpoint
• FoundationDB

Distributed data processing

Distributed data processing refers to the approach of handling and analyzing data across multiple
interconnected devices or nodes. Alternatively, a system in which different database files are located
at different sites of a network and processed there is known as DDP (Distributed Data Processing).
In contrast to centralized data processing, where all data operations occur on a single, powerful
system, distributed processing decentralizes these tasks across a network of computers.

Distributed Processing is a computing approach that involves dividing tasks across multiple
machines or nodes in a network. Instead of relying on a single machine to process large amounts of
data, the workload is distributed among multiple machines, enabling parallel processing. The
distributed nature of processing allows for increased performance, scalability, and fault tolerance.

How Distributed Data Processing Works

In a distributed processing system, a central coordinator assigns tasks to different nodes in the
network. Each node processes its assigned task independently and communicates the results back to
the coordinator. The coordinator then combines the results to produce the final output.

Distributed processing can be achieved through various mechanisms, including message passing,
shared memory, or a combination of both. Communication between nodes can occur through direct
point-to-point connections or via a shared communication infrastructure such as a message queue
or distributed file system.
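The coordinator/worker pattern described above can be sketched with Python's standard `concurrent.futures` module. In this sketch, worker threads on one machine stand in for network nodes and summing numbers stands in for real processing; `node_task`, `split`, and `coordinator` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def node_task(chunk: list[int]) -> int:
    # Work done independently by one node: here, summing its share.
    return sum(chunk)


def split(data: list[int], n_nodes: int) -> list[list[int]]:
    # Coordinator step 1: divide the data into roughly equal chunks.
    size = max(1, -(-len(data) // n_nodes))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def coordinator(data: list[int], n_nodes: int = 4) -> int:
    chunks = split(data, n_nodes)
    # Step 2: assign each chunk to a "node" (a worker thread in this sketch).
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partial_results = list(pool.map(node_task, chunks))
    # Step 3: combine the partial results into the final output.
    return sum(partial_results)
```

In a real cluster, the `pool.map` call would be replaced by sending chunks to remote nodes (for example through a message queue), but the divide, process, and combine steps stay the same.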

In a distributed data processing system, a massive amount of data flows into the system from several
different sources. This process of data flow is known as data ingestion.

Once the data streams in, different layers in the system architecture break down the
entire processing into several parts.

Fig : Data Ingestion

Data Collection and Preparation Layer
• This layer collects data from different external sources and prepares it to be
processed by the system.
• The data may be text, audio, video, images, tax return forms, insurance forms, medical bills, etc.
• The task of the data preparation layer is to convert the data into a consistent standard format
and to classify it according to the business logic by which it will be processed. This is done in an
automated fashion, without human intervention.

Data Security Layer
The role of this layer is to keep data secure throughout its transit by applying security protocols,
such as encryption.

Data Storage Layer
This layer stores the large volumes of incoming data.

Data Processing Layer
This is the layer that contains the business logic for data processing. Machine learning and predictive,
descriptive, and decision modeling are primarily used to extract meaningful information.

Data Visualization Layer
All the extracted information is sent to the data visualization layer, which typically contains
browser-based dashboards that display the information in the form of graphs, charts, and infographics.
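The layered flow can be sketched as a chain of small functions, one per layer. Everything here is hypothetical: the raw record shapes, the category field, and the text chart are assumptions chosen for illustration, an in-memory list stands in for the storage layer, and the security layer (for example, encryption in transit) is omitted.

```python
from collections import Counter

# Hypothetical raw inputs arriving from different sources in different shapes.
RAW = [
    {"type": "Insurance Form", "amount": "1200"},
    {"kind": "medical bill", "value": 300},
    {"type": "insurance form", "amount": "800"},
]


def prepare(record: dict) -> dict:
    # Collection/preparation layer: convert to one standard format and
    # classify the record by a normalized category name.
    category = (record.get("type") or record.get("kind")).strip().lower()
    amount = float(record.get("amount", record.get("value", 0)))
    return {"category": category, "amount": amount}


def process(store: list[dict]) -> Counter:
    # Processing layer: a simple aggregation stands in for business logic.
    totals = Counter()
    for rec in store:
        totals[rec["category"]] += rec["amount"]
    return totals


def visualize(totals: Counter) -> str:
    # Visualization layer: a text bar chart instead of a browser dashboard
    # (one '#' per 100 units).
    return "\n".join(f"{cat:<15}{'#' * int(total // 100)}"
                     for cat, total in totals.most_common())


# Storage layer: an in-memory list of prepared records in this sketch.
store = [prepare(r) for r in RAW]
```

Calling `visualize(process(store))` runs the processing and visualization layers over the prepared records.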

Why Distributed Processing is important

Distributed processing offers several benefits that make it important for data processing and
analytics:

• Improved Performance: By distributing the workload across multiple machines, distributed
processing can significantly reduce processing time compared to a single machine. This is
especially crucial when dealing with large datasets or complex computational tasks.
• Scalability: Distributed processing allows organizations to scale their computing resources by
adding or removing nodes as needed. This flexibility enables businesses to handle increased
workloads and accommodate future growth without a significant impact on performance.
• Fault Tolerance: In a distributed processing system, if one node fails or experiences issues, the
workload can be automatically rerouted to other available nodes. This fault tolerance lets
processing continue and reduces the risk of data loss.
• Cost Efficiency: With distributed processing, organizations can use commodity hardware
instead of relying on expensive high-end servers. This reduces hardware costs and allows
businesses to achieve higher computing power at a lower price point.
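Automatic rerouting after a node failure can be sketched as a loop that tries the next available node. This is a simplified sketch: nodes are plain Python functions, a hypothetical `NodeDown` exception simulates a failed node, and health checks, timeouts, and retries are left out.

```python
class NodeDown(Exception):
    """Raised by a hypothetical node that is unavailable."""


def make_node(name: str, healthy: bool):
    # Build a simulated node; an unhealthy node always fails.
    def run(task: int) -> str:
        if not healthy:
            raise NodeDown(name)
        return f"{name} processed task {task}"
    return run


def run_with_failover(task: int, nodes: list) -> str:
    # Try nodes in order; if one fails, reroute the task to the next one.
    for node in nodes:
        try:
            return node(task)
        except NodeDown:
            continue
    raise RuntimeError("no available node could process the task")
```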
Advantages of distributed data processing (DDP)
Inexpensive:

Because data is distributed, adding and removing nodes (computers) is easy. Beowulf cluster
technology, which builds a cluster out of inexpensive commodity computers, can be used to achieve
distributed networking. In a Beowulf cluster, processing is assigned to the networked computers
through network switches and routers.

Easy to replace remote computers:

Microsoft Windows Server has a feature called failover clustering that helps deal with faulty
computers. If any computer in the cluster fails or is corrupted, its work is automatically taken
over by other computers.

Optimized processing:

Managing data on dedicated online servers avoids slow processing. A personal computer also runs
many other tasks, which consume processor power, whereas an online server is dedicated to one type
of processing: a database server handles only database queries, and a file server only stores files.
As a result, data processing is optimized.

Easy to expand:

Suppose your company needs more data processing than expected; you can then easily attach
more computers to the distributed network.

Parallel processing:

Adding or removing computers from the network does not disturb the data flow. Data held on
different computers is processed in parallel, meaning that all nodes work on their share of the
data at the same time.

Better performance:

The overall performance of the company gets better and data is filtered and processed more
rapidly in the distributed environment.

Backup of data:

Data can be backed up from any computer connected to the network. A user can back up data at
a different time, work with that data locally, and then upload it to the server.

Local data synchronization:

All the computers on the network can keep a local copy of important data. Suppose different office
branches are interconnected, and all branch computers are linked to the main branch office. Each
branch computer has a local copy of the data. Office users edit and update that data and then upload
it to the main server, so the data is synced and available to all computers. Working locally with
data is easy and fast, and when users finish their work, for example at the end of the day, they
can sync that data with the main server.
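One simple way to merge end-of-day uploads, and to guard against updates arriving in the wrong order, is to attach a version number to every change and keep only the newest version (a last-writer-wins policy). This is a sketch under that assumption: `sync_to_main`, the key names, and how versions are assigned are all hypothetical.

```python
def sync_to_main(main: dict, branch_changes: list) -> dict:
    """Merge one branch's uploaded changes into the main server's copy.

    main maps key -> (value, version); each change is (key, value, version).
    A change is applied only if its version is newer than the stored one,
    so uploads that arrive late or out of order cannot overwrite newer data.
    """
    for key, value, version in branch_changes:
        current = main.get(key)
        if current is None or version > current[1]:
            main[key] = (value, version)
    return main
```

Last-writer-wins is easy to implement but silently discards the older of two conflicting edits; systems that cannot lose edits need conflict detection instead.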

Data recovery:

If data, such as a database, is lost on any computer, it can be recovered from another
interconnected computer, such as the main database server.

Disadvantages of distributed data processing (DDP)

Complexity:

A DDP system is difficult to design, troubleshoot, and administer.

Planning data synchronization is difficult:

Correct data synchronization is difficult to develop, and sometimes data is updated in the
wrong order. Administrators therefore have to plan synchronization carefully before building a
distributed network.

Data security:

If an unauthorized computer is connected to a distributed network, it can affect the performance
of other computers, and data can also be lost.

Examples of distributed data processing

• Hosting a website on an online server
• Online photo editing tools
• Airline ticketing systems
• Processing of user data by mobile companies
• Dropbox, Google Drive, MSN drive, Google Photos
• Report generation from satellite data
• Weather forecasting systems

Promises of DDBSs
There are four fundamentals which may also be viewed as promises of DDBS technology:
• Transparent management of distributed and replicated data
• Reliable access to data through distributed transactions
• Improved performance
• Easier system expansion
1. Transparent Management of Distributed and Replicated Data
• A transparent system “hides” the implementation details from users.
• The advantage of a fully transparent DBMS is the high level of support that it provides for the
development of complex applications.
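Location and fragmentation transparency can be illustrated with a tiny query router. The sketch assumes a hypothetical global catalog (`CATALOG`) that records which site stores each fragment of an `EMP` relation; the user only names `EMP`, and the router decides which sites to contact.

```python
# Hypothetical global catalog: which site stores each fragment of EMP.
CATALOG = {"EMP": {"EMP_PUNE": "site_A", "EMP_DELHI": "site_B"}}

# Hypothetical local stores at each site.
SITES = {
    "site_A": {"EMP_PUNE": [{"id": 1, "name": "Asha", "city": "Pune"}]},
    "site_B": {"EMP_DELHI": [{"id": 2, "name": "Ravi", "city": "Delhi"}]},
}


def query(relation: str, where=lambda row: True) -> list[dict]:
    """Users name only the global relation; fragment locations stay hidden."""
    result = []
    for fragment, site in CATALOG[relation].items():
        # The DDBMS, not the user, decides which site to contact.
        result.extend(r for r in SITES[site][fragment] if where(r))
    return result
```

If a fragment moves to another site, only the catalog entry changes; queries written by users stay exactly the same.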
