Chapter - 7 Distributed Database System

This chapter covers the fundamentals of distributed databases, including their need, advantages, disadvantages, and the architecture of distributed database management systems (DDBMS). It discusses key concepts such as data fragmentation, replication, and allocation, as well as the differences between homogeneous and heterogeneous distributed databases. Additionally, it addresses transaction transparency, performance optimization, and the two-phase commit protocol for maintaining data integrity across distributed systems.

Uploaded by

wilmaangelo8

In this chapter you will learn:

• The need for distributed databases.
• The differences between distributed database systems, distributed processing, and parallel database systems.
• The advantages and disadvantages of distributed DBMSs.
• The functions that should be provided by a distributed DBMS.
• An architecture for a distributed DBMS.
• The main issues associated with distributed database design, namely fragmentation, replication, and allocation.
Distributed Database Concepts
– Distributed database – a logically interrelated collection of shared data (and a description of this data), physically distributed over a computer network.
– DDBMS – a software system that manages a distributed database while making the distribution transparent to the user.
Characteristics of DDBMS:
– A collection of logically related shared data;
– The data is split into a number of fragments;
– Fragments may be replicated;
– The sites are linked by a communications network;
– The data at each site is under the control of a DBMS;
– The DBMS at each site can handle local applications autonomously;
– Each DBMS participates in at least one global application.
Advantages of DDS
1. Management of distributed data with different levels of transparency:
• Distribution transparency
– The physical placement of data (files, relations, etc.) is not known to the user.
• Network transparency
– Users do not have to worry about operational details of the network.
• Location transparency
– A command can be issued from any location without affecting how it works.
• Naming transparency
– Any named object (files, relations, etc.) can be accessed from any location.
• Replication transparency
– Copies of data can be stored at multiple sites without the user needing to know that the copies exist.
– This is done to minimize access time to the required data.
• Fragmentation transparency
– A relation can be segmented horizontally (a subset of the tuples of the relation) or vertically (a subset of the columns of the relation) without the user needing to know that the fragments exist.
2. Increased reliability and availability:
– Reliability is the probability that the system is running (not down) at a given point in time.
– Availability is the probability that the system is continuously available (usable or accessible) during a time interval.
– A distributed database system has multiple nodes (computers); if one fails, the others are still available to do the job.
3. Improved performance:
– A DDBMS fragments the database to keep data closer to where it is needed most.
– This can reduce data management (access and modification) time significantly.
4. Scalability (easier expansion):
– New nodes (computers) can be added at any time without changing the entire configuration.
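
The availability advantage in point 2 can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that every node is up with the same probability and that nodes fail independently of each other; neither assumption comes from this chapter, and the function name is hypothetical.

```python
def system_availability(node_availability: float, nodes: int) -> float:
    """Probability that at least one of `nodes` identical nodes is up.

    Assumes independent failures (an illustrative simplification).
    """
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    # The system is unavailable only if every node is down at once.
    return 1.0 - (1.0 - node_availability) ** nodes


single = system_availability(0.9, 1)   # one node holding the data
double = system_availability(0.9, 2)   # the same data held at two nodes
```

Under these assumptions, adding a second node that can serve the same data raises availability from about 0.9 to about 0.99.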
Disadvantages of DDS
– Complexity
– Cost
– Security
– Integrity control more difficult
– Lack of standards
– Lack of experience
– Database design more complex

Database system architectures
• A database architecture is a representation of DBMS design.
• It helps to design, develop, implement, and maintain the database management system.
• There are three database system architectures:
1. Centralized Database Architecture
2. Parallel Database Architectures
3. Distributed Database Architecture

Centralized database
• A centralized database is stored, located, and maintained at a single location.
• It is modified and managed from that location.
Parallel database architectures
• Parallel DBMSs link multiple, smaller machines to achieve the same throughput as a single, larger machine, often with greater scalability and reliability.
• The three main architectures for parallel DBMSs:
– Shared memory – a tightly coupled architecture in which multiple processors share secondary (disk) storage and primary memory.
– Shared disk – a loosely coupled architecture in which multiple processors share secondary (disk) storage but each has its own primary memory.
– Shared nothing – a massively parallel processing (MPP) architecture in which each processor is part of a complete system, with its own memory and disk storage.
Distributed database
• A distributed database system allows applications to access data from local and remote databases.
• There are two types of distributed database system:
– Homogeneous distributed database.
– Heterogeneous distributed database.
Homogeneous
• All sites of the database system have an identical setup, i.e., the same database system software.
• The underlying operating systems can be a mixture of Linux, Windows, Unix, etc.
• For example, all sites run Oracle, or all run DB2, or all run Sybase or some other database system.
Advantages
• Easy to use
• Easy to manage
• Easy to design
Disadvantages
• Difficult for most organizations to enforce a homogeneous environment
[Figure: five sites (Site 1 to Site 5), each running Oracle on Windows, Unix, or Linux, linked by a communications network.]
Homogeneous Distributed Database Systems
• Autonomy determines the extent to which individual nodes or DBs in a connected DDB can operate independently.
• Types of autonomy in a homogeneous DDB:
– Design autonomy refers to independence of data model usage and transaction management techniques among nodes.
– Communication autonomy determines the extent to which each node can decide on sharing of information with other nodes.
– Execution autonomy refers to independence of users to act as they please.
• Non-autonomous: data is distributed across the homogeneous nodes, and a central or master DBMS coordinates data updates across the sites.
Heterogeneous
• Different data centers may run different DBMS products, with possibly different underlying data models.
• Translations are required to allow for:
– Different hardware: change of codes and word lengths.
– Different DBMS products: mapping of data structures in one data model to the equivalent data structures in another data model, and translation of the query language used (for example, relational SQL SELECT statements are mapped to the network FIND and GET statements).
– Different hardware and different DBMS products: if both the hardware and software are different, then both these types of translation are required. This makes the processing extremely complex.
[Figure: five sites linked by a communications network, running relational, object-oriented, hierarchical, and network DBMSs on Unix, Windows, and Linux.]
• Advantages
– Large volumes of data from different data centers can be brought together in one global center.
– Remote access is done using the global schema.
– Different DBMSs may be used at each node.
• Disadvantages
– Difficult to manage.
– Difficult to design.
Multidatabase system (MDBS)
• Multidatabase system (MDBS): a distributed DBMS in which each site maintains complete autonomy.
• MDBSs logically integrate a number of independent DDBMSs while allowing the local DBMSs to maintain complete control of their operations.
• An MDBS allows users to access and share data without requiring full database schema integration.
• Federated database system: a collection of cooperating database systems that are autonomous and possibly heterogeneous. The member systems may differ in:
– Data models
– Constraints
– Query language

Distributed Processing and Distributed Database
DDBMS Components
• DDBMS protocol
• Computer workstations
– Form the network system.
• Network hardware and software
– Components that reside in each workstation.
• Communications media
– Carry the data from one workstation to another.
• Transaction processor (TP)
– Receives and processes the application’s data requests.
• Data processor (DP)
– Stores and retrieves data located at the site.
– Also known as data manager (DM).
DDBMS protocol
• The DDBMS protocol determines how the DDBMS will:
– Interface with the network to transport data and commands between DPs and TPs.
– Synchronize all data received from DPs (TP side) and route retrieved data to the appropriate TPs (DP side).
– Ensure common database functions in a distributed system: security, concurrency control, backup, and recovery.
Distributed Database Design
• The design of a distributed database introduces three new issues:
– How to partition the database into fragments?
– Which fragments to replicate?
– Where to locate those fragments and replicas?

Data Fragmentation
• Data fragmentation allows us to break a single object into two or more segments, or fragments.
• There are three types of fragmentation strategy:
– Horizontal fragmentation
– Vertical fragmentation
– Mixed fragmentation
Horizontal Fragmentation
• A horizontal fragment consists of a subset of the tuples of a relation.
• Each fragment is the equivalent of a SELECT statement with a WHERE clause on a single attribute.
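
As a minimal sketch of horizontal fragmentation, the Python below splits a small in-memory relation by the value of one attribute. The EMPLOYEE rows, the EMP_CITY attribute, and the function name are hypothetical and chosen only for illustration.

```python
from typing import Callable

Row = dict  # one tuple of the relation, as attribute -> value


def horizontal_fragment(relation: list, predicate: Callable[[Row], bool]) -> list:
    """Return the tuples that satisfy the predicate (like SELECT ... WHERE)."""
    return [row for row in relation if predicate(row)]


# Hypothetical sample data, not taken from the chapter.
EMPLOYEE = [
    {"EMP_ID": 1, "EMP_NAME": "Ann", "EMP_CITY": "New York"},
    {"EMP_ID": 2, "EMP_NAME": "Bob", "EMP_CITY": "Atlanta"},
    {"EMP_ID": 3, "EMP_NAME": "Cy", "EMP_CITY": "Miami"},
    {"EMP_ID": 4, "EMP_NAME": "Dee", "EMP_CITY": "Atlanta"},
]

# One fragment per city; each predicate tests a single attribute.
fragments = {
    city: horizontal_fragment(EMPLOYEE, lambda row, c=city: row["EMP_CITY"] == c)
    for city in ("New York", "Atlanta", "Miami")
}

# The union of the fragments gives back every tuple of the relation.
all_ids = sorted(row["EMP_ID"] for frag in fragments.values() for row in frag)
```

Because the three predicates are disjoint and together cover every city in the data, each tuple lands in exactly one fragment.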
Vertical Fragmentation
• A vertical fragment consists of a subset of the attributes of a relation.
• Each fragment is the equivalent of the PROJECT operation of the relational algebra.
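
A vertical fragment can be sketched as a projection onto some of the attributes. The example below assumes that every fragment keeps the key attribute so that the original tuples can be rebuilt by a join; that is common practice but is not stated on the slide, and the data and function names are hypothetical.

```python
def vertical_fragment(relation, attributes, key="EMP_ID"):
    """Project each tuple onto `attributes`, always keeping the key."""
    keep = [key] + [a for a in attributes if a != key]
    return [{a: row[a] for a in keep} for row in relation]


def join_on_key(left, right, key="EMP_ID"):
    """Rebuild full tuples by joining two vertical fragments on the key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


# Hypothetical sample data, not taken from the chapter.
EMPLOYEE = [
    {"EMP_ID": 1, "EMP_NAME": "Ann", "EMP_SALARY": 48000},
    {"EMP_ID": 2, "EMP_NAME": "Bob", "EMP_SALARY": 52000},
]

personal = vertical_fragment(EMPLOYEE, ["EMP_NAME"])
payroll = vertical_fragment(EMPLOYEE, ["EMP_SALARY"])
rebuilt = join_on_key(personal, payroll)
```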

Mixed Fragmentation
• A mixed fragment consists of a horizontal fragment that is subsequently vertically fragmented, or a vertical fragment that is then horizontally fragmented.
• A mixed fragment is defined using the Selection and Projection operations of the relational algebra.
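
A mixed fragment can be sketched by composing a selection with a projection. The relation, attribute names, and helper functions below are hypothetical.

```python
def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection: keep only the named attributes of each tuple."""
    return [{a: row[a] for a in attributes} for row in relation]


# Hypothetical sample data, not taken from the chapter.
EMPLOYEE = [
    {"EMP_ID": 1, "EMP_CITY": "New York", "EMP_SALARY": 48000},
    {"EMP_ID": 2, "EMP_CITY": "Atlanta", "EMP_SALARY": 52000},
    {"EMP_ID": 3, "EMP_CITY": "Miami", "EMP_SALARY": 45000},
    {"EMP_ID": 4, "EMP_CITY": "Atlanta", "EMP_SALARY": 61000},
]

# Horizontal step first (Atlanta tuples), then vertical step (payroll attributes).
atlanta_payroll = project(
    select(EMPLOYEE, lambda row: row["EMP_CITY"] == "Atlanta"),
    ["EMP_ID", "EMP_SALARY"],
)
```

Here the selection runs first because the projection drops EMP_CITY, the attribute the predicate needs.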
Data Replication
• Data replication refers to the storage of data copies at multiple sites served by a computer network.
– Replication can enhance data availability and response time, reducing communication and total query costs.
• Mutual Consistency Rule
– All copies of a data fragment must be identical.
– The DDBMS must ensure that a database update is performed at all sites where replicas exist.
• Replication Conditions
– A fully replicated database stores multiple copies of all database fragments at multiple sites.
– A partially replicated database stores multiple copies of some database fragments at multiple sites.
• Factors for Data Replication Decision
– Database size
– Usage frequency
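
The mutual consistency rule can be sketched as a "write to every copy" policy. The class below is a hypothetical toy model: it ignores site failures and concurrent updates, which a real DDBMS has to handle (for example with a commit protocol).

```python
class ReplicatedFragment:
    """One fragment copied to several sites; every update goes to all copies."""

    def __init__(self, sites):
        if not sites:
            raise ValueError("at least one site is required")
        self.copies = {site: {} for site in sites}

    def write(self, key, value):
        # Apply the update at every site that holds a replica.
        for copy in self.copies.values():
            copy[key] = value

    def read(self, site, key):
        # Any replica can serve the read, because all copies are identical.
        return self.copies[site][key]

    def mutually_consistent(self):
        copies = list(self.copies.values())
        return all(copy == copies[0] for copy in copies)


fragment = ReplicatedFragment(["NY", "ATL", "MIA"])
fragment.write("emp-1", "Ann")
```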
Data Allocation
• Data allocation is the process of deciding where to locate data.
• Data allocation strategies:
– Centralized: the entire database is stored at one site.
– Partitioned: the database is divided into several disjoint parts (fragments) and stored at several sites.
– Replicated: copies of one or more database fragments are stored at several sites.
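
The three strategies can be compared by looking at the placement each one produces. The sketch below uses hypothetical fragment and site names; the round-robin placement and the `copies` parameter are illustrative choices, not rules from the chapter.

```python
FRAGMENTS = ["E1", "E2", "E3"]   # hypothetical fragment names
SITES = ["NY", "ATL", "MIA"]     # hypothetical site names


def centralized(fragments, sites):
    """Store the entire database at one site (here, the first one)."""
    return {site: (list(fragments) if i == 0 else []) for i, site in enumerate(sites)}


def partitioned(fragments, sites):
    """Store each fragment at exactly one site, spread round-robin."""
    placement = {site: [] for site in sites}
    for i, fragment in enumerate(fragments):
        placement[sites[i % len(sites)]].append(fragment)
    return placement


def replicated(fragments, sites, copies=2):
    """Store each fragment at `copies` different sites."""
    copies = min(copies, len(sites))  # never place two copies on one site
    placement = {site: [] for site in sites}
    for i, fragment in enumerate(fragments):
        for k in range(copies):
            placement[sites[(i + k) % len(sites)]].append(fragment)
    return placement
```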
Data allocation algorithms
• Data allocation algorithms take into consideration a variety of factors:
– Performance and data availability goals.
– Size, number of rows, and the number of relations that an entity maintains with other entities.
– Types of transactions to be applied to the database, and the attributes accessed by each of those transactions.
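
As one very simple illustration of how usage can drive allocation, the sketch below places each fragment at the site that accesses it most often. This heuristic and the access counts are assumptions made for illustration; the allocation algorithms described above weigh many more factors.

```python
def allocate_by_usage(access_counts):
    """Map each fragment to the site with the highest access count.

    access_counts[fragment][site] is a (hypothetical) number of accesses.
    """
    return {
        fragment: max(by_site, key=by_site.get)
        for fragment, by_site in access_counts.items()
    }


# Hypothetical workload figures, not taken from the chapter.
counts = {
    "E1": {"NY": 90, "ATL": 10, "MIA": 5},
    "E2": {"NY": 5, "ATL": 70, "MIA": 20},
    "E3": {"NY": 15, "ATL": 10, "MIA": 60},
}
placement = allocate_by_usage(counts)
```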
Transparencies in a DDBMS
• Transparency hides implementation details from the user.
– Distribution transparency
– Transaction transparency
– Failure transparency
– Performance transparency
Distribution Transparency
• Distribution transparency allows the user to perceive the database as a single, logical entity.
• It allows us to manage a physically dispersed database as though it were a centralized database.
• Three levels of distribution transparency:
– Fragmentation transparency
– Location transparency
– Local mapping transparency
• Example:
– Employee data (EMPLOYEE) are distributed over three locations: New York, Atlanta, and Miami.
– Depending on the level of distribution transparency support, three different cases of queries are possible:
• Case 1: DB supports fragmentation transparency (neither fragment names nor locations are specified)

SELECT * FROM EMPLOYEE WHERE EMP_DOB < '01-JAN-1940';

• Case 2: DB supports location transparency (fragment names are specified, locations are not)

SELECT * FROM E1 WHERE EMP_DOB < '01-JAN-1940'
UNION
SELECT * FROM E2 WHERE EMP_DOB < '01-JAN-1940'
UNION
SELECT * FROM E3 WHERE EMP_DOB < '01-JAN-1940';

• Case 3: DB supports local mapping transparency (both fragment names and locations are specified)

SELECT * FROM E1 NODE NY WHERE EMP_DOB < '01-JAN-1940'
UNION
SELECT * FROM E2 NODE ATL WHERE EMP_DOB < '01-JAN-1940'
UNION
SELECT * FROM E3 NODE MIA WHERE EMP_DOB < '01-JAN-1940';
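
The effect of fragmentation transparency (Case 1) can be imitated on a single machine with a view that unions the fragments. The sketch below uses Python's standard sqlite3 module with an in-memory database; the three tables merely stand in for fragments that would really live at New York, Atlanta, and Miami, and the rows and ISO-format dates are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE E1 (EMP_ID INTEGER, EMP_NAME TEXT, EMP_DOB TEXT);  -- stands in for New York
    CREATE TABLE E2 (EMP_ID INTEGER, EMP_NAME TEXT, EMP_DOB TEXT);  -- stands in for Atlanta
    CREATE TABLE E3 (EMP_ID INTEGER, EMP_NAME TEXT, EMP_DOB TEXT);  -- stands in for Miami

    INSERT INTO E1 VALUES (1, 'Ann', '1938-05-02');
    INSERT INTO E2 VALUES (2, 'Bob', '1955-11-20');
    INSERT INTO E3 VALUES (3, 'Cy',  '1939-12-31');

    -- The view hides the fragment names from the user.
    CREATE VIEW EMPLOYEE AS
        SELECT * FROM E1
        UNION ALL
        SELECT * FROM E2
        UNION ALL
        SELECT * FROM E3;
""")

# The user queries EMPLOYEE without naming any fragment or site.
rows = conn.execute(
    "SELECT EMP_ID FROM EMPLOYEE WHERE EMP_DOB < '1940-01-01' ORDER BY EMP_ID"
).fetchall()
conn.close()
```

Dates are stored as ISO-8601 text here so that plain string comparison orders them correctly; the '01-JAN-1940' literal in the slides follows a different DBMS's date format.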
Transaction Transparency
• Transaction transparency ensures that database transactions will maintain the database’s integrity and consistency.
• Transaction transparency distinguishes four kinds of data access:
– Remote requests
– Remote transactions
– Distributed transactions
– Distributed requests
A Remote Request
• Lets a single SQL statement access data that is processed by a single remote database processor.
A Remote Transaction
• Is composed of several requests and accesses data at only a single remote site.
A Distributed Transaction
• Allows a transaction to reference several (local or remote) DP sites, although each individual request references only one site.
A Distributed Request
• Lets a single SQL statement reference data at several remote DP sites.
• Allows a single request to reference a physically partitioned table.
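
The four kinds of access can be told apart by counting requests and the sites each request touches. The classifier below is a hypothetical sketch: it represents each request as the set of site names it references and, for simplicity, treats every named site as remote.

```python
def classify(requests):
    """Classify a unit of work given as a list of site sets, one set per request."""
    if not requests or any(not sites for sites in requests):
        raise ValueError("need at least one request, each touching at least one site")
    if any(len(sites) > 1 for sites in requests):
        # A single request spans several sites.
        return "distributed request"
    all_sites = set().union(*requests)
    if len(all_sites) > 1:
        # Each request touches one site, but the transaction spans several.
        return "distributed transaction"
    return "remote request" if len(requests) == 1 else "remote transaction"
```

For example, `classify([{"ATL"}, {"MIA"}])` describes two requests that each go to one site, so the unit of work as a whole is a distributed transaction.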
Distributed Transactions and Two-Phase Commit
• Transaction transparency in a DDBMS environment ensures that all distributed transactions maintain the distributed database’s integrity and consistency.
• A transaction may access data at several sites.
• Each site has a local transaction manager responsible for:
– Maintaining a log for recovery purposes.
– Participating in coordinating the concurrent execution of the transactions executing at that site.
• Each site has a transaction coordinator, which is responsible for:
– Starting the execution of transactions that originate at the site.
– Distributing subtransactions to appropriate sites for execution.
– Coordinating the termination of each transaction that originates at the site.
Two-Phase Commit Protocol
• The protocol relies on DO-UNDO-REDO operations on the transaction log and on a write-ahead protocol:
– DO performs the operation and records the “before” and “after” values in the transaction log.
– UNDO reverses an operation, using the log entries written by the DO portion of the sequence.
– REDO redoes an operation, using the log entries written by the DO portion of the sequence.
– The write-ahead protocol forces the log entry to be written to permanent storage before the actual operation takes place.
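
The DO, UNDO, and REDO operations and the write-ahead rule can be sketched with a toy key-value store. Everything here is hypothetical: a Python list stands in for the log on permanent storage, and a dictionary stands in for the database.

```python
_MISSING = object()  # marks "key did not exist before the operation"


class LoggedStore:
    def __init__(self):
        self.log = []    # stands in for the transaction log on permanent storage
        self.data = {}   # stands in for the database

    def do(self, key, new_value):
        """Write the log entry first (write-ahead), then perform the operation."""
        before = self.data.get(key, _MISSING)
        self.log.append({"key": key, "before": before, "after": new_value})
        self.data[key] = new_value

    def undo(self):
        """Reverse the logged operations, newest first, using the 'before' values."""
        for entry in reversed(self.log):
            if entry["before"] is _MISSING:
                self.data.pop(entry["key"], None)
            else:
                self.data[entry["key"]] = entry["before"]

    def redo(self):
        """Reapply the logged operations, oldest first, using the 'after' values."""
        for entry in self.log:
            self.data[entry["key"]] = entry["after"]


store = LoggedStore()
store.do("x", 1)
store.do("x", 2)
store.do("y", 5)
store.undo()
after_undo = dict(store.data)
store.redo()
after_redo = dict(store.data)
```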

• The two-phase commit protocol defines the operations between two types of nodes:
– The coordinator, and
– One or more subordinates (cohorts).

• The protocol is implemented in two phases:
• Phase 1: Preparation
– The coordinator sends a PREPARE TO COMMIT message to all subordinates.
– The subordinates receive the message, write the transaction log using the write-ahead protocol, and send an acknowledgement message to the coordinator.
– The coordinator makes sure that all nodes are ready to commit, or it aborts the transaction.
• Phase 2: The Final Commit
– The coordinator broadcasts a COMMIT message to all subordinates and waits for the replies.
– Each subordinate receives the COMMIT message, then updates the database using the DO protocol.
– The subordinates reply with a COMMITTED or NOT COMMITTED message to the coordinator.
– If one or more subordinates did not commit, the coordinator sends an ABORT message, thereby forcing them to UNDO all changes.
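
The two phases can be walked through with a small in-process simulation. This is a sketch of the message flow as described above, not a production protocol: the class and function names are hypothetical, the `can_prepare` and `can_commit` flags simulate failures, and logging, timeouts, and recovery are left out.

```python
class Subordinate:
    def __init__(self, name, can_prepare=True, can_commit=True):
        self.name = name
        self.can_prepare = can_prepare   # simulated outcome of phase 1
        self.can_commit = can_commit     # simulated outcome of phase 2
        self.state = "INITIAL"

    def prepare(self):
        """Handle PREPARE TO COMMIT and acknowledge readiness."""
        self.state = "PREPARED" if self.can_prepare else "NOT PREPARED"
        return self.can_prepare

    def commit(self):
        """Handle COMMIT and reply COMMITTED or NOT COMMITTED."""
        self.state = "COMMITTED" if self.can_commit else "NOT COMMITTED"
        return self.can_commit

    def abort(self):
        """Handle ABORT: undo all changes."""
        self.state = "ABORTED"


def two_phase_commit(subordinates):
    """Run both phases as the coordinator and return the outcome."""
    # Phase 1: preparation. Every subordinate must be ready.
    if not all([s.prepare() for s in subordinates]):
        for s in subordinates:
            s.abort()
        return "ABORTED"
    # Phase 2: final commit. Every subordinate must reply COMMITTED.
    if not all([s.commit() for s in subordinates]):
        for s in subordinates:
            s.abort()
        return "ABORTED"
    return "COMMITTED"
```

The list inside `all([...])` is deliberate: it makes the coordinator collect a reply from every subordinate before deciding, instead of stopping at the first negative one.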
Performance Transparency and Query Optimization
• Query optimization must provide distribution transparency as well as replica transparency.
• Replica transparency refers to the DDBMS’s ability to hide the existence of multiple copies of data from the user.
• Query optimization algorithms are based on two principles:
– Selection of the optimum execution order.
– Selection of sites to be accessed to minimize communication costs.
Operation Modes of Query Optimization
• Automatic query optimization
– The DDBMS finds the most cost-effective access path without user intervention.
• Manual query optimization
– Optimization is selected and scheduled by the end user or programmer.
Timing of Query Optimization
– Static query optimization takes place at compilation time.
– Dynamic query optimization takes place at execution time.
Optimization Techniques
– Statistically based query optimization uses statistical information about the database.
– Rule-based query optimization is based on a set of user-defined rules to determine the best query access strategy.
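
Choosing sites to minimize communication cost can be illustrated with a join between two relations held at different sites. The sketch below assumes a deliberately simple cost model, counting only the bytes shipped across the network; the scenario, sizes, and function name are hypothetical.

```python
def cheapest_join_strategy(size_r, size_s, size_result):
    """Pick the cheaper way to join R (at site A) with S (at site B).

    The result is needed at site A. All sizes are in bytes, and the cost of a
    strategy is the number of bytes it sends over the network.
    """
    strategies = {
        "ship S to A, join at A": size_s,
        "ship R to B, join at B, ship result to A": size_r + size_result,
    }
    best = min(strategies, key=strategies.get)
    return best, strategies[best]


small_s = cheapest_join_strategy(size_r=10_000, size_s=2_000, size_result=500)
small_r = cheapest_join_strategy(size_r=1_000, size_s=50_000, size_result=100)
```

A statistically based optimizer would estimate sizes like these from statistics about the database rather than take them as inputs.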
Date’s Twelve Rules for a DDBMS
• In this final section, we list Date’s twelve rules (or objectives) for DDBMSs (Date, 1987b).
• Fundamental principle: to the user, a distributed system should look exactly like a non-distributed system.
1) Local autonomy
2) No reliance on a central site
3) Continuous operation
4) Location independence
5) Fragmentation independence
6) Replication independence
7) Distributed query processing
8) Distributed transaction processing
9) Hardware independence
10) Operating system independence
11) Network independence
12) Database independence


Review Questions
1. Explain what is meant by a DDBMS and discuss the motivation in providing such a system.
2. Compare and contrast a DDBMS with a parallel DBMS. Under what circumstances would you choose a DDBMS over a parallel DBMS?
3. Discuss the advantages and disadvantages of a DDBMS.
4. What is the difference between a homogeneous and a heterogeneous DDBMS? Under what circumstances would such systems generally arise?
