
Overview of Distributed DBMS Concepts

Uploaded by

Mahmoud Elnahas
Outline

• Introduction
➡ What is a distributed DBMS
➡ Distributed DBMS Architecture

• Background
• Distributed Database Design
• Database Integration
• Semantic Data Control
• Distributed Query Processing
• Multidatabase Query Processing
• Distributed Transaction Management
• Data Replication
• Parallel Database Systems
• Distributed Object DBMS
• Peer-to-Peer Data Management
• Web Data Management
• Current Issues
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.1/1
File Systems
[Figure: programs 1, 2, and 3 each carry their own data description (1, 2, 3) and access their own file (File 1, 2, 3); the files hold redundant data.]


Database Management
[Figure: application programs 1, 2, and 3 (each with data semantics) access a shared database through the DBMS, which handles data description, manipulation, and control.]


Motivation
[Figure: database technology (integration) and computer networks (distribution) combine into distributed database systems (integration).]
integration ≠ centralization
Distributed Computing
• A number of autonomous processing elements (not necessarily
homogeneous) that are interconnected by a computer network
and that cooperate in performing their assigned tasks.
• A processing element is a computing device that can execute a
program on its own.
• What is being distributed?
➡ Processing logic (function)
➡ Data
➡ Control


What is a Distributed Database System?
A distributed database (DDB) is a collection of multiple, logically
interrelated databases distributed over a computer network.

A distributed database management system (D–DBMS) is the
software that manages the DDB and provides an access
mechanism that makes this distribution transparent to the users.

Distributed database system (DDBS) = DDB + D–DBMS


What is not a DDBS?
• A timesharing computer system
➡ The simultaneous access to a computer system by a number of
independent users.

• A loosely or tightly coupled multiprocessor system (shared-nothing
multiprocessors).
✦ Database systems that run over multiprocessor systems are
called parallel database systems.


Centralized DBMS on a Network
• A database system that resides at one of the nodes
of a network of computers is a centralized
database on a network node.

[Figure: Sites 1 to 5 connected by a communication network; the database resides at a single site.]
Distributed DBMS Environment
[Figure: Sites 1 to 5 connected by a communication network.]


Implicit Assumptions
• Data are stored at a number of sites; each site logically consists of a
single processor.
• Processors at different sites are interconnected by a computer
network, not a multiprocessor system
➡ Parallel database systems

• A distributed database is a database, not a collection of files; data are
logically related as exhibited in the users’ access patterns
➡ Relational data model


Data Delivery Alternatives

• In distributed databases, data are “delivered” from the
sites where they are stored to where the query is
posed.
• We characterize the data delivery alternatives along
three orthogonal dimensions:
✦ Delivery modes
✦ Frequency
✦ Communication methods


Delivery modes

• Pull-only mode
➡ The transfer of data from servers to clients is started by a client pull.
When a client request is received at a server, the server responds
by locating the requested information.
• Push-only mode
➡ The transfer of data from servers to clients is initiated by a server
push in the absence of any specific request from clients.
• Hybrid mode
➡ Combines the client-pull and server-push mechanisms.
➡ The transfer of information from servers to clients is first initiated by
a client pull (by posing the query), and the subsequent transfer of
updated information to clients is initiated by a server push.
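The three delivery modes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Server` and `Client` classes and their method names are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Client:
    """Toy client that caches whatever the server delivers."""
    cache: dict = field(default_factory=dict)

    def receive(self, key, value):
        self.cache[key] = value


@dataclass
class Server:
    """Toy data server; all names here are illustrative."""
    data: dict = field(default_factory=dict)
    subscribers: dict = field(default_factory=dict)  # key -> list of clients

    def pull(self, key):
        # Pull-only: the server answers an explicit client request.
        return self.data.get(key)

    def push(self, client, key):
        # Push-only: the server sends data without any client request.
        client.receive(key, self.data.get(key))

    def pull_and_subscribe(self, client, key):
        # Hybrid: the first transfer is a client pull; later ones are pushes.
        self.subscribers.setdefault(key, []).append(client)
        return self.pull(key)

    def update(self, key, value):
        # Store the new value and push it to every subscribed client.
        self.data[key] = value
        for client in self.subscribers.get(key, []):
            client.receive(key, value)


server = Server(data={"price": 10})
client = Client()
first = server.pull_and_subscribe(client, "price")  # client pull
server.update("price", 12)                          # server push
```

After the update, the client's cache holds the new value even though it never asked for it again, which is the hybrid behavior described above.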


Frequency
• Periodic delivery
➡ Data are sent from the server to clients at regular intervals.
➡ The intervals can be defined by system default or by clients using
their profiles.

• Conditional delivery
➡ Data are sent from servers whenever certain conditions installed by
clients in their profiles are satisfied.
➡ Example: an application that sends out stock prices only when they change.

• Ad-hoc delivery
➡ Irregular; performed mostly in a pure pull-based system.
✦ Data are pulled from servers to clients in an ad-hoc fashion whenever
clients request it.
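A minimal Python sketch of periodic versus conditional delivery, using the stock-price case as the condition. The function names and the sample price series are assumptions made for illustration.

```python
def periodic_delivery(updates, every):
    """Send every `every`-th value, regardless of its content."""
    return updates[::every]


def conditional_delivery(updates, condition):
    """Send only the values that satisfy the client's installed condition."""
    sent = []
    previous = None
    for value in updates:
        if condition(previous, value):
            sent.append(value)
        previous = value
    return sent


def price_changed(previous, current):
    # Condition a client might install in its profile (hypothetical).
    return previous != current


ticks = [100, 100, 101, 101, 101, 99]  # made-up stock prices
on_change = conditional_delivery(ticks, price_changed)
every_other = periodic_delivery(ticks, 2)
```

Ad-hoc delivery needs no schedule or condition at all: the client simply asks when it wants the data.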


Communication Methods

• Unicast
➡ The communication from a server to a client is one-to-one: the
server sends data to one client using a particular delivery mode
with some frequency.

• One-to-many
➡ The server sends data to a number of clients.


Distributed DBMS Promises
• Transparent management of distributed, fragmented, and
replicated data
• Improved reliability/availability through distributed transactions
• Improved performance
• Easier and more economical system expansion


Transparency
• Transparency: “hides” the implementation details from users.
➡ Data independence: the immunity of user
applications to changes in the definition and
organization of data, and vice versa.
✦ Logical data independence: the immunity of
user applications to changes in the logical structure
(i.e., schema) of the database.
✦ Physical data independence: hiding the details of
the storage structure from user applications.


Network (distribution) transparency
➡ Requires that users do not have to specify where data are located.
✓ Location transparency: a task is performed independently of both
the location of the data and the system on which an
operation is carried out.
✓ Naming transparency: a unique name is provided for each
object in the database.
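Location and naming transparency can be illustrated with a global directory that maps each unique object name to the site holding it. The sketch below is an assumption-laden toy: the `CATALOG` and `SITES` structures, the site names, and the rows are hypothetical.

```python
# Hypothetical global directory: unique object name -> site that stores it.
CATALOG = {"EMP": "paris", "ASG": "boston", "PAY": "tokyo"}

# Hypothetical per-site storage.
SITES = {
    "paris": {"EMP": [("E1", "A. Lee")]},
    "boston": {"ASG": [("E1", "P1", 24)]},
    "tokyo": {"PAY": [("Engineer", 40000)]},
}


def read(name):
    """Return the rows of a relation; the caller never names a site."""
    site = CATALOG[name]  # the system resolves the location
    return SITES[site][name]


def move(name, new_site):
    """Relocate a relation; callers of read() are unaffected."""
    old_site = CATALOG[name]
    SITES[new_site][name] = SITES[old_site].pop(name)
    CATALOG[name] = new_site
```

Calling `read("EMP")` returns the same rows before and after `move("EMP", "tokyo")`, which is what the user-facing side of location transparency amounts to.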
Replication Transparency
➡ Data that are commonly accessed by one user can be
placed on that user’s local machine as well as on the
machine of another user with the same access
requirements.
➡ Replication transparency: the system should handle
the management of copies, and the user should act as
if there is a single copy of the data.
➡ If one machine fails, a copy of the data is still available on another
machine on the network.
Fragmentation Transparency
• Divide each database relation into smaller fragments and treat
each fragment as a separate database object (i.e., another
relation).
➡ Horizontal fragmentation
➡ Vertical fragmentation
➡ Hybrid

• This is commonly done for reasons of
✦ Performance
✦ Availability
✦ Reliability

• Fragmentation can reduce the negative effects of replication. Each
replica is not the full relation but only a subset of it; thus less space is
required and fewer data items need be managed.
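Horizontal and vertical fragmentation, and the reconstruction of the original relation from its fragments, can be sketched as follows. The `EMP` rows and the helper names are made up for illustration; only the idea of splitting by rows versus splitting by attributes comes from the slide.

```python
# Hypothetical EMP relation; the rows are invented for this example.
EMP = [
    {"ENO": "E1", "ENAME": "A. Lee", "TITLE": "Engineer", "CITY": "Paris"},
    {"ENO": "E2", "ENAME": "B. Roy", "TITLE": "Analyst", "CITY": "Boston"},
    {"ENO": "E3", "ENAME": "C. Kim", "TITLE": "Engineer", "CITY": "Paris"},
]


def horizontal(relation, predicate):
    """Horizontal fragment: the subset of rows satisfying a predicate."""
    return [row for row in relation if predicate(row)]


def vertical(relation, key, attributes):
    """Vertical fragment: the key plus a subset of the attributes."""
    return [{key: row[key], **{a: row[a] for a in attributes}} for row in relation]


def union(*fragments):
    """Rebuild a relation from horizontal fragments (sorted by ENO)."""
    rows = [row for fragment in fragments for row in fragment]
    return sorted(rows, key=lambda row: row["ENO"])


def join(left, right, key):
    """Rebuild a relation from two vertical fragments by joining on the key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


paris = horizontal(EMP, lambda r: r["CITY"] == "Paris")
others = horizontal(EMP, lambda r: r["CITY"] != "Paris")
names = vertical(EMP, "ENO", ["ENAME"])
jobs = vertical(EMP, "ENO", ["TITLE", "CITY"])
```

Horizontal fragments are recombined with a union, vertical fragments with a join on the key. A hybrid scheme would apply one split to the output of the other.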


Transparent Access
SELECT ENAME, SAL
FROM EMP, ASG, PAY
WHERE DUR > 12
AND [Link] = [Link]
AND [Link] = [Link]

[Figure: five sites (Tokyo, Boston, Paris, Montreal, New York) connected by a communication network. Data stored per site:
Paris: Paris projects, Paris employees, Paris assignments, Boston employees
Boston: Boston projects, Boston employees, Boston assignments
Montreal: Montreal projects, Paris projects, New York projects with budget > 200000, Montreal employees, Montreal assignments
New York: Boston projects, New York employees, New York projects, New York assignments]


Distributed Database - User View
[Figure: Distributed Database]


Distributed DBMS - Reality
[Figure: user queries and user applications at several sites, each site running DBMS software, all connected through a communication subsystem.]


Layers of Transparency



Reliability Through Transactions
• Distributed DBMSs are intended to improve reliability
➡ They have replicated components
✦ If one site fails, some of the data may be unreachable, but with
proper care, users may be permitted to access other parts of the
distributed database.

• Distributed transaction support requires implementation of
✓ Distributed concurrency control protocols
✓ Commit protocols

• Data replication
➡ Great for read-intensive workloads, problematic for updates
Potentially Improved Performance
• Proximity of data to its points of use (data localization).
➡ Requires some support for fragmentation and replication
✦ Since each site handles only a portion of the database, contention for
CPU and I/O services is not as severe as for centralized databases.
✦ Localization reduces remote access delays that are usually involved in
wide area networks.

• Parallelism in execution
➡ Inter-query parallelism: the ability to execute multiple queries at the
same time.
➡ Intra-query parallelism: breaking up a single query into a number of
subqueries, each of which is executed at a different site, accessing a
different part of the distributed database.
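Intra-query parallelism can be sketched by splitting one aggregate query into per-site subqueries and combining the partial results. In this toy version, threads stand in for sites; the fragment contents and function names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical salary fragments stored at three sites.
FRAGMENTS = {
    "paris": [40000, 34000],
    "boston": [27000],
    "montreal": [24000, 31000],
}


def subquery(site):
    """Runs at one site: total salary over the local fragment only."""
    return sum(FRAGMENTS[site])


def total_salary():
    # One subquery per site, executed concurrently, then combined.
    with ThreadPoolExecutor(max_workers=len(FRAGMENTS)) as pool:
        partial_sums = list(pool.map(subquery, FRAGMENTS))
    return sum(partial_sums)
```

Inter-query parallelism would instead submit several independent queries to the pool at once, each running to completion on its own.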


Easier System Expansion
• Issue is database scaling
• Emergence of microprocessor and workstation technologies
➡ Client-server model of computing (add processes at the site, add
more sites)

• Data communication cost vs telecommunication cost


Complications Introduced by Distribution
• Data may be replicated, so the distributed database system is
responsible for
➡ choosing one of the stored copies of the requested data for access
in case of retrievals.
➡ making sure that the effect of an update is reflected on each and
every copy of that data item.
• If some sites or communication links fail while an update is being
executed, the DBMS must make sure that its effects are reflected on the
data residing at the failing or unreachable sites as soon as the system
can recover from the failure.
• The synchronization of transactions on multiple sites is
considerably harder than for a centralized system.
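The replica-handling duties listed above (read one copy, update every copy, catch up failed sites on recovery) can be sketched as follows. This is a deliberately simplified illustration, not a protocol taken from the slides: the `ReplicatedItem` class is hypothetical, and it ignores concurrency and partial failures during a write.

```python
class ReplicatedItem:
    """Toy replicated data item with one copy per site."""

    def __init__(self, sites, value):
        self.copies = {site: value for site in sites}
        self.up = {site: True for site in sites}
        self.pending = {site: [] for site in sites}  # updates missed while down

    def read(self):
        # Retrieval: choose any one reachable copy.
        for site, alive in self.up.items():
            if alive:
                return self.copies[site]
        raise RuntimeError("no copy reachable")

    def write(self, value):
        # Update: apply to every reachable copy, queue it for the others.
        for site in self.copies:
            if self.up[site]:
                self.copies[site] = value
            else:
                self.pending[site].append(value)

    def fail(self, site):
        self.up[site] = False

    def recover(self, site):
        # On recovery, replay the missed updates in order, then rejoin.
        for value in self.pending[site]:
            self.copies[site] = value
        self.pending[site].clear()
        self.up[site] = True
```

A site that was down during a write keeps its stale copy until `recover` replays the queued updates, after which all copies agree again.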


Distributed DBMS Issues
• Distributed Database Design
➡ How to distribute the database
➡ Replicated & non-replicated database distribution
➡ A related problem in directory management

• Query Processing
➡ Convert user transactions to data manipulation instructions
➡ Optimization problem
✦ min{cost = data transmission + local processing}
➡ General formulation is NP-hard

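The cost function min{cost = data transmission + local processing} can be made concrete with a small sketch that compares candidate execution plans. The plans, the byte and operation counts, and the two weights are all invented for illustration; a real optimizer estimates them from statistics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """A candidate execution strategy with hypothetical cost inputs."""
    name: str
    bytes_shipped: int  # data transmission between sites
    local_ops: int      # local processing at the sites


def cost(plan, per_byte=10, per_op=1):
    # cost = data transmission + local processing (weights are assumptions)
    return plan.bytes_shipped * per_byte + plan.local_ops * per_op


plans = [
    Plan("ship EMP to the ASG site", bytes_shipped=8000, local_ops=1000),
    Plan("ship ASG to the EMP site", bytes_shipped=2000, local_ops=3000),
]
best = min(plans, key=cost)
```

Enumerating every plan, as done here, is only feasible for a handful of alternatives. Because the general formulation is NP-hard, practical optimizers rely on heuristics to prune the search space.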


Distributed DBMS Issues
• Concurrency Control
➡ Synchronization of concurrent accesses
➡ Consistency and isolation of transactions' effects
➡ Deadlock management

• Reliability
➡ How to make the system resilient to failures
➡ Atomicity and durability



Relationship Between Issues
[Figure: relationships among directory management, query processing, distribution design, reliability, concurrency control, and deadlock management.]
Architecture
• Defines the structure of the system
➡ Components identified
➡ Functions of each component defined
➡ Interrelationships and interactions among these components defined


ANSI/SPARC Architecture
[Figure: users access external views (external schema), which map to the conceptual view (conceptual schema), which maps to the internal view (internal schema).]


Differences between Three Levels of ANSI-SPARC Architecture


DBMS Implementation Alternatives


Architectural Models for Distributed DBMSs
1. Autonomy
➡ The distribution of control, not of data. It indicates the
degree to which individual DBMSs can operate
independently.
➡ The dimensions of autonomy can be specified as follows
✓ Design autonomy: Individual DBMSs are free to use the data
models and transaction management techniques that they
prefer.
✓ Communication autonomy: Each of the individual DBMSs is
free to make its own decision as to what type of information
it wants to provide to the other DBMSs or to the software
that controls their global execution.
✓ Execution autonomy: Each DBMS can execute the
transactions that are submitted to it in any way that it wants
to.
Autonomy
• Tight integration
➡ A single image of the entire database is available to any user
who wants to share the information.
• Semiautonomous systems
➡ DBMSs that can operate independently.
➡ Each of these DBMSs determines what parts of its own database
it will make accessible to users of other DBMSs.
• Total isolation
➡ The individual systems are stand-alone DBMSs that know neither of
the existence of other DBMSs nor how to communicate with them.


Architectural Models for Distributed DBMSs
2. Distribution
➡ Whether the components of the system are located on the same
machine or not

3. Heterogeneity
➡ Various levels (hardware, communications, operating system)
➡ DBMS heterogeneity is the important one
✦ data model, query language, transaction management algorithms
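The three dimensions give each architecture a code such as (A0, D1, H0). The sketch below encodes that classification as a small data structure. The textual labels for each level are assumptions based on the usual reading of these codes (for example, D1 as client/server and D2 as peer-to-peer); the `Architecture` type itself is hypothetical.

```python
from typing import NamedTuple

# Level labels are assumptions, not definitions taken from the slides.
AUTONOMY = {0: "tight integration", 1: "semiautonomous", 2: "total isolation"}
DISTRIBUTION = {0: "no distribution", 1: "client/server", 2: "peer-to-peer"}
HETEROGENEITY = {0: "homogeneous", 1: "heterogeneous"}


class Architecture(NamedTuple):
    autonomy: int
    distribution: int
    heterogeneity: int

    def code(self):
        """Short form, e.g. (A0, D1, H0)."""
        return f"(A{self.autonomy}, D{self.distribution}, H{self.heterogeneity})"

    def describe(self):
        """Spelled-out form of the three levels."""
        return ", ".join([
            AUTONOMY[self.autonomy],
            DISTRIBUTION[self.distribution],
            HETEROGENEITY[self.heterogeneity],
        ])


client_server = Architecture(0, 1, 0)
peer_to_peer = Architecture(0, 2, 0)
multidatabase = Architecture(2, 2, 1)
```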


Client/Server Architecture (A0, D1, H0)
• Data management duties are handled at the servers.
• The clients focus on providing the application environment,
including the user interface.
• The communication duties are shared between the client
machines and servers.


Database Server (three-tier distributed system architecture)


Distributed Database Servers (n-tier distributed approach)


Datalogical Distributed DBMS Architecture
ES1 ES2 ... ESn

GCS

LCS1 LCS2 ... LCSn

LIS1 LIS2 ... LISn

(ES = external schema, GCS = global conceptual schema, LCS = local conceptual schema, LIS = local internal schema)


Peer-to-Peer Component Architecture (full distribution) (A0, D2, H0)
There is no distinction of client machines versus servers. Each
machine has full DBMS functionality and can communicate with
other machines to execute queries and transactions.
[Figure: user requests enter the USER PROCESSOR (user interface handler, semantic data controller, global query optimizer, global execution monitor), which passes work to the DATA PROCESSOR (local query processor, local recovery manager, runtime support processor) in front of the database; system responses return to the user. Supporting structures: external schema, global conceptual schema, GD/D, local conceptual schema, system log, local internal schema.]


Multidatabase systems (MDBS) (A2, D2, H1)
Multidatabase systems (MDBS) represent the case where
individual DBMSs (whether distributed or not) are fully
autonomous and have no concept of cooperation; they
may not even “know” of each other’s existence or how to
talk to each other.


Datalogical Multi-DBMS Architecture
GES1 GES2 ... GESn

LES11 … LES1n GCS LESn1 … LESnm

LCS1 LCS2 … LCSn

LIS1 LIS2 … LISn

(GES = global external schema, LES = local external schema, GCS = global conceptual schema, LCS = local conceptual schema, LIS = local internal schema)


MDBS Components & Execution
[Figure: a global user request enters the multi-DBMS layer, which issues global subrequests to DBMS1, DBMS2, and DBMS3; local user requests go directly to the individual DBMSs.]
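The execution flow in an MDBS can be sketched as a thin layer that turns one global request into a subrequest per component DBMS and merges the answers. The component contents and the function names are hypothetical; a real multi-DBMS layer would also translate between data models and query languages.

```python
# Hypothetical component DBMSs, each holding part of the employee data.
LOCAL_DBMS = {
    "DBMS1": [{"ENO": "E1", "SAL": 40000}],
    "DBMS2": [{"ENO": "E2", "SAL": 34000}],
    "DBMS3": [{"ENO": "E3", "SAL": 27000}],
}


def local_request(dbms, predicate):
    """A local user request goes straight to one DBMS."""
    return [row for row in LOCAL_DBMS[dbms] if predicate(row)]


def global_request(predicate):
    """The multi-DBMS layer sends one subrequest per DBMS and merges results."""
    results = []
    for dbms in LOCAL_DBMS:
        results.extend(local_request(dbms, predicate))
    return results


well_paid = global_request(lambda row: row["SAL"] > 30000)
```

Each component DBMS answers its subrequest exactly as it would answer a local user, which is how the layer preserves their autonomy.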


Mediator/Wrapper Architecture