DEFINITION OF PARALLEL AND DISTRIBUTED DATABASES
A Parallel database system seeks to improve performance by executing operations
such as data loading, index building, and query evaluation in parallel. Although
data may be stored in a distributed fashion, the distribution is governed solely
by performance considerations.
A Distributed database system is one whose data is stored across several sites, and
each site is managed by a DBMS that is capable of running independently of other
sites.
CHARACTERISTICS OF THE ARCHITECTURES
   PARALLEL DATABASE SYSTEM              | DISTRIBUTED DATABASE SYSTEM
1. Machines are physically close to      | Machines can be far from each other,
   each other                            | e.g. on different continents
2. Machines are connected by dedicated   | Machines can be connected by a public
   high-speed LANs and switches          | network, e.g. the Internet
3. Communication cost is assumed to be   | Communication cost and problems
   minimal                               | cannot be ignored
4. Can be a shared-memory, shared-disk,  | Usually a shared-nothing architecture
   or shared-nothing architecture        |
ARCHITECTURE FOR PARALLEL DATABASE
There are three main architectures for building parallel databases:
1. Shared-Memory System: Multiple CPUs are attached to an interconnection
network and can access a common region of main memory.
2. Shared-Disk System: Each CPU has a private memory and direct access to all
disks through an interconnection network.
3. Shared-Nothing System: Each CPU has local main memory and disk space, and
no two CPUs can access the same storage area; all communication between the
CPUs is through a network connection.
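As a rough illustration of the shared-nothing idea, the sketch below gives each
node a private list of rows and lets nodes exchange only partial results. The
names Node, partition and parallel_sum are hypothetical, hash partitioning is
just one possible way to place rows, and Python threads stand in for separate
machines, so this shows the structure rather than real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


class Node:
    """One shared-nothing node: its rows are private to it (hypothetical class)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.rows = []  # local storage; no other node reads this list

    def local_sum(self, column):
        # Each node evaluates the query over its own partition only.
        return sum(row[column] for row in self.rows)


def partition(rows, nodes, key):
    # Hash partitioning: the key value decides which node owns each row.
    for row in rows:
        nodes[hash(row[key]) % len(nodes)].rows.append(row)


def parallel_sum(nodes, column):
    # Only the partial sums travel between nodes, never the rows themselves.
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(lambda node: node.local_sum(column), nodes))
    return sum(partials)


nodes = [Node(i) for i in range(3)]
employees = [{"id": i, "salary": 1000 + 10 * i} for i in range(10)]
partition(employees, nodes, "id")
total = parallel_sum(nodes, "salary")
```

Here total equals the sum computed over the unpartitioned list, which is the
property a parallel plan has to preserve.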
PROPERTIES OF DISTRIBUTED DATABASE SYSTEM
a. Distributed Data Independence: The user should be able to access the
database without having to know the data location.
b. Distributed Transaction Atomicity: The user should be able to write
transactions that access and update data at several sites just as they would
over local data. That is, all changes persist if the transaction commits, and
none persists if it aborts.
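One common way to obtain this all-or-nothing behaviour is a two-phase commit
protocol, which these notes do not cover. The sketch below is a heavily
simplified in-memory version: the Site class, its methods and the balances are
hypothetical, and failures, logging and timeouts are ignored.

```python
class Site:
    """A site holding one balance (hypothetical stand-in for a local DBMS)."""

    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.pending = None  # change staged during phase 1

    def prepare(self, delta):
        # Phase 1: vote yes only if the change keeps the balance non-negative.
        if self.balance + delta < 0:
            return False
        self.pending = delta
        return True

    def commit(self):
        # Phase 2, commit: make the staged change permanent.
        if self.pending is not None:
            self.balance += self.pending
        self.pending = None

    def abort(self):
        # Phase 2, abort: discard the staged change.
        self.pending = None


def run_transaction(changes):
    """changes is a list of (site, delta) pairs; returns True if committed."""
    votes = [site.prepare(delta) for site, delta in changes]
    if all(votes):
        for site, _ in changes:
            site.commit()
        return True
    for site, _ in changes:
        site.abort()
    return False


site_a = Site("site_a", 100)
site_b = Site("site_b", 50)
committed = run_transaction([(site_a, -30), (site_b, 30)])    # both vote yes
rejected = run_transaction([(site_a, -500), (site_b, 500)])   # site_a votes no
```

After the second call neither balance has changed, even though site_b was
willing to apply its half of the update.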
TYPES OF DISTRIBUTED DATABASES
There are basically two types of Distributed Database:
1. Homogeneous Distributed Database: This is where data stored across multiple
sites is managed by the same DBMS software.
2. Heterogeneous Distributed Database/Multidatabase System: This is where
sites run autonomously, possibly under different DBMS software, but are
connected so that data can be accessed across multiple sites.
DISTRIBUTED DATABASE ARCHITECTURE
There are basically three types of Distributed Database Architecture. They are:
1. Client-Server: It has one or more client processes and one or more server
processes. A client process can send a query to any one server process. Thus a
client process can run on a personal computer and send queries to a server
running on a mainframe.
Advantages:
a. Simple to implement because of the separation of functionality and
centralization of the server
b. Expensive server machines are effectively utilized
c. Users can have a familiar and friendly client-side user interface
2. Collaborating Server: The client-server architecture does not allow a single
query to span multiple servers. A collaborating server system has a collection
of database servers, each capable of running transactions against local data,
which cooperatively execute transactions spanning multiple servers.
3. Middleware Systems: These allow a single query to span multiple servers
without requiring all database servers to be capable of managing such
multisite execution strategies.
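The middleware idea can be sketched as a thin coordinating layer over servers
that only understand local queries. The classes LocalServer and Middleware and
the sample rows are hypothetical, and a real middleware layer would also
handle joins, optimization and differing query dialects.

```python
class LocalServer:
    """A server that can only answer queries over its own rows (hypothetical)."""

    def __init__(self, rows):
        self.rows = rows

    def select(self, predicate):
        return [row for row in self.rows if predicate(row)]


class Middleware:
    """Coordinates a multi-server query; the servers stay unaware of each other."""

    def __init__(self, servers):
        self.servers = servers

    def select(self, predicate):
        # Send the same local query to every server and merge the answers.
        result = []
        for server in self.servers:
            result.extend(server.select(predicate))
        return result


server_1 = LocalServer([{"id": 1, "dept": "sales"}, {"id": 2, "dept": "hr"}])
server_2 = LocalServer([{"id": 3, "dept": "sales"}])
middleware = Middleware([server_1, server_2])
sales = middleware.select(lambda row: row["dept"] == "sales")
```

Only the Middleware object knows that more than one server exists, which is
the point of this architecture.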
STORING DATA IN A DISTRIBUTED DBMS
This involves two concepts:
1. Fragmentation
2. Replication
Fragmentation: This involves breaking a relation into smaller portions called
fragments and storing them, possibly at different sites. There are two types:
(a) Horizontal Fragmentation: Each fragment is a subset of the rows of the
original relation. The union of the horizontal fragments should reproduce the
original relation.
(b) Vertical Fragmentation: Each fragment is a subset of the columns. The
system assigns a unique tuple id to each tuple in the original relation and
keeps it in every fragment, so that joining the fragments is a lossless join;
that is, the collection of all vertical fragments should reproduce the
original relation.
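Both kinds of fragmentation, and the way each reproduces the original
relation, can be sketched on a small in-memory relation. The relation, its
column names (including the tuple id column tid) and the helper functions are
hypothetical.

```python
employees = [
    {"tid": 1, "name": "Ada", "city": "Lagos", "salary": 500},
    {"tid": 2, "name": "Ben", "city": "Accra", "salary": 650},
    {"tid": 3, "name": "Chi", "city": "Lagos", "salary": 700},
]


def horizontal_fragment(relation, predicate):
    # A horizontal fragment keeps whole rows that satisfy a condition.
    return [row for row in relation if predicate(row)]


def vertical_fragment(relation, columns):
    # A vertical fragment keeps some columns, plus the tuple id in every row.
    return [{c: row[c] for c in ("tid", *columns)} for row in relation]


def join_on_tid(left, right):
    # Rebuild full rows by matching fragments on the shared tuple id.
    right_by_tid = {row["tid"]: row for row in right}
    return [{**row, **right_by_tid[row["tid"]]} for row in left]


# Horizontal: split by city, then take the union of the fragments.
lagos_rows = horizontal_fragment(employees, lambda r: r["city"] == "Lagos")
other_rows = horizontal_fragment(employees, lambda r: r["city"] != "Lagos")
union = sorted(lagos_rows + other_rows, key=lambda r: r["tid"])

# Vertical: split the columns, then join the fragments on tid.
personal = vertical_fragment(employees, ["name", "city"])
payroll = vertical_fragment(employees, ["salary"])
rebuilt = join_on_tid(personal, payroll)
```

In this sketch both union and rebuilt compare equal to employees. Without the
tid column in each vertical fragment, the join would have nothing reliable to
match rows on.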
Replication: This occurs when we store more than one copy of a relation, or of
one of its fragments, with each copy held at a different site.
Advantages:
1. Increased availability of data
2. Faster query evaluation: Queries can execute faster by using a local copy of
a relation instead of going to a remote site
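The two advantages can be sketched with a relation copied to several sites. The
ReplicatedRelation class, the site names and the choice to update every copy
on each write are assumptions made for illustration; the notes do not say how
copies are kept consistent.

```python
class ReplicatedRelation:
    """Keeps one full copy of a key-value relation per site (hypothetical)."""

    def __init__(self, sites):
        self.copies = {site: {} for site in sites}

    def write(self, key, value):
        # Assumption: every write is applied to all copies immediately.
        for copy in self.copies.values():
            copy[key] = value

    def read(self, key, local_site, down=()):
        # Prefer the local copy; fall back to any other site that is up.
        order = [local_site] + [s for s in self.copies if s != local_site]
        for site in order:
            if site not in down:
                return site, self.copies[site][key]
        raise RuntimeError("no replica available")


relation = ReplicatedRelation(["site_a", "site_b"])
relation.write("emp_1", {"name": "Ada"})

# Faster evaluation: the read is answered by the local copy.
served_by, row = relation.read("emp_1", local_site="site_a")

# Availability: if the local site is down, another copy answers.
fallback_site, same_row = relation.read("emp_1", "site_a", down=("site_a",))
```

The cost, not shown here, is that every write has to reach all copies.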
IMPORTANCE OF PARALLEL AND DISTRIBUTED DBMS ARCHITECTURE
1. They improve reliability and availability
2. Data can be shared across multiple sites
3. Data can be managed with different levels of transparency
4. Local autonomy: a department can control its own data, since it is the one
most familiar with that data
5. Faster response time for queries