Database Architecture Overview
Agenda
❏ Parallel Databases
❏ Distributed Databases
Introduction to Database Architectures
Centralized Systems
❏ A centralized DB system runs on a single computer system.
❏ A multi-user DB system supports two or more users who access the database simultaneously; such systems typically run on minicomputers or mainframe computers.
Recall: View of Data in a Database Management System
❏ View Level: hides details of data types.
❏ Logical Level: describes what data is stored in the DB and the relationships among the data.
❏ Physical Level: describes how a record is stored.
Unit-V
Dr. Mahesh R. Sanghavi
Introduction to Database Architectures
Client-Server Systems
❏ Server systems satisfy requests generated at m client systems.
Introduction to Database Architectures
Parallel Systems
❏ A coarse-grain parallel machine consists of a small number of powerful processors.
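Speed-Up and Scale-Up
❏ Speed-up: a fixed-sized problem executing on a small system is given to a system that is N times larger.
❏ Scale-up: the sizes of both the problem and the system are increased; an N-times larger system is used to perform an N-times larger job.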
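Both measures are usually expressed as ratios of elapsed times: speed-up is linear when it equals N, and scale-up is linear when it equals 1. The sketch below computes both ratios; the function names and the timings are hypothetical, chosen only for illustration.

```python
def speedup(time_small_system: float, time_large_system: float) -> float:
    """Speed-up for a fixed-size problem: elapsed time on the small system
    divided by elapsed time on the N-times larger system."""
    return time_small_system / time_large_system


def scaleup(time_small_problem_small_system: float,
            time_large_problem_large_system: float) -> float:
    """Scale-up: elapsed time of the small problem on the small system divided by
    elapsed time of the N-times larger problem on the N-times larger system."""
    return time_small_problem_small_system / time_large_problem_large_system


# Hypothetical timings for illustration only.
n = 4
speed_up_ratio = speedup(100.0, 30.0)   # linear speed-up would be exactly n
scale_up_ratio = scaleup(100.0, 125.0)  # linear scale-up would be exactly 1
print(f"speed-up {speed_up_ratio:.2f} (linear: {n}), "
      f"scale-up {scale_up_ratio:.2f} (linear: 1.00)")
# prints: speed-up 3.33 (linear: 4), scale-up 0.80 (linear: 1.00)
```

Both results here are sublinear, which is the common case discussed next.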
❏ Review of Last Session
Speed-up and scale-up are often sublinear due to:
❏ Startup costs: the cost of starting up multiple processes may dominate computation time if the degree of parallelism is high.
❏ Interference: processes accessing shared resources compete with each other, thus spending time waiting on other processes rather than performing useful work.
❏ Skew: increasing the degree of parallelism increases the variance in service times of tasks executing in parallel, and the overall execution time is determined by the slowest of these tasks.
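The effect of skew can be seen with a small calculation: parallel tasks finish only when the slowest one does. This sketch assumes hypothetical task times, treats the serial time as the sum of the task times, and ignores startup costs and interference.

```python
from typing import Sequence


def parallel_elapsed_time(task_times: Sequence[float]) -> float:
    """Tasks run concurrently, so the job is done only when the slowest task is done."""
    return max(task_times)


# Hypothetical: 100 s of total work divided among 10 processors.
balanced = [10.0] * 10
skewed = [5.0] * 9 + [55.0]  # same 100 s of work, but one partition is overloaded

for name, tasks in (("balanced", balanced), ("skewed", skewed)):
    elapsed = parallel_elapsed_time(tasks)
    print(f"{name}: total work {sum(tasks):.0f} s, elapsed {elapsed:.0f} s, "
          f"speed-up {sum(tasks) / elapsed:.2f}")
```

With these numbers the balanced split gives a speed-up of 10.00, while the skewed split gives only 1.82, even though the total work is identical.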
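Parallel Database Architectures
❏ Shared memory: processors share a common memory.
❏ Shared disk: processors share a common disk.
❏ Shared nothing: processors share neither a common memory nor a common disk.
❏ Hierarchical: a hybrid of the above architectures.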
Parallel Database Architectures
Shared-Memory
❏ Processors and disks have access to a common memory, typically via a bus or through an interconnection network.
❏ Extremely efficient communication between processors: data in shared memory can be accessed by any processor without having to be moved using software.
❏ Downside: the architecture is not scalable beyond 32 or 64 processors, since the bus or the interconnection network becomes a bottleneck.
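Parallel Database Architectures (cont.)
Shared-Disk
❏ All processors can directly access all disks via an interconnection network, but the processors have private memories.
❏ The memory bus is not a bottleneck.
❏ The architecture provides a degree of fault-tolerance: if a processor fails, the other processors can take over its tasks, since the database resides on disks accessible from all processors.
❏ Examples: IBM Sysplex and DEC clusters (now part of Compaq) running Rdb.
❏ Downside: the bottleneck now occurs at the interconnection to the disk subsystem.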
Shared-Nothing
❏ Each node consists of a processor, memory, and one or more disks.
❏ Data accessed from local disks (and local memory accesses) does not pass through the interconnection network, thereby minimizing the interference of resource sharing.
❏ Shared-nothing multiprocessors can be scaled up to thousands of processors without interference.
❏ Main drawback: the cost of communication and non-local disk access; sending data involves software interaction at both ends.
Parallel Database Architectures
Hierarchical
❏ Each node of the system could be a shared-memory system with a few processors.
❏ The complexity of programming such systems can be reduced by distributed virtual-memory architectures.
❏ Also called non-uniform memory architecture (NUMA).
Outline
❏ Review of Last Session
❏ Data Fragmentation
❏ Data Transparency
Distributed System
Source: [Link]
Distributed System
Data Fragmentation
❏ Division of relation r into fragments r1, r2, ..., rn, which contain sufficient information to reconstruct relation r.
❏ Horizontal fragmentation: each tuple of r is assigned to one or more fragments.
❏ Vertical fragmentation: the schema for relation r is split into several smaller schemas.
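Both kinds of fragmentation, and the reconstruction of r from its fragments, can be sketched on a tiny in-memory relation. The relation, its attributes, and the helper names are hypothetical. Horizontal fragments are rebuilt by union; vertical fragments are rebuilt by a join on a key that, by assumption, every vertical fragment keeps, so the join is lossless.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]

# Hypothetical relation r(account_number, branch_name, balance).
r: List[Row] = [
    {"account_number": "A-101", "branch_name": "Hillside", "balance": 500},
    {"account_number": "A-215", "branch_name": "Valleyview", "balance": 700},
    {"account_number": "A-305", "branch_name": "Hillside", "balance": 350},
]


def horizontal_fragments(rel: List[Row],
                         predicates: Sequence[Callable[[Row], bool]]) -> List[List[Row]]:
    """Fragment r_i holds the tuples of r that satisfy predicate P_i."""
    return [[t for t in rel if p(t)] for p in predicates]


def reconstruct_horizontal(fragments: List[List[Row]]) -> List[Row]:
    """r = r1 ∪ r2 ∪ ... ∪ rn; a tuple stored in several fragments appears once."""
    seen, result = set(), []
    for frag in fragments:
        for t in frag:
            key = tuple(sorted(t.items()))
            if key not in seen:
                seen.add(key)
                result.append(t)
    return result


def vertical_fragments(rel: List[Row], schemas: Sequence[Sequence[str]]) -> List[List[Row]]:
    """Fragment r_i is the projection of r onto the smaller schema R_i."""
    return [[{a: t[a] for a in schema} for t in rel] for schema in schemas]


def reconstruct_vertical(fragments: List[List[Row]], key: str) -> List[Row]:
    """r = r1 ⋈ r2 ⋈ ... ⋈ rn on the shared key. Assumes every fragment is a
    projection of the same r, so each key value appears in every fragment."""
    joined: Dict[object, Row] = {}
    for frag in fragments:
        for t in frag:
            joined.setdefault(t[key], {}).update(t)
    return list(joined.values())


by_branch = horizontal_fragments(r, [lambda t: t["branch_name"] == "Hillside",
                                     lambda t: t["branch_name"] != "Hillside"])
by_columns = vertical_fragments(r, [("account_number", "branch_name"),
                                    ("account_number", "balance")])
print(sorted(reconstruct_horizontal(by_branch), key=lambda t: t["account_number"]) == r)
print(reconstruct_vertical(by_columns, "account_number") == r)
```

Both checks print True: neither kind of fragmentation loses information, provided each vertical fragment keeps the key.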
Outline
❏ Data Transparency
❏ Distributed Database Architecture
Distributed System
❏ Data Fragmentation: horizontal fragmentation, vertical fragmentation, advantages
❏ Distributed Database System: homogeneous, heterogeneous
❏ Review of Last Session
Commit Protocol
Commit protocols are used to ensure atomicity across sites:
❏ A transaction which executes at multiple sites must either be committed at all the sites or aborted at all the sites.
❏ It is not acceptable to have a transaction committed at one site and aborted at another.
Commit Protocol
❏ The two-phase commit (2PC) protocol is widely used.
❏ The three-phase commit (3PC) protocol is more complicated and more expensive.
❏ 2PC assumes the fail-stop model: failed sites simply stop working.
❏ Execution of the protocol is initiated by the coordinator after the last step of the transaction has been reached.
❏ The protocol involves all the local sites at which the transaction executed.
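Let T be a transaction initiated at site Si, and let the transaction coordinator at Si be Ci.
Phase 1: Obtaining a Decision
❏ The coordinator Ci asks all participants to prepare to commit transaction T: Ci adds the record <prepare T> to the log, forces the log to stable storage, and sends prepare T messages to all sites at which T executed.
❏ Upon receiving the message, the transaction manager at each site determines whether it can commit the transaction. If it can, it adds <ready T> to its log, forces all log records for T to stable storage, and sends a ready T message to Ci; otherwise it adds <no T> to its log and sends an abort T message to Ci.
❏ T is committed only if Ci receives a ready T message from all participating sites; otherwise T must be aborted.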
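The steps above can be sketched as an in-memory simulation. This is a hypothetical sketch: `Participant`, `Coordinator`, and `run_2pc` are illustrative names, the log lists stand in for forced writes to stable storage, messages are plain method calls, and failures and timeouts are not modelled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Participant:
    """Hypothetical site at which transaction T executed."""
    name: str
    can_commit: bool
    log: List[str] = field(default_factory=list)  # stands in for the stable log

    def on_prepare(self, t: str) -> str:
        """Phase 1 at a site: log the vote, then reply ready or abort."""
        if self.can_commit:
            self.log.append(f"<ready {t}>")
            return "ready"
        self.log.append(f"<no {t}>")
        return "abort"

    def on_decision(self, t: str, decision: str) -> None:
        """Phase 2 at a site: record the coordinator's final decision."""
        self.log.append(f"<{decision} {t}>")


@dataclass
class Coordinator:
    """Hypothetical coordinator Ci at the site where T was initiated."""
    log: List[str] = field(default_factory=list)

    def run_2pc(self, t: str, participants: List[Participant]) -> str:
        # Phase 1: force <prepare T> to the log, then send prepare T to every site.
        self.log.append(f"<prepare {t}>")
        votes = [p.on_prepare(t) for p in participants]
        # Commit only if every participant answered ready; otherwise abort.
        decision = "commit" if all(v == "ready" for v in votes) else "abort"
        self.log.append(f"<{decision} {t}>")  # once logged, the decision is final
        # Phase 2: tell every participant the outcome.
        for p in participants:
            p.on_decision(t, decision)
        return decision


sites = [Participant("S1", can_commit=True), Participant("S2", can_commit=False)]
print(Coordinator().run_2pc("T", sites))  # abort, because S2 voted no
```

A single no vote is enough to abort T at every site, which is exactly the all-or-nothing property required of a commit protocol.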
❏ Review of Last Session
Three-Phase Commit (3PC)
❏ Phase 2 of 2PC is split into two phases, Phase 2 and Phase 3 of 3PC.
❏ In Phase 2, the coordinator makes a decision as in 2PC (called the pre-commit decision) and records it at multiple (at least K) sites.
❏ In Phase 3, the coordinator sends a commit/abort message to all participating sites.
❏ Knowledge of the pre-commit decision can be used to commit despite coordinator failure.
❏ This avoids the blocking problem as long as fewer than K sites fail.
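The following sketch isolates only the property described above: because the pre-commit decision is stored at K sites, a surviving site can supply it after the coordinator fails. The function names and site labels are hypothetical, and the Phase 1 vote, the election of a new coordinator, and the rest of the recovery logic are deliberately left out.

```python
from typing import Dict, Optional, Set


def record_precommit(decision: str, recorder_sites: Set[str]) -> Dict[str, str]:
    """Phase 2 of 3PC: store the pre-commit decision at each of the K recorder sites."""
    return {site: decision for site in recorder_sites}


def phase3_messages(decision: str, participants: Set[str]) -> Dict[str, str]:
    """Phase 3 of 3PC: the commit/abort message sent to every participating site."""
    return {site: decision for site in participants}


def recover_decision(precommit_records: Dict[str, str],
                     failed_sites: Set[str]) -> Optional[str]:
    """After the coordinator fails, surviving sites look for a surviving copy of the
    pre-commit decision. Returns None only if all K recorders failed, which is the
    case where this sketch would still have to block."""
    for site, decision in precommit_records.items():
        if site not in failed_sites:
            return decision
    return None


records = record_precommit("commit", {"S1", "S2", "S3"})  # K = 3
print(recover_decision(records, failed_sites={"S1", "S2"}))        # commit
print(recover_decision(records, failed_sites={"S1", "S2", "S3"}))  # None
```

With two failures (fewer than K = 3) the decision is still known; only when all three recorders fail does the sketch lose it.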
Concurrency Control
❏ The transaction can read the data item from any one of the sites at which a replica of the data item resides.
Distributed Lock Manager
❏ The functionality of locking is implemented by lock managers at each site.
❏ Advantages: work is distributed and can be made robust to failures.
❏ Disadvantages: deadlock detection is more complicated.
Outline
❏ Biased Protocol
❏ Quorum Consensus Protocol
Distributed Lock Manager
Variant approaches:
❏ Primary copy
❏ Majority protocol
❏ Biased protocol
❏ Quorum consensus
Distributed Lock Manager
Primary Copy
❏ Choose one replica of the data item to be the primary copy.
❏ Locking the primary copy implicitly gets a lock on all replicas of the data item.
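Majority Protocol
❏ The local lock manager at each site administers lock and unlock requests for data items stored at that site.
❏ In the case of replicated data, if data item Q is replicated at n sites, a lock request must be sent to more than half of the n sites at which Q is stored, and the transaction does not operate on Q until it holds locks on a majority of the replicas of Q.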
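The majority rule can be sketched with one toy lock manager per replica site. This is a simplified, hypothetical model: only exclusive locks are handled, a failed site grants nothing, and a request that misses the majority gives back its partial locks and returns False, where a real system would wait or retry.

```python
from typing import Dict, List


class SiteLockManager:
    """Hypothetical local lock manager for one site (exclusive locks only)."""

    def __init__(self, name: str, up: bool = True) -> None:
        self.name = name
        self.up = up                       # a failed site grants nothing
        self.holders: Dict[str, str] = {}  # data item -> transaction holding the lock

    def request_lock(self, item: str, txn: str) -> bool:
        if not self.up:
            return False
        holder = self.holders.get(item)
        if holder is None or holder == txn:
            self.holders[item] = txn
            return True
        return False

    def release(self, item: str, txn: str) -> None:
        if self.holders.get(item) == txn:
            del self.holders[item]


def majority_lock(item: str, txn: str, replica_sites: List[SiteLockManager]) -> bool:
    """Send the request to every site holding a replica of item; the lock counts
    only if more than half of the n replicas granted it."""
    granted = [s for s in replica_sites if s.request_lock(item, txn)]
    if 2 * len(granted) > len(replica_sites):
        return True
    for s in granted:  # no majority: give back partial locks
        s.release(item, txn)
    return False


sites = [SiteLockManager("S1"), SiteLockManager("S2"), SiteLockManager("S3", up=False)]
print(majority_lock("Q", "T1", sites))  # True: 2 of 3 replicas granted
print(majority_lock("Q", "T2", sites))  # False: T1 already holds the majority
```

Because any two majorities of the same n replicas overlap, two conflicting transactions cannot both hold a majority, and the lock can still be granted while a minority of sites is down.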
Biased Protocol
❏ There is a local lock manager at each site, as in the majority protocol; however, requests for shared locks are handled differently than requests for exclusive locks.
❏ Shared locks: when a transaction needs to lock data item Q, it simply requests a lock on Q from the lock manager at one site containing a replica of Q.
❏ Exclusive locks: when a transaction needs to lock data item Q, it requests a lock on Q from the lock manager at all sites containing a replica of Q.
❏ Benefit: imposes less overhead on read operations.
❏ Drawback: additional overhead on writes.
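A minimal sketch of the biased protocol, assuming hypothetical per-site lock managers that track shared and exclusive holders. A shared lock touches one replica site (here simply the first in the list; any one would do), while an exclusive lock must succeed at every replica site. A failed exclusive request undoes its partial locks and returns False, where a real system would wait.

```python
from typing import Dict, List, Set


class BiasedSite:
    """Hypothetical local lock manager at one replica site."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.shared: Dict[str, Set[str]] = {}  # item -> transactions holding S locks
        self.exclusive: Dict[str, str] = {}    # item -> transaction holding the X lock

    def lock_shared(self, item: str, txn: str) -> bool:
        holder = self.exclusive.get(item)
        if holder is not None and holder != txn:
            return False
        self.shared.setdefault(item, set()).add(txn)
        return True

    def lock_exclusive(self, item: str, txn: str) -> bool:
        other_readers = self.shared.get(item, set()) - {txn}
        holder = self.exclusive.get(item)
        if other_readers or (holder is not None and holder != txn):
            return False
        self.exclusive[item] = txn
        return True

    def unlock_exclusive(self, item: str, txn: str) -> None:
        if self.exclusive.get(item) == txn:
            del self.exclusive[item]


def biased_shared_lock(item: str, txn: str, replica_sites: List[BiasedSite]) -> bool:
    """Shared lock: one request, to a single replica site."""
    return replica_sites[0].lock_shared(item, txn)


def biased_exclusive_lock(item: str, txn: str, replica_sites: List[BiasedSite]) -> bool:
    """Exclusive lock: must be granted by the lock manager at every replica site."""
    granted: List[BiasedSite] = []
    for site in replica_sites:
        if not site.lock_exclusive(item, txn):
            for g in granted:  # undo partial exclusive locks
                g.unlock_exclusive(item, txn)
            return False
        granted.append(site)
    return True
```

Since a writer must lock every replica, it always meets the single site a reader locked, so reads stay cheap (one message) and writes pay for correctness (n messages).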
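Distributed Lock Manager
Quorum Consensus Protocol
❏ Each site is assigned a weight; let S be the total of all site weights.
❏ Choose a read quorum Qr and a write quorum Qw such that Qr + Qw > S and 2 * Qw > S.
❏ Each read must lock enough replicas that the sum of the site weights is >= Qr.
❏ Each write must lock enough replicas that the sum of the site weights is >= Qw.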
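The quorum rules can be checked mechanically. This sketch assumes the standard conditions Qr + Qw > S and 2 * Qw > S (S being the total site weight) and verifies by brute force over all replica subsets that, under those conditions, every read quorum overlaps every write quorum and any two write quorums overlap. The site names and weights are hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set


def valid_quorums(weights: Dict[str, int], q_r: int, q_w: int) -> bool:
    """Quorum rules, with S the total weight: Qr + Qw > S and 2 * Qw > S."""
    s = sum(weights.values())
    return q_r + q_w > s and 2 * q_w > s


def has_quorum(locked_sites: Iterable[str], weights: Dict[str, int], quorum: int) -> bool:
    """True if the weights of the locked replicas add up to at least the quorum."""
    return sum(weights[site] for site in locked_sites) >= quorum


def quorums_intersect(weights: Dict[str, int], q_r: int, q_w: int) -> bool:
    """Brute-force check: every read quorum meets every write quorum, and any
    two write quorums meet, so conflicting operations always share a replica."""
    sites = list(weights)
    subsets: List[Set[str]] = [set(c) for k in range(len(sites) + 1)
                               for c in combinations(sites, k)]
    reads = [x for x in subsets if has_quorum(x, weights, q_r)]
    writes = [x for x in subsets if has_quorum(x, weights, q_w)]
    return (all(rd & wr for rd in reads for wr in writes)
            and all(w1 & w2 for w1 in writes for w2 in writes))


weights = {"S1": 1, "S2": 1, "S3": 1}  # hypothetical: three equally weighted replicas
for q_r, q_w in [(2, 2), (1, 3), (1, 2)]:
    print(q_r, q_w, valid_quorums(weights, q_r, q_w), quorums_intersect(weights, q_r, q_w))
```

Choosing Qr = 1 and Qw = S gives read-one/write-all, the same trade-off as the biased protocol; raising Qr lowers the cost of writes.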
Timestamp-Based Concurrency-Control Protocol
❏ Used in distributed systems.
❏ Each transaction must be given a unique timestamp.
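Timestamp
How to generate a timestamp:
❏ Each site generates a unique local timestamp using a logical counter or its local clock; a globally unique timestamp is obtained by concatenating the local timestamp with the unique site identifier.
❏ A site with a slow clock assigns smaller timestamps. This is still logically correct (serializability is not affected), but it "disadvantages" that site's transactions.
❏ Fix: each site Si keeps a logical clock LCi. When Si receives a request from a transaction with timestamp <x, y> and x is greater than the current value of LCi, site Si advances its logical clock to the value x + 1.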
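The scheme above can be sketched with a logical counter per site. The `Site` class and its method names are hypothetical; the timestamp is modelled as a (local value, site id) tuple, so Python's lexicographic tuple comparison places the site identifier in the least significant position.

```python
from dataclasses import dataclass
from typing import Tuple

Timestamp = Tuple[int, int]  # (local logical-clock value, site identifier)


@dataclass
class Site:
    """Hypothetical site Si with its logical clock LCi."""
    site_id: int
    logical_clock: int = 0

    def new_timestamp(self) -> Timestamp:
        """Unique local value concatenated with the unique site id. Tuples compare
        lexicographically, so the site id only breaks ties between equal clocks."""
        self.logical_clock += 1
        return (self.logical_clock, self.site_id)

    def on_request(self, ts: Timestamp) -> None:
        """A request from a transaction with timestamp <x, y> and x > LCi
        makes Si advance LCi to x + 1."""
        x, _y = ts
        if x > self.logical_clock:
            self.logical_clock = x + 1


busy, slow = Site(site_id=1), Site(site_id=2)
for _ in range(5):
    t_busy = busy.new_timestamp()      # (5, 1) after five transactions
print(slow.new_timestamp() < t_busy)   # True: the slow site's transactions look "older"
slow.on_request(t_busy)
print(slow.new_timestamp() > t_busy)   # True: after advancing, it catches up
```

Advancing the clock only on incoming requests keeps timestamps unique while letting a lagging site catch up without any global clock.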
Deadlock Handling
Consider the following two transactions and history, with item X and
transaction T1 at site 1, and item Y and transaction T2 at site 2
[Figure: local wait-for graphs at Site 1 and Site 2, and the global wait-for graph]
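The point of the global wait-for graph is that a deadlock can be invisible to every local graph yet appear as a cycle in their union. This sketch uses hypothetical edges for the two-site scenario (the original transaction history is not reproduced here) and a standard depth-first-search cycle check.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (waiting transaction, transaction holding the lock)


def has_cycle(edges: Iterable[Edge]) -> bool:
    """Depth-first search over the wait-for graph; reaching a node that is still
    on the current DFS path (GRAY) means there is a cycle, i.e. a deadlock."""
    graph: Dict[str, Set[str]] = {}
    for waiter, holder in edges:
        graph.setdefault(waiter, set()).add(holder)
        graph.setdefault(holder, set())
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in graph}

    def visit(v: str) -> bool:
        color[v] = GRAY
        for w in graph[v]:
            if color[w] == GRAY or (color[w] == WHITE and visit(w)):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and visit(v) for v in graph)


# Hypothetical local graphs for the two-site scenario:
site1 = [("T2", "T1")]  # at site 1, T2 waits for T1's lock on X
site2 = [("T1", "T2")]  # at site 2, T1 waits for T2's lock on Y
print(has_cycle(site1), has_cycle(site2))  # False False: no local deadlock
print(has_cycle(site1 + site2))            # True: the global graph has a cycle
```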
Example Wait-For Graph for False Cycles
[Figure: initial state of the wait-for graphs at Site 1, Site 2, and the coordinator]
False Cycles (cont.)
❏ Suppose further that the insert message reaches the coordinator before the delete message; this can happen due to network delays.
❏ The coordinator would then find a false cycle.
❏ Unnecessary rollbacks may result when a deadlock has indeed occurred and a victim has been picked, while meanwhile one of the transactions was aborted for reasons unrelated to the deadlock.
❏ Unnecessary rollbacks can also result from false cycles in the global wait-for graph; however, the likelihood of false cycles is low.
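A false cycle can be reproduced by replaying the coordinator's edge updates in two different arrival orders. The initial state and the transactions T1, T2, T3 below are a hypothetical example, not the figure from the slide: T2 first releases a lock (a remove message) and then waits for T3 (an insert message). If the insert overtakes the remove, the coordinator briefly sees a cycle that never existed.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]      # (waiting transaction, transaction waited for)
Message = Tuple[str, Edge]  # ("insert" or "remove", edge)


def has_cycle(edges: Set[Edge]) -> bool:
    """Repeatedly drop edges that point at a transaction waiting for nobody;
    whatever survives must lie on (or lead into) a cycle."""
    remaining = set(edges)
    while True:
        waiting = {a for a, _ in remaining}
        removable = {(a, b) for a, b in remaining if b not in waiting}
        if not removable:
            return bool(remaining)
        remaining -= removable


def coordinator_sees_cycle(initial: Set[Edge], arrivals: List[Message]) -> bool:
    """Apply edge updates in the order they reach the coordinator and report
    whether the global wait-for graph ever contains a cycle."""
    graph = set(initial)
    for op, edge in arrivals:
        if op == "insert":
            graph.add(edge)
        else:
            graph.discard(edge)
        if has_cycle(graph):
            return True
    return False


# Hypothetical state: T1 waits for T2 (site 1) and T3 waits for T1 (site 2).
initial = {("T1", "T2"), ("T3", "T1")}
# T2 releases at site 1 (remove T1->T2), then waits for T3 (insert T2->T3).
sent = [("remove", ("T1", "T2")), ("insert", ("T2", "T3"))]
print(coordinator_sees_cycle(initial, sent))                  # False: real order
print(coordinator_sees_cycle(initial, list(reversed(sent))))  # True: false cycle
```

Only the arrival order differs between the two runs, which is why network delay alone is enough to produce a false cycle and an unnecessary rollback.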