0% found this document useful (0 votes)
7 views77 pages

Database Architecture Overview

SQL

Uploaded by

niketa
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
7 views77 pages

Database Architecture Overview

SQL

Uploaded by

niketa
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Describe different database architecture

and analyses the use of appropriate


architecture in real time environment.
❏ Introduction to Database Architectures

❏ Parallel Databases
Agenda
❏ Distributed Databases
Introduction to Database Architectures
Run on
DB System Centralized Systems
Single Computer System

Centralize Client Server do not interact with other


System System computer systems

a number of device controllers that are connected


General-purpose computer system
through a common bus that provides access to shared
memory
Single-user system desk-top unit, single user, usually
has only one CPU and one or two
hard disks; the OS may support
only one user.
Introduction to Database Architectures
more disks, more memory, multiple CPUs, and a multi-user OS.
Multi-user system Serve a large number of users who are connected to the system
vie terminals. Often called server systems.
Introduction to Database Architectures

Multi-User
Support

Contains
DB System
Two or more users Mini
Computer

Mainframe
access
Computer
Database
simultaneously
Recall: View of Data Database
Management
System

View Level
Hide details of Data types

Logical Level
Describes data stored in DB,
Relationship among the data

Physical Level
Describes how a records is stored

Unit-V
Dr. Mahesh R. Sanghavi
Introduction to Database Architectures

Client-Server Systems

Server System
Satisfy
requests generated

at General structure of Client-Server

M
Client System
Introduction to Database Architectures

One, Two, ree Tier

One Tier
Parallel System
Consists
Parallel System Coarse-grain parallel
a small number of powerful
Machine processors

Consists massively parallel Utilize


fine grain parallel thousands of smaller processors
multiple Machine
processors
multiple performance measures
memory multiple disks
throughput response time
connected by a fast
interconnection network
Speed-Up and Scale-Up

a fixed-sized problem executing on a small system is given to a system


Speed-Up
which is N-times larger.

Measured by Speedup is linear if equation


equals N.

N-times larger system used to


perform N-times larger job
increase the size of both the problem and the system
Scale-Up
Scale up is linear if equation
Measured by equals 1
Speed-Up and Scale-Up

Speed-Up
Speed-Up and Scale-Up

Scale-Up
❏ Review of Last Session

❏ Parallel Databases: Overview


Outline
❏ Parallel Database Architecture
Speed-Up and Scale-Up

a fixed-sized problem executing on a small system is given to a system


Speed-Up
which is N-times larger.

Measured by Speedup is linear if equation


equals N.

N-times larger system used to


perform N-times larger job
increase the size of both the problem and the system
Scale-Up
Scale up is linear if equation
Measured by equals 1
Factors Limiting Speedup and Scaleup

Startup costs
Speed-Up
Cost of starting up multiple processes may dominate
Scale-Up
computation time, if the degree of parallelism is high.
are

sublinear
Due to
Interference
Processes accessing shared resources compete with each
other, thus spending time waiting on other processes,
rather than performing useful work.
Skew
Increasing the degree of parallelism increases the
variance in service times of parallely executing tasks
Parallel Database Architectures

Shared- Memory Shared- Disk Shared- nothing


processors share a processors share a processors share neither a
common Memory common Disk common memory nor
common disk
Parallel Database Architectures

Hierarchical
hybrid of the above
architectures
Parallel Database Architectures

Shared- Memory
Extremely efficient communication
between processors
Processor
data in shared memory can be accessed by any
Common Memory processor without having to move it using software.
Disk via
access Bus
Downside
through
architecture is not scalable beyond 32 or 64
interconnection network.
processors since the bus or the interconnection
network becomes a bottleneck

❏ Widely used for lower degrees of parallelism (4 to 8).


Parallel Database Architectures

Shared- Disk Memory Bus

not
Processors Disk
access via
Bottleneck
have interconnection network.
Degree
Private memories
of
Provide
fault-tolerance
Architecture
Parallel Database Architectures
Cont.

Shared- Disk

Examples Downside
now part of Compaq
IBM Sysplex & bottleneck now occurs at interconnection
DEC clusters to the disk subsystem.

RDatabase

(now Oracle Rdb)


❏ Shared-disk systems can scale to a somewhat larger number of
processors, but communication between processors is slower.
Parallel Database Architectures

Shared- nothing Examples


Oracle-n CUBE
Tandem
Node

❏ Data accessed from local disks (and local memory accesses) do not
Consists pass through interconnection network, thereby minimizing the
of interference of resource sharing.
Processor ❏ Shared-nothing multiprocessors can be scaled up to
thousands of processors without interference.
Memory
❏ Main drawback: cost of communication and non-local disk
access; sending data involves software interaction at both ends.
Disk
Parallel Database Architectures

Hierarchical of Shared- Memory is


Top level architecture

Combines Shared- nothing


Shared- Disk

Characteristics ❏ nodes connected by an interconnection


Shared- nothing network, and do not share disks or
memory with each other.

❏ Reduce the complexity of programming such ❏ Each node of the system could be a
systems by distributed virtual-memory shared-memory system with a few
architectures processors.
❏ Also called non-uniform
memory architecture (NUMA)
❏ Review of Last Session

❏ Data Fragmentation
Outline
❏ Data Transparency
Distributed System

Source: [Link]
Distributed System

Data Data
Fragmentation Fragmentation
Division of
relation
R Horizontal Vertical
fragmentation fragmentation
fragments
each tuple of r is assigned the schema for relation r is
r1, r2, ..., rn
to one or more fragments split into several smaller
Contains schemas

sufficient information to reconstruct relation


R
Distributed System

Horizontal fragmentation Account


R
Distributed System

Vertical fragmentation ❏ All schemas must contain a common candidate key


(or superkey) to ensure lossless join property
Emp
R ❏ A special attribute, the tuple-id attribute may be
added to each schema to serve as a candidate key.
❏ Review of Last Session

❏ Data Transparency
Outline
❏ Distributed Database Architecture
Distributed System
Data
Fragmentation
Advantages Horizontal Vertical
fragmentation fragmentation

❏ allows parallel processing on ❏ allows parallel processing on a


fragments of a relation relation

❏ allows a relation to be split so that ❏ allows tuples to be split so that each


tuples are located where they are part of the tuple is stored where it is
most frequently accessed most frequently accessed

❏ Fragments may be successively fragmented to an arbitrary depth.


Distributed System
Data Transparency
Transparency issues
in
Degree To which Relation

System user Fragmentation transparency


May remain

unaware of the details of how Replication transparency


and
where the data items are stored
in a distributed system Location transparency
Distributed Database System

Distributed Database Architecture

Homogeneous Heterogeneous
Distributed Database System
❏ Review of Last Session

❏ Two Phase Locking Protocol


Outline
❏ Log based Recovery
Distributed System
Distributed Transactions
Site
has
Transaction
transaction manager transaction Coordinator
May access
Maintaining a log for recovery Starting the execution of
purposes transactions that originate at the
Data site.
Participating in coordinating the
Distributing subtransactions
at concurrent execution of the
at appropriate sites for
transactions executing at that site.
several sites execution.
Transaction System Architecture
Distributed System
System Failure modes

Failures unique to distributed systems Network partition


a subsystem may consist of
Failure of a site a single node

Loss of massages Handled by

TCP-IP ❏ Network partitioning and site


failures are generally
Failure of a communication link indistinguishable.
N/W
Handled by
Protocols
Commit Protocol

Commit Protocol

Used to
❏ a transaction which executes at multiple sites must either
be committed at all the sites, or aborted at all the sites.
ensure atomicity
❏ not acceptable to have a transaction committed at one site
across and aborted at another

Site
Commit Protocol

Commit Protocol

Two phase Three phase


commit protocol commit protocol

is is
more complicated and
widely used
more expensive

but avoids some drawbacks of


two-phase commit protocol
Two Phase Commit Protocol

assume
❏ Execution of the protocol is initiated by the coordinator
Fail-stop after the last step of the transaction has been reached.
model
❏ The protocol involves all the local sites at which the
transaction executed
failed
Site
simply Let T be a transaction initiated at site Si

Stop Working , and let the transaction

Do not cause any other harm coordinator at Si be Ci


Such as

sending incorrect messages to other sites.


Phase 1: Obtaining a Decision

Co-ordinator

ask adds
Ci the records <prepare T> to the log and forces log to
all participants stable storage
to sends
Prepare prepare T messages to all sites at which T executed

to
Commit
transaction
Ti
Phase 1: Obtaining a Decision

Transaction if
Manager Transaction
at Commit

Site then

add the record <ready T> to the log


If not
Determines
if
force all records for T to stable storage
add a record <no T> to
It can commit
the log and send abort send ready T message to Ci
the transaction
T message to Ci
Phase 2: Recording the Decision

T
Committed
Ci

Transaction
❏ Review of Last Session

❏ Three Phase Locking Protocol


Outline
❏ Concurrency Control
Three Phase Commit

Phase 1: Obtaining Preliminary Decision


Assumptions
Identical to 2PC Phase 1
No network partitioning
Every site is ready to commit if instructed to do so
At any point, at least one
site must be up

At most K sites (participants as


well as coordinator) can fail
Three Phase Commit

In
Phase 2 of 2PC is split into 2 phases
Phase 3
In Phase 2 and Phase 3 of 3PC

Coordinator
Phase 2 Sends

commit/abort message
Makes Coordinator to all participating sites
and
knowledge of pre-commit decision can be
a decision as in 2PC records it in used to commit despite coordinator failure
The pre-commit multiple sites
Avoids blocking problem as long as < K sites fail
decision (at least K)
Called
Concurrency Control

In We assume that each site participates in the execution of a


commit protocol to ensure global transaction atomicity.
Distributed
Environment
We assume all replicas of any item are updated
use
Modify
concurrency
control schemes
Single Lock Manager Approach
When

System It sends a transaction needs


to lock a data item
Maintains

a lock request to Si and


A Single Lock Manager

That resides in lock manager determines whether


the lock can be granted immediately
a single chosen site if
Yes
Si No lock manager sends a message to
the site which initiated the request
request is delayed until it can be granted, at which
time a message is sent to the initiating site
Single Lock Manager Approach

❏ The transaction can read the data item from anyone of the sites at
which a replica of the data item resides.

❏ Writes must be performed on all replicas of a data item

Advantages Disadvantages

★ Simple implementation ★ Bottleneck: lock manager site becomes a bottleneck

★ Simple deadlock handling ★ Vulnerability: system is vulnerable to lock manager


site failure.
Distributed Lock Manager

is
Functionality
of locking
Advantages
implemented
work is distributed and can be made robust
to failures
lock managers by
Disadvantages
at
deadlock detection is more complicated
each site

❏ Lock managers control access to local


data items
❏ But special protocols may be used
❏ Review of Last Session

❏ Biased Protocol
Outline
❏ Quorum Consensus Protocol
Distributed Lock Manager

Variant

Approaches
Majority
Primary protocol
copy

Quorum
Consensus
Biased
protocol
Distributed Lock Manager

Primary Copy ★ Site containing the replica is called the


To be the primary site for that data item

Data item ★ Different data items can have different


primary sites

of When a transaction needs to lock a data item Q, it


requests a lock at the primary site of Q

Choose one Implicitly gets lock on all replicas of the data item
replica
Distributed Lock Manager

Primary Copy

Benefits Drawbacks

Concurrency control for replicated


data handled similarly to If the primary site of Q fails, Qis
unreplicated data -simple inaccessible even though other sites
implementation. containing a replica may be accessible.
Distributed Lock Manager

Majority Protocol

❏ Local lock manager at each site administers lock and unlock requests for data
items stored at that site.

❏ When a transaction wishes to lock an unreplicated data item Qresiding at site


Si, a message is sent to Si‘s lock manager.
if then

Q is locked in an incompatible mode the request is delayed until it can be


granted.
❏ When the lock request can be granted, the lock
manager sends a message back to the initiator
indicating that the lock request has been granted.
Cont.
Distributed Lock Manager

Majority Protocol
In case of replicated data

If Q is replicated at n sites, then a lock request message


must be sent to more than half of the n sites in which
Qis stored.

The transaction does not operate on Q until it has


obtained a lock on a majority of the replicas of Q.

When writing the data item, transaction performs


writes on all replicas.
Cont.
Distributed Lock Manager

Majority Protocol

Benefits Drawbacks

Requires 2(n/2 + 1) messages for handling lock


Can be used even when some sites
requests, and (n/2 + 1) messages for handling
are unavailable
unlock requests
Potential for deadlock even with single item
details on how handle writes in
-e.g., each of 3 transactions may have locks on
the presence of site failure later
1/3rd of the replicas of a data
Distributed Lock Manager

Biased Protocol

❏ Local lock manager at each site as in majority protocol, however, requests for
shared locks are handled differently than requests for exclusive locks.

Shared locks When a transaction needs to lock data item Q, it simply requests a lock
on Q from the lock manager at one site containing a replica of Q.

Exclusive locks When transaction needs to lock data item Q, it requests a lock on Q
from the lock manager at all sites containing a replica of Q.
Benefits Drawback
imposes less overhead on
additional overhead on
read operations.
writes
Distributed Lock Manager

Quorum Consensus Protocol


choose
❏ A generalization of both majority and biased protocols Two values
❏ Each site is assigned a weight.
Be the
S

total of all site weights Read quorum Write quorum


Qr Qw
❏ Such that Qr +Qw > S and 2 *Qw> S
❏ Quorums can be chosen (and S computed)
separately for each item
Cont. Distributed Lock Manager

Quorum Consensus Protocol

❏ Each read must lock enough replicas that the sum of the site weights is >= Qr

❏ Each write must lock enough replicas that the sum of the site weights is >= Qw

For now we assume all replicas are written

Extensions to allow some sites to be unavailable described later


Timestamp

based
TimeStamp Each Transaction
Concurrency-Control Must be given
Protocol
Unique

Used in TimeStamp

Distributed Systems
Timestamp
How to generate
timestamp

❏ Each site generates a unique local timestamp


using either a logical counter or the local clock.
in
❏ Global unique timestamp is obtained by
concatenating the unique local timestamp with the
Distributed Fashion
unique identifier.
Timestamp

with To fix this problem


A Site
❏ Define within each site Si a logical clock(LCi),
Slow which generates the unique local timestamp
Clock
❏ Require that Si advance its logical clock
whenever a request is received from a
Smaller Will assign transaction Ti with timestamp < x,y> and x is
TimeStamp greater that the current value of LCi.

Still logically correct: ❏ In this case, site Si advances its logical clock to
serializability not affected the value x+ 1.

But: “disadvantages”
transactions
Deadlock Handling

Consider the following two transactions and history, with item X and
transaction T1 at site 1, and item Y and transaction T2 at site 2

Result: deadlock which


cannot be detected locally
at either site
Deadlock Handling

Consider the following two transactions and history, with item X and
transaction T1 at site 1, and item Y and transaction T2 at site 2

Result: deadlock which


cannot be detected locally
at either site
Centralized approach

❏ A global wait-for graph is constructed and maintained in a single site; the


deadlock-detection coordinator
Real graph
Real, but unknown, state of the system.

Constructed graph Approximation generated by the controller during the


execution of its algorithm
the global wait-for graph can be constructed when
if the coordinator finds a cycle, it
❏ a new edge is inserted in or removed from one of selects a victim and notifies all sites.
the local wait-for graphs The sites roll back the victim
transaction.
❏ a number of changes have occurred in a local
wait-for graph.
❏ the coordinator needs to invoke cycle-detection.
Local and Global Wait-For Graphs

Site 1 Site 2

Global
Example Wait-For Graph for False Cycles

Initial State

Site 1
Site 2

Coordinator
Cont. False Cycles

Suppose that starting from the state shown in figure,


1. T2 releases resources at S1

2. then T2 requests a resource held by T3 at site S2

Suppose further that the insert message reaches before the delete message
❏ The coordinator would then find a false cycle this can happen due to network delays

❏ The false cycle above never existed in reality.


❏ False cycles cannot occur if two-phase locking is
used
Unnecessary rollback

Unnecessary rollbacks may result when deadlock has indeed occurred and a victim has
been picked, and meanwhile one of the transactions was aborted for reasons unrelated
to the deadlock.

Unnecessary rollbacks can result from false cycles in the global wait-for graph; however,
likelihood of false cycles is low.

You might also like