Distributed Database Concepts Explained

The document discusses distributed databases (DDB) and their management systems (DDBMS), emphasizing concepts such as data fragmentation, sharding, replication, and query processing. It explains horizontal and vertical fragmentation methods for distributing data across sites, as well as the complexities involved in query optimization and data allocation. Additionally, it categorizes types of distributed database systems based on homogeneity and local autonomy, highlighting federated and multidatabase systems.

Uploaded by

Abenezer Daniel

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

6 views19 pages

Distributed Database Concepts Explained

Uploaded by

Abenezer Daniel

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

Chapter Two

Distributed Database
Concepts
Distributed Database Concepts
A distributed database (DDB) is a collection of
multiple logically interrelated databases distributed
over a computer network.
A distributed database management system
(DDBMS) is a software system that manages a
distributed database while making the distribution
transparent to the user.
Data Fragmentation and Sharding
 The simplest logical units are the relations
themselves; that is, each whole relation is to be
stored at a particular site.
 For example, we must decide on a site to store
each of the relations EMPLOYEE, DEPARTMENT,
PROJECT, WORKS_ON, and DEPENDENT in
COMPANY DATABASE.
 In many cases, however, a relation can be divided
into smaller logical units for distribution.
 For example, assume that there are three computer
sites—one for each department in the company.
 We may want to store the database information
relating to each department at the computer site
for that department.
Cont ..
A horizontal fragment or shard of a relation is a
subset of the tuples in that relation.
 The tuples that belong to the horizontal fragment
can be specified by a condition on one or more
attributes of the relation, or by some other
mechanism.
 Often, only a single attribute is involved in the
condition.
 For example, we may define three horizontal
fragments on the EMPLOYEE with the following
conditions: (Dno = 5), (Dno = 4), and (Dno = 1)—
each fragment contains the EMPLOYEE tuples
working for a particular department.
 Derived horizontal fragmentation applies the
partitioning of a primary relation (DEPARTMENT in
Cont ..
 Horizontal fragmentation divides a relation
horizontally by grouping rows to create subsets of
tuples, where each subset has a certain logical
meaning.
 These fragments can then be assigned to different
sites (nodes) in the distributed system.
 Each site may not need all the attributes of a
relation, which would indicate the need for a
different type of fragmentation.
 Vertical fragmentation divides a relation “vertically”
by columns.
 A vertical fragment of a relation keeps only certain
attributes of the relation.
 For example, we may want to fragment the
EMPLOYEE relation into two vertical fragments.
Cont ..
 This vertical fragmentation is not quite proper,
because if the two fragments are stored separately,
we cannot put the original employee tuples back
together since there is no common attribute
between the two fragments.
 It is necessary to include the primary key or some
unique key attribute in every vertical fragment so
that the full relation can be reconstructed from the
fragments.
 Hence, we must add the Ssn attribute to the
personal information fragment.
 We can intermix the two types of fragmentation,
yielding a mixed fragmentation.
 For example, we may combine the horizontal and
vertical fragmentations of the EMPLOYEE relation
Cont ..
The following figures show horizontal and vertical
fragments from primary relation-departement and
secondary relation-employee, for departement 5 and
4, which can be stored at different sites.
Data Replication and Allocation
 If a fragment is stored at more than one site, it is
said to be replicated.
 Each fragment—or each copy of a fragment—must
be assigned to a particular site in the distributed
system.
 This process is called data distribution (or data
allocation).
 The choice of sites and the degree of replication
depend on the performance and availability goals
of the system and on the types and frequencies of
transactions submitted at each site.
 For example, if high availability is required,
transactions can be submitted at any site, and
most transactions are retrieval only, a fully
replicated database is a good choice.
Cont ..
 Data that is accessed at multiple sites can be
replicated at those sites.
 If many updates are performed, it may be useful to
limit replication.
 Finding an optimal or even a good solution to
distributed data allocation is a complex
optimization problem.
 The next figure shows one example of data
distribution and replication among distributed
databases based on the company database.
Cont ..
Query Processing and Optimization
in Distributed Databases
 In a distributed system, several additional factors
further complicate query processing.
 The first is the cost of transferring data over the
network.
 This data includes intermediate files that are
transferred to other sites for further processing, as
well as the final result files that may have to be
transferred to the site where the query result is
needed.
 Although these costs may not be very high if the
sites are connected via a high-performance local
area network, they become significant in other
types of networks.
 Hence, DDBMS query optimization algorithms
Cont ..
 For example, suppose that the EMPLOYEE (10,000
records, 100 bytes long each) and DEPARTMENT
(100 records, 35 bytes long each) relations are
distributed at site one and two resepectively.
 We will assume in this example that neither relation
is fragmented.
 The size of the EMPLOYEE relation is 100 * 10,000 =
106 bytes, and the size of the DEPARTMENT relation
is 35 * 100 = 3,500 bytes.
 Consider the query Q: For each employee, retrieve
the employee name and the name of the
department for which the employee works.
 This can be stated as follows in the relational
algebra:
Cont ..
 The result of this query will include 10,000 records,
assuming that every employee is related to a
department.
 Suppose that each record in the query result is 40
bytes long.
 The query is submitted at a distinct site 3, which is
called the result site because the query result is
needed there.
 Neither the EMPLOYEE nor the DEPARTMENT
relations reside at site 3.
 There are three simple strategies for executing this
distributed query:
 1. Transfer both the EMPLOYEE and the
DEPARTMENT relations to the result site, and
perform the join at site 3.
Cont ..
 2. Transfer the EMPLOYEE relation to site 2, execute
the join at site 2, and send the result to site 3.
 The size of the query result is 40 * 10,000 =
400,000 bytes, so 400,000 + 1,000,000 =
1,400,000 bytes must be transferred.
 3. Transfer the DEPARTMENT relation to site 1,
execute the join at site 1, and send the result to
site 3. In this case, 400,000 + 3,500 = 403,500
bytes must be transferred.
 If minimizing the amount of data transfer is our
optimization criterion, we should choose strategy 3.
 With distributed query processing using the
semijoin operation, we can reduce the number of
tuples in a relation before transferring it to another
site.
Cont ..
 Intuitively,the idea is:
 1. Project the join attributes of one relation R
(DEPARTMENT in our example) at site 2 and send
the joining column of R (DEPARTMENT) to site 1
where the other relation S (EMPLOYEE in our
example) is located;
 2. This transferred column file is then joined with S
(EMPLOYEE) at site 1.
 Following that, the join attributes, along with the
attributes required in the result, are projected out
and transferred back to site 2.
 3. Join the transferred file with R (DEPARTMENT)
and present the result to the user at site 2.
Cont ..
 Hence, only the joining column of R (DEPARTMENT)
is transferred in one direction, and
 A subset of S (EMPLOYEE) with no extraneous
tuples or attributes is transferred in the other
direction.
 We limited the EMPLOYEE attributes and tuples
transmitted to site 2 in step 2 to only those that will
actually be joined with a DEPARTMENT tuple in step
3.
 For query Q, this turned out to include all
EMPLOYEE tuples, so little improvement was
achieved.
 If only a small fraction of the tuples in S
(EMPLOYEE) participate in the join, this can be an
efficient solution to minimizing data transfer.
Types of Distributed Database Systems
 There are criteria and factors that make DDBMSs
different.
 The first factor is the degree of homogeneity of the
DDBMS software.
 If all servers (or individual local DBMSs) use
identical software and all users (clients) use
identical software, the DDBMS is called
homogeneous; otherwise, it is called
heterogeneous.
 Another factor related to the degree of
homogeneity is the degree of local autonomy.
 If there is no provision for the local site to function
as a standalone DBMS, then the system has no
local autonomy.
 On the other hand, if direct access by local
Cont ..
 For a centralized database, there is complete
autonomy but a total lack of distribution and
heterogeneity.
 But, the degree of local autonomy provides further
ground for classification into federated and
multidatabase systems.
 In such systems, each server is an independent and
autonomous centralized DBMS that has its own
local users, local transactions, and DBA, and hence
has a very high degree of local autonomy.
 The term federated database system (FDBS) is
used when there is some global view or schema of
the federation of databases that is shared by the
applications.
 On the other hand, a multidatabase system has full
Cont ..
 Both systems are hybrids between distributed and
centralized systems, and the distinction we made
between them is not strictly followed.
 We will refer to them as FDBSs in a generic sense.
 There is also a system with full local autonomy and
full heterogeneity—this could be a peer-to-peer
database system.
 In a heterogeneous FDBS, one server may be a
relational DBMS, another a network DBMS (such as
Computer Associates’ IDMS or HP’S IMAGE/3000),
and a third an object DBMS (such as Object
Design’s ObjectStore) or hierarchical DBMS (such as
IBM’s IMS);
 In such a case, it is necessary to have a canonical
system language and to include language

Understanding Distributed Databases
No ratings yet
Understanding Distributed Databases
36 pages
Distributed Database Management Overview
No ratings yet
Distributed Database Management Overview
36 pages
Distributed Database Systems Overview
No ratings yet
Distributed Database Systems Overview
35 pages
Distributed Databases Overview and Concepts
No ratings yet
Distributed Databases Overview and Concepts
37 pages
Distributed Databases Overview and Architecture
No ratings yet
Distributed Databases Overview and Architecture
29 pages
Distributed Databases and Architectures
No ratings yet
Distributed Databases and Architectures
27 pages
Advantages and Disadvantages of DDBMS
No ratings yet
Advantages and Disadvantages of DDBMS
26 pages
Understanding Distributed Databases
No ratings yet
Understanding Distributed Databases
22 pages
Understanding Distributed Database Systems
No ratings yet
Understanding Distributed Database Systems
42 pages
Understanding Distributed Databases
No ratings yet
Understanding Distributed Databases
30 pages
Data Allocation in Distributed Databases
No ratings yet
Data Allocation in Distributed Databases
45 pages
Overview of Distributed Database Concepts
No ratings yet
Overview of Distributed Database Concepts
52 pages
Spatial and Temporal Database Concepts
No ratings yet
Spatial and Temporal Database Concepts
16 pages
Chapter 6-Distributed Database System
No ratings yet
Chapter 6-Distributed Database System
32 pages
Understanding DDB Replication and Fragmentation
No ratings yet
Understanding DDB Replication and Fragmentation
67 pages
Distributed Database Architecture Overview
No ratings yet
Distributed Database Architecture Overview
27 pages
Linking Data in Distributed Databases
No ratings yet
Linking Data in Distributed Databases
41 pages
Understanding Distributed Databases
No ratings yet
Understanding Distributed Databases
42 pages
Overview of Distributed Database Systems
No ratings yet
Overview of Distributed Database Systems
14 pages
Overview of Distributed Database Systems
No ratings yet
Overview of Distributed Database Systems
30 pages
Understanding Distributed Databases
No ratings yet
Understanding Distributed Databases
14 pages
Overview of Distributed Databases
No ratings yet
Overview of Distributed Databases
28 pages
Overview of Distributed DBMS Concepts
No ratings yet
Overview of Distributed DBMS Concepts
6 pages
UC Irvine's Distributed Database Systems
No ratings yet
UC Irvine's Distributed Database Systems
67 pages
Unit 5 - 1
No ratings yet
Unit 5 - 1
7 pages
Understanding Distributed Databases
No ratings yet
Understanding Distributed Databases
88 pages
F Edit Chapter 8
No ratings yet
F Edit Chapter 8
17 pages
Distributed Database System Overview
No ratings yet
Distributed Database System Overview
46 pages
Advanced Distributed Databases Overview
No ratings yet
Advanced Distributed Databases Overview
22 pages
Distributed Database Management Techniques
No ratings yet
Distributed Database Management Techniques
58 pages
Overview of Distributed Database Systems
No ratings yet
Overview of Distributed Database Systems
11 pages
Distributed Database Concepts
No ratings yet
Distributed Database Concepts
35 pages
Adobe Scan 02 Dec 2023
No ratings yet
Adobe Scan 02 Dec 2023
27 pages
Distributed Databases and Client-Server Architectures
No ratings yet
Distributed Databases and Client-Server Architectures
41 pages
Centralized vs Distributed Databases
No ratings yet
Centralized vs Distributed Databases
29 pages
Overview of Distributed Database Systems
No ratings yet
Overview of Distributed Database Systems
9 pages
Benefits of Data Replication & Partitioning
No ratings yet
Benefits of Data Replication & Partitioning
33 pages
Advantages and Disadvantages of DDBs
No ratings yet
Advantages and Disadvantages of DDBs
46 pages
Understanding Distributed Database Systems
No ratings yet
Understanding Distributed Database Systems
39 pages
Distributed Databases Overview and Concepts
No ratings yet
Distributed Databases Overview and Concepts
28 pages
Overview of Distributed Database Systems
No ratings yet
Overview of Distributed Database Systems
66 pages
Distributed Database System Overview
No ratings yet
Distributed Database System Overview
23 pages
Distributed Database Its Types, Fragmentation and Replication
No ratings yet
Distributed Database Its Types, Fragmentation and Replication
7 pages
Understanding Distributed Database Systems
No ratings yet
Understanding Distributed Database Systems
27 pages
Understanding Distributed Databases
No ratings yet
Understanding Distributed Databases
24 pages
Understanding Distributed Databases
No ratings yet
Understanding Distributed Databases
9 pages
Overview of Distributed Database Systems
100% (1)
Overview of Distributed Database Systems
24 pages
Types and Design of Distributed Databases
No ratings yet
Types and Design of Distributed Databases
22 pages
Chapter 6 DDBMS
No ratings yet
Chapter 6 DDBMS
41 pages
Understanding Distributed Database Systems
No ratings yet
Understanding Distributed Database Systems
24 pages
Chapter 6 (Autosaved)
No ratings yet
Chapter 6 (Autosaved)
52 pages
Overview of Distributed Database Systems
No ratings yet
Overview of Distributed Database Systems
12 pages
Distributed Database Management Design
No ratings yet
Distributed Database Management Design
37 pages
Overview of Distributed Database Systems
No ratings yet
Overview of Distributed Database Systems
21 pages
Distributed Database Management Systems
No ratings yet
Distributed Database Management Systems
67 pages
Advanced Databases Assignment Questions
No ratings yet
Advanced Databases Assignment Questions
34 pages
Understanding Distributed Databases
No ratings yet
Understanding Distributed Databases
15 pages
Cost Accounting Overview and Practices
No ratings yet
Cost Accounting Overview and Practices
169 pages
SPPC2000 Security Solutions Overview
No ratings yet
SPPC2000 Security Solutions Overview
9 pages
Corruption as Economic Grease Explained
No ratings yet
Corruption as Economic Grease Explained
10 pages
Kathmandu-Nagarkot-Pokhara Tour Package
No ratings yet
Kathmandu-Nagarkot-Pokhara Tour Package
3 pages
QR Code - Wikipedia
No ratings yet
QR Code - Wikipedia
19 pages
Community Health Nursing Overview 2024
No ratings yet
Community Health Nursing Overview 2024
120 pages
India NIRF 2025 Agriculture Rankings
No ratings yet
India NIRF 2025 Agriculture Rankings
25 pages
GSRTC E-Ticket for Gandhidham to Rajkot
No ratings yet
GSRTC E-Ticket for Gandhidham to Rajkot
1 page
Welcome To The Forest: Author: Bhavna Menon Illustrator: Kavita Singh Kale
No ratings yet
Welcome To The Forest: Author: Bhavna Menon Illustrator: Kavita Singh Kale
25 pages
ICT History Timeline in the Philippines
No ratings yet
ICT History Timeline in the Philippines
9 pages
Bayonetta 3 Cemu Shader Cache Setup
No ratings yet
Bayonetta 3 Cemu Shader Cache Setup
1 page
Understanding Amines: Types and Properties
No ratings yet
Understanding Amines: Types and Properties
25 pages
Cho Thuê Phòng Trọ Giá Rẻ TPHCM
No ratings yet
Cho Thuê Phòng Trọ Giá Rẻ TPHCM
93 pages
Hecht Lawn Mower Parts List
No ratings yet
Hecht Lawn Mower Parts List
7 pages
Additional Premium Debit Note Details
No ratings yet
Additional Premium Debit Note Details
1 page
Document Formatting and Styling Guide
No ratings yet
Document Formatting and Styling Guide
2 pages
RBI Moratorium Impact on Indian Banks
No ratings yet
RBI Moratorium Impact on Indian Banks
6 pages
Lahti Precision Belt Scales Overview
No ratings yet
Lahti Precision Belt Scales Overview
8 pages
Recovery Radio Schedule & Events Guide
No ratings yet
Recovery Radio Schedule & Events Guide
1 page
SAP SD Item Proposal Overview
No ratings yet
SAP SD Item Proposal Overview
5 pages
Mallya Extradition Case Judgment 2020
No ratings yet
Mallya Extradition Case Judgment 2020
46 pages
Defining Dishonesty in Employment
100% (1)
Defining Dishonesty in Employment
5 pages
Ruby Program Structure and Execution
No ratings yet
Ruby Program Structure and Execution
34 pages
Three-Phase Power Measurement Methods
No ratings yet
Three-Phase Power Measurement Methods
26 pages
Hazard Identification Reporting Guide
No ratings yet
Hazard Identification Reporting Guide
10 pages
Understanding the SPC-700 Sound Chip
No ratings yet
Understanding the SPC-700 Sound Chip
10 pages
Conversation Analysis and Early Childhood Education: The Co-Production of Knowledge and Relationships
100% (4)
Conversation Analysis and Early Childhood Education: The Co-Production of Knowledge and Relationships
15 pages
General Symbology for Construction Plans
No ratings yet
General Symbology for Construction Plans
1 page
Ferroptosis in Gastric Cancer Review
No ratings yet
Ferroptosis in Gastric Cancer Review
34 pages
User Acceptance Testing Expertise in SAP
No ratings yet
User Acceptance Testing Expertise in SAP
5 pages

Distributed Database Concepts Explained

Uploaded by

Distributed Database Concepts Explained

Uploaded by

Chapter Two

You might also like