0% found this document useful (0 votes)

9 views47 pages

Distributed Query Optimization Techniques

ddb unit 2

Uploaded by

nomulaashok0609

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

9 views47 pages

Distributed Query Optimization Techniques

ddb unit 2

Uploaded by

nomulaashok0609

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

Distributed query Optimization

• Query Optimization
• Centralized Query Optimization
• Join Ordering
• Distributed Query Optimization Algorithm
Query Optimization

• Query optimization is a crucial and difficult part of the overall query processing.
Objective of query optimization is a minimize the cost function: I/O cost + CPU
cost + Communication cost.

• Query optimization tries to minimize the response time for a given query
language and max of query types in a given system environment.

• Activity of choosing an efficient execution strategy for processing query. As there

are many equivalent transformations of same high-level query, aim of query
optimization is to choose one that minimizes resource usage.
Query Optimization
(Cont.)
Components of Distributed Query Optimization
1. Access Method
• Table can be accessed by 2 different ways:
a) Complete scanning the entire table
b) Using an index
2. Join Criteria
• If more than one table is accessed, the manner in which they are to be
joined together must be determined. Usually will provide several
different methods of joining tables.
3. Transmission Costs
• If data from multiple sites must be joined to satisfy a single query, then the
cost of transmitting the results from intermediate steps needs to
be factored into the equation.
Search
Space
Search
Space
• Query execution plans are typically abstracted by means of
operator trees which define the order in which the operations are
executed.
• Search space characterized by alternative execution
• Focus on join trees

• For N relations, there are @N( equivalent join trees that can
be obtained by applying commutatively and associatively rules
Search Space
(Cont.)
Query optimizers typically restrict the size of the search space they
consider.
The first restriction is to use heuristics.

The most common heuristic is to perform selection and projection when

accessing base relations.
Perform unary operations before binary operations.
The second restriction on shape of the join tree.

A linear tree is a tree such that at least one operand of each operator node is a base relation.

Abushy tree is more general and may have operators with no base
relations as operands
Search Space
(Cont.)
Search Strategy
• How to “move” in the search space.
• Deterministic algorithm
• Start from base relations and build plans by adding one relation at each step
• Dynamic programming: breadth-first
• Greedy: depth-first
• Small relations

Randomized algorithm
 Search for optimality s around a particular starting point Do not guarantee that the best
solution is obtained
 But avoid the high cost of optimization (terms of memory and time) Trade
optimization time for execution time
 Better when > 10 relations
Search Strategy (Cont.)

• Deterministic

R1 R1 R1

Step
1
Search Strategy (Cont.)

• Randomized

R, R2 R R3
1
Cost Functions

Distribution Cost Model

• Total Time (or Total Cost)
• Reduce each cost (in terms of time) component individually
• Do as little of each cost component as possible
• Optimizes the utilization of the resources

Increases system throughput

Distribution Cost Model
(Cont.)
• Local Processing Time
• CPU time and I/O time

• Communication Time
• Fixed time to initiate a message and Time to transmit the data

• Response Time
• Do as many things as possible in parallel
• May increase total time because of increased total activity
Cost based Query Optimization for Distributed Database

• Summation of all cost factor

• Total cost= CPU cost+1/0 cost+ communication cost
• CPU cost = unit instruction cost * no. of instructions
• 1/0 cost = unit disk 1/0 cost * no. of disk I/0s
• communication cost= message initiation+ transmission
• Response time = CPU time + I/O time + Communication time
• CPU time = unit instruction time * number of sequential instructions
• I/O time = unit I/O time * number of sequential I/0s

• Communication time = (unit message initiation time * number of sequential

message) + (unit transmission time * number of sequential bytes)
Steps in a Cost-based Query Optimization
1. Parsing
2. Transformation
3. Implementation
4. Plan selection based on cost estimates
Steps in a Cost-based Query Optimization
(Cont.)

• Query parser:
• Verify validity of the SQL statement. Translate query into an internal structure
using relational calculus.

• Query Optimizer:
• Find the best expression from various different algebraic expressions. Criteria
used is “ Cheapness ”.

• Code generator:
• Make calls for the query processor as a result of the work done by the
optimizer.
Steps in a Cost-based Query Optimization(Cont.)
• Query processor:
• Execute the calls obtained from the code generator

• When there is an interactive user query, the query goes through the query
parser, query optimizer, code generator and query processor each time.
• Embedded query:
• When there is an embedded query, the query does not have to through
the query parser, query optimizer, code generator, and the query processor
each time. In an embedded query, the calls generated by the code generator
are stored in the database.
Total Cost Factors

• Wide area network

• Message initiation and transmission costs high

• Local processing cost is law (fast mainframes or minicomputers)

• Ratio of communication to I/O costs= 20:1

• Local area networks

• Communication and local processing casts are more or lex equal

• Ratio= 1:1.6
Centralized Query Optimization
• In this section we present the main query optimization techniques for centralized

• systems. This presentation is a prerequisite to understanding distributed

query optimization
• for three reasons. First, a distributed query is translated into local queries,

• each of which is processed in a centralized way. Second, distributed query

optimization
• techniques are often extensions of the techniques for centralized systems.
• Finally, centralized query optimization is a simpler problem; the minimization of
• communication costs makes distributed query optimization more complex.
• the optimization timing, which can be dynamic, static
Centralized Query Optimization (Cont.)

• In a centralized system, query processing is done with the following

aim —
• Minimization of response time of query (time taken to produce the
results to user's query).
• Maximize system throughput (the number of requests that are
processed in a given amount of time).
• Reduce the amount of memory and storage required for processing.
• Increase parallelism.
Centralized Query Optimization (Cont.)

• Centralized query optimization has divide into 3 approaches:

• Static Query Optimization
• Dynamic Query Optimization
• Hybrid Query Optimization
Dynamic Query Optimization
• Dynamic query optimization combines the two phases of
query decomposition and

• optimization with execution. The QEP is dynamically constructed

by the query

• optimizer which makes calls to the DBMS execution engine

for executing the query's
• operations. Thus, there is no need for a cost model.

• The most popular dynamic query optimization algorithm is that

of INGRES one of the first relational DBMS.
Dynamic Query Optimization (Cont.)

• The algorithm recursively breaks up a query expressed in relational

calculus (i.e., SQL) into smaller pieces which are executed along the
way. The query is first decomposed into a sequence of queries having
a unique relation in common. Then each monorelation query is
processed by selecting, based on the predicate, the best access
method to that relation (e.g., index, sequential scan).
• Advantage
• All information required to select an optimum strategy is up to date

• Disadvantage
• The performance of the query is affected because the query has to be parsed, validated and
optimized before it can be executed.
Dynamic Query Optimization (Cont.)

INGRES Algorithm:
• Que.
• To illustrate the detachment technique, we apply it to the following
query: “Names of employees working on the CAD/CAM project”
• Ans.
• This query can be expressed in SQL by the following query q1 on
the engineering database
Dynamic Query Optimization (Cont.)

• After detachment of the selections, query q1 is replaced by q11

followed by q* where JVAR is an intermediate relation.

• The successive detachments of q” may generate

Dynamic Query Optimization (Cont.)

• Note that other sub queries are also possible.

• Thus query q1 has been reduced to the subsequent queries q11 -> q12 -
> q13. Query q11 is monorelation and can be executed. However,
q12 and q13 are not monorelation and cannot be reduced by detachment.
Static Query Optimization

• Simple (i.e., mono-relation) queries are executed according to the best access
path
• Execute joins
• Determine the possible ordering of joins
• Determine the cost of each ordering
• Choose the join ordering with minimal cost
Static Query Optimization (Cont.)

• For joins, two alternative algorithms :

• Nested loops
for each tuple of external relation (cardinality n,)
for each tuple of erno/ relation (cardinality n2) join two tuples if the
join predicate is true
end end
• Complexity: n,* n2
• Merge join
sort relations
merge relations
Static Query Optimization (Cont.)

• Names of employees working on the CAD/CAM project

• Assume
• EMP has an index on ENO,
• ASG has an index on PNO,
• PROJ hasan index on PNO and an index on PNAME
Hybrid Query Optimization

• It is mixer of static and dynamic approaches; the approach is mainly

static, but dynamic query optimization may take place when high
difference between predicated and actual sizes is detected
Join Ordering in Distributed Queries

• The distributed join is a query operator that combines two relations

stored at different sites in the following way: each tuple from the first
relation is concatenated with each tuple from the second relation that
satisfies a given join condition, e.g., the match in two attributes.

• The main characteristics of a distributed join is that at least one of the

operand relations has to be transferred to another site.
Join Ordering in Distributed Queries (Cont.)
• Two basic approaches exist to order joins in distributed queries. One tries to
optimize the ordering of joins directly, whereas the other replaces joins
by combinations of semi joins in order to minimize communication costs.
• Let us first concentrate on the simple problem of operand transfer in a single join.
Consider 2 relations only.

If size(R) < Size(S)

If size(R) > Size(S)

Join Ordering in Distributed Queries (Cont.)

• The query is R l>1 S, where R and S are relations stored at diPerent sites. The
obvious choice of the relation to transfer is to send the smaller relation to the site of the
larger one.
• Multiple relations more difficult because too many alternatives.

• Compute the cost of all alternatives and select the best one.

• Necessary to compute the size of intermediate relations which is difficult.

• Use heuristics
Join Ordering- Example
Join Ordering- Example (Cont.)

• Note that we have made certain assumptions about the locations of the
three relations. This query can be executed in at least five different
ways. We describe these strategies by the following programs,
where (R Osite j) stands for “relation R is transferred to site j.”
Join Ordering- Example (Cont.) Site
2
ASG

EN PNO
O

Site 1 PROJ Site 3

EMP
Join Ordering- Example (Cont.)
• To select one of these programs, the following sizes must be known or predicted:
size(EMP), size(ASG), size(PROJ), size(EMP ASG), and size(ASG PROJ).
Furthermore, if it is the response time that is being considered, the optimization
must take into account the fact that transfers can be done in parallel with strategy
5.

• An alternative to enumerating all the solutions is to use heuristics that consider

only the sizes of the operand relations by assuming, for example, that the
cardinality of the resulting join is the product of operand cardinalities. In this
case, relations are ordered by increasing sizes and the order of execution is given by
this ordering and the join graph. For instance, the order (EMP, ASG, PROJ) could
use strategy 1, while the order (PROJ, ASG, EMP) could use strategy 4.
Semijoi-based Algorithm
• In this section we show how the semijoin operation can be used to decrease
the total time of join queries. The main shortcoming of the join
approach described in the preceding section is that entire operand
relations must be transferred between sites. The semijoin acts as a size
reducer for a relation much as a selection does.

• The join of two relations R and S over attribute A, stored at sites 1 and
2, respectively, can be computed by replacing one or both operand
relations by a semijoin with the other relation, using the following rules:
Semijoi-based Algorithm (Cont.)

• Consider the following relations, where attribute CITY has been added
to relations EMP (renamed ET), PROJ (renamed PT) and ASG
(renamed AT) of the engineering database. Attribute CITY of AT
corresponds to the city where the employee identified by ENO lives.
Semijoi-based Algorithm (Cont.)

• Transformation of Cyclic Query

Join Vs Semijoin Approaches
Distributed Query Optimization

• Distributed query optimization requires evaluation of a large number of

query trees each of which produce the required results of a query. This is
primarily due to the presence of large amount of replicated and fragmented
data. Hence, the target is to find an optimal solution instead of the best
solution.
• The main issues for distributed query optimization are —
• Optimal utilization of resources in the distributed system.
• Query trading.
• Reduction of solution space of the query.
Distributed Query Optimization (Cont.)
1. The optimization timing is dynamic for distributed INFRES, while it is static for
the others.

2. The objective function of SDD-1 and R* is to minimize total time, while

distributed INGRES aims at decreasing a combination of response time and
total time.

3. The optimization factors of the cost function are the message size for SDD-1.
system R*, which takes into account local processing time, uses message size,
number of message , I/O and CPU costs. Distributed INGRES considers both
message size and local processing time.

4. The network topology is assumed to be a wide area point-to-point network by

SDD-1. Distributed INGRES and R* can work in both local and wide area
networks.
Distributed Query Optimization
(Cont.)
5. The use of semi joins as a query optimization technique is employed by SDD-1.
Distributed INGRES and R* perform joins in a fashion similar to that of
centralized query optimization algorithms of their counterparts: INGRES and
system R.

6. Each algorithm assumes statistical information about the data. Semijoi

algorithms typically use more information.

7. INGRES can handle fragments.

SDD-1 Algorithm
• Refinements of an initial feasible solution are recursively computed until
no more cost improvement can be made. It is devised for wide area point
to point network.
• The cost of transferring the result to final site is ignored.

• It is based on hill climbing algorithm. It's a greedy approach it finds the

local minimum and iteratively tries to improve it. It may not reach global
solution all the time.
SDD-1 Algorithm
(Cont.)

• SDD-1 algorithm has 3 important characteristics as follows:

• It uses the semi-join operation to handle strategy.
• The relationship of the whole sites is not repetitive and fragmented.
• During price estimation of the whole algorithm, the transmission cost to
the starting site is not calculated.
SDD-1 Algorithm (Cont.)

• The basic algorithm is as follows:

1. Evaluate the income of all semi-join programs in the query graph.

2. Choose the semi-join with maximum income, execute it and recount the
income of all the semi-join.
3. Execute step 2 circularly until all of the semi-join value of whose income is
greater than 0 has been executed.

4. Select the site with minimum communication cost as the last executing
site. Transmit all the relationship to the site and connect them to got the ultimate
result.

Layers of Query Processing in DBMS
No ratings yet
Layers of Query Processing in DBMS
24 pages
Survey of Distributed Query Optimization
No ratings yet
Survey of Distributed Query Optimization
10 pages
Optimization of Distributed Queries
No ratings yet
Optimization of Distributed Queries
45 pages
Distributed Query Optimization Techniques
No ratings yet
Distributed Query Optimization Techniques
4 pages
Distributed Query Optimization Strategies
86% (7)
Distributed Query Optimization Strategies
48 pages
Distributed Query Processing Strategies
No ratings yet
Distributed Query Processing Strategies
15 pages
Query Optimization in Distributed Databases
No ratings yet
Query Optimization in Distributed Databases
7 pages
Distributed Database Query Optimization
No ratings yet
Distributed Database Query Optimization
106 pages
Query Optimization in DDBMS
No ratings yet
Query Optimization in DDBMS
7 pages
Query Processing in Distributed Databases
No ratings yet
Query Processing in Distributed Databases
48 pages
ADB CH 2 Query Processing and Optimization STV 2 1
No ratings yet
ADB CH 2 Query Processing and Optimization STV 2 1
56 pages
Query Processing in Advanced Databases
No ratings yet
Query Processing in Advanced Databases
27 pages
Distributed Query Optimization Factors
No ratings yet
Distributed Query Optimization Factors
10 pages
Query Processing Layers and Characterization
No ratings yet
Query Processing Layers and Characterization
28 pages
Distributed Query Processing Overview
No ratings yet
Distributed Query Processing Overview
8 pages
Query Decomposition in CSE 453
No ratings yet
Query Decomposition in CSE 453
72 pages
7 ddb07
No ratings yet
7 ddb07
46 pages
Lecture 4 Query Processing
No ratings yet
Lecture 4 Query Processing
18 pages
Query Processing and Optimization Guide
No ratings yet
Query Processing and Optimization Guide
21 pages
Optimizing Queries in Distributed Databases
No ratings yet
Optimizing Queries in Distributed Databases
4 pages
C4 - Decomposing Global Queries Into Fragment Queries
No ratings yet
C4 - Decomposing Global Queries Into Fragment Queries
78 pages
Query Processing in Database Systems
No ratings yet
Query Processing in Database Systems
5 pages
Understanding Distributed Transactions
No ratings yet
Understanding Distributed Transactions
17 pages
Lecture 7 - Query Processing
No ratings yet
Lecture 7 - Query Processing
26 pages
RDBMS Query Optimization Techniques
No ratings yet
RDBMS Query Optimization Techniques
11 pages
Database Query Optimization Techniques
No ratings yet
Database Query Optimization Techniques
53 pages
Heuristic Query Optimization Techniques
No ratings yet
Heuristic Query Optimization Techniques
6 pages
Query Processing and Optimization Guide
No ratings yet
Query Processing and Optimization Guide
11 pages
Query Processing and Optimization Overview
No ratings yet
Query Processing and Optimization Overview
28 pages
Query Processing and Optimization Techniques
No ratings yet
Query Processing and Optimization Techniques
30 pages
Query Optimization in RDBMS Explained
No ratings yet
Query Optimization in RDBMS Explained
12 pages
Query Processing in Distributed Databases
No ratings yet
Query Processing in Distributed Databases
11 pages
Query Processing and Optimization Overview
No ratings yet
Query Processing and Optimization Overview
18 pages
Adbms Unit1
No ratings yet
Adbms Unit1
16 pages
Unit-04 DBMS Notes
No ratings yet
Unit-04 DBMS Notes
131 pages
Query Processing and Optimization in DBMS
No ratings yet
Query Processing and Optimization in DBMS
36 pages
Query Optimization Techniques in DBMS
No ratings yet
Query Optimization Techniques in DBMS
20 pages
Distributed Query Processing Overview
No ratings yet
Distributed Query Processing Overview
31 pages
Distributed Query Processing Overview
No ratings yet
Distributed Query Processing Overview
69 pages
Query Processing and Optimization Guide
No ratings yet
Query Processing and Optimization Guide
18 pages
Query Processing and Optimization Techniques
No ratings yet
Query Processing and Optimization Techniques
127 pages
Distributed Database Query Optimization
No ratings yet
Distributed Database Query Optimization
36 pages
Query Processing and Optimization Guide
No ratings yet
Query Processing and Optimization Guide
99 pages
Unit-4 DBMS
No ratings yet
Unit-4 DBMS
130 pages
Equivalence Rules and Parsing in DBMS
No ratings yet
Equivalence Rules and Parsing in DBMS
33 pages
Query Processing in DBMS Explained
No ratings yet
Query Processing in DBMS Explained
48 pages
Query Processing and Optimization Guide
No ratings yet
Query Processing and Optimization Guide
16 pages
Query Processing and Optimization Guide
No ratings yet
Query Processing and Optimization Guide
94 pages
Query Processing in Database Systems
No ratings yet
Query Processing in Database Systems
10 pages
Pipelining in Query Processing
No ratings yet
Pipelining in Query Processing
12 pages
Query Optimization in Relational Databases
No ratings yet
Query Optimization in Relational Databases
8 pages
Query Optimization in Distributed Databases
No ratings yet
Query Optimization in Distributed Databases
42 pages
Query Processing and Optimization Steps
100% (1)
Query Processing and Optimization Steps
43 pages
Query Processing and Optimization in DBMS
No ratings yet
Query Processing and Optimization in DBMS
47 pages
Global Query Optimization Techniques
No ratings yet
Global Query Optimization Techniques
46 pages
Query Processing and Optimization in DBMS
No ratings yet
Query Processing and Optimization in DBMS
33 pages
Query Processing in Database Systems
No ratings yet
Query Processing in Database Systems
31 pages
Data Mining Exam Questions 2023
No ratings yet
Data Mining Exam Questions 2023
3 pages
Elementary Algorithms PDF
No ratings yet
Elementary Algorithms PDF
642 pages
Understanding Recursion in C Programming
No ratings yet
Understanding Recursion in C Programming
23 pages
Insert Node in Sorted Linked List
No ratings yet
Insert Node in Sorted Linked List
4 pages
DSA Laboratory Course Overview
No ratings yet
DSA Laboratory Course Overview
2 pages
AI State Space Search Strategies
No ratings yet
AI State Space Search Strategies
90 pages
C++ Singly Linked List Implementation
No ratings yet
C++ Singly Linked List Implementation
3 pages
Understanding Circular Queues
No ratings yet
Understanding Circular Queues
2 pages
Stock Exchange Lab Report
No ratings yet
Stock Exchange Lab Report
24 pages
Tree Traversal and Node Structure
No ratings yet
Tree Traversal and Node Structure
48 pages
Sorted List Operations in C++
No ratings yet
Sorted List Operations in C++
27 pages
Java Programs for Smith Numbers and Data Structures
No ratings yet
Java Programs for Smith Numbers and Data Structures
28 pages
Data Structures MCQs with Answers
No ratings yet
Data Structures MCQs with Answers
6 pages
CS2800 Problem Set 2: Algorithms Analysis
No ratings yet
CS2800 Problem Set 2: Algorithms Analysis
2 pages
Efficient Algorithm Design Course Outline
No ratings yet
Efficient Algorithm Design Course Outline
6 pages
Introduction to Linked Lists in Data Structures
No ratings yet
Introduction to Linked Lists in Data Structures
36 pages
Sorting Algorithms and Knapsack Solutions
No ratings yet
Sorting Algorithms and Knapsack Solutions
18 pages
B-Tree Operations and Sorting Explained
No ratings yet
B-Tree Operations and Sorting Explained
46 pages
Tree Data Structure Overview and Types
No ratings yet
Tree Data Structure Overview and Types
31 pages
Parallel Processing Concepts Overview
No ratings yet
Parallel Processing Concepts Overview
10 pages
Circular Doubly Linked Lists in C
No ratings yet
Circular Doubly Linked Lists in C
3 pages
DSA Syllabus Overview for Final Exam
No ratings yet
DSA Syllabus Overview for Final Exam
8 pages
Backtracking and Branch-and-Bound Algorithms
No ratings yet
Backtracking and Branch-and-Bound Algorithms
37 pages
Wbest03 Marys Radiation
No ratings yet
Wbest03 Marys Radiation
3 pages
Trees and Graphs: Lecture Notes
0% (1)
Trees and Graphs: Lecture Notes
161 pages
Interior Point Method Overview
No ratings yet
Interior Point Method Overview
64 pages
C++ Data Structures: Stacks, Queues, Trees
No ratings yet
C++ Data Structures: Stacks, Queues, Trees
14 pages
2 Marks With Ans
No ratings yet
2 Marks With Ans
39 pages
CPU Scheduling Algorithms Explained
No ratings yet
CPU Scheduling Algorithms Explained
5 pages
Data Structures and Algorithms Overview
No ratings yet
Data Structures and Algorithms Overview
32 pages

Distributed Query Optimization Techniques

Uploaded by

Distributed Query Optimization Techniques

Uploaded by

Distributed query Optimization

• Activity of choosing an efficient execution strategy for processing query. As there

The most common heuristic is to perform selection and projection when

Distribution Cost Model

Increases system throughput

• Summation of all cost factor

• Communication time = (unit message initiation time * number of sequential

• Wide area network

• Message initiation and transmission costs high

• Local processing cost is law (fast mainframes or minicomputers)

• Ratio of communication to I/O costs= 20:1

• Local area networks

• Communication and local processing casts are more or lex equal

• systems. This presentation is a prerequisite to understanding distributed

• each of which is processed in a centralized way. Second, distributed query

• In a centralized system, query processing is done with the following

• Centralized query optimization has divide into 3 approaches:

• optimization with execution. The QEP is dynamically constructed

• optimizer which makes calls to the DBMS execution engine

• The most popular dynamic query optimization algorithm is that

• The algorithm recursively breaks up a query expressed in relational

• After detachment of the selections, query q1 is replaced by q11

• The successive detachments of q” may generate

• Note that other sub queries are also possible.

• For joins, two alternative algorithms :

• Names of employees working on the CAD/CAM project

• It is mixer of static and dynamic approaches; the approach is mainly

• The distributed join is a query operator that combines two relations

• The main characteristics of a distributed join is that at least one of the

If size(R) < Size(S)

If size(R) > Size(S)

• Necessary to compute the size of intermediate relations which is difficult.

Site 1 PROJ Site 3

• An alternative to enumerating all the solutions is to use heuristics that consider

• Transformation of Cyclic Query

• Distributed query optimization requires evaluation of a large number of

2. The objective function of SDD-1 and R* is to minimize total time, while

4. The network topology is assumed to be a wide area point-to-point network by

6. Each algorithm assumes statistical information about the data. Semijoi

7. INGRES can handle fragments.

• It is based on hill climbing algorithm. It's a greedy approach it finds the

• SDD-1 algorithm has 3 important characteristics as follows:

• The basic algorithm is as follows:

You might also like