Query Optimization in DDBMS

Uploaded by

Gamer Bhagvan

UNIT – II – DISTRIBUTED DATABASE – KCA045

UNIT - II
QUERIES AND OPTIMIZATION
Global Queries to Fragment Queries-Equivalence Transformations for
Queries-Distributed Grouping and Aggregate Function Evaluation-
Parametric Queries-Optimization of Access Strategies-Framework for Query
Optimization-Join Queries- General Queries-Introduction to Distributed
Transactions.

Global Queries to Fragment Queries

When a query is placed, it is first scanned, parsed and validated. An internal
representation of the query, such as a query tree or a query graph, is then created. Next,
alternative execution strategies are devised for retrieving results from the database tables.
The process of choosing the most appropriate execution strategy for query processing is
called query optimization.

Query Optimization Issues in DDBMS

In a DDBMS, query optimization is a crucial task. The complexity is high because the number of
alternative strategies may increase exponentially due to the following factors −

• The presence of a number of fragments.

• Distribution of the fragments or tables across various sites.

• The speed of communication links.

• Disparity in local processing capabilities.

Hence, in a distributed system, the target is often to find a good execution strategy for query
processing rather than the best one. The time to execute a query is the sum of the following −

• Time to communicate queries to databases.

• Time to execute local query fragments.

• Time to assemble data from different sites.

• Time to display results to the application.

Query Processing
Query processing is the set of all activities from query placement to displaying the
results of the query. The steps are shown in Figure 2.1.
Figure 2.1 Steps in query processing

Global Query Optimization

Input: Fragment query

• Find the best (not necessarily optimal) global schedule

➡ Minimize a cost function

➡ Distributed join processing

✦ Bushy vs. linear trees

✦ Which relation to ship where?

✦ Ship-whole vs ship-as-needed

➡ Decide on the use of semijoins

✦ Semijoin saves on communication at the expense of more local
processing.

➡ Join methods

✦ nested loop vs ordered joins (merge join or hash join)

Cost-Based Optimization

• Solution space

➡ The set of equivalent algebra expressions (query trees).

• Cost function (in terms of time)

➡ I/O cost + CPU cost + communication cost

➡ These might have different weights in different distributed environments (LAN vs WAN).

➡ Can also maximize throughput

• Search algorithm

➡ How do we move inside the solution space?

➡ Exhaustive search, heuristic algorithms (iterative improvement, simulated annealing, genetic,…)
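The difference between exhaustive search and a heuristic such as iterative improvement can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cardinalities, the selectivities and the cost model (sum of intermediate result sizes for a linear join order) are hypothetical and do not come from these notes.

```python
import itertools
import random

# Hypothetical statistics (assumptions for illustration only).
CARD = {"EMP": 400, "ASG": 1000, "PROJ": 100}
SEL = {
    frozenset(("EMP", "ASG")): 1 / 400,
    frozenset(("ASG", "PROJ")): 1 / 100,
}

def plan_cost(order):
    """Toy cost: sum of intermediate result sizes of a linear join order."""
    joined = [order[0]]
    size = float(CARD[order[0]])
    cost = 0.0
    for rel in order[1:]:
        sel = 1.0
        for prev in joined:
            # No predicate between two relations means a Cartesian product.
            sel *= SEL.get(frozenset((prev, rel)), 1.0)
        size = size * CARD[rel] * sel
        cost += size
        joined.append(rel)
    return cost

def exhaustive(relations):
    """Examine every permutation and keep the cheapest one."""
    return min(itertools.permutations(relations), key=plan_cost)

def iterative_improvement(relations, rng, restarts=5):
    """Random start, then accept any swap that lowers the cost; repeat."""
    best = None
    for _ in range(restarts):
        current = list(relations)
        rng.shuffle(current)
        improved = True
        while improved:
            improved = False
            for i, j in itertools.combinations(range(len(current)), 2):
                neighbor = current[:]
                neighbor[i], neighbor[j] = neighbor[j], neighbor[i]
                if plan_cost(neighbor) < plan_cost(current):
                    current, improved = neighbor, True
                    break
        if best is None or plan_cost(current) < plan_cost(best):
            best = current
    return tuple(best)

RELS = ("EMP", "ASG", "PROJ")
best_plan = exhaustive(RELS)
heuristic_plan = iterative_improvement(RELS, random.Random(0))
```

With only three relations both approaches reach a cheapest order. The heuristic pays off when the number of relations makes enumerating every permutation impractical, at the price of possibly stopping in a local minimum.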

Query Optimization Process

Figure 2.2 Query Optimization Process

Search Space

• Search space characterized by alternative execution plans

• Focus on join trees

• For N relations, there are O(N!) equivalent join trees that can be obtained by applying
commutativity and associativity rules

SELECT ENAME, RESP

FROM EMP, ASG, PROJ

WHERE EMP.ENO = ASG.ENO

AND ASG.PNO = PROJ.PNO
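The growth of the search space can be made concrete by enumerating the linear join orders for the three relations of the query above. The sketch below is illustrative only. It assumes that the two join predicates connect EMP with ASG and ASG with PROJ, and the helper names are hypothetical.

```python
from itertools import permutations
from math import factorial

RELATIONS = ("EMP", "ASG", "PROJ")

# Assumed join graph of the example query: EMP-ASG and ASG-PROJ.
JOIN_EDGES = {frozenset(("EMP", "ASG")), frozenset(("ASG", "PROJ"))}

def linear_orders(relations):
    """All N! orders in which the relations can be joined one after another."""
    return list(permutations(relations))

def avoids_cartesian_product(order):
    """True if each relation joins with at least one relation placed before it."""
    joined = {order[0]}
    for rel in order[1:]:
        if not any(frozenset((rel, prev)) in JOIN_EDGES for prev in joined):
            return False
        joined.add(rel)
    return True

orders = linear_orders(RELATIONS)
useful_orders = [o for o in orders if avoids_cartesian_product(o)]
```

For three relations there are 3! = 6 linear orders, and under the assumed join graph 4 of them avoid a Cartesian product. For ten relations the same enumeration would already produce 3,628,800 orders, which is why optimizers restrict or sample the search space.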

Cost Functions

• Total Time (or Total Cost)

➡ Reduce each cost (in terms of time) component individually

➡ Do as little of each cost component as possible

➡ Optimizes the utilization of the resources

✦ Increases system throughput
• Response Time

➡ Do as many things as possible in parallel

➡ May increase total time because of increased total activity

• Summation of all cost factors

• Total cost = CPU cost + I/O cost + communication cost

• CPU cost = unit instruction cost * no. of instructions

• I/O cost = unit disk I/O cost * no. of disk I/Os

• communication cost = message initiation + transmission
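The cost formulas above, and the difference between total time and response time, can be illustrated with a small Python sketch. All unit costs and workload figures are hypothetical. The sketch also assumes that message initiation is charged per message and transmission per byte, and that the two sites work fully in parallel.

```python
from dataclasses import dataclass

# Hypothetical unit costs in arbitrary time units (assumptions, not from the notes).
UNIT_INSTRUCTION_COST = 1
UNIT_DISK_IO_COST = 1_000
MESSAGE_INITIATION_COST = 5_000
TRANSMISSION_COST_PER_BYTE = 2

@dataclass
class SiteWork:
    instructions: int
    disk_ios: int
    messages: int
    bytes_sent: int

def site_cost(w):
    """CPU cost + I/O cost + communication cost for the work done at one site."""
    cpu = UNIT_INSTRUCTION_COST * w.instructions
    io = UNIT_DISK_IO_COST * w.disk_ios
    comm = (MESSAGE_INITIATION_COST * w.messages
            + TRANSMISSION_COST_PER_BYTE * w.bytes_sent)
    return cpu + io + comm

# Hypothetical workload of one query spread over two sites.
work = {
    "s1": SiteWork(instructions=20_000, disk_ios=10, messages=2, bytes_sent=1_000),
    "s2": SiteWork(instructions=5_000, disk_ios=30, messages=1, bytes_sent=4_000),
}

costs = {site: site_cost(w) for site, w in work.items()}
total_time = sum(costs.values())     # every unit of work counts
response_time = max(costs.values())  # sites run in parallel, the slowest one decides
```

Here the total time is 90,000 units while the response time is 48,000 units. Spreading the work over more sites can lower the response time further even if the extra messages raise the total time.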

2-Step – Problem Definition

• Given

➡ A set of sites S = {s1, s2, …,sn} with the load of each site

➡ A query Q = {q1, q2, q3, q4} such that each subquery qi is the maximum
processing unit that accesses one relation and communicates with its
neighboring queries

➡ For each qi in Q, a feasible allocation set of sites Sq = {s1, s2, …, sk} where each
site stores a copy of the relation in qi

• The objective is to find an optimal allocation of Q to S such that

➡ The load imbalance of S is minimized

➡ The total communication cost is minimized

• For each q in Q compute load (Sq)

• While Q not empty do

➡ Select subquery a with least allocation flexibility

➡ Select best site b for a (with least load and best benefit)

➡ Remove a from Q and recompute loads if needed
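The allocation loop above can be sketched in Python as a greedy procedure. This is a simplified illustration, not the full algorithm. "Least allocation flexibility" is taken to mean the fewest feasible sites, the best site is simply the least loaded one, the "benefit" and communication-cost terms are ignored, and every subquery is assumed to add one unit of load. The input data are hypothetical.

```python
def allocate(feasible_sites, load, work=1):
    """Greedy allocation of subqueries to sites.

    feasible_sites: dict mapping subquery -> list of sites holding its relation
    load:           dict mapping site -> current load
    work:           load added by one subquery (assumed constant)
    """
    load = dict(load)              # work on copies so the inputs stay untouched
    pending = dict(feasible_sites)
    allocation = {}
    while pending:
        # Subquery with least allocation flexibility (fewest feasible sites).
        a = min(pending, key=lambda q: (len(pending[q]), q))
        # Least-loaded feasible site; ties are broken by site name.
        b = min(pending[a], key=lambda s: (load[s], s))
        allocation[a] = b
        load[b] += work            # recompute the load of the chosen site
        del pending[a]
    return allocation, load

# Hypothetical input, not the data of the example that follows.
feasible = {"q1": ["s1", "s2", "s3"], "q2": ["s2"], "q3": ["s1", "s2"]}
initial_load = {"s1": 1, "s2": 0, "s3": 2}
allocation, final_load = allocate(feasible, initial_load)
```

With this input q2 is placed first because it has a single feasible site, and the final loads come out balanced at 2 per site. A fuller version would also weigh the communication cost between neighboring subqueries when choosing the site.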

2-Step Algorithm Example

• Let Q = {q1, q2, q3, q4} where q1 is associated with R1, q2 is associated with R2 joined
with the result of q1, etc.

• Iteration 1: select q4, allocate to s1, set load(s1)=2

• Iteration 2: select q2, allocate to s2, set load(s2)=3

• Iteration 3: select q3, allocate to s1, set load(s1)=3

• Iteration 4: select q1, allocate to s3 or s4

Relational Algebra
• Relational algebra defines the operations that can be applied to relations (tables) to
manipulate their data.
• The algebra is composed of unary operations (involving a single table) and binary
operations (involving two tables).
• Join and semi-join are binary operations in relational algebra.
Join
• Join is a binary operation in Relational Algebra.
• It combines records from two or more tables in a database.
• A join is a means for combining fields from two tables by using values common to
each.
Semi-Join
• A join where the result only contains the columns from one of the joined tables.
• Useful in distributed databases, so we don't have to send as much data over the network.
• Can dramatically speed up certain classes of queries.
What is a “Semi-Join”?
Semi-join strategies are a technique for query processing in distributed database systems, used
for reducing communication cost.
A semi-join between two tables returns rows from the first table where one or more matches
are found in the second table.
The difference between a semi-join and a conventional join is that rows in the first table will
be returned at most once. Even if the second table contains two matches for a row in the first
table, only one copy of the row will be returned.
Semi-joins are written using EXISTS or IN.
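Why a semi-join reduces communication cost can be sketched with two in-memory tables that stand for relations stored at different sites. The sketch is illustrative only. The table contents and column names are hypothetical, and the amount of data shipped is approximated by counting field values.

```python
def semijoin(left, right, key):
    """Rows of `left` that have at least one match in `right`, each at most once."""
    right_keys = {row[key] for row in right}
    return [row for row in left if row[key] in right_keys]

def ship_whole_cost(rows):
    """Field values sent when a whole table is shipped to the other site."""
    return sum(len(row) for row in rows)

def semijoin_plan_cost(dept, emp, key):
    """Ship only dept's join keys to emp's site, then ship back the matching emp rows."""
    keys_shipped = {row[key] for row in dept}
    reduced_emp = semijoin(emp, dept, key)
    cost = len(keys_shipped) + sum(len(row) for row in reduced_emp)
    return cost, reduced_emp

# Hypothetical data: dept is stored at site 1, emp at site 2.
dept = [{"deptno": 10, "dname": "SALES"}, {"deptno": 20, "dname": "HR"}]
emp = [
    {"empno": 1, "ename": "A", "deptno": 10},
    {"empno": 2, "ename": "B", "deptno": 10},
    {"empno": 3, "ename": "C", "deptno": 30},
    {"empno": 4, "ename": "D", "deptno": 40},
    {"empno": 5, "ename": "E", "deptno": 50},
]

whole = ship_whole_cost(emp)
reduced_cost, reduced_emp = semijoin_plan_cost(dept, emp, "deptno")
departments_with_staff = semijoin(dept, emp, "deptno")
```

Shipping all of emp moves 15 field values, while the semi-join plan moves 2 keys plus 6 values of the two matching rows. The saving depends on how selective the semi-join is: if nearly every emp row matched, the extra key transfer and local processing would not pay off.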

A Simple Semi-Join Example
“Give a list of departments with at least one employee.” Query written with a conventional join:
SELECT D.deptno, D.dname FROM dept D, emp E WHERE E.deptno = D.deptno
ORDER BY D.deptno;
◦ A department with N employees will appear in the list N times.
◦ We could use a DISTINCT keyword to get each department to appear only once.

A Simple Semi-Join Example
“Give a list of departments with at least one employee.” Query written with a semi-join:
SELECT D.deptno, D.dname FROM dept D WHERE EXISTS (SELECT 1 FROM
emp E WHERE E.deptno = D.deptno) ORDER BY D.deptno;
◦ No department appears more than once.
◦ Oracle stops processing each department as soon as the first employee in that
department is found.
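The two forms of the query can be compared on a small in-memory database with Python's built-in sqlite3 module. This is a minimal sketch: the column names (deptno, dname, empno, ename) and the sample rows are assumptions for illustration, and SQLite is used here only because it ships with Python, so the remark about how Oracle processes the query is not demonstrated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE emp  (empno INTEGER PRIMARY KEY, ename TEXT, deptno INTEGER);
    INSERT INTO dept VALUES (10, 'ACCOUNTING'), (20, 'RESEARCH'), (30, 'SALES');
    INSERT INTO emp  VALUES (1, 'A', 10), (2, 'B', 10), (3, 'C', 20);
""")

# Conventional join: a department appears once per employee.
join_rows = conn.execute("""
    SELECT D.deptno, D.dname
    FROM dept D, emp E
    WHERE E.deptno = D.deptno
    ORDER BY D.deptno
""").fetchall()

# Semi-join written with EXISTS: each department appears at most once.
semi_rows = conn.execute("""
    SELECT D.deptno, D.dname
    FROM dept D
    WHERE EXISTS (SELECT 1 FROM emp E WHERE E.deptno = D.deptno)
    ORDER BY D.deptno
""").fetchall()

conn.close()
```

The join returns ACCOUNTING twice because it has two employees, while the semi-join returns each department with staff exactly once. SALES has no employees and appears in neither result.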
