0% found this document useful (0 votes)

6 views29 pages

Query Optimization Algorithms

Query processing involves converting high-level queries into efficient execution plans, utilizing techniques such as rule-based and cost-based optimization. The process includes syntax analysis, query decomposition, optimization, and execution plan generation, ensuring the query adheres to integrity constraints and user privileges. Effective query optimization is crucial for enhancing database performance by minimizing resource usage and execution time.

Uploaded by

vivekkumars

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

6 views29 pages

Query Optimization Algorithms

Uploaded by

vivekkumars

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

Query Processing and Optimization

Chapter 11
11.1 Introduction

Query processing requires that the DBMS identify and execute a strategy for retrieving the
results of the query. The query determines what data is to be found, but does not define
the method by which the data manager searches the database. Therefore Query optimization
is necessary to determine the optimal alternative to process a query. There are two main
techniques for query optimization. The first approach is to use a rule based or heuristic
method for ordering the operations in a query execution strategy. The second approach
estimates the cost of different execution strategies and chooses the best solution. In general
most commercial database systems use a combination of both techniques.

11.2 Basics of Query Processing

Query Processing : Query Processing is a procedure of converting a query written in high-

level language (Ex. SQL, QBE) into a correct and efficient execution plan expressed in low-
level language, which is used for data manipulation.
Query Processor : Query processor is responsible for generating execution plan.
Execution Plan : Query processing is a stepwise process. Before retrieving or updating
data in database, a query goes through a series of query compilation steps. These steps are
known as execution plan.
The success of a query language also depends upon its query processor i.e., how much
efficient execution plan it can create? The better execution plan leads to low time and cost.
In query processing, the first phase is transformation in which parser first checks the syntax
of query and also checks the relations and attributes used in the query that are defined in
the database. After checking the syntax and verifying the relations, query is transformed into
equivalent expression that are more efficient to execute. Transformation, depends upon various

419
420 Introduc tion to Database Management System
factors like existence of certain database structures, presence of different indexes, file is sorted
or not, cost of transformation, physical characteristics of data etc. After transformation of
query, transformed query is evaluated by using number of strategies known as access plans.
While generating access plans, factors like physical properties of data and storage are taken
into account and the optimal access plan is executed. The next step is to validate the user
privileges and ensure that the query does not disobey the relevant integrity constraints.
Finally, execution plan is executed to generate the result.

11.2.1 General Strategy for Query Processing

The general strategy for query processing is as follows:
(i) Representation of query : Query written by user cannot be processed directly by
system. Query processor first checks the syntax and existence of relations and their
attributes in database. After validations, query processor transform it into equivalent
and more efficient expression for example query will be converted into a standard
internal format that parser can manipulate. Parser also adds some additional predicates
to the query to enforce security. Internal form may be relational algebra, relational
calculus, any low-level language, operator graphs etc.
(ii) Operator graphs : Operator graphs are used to represent query. It gives the sequence
of operations that can be performed. It is easy to understand the query represented
by operator graphs. It is useful to determine redundancy in query expressions, result
of transformation, simplify the view etc.
(iii) Response time and Data characteristics consideration : Data characteristics like length
of records, expected sizes of both intermediate and final results, size of relations etc.,
are also considered for optimizing the query. In addition to this overall response time
is also determined.

11.2.2 Steps in Query Processing

Various steps in query processing are shown in Figure 11.1. Suppose that user inputs a query
in general query language say QBE, then it is first converted into high-level query language
say SQL etc. Other steps in query processing are discussed below in detail:
(i) Syntax Analysis : Query in high-level language is parsed into tokens and tokens are
analyzed for any syntax error. Order of tokens are also maintained to make sure that all the
rules of language grammars are followed. In case of any error, query is rejected and an error
code with explanation for rejection is returned to the user. (Only syntax is checked in this
step).
(ii) Query Decomposition : In this step, query is decomposed into query blocks which
are the low-level operations. It starts with the high-level query that is transformed into low-
level operations and checks whether that query is syntactically and semantically correct. For
example, a SQL query is decomposed into blocks like Select block, From block, Where block
etc. Various stages in query decomposition are shown in Figure 11.2.
Query Processing and Optimiz ation 421

User input General query

Transform into

High level query language

ex. SQL

Scanning, parsing, Syntax checking and verification

validating by parser

Check existence of relations and attributes,

Database Query decomposer semantic analysis, decomposing complex
catalog query into smaller ones

Optimization to reduce execution time and

Query optimizer cost by substituting equivalent expressions
for those in the query

Execution plan Query modification

Query code generator Generate code for queries

Main Runtime database processor Deals with database to do necessary

database operation

Query result

FIGURE 11.1. Steps in query processing.

SQL query Query analysis

Query normalization

Semantic analysis

Query simplifier

Query restructuring

Algebraic expressions

FIGURE 11.2. Steps in query decomposition.

422 Introduc tion to Database Management System
(a) Query Analysis : In the query analysis stage, programming language compiler checks
that the query is lexically and syntactically correct. A syntactically correct query is
analyzed using system catalogues to verify the existence of relations and attributes used
in query. After analysis a correct query is converted into some internal representation,
which is more efficient for processing.
The type specification of the query qualifier and result is also checked at this stage.
The internal representation may be, query tree or query graph.

Query tree notation : A Typical internal representation of query is query tree. It is
also known as relational algebra tree. A query tree is constructed using tree data
structure that corresponds to the relational algebra expression. Main components of
query tree are:

 Root of tree–represents result of query.

 Leaf nodes–represent input relations of the query.

 Internal nodes–represent intermediate relation that is the output of applying an
operation in the algebra.

 The sequence of operations is directed from leaves to the root node.

For example,
Employee_Name, Address, JOB, Department, Department-location

[Link] = [Link]

E D
Employee Department

FIGURE 11.3. Query treee notation.

Query graph notation : Graph data structure is also used for internal representation
of query. In graphs:

 Relation nodes–represent relations by single circle.

 Constant nodes–represent constant values by double circle.

 Edges–represent relation and join conditions.

 Square brackets–represent attributes retrieved from each relation.

[E.Employee_Name, [Link],
[[Link]-ID]
[Link], [Link]]

‘Delhi’ D E
Department-Location = Delhi [Link] = [Link]

FIGURE 11.4. Query graph notation.

Query Processing and Optimiz ation 423
(b) Query normalization : After query analysis, it is normalized to remove any redundancy.
In this phase query is converted into normalized form that can be easily manipulated.
A set of equivalency rules are applied to query to simplify the projection and selection
operations to avoid redundancy. Query can be converted into one of the following
two normal forms :

 Conjunctive normal form : It is a sequence of conjuncts that are connected with
‘AND’ operator. A conjunct consists of one or more terms connected with ‘OR’
operator. A conjuctive selection consists only those tuples that satisfy all conjuncts.
Example. (Emp_Job = ‘Analyst’ ∨ salary < 50000) ∧ (Hire_Date > 1–1–2000)
 Disjunctive normal forms : It is a sequence of disjuncts that are connected with ‘OR’
operator. A disjunct consists of one or more terms connected with ‘AND’ operator.
A disjunctive selection contains those tuples that satisfy anyone of the disjunct.
Example. (Emp_Job= ‘Analyst’ ∧ salary < 5000) ∨ (Hire_Date > 1–1–2000)
Disjunctive normal form is more useful as it allows the query to break into a series
of independent sub-queries linked by union.
(c) Semantic analyzer : The semantic analyzer performs the following tasks :
 It helps in reducing the number of predicates.

 It rejects contradictory normalized forms.

 In case of missing join specification, components of query do not contribute to
generation of results. It identifies these queries and rejects them.

 It makes sure that each object in query is referenced correctly according to its data
type.
(d) Query simplifier : The major tasks of query simplifier are as follows :
 It eliminates common sub-expressions.

 It eliminates redundant qualification.

 It introduces integrity constraints, view definitions into the query graph representation.

 It eliminates query that voids any integrity constraint without accessing the database.

 It transforms sub-graphs into semantically equivalent and more efficient form.

 It deals with user access rights.

Idempotence rules of Boolean Algebra are applied to get final form of simplification.
(e) Query Restructuring : At the final stage of query decomposition, transformation rules
are applied to restructure the query to give a more efficient implementation.

(iii) Query Optimization : The aim of the query optimization step is to choose the best
possible query execution plan with minimum resources required to execute that plan. Query
optimization is discussed in detail in section 11.3.

(iv) Execution Plan : Execution plan is the basic algorithm used for each operation in
the query. Execution plans are classified into following Four types : (a) Left-deep tree query
execution plane, (b) Right-deep tree query execution plan, (c) Linear tree execution plan,
(d) Bushy execution plan.
424 Introduc tion to Database Management System
(a) Left-deep tree query execution plan : In left-deep tree query execution plan, development
of plan starts with a single relation and successively adding a operation involving
a single relation until the query is completed. For example, Only the left hand side
of a join is allowed to participate in result from a previous join and hence named
left-deep tree. It is shown in Figure 11.5.

Result

R1 R2

FIGURE 11.5. Left-deep execution plan.

Advantages : The main advantages of left-deep tree query execution plan are

 It reduces search space.

 Query optimiser is based on dynamic programming techniques.

 It is convenient for pipelined evaluation as only one input to each join is pipelined.

Disadvantage : The disadvantages of left-deep tree query execution plan are

 Reduction in search space leads to miss some lower cost execution strategies.

(b) Right-deep tree query execution plan : It is almost same as left-deep query execution
plan with the only difference that only the right hand side of a join is allowed to
participate in result from a previous join and hence named right-deep tree. It is
applicable on applications having a large main memory. It is shown in Figure 11.6.
Result

R2 R1

FIGURE 11.6. Right-deep execution plan.

(c) Linear tree execution plan : The combination of left-deep and right-deep execution
plans with a restriction that the relation on one side of each operator is always a
base relation is known as linear trees. It is shown in Figure 11.7.
Query Processing and Optimiz ation 425

Result

R2 R1

FIGURE 11.7. Linear tree execution plan.

(d) Bushy execution plan : Bushy execution plan is the most general type of execution
plan. More than one relation can participate in intermediate results. It is shown in
Figure 11.8.

Result

R3 R4 R5

R2 R1

FIGURE 11.8. Bushy execution plan.

The main advantage of bushy execution plan is the flexibility provided by it in

choosing the best execution plan by increasing search space but this flexibility may
leads to considerably increase the search space.
(v) Query Code Generator : After selecting the best possible execution plan query is
converted into low-level language so that it can be taken as input by runtime database
process.
(vi) Runtime Database Processor : It deals directly with main database and do the
necessary operation mentioned in query and returns the result to user.

11.3 Query Optimization

Query performance of a database systems is dependent not only on the database structure,
but also on the way in which the query is optimized. Query optimization means converting a
query into an equivalent form which is more efficient to execute. It is necessary for high-level
relation queries and it provides an opportunity to DBMS to systematically evaluate alterative
426 Introduc tion to Database Management System
query execution strategies and to choose an optimal strategy. A typical query optimization
process is shown in Figure 11.9.
Statistical data
Estimation formulas
Simplified relational
(determine cardinality
algebra query tree
of intermediate result tables)
Query optimiser

Cost model Execution plan generator

Execution plan in form of

optimized relational algebra query

FIGURE 11.9. Query optimization process.

The main issues that need to be considered in query optimization are:

1. Reduction of data transfer with database.
2. Use of available indexes for fast searching.
3. Reduction of number of times the database is manipulated.
4. The order in which joins should be performed.
5. How to store intermediate results?
Following are the three relations we used in each example:
Employee (Emp-ID, Emp-Name, Age, Salary, Dept-ID)
Department (Dept-ID, Proj-ID, Dept-Name)
Project (Proj-ID, Name, Location, Duration)
There are two main techniques used to implement query optimization. These are heuristic
query optimization and cost based query optimization.

11.3.1 Transformation Rules for Relational Algebra

The transformation rules are used to formulate a relational algebra expression into different
ways and query optimizer choose the most efficient equivalent expression to execute. Two
expressions are considered to be equivalent if they have same set of attributes in different
order but representing the same information.
Let us consider relations R, S, and T with set of attributes
X = {X1, X2, ..., Xn}, Y = {Y1, Y2, ..., Yn} and Z = {Z1, Z2, ..., Zn}
respectively, where X, Y, and Z represent predicates and L, L1, L2, M, M1, M2, and N denote
sets of attributes.
Rule 1. Cascading of selection (σ)
σX ∧ Y ∧ Z (R) ≡ σX (σY(σZ (R))).
Query Processing and Optimiz ation 427
It means that conjunctive selection operations can be transformed into individual selection
operations and vice versa.
σAge
Ex. = 35∧ salary > 50000 (Employee) = σAge = 35 (σsalary> 50000 (Employee)).
Rule 2. Commutativity of selection (σ)
σX (σY (R)) ≡ σY (σX (R)).
Ex. σAge = 35 (σSalary > 50000 (Employee)) ≡ σSalary > 5000 (σAge = 35 (Employee)).
Rule 3. Cascading of projection (π)
πL πM ... πN (R) ≡ πL (R)
Ex. πEmp-name πEmp-name, Age (Employee) ≡ πEmp-name (Employee)
Rule 4. Commutativity of selection (σ) and projection (π)
pX1, X2, ..., Xn (sA (R)) ≡ sA (pX1, X2, ..., Xn (R))
πEmp-name, Age (σSalary > 50000 (Employee)) ≡ σSalary > 5000 (πEmp-name, Age (Employee))
Rule 5. Commutativity of join () and Cartesian Product (X)
R Y S ≡ S Y R
R × S ≡ S × R
Ex. Employee [Link]-ID = [Link]-ID Department ≡
Department [Link]-ID = [Link]-ID Employee

Rule 6. Commutavity of selection (σ) and join () or Cartesian product (X)
(σX R Y S ≡ (σX (R) Y S)
σX (R × S) ≡ (σX (R)) × S
Ex. σ[Link] > 30 ∧ Dept-Name = ‘MARKETING’ (Employee) [Link]-ID = [Link]-ID
(Department) ≡ σ[Link]>30 (Employee) [Link]-ID = [Link]-ID
(σDept-Name = ‘MARKETING’ (Department))

Rule 7. Commutavity of projection (π) and join () or Cartesian product (X).
pL1 ∪ L2 (R Z S) ≡ (pL1 (R)) Z (pL2 (S))
Ex. πEmp-Name, Dept-Name, Dept-ID (Employee [Link]-ID = [Link]-ID Department) ≡
(πEmp-name, Dept-ID (Employee)) [Link]-ID = [Link]-ID (πDept-Name, Dept-ID (Department))

Rule 8. Commutativity of Union (∪) and Intersection (∩)

R ∪ S = S ∪ R
R ∩ S = S ∩ R
Rule 9. Commutativity of Selection (σ) and Union (∪) or Intersection (∩) or
Differerence (–).
σX (R ∪ S ) = σX (S) ∪ σX (R)
σX (R ∩ S) = σX (S) ∩ σX (R)
σX (R – S) = σX (S) – σX (R)
428 Introduc tion to Database Management System
Rule 10. Comutativity of projection (π) and Union (∪)
πL (R ∪ S) = πL (S) ∪ πL (R)
Rule 11. Associativity of Join () and Cartesian product (X)
(R  S)  T ≡ R  (S  T)
(R X S) X T ≡ R X (S X T)
Rule 12. Associativity of Union (∪) and Intersection (∩)
(R ∪ S ) ∪ T ≡ S ∪ (R ∪ T)
(R ∩ S) ∩ T ≡ S ∩ (R ∩ T)
Rule 13. Converting a selection (σ) and Cartesian product (X) sequence into Join ()
σX (R X S ) ≡ (R X S)

11.3.2 Heuristic Query Optimization

Heuristic query optimization technique is used to modify the internal representation of a
query by using heuristic rules and transformation rules. Heuristic rules are used in the
form of a query tree or query graph structure. Optimiser starts with initial query tree and
transform it into an equivalent and efficient query tree using transformation rules.
Heuristic Optimization Algorithm : DBMS use heuristic optimization algorithms to improve
the efficiency of query by converting initial query tree into an equivalent and optimized
query tree. Optimizers utilize transformation rules to optimize the structure of query tree.
Following are the steps of heuristic optimization algorithm.
Step 1. Perform Selection operation as early as possible : By using selection operation at
early stages, you can reduce the unwanted number of record or data, to transfer
from database to primary memory.
Optimizer use transformation rule 1 to divide selection operations with conjunctive
conditions into a cascade of selection operations.
Step 2. Perform commutativity of selection operation with other operations as early as
possible : Optimizer use transformation rule 2, 4, 6, and 9 to move selection
operation as far down the tree as possible and keep selection predicates on the
same relation together. By keeping selection operation down at tree reduces the
unwanted data transfer and by keeping selection predicates together on same
relations reduces the number of times of database manipulation to retrieve
records from same database table.
Step 3. Combine the Cartesian Product with subsequent selection operation whose predicates
represents a join condition into a JOIN operation : Optimizer uses transformation rule
13 to convert a selection and cartesian product sequence into join. It reduces
data transfer. It is always better to transfer only required data from database
instead of transferring whole data and then refine it. (Cartesian product combines
all data of all the tables mention in query while join operation retrieves only
those records from database that satisfy the join condition).
Step 4. Use Commutativity and Associativity of Binary operations : Optimizer use transformation
rules 5, 11, and 12 to execute the most restrictive selection operations first.
Query Processing and Optimiz ation 429
It rearranges the leaf nodes of query tree. By using the most restrictive selection
operations, the number of records fetched from database reduces and also
subsequent operations can be performed on less number of records.
Step 5. Perform projection operations as early as possible : After performing selection operations,
optimizer use transformation rules 3, 4, 7 and 10 to reduce the number of
columns of a relation by moving projection operations as far down the tree as
possible and keeping projection predicates on the same relation together.
Step 6. Compute common expressions only once: It is used to identify sub-trees that represent
groups of operations that can be executed by a single algorithm.

Consider the query below in SQL and transformation of its initial query tree into an
optimal query tree.
Select Emp_Name
From Employee e, Department d, Project p
Where [Link] = ‘LUXMI PUB.’
AND d.Proj_ID = p.Proj_ID
AND e.Dept_ID= d.Dept_ID
AND [Link] > 35
This query needs to display names of all employees working for project “LUXMI PUB.”
and having age more than 35 years.
Figure 11.10 shows the initial query tree for the given SQL query. If the tree is executed
directly then it results in the Cartesian product of entire Employee, Department, and Project
table but in reality, the query needed only one record from relation Project and only the
employee records for those whose age is greater than 35 years.

Emp_Name

[Link] = ‘LUXMI PUB.’ d.Proj_ID = p.Proj_ID e.Dept_ID = d.Dept_ID [Link] > 35

X Project

Employee Department

FIGURE 11.10. Initial query tree.

— We can improve the performance by first applying selection operations to reduce the
number of records that appear in Casterian product. Figure 11.11 shows the improved query
tree.
430 Introduc tion to Database Management System

Emp_Name

d.Proj_ID = p.Proj_ID

e.Dept_ID = d.Dept_ID [Link] = ‘LUXMI PUB.’

X Project

[Link] > 35 Department

Employee

FIGURE 11.11. Improved query tree by first applying selection operations.

— The query tree can be further improved by applying more restrictive selection operation.
So, switch the positions of relations Project and Employee as you know that in a single
project it may be more than one employee. Figure 11.12 shows the improved query tree.
Emp_Name

e.Dept_ID = d.Dept_ID

d.Proj_ID = p.Proj_ID [Link] > 35

X Employee

[Link] = ‘LUXMI PUB.’ Department

Project

FIGURE 11.12. Improved query tree by applying more restrictive selection operations.

— A further improvement can be done by replacing Cartesian product operations by

Join operations with a join condition as shown in Figure 11.13.
Query Processing and Optimiz ation 431
Emp_Name

p.Dept_ID = d.Dept_ID

d.Proj_ID = p.Proj_ID [Link] > 35

Employee
[Link] = ‘LUXMI PUB.’ Department

Project

FIGURE 11.13. Improved query tree by replacing Cartesian product and

selection operations by join operations.

— Further improvement can be done in query tree by keeping only required attributes
(columns) of relations by applying projection operations as early as possible in the query
tree. Optimizer keep the attributes required to display, and the attributes needed by
the subsequent operation in the intermediate relations. Modified query tree is shown in
Figure 11.14.
Emp_Name

e.Dept_ID = d.Dept_ID

d.Dept_ID e.Emp_ID, e.Emp_Name, e.Dept_ID

d.Proj_ID = p.Proj_ID [Link] > 35

p.Proj_ID d.Dept_ID, d.Proj_ID Employee

[Link] = ‘LUXMI PUB.’ Department

Project

FIGURE 11.14. Improved query tree by applying and moving projection

operations down the query tree.

The SELECT clause in SQL is equivalent to projection operation and WHERE clause in SQL is
equivalent to selection operation in relational algebra.
432 Introduc tion to Database Management System
11.3.3 Cost Based Query Optimization
In cost based query optimization, optimizer estimates the cost of running of all alternatives
of a query and choose the optimum alternative. The alternative which uses the minimum
resources is having minimum cost. The cost of a query operation is mainly depend on its
selectivity i.e., the proportion of the input relations that forms the output. Following are the
main components used to determine the cost of execution of a query:
(a) Access cost of secondary storage : Access cost to secondary storage consists of cost of
database manipulation operations which includes searching, writing, reading of data
blocks stored in the secondary memory. The cost of searching depends upon the type
of indexes (primary, secondary, hashed), type of file structure, ordering of relation
in addition to physical storage location like file blocks are allocated contiguously on
the same disk or scattered on the disk.
(b) Storage cost : Storage cost consists of cost of storing intermediate results (tables or
files) that are generated by the execution strategy for the query.
(c) Computation cost : Computation cost consists of performing in-memory operations
during query execution such as sorting of records in a file, merging of records,
performing computations on field values, searching of records. These are mainly
performed on data buffers.
(d) Memory usage cost : It consists of cost of pertaining to the number of memory
buffers needed during query execution.
(e) Communication cost : It consists of the cost of transferring query and its result from
database location to the location of terminal where the query is originated.
From all the above components, the most important is access cost to secondary storage
because secondary storage is comparatively slower than other devices. Optimizer try to
minimize computation cost for small databases as most of the data files are stored in main
memory. For large database, it try to minimize the access cost to secondary storage and for
distributed databases, it trys to minimize the communication cost because various sites are
involved for data transfer.
To estimate the cost of execution strategies, optimizer access statistical data stored in
DBMS catalog. The information stored in DBMS catalog is given below:
(i) Number of records in relation X, given as R.
(ii) Number of blocks required to store relation X, given as B.
(iii) Blocking factor of relation X, given as BFR.
(iv) Primary access method for each file and attributes for each file.
(v) Number of levels for each multi-level index for a attribute A given as IA.
(vi) Number of first-level index blocks for a attribute A, given as BAI1.
(vii) Selection cardinality of attribute A in relation R, given as SA, where SA= R × SLA,
where SLA is the selectivity of the attributes.
Cost Function for Selection Operation : Selection operation works on a single relation in
relation algebra and retrieves the records that satisfy the given condition. Depending upon
the structure of file, available indexes, searching methods, the estimated cost of strategies
for selection operation is as given below.
Query Processing and Optimiz ation 433

[Link]. Strategies Cost

1. Linear search B/2, if record is found
B, if record not found
2. Binary search log2 B, if equality condition is one a
unique key attribute
log2 B + (SA/BFR)-1, otherwise
3. Using primary index to retrive a single 1, assuming no overflow
record
4. Equality condition on primary key IA + 1
5. Equality condition on hash key 1, assuming no overflow
6. Inequality condition of primary index IA + B/2
7. Inequality condition of any ordered index IA + B/2
8. Equality condition on clustering index [IA + 1] + [SA/BFR]
(secondary index)
9. Equality condition on non-clustering [IA + 1] + [SA]
index (secondary index)
10. Inequality condition on B+-tree index IA + [BAI1/2] + [R/2]
(secondary index)
Now consider the example of Employee table. Consider, there are 10,000 records stored
in 2000 blocks. Also the following indexes are available.
 A secondary index on the Emp-ID with 4 levels.
 A clustering index on salary with 3 levels and average selection cardinality of 30.
 A secondary index on non-key attributes Age with 2 levels and 4 first level index
blocks. There are 200 distinct values for Age.
Consider the queries:
(i) σEmp-ID = 1A (Employee).
(ii) σAge > 20 (Employee).
(iii) σAge = 20 AND salary > 9000 (Employee).
Consider the following cost components.
 Number of records (R) = 10,000
 Number of blocks (B)= 2000
 Blocking Factor (BFR) = 10000/2000 = 5
 IEmp-ID = 4, SEmp-ID = 10000/10000 = 1
 IAge = 2, SAge = 10000/200 = 50
 ISalary = 2, SSalary = 30
Now cost of the above queries (suppose records are in table)
(i) If linear search is used then cost will be B/2 = 2000/2= 1000
(ii) If linear search is used then cost will be B = 2000.
434 Introduc tion to Database Management System
(iii) This query has a conjunctive selection condition. To estimate the cost of use using
anyone of the two components of selection condition, to retrieve the records plus
the linear search. The linear search costs 2000 and condition salary > 9000 first gives
cost estimate of Isalary + (B/2)= 2 + 1000 = 1002 and cost of condition Age = 20 is
30 + 2 = 32 So, total cost is 1034.
Cost Function for Join Operation : Join operation is the most time consuming operation
in databases and an accurate cost function for join operations depends upon the estimate of
size of file (number of records) after the join operation. The estimated cost of strategies for
join operation is given below:
[Link]. Strategies Cost
1. Nested loop joins (Block) (a) B(R) + [B(R) * B(S)], if the buffer has only
one block
(b) B(R) + [B(S) * (B(R)/(Buffer-2))], if [Buffer-2]
block for relation R.
(c) B(R) + B(S), if all blocks of R can be read into
database buffer
2. Indexed nested-loop join (a) B(R) + R * (IA + 1), if join attribute A in
relation S is a primary key.
(b) B(R) + R * [IA + (SA(R)/BFR(R))], for clustering
index I on attribute A
3. Sort merged join (a) B(R) * [log2 (B(R) + B(S) * log2 (BCR)))], for
sorts
(b) B(R) + B(S) for merge
4. Hash join (a) (3 * B(R)) + B(S), if hash index is held in
memory
(b) 2[B(R) + B(S)] * [log (B(S) – 1)] + B(R) +
B(S), otherwise
(R) and (S) are relations R and S respectively.

Example. Consider we have 600 records in Department table. BFR for department table
is 60 and number of blocks are 600/60 = 10.
For the Join operation, Employee Dept-ID Department

Type of Join Cost

Block nested-loop joins 22000 (Buffer has only one block)
2010 (if all blocks of employee can be read into
database buffer)
Hash Join 6010 (is hash index is held in memory)
Example. Given two unary relation (contains only one attribute), R1 and R2.
R1 R2
7 8
2 4
Query Processing and Optimiz ation 435
9 2
8 1
3 3
9 2
1 7
3 3
6
Show the result of joining R1 and R2 using each of the following algorithms. List the
results in the order that would be output by the join algorithm.
(a) Nested Loop Join. Use R1 for outer Loop and R2 for the inner loop.
(b) Sort Merge Join.
Solution. The result relation contains only one attribute, which is the common attribute
between R and S.
(a) Nested loop join. For every tuple in R1, find matches in R2.
7, 2, 2, 8, 3, 3, 1, 3, 3
(b) Sort merge join. The result appears in sorted order on the join attribute.
1, 2, 2, 3, 3, 3, 3, 7, 8

SOLVED PROBLEMS

Problem 1. Consider two relations R and S. The number of tuples in R is 500,000 each
of size 50 bytes. The number of tuples in S is 125,000 each of size 40 bytes. The page size
is 5 KB. We have a buffer of size 250 KB. Answer the following questions with detail steps
for following the join query:
SELECT *
FROM R, S
WHERE [Link]=[Link].
1. Find the cost of page oriented Simple Nested Loops Join.
2. Find the cost of Block Nested Loops Join.
3. Find the cost of Sort-Merge Join.
4. Can you do any refinement in the Sort-Merge Join Cost? Explain?
5. Find the cost of Hash Join.
Solution. The given information is as follows:
Page size = 5 K, Buffer size = 250 K = 250/50 = 50 pages.
For R, number of tuples = 500,000, size of one tupe = 50 bytes.
Size of R = 500,000 × 50 byte = 500,000×50/5000 pages = 5000 pages.
For S, number of tuples = 125,000, size of each tuple = 40 bytes.
Size of S = 125,000 × 40 byte = 1250,000 × 40/5000 pages = 1000 pages.
436 Introduc tion to Database Management System
Consider S to be the outer relation and R to be the inner relation.

Note. We considered S to be outer because it is the smaller relation. You can also
consider R to be outer but the cost will be more.
Consider N=size of R=5000 pages and M=size of S=1000 pages.
1. Page Oriented Simple Nested Loops Join.
Cost=M+M*N
(Reading outer Relation once + for each outer page reading the whole inner
relation)
Cost=1000+5000*1000=5,001,000 I/O
2. Block Nested Loops Join.
Cost= (scan outer)+(scan inner)*(number of outer blocks)
(number of outer blocks)= ceiling of ((size of outer relation)/(Block size) ).
(block size)=(Buffer Size)-2. (one buffer for reading and one buffer for writing)
Cost=M+N*ceiling of (M/B-2).
Cost=1000+5000*21=106,000 I/O.
3. Sort-Merge Join.
Cost=sorting both relations + reading both relations.
Cost of sorting relation = 2*(number of passes)*Size. (2 for reading and writing)
Number of passes= ceiling of (log of (size/buffer) to the base of B-1) + 1.
The one comes here for pass 0. After pass zero you get (Size/Buffer) sorted runs
For Relation S, (# of passes)=2; by applying the above formula.
Cost of sorting S=2*2*1000 =4000.
Explanation: You can sort S in two passes.

In pass 0, you produce 20 (1000/50) sorted runs of size 50.
In pass 1, you can sort these 20 sorted runs, and hence the whole relation.
For Relation R, (# of passes)=3; by applying the above formula.
Cost of sorting R=2*3*5000 =30,000.

Explanation: You can sort R in three passes.
In pass 0, you produce 100 (5000/50) sorted runs of size 50.
In pass 1, you sort these 100 sorted runs, and produce 2 sorted runs of size 50.
In pass 2, you sort these 2 sorted runs and hence the whole relation.
Final Cost=1000 + 5000 + 4000 + 30,000 = 40,000 I/O.
4. To do the refinement of combing the merging phase in the sorting of R and S,
size of buffer must be greater than sqrt of (size of Larger relation)
In this example, size of Larger relation = 5000 and buffer size = 50. Hence this
condition is not satisfied. So we cannot do this refinement.
5. Hash Join
Cost= 3*(M+N)
Cost= 3*6000=18,000 I/O.
Query Processing and Optimiz ation 437
Problem 2. Consider the join between relations R and S on R.a=S.b, given the following
information about the relations to be joined. The cost metric is the number of page I/Os
unless otherwise noted, and the cost of writing out the result should be uniformly ignored.
• Relation R contains 10,000 tuples and has 10 tuples per page.
• Relation S contains 2,000 tuples and also has 10 tuples per page.
• Attribute b of relation S is the primary key for S.
• Both relations are stored as simple heap .les.
• Neither relation has any indexes built on it.
• 52 buffer pages are available.
(a) What is the cost of joining R and S using a page-oriented simple nested loops
join? What is the minimum number of buffer pages required for this cost to
remain unchanged?
(b) What is the cost of joining R and S using a block-nested loops join? What is the
minimum number of buffer pages required for this cost to remain unchanged?
(c) What is the cost of joining R and S using a sort-merge join? What is the minimum
number of buffer pages required for this cost to remain unchanged?
(d) What is the cost of joining R and S using a hash join? What is the minimum
number of buffer pages required for this cost to remain unchanged? Neglect the
size involved in building a hash table.
Solution. Let M = 1000 be the number of pages in R, N = 200 be the number of pages
in S, and B = 52 be the number of [Link] pages available.
(a) Basic idea is to read each page of the outer relation, and for each page scan
the inner relation for matching tuples. Total cost would be #pages in outer +
(#pages in outer #pages in inner) which is minimized by having the smaller
relation be the outer relation.
Total Cost = N + (N*M) = 200, 200
The minimum number of buffer pages for this cost is 3.
(b) This time read the outer relation in blocks, and for each block scan the inner
relation for matching tuples. So the outer relation is still read once, but the inner
relation is scanned only once for each outer block, of which there are:
ceiling of (#pagesinouter/ B-2] = 4.
Total Cost = N +M* ceiling (N/B-2) = 4, 200.

If the number of buffer pages is less than 52, the number of scans of the inner
would be more than 4 since ceiling(200/49) is 5. The minimum number of buffer
pages for this cost is therefore 52.
(c) Since B > √M > √N we can use the refinement to Sort-Merge discussed on pages
254–255 in the text.
Total Cost = 3 (M + N) = 3, 600
Note. If S.b were not a key, then the merging phase could require more than
one pass over one of the relations, making the cost of merging M N I/Os in
the worst case.
438 Introduc tion to Database Management System
The minimum number of buffer pages required is 25. With 25 buffer pages, the
initial sorting pass will split R into 20 runs of size 50 and split S into 4 runs
of size 50 (approximately). These 24 runs can then be merged in one pass, with
one page left over to be used as an output buffer. With fewer than 25 buffer
pages the number of runs produced by the first pass over both relations would
exceed the number of available pages, making a one-pass merge impossible.
(d) The cost of Hash Join is 3(M+N) if B > √ N where N is the number of pages in
the smaller relation, S. Since √N =14, this condition is met. We will also assume
uniform partitioning from our hash function.
Total Cost = 3 (M + N) = 3, 600.
The minimum number of buffer pages required, and a good guess is that we
need B >√N.
Problem 3. Consider the relations r1(A,B,C), r2(C,D,E), and r3(E, F), with primary keys
A, C, and E, respectively. Assume that r1 has 1000 tuples, r2 has 1500 tuples, and r3 has
750 tuples.
(a) Give an efficient strategy (in which order) for computing r1 1 r2 1 r3
(b) Estimate the size of the above joins.
Solution.
(a) the order shall be (r1 1 r2) 1 r3. This is because the estimate size of r1 1 r2 is
1000, the estimate size of r1 1 r3 is 750,000, while that of r2 1 r3 is 1500.
(b) less than 1000.
Note that (a) and (b) are not necessarily related.
Problem 4. Consider the tables WORKS AT(SSN, gymID) and GYM(gymID, name). Notice
that gymID is not a candidate key for the table GYM.
WORKS AT(SSN, gymID) consists of N1 = 100,000 tuples and has
• V(SSN,WORKS AT) = 50,000 distinct values of SSN
• V(gymID,WORKS AT) = 20,000 distinct values of gymID.
GYM(gymID, name) consists of N1 = 40,000 tuples and has
• V(gymID,GYM) = 20,000 distinct values of gymID
• V(name,GYM) = 30,000 distinct values of name.
(a) Estimate the number of qualifying tuples of the query:
SELECT *
FROM WORKS_AT
WHERE SSN = 123456789;
(b) Can SSN be a candidate key for the table WORKS AT? Give a short explanation
for your answer.
(c) Estimate the number of qualifying tuples of the query:
SELECT *
FROM GYM
WHERE name = “Gym planet”;
Query Processing and Optimiz ation 439
(d) Estimate the number of qualifying tuples of the query:
SELECT *
FROM WORKS_AT
WHERE SSN = 123456789 AND gymID=101;
(e) Notice that gymID is not a candidate key for the table GYM. Estimate the
number of qualifying tuples of the query:
SELECT SSN, [Link], name
FROM WORKS_AT JOIN GYM
WHERE [Link] = WORKS_AT.gymID;
(f) Estimate the number of qualifying tuples of the query:
SELECT [Link], [Link]
FROM WORKS_AT AS WA1 JOIN WORKS_AT AS WA2
WHERE [Link] = [Link];

Solution.
(a) N1/V (SSN, WORKS AT) = 100, 000/50, 000 = 2
(b) No, because it has multiple duplicates in the table.
(c) N2/V (name, GYM) = 40, 000/30, 000 = 1.33
(d) N1/(V (SSN, WORKS AT)∗V (gymID, WORKS AT)) = 100, 000/50, 000/20, 000 =0.0001
(e) gymID is primary key in the table GYM and foreign key in the table WORKS AT. So,
N1 ∗ N2/V (gymID, WORKS AT) = 100, 000 ∗ 40, 000/20, 000 = 200, 000
(f) gymID is not primary key in the table WORKS AT. So, N1 ∗ N1/V (gymID, WORKSAT)
= 100, 000 ∗ 100, 000/20, 000 = 500,000

TEST YOUR KNOWLEDGE

True/False
1. In a query processing system, the optimizer optimizes the query before it is given to the
parser and translator.
2. Working with data in the data cache is many times faster than working with data in the data
files.
3. The cost of processing a query is usually dominated by secondary storage access, which is
slow compared to memory access.
4. The CPU time taken to process a query is usually taken as the cost of the query.
5. The SQL execution activities are performed by the query optimizer.
6. In general, a query plan that generates few intermediate result tuples is preferable to one that
generates a lot of intermediate result tuples.
7. Query processing is the procedure of selecting the most appropriate plan that is used in
responding to a database request.
8. Binary search is better than linear search in the sense that it always works, no matter what.
440 Introduc tion to Database Management System
9. The internal query representation is usually a binary query tree.
10. A cost-based optimizer uses a set of preset rules and points to determine the best approach
to execute a query.
11. To access the records in a file in a specific sort order on the value of an attribute, we could
create an index on that attribute, and use that index to access the record in the desired order.
12. A query tree is also called a relational algebra tree.
13. We can implement a disjunctive selection (“or”s) by taking the intersection of pointers to
records satisfying each component of the disjunction.
14. It is more advantageous (in terms of query processing cost) to use nested loop join instead
of block nested loop join.
15. In the hash-join algorithm, the number of partitions is determined by the size of memory and
the size of the probe relation.
16. Heuristic rules are used as an optimization technique to modify the internal representation of
a query.
17. The cost of processing a query is not dependent on disk access.
18. In general, heuristic rules are used in the form of query tree or query graph data structure.
19. The heuristic optimization algorithm utilizes some of the transformation rules to transform an
initial query tree into an optimised and efficiently executable query tree.
20. Character field comparisons are faster than numeric, date, and NULL comparisons.

Fill in the Blanks

1. Query processing has __________ phases.
2. The ______ analyzes the SQL query and finds the most efficient way to access the data.
3. _____________ play an important role in speeding up data access.
4. ________ is a measure of how likely an index will be used in query processing.
5. Once an SQL statement is transformed, the DBMS creates a(n) _______ plan.
6. In syntax-checking phase of query processing the system _______ the query and checks that
it obeys the ________ rules.
7. The two types of query optimization techniques are (a) ________ and (b) ________.
8. A query tree is also called a ________ tree.
9. Query transformation is performed by transforming the query into ________ that are more
efficient to execute.
10. In the ________ stage, the query is lexically and syntactically analysed using parsers to find
out any syntax error.
11. In ________ stage, the query is converted into normalised form that can be more easily
manipulated.
12. ________ uses the transformation rules to convert one relational algebraic expression into an
equivalent form that is more efficient.
13. In general, the heuristic rules are used in the form of ________ or _______ structure.

Multiple Choice Questions

1. The DBMS ____ the SQL query using the chosen execution plan.
(a) parses (b) executes (c) fetches (d) processes
Query Processing and Optimiz ation 441
2. The objective of query simplifier is
(a) transformation of the query to a semantically equivalent and more efficient form.
(b) detection of redundant qualifications
(c) elimination of common sub-expressions
(d) all of these.
3. If there is no index, the DBMS will perform a ____ scan
(a) loop (b) range
(c) row ID table access (d) full table
4. One measure that determines the need for an index is the ____ of the column you want to
index. ____ refers to the number of different values a column could possibly have
(a) Database statistics (b) Data sparsity
(c) Primary keys (d) Query optimization
5. Knowing the sparsity of a column helps you decide whether the use of ____ is appropriate
(a) query processing (b) query optimization
(c) an index (d) a full table scan
6. ____ is/are the central activity during the parsing phase in query processing
(a) Database statistics (b) Data sparsity
(c) SQL query (d) Query optimization
7. Hash and Merge join are applicable
(a) for any join condition
(b) only when the join condition is an equality (=)
(c) when the join condition has an inequality (> or <)
(d) when the join condition has a \like”
8. Two very large relations R and S are have the physical sort order of R.c1 and S.c1 respectively
(assume a clustering index exists on R.c1 and S.c1). For the query select * from R, S where
R.c1 = S.c1 order by R.c1 the likely best join method is
(a) nested loops join (b) hash join
(c) merge join (d) index nested loops join
9. The component of a database management system responsible for determining the most efficient
way to execute a query is
(a) Query plan (b) Query optimize
(c) Composite index (d) Query
10. Cardinality estimation depends on estimates of the
(a) Selection factors of predicates in the query
(b) Selection factors of query optimizer
(c) Selection factors of query process
(d) Transitional database management system
11. Relational query optimization depends on knowledge of the physical storage of data which is
(i) Readily available
(ii) Problematic
442 Introduc tion to Database Management System
(iii) With query optimizer
(iv) Access path
which of the following are correct?
(a) (i), (iii) (b) (i), (iv) (c) (i), (ii) (d) (iii), (iv)
12. Efficiency of a query can be measured by
(i) Response time
(ii) Execution time
(iii) Execution sequence
(iv) Correctness
which of the following satisfies the statement?
(a) (i), (iii) (b) (ii), (iv) (c) (iii), (iv) (d) (i), (iv)
13. The data processor consists of three elements
(i) Local query optimizer
(ii) Distributed execution monitor
(iii) Local recovery manager
(iv) Run-time support processor
which of the following are true?
(a) (i), (ii) (b) (i), (iv) (c) (iii), (iv) (d) (i), (ii)
14. One of the following steps is not involved in processing a query
(a) Parsing and translation (b) Optimization
(c) Evaluation (d) Distribution
For the Q. 15–19, let r and s be relations, b s (number of blocks occupied by s) =
400, ns (number of records in s)= 1600, b r (number of blocks occupied by r) = 1600,
nr (number of records in r) = 3200. Assume the final result of operations is not written to disk.
15. Let the buffer size M = 400. What is the minimum achievable cost of a join using the block
nested loop join algorithm, with s being traversed in the outside loop?
(a) 400 + 160 (b) 1600 + 400*2 (c) 1600 + 400*4 (d) None of the above
16. Let the buffer size M = 201. What is the minimum achievable cost of a join using the block
nested loop join algorithm, with the r relation being traversed in the outer loop?
(a) 400 + 1600 (b) 400 + 1600 + 1599
(c) 1600 + 400 + 7*399 (d) 400 + 1600 * 1600
17. Assuming we use merge join to join r and s, and the memory allocated for the sort operation
M = 10, what is the cost of the join if both r and s are already sorted on the join attribute?
(a) 400 + 600 (b) 400*10 + 600
(c) 400*10 + 600*10 (d) None of the above
18. Assuming we use hash join, r is the build relation, and M = 10. How many partitioning
passes are needed?
(a) 1 (b) 2 (c) 3 (d) 4
19. Suppose we want to sort the s relation using the sort-merge algorithm, and M = 4. How
many merge passes are needed to get the final result?
(a) Log3400 (b) Log3100 (c) Log4100 (d) Log4400
Query Processing and Optimiz ation 443
For Q. 20–25 below, assume r(A,B,C), s(C,D,E), t(E,F,G) are relations, nr=1000, ns=5000, nt=600,
V(C,r)=200, V(C,s)=100, V(B,r)=400,Min(A)=41, Max(A)=50, in(B)=31, Max(B)=40, Min(C)=80,
Max(C)=89, Min(D)=10, Max(D)=14, V(E,s)=50, V(E,t)=200.
In Q. 20–24, estimate how many tuples there are in the result of the queries.
20. sA≥49 ∨ A<42 (r)
(a) 200 (b) 300 (c) 400 (d) 401
21. sA≥49 ∨ B≥39 ∨ C≥86 (r)
(a) 384 (b) 616 (c) 614 (d) 386
22. r |X| s if C is not a superkey of r and C is not a superkey of s
(a) 25,000 * (b) 50,000 (c) 75,000 (d) 78,000
23. s |X| t if E is the primary key of t and s
(a) At most 600 (b) At most 5,000
(c) At most 3,000,000 (d) None of the above
24. CGsum(A) (r)
(a) 200 (b) 1000 (c) 200,000 (d) 1,200
25. Estimate the number of distinct values V(B, s A≥46(r) )
(a) 50 (b) 100 (c) 200 (d) 400
26. Which of the following cost is the most important cost component to be considered during
the cost-based query optimization?
(a) memory uses cost (b) secondary storage access cost
(c) communication cost (d) all of these
27. In general, the heuristic rules are used in the form of
(a) query tree (b) query graph data structure.
(c) both (a) and (b) (d) either (a) or (b)

Answers
True/False
1. F 2. T 3. T
4. F 5. F 6. T
7. T 8. F 9. T
10. F 11. T 12. T
13. F 14. F 15. F
16. T 17. T 18. T
19. T 20. F

Fill in the Blanks

1. Three 2. query optimizer 3. Indexes
4. Index selectivity 5. execution 6. analyzed, language
7. heuristic, cost based 8. relational algebra 9. equivalent expression
10. syntax analysis 11. query normalization 12. query simplifier
13. query tree, query graph
444 Introduc tion to Database Management System
Multiple Choice Questions
1. (b) 2. (d) 3. (d)
4. (b) 5. (c) 6. (d)
7. (b) 8. (c) 9. (b)
10. (a) 11. (a) 12. (d)
13. (b) 14. (d) 15. (d)
16. (c) 17. (d) 18. (c)
19. (b) 20. (b) 21. (b)
22. (a) 23. (d) 24. (a)
25. (d) 26. (d) 27. (c)

Exercises

Short Answer Questions

1. What is query processing?
2. What are the objectives of query processing?
Ans. The aim of query processing is to transform a query written in a high level language
into a correct and efficient execution strategy expressed in a low level language and execute
the strategy to retrieve the data.
3. What are the 3 steps of query processing?
(i) Parsing and translation, (ii) Optimization, (iii) Evaluation
4. What is query processor?
5. What is execution/query plan?
6. What is specified in a query plan (evaluation plan)?
Ans. Query plan defines what algorithm is used for each operation and how the execution
of the operations is coordinated.
7. Explain the general strategy for query processing?
8. Explain various steps in query processing.
9. What is query decomposition?
10. What are the various stages of query processing?
11. What is query analysis?
12. What is query tree notation?
13. What is query graph notation?
14. What is query optimization? What is its aim?
15. What are various query execution plans?
16. What is left deep tree query execution plan?
17. What is right deep tree query execution plan?
18. What is linear tree query execution plan?
19. What is bushy execution plan?
20. What is query code generator?
Query Processing and Optimiz ation 445
21. What is runtime database processor?
22. What is the goal of query optimization?
Ans. Find the most efficient query evaluation plan (query plan) for a given query, i.e., the
one which can be executed most efficiently.
23. What are the main issues that must be considered in query optimization?
24. At what time during query processing does (query) optimization occur?
Ans. After parsing the query and translating it to relational algebra.
25. What are the various rules for transforming relational algebra query?
26. What is heuristic query optimization?
27. Explain heuristic query optimization algorithm.
28. What is cost based query optimization?
29. How cost-based optimization works?
Ans. Cost-based optimization works in 3 steps:
1. Generating logically equivalent expressions (using equivalence rules);
2. Annotating resultant expressions to get alternative query plans;
3. Choosing the cheapest plan based on estimated cost.
30. What are the main components that are used to determine the cost of execution of a query?
31. What is the cost function for selection operation?
32. What is the cost function for join operation?
33. Two relations R with 60000 tuples and occupying of 300 blocks is to be joined with a relation
S with 40000 tuples and occupying 400 blocks. What is the total cost using the algorithms
of nested-loop-join and block-nested loop join? Give both the best case and the worst case
figures.
Ans. Nested-loop-join: Worst Case: If R as the outer relation 60000 × 400 + 300 = 24000300
Disk access, if S as the outer relation we need 40000 × 300 + 400 = 12000400 disk accesses
Best Case: 300 + 400 = 700 disk accesses will be required.
Block-nested loop join:
Worst Case: If R is the outer relation, we need 300 * 400 + 300 = 120300 disk access, if S is
the outer relation we need 400 * 300 + 400 = 120400 disk access
Best Case: 300+ 400 = 700 disk accesses will be required.
34. How do you estimate the query cost for natural join when
(i) R  S = j
(ii) R  S is a foreign key
Ans.
(i) If R  S = F , then r  s is the same as r × s.
(ii) If R  S in S is a foreign key in S referencing R, then the number of tuples in r  s is
exactly the same as the number of tuples in S. The case for R  S being a foreign key
referencing S is symmetric.
35. Given two relations R (A, B) and S (B, C) with number of tuples in R and S equal to 500
and 1000 respectively and B is the foreign key in R, what is the number of tuples in R  S.
Ans. The number of tuples in R  S is 500000.
36. Assuming M memory blocks, what is the best way to use these blocks in the block nested
loop join?
Ans. M−2 blocks for the outer relation, 1 block for the inner relation, 1 block for the output
446 Introduc tion to Database Management System
37. What is the number of block accesses for a merge-join, assuming that the relations are sorted
and the join attribute(s) in one relation forms a key?
Ans. br + bs, where br and bs is the number of blocks of relation r and s, respectively.
38. Is E1 θ E2 = E2 θ E1 a correct equivalence rule (for query optimization)?
Ans. Yes
39. What is the number of block accesses for a merge-join, assuming that the relations are sorted
and the join attribute(s) in one relation forms a key?
Ans. br + bs, where br and bs are the number of blocks in relation r and s, respectively.
40. Consider a relation r(A, B), which is sorted on A, and the query sB=v(r). Is binary search a
valid strategy to evaluate this query?

Ans. No
41. What are optimizer choices? How many modes do optimizer choices have?
Ans. Query optimization is the central activity during the parsing phase in query processing.
In this phase, the DBMS must choose what indexes to use, how to perform join operations,
and what table to use first, and so on. Each DBMS has its own algorithms for determining
the most efficient way to access the data. The query optimizer can operate in one of two
modes:
A rule-based optimizer uses preset rules and points to determine the best approach to execute
a query. The rules assign a “fixed cost” to each SQL operation; the costs are then added to
yield the cost of the execution plan. For example, a full table scan has a set cost of 10, while
a table access by row ID has a set cost of 3.
A cost-based optimizer uses sophisticated algorithms based on the statistics about the objects
being accessed to determine the best approach to execute a query. In this case, the optimizer
process adds up the processing cost, the I/O costs, and the resource costs (RAM and temporary
space) to come up with the total cost of a given execution plan.
42. How can queries be written to perform the fastest when equality and inequality comparisons
are needed?
Ans. Equality comparisons are faster than inequality comparisons. As a general rule, equality
comparisons are processed faster than inequality comparisons. For example, PRICE = 10.00
is processed faster because the DBMS can do a direct search using the index in the column.
If there are no exact matches, the condition is evaluated as false. However, if you use an
inequality symbol (>, >=, <, <=), the DBMS must perform additional processing to complete
the request. The reason is because there will almost always be more “greater than” or “less
than” values than exactly “equal” values in the index. The “not equal” symbol (<>) yields
slower searches, especially when the sparsity of the data is high, that is, when there are many
more different values than there are equal values.

Long Answer Questions

1. What are the general strategies for query processing ? Explain. Also discuss query optimization
techniques through suitable example.
2. What is meant by query processing ? Discuss the various methods for query processing and
query optimization with the help of suitable examples.
3. What do you mean by query optimization ? Explain the various query optimization methods.
4. Explain the basic algorithm for executing query-processing operation with examples.
5. Discuss various Heuristic based query optimization techniques with examples.
6. Discuss the issues that are considered in designing of query optimizer.
Query Processing and Optimiz ation 447
7. What is query processing ? Describe the steps involved in query processing.
8. Describe how a query that involves join, selection and projection operations can be optimized.
Explain the above with a suitable example.
9. Write the relational algebraic expression and create the query graph for the query. List the
names of the employees who draw a salary above. ` 10,000 and are working on a project
whose title is “Disaster management”. Suggest query improvements in the query graph, if any.
10. A query-involving join on three relations is to be executed. What factors will a DBMS consider
to do evaluation of the query in an optimal fashion ? Explain with the help of an example.
11. Does the data dictionary have any role to play in query processing? Describe with the help
of an SQL query requiring Join operation, SELECTION and PROJECTION.

Query Processing and Optimization Techniques
No ratings yet
Query Processing and Optimization Techniques
17 pages
Query Optimization Algorithms in DBMS
No ratings yet
Query Optimization Algorithms in DBMS
30 pages
Query Processing and Optimization Steps
No ratings yet
Query Processing and Optimization Steps
35 pages
Query Processing and Optimization in DBMS
No ratings yet
Query Processing and Optimization in DBMS
27 pages
Query Optimization in DBMS Explained
No ratings yet
Query Optimization in DBMS Explained
27 pages
Query Processing and Optimization in DBMS
No ratings yet
Query Processing and Optimization in DBMS
27 pages
Steps in Query Processing Explained
No ratings yet
Steps in Query Processing Explained
3 pages
Query Processing and Optimization Techniques
No ratings yet
Query Processing and Optimization Techniques
31 pages
Query Processing and Optimization Guide
No ratings yet
Query Processing and Optimization Guide
21 pages
Understanding Query Processing Steps
No ratings yet
Understanding Query Processing Steps
1 page
Query Processing and Optimization in DBMS
No ratings yet
Query Processing and Optimization in DBMS
38 pages
Understanding Query Optimization Techniques
No ratings yet
Understanding Query Optimization Techniques
11 pages
Query Processing and Optimization Guide
No ratings yet
Query Processing and Optimization Guide
18 pages
Query Processing and Optimization Steps
No ratings yet
Query Processing and Optimization Steps
83 pages
Adbms Unit1
No ratings yet
Adbms Unit1
16 pages
Query Optimization Techniques in DBMS
No ratings yet
Query Optimization Techniques in DBMS
42 pages
Query Processing and Optimization Guide
No ratings yet
Query Processing and Optimization Guide
16 pages
Understanding SQL Query Processing Steps
No ratings yet
Understanding SQL Query Processing Steps
9 pages
Object-Oriented Data Model & Query Processing
No ratings yet
Object-Oriented Data Model & Query Processing
34 pages
Understanding Database Queries
No ratings yet
Understanding Database Queries
6 pages
Query Processing and Optimization Guide
No ratings yet
Query Processing and Optimization Guide
39 pages
Query Processing and Optimization Overview
100% (1)
Query Processing and Optimization Overview
16 pages
Query Processing in DBMS Explained
No ratings yet
Query Processing in DBMS Explained
20 pages
Query Processing and Optimization Algorithms
No ratings yet
Query Processing and Optimization Algorithms
46 pages
Query Processing and Optimization in DBMS
No ratings yet
Query Processing and Optimization in DBMS
129 pages
Steps in Query Processing Explained
No ratings yet
Steps in Query Processing Explained
4 pages
Query Processing and Optimization in DBMS
No ratings yet
Query Processing and Optimization in DBMS
38 pages
Query Tree Optimization Techniques
No ratings yet
Query Tree Optimization Techniques
60 pages
Introduction To Query Processing and Optimization
No ratings yet
Introduction To Query Processing and Optimization
4 pages
Query Processing in Database Systems
No ratings yet
Query Processing in Database Systems
5 pages
Overview of Query Processing Steps
No ratings yet
Overview of Query Processing Steps
7 pages
Query Processing Notes
No ratings yet
Query Processing Notes
4 pages
Query Processing and Optimization in DBMS
No ratings yet
Query Processing and Optimization in DBMS
49 pages
Query Processing and Optimization Overview
No ratings yet
Query Processing and Optimization Overview
34 pages
Query Processing and Optimization in DBMS
No ratings yet
Query Processing and Optimization in DBMS
36 pages
Query Processing in Database Systems
No ratings yet
Query Processing in Database Systems
47 pages
Query Processing and Optimization Guide
No ratings yet
Query Processing and Optimization Guide
24 pages
Query Processing and Optimization in DBMS
No ratings yet
Query Processing and Optimization in DBMS
62 pages
Query Processing and Optimization Guide
No ratings yet
Query Processing and Optimization Guide
21 pages
Query Optimization in Distributed Databases
No ratings yet
Query Optimization in Distributed Databases
7 pages
Unit-7 Query Processing
No ratings yet
Unit-7 Query Processing
12 pages
Query Processing and Optimization Steps
100% (1)
Query Processing and Optimization Steps
43 pages
Query Processing and Optimization Steps
No ratings yet
Query Processing and Optimization Steps
8 pages
Query Processing and Optimization Techniques
No ratings yet
Query Processing and Optimization Techniques
39 pages
Query Processing and Optimization Techniques
No ratings yet
Query Processing and Optimization Techniques
63 pages
Query Processing and Optimization Guide
No ratings yet
Query Processing and Optimization Guide
99 pages
Query Processing and Decomposition
No ratings yet
Query Processing and Decomposition
28 pages
Query Processing and Optimization Guide
No ratings yet
Query Processing and Optimization Guide
50 pages
Heuristic Query Optimization in DBMS
No ratings yet
Heuristic Query Optimization in DBMS
43 pages
Types of Integrity Constraints in Databases
No ratings yet
Types of Integrity Constraints in Databases
48 pages
Query Processing and Optimization in DBMS
100% (3)
Query Processing and Optimization in DBMS
28 pages
Query Optimization Techniques in DBMS
No ratings yet
Query Optimization Techniques in DBMS
5 pages
Chapter 2 - Query-processing-And-optimizationFinal ADVANCED DATA BASE
No ratings yet
Chapter 2 - Query-processing-And-optimizationFinal ADVANCED DATA BASE
59 pages
Query Processing and Optimization Techniques
No ratings yet
Query Processing and Optimization Techniques
11 pages
Query Processing and Optimization Techniques
No ratings yet
Query Processing and Optimization Techniques
79 pages
Overview of Query Processing in DBMS
100% (1)
Overview of Query Processing in DBMS
22 pages
Query Processing Steps in DBMS
No ratings yet
Query Processing Steps in DBMS
7 pages
Steps in Query Processing
No ratings yet
Steps in Query Processing
44 pages
Lecture 4 Query Processing
No ratings yet
Lecture 4 Query Processing
18 pages
Social Media
No ratings yet
Social Media
5 pages
Evaluation of Relational Algebra Expressions
No ratings yet
Evaluation of Relational Algebra Expressions
31 pages
Understanding Symbol Tables in Compilers
No ratings yet
Understanding Symbol Tables in Compilers
28 pages
YACC: A Guide to Compiler Design
No ratings yet
YACC: A Guide to Compiler Design
9 pages
MBA IV Sem: Data Analysis Overview
No ratings yet
MBA IV Sem: Data Analysis Overview
20 pages
Basic Block Construction from 3AC Code
No ratings yet
Basic Block Construction from 3AC Code
10 pages
Computer Organization and Structure Overview
No ratings yet
Computer Organization and Structure Overview
29 pages
Introduction to Compiler Design Basics
No ratings yet
Introduction to Compiler Design Basics
11 pages
Understanding Peripheral Devices in Computing
No ratings yet
Understanding Peripheral Devices in Computing
17 pages
MBA 4th Sem: System Analysis & Design
No ratings yet
MBA 4th Sem: System Analysis & Design
19 pages
Computer Organization Assignment Guide
No ratings yet
Computer Organization Assignment Guide
1 page
Understanding APIs in Software Development
No ratings yet
Understanding APIs in Software Development
10 pages
Winsock Programming
100% (1)
Winsock Programming
12 pages
SMPS Troubleshooting and Repair Guide
100% (1)
SMPS Troubleshooting and Repair Guide
56 pages
How to Install Linux: A Step-by-Step Guide
No ratings yet
How to Install Linux: A Step-by-Step Guide
7 pages
Developing Interactive Web Pages Using Java Scripts
No ratings yet
Developing Interactive Web Pages Using Java Scripts
15 pages
Server-Side Scripting with JSP and ASP
No ratings yet
Server-Side Scripting with JSP and ASP
10 pages
Step-by-Step Computer Disassembly Guide
No ratings yet
Step-by-Step Computer Disassembly Guide
2 pages
ASP Database Connection Guide
No ratings yet
ASP Database Connection Guide
8 pages
Installing CPU Heat Sink and Fan
No ratings yet
Installing CPU Heat Sink and Fan
22 pages
Understanding Computer Virus Types
No ratings yet
Understanding Computer Virus Types
62 pages
Changing Settings On Windows 2000
No ratings yet
Changing Settings On Windows 2000
85 pages
Understanding Internet and Web Basics
No ratings yet
Understanding Internet and Web Basics
5 pages
Understanding Internet Functionality
No ratings yet
Understanding Internet Functionality
4 pages
Understanding Cable Modem Functionality
No ratings yet
Understanding Cable Modem Functionality
5 pages
Serial vs Parallel Ports Explained
100% (1)
Serial vs Parallel Ports Explained
3 pages
Basics of Information Systems & Decision Making
No ratings yet
Basics of Information Systems & Decision Making
73 pages
Understanding Management Information Systems
No ratings yet
Understanding Management Information Systems
12 pages
Multimedia Approach in Information Systems
No ratings yet
Multimedia Approach in Information Systems
10 pages
Transaction Processing Systems Explained
No ratings yet
Transaction Processing Systems Explained
11 pages
Power BI Interview Questions Guide
No ratings yet
Power BI Interview Questions Guide
18 pages
Mac Keyboard Shortcuts Guide
No ratings yet
Mac Keyboard Shortcuts Guide
13 pages
Openflow For Wireless Mesh Networks
No ratings yet
Openflow For Wireless Mesh Networks
6 pages
Unlocking Recursive Thinking in LLMs
No ratings yet
Unlocking Recursive Thinking in LLMs
14 pages
Key Features of React Explained
No ratings yet
Key Features of React Explained
45 pages
KEP: High Voltage Testing Solutions
No ratings yet
KEP: High Voltage Testing Solutions
32 pages
05 Germany Donau
No ratings yet
05 Germany Donau
137 pages
Databricks Tutorial Overview and Setup
100% (1)
Databricks Tutorial Overview and Setup
131 pages
2MP Outdoor Speed Dome Camera B916
No ratings yet
2MP Outdoor Speed Dome Camera B916
2 pages
Missing Files in Citra Emulator Logs
No ratings yet
Missing Files in Citra Emulator Logs
15 pages
Introduction To Windows Command Line Module Cheat Sheet
No ratings yet
Introduction To Windows Command Line Module Cheat Sheet
20 pages
Development of Malware Using Large Language Models
No ratings yet
Development of Malware Using Large Language Models
5 pages
Kongsberg AIS BS600 Base Station Overview
100% (1)
Kongsberg AIS BS600 Base Station Overview
2 pages
TD730 TD830 System Reference Manual 197704
No ratings yet
TD730 TD830 System Reference Manual 197704
105 pages
4x4 Keypad Interfacing with ATmega32
100% (3)
4x4 Keypad Interfacing with ATmega32
6 pages
ABB I-Bus® KNX: SA/S 4.16.6.2 Switch Actuator
No ratings yet
ABB I-Bus® KNX: SA/S 4.16.6.2 Switch Actuator
9 pages
Free LIMS Buyer’s Guide for Labs
No ratings yet
Free LIMS Buyer’s Guide for Labs
48 pages
CISA Practice Questions for IS Auditors
No ratings yet
CISA Practice Questions for IS Auditors
1 page
Java OOP Concepts and Error Handling
No ratings yet
Java OOP Concepts and Error Handling
143 pages
Excel 2021: Text & Conditional Functions
No ratings yet
Excel 2021: Text & Conditional Functions
16 pages
Full Stack Development
No ratings yet
Full Stack Development
10 pages
Understanding Consumer Decision Making
No ratings yet
Understanding Consumer Decision Making
7 pages
Smart Buildings Sinopoli
No ratings yet
Smart Buildings Sinopoli
22 pages
ECDIS Software Update Ver.71B Release
No ratings yet
ECDIS Software Update Ver.71B Release
89 pages
DBMS Course Syllabus and Objectives
No ratings yet
DBMS Course Syllabus and Objectives
2 pages
AIOps Framework for Autonomous Kubernetes
No ratings yet
AIOps Framework for Autonomous Kubernetes
8 pages
MTN NG Self Service Portal Home
No ratings yet
MTN NG Self Service Portal Home
91 pages
Nutanix-Files 1598838521771
No ratings yet
Nutanix-Files 1598838521771
47 pages
Azure Cloud Computing Lab Manual
No ratings yet
Azure Cloud Computing Lab Manual
47 pages
EVS XT-VIA Server Manual 20.1
No ratings yet
EVS XT-VIA Server Manual 20.1
114 pages

Query Optimization Algorithms

Uploaded by

Query Optimization Algorithms

Uploaded by

Query Processing and Optimization

11.2 Basics of Query Processing

Query Processing : Query Processing is a procedure of converting a query written in high-

11.2.1 General Strategy for Query Processing

11.2.2 Steps in Query Processing

User input General query

High level query language

Scanning, parsing, Syntax checking and verification

Check existence of relations and attributes,

Optimization to reduce execution time and

Execution plan Query modification

Query code generator Generate code for queries

Main Runtime database processor Deals with database to do necessary

FIGURE 11.1. Steps in query processing.

SQL query Query analysis

FIGURE 11.2. Steps in query decomposition.

FIGURE 11.3. Query treee notation.

 Square brackets–represent attributes retrieved from each relation.

FIGURE 11.4. Query graph notation.

FIGURE 11.5. Left-deep execution plan.

Disadvantage : The disadvantages of left-deep tree query execution plan are

FIGURE 11.6. Right-deep execution plan.

FIGURE 11.7. Linear tree execution plan.

FIGURE 11.8. Bushy execution plan.

The main advantage of bushy execution plan is the flexibility provided by it in

11.3 Query Optimization

Cost model Execution plan generator

Execution plan in form of

FIGURE 11.9. Query optimization process.

The main issues that need to be considered in query optimization are:

11.3.1 Transformation Rules for Relational Algebra

Rule 8. Commutativity of Union (∪) and Intersection (∩)

11.3.2 Heuristic Query Optimization

[Link] = ‘LUXMI PUB.’  d.Proj_ID = p.Proj_ID  e.Dept_ID = d.Dept_ID  [Link] > 35

FIGURE 11.10. Initial query tree.

e.Dept_ID = d.Dept_ID [Link] = ‘LUXMI PUB.’

[Link] > 35 Department

FIGURE 11.11. Improved query tree by first applying selection operations.

d.Proj_ID = p.Proj_ID [Link] > 35

[Link] = ‘LUXMI PUB.’ Department

— A further improvement can be done by replacing Cartesian product operations by

d.Proj_ID = p.Proj_ID [Link] > 35

FIGURE 11.13. Improved query tree by replacing Cartesian product and

d.Dept_ID e.Emp_ID, e.Emp_Name, e.Dept_ID

d.Proj_ID = p.Proj_ID [Link] > 35

p.Proj_ID d.Dept_ID, d.Proj_ID Employee

[Link] = ‘LUXMI PUB.’ Department

FIGURE 11.14. Improved query tree by applying and moving projection

[Link]. Strategies Cost

Type of Join Cost

TEST YOUR KNOWLEDGE

Fill in the Blanks

Multiple Choice Questions

Fill in the Blanks

Short Answer Questions

Long Answer Questions

You might also like

[Link] = ‘LUXMI PUB.’ d.Proj_ID = p.Proj_ID e.Dept_ID = d.Dept_ID [Link] > 35

e.Dept_ID = d.Dept_ID [Link] = ‘LUXMI PUB.’

[Link] > 35 Department

d.Proj_ID = p.Proj_ID [Link] > 35

[Link] = ‘LUXMI PUB.’ Department

d.Proj_ID = p.Proj_ID [Link] > 35

d.Dept_ID e.Emp_ID, e.Emp_Name, e.Dept_ID

d.Proj_ID = p.Proj_ID [Link] > 35

p.Proj_ID d.Dept_ID, d.Proj_ID Employee

[Link] = ‘LUXMI PUB.’ Department