0% found this document useful (0 votes)
7 views39 pages

Query Processing and Optimization Techniques

Uploaded by

hibap16493
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
7 views39 pages

Query Processing and Optimization Techniques

Uploaded by

hibap16493
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Chapter 2

Query Processing and


Optimization
Outline
 Query processing steps

 Query Optimization:
 Heuristic based query optimization

 Cost based query optimization


 Selection and Sorting operation optimization

Advanced database-Ch2: Query Processing & Optimization Slide 2


Query Processing
 Activities involved in retrieving data from
the database.
 Aims of QP:
 transform query written in high-level language
(e.g. SQL), into correct and efficient execution
strategy expressed in low-level language
 execute strategy to retrieve required data.

Advanced database-Ch2: Query Processing & Optimization Slide 3


Query Optimization
 Activity of choosing an efficient execution
strategy for processing query.
 As there are many equivalent transformations
of same high-level query, aim of QO is to
choose one that minimizes resource usage.
 Generally, reduce total execution time of
query. May also reduce response time of
query.
 Problem: computationally intractable with
large number of relations, so strategy
adopted is reduced to finding near optimum
solution.
Advanced database-Ch2: Query Processing & Optimization Slide 4
Basic Steps in Query Processing
 Is a 3-step process that transforms a high-
level query (of relational calculus/SQL) into
an equivalent and more efficient lower-
level query (of relational algebra).
1. Parsing and translation
 Check syntax and verify relations.
 Translate the query into an equivalent
relational algebra (RA)expression.
2. Optimization
 Generate an optimal evaluation plan
(with lowest cost) for the query plan.
Advanced database-Ch2: Query Processing & Optimization Slide 5
Basic Steps in Query Processing …
 Evaluation
 The query-execution engine takes an (optimal)
evaluation plan, executes that plan, and
returns the answers to the query.

Advanced database-Ch2: Query Processing & Optimization Slide 6


Basic Steps in Query Processing …

Advanced database-Ch2: Query Processing & Optimization Slide 7


Basic Steps in Query Processing …

Advanced database-Ch2: Query Processing & Optimization Slide 8


Example: SQL To RA Query
 Find students GPA and address of
students whose GPA>3.5

Advanced database-Ch2: Query Processing & Optimization Slide 9


Bad query processing plan example
 Cartesian product first:
 Student x People
 Selection criteria next:
 Gpa>3.5 AND sname=pname
 GROUP BY; HAVING (if available)
 Projections
 SELECT gpa,address
 ORDER BY last
Incredibly inefficient with huge
intermediate results

Advanced database-Ch2: Query Processing & Optimization Slide 10


Query Optimization
• Aim: Transform query into faster, equivalent query

equivalent query 1
equivalent query 2 faste
query
rquer
y
equivalent query n

• Heuristic (logical) optimization


 Query tree (relational algebra) optimization
 Query graph optimization

• Cost-based (physical) optimization


Advanced database-Ch2: Query Processing & Optimization Slide 11
Query Optimization Heuristics
 Apply heuristic rules on standard initial
query tree to find optimized equivalent
query tree
 Main heuristic: Favor operations that
reduce the size of intermediate results first
 Apply SELECT and PROJECT operations before
join or other set operations
 Apply more selective SELECT and join first
 General transformation rules for relational
algebra operators

Advanced database-Ch2: Query Processing & Optimization Slide 12


Some Transformation Rules
• Cascade of σ: σc1 ∧c2 ∧... ∧cn(R) = σc1(σc2(...(σ cn(R))...))
• Commutativity of σ: σc1(σc2(R)) = σc2(σ c1(R))
• Commuting σ with π: πL(σc (R)) = σc(πL(R))
 Only if c involves solely attributes in L.
• Commuting σ with : σc (R S) = σc (R) S
 Only if c involves solely attributes in R.
• Commuting σ with set operations: σc(R θ S) = σc(R) θ
σc(S)
 Where θ is one of ∪, ∩, or
-.
• Commutativity of ∪, ∩, and :RθS=SθR
 Where θ is one of ∪, ∩
and .
• Associativity of , ∪, ∩: (R θ S) θ T = R θ (S θ T)
Advanced database-Ch2: Query Processing & Optimization Slide 13
Transformation Algorithm Outline
• Transform a query represented in relational algebra to
an equivalent one (generates the same result.)

• Step 1: Decompose σ operations.


• Step 2: Move σ as far down the query tree as possible.
• Step 3: Rearrange leaf nodes to apply the most
restrictive σ operations first.
• Step 4: Form joins from × and subsequent σ operations.
• Step 5: Decompose π and move down the query tree as
far as possible.
• Step 6: Identify candidates for combined operations.

Advanced database-Ch2: Query Processing & Optimization Slide 14


Heuristic Query Optimization Summary
• Heuristic optimization transforms the query-tree by
using a set of rules (Heuristics) that typically (but not
in all cases) improve execution performance.
 Perform selection early (reduces the number of tuples)
 Perform projection early (reduces the number of attributes)
 Perform most restrictive selection and join operations (i.e.
with smallest result size) before other similar operations.

• Generate initial query tree from SQL statement.


• Transform query tree into more efficient query tree, via
a series of tree modifications, each of which hopefully
reduces the execution time.
• A single query tree is involved.
Advanced database-Ch2: Query Processing & Optimization Slide 15
Query Tree (plan)
 A tree data structure that corresponds to a
relational algebra expression
 Leaf nodes = input relations
 Internal nodes = RA operations
 Execution of query tree
 Start at the leaf nodes
 Execute internal node whenever its operands are
available and replace node by result

Advanced database-Ch2: Query Processing & Optimization Slide 16


Query Tree Optimization Example
• What are the names of customers living on Elm Street
who have checked out “Terminator”?
• SQL query:
SELECT Name
FROM Customer CU, CheckedOut CH, Film F
WHERE Title = ’Terminator’ AND [Link] = [Link]
AND [Link] = [Link] AND [Link] = ‘Elm’

• Canonical query tree πName


σTitle= ‘Terminator’∧[Link] = [Link] ∧[Link] = [Link] ∧[Link] = ‘Elm’

×
× Note the use of
F
Cartesian product!
CU CH
Advanced database-Ch2: Query Processing & Optimization Slide 17
Apply Selections Early
πName

σ [Link] = [Link]

σ[Link] = [Link] σTitle= ‘Terminator’

× F

σStreet = ‘Elm’ CH

CU
Advanced database-Ch2: Query Processing & Optimization Slide 18
Apply More Restrictive Selections Early
πName

σ [Link] = [Link]

σ [Link] = [Link] σStreet= ‘Elm’

× CU

σTitle= ‘Terminator’ CH

F
Advanced database-Ch2: Query Processing & Optimization Slide 19
Form Joins
πName

[Link] =
[Link]

[Link] = [Link] σStreet= ‘Elm’

σTitle= ‘Terminator’ CH CU

Advanced database-Ch2: Query Processing & Optimization Slide 20


Apply Projections Early
πName

[Link] =
[Link]

[Link] = [Link] πName, CustomerID

πFilmID πFilmID, CustomerID σStreet= ‘Elm’

σTitle= ‘Terminator’ CU
CH

F
Advanced database-Ch2: Query Processing & Optimization Slide 21
Cost-Based Optimization
• Cost Components for Query Execution
1. Access cost to secondary storage
2. Storage cost
3. Computation cost
4. Memory usage cost
5. Communication cost

Note: Different database systems may


focus on different cost components

Advanced database-Ch2: Query Processing & Optimization Slide 22


Relevant Statistics
• Catalog Information Used in Cost Functions:
• Information about the size of a file
• number of records (tuples) (r),
• record size (R),
• number of blocks (b)
• blocking factor (bfr)
• Information about indexes and indexing attributes of
a file
• Number of levels (x) of each multilevel index
• Number of first-level index blocks (bI1)
• Number of distinct values (d) of an attribute
• Selectivity (sl) of an attribute
• Selection cardinality (s) of an attribute. (s = sl * r)

Advanced database-Ch2: Query Processing & Optimization Slide 23


Selection Queries
• Primary key, point
σFilmID= 2 (Film)
• Point
σTitle= ‘Terminator’ (Film)
• Range
σ1 < RentalPrice< 4 (Film)
• Conjunction
σType= ‘M’∧Distributor= ‘MGM’ (Film)
• Disjunction
σPubDate< 2004 ∨Distributor= ‘MGM’ (Film)

Advanced database-Ch2: Query Processing & Optimization Slide 24


Selection Strategies
• Linear search
• Expensive, but always applicable.
• Binary search
• Applicable only when the file is appropriately ordered.
• Hash index search
• Single record retrieval; does not work for range queries.

• Clustering index search


• Multiple records for each index item.
• Implemented with single pointer to block with first
associated record.
• Secondary index search
• Implemented with dense pointers, each to a single record.
Advanced database-Ch2: Query Processing & Optimization Slide 25
Selection Strategies for Conjunctive Queries
• Use any available indices for attributes involved in
simple conditions.
• If several are available, use the most selective index. Then
check each record with respect to the remaining conditions.
• Attempt to use composite indices.
• This can be very efficient.
• Do intersection of record pointers.
• If several indices with record pointers are applicable to the
selection predicate, retrieve and intersect the pointers. Then
retrieve (and check) the qualifying records.
• Disjunctive queries provide little opportunity for smart
processing.
Advanced database-Ch2: Query Processing & Optimization Slide 26
Examples of Cost Functions for SELECT

S1. Linear search (brute force) approach


• CS1a = b;
• For an equality condition on a key, CS1a = (b/2) if
the record is found; otherwise CS1a = b.
S2. Binary search:
• CS2 = log2b + (s/bfr) –1
• For an equality condition on a unique (key)
attribute, CS2 =log2b
S3. Using a primary index (S3a) or hash key
(S3b) to retrieve a single record
• CS3a = x + 1; CS3b = 1

Advanced database-Ch2: Query Processing & Optimization Slide 27


Joins
• Join Strategies
 Nested loop join
 Index-based join
 Sort-merge join

• Strategies work on a per block (not per record) basis.


 Need to estimate #I/Os (block retrievals)

• Relation sizes and join selectivities impact join cost.


 Query selectivity = #tuples in result / #candidates

‹ ‘More selective’ means smaller ‘selectivity value’
„ For join, #candidates is the size of Cartesian product

Advanced database-Ch2: Query Processing & Optimization Slide 28


Nested Loop Join and Index-Based Join
• Nested loop join
• Exhaustive comparison (i.e., brute force approach)
• The ordering (outer/inner) of files and allocation of buffer
space is important.

• Index-based join
• Requires (at least) one index on a join attribute.
• At times, a temporary index is created for the purpose of a
join.
• The ordering (outer/inner) of files is important.

Advanced database-Ch2: Query Processing & Optimization Slide 29


Nested Loop
• Basically, for each block of the outer table (r), scan the
entire inner table (s).
• Requires quadratic time, O(n ) 2

• Improved when buffer is used.


Outer Main Memory
Buffer
r

spill Output
Scan Inner when full

Advanced database-Ch2: Query Processing & Optimization Slide 30


Example of Nested-Loop Join
Customer [Link]= [Link]
CheckedOut
• Parameters
rCheckedOut = rCustomer = 200
40,000
bCheckedOut = bCustomer = 10
2,000
nB = 6 (size of main memory buffer)
• Algorithm:
repeat: read (nB - 2) blocks from outer relation
repeat: read 1 block from inner relation
compare tuples
• Cost: bouter + ( ⎡bouter/ (nB -2)⎤ ) × binner
• CheckedOut as outer: 2,000 + ⎡2,000/4⎤ × 10 =
7,000
• Customer as outer: 10 + ⎡10/4⎤ × = 5,010
Advanced database-Ch2: Query Processing & Optimization
2,000 Slide 31
Index-based Join
• Requires (at least) one index on a join attribute
 A temporary index can be created

for each index on r


record of
s, query
r
Scan Outer in index
if joins
s
Output
spill
when full

Main Memory

Advanced database-Ch2: Query Processing & Optimization Slide 32


Example of Index-Based Join
Customer [Link]= [Link]
• Cost:CheckedOut
bouter + router × cost use of index
• Assume that the video store has 10 employees.
 There are 10 distinct EmpIDs in
CheckedOut.
• Assume 1-level index on CustomerID of Customer.
• Iterate through all 40,000 tuples in CheckedOut
(outer
rel.)„ 2,000 disk reads (bCheckedOut) to scan CheckedOut
 For each CheckedOut tuple, search for matching
Customer
tuples
 using
‹ 0 disk index.
reads for index (in main memory) + 1 disk read for actual data
block
• Cost: 2,000 + 40,000 × (0 + 1) = 42,000
Advanced database-Ch2: Query Processing & Optimization Slide 33
Sort-Merge Join
• Sort each relation using multiway merge-sort
• Perform merge-join
Scan

sorted
Main Memory
r1 r
•••
unsorted r2 sorted ••• Match
r ••• r •••
Scan •••
rn
sorted
s
Output

Advanced database-Ch2: Query Processing & Optimization Slide 34


External or Disk-based Sorting
• Relation on disk often too large to fit into memory
• Sort in pieces, called runs
Main Memory r1
Buffer (N blocks)
Scan Output r2
when sorted
r •••
rMdivN

Unsorted relation rlast


(M blocks)
Runs
(M div N runs each of
size N blocks, and maybe one
last run of < N leftover blocks)
Advanced database-Ch2: Query Processing & Optimization Slide 35
External or Disk-based Sorting, Cont.
• Runs are now repeatedly merged
• One memory buffer used to collect output
Main Memory Main Memory
m1
r1 Buffer (N blocks)
Buffer (N blocks)
Output m2
r2
when merged
••• •••
rN-1 mlast-1
•••
rMdivN output
mlast output

Merged runs
rlast ((M div N) div N-1) runs each of
sorted runs size N*N-1 blocks, and maybe one
(N blocks each) last run of < N*N-1 leftover blocks)
Advanced database-Ch2: Query Processing & Optimization Slide 36
External Sorting (Multiway Merge Sort)
• Buffer size is nB = 4 (N)
Orginal rel.
32 5 1712 1 14 8 3 313023 2629 2 2511 6 7 1516 28 4 9 221821 102013 272419

Initial runs
5 121732 1 3 8 14 23263031 2 11 2529 6 7 1516 4 9 2228 101820 21 13192427

First merge
1 3 5 8 12141723263031 32 2 4 6 7 9 11 151622252829 1013181920202427

Second merge
1 2 3 4 5 6 7 8 9 1011 121314151617181920212223242526272829303132

• Cost: 2 × brelation + 2 × brelation × ⎡lognB-1 (brelation/nB)⎤


• 2 × 32 + 2 × 32 × log3(32/4) = 192
Advanced database-Ch2: Query Processing & Optimization Slide 37
Example of Sort-Merge Join
• Cost to sort CheckedOut (bCheckedOut=
 CostSortChecedOut = 2 × 2,000 + 2 × 2,000 × ⎡log5(2,000/6)⎤ =
2000)
to sort Customer relation (bCustomer= 10)
20,000
• Cost
 CostSortCustomer= 2 × 10 + 2 × 10 × ⎡log5(10/6)⎤ =
40 for merge join
• Cost
 Cost to scan sorted Customer + cost to scan sorted
CheckedOut
 Costmergejoin= 10 + 2,000 = 2,010
• Costsort-merge join = CostSortCustomer + CostSortChecedOut + Costmergejoin
• Costsort-merge join= 20,000 + 40 + 2,010 = 22,050

Advanced database-Ch2: Query Processing & Optimization Slide 38


Cost and Applicability of Join Strategies
• Nested-loop join
 Brute-force
 Can handle all types of joins (=, <, >)
• Index-based join
 Requires minimum one index on join attributes
• Sort-merge join
 Requires that the files are sorted on the join attributes.
 Sorting can be done for the purpose of the join.
 A variation is also applicable when secondary indices are
available instead.

Advanced database-Ch2: Query Processing & Optimization Slide 39

You might also like