QUERY PROCESSING IN A DATABASE
Query Processing: This is a series of steps that a database management system follows to execute a
query.
The main reason for going through these steps is to transform a high level query written in SQL
into an efficient execution plan that retrieves the required data from the database.
Here’s a breakdown of the key phases:
1. Query parsing and Translation
This is done to ensure the query is syntactically and semantically correct.
Steps:
Syntax Analysis: The query is parsed to check for grammatical correctness
Semantic Analysis: Ensures the query is meaningful and valid
Translation: Translates the SQL query into an internal, intermediate representation such as a
query tree or a query graph
Output: A parsed query that is ready for optimization
Here’s a practical example of how query parsing works (for syntax and semantic analysis):
a) Understanding the query’s syntax. (Checks to see if the syntax is correct)
The parser identifies the structure:
● SELECT clause (columns to retrieve: name, age)
● FROM clause (table: users)
● WHERE clause (filter: age > 30)
Extracting Components:
● Action: Retrieve data.
● Target: Table users.
● Condition: Filter rows where age > 30.
Semantic Analysis: After parsing the data, the system ensures the query makes
sense in terms of database schema. For instance:
● Are name and age valid columns in the users table?
● Does the users table exist?
The system can also perform the shared pool check. If the shared pool check is successful, the
system skips query parsing and optimization checks
QUERY PROCESSING IN A DATABASE
Shared pool: The shared pool is an area in the memory of the database where the database
stores information that is frequently accessed such as:
● Recently executed SQL queries
● Execution plans
● Metadata (Data about tables and columns)
When you execute a query, the database does the following:
● Checks the shared pool to see if an identical query has been run before.
● If it has, the system will skip parsing and optimization steps and reuses the existing
execution plan.
● If it hasn’t, the database will parse, optimize and execute it.
2. Query Optimization
Improve query performance by selecting the most efficient execution plan.
Types of Query Optimization
Logical Optimization:
Reorganizes the query without changing its result.
Example: Pushing down selection predicates (WHERE clause) to filter rows earlier.
Physical Optimization:
Determines the best physical methods to execute the query (e.g., choosing between an index
scan or a full table scan).
Based on factors like:
Index availability.
Statistics on table size, data distribution, etc.
System resources like memory or disk I/O.
3. Query Plan Generation
Enumeration of Plans:
● The database optimizer generates candidate execution plans.
● These plans define the sequence of operations (e.g., table scans, index scans, joins) that
can be used to execute the query.
● For instance, if a query involves joining two tables, the optimizer considers different join
methods (e.g., nested loops, hash joins, merge joins) and their possible orders.
QUERY PROCESSING IN A DATABASE
Cost Estimation:
● Each candidate plan is evaluated for cost, which includes factors such as:
o Execution time: How long the query will take to complete.
o Disk I/O: The amount of data read/written to the disk.
o Memory usage: How much RAM is required during execution.
● The optimizer relies on statistics about the data (e.g., table sizes, index usage, data
distribution) to make these estimates.
● Example: A full table scan may have a higher cost than an index scan if the table is large.
Selection of Optimal Plan:
● After calculating costs, the optimizer chooses the plan with the lowest estimated cost.
● This ensures that the database executes the query in the most efficient way, saving time
and system resources.
● Note: The selected plan may not always be perfect because it depends on the accuracy
of the data statistics.
4. Query Execution
Purpose:
The goal of this step is to execute the selected query plan efficiently and accurately to produce
the desired result set.
Execution by the Query Execution Engine:
● The query execution engine is the component responsible for carrying out the
instructions in the chosen query plan.
● It acts as a bridge between the logical query plan and the physical database, ensuring
the operations are performed in the correct order.
Performing Operations:
● The engine executes operations specified in the query plan, such as:
o Joins: Combining rows from two or more tables based on specified
conditions.
o Filters (Selection): Retrieving only the rows that satisfy a given condition.
o Projections: Selecting specific columns from the result set.
o Aggregations: Performing calculations like sums, averages, or counts on
groups of data.
Data Retrieval:
● The execution engine retrieves data as needed from:
QUERY PROCESSING IN A DATABASE
● Disk storage: If the required data is not in memory, it is fetched from disk using
methods like table scans or index scans.
● In-memory cache: If the data has already been cached, retrieval is faster. This
reduces disk I/O and speeds up query execution.
Above is a detailed diagram of Query Processing in a Database