0% found this document useful (0 votes)

14 views15 pages

SQL Performance Optimization Tips

The document outlines various SQL performance optimization techniques, including effective use of indexes, avoiding SELECT *, optimizing JOIN operations, and minimizing subqueries. It emphasizes the importance of retrieving only necessary data, utilizing stored procedures, and normalizing database tables for better query performance. Additionally, it recommends monitoring query performance, leveraging cloud-specific features, and using efficient methods like EXISTS() over COUNT() for checking record existence.

Uploaded by

mrs.siam2021

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

14 views15 pages

SQL Performance Optimization Tips

Uploaded by

mrs.siam2021

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

SQL Performance Optimization Techniques

1. Use indexes effectively in relational databases like MySQL and Postgres

I like to think of indexes as primary keys and mapping tables in SQL. In wide data
tables, you often see identification codes or integers that map to another data table.
This is an effective way of storing data because it allows you to easily query the wide
table and join values back to another table—essentially, this provides more details
on a data row. You also see indexes in the form of primary keys that allow you to
select a unique row.

These two indexing methods help you locate data in tables more quickly. Because
indexes store data in one or more columns, you can easily find one value or even a
range of values. For example, if using a WHERE clause in a query, an index prevents
you from having to scan the entire table. Instead, you can simply look for a match
condition. This ends up saving you a lot of time if you’re performing these types of
queries often.

Keep in mind that cloud data warehouses like Redshift and Snowflake are columnar
and don’t have indexes like relational databases. They automatically partition data
based on the distribution of data during the load time. Here, I recommend loading
the data in a sorted order that you query often. You can also override the partition,
causing the database to re-cluster and distribute the data accordingly.

There are three main types of indexes:

Clustered indexes

Clustered indexes physically order the columns based on their actual value. You
only want to use clustered indexes when your column values are in sequential or
sorted order and there are no repeat values. This is because the index is ordering
them based on the actual value within the column itself. Once this index is created,
it will then point to the row that contains the data—not the data itself. Primary keys
are a form of clustered indexes.
Source: [Link]

Non-clustered indexes

Non-clustered indexes create two separate columns—one for the index and the other
that points to the value. This type of index is typically used for mapping tables or
even any type of glossary. You have certain column values that point to a specific
location. Unlike clustered indexes, the index points directly to the data.

If you’re choosing between these two indexes, clustered indexes are the way to go.
They are faster and require less memory to run since they don’t exist in two separate
locations. This practice optimizes your cloud data warehouse performance.

Full-text indexes

There are also full-text indexes, which are rarer, but allow you to search through
columns with a lot of text, like those that hold article or email content. This type of
index stores the position of the terms found in the indexed field, making it much
easier to find.
2. Avoid SELECT * and retrieve only necessary columns

Believe it or not, when exploring different datasets in your data warehouse, you
don’t want to use a SELECT *. These types of queries are actually quite inefficient
because you are choosing to view all fields in a dataset rather than just the ones you
are interested in.

When writing your queries within a data model, be sure to leave out columns that
will never be used by data analysts or business users. If you are writing a query for
reporting purposes, only include the columns the business users want to look at.
When working to prevent confusion and optimize run-time, less is always better!

Selecting only the specific fields you want or need to view will keep your models
and reports clean and easy to navigate. Here’s an example of what that could look
like:
× SELECT * FROM customers.customer_profiles →

√ SELECT customer_name, customer_email, customer_phone FROM

customers.customer_profiles

3. Optimize JOIN operations

Joins can make or break complex queries. It’s imperative that you know the
difference between an inner join, outer join, left join, and right join. Using the wrong
join can create duplicates in your dataset and slow it down massively.

Outer join

I recommend only using an outer join if you have a very specific use case that can’t
be solved otherwise. Outer joins returned matched and unmatched rows
from both of the tables that you are joining. It essentially returns everything from
both datasets in one dataset, which in my opinion basically defeats the purpose of a
join. Outer joins produce a lot of duplicates and return pieces of data you probably
don’t need, making it inefficient.

Inner join

Inner joins return only the matching records from the two tables that you are joining.
This is almost always preferred over an outer join.

Left and right joins

Left and right joins return all records from one table and only matching records from
the table being joined. For left joins, the resulting query would contain all values in
the first table and only the matching tables in the second. For right joins, it is the
opposite—the resulting query would contain all values from the second table and
only the matching records from the first table.
I recommend always choosing a left join over a right. In order to make this work,
simply change the order you are joining your tables. Left joins are a lot easier to
read and understand as compared to right joins—making this type of join better
for data governance, reliability, and data quality.

Lastly, with joins, make sure you are joining the two tables on a common field. If
you are selecting a field that doesn’t exist to join the tables together, you may get
an extremely long-running query that will end up wasting your time and money. I
recommend verifying that joins are utlilizing primary and foreign keys that exist
between two tables.

Here’s an example of a left join:

Also, don’t be afraid to join on more than one column if need be. Sometimes this
can help reduce resulting duplicates and speed up run-time in the case of multiple
records with the same joining field.

Here’s an example with an inner join:

4. Minimize the use of subqueries

Subqueries are very hard for anyone to read and understand. Instead of using
subqueries, especially in complex models or reporting, opt for CTEs instead. CTE
stands for common table expression and separates your code into a few smaller
queries rather than one big query.

CTEs make it easy for anyone reading through your code to understand it. As an
added bonus, theyalso simplify the debugging process. Rather than having to pull
each subquery out into its own query and debug at each stage, you can simply select
from each CTE and validate as you go. Here’s an example:

Before Optimization:

After Optimization:

As you can see, the CTE is a little bit longer, but it’s much easier to understand.
Now, any reviewer can analyze each smaller piece of the query and easily relate each
component back to one another.
5. Avoid redundant or unnecessary data retrieval

When exploring datasets, developing reports or models, and validating data, it is

important to only retrieve the data that you need. This way you aren’t spending
money or utilizing compute resources on data that you don’t need. As we mentioned
before, it’s important to only select the necessary columns rather than SELECT *.
However, it is also important to limit the number of rows you are returning, not just
columns. With relational databases like MySQL and Postgres, they slow down when
the number of rows increase.

You can use LIMIT to reduce the number of rows returned. You will typically see
SQL editors like dbeaver set a feature to limit the data return to 100 or 200. This
built-in feature prevents you from unknowingly returning thousands of rows of data
when you just want to look at a few.

These functions are particularly useful for validation queries or looking at the
resulting output of a transformation you’ve been working on. They are good for
experimentation and learning more about how your code operates. However, these
types of functions are not good to use in automated data models where you will want
to return all of the data.

To use LIMIT:

This will only return 100 rows, even if you have more than 100 customers.

You can also add an OFFSET clause to your LIMIT functions if you don’t want to
return the first 100 rows, but want to skip some first. If you wanted to skip the first
20 rows and select the 100 customers after that, you would write:
While these queries help to limit data, cloud data platforms also help to reduce
impact of redundant queries by leveraging caches. You can also take advantage of
temporary tables in cloud platforms to store repeat queries—just remember to delete
them when you are finished using them!

6. Utilize stored procedures

Stored procedures are database objects that contain lines of SQL code. They can be
reused and automated in order to speed up development time and simplify logic.
You can think of them as “functions”, like those that exist in Python and Javascript
that can be saved and applied in your code environment.

Stored procedures improve performance in your cloud database because they

compile and cache code, allowing increased speed with frequently used queries.
They also simplify a lot of processes for developers by existing as a reusable piece
of code. Developers don’t have to worry about writing the same piece of code over
and over again. Instead, they can utilize SQL functions that already exist in the form
of a stored procedure.

You can create a stored procedure like so:

Later you can execute the stored procedure like below:

You can also pass parameters into stored procedures by specifying the column name
and datatype.

Simply include the column_name that is going to be the parameter using an @ sign
and the data type you want it to be passed through. Then, to execute it, you again
specify the parameter and its value.

This allows you to really customize your stored procedure for your specific use case
while still reusing code that’s already been written and automated.

7. Consider partitioning and sharding for MySQL and Postgres

Partitioning and sharding are two techniques you can use to spread the distribution of data
in the cloud.
With partitioning, you divide one large table into multiple smaller tables, each with its own
partition key. Partition keys are typically based on the timestamps of when rows were
created or even on the integer values they contain. When you execute a query on this table,
the server will automatically route you to the partitioned table appropriate for your query.
This improves performance because, rather than searching the entire table, it is only
searching a small part of it.
Sharding is quite similar except, instead of splitting one big table into smaller tables, it’s
splitting one big database into smaller databases. Each of these databases is on a different
server. Instead of a partition key, there is a sharding key that redirects queries to be run on
the appropriate database. Sharding is known to increase processing speeds because the load
is split across different servers—both working at the same time. It also makes databases
more available and reliable due to the fact that they are completely independent of one
another. If one database goes down, it doesn’t affect the other ones.
Keep in mind that modern cloud data platforms do this automatically when you define the
partition key and distribution type on load. AWS also offers a relational database product
called Aurora which automates partitioning and sharding.

8. Normalize database tables

The premise of normalization is to make sure the values in your database tables are
easy to locate and query. Normalization at the layer closest to your raw data is
important so that you can easily query the values downstream. We cover this idea
more in our article on data modeling.

First Normal Form (1NF)

I often run into issues with JSON objects in my raw data. Parsing these JSON objects
is a type of normalization that ensures no nested objects exist in your data. This is
actually called First Normal Form (1NF). With this form of normalization, values
must exist as atomic values (values that can’t be broken down into smaller values)
and each row must have a primary key.

Second Normal Form (2NF)

Second Normal Form (2NF) is a different type of normalization that requires fields
with multiple values to be broken into their own rows. This allows you to easily
access each value stored in a field because it now exists dependently on the primary
key but in a different row.

Third Normal Form (3NF)

Third Normal Form (3NF) is another type of normalization. Second normal form is
actually a prerequisite to this type of normalization. So, if using this form, make sure
you follow the steps for 2NF first. Then, you want to look at your table and see if
any column values are dependent on one another. For example, if you have a
customers table with the customer name, phone number, state, and zip code, zip code
is dependent on state. You can further break these values down into another table.
Zipcode and state would exist in their own table while customer name, phone
number, and state exist in another.

If you remember the transitive property from high school geometry class, this is
essentially what that does. It says “Hey, if this column depends on that column for
its value, then that value can be moved to a separate table”.

While there is also fourth normal form and fifth normal form, these normalization
techniques are less popular and not needed for the scope of this article.

9. Monitor query performance

Monitoring SQL query performance is key when trying to optimize your SQL
queries. If you don’t ever look at the run-time of your queries, you will never know
which ones are taking the longest! This is key to determining which ones need to be
optimized and will have the most cost savings for you in your cloud database.

One tool to optimize performance is query profiling. This allows you to pinpoint the
source of performance issues by looking at statistics such as runtime and rows
returned. Query profiling also includes query execution plans which give you insight
into what code is running in what order before it runs. To optimize query
performance you can also look at database logs, the server itself, and any external
applications connected to your cloud database.

10. Use UNION ALL instead of UNION

UNION is an operator used to join the outputs of two SQL queries. It comes in handy
when you need to combine two datasets that have the same column names. However,
it is important that you understand the difference between the two UNION operators-
UNION and UNION ALL.
UNION joins all of the rows from Table A with all of the rows from Table B. No
deduplication occurs. However, UNION ALL joins all of the rows from Table A
with all of the rows from Table B and then deduplicates rows that contain the same
values. If you don’t care about duplicates, UNION is going to save you a lot of
processing time compared to UNION ALL. I typically always opt for UNION
because, even if there are duplicates, I would want to know about them and take the
time to understand why that is happening.

11. Optimize subquery performance

While I don’t recommend using subqueries when trying to optimize performance,

sometimes they are fast and handy when doing a quick and dirty analysis. If you
need to do something like check whether values exist in another table or sql
subquery, it’s best to use an EXISTS statement over an IN statement.

EXISTS returns a Boolean value, quickly comparing values and moving on to the
next when a value is not present. IN compares every value since it returns the value
itself, slowing down the processing time of the query. However, IN is more efficient
to use than something like an OR statement which scans a table for multiple different
conditions.

Instead of this…

Use this….
In this case, it is much more efficient to use the IN clause rather than the OR.
However, in the following example, it makes more sense to use an EXISTS rather
than an OR because two different tables are being compared.

Instead of this…

Do this

This will return all of the rows that prove true rather than scanning and comparing
every value like with an IN clause.

12. Leverage cloud database-specific features

One of the many benefits of using a cloud database is the database-specific features
that come with it. Snowflake, for example, has a ton of SQL functions specific to
Snowflake that make creating transformations easier. These include functions for
parsing JSON values and working with different data type. Check with your cloud
provider to see if they have specific optimizations they recommend.
13. Avoid DISTINCT in SELECT query

SELECT DISTINCT is a simple way of removing duplicates from a database.

SELECT DISTINCT works to generate distinct outcomes by using the GROUP
BY clause, which groups all the fields in the query. However, a large amount of
processing power is required to do this. So, avoid DISTINCT in SELECT queries.

Inefficient

Efficient

Unduplicated records are returned without using SELECT DISTINCT by adding

more fields.

14. To check the existence of records, use EXISTS() rather than COUNT()

Both EXISTS() and COUNT() methods can be used to check the existence of a
record entry in the table. The EXISTS() method is more effective as it exits
processing as soon as it finds the first entry of the record in the table. The COUNT()
method would scan the entire table to return the number of records in the table that
match the provided constraint.

Inefficient
Efficient

15. Use WHERE instead of HAVING

The HAVING clause filters the rows after all the rows are selected. It works just
like a filter. Do not apply the HAVING clause for any other purpose.

HAVING statements are determined in the SQL operating order

after WHERE statements. Therefore, it is quicker to execute the WHERE query.

Inefficient

Efficient

Prepared By-
Md. Al-Hasan, Assistant Professor, CSE, BAUST
Credit: Online Platform ([Link] and [Link]

SQL Query Optimization Tips for Cloud Databases
No ratings yet
SQL Query Optimization Tips for Cloud Databases
9 pages
SQL Query Optimization
No ratings yet
SQL Query Optimization
16 pages
SQL Query Optimization: 6 Essential Tips
No ratings yet
SQL Query Optimization: 6 Essential Tips
4 pages
Effective SQL Optimization Techniques
No ratings yet
Effective SQL Optimization Techniques
5 pages
Optimize SQL Queries for Better Performance
No ratings yet
Optimize SQL Queries for Better Performance
15 pages
SQL Performance Tuning Techniques
No ratings yet
SQL Performance Tuning Techniques
6 pages
5 SQL Query Optimization Tips
No ratings yet
5 SQL Query Optimization Tips
4 pages
Database Indexing
No ratings yet
Database Indexing
6 pages
SQL Database Design
No ratings yet
SQL Database Design
10 pages
Understanding SQL: Basics and Joins
No ratings yet
Understanding SQL: Basics and Joins
13 pages
Database Coding Best Practices
No ratings yet
Database Coding Best Practices
8 pages
SQL Query Optimization Techniques
No ratings yet
SQL Query Optimization Techniques
9 pages
Essential SQL Optimization Techniques
No ratings yet
Essential SQL Optimization Techniques
5 pages
Advanced Database Design Techniques
No ratings yet
Advanced Database Design Techniques
8 pages
Tuning MySQL Stored Procedures for Speed
No ratings yet
Tuning MySQL Stored Procedures for Speed
8 pages
SQL Database Tuning Techniques Guide
No ratings yet
SQL Database Tuning Techniques Guide
17 pages
SQL Query Optimization Techniques
No ratings yet
SQL Query Optimization Techniques
17 pages
SQL Server Query Optimization Tips
No ratings yet
SQL Server Query Optimization Tips
9 pages
Database Performance Optimization Guide
100% (1)
Database Performance Optimization Guide
26 pages
Teradata SQL Performance Tuning Guide
No ratings yet
Teradata SQL Performance Tuning Guide
3 pages
SQL Query Optimization Techniques
No ratings yet
SQL Query Optimization Techniques
15 pages
SQL Query Performance Optimization Tips
No ratings yet
SQL Query Performance Optimization Tips
94 pages
Effective SQL Tuning Techniques
100% (1)
Effective SQL Tuning Techniques
26 pages
Op Tim Ization
No ratings yet
Op Tim Ization
3 pages
PostgreSQL Index Types and Optimization
No ratings yet
PostgreSQL Index Types and Optimization
23 pages
Database Indexing & Query Optimization
No ratings yet
Database Indexing & Query Optimization
4 pages
13 - Performance Best Practices
No ratings yet
13 - Performance Best Practices
5 pages
13 - Performance Best Practices
No ratings yet
13 - Performance Best Practices
5 pages
Execution Plan, Query Optimisation, Indexing
No ratings yet
Execution Plan, Query Optimisation, Indexing
3 pages
Optimize SQL Queries: Advanced Techniques
No ratings yet
Optimize SQL Queries: Advanced Techniques
11 pages
Mastering SQL: Database Essentials Guide
No ratings yet
Mastering SQL: Database Essentials Guide
7 pages
SQL Query Optimization Techniques
No ratings yet
SQL Query Optimization Techniques
13 pages
SQL Data Warehousing and Query Optimization
No ratings yet
SQL Data Warehousing and Query Optimization
22 pages
SQL Query Optimization Techniques
No ratings yet
SQL Query Optimization Techniques
60 pages
SQL 2
No ratings yet
SQL 2
13 pages
Database Fundamentals and Design Guide With Join Diagrams
No ratings yet
Database Fundamentals and Design Guide With Join Diagrams
8 pages
SQL Server Coding Standards Guide
No ratings yet
SQL Server Coding Standards Guide
5 pages
Advanced Database Concepts Explained
No ratings yet
Advanced Database Concepts Explained
6 pages
SQL Server Programming Essentials
No ratings yet
SQL Server Programming Essentials
36 pages
Advanced SQL Techniques for Efficiency
No ratings yet
Advanced SQL Techniques for Efficiency
12 pages
IderaWP 7IndexingTipsToImproveSQLServerPerformance PDF
No ratings yet
IderaWP 7IndexingTipsToImproveSQLServerPerformance PDF
8 pages
Advanced SQL Techniques for AS400
100% (1)
Advanced SQL Techniques for AS400
44 pages
SQL Query Optimization Techniques
No ratings yet
SQL Query Optimization Techniques
1 page
SQL Server Query Optimization Tips
No ratings yet
SQL Server Query Optimization Tips
4 pages
SQL Server Query Optimization Tips
No ratings yet
SQL Server Query Optimization Tips
4 pages
MySQL Indexing and Query Best Practices
No ratings yet
MySQL Indexing and Query Best Practices
3 pages
Important Question
No ratings yet
Important Question
45 pages
Database Performance Optimization Techniques
No ratings yet
Database Performance Optimization Techniques
15 pages
Understanding Joins, Indexes, and Views in Databases
No ratings yet
Understanding Joins, Indexes, and Views in Databases
5 pages
Couchbase Server 4.5 Features Overview
No ratings yet
Couchbase Server 4.5 Features Overview
7 pages
SQL Query Performance Optimization Guide
No ratings yet
SQL Query Performance Optimization Guide
6 pages
Advanced DDL for Database Management
No ratings yet
Advanced DDL for Database Management
8 pages
SQL Query Design Patterns Explained
No ratings yet
SQL Query Design Patterns Explained
80 pages
SQL Performance Tuning & Security Guide
No ratings yet
SQL Performance Tuning & Security Guide
11 pages
SQL Optimization Tips for Snowflake Queries
No ratings yet
SQL Optimization Tips for Snowflake Queries
13 pages
SQL Performance Concise
No ratings yet
SQL Performance Concise
8 pages
Introduction to SQL Basics and Usage
No ratings yet
Introduction to SQL Basics and Usage
57 pages
Understanding Apache Pig Architecture
No ratings yet
Understanding Apache Pig Architecture
123 pages
HCL Placement Test Sample Questions
No ratings yet
HCL Placement Test Sample Questions
83 pages
SQL Table Creation and Constraints Guide
No ratings yet
SQL Table Creation and Constraints Guide
117 pages
The Language of SQL 2rd
No ratings yet
The Language of SQL 2rd
39 pages
MySQL Workshop: Introduction to SQL
No ratings yet
MySQL Workshop: Introduction to SQL
197 pages
Overview of SQL 99 Standard
No ratings yet
Overview of SQL 99 Standard
199 pages
Keyrus - 2021 Alteryx Designer Onboarding
No ratings yet
Keyrus - 2021 Alteryx Designer Onboarding
41 pages
Using Wildcards with SQL LIKE Clause
No ratings yet
Using Wildcards with SQL LIKE Clause
27 pages
Salesforce Marketing Cloud Consultant Guide
No ratings yet
Salesforce Marketing Cloud Consultant Guide
28 pages
Ab Initio Join Component Explained
63% (8)
Ab Initio Join Component Explained
37 pages
SQL Joins Lab: Queries and Exercises
No ratings yet
SQL Joins Lab: Queries and Exercises
3 pages
Oracle Database 11g SQL Fundamentals Guide
No ratings yet
Oracle Database 11g SQL Fundamentals Guide
4 pages
Understanding DBMS and Data Models
No ratings yet
Understanding DBMS and Data Models
13 pages
SQL Server Join Types Explained
No ratings yet
SQL Server Join Types Explained
8 pages
Medium-Hard SQL Questions for Analysts
No ratings yet
Medium-Hard SQL Questions for Analysts
17 pages
ASP.NET and C# Interview Questions Guide
No ratings yet
ASP.NET and C# Interview Questions Guide
6 pages
DBMS Lab Manual
No ratings yet
DBMS Lab Manual
40 pages
DBMS Midterm Exam Questions
No ratings yet
DBMS Midterm Exam Questions
3 pages
Essential Database Interview Questions
No ratings yet
Essential Database Interview Questions
6 pages
Database Management Course Overview
No ratings yet
Database Management Course Overview
246 pages
BCA ASP.NET Viva Questions & Notes
No ratings yet
BCA ASP.NET Viva Questions & Notes
14 pages
MariaDB Tutorial: Syntax and Commands
No ratings yet
MariaDB Tutorial: Syntax and Commands
39 pages
Database Management Systems Explained
100% (1)
Database Management Systems Explained
43 pages
Dada Model Question Bank
No ratings yet
Dada Model Question Bank
50 pages
Database Management System Overview
No ratings yet
Database Management System Overview
27 pages
SQL and PL/SQL Comprehensive Training
No ratings yet
SQL and PL/SQL Comprehensive Training
8 pages
DBMS MCQs
No ratings yet
DBMS MCQs
8 pages
M.E. Computer Science Curriculum 2021
No ratings yet
M.E. Computer Science Curriculum 2021
129 pages
SQL Basics: Syntax and Functions Guide
No ratings yet
SQL Basics: Syntax and Functions Guide
39 pages
C# .NET Interview Insights for Experts
No ratings yet
C# .NET Interview Insights for Experts
27 pages

SQL Performance Optimization Tips

Uploaded by

SQL Performance Optimization Tips

Uploaded by

SQL Performance Optimization Techniques

1. Use indexes effectively in relational databases like MySQL and Postgres

There are three main types of indexes:

√ SELECT customer_name, customer_email, customer_phone FROM

3. Optimize JOIN operations

Left and right joins

Here’s an example of a left join:

Here’s an example with an inner join:

When exploring datasets, developing reports or models, and validating data, it is

6. Utilize stored procedures

Stored procedures improve performance in your cloud database because they

You can create a stored procedure like so:

Later you can execute the stored procedure like below:

7. Consider partitioning and sharding for MySQL and Postgres

8. Normalize database tables

First Normal Form (1NF)

Second Normal Form (2NF)

Third Normal Form (3NF)

9. Monitor query performance

10. Use UNION ALL instead of UNION

11. Optimize subquery performance

While I don’t recommend using subqueries when trying to optimize performance,

12. Leverage cloud database-specific features

SELECT DISTINCT is a simple way of removing duplicates from a database.

Unduplicated records are returned without using SELECT DISTINCT by adding

15. Use WHERE instead of HAVING

HAVING statements are determined in the SQL operating order

You might also like