Stay organized with collections
Save and categorize content based on your preferences.
Manage partition and cluster recommendations
This document describes how the partition and cluster recommender works, how
to view your recommendations and insights, and how to apply partition and
cluster recommendations.
How the recommender works
The BigQuery partitioning and clustering recommender generates
partition or cluster
recommendations to optimize your BigQuery tables. The recommender
analyzes workflows on your BigQuery tables and offers
recommendations to better optimize your workflows and query costs using either
table partitioning or table clustering.
For more information about the Recommender service, see the
Recommender overview.
The partitioning and clustering recommender
uses your organization's workload execution data from up to 30 days in the past
to analyze each
BigQuery table for suboptimal partitioning and clustering
configurations. The recommender also uses machine learning to predict how much
the workload execution could be optimized with different partitioning or
clustering configurations. If the recommender finds that partitioning or
clustering a table yields significant savings, the recommender generates a
recommendation. The partitioning and clustering recommender
generates the following types of recommendations:
Existing table type
Recommendation subtype
Recommendation example
Non-partitioned, non-clustered
Partition
"Save about 64 slot hours per month by partitioning on column_C by DAY"
Non-partitioned, non-clustered
Cluster
"Save about 64 slot hours per month by clustering on column_C"
Partitioned, non-clustered
Cluster
"Save about 64 slot hours per month by clustering on column_C"
Each recommendation consists of three parts:
Guidance to either partition or cluster a specific table
The specific column in a table to partition or cluster
Estimated monthly savings for applying the recommendation
To calculate potential workload savings, the recommender assumes that the historical execution workload data from the past 30 days represents the future workload.
The recommender API also returns table workload information in the form of
insights.
Insights are findings that help you
understand your project's workload, providing more context on how a partition or
cluster recommendation might improve workload costs.
Limitations
The partitioning and clustering recommender does not support
BigQuery tables with legacy SQL. When generating a
recommendation, the recommender excludes any legacy SQL queries in its
analysis. Additionally, applying partition recommendations on
BigQuery tables with legacy SQL breaks any legacy SQL
workflows in that table.
BigQuery does not support changing the partitioning scheme
of a table in place. You can only change the partitioning of a table on a copy
of the table. For more information, see Apply partition recommendations.
The partitioning and clustering recommender runs daily. However, if the
run takes longer than 24 hours to complete, the following day's run is skipped.
Locations
The partitioning and clustering recommender is available in the following
processing locations:
This predefined role contains
the permissions required to access partition and cluster recommendations. To see the exact permissions that are
required, expand the Required permissions section:
Required permissions
The following permissions are required to access partition and cluster recommendations:
For more information about IAM roles and permissions in
BigQuery, see
Introduction to IAM.
View recommendations
This section describes how to view partition and cluster recommendations and
insights using the Google Cloud console, the Google Cloud CLI, or the Recommender API.
Select one of the following options:
Console
In the Google Cloud console, go to the BigQuery page.
The recommendations tab lists all recommendations available to your
project.
In the Optimize BigQuery workload cost panel, click View all.
The cost recommendation table lists all recommendations generated for
the current project. For example, the following screenshot shows that
the recommender analyzed the example_table table, and then recommended
clustering the example_column column to save an approximate amount of
bytes and slots.
To see more information about the table insight and recommendation,
click a recommendation.
To view partition or cluster recommendations for a specific project,
use the REST API. With each command, you must provide an authentication
token, which you can get using the gcloud CLI. For more
information about getting an authentication token, see
Methods for getting an ID token.
You can use the curl list request to view all recommendations for a
specific project:
You can also view your recommendations and insights using INFORMATION_SCHEMA
views. For example, you can use the INFORMATION_SCHEMA.RECOMMENDATIONS view to
view your top three recommendations based on slots savings, as seen in the
following example:
You can apply cluster recommendations directly to an existing BigQuery
table. This method is quicker than applying recommendations to a copied table,
but it does not preserve a backup table.
Follow these steps to apply a new clustering specification to unpartitioned or
partitioned tables.
In the bq tool, update the clustering specification of your
table to match the new clustering:
When you apply cluster recommendations to a BigQuery table, you
can first copy the original table and then apply the recommendation to the
copied table. This method ensures that your original data is preserved if you
need to roll back the change to the clustering configuration.
You can use this method to apply cluster recommendations to both
unpartitioned and partitioned tables.
In the Google Cloud console, go to the BigQuery page.
In the query editor, create an empty table with the same metadata (including
the clustering specifications) of the original table by using the LIKE operator:
In the query editor, retrieve the table schema with the partitioning and
clustering configuration of the original table, if any partitioning or
clustering exists. You can retrieve the schema by viewing the
INFORMATION_SCHEMA.TABLES view of the original table:
The output is the full data definition language (DDL) statement of ORIGINAL_TABLE,
including the PARTITION BY clause. For more information about the arguments
in your DDL output, see CREATE TABLE statement.
The DDL output indicates the type of partitioning in the original table:
Partitioning type
Output example
Not partitioned
The PARTITION BY clause is absent.
Partitioned by table column
PARTITION BY c0
PARTITION BY DATE(c0)
PARTITION BY DATETIME_TRUNC(c0, MONTH)
Partitioned by ingestion time
PARTITION BY _PARTITIONDATE
PARTITION BY DATETIME_TRUNC(_PARTITIONTIME, MONTH)
Ingest data into the copied table. The process that you use is based on
the partition type.
If the original table is non-partitioned or partitioned by a table column,
ingest the data from the original table to the copied table:
Replace COLUMN_NAMES with the list of columns
that was the output in the preceding step, separated by commas—for example, col1, col2, col3.
You now have a clustered copied table with the same data as the original table.
In the next steps, you replace your original table with a newly clustered table.
Your original table is now clustered according to the cluster recommendation.
We recommend that you review the clustered table to ensure that all table functions
work as intended. Many table functions are likely tied to the table ID and not
the table name, so it is best to review the following table functions before
proceeding:
The status of any ongoing table processes, such as any materialized views or any jobs that ran when you copied the table.
The ability to access historical table data using time travel.
Any metadata associated with the original table—for example,
table_option_list or column_option_list.
For more information, see Data definition language statements.
If any issues arise, you must manually migrate the affected artifacts to the new table.
After reviewing the clustered table, you can optionally delete the backup
table with the following command:
DROPTABLEDATASET.BACKUP_TABLE
Apply clusters in a materialized view
You can create a materialized view of the table to store data from the original table with the recommendation applied. Using materialized views to apply recommendations ensures that the clustered data is kept up to date using automatic refreshes.
There are pricing considerations when you query, maintain, and store materialized views.
To learn how to create a clustered materialized view, see Clustered materialized views.
Apply partition recommendations
To apply partition recommendations, you must apply it to a copy of the original
table. BigQuery does not support the changing of a partitioning
scheme of a table in place, such as changing an unpartitioned table to a
partitioned table, changing the partitioning scheme of a table, or creating a
materialized view with a different partitioning scheme from the base table. You
can only change the partitioning of a table on a copy of the table.
Apply partition recommendations to a copied table
When you apply partition recommendations to a BigQuery table, you
must first copy the original table and then apply the recommendation to the
copied table. This approach ensures that your original data is preserved if you
need to roll back a partition.
The following procedure uses an example recommendation to partition a table by
the partition time unit DAY.
Create a copied table using the partition recommendations:
Your original table is now partitioned according to the partition
recommendation.
We recommend that you review the partitioned table to ensure that all table functions
work as intended. Many table functions are likely tied to the table ID and not
the table name, so it is best to review the following table functions before
proceeding:
The status of any ongoing table processes, such as any materialized views or any jobs that ran when you copied the table.
The ability to access historical table data using time travel.
Any metadata associated with the original table—for example,
table_option_list or column_option_list.
For more information, see Data definition language statements.
If any issues arise, you must manually migrate the affected artifacts to the new table.
After reviewing the partitioned table, you can optionally delete the backup
table with the following command:
DROPTABLEDATASET.BACKUP_TABLE
Overestimation of savings
In some cases, the BigQuery clustering recommender might provide
estimated savings that appear disproportionately large—for example,
exceeding your total monthly billed bytes for that specific table. This is
typically caused by a pattern known as subquery summation.
What triggers this overestimation
The overestimation is most common for workloads involving complex
GoogleSQL queries that reference the same table multiple times.
Examples include the following:
Queries with many self-joins on a single large table.
Queries with multiple common table expressions or subqueries that all
scan the same underlying table.
Why overestimation happens
BigQuery execution plans often break complex queries into
multiple distinct stages. The clustering recommender calculates and
sums the potential savings for every individual stage independently.
If a single job consists of many stages that each read from the same table, the
recommender might count the potential savings for each stage in that single
job, rather than de-duplicating the savings at the job level. This can lead to
overestimated recommendations for tables with complex query patterns.
Verify if your workload is affected
If you have a specific clustering recommendation that you want to verify, you
can run the following query in the Google Cloud console to identify jobs
that might be triggering this overestimation.
This query searches your job history for instances where a single job scans the
same table across more than 10 distinct execution stages.
SELECTjob_id,project_id,user_email,table_name,scan_count,total_billed_gb,creation_timeFROM(SELECTjob_id,project_id,user_email,creation_time,total_bytes_billed/(1024*1024*1024)astotal_billed_gb,-- Extract the table name from the 'READ' substepsREGEXP_EXTRACT(substep,r'FROM ([^ ]+)')astable_name,COUNT(DISTINCTstage.id)asscan_countFROM`region-REGION_NAME`.INFORMATION_SCHEMA.JOBS,UNNEST(job_stages)asstage,UNNEST(stage.steps)asstep,UNNEST(step.substeps)assubstepWHEREcreation_time>TIMESTAMP_SUB(CURRENT_TIMESTAMP(),INTERVAL14DAY)ANDstep.kind='READ'ANDsubstepLIKE'FROM %'-- Exclude internal intermediate stagesANDNOTREGEXP_CONTAINS(substep,r'FROM __stage')GROUPBY1,2,3,4,5,6)WHEREscan_count>10-- Adjust this threshold to find more complex query patternsORDERBYscan_countDESCLIMIT100;
Replace REGION_NAME with the region that your project is in.
If you find jobs with a high scan_count (for example, greater than 20) for the
table in your recommendation, it is likely that the estimated savings for that
table are inflated. While clustering might still provide a performance benefit,
the actual savings won't reach the level suggested by the recommendation.
Pricing
When you apply a recommendation to a table, you can incur the following
costs:
Processing costs. When you apply a recommendation, you execute a
data definition language (DDL) or data manipulation language
(DML) query to your BigQuery project.
Storage costs. If you use the method of copying a table, you use
extra storage for the copied (or backup) table.
Standard processing and storage charges apply depending on the billing account
that's associated with the project. For more information, see
BigQuery pricing.
Troubleshooting
Issue: No recommendations appear for a specific table.
Partition recommendations might not appear for tables that meet these criteria:
The table is less than 100GB.
The table is already partitioned or clustered.
Cluster recommendations might not appear for tables that meet these criteria:
The table is less than 10GB.
The table is already clustered.
Both partition and cluster recommendations might be suppressed when:
The estimated monthly savings is too insignificant (less than 1 slot hour of savings).
Issue: The estimated savings appear disproportionately large.
Estimated monthly savings might be overestimated if your workload involves
complex queries that reference the same table multiple times across many
distinct execution stages. For more information, see
Overestimation of savings.
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2026-06-09 UTC."],[],[]]