Scaling based on predictions
You can configure autoscaling for a
managed instance group (MIG)
to automatically add or remove virtual machine (VM) instances based on increases
or decreases in load. However, if your application takes a few minutes or more
to initialize, adding instances in response to real-time changes might not
increase your application's capacity quickly enough. For example, if there's a
large increase in load (like when users first wake up in the morning), some
users might experience delays while your application is initializing on new
instances.
You can use predictive autoscaling to improve response times for applications
that take a long time to initialize and whose workloads vary predictably with
daily or weekly cycles.
When you enable predictive autoscaling, Compute Engine forecasts future
load based on your MIG's history and scales out the MIG in advance of predicted
load, so that new instances are ready to serve when the load arrives. Without
predictive autoscaling, an autoscaler can only scale a group reactively, based
on observed changes in load in real time. With predictive autoscaling enabled,
the autoscaler works with real-time data as well as with historical data to
cover both the current and forecasted load. For more information, see
"How predictive autoscaling works" and "Checking if predictive autoscaling is
suitable for your workload".
If you haven't already, set up authentication.
Authentication verifies your identity for access to Google Cloud services and APIs. To run
code or samples from a local development environment, you can authenticate to
Compute Engine by selecting one of the following options:
Console
When you use the Google Cloud console to access Google Cloud services and
APIs, you don't need to set up authentication.
gcloud
Install the Google Cloud CLI.
After installation,
initialize the Google Cloud CLI by running the following command:
Pricing
Predictive autoscaling is free of charge. However, if you enable predictive
autoscaling to optimize for availability, you pay for the Compute Engine
resources that your MIG uses.
Limitations
Predictive autoscaling works only with CPU utilization as the scaling metric.
Cloud Load Balancing or Cloud Monitoring metrics are not supported.
Compute Engine requires 3 days of CPU-based autoscaling history before
it can generate predictions.
Predictions are based on weekly and daily load patterns.
Compute Engine doesn't predict monthly, annual, or one-time events,
and it doesn't predict load patterns that are shorter than 10 minutes.
You can use schedule-based autoscaling
to request capacity for one-time or other load patterns.
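The limitations above can be read as a simple eligibility check. The following Python sketch is illustrative only: the function name, its parameters, and the "cpu" label are hypothetical and are not part of any Google Cloud API. It encodes just two of the stated limitations, the CPU-only scaling metric and the 3 days of required history.

```python
REQUIRED_HISTORY_DAYS = 3  # taken from the limitations listed above


def can_generate_predictions(scaling_metric: str, history_days: float) -> bool:
    """Return True if the two limitations modeled here are satisfied.

    Hypothetical inputs: scaling_metric is a plain label such as "cpu",
    and history_days is the length of CPU-based autoscaling history.
    """
    return scaling_metric == "cpu" and history_days >= REQUIRED_HISTORY_DAYS
```

For example, a group that has scaled on CPU utilization for 2.5 days would not yet qualify, and a group that scales on a load balancing metric would not qualify regardless of its history.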
Suitable workloads
Predictive autoscaling works best if your workload meets the following criteria:
Your application takes a long time to initialize. For example, you configure an
initialization period of more than 2 minutes.
Your workload varies predictably with daily or weekly cycles.
If your service takes a long time to initialize, your users might experience
increased latency after a scale-out event, that is, while the new VMs are
provisioned but not yet serving. Predictive autoscaling takes into account your
application's initialization time and scales out in advance of predicted
increases in usage, helping to ensure that the number of available serving
instances is sufficient for the target utilization.
You can enable predictive autoscaling when scaling based on CPU utilization.
For more information about setting up CPU-based autoscaling, see
Scaling based on CPU utilization.
If your MIG has no autoscaler history, it can take 3 days before the
predictive algorithm affects the autoscaler. During this time, the group
scales based on real-time data only. After 3 days, the group starts to scale
using predictions. As more historical load is collected, the predictive
autoscaler better understands your load patterns and its forecasts improve.
Compute Engine uses up to 3 weeks of your MIG's load history to feed the
machine learning model.
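As a rough illustration of what enabling predictive autoscaling looks like in configuration terms, the following Python sketch builds an autoscaling policy as a plain dictionary. The field names (`minNumReplicas`, `maxNumReplicas`, `coolDownPeriodSec`, `cpuUtilization.utilizationTarget`, `cpuUtilization.predictiveMethod`) reflect the Compute Engine `autoscalers` resource as best understood here and should be checked against the API reference. The helper function itself is hypothetical, and the sketch does not send any request.

```python
def build_autoscaling_policy(target_cpu: float, min_vms: int, max_vms: int,
                             init_period_sec: int, predictive: bool) -> dict:
    """Build an autoscaling policy body (hypothetical helper, no API call)."""
    return {
        "minNumReplicas": min_vms,
        "maxNumReplicas": max_vms,
        # Assumed to carry the initialization period, in seconds.
        "coolDownPeriodSec": init_period_sec,
        "cpuUtilization": {
            "utilizationTarget": target_cpu,
            # "OPTIMIZE_AVAILABILITY" enables predictions; "NONE" disables them.
            "predictiveMethod": "OPTIMIZE_AVAILABILITY" if predictive else "NONE",
        },
    }


policy = build_autoscaling_policy(0.75, min_vms=2, max_vms=10,
                                  init_period_sec=120, predictive=True)
```

Passing `predictive=False` produces the same policy with predictions turned off, which leaves the group scaling on real-time data only.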
Permissions required for this task
To perform this task, you must have the following
permissions:
compute.autoscalers.update on the group's autoscaler
For more information about how to configure the target utilization, minimum and
maximum number of instances, and the initialization period, see
Scaling based on CPU utilization.
When you configure these options, the predictive autoscaler works to maintain
all instances at the target utilization level that you set, within the minimum
and maximum bounds of the group, in the same way that a real-time autoscaler
does.
Use the initialization period setting to account for the time it takes for your
application to initialize. This setting influences how far in advance the
predictive autoscaler starts new instances ahead of a predicted increase in load,
so that your application is ready to serve when the load arrives.
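To make the role of the initialization period concrete, the following Python sketch computes the latest moment at which a new instance could start and still be ready when a predicted increase arrives. It is a simplified model under the assumption that an instance is ready exactly one initialization period after it starts. The function name is hypothetical, and the real autoscaler's timing depends on more factors than this.

```python
from datetime import datetime, timedelta


def latest_start_time(predicted_load_time: datetime,
                      init_period_sec: int) -> datetime:
    """Latest start time for a VM to be serving when the load arrives."""
    return predicted_load_time - timedelta(seconds=init_period_sec)


# A predicted increase at 09:00 with a 5-minute initialization period.
start = latest_start_time(datetime(2024, 1, 1, 9, 0, 0), init_period_sec=300)
print(start)  # 2024-01-01 08:55:00
```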
Checking if predictive autoscaling is suitable for your workload
To see if predictive autoscaling might improve your application's
availability, you can compare the performance of your group's current
CPU-based autoscaling configuration against predictive autoscaling. You don't
need to enable predictive autoscaling in order to make the comparison.
For more information about workloads that are suitable for predictive
autoscaling, see Suitable workloads.
Checking for overloads
Your autoscaled MIG is overloaded when its average CPU utilization exceeds your
target. To check if your autoscaling configuration resulted in overloaded VMs
during the last 7 days, and to see if predictive autoscaling can reduce
overloads, complete the following steps:
Click an existing MIG for which CPU-based autoscaling is configured. The
group's overview page opens.
Click Edit.
Click Group size & autoscaling to expand the section.
In the Autoscaling section, under Autoscaling signals,
expand the CPU utilization section, then click
See if predictive autoscaling can optimize your availability.
Based on data for the last 7 days, the table shows how many VMs were used
per day and how many VMs were overloaded per day for the following rows:
Current autoscaling configuration: shows how the autoscaler performed
based on the autoscaler's configuration over the last 7 days.
With predictive autoscaling set to "Optimize for availability": shows
how the autoscaler would have performed if predictive autoscaling had been
enabled over the last 7 days.
You can use the "Number of VMs used per day" as a proxy for costs. For
example, to reduce the daily number of overloaded VMs, the predictive autoscaler
might create VMs earlier and run them for longer, which results in additional
charges.
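The overload definition used in this section (average CPU utilization above the target) can be expressed as a short check. The following Python sketch counts overloaded samples in a series of average utilization values. The function name and the sample data are hypothetical, and the console's own per-day accounting of overloaded VMs might differ from this simplification.

```python
def count_overloaded_samples(avg_cpu_utilization: list[float],
                             target: float) -> int:
    """Count samples whose average CPU utilization exceeds the target."""
    return sum(1 for utilization in avg_cpu_utilization if utilization > target)


samples = [0.50, 0.80, 0.75, 0.90]  # made-up utilization values
print(count_overloaded_samples(samples, target=0.75))  # 2
```

A sample equal to the target is not counted, because the group is overloaded only when utilization exceeds the target.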
Monitoring and simulating predictive autoscaling
You can visualize the historical size of your group using Cloud Monitoring.
The monitoring graph shows how your autoscaling configuration scaled your group
over time, and it also shows how predictive autoscaling, if enabled, would have
scaled your group.
For groups with predictive autoscaling disabled, you can use this tool to
simulate predictive autoscaling before enabling it.
Click an existing MIG for which CPU-based autoscaling is configured. The
group's overview page opens.
Click Monitoring to see charts related to the group.
In the first chart, click its title and select Predictive autoscaling.
This view shows the group's actual size as well as its predicted size.
You can select a different time range to see more history or zoom into
a period where demand grew to see how predictive autoscaling affects group
size ahead of forecasted load.
How predictive autoscaling works
The predictive autoscaler forecasts your scaling metric based on the metric's
historical trends. Forecasts are recomputed every few minutes, which lets the
autoscaler rapidly adapt its forecast to very recent changes in load. The
predictive autoscaler needs at least 3 days of history from which to determine a
representative service usage pattern before it can provide predictions.
Compute Engine uses up to 3 weeks of your MIG's load history to feed the
machine learning model.
The predictive autoscaler calculates the number of VMs needed to achieve your
utilization target based on numerous factors, including the following:
The predicted future value of the scaling metric
The current value of the scaling metric
Confidence in past trends, including past variability of the scaling metric
The initialization period that you configured for your application
Based on such factors, the predictive autoscaler scales out your group ahead
of anticipated demand.
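The relationship between a forecast and a group size can be sketched with basic arithmetic. The following Python example is a deliberately simplified model, not the autoscaler's actual algorithm: it considers only the predicted metric value and the utilization target, and ignores the confidence and initialization factors listed above. The function name and the unit "fully busy VM equivalents" are assumptions made for illustration.

```python
import math


def vms_needed(predicted_cpu_demand: float, target_utilization: float) -> int:
    """Estimate the VMs needed to keep utilization at or below the target.

    predicted_cpu_demand is expressed in fully busy VM equivalents, for
    example 6.0 means the work of six VMs running at 100% CPU.
    """
    return math.ceil(predicted_cpu_demand / target_utilization)


print(vms_needed(6.0, target_utilization=0.75))  # 8
```

In this model, a lower utilization target leaves more headroom per VM and therefore requires more VMs for the same predicted demand.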
Figure 1. Comparison of serving VMs with and without predictive autoscaling.
In figure 1, the blue line shows a growing demand for VMs. The black line shows
the autoscaler's response: more VMs are added. However, for applications with
long initialization times, the grey line shows that the added VMs require
additional time before they are ready to serve, which can result in not enough
serving VMs to meet the demand. With predictive autoscaling enabled, the
predicted increase in demand and the long application initialization time are
accounted for: the autoscaler responds by adding VMs earlier, resulting in a
sufficient number of serving VMs. You can configure how far in advance new
instances are added by setting the
initialization period.
Real-time usage data
The predictive autoscaler can't determine a pattern for all future changes in usage
based on historical data, so it works seamlessly with real-time data, too. For
example, an unexpected news event might contribute to a spike in usage that
couldn't have been predicted based on history alone. To handle such
unpredictable changes in load, the predictive autoscaler responds as follows:
It adapts its predictions: Predictions are recalculated every few minutes,
so they adjust to incorporate the latest data. The exact timing of
adjustments to new patterns depends on, among other things, how repeatable
the new pattern is and how large the difference is between the new pattern and
past predictions.
It yields to real-time data: The autoscaler's recommended number of instances,
based on real-time values of the metric, is always sufficient to meet the
group's target utilization. If the current value of a real-time signal is
greater than the prediction, the current value of the signal takes priority
over the prediction. As a result, a MIG that has predictive autoscaling
enabled has at least as much capacity available as it would have with
real-time autoscaling alone.
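The "yields to real-time data" rule can be sketched as taking the larger of two recommendations and keeping the result within the group's bounds. The following Python example is an illustration under that assumption. The function and parameter names are hypothetical, and the real autoscaler combines its signals in ways that this document does not fully specify.

```python
def recommended_size(reactive: int, predicted: int,
                     min_vms: int, max_vms: int) -> int:
    """Pick the larger recommendation, clamped to the group's bounds."""
    wanted = max(reactive, predicted)
    return max(min_vms, min(max_vms, wanted))


print(recommended_size(reactive=8, predicted=5, min_vms=1, max_vms=20))  # 8
print(recommended_size(reactive=4, predicted=7, min_vms=1, max_vms=20))  # 7
```

Because the result is never smaller than the real-time recommendation (unless the maximum bound applies), enabling predictions in this model can only add capacity, not remove it.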
Figure 2. Two charts show how predictions adapt to actual CPU usage.
In figure 2, the dotted yellow line shows the prediction at t1. But
the actual CPU usage, as shown by the solid blue line, differs from the
prediction. On the left chart, the actual CPU usage is higher than predicted. On
the right chart, the actual CPU usage is lower than predicted. The dotted blue
line shows the adjusted prediction.
Short, unpredictable spikes
Short, unpredictable peaks are covered in real time. The autoscaler creates at
least as many instances as needed to keep utilization at the configured target,
based on the current actual value of the metric. However, these instances
aren't created in advance, as shown in the following figure.
Figure 3. A short, unpredictable spike causes the autoscaler to react in
real time.
In figure 3, the solid blue line shows actual CPU usage. An unexpected spike in
CPU usage could not be predicted. Because the autoscaler always monitors
real-time data, it adds instances to accommodate the spike. The solid black line
illustrates the autoscaler's reactive addition of VMs in response to the spike.
The solid grey line shows the number of serving VMs. The grey line lags behind
the black line due to the application's initialization time. In this scenario,
the group is temporarily overloaded.
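The lag between requested VMs and serving VMs can be reproduced with a small simulation. The following Python sketch assumes discrete time steps, that an added VM starts serving only after a fixed number of steps, that a removed VM stops serving immediately, and that the VMs present at the first step are already serving. All names and numbers are hypothetical.

```python
def serving_vm_counts(requested: list[int], init_steps: int) -> list[int]:
    """Estimate how many VMs are serving at each time step.

    requested[t] is the group size that the autoscaler asked for at step t.
    Under the assumptions above, the serving count at step t is the smallest
    requested size over the last init_steps + 1 steps.
    """
    serving = []
    for t in range(len(requested)):
        window_start = max(0, t - init_steps)
        serving.append(min(requested[window_start:t + 1]))
    return serving


# The autoscaler reacts to a spike at step 2 by requesting 5 VMs.
requested = [2, 2, 5, 5, 5, 5]
print(serving_vm_counts(requested, init_steps=2))  # [2, 2, 2, 2, 5, 5]
```

In this run, only 2 VMs serve during steps 2 and 3 even though 5 were requested, which corresponds to the temporary overload described above.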
Sudden dips
Another type of unpredictable change in usage is a sudden dip, for example, a
dip caused by a failure in part of the application stack. When that happens, the
number of instances initially follows the forecast. However, over time, the
forecast adjusts to the lower-than-forecasted usage, resulting in a scale-in.
The exact timing of this adjustment depends on numerous factors, including how
often the pattern occurred in the past, how long the dip lasts, and how deep the
dip is.
Figure 4. A sudden dip causes the predictive autoscaler to change its forecast.
In figure 4, the dotted yellow line shows the prediction at t1. But
the actual CPU usage, as shown by the solid blue line, fell lower than
predicted. The dotted blue line shows the updated prediction, which was
automatically adjusted after observing lower-than-forecasted usage. This results
in the autoscaler removing instances following the standard
stabilization period.
Historical data
The predictive autoscaler needs at least 3 days of historical load to start
forecasting. If you have a new MIG that lacks historical data,
Compute Engine scales your group reactively using real-time data until
sufficient historical data becomes available. After 3 days, as
Compute Engine collects additional usage data, the predictions improve.
If you update your application by creating a new MIG and deleting the old
one (for example, in a blue-green deployment), then your new MIG needs 3 days of
historical load data before predictive autoscaling can start generating
forecasts again. If you want to preserve load history across MIGs so that
forecasts can start immediately when you create a new MIG,
contact us to request instructions to join a
private preview.