Scaling based on predictions

You can configure autoscaling for a managed instance group (MIG) to automatically add or remove virtual machine (VM) instances based on increases or decreases in load. However, if your application takes a few minutes or more to initialize, adding instances in response to real-time changes might not increase your application's capacity quickly enough. For example, if there's a large increase in load (such as when users first wake up in the morning), some users might experience delays while your application is initializing on new instances.

You can use predictive autoscaling to improve response times for applications that take a long time to initialize and whose workloads vary predictably in daily or weekly cycles.

When you enable predictive autoscaling, Compute Engine forecasts future load based on your MIG's history and scales out the MIG in advance of predicted load, so that new instances are ready to serve when the load arrives. Without predictive autoscaling, an autoscaler can only scale a group reactively, based on observed changes in load in real time. With predictive autoscaling enabled, the autoscaler works with real-time data as well as with historical data to cover both the current and forecasted load. For more information, see How predictive autoscaling works and Checking if predictive autoscaling is suitable for your workload.
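The idea of covering both the current and the forecasted load can be sketched in a few lines of Python. This is only an illustrative model, not Compute Engine's actual algorithm: the function names, the unit of load (CPU cores), and the rule of taking the larger of the two sizes are assumptions made for the example.

```python
import math


def instances_needed(load_cores: float, cores_per_vm: float,
                     target_utilization: float) -> int:
    """VMs required to keep average CPU utilization at or below the target."""
    if load_cores <= 0:
        return 0
    return math.ceil(load_cores / (cores_per_vm * target_utilization))


def recommended_size(current_load: float, forecast_load: float,
                     cores_per_vm: float, target_utilization: float,
                     min_size: int, max_size: int) -> int:
    """Hypothetical sizing rule: cover whichever is larger, the observed
    load or the forecasted load, then clamp to the group's limits."""
    reactive = instances_needed(current_load, cores_per_vm, target_utilization)
    predictive = instances_needed(forecast_load, cores_per_vm, target_utilization)
    return max(min_size, min(max_size, max(reactive, predictive)))


# Observed load is 6 cores, but 15 cores are forecasted: size for the forecast.
print(recommended_size(6, 15, cores_per_vm=2, target_utilization=0.75,
                       min_size=1, max_size=20))
```

In this model, a forecast lower than the observed load never shrinks the group below what the real-time signal requires.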

Before you begin

Pricing

Predictive autoscaling is free of charge. However, if you enable predictive autoscaling to optimize for availability, you pay for the Compute Engine resources that your MIG uses.

Limitations

Suitable workloads

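Predictive autoscaling works best if your workload meets the following criteria: your application takes a long time to initialize, and your workload varies predictably with daily or weekly cycles.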

If your service takes a long time to initialize, your users might experience increased latency after a scale-out event, while the new VMs are provisioned but not yet serving. Predictive autoscaling takes into account your application's initialization time and scales out in advance of predicted increases in usage, helping to ensure that the number of available serving instances is sufficient for the target utilization.
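To make the role of initialization time concrete, the following Python sketch looks ahead in a forecast by the length of the initialization period. It is a simplified illustration under stated assumptions: the per-minute forecast list, the function name, and the choice to prepare for the peak within the look-ahead window are all hypothetical and are not taken from Compute Engine.

```python
def load_to_prepare_for(forecast: list[float], now: int, init_minutes: int) -> float:
    """Return the highest forecasted load between now and the moment a VM
    started now would finish initializing.

    forecast     -- hypothetical per-minute load forecast (index = minute)
    now          -- index of the current minute
    init_minutes -- application initialization time in minutes
    """
    window_end = min(now + init_minutes, len(forecast) - 1)
    return max(forecast[now:window_end + 1])


# Load jumps from 10 to 40 at minute 4.
forecast = [10, 10, 10, 10, 40, 40]

# With a 4-minute initialization time, the jump is already visible at minute 0.
print(load_to_prepare_for(forecast, now=0, init_minutes=4))

# With a 2-minute initialization time, it is not yet inside the window.
print(load_to_prepare_for(forecast, now=0, init_minutes=2))
```

The longer the initialization time, the earlier the predicted increase enters the window, which is why the lead matters most for slow-starting applications.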

To preview how predictive autoscaling can affect your group, see Checking if predictive autoscaling is suitable for your workload.

Enabling and disabling predictive autoscaling

You can enable predictive autoscaling when scaling based on CPU utilization. For more information about setting up CPU-based autoscaling, see Scaling based on CPU utilization.
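As a rough illustration, the following Python sketch builds an autoscaling policy body with predictive autoscaling switched on or off for a CPU-based policy. The helper name is hypothetical, and the field names (predictiveMethod with the values OPTIMIZE_AVAILABILITY and NONE, utilizationTarget, coolDownPeriodSec) reflect the Compute Engine autoscaler resource as best understood here; verify them against the current API reference before use.

```python
import json


def cpu_autoscaling_policy(min_replicas: int, max_replicas: int,
                           cpu_target: float, init_period_sec: int,
                           predictive: bool = True) -> dict:
    """Build a policy dictionary for CPU-based autoscaling (hypothetical helper).

    Field names are assumed to match the autoscaler resource's
    autoscalingPolicy object.
    """
    return {
        "minNumReplicas": min_replicas,
        "maxNumReplicas": max_replicas,
        # Time the application needs to initialize on a new VM.
        "coolDownPeriodSec": init_period_sec,
        "cpuUtilization": {
            "utilizationTarget": cpu_target,
            # Predictive autoscaling is a setting on the CPU utilization signal.
            "predictiveMethod": "OPTIMIZE_AVAILABILITY" if predictive else "NONE",
        },
    }


policy = cpu_autoscaling_policy(2, 20, cpu_target=0.75, init_period_sec=300)
print(json.dumps(policy, indent=2))
```

Passing predictive=False produces the same policy with the predictive method set to NONE, which corresponds to purely reactive CPU-based scaling.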

If your MIG has no autoscaler history, it can take 3 days before the predictive algorithm affects the autoscaler. During this time, the group scales based on real-time data only. After 3 days, the group starts to scale using predictions. As more historical load is collected, the predictive autoscaler better understands your load patterns and its forecasts improve. Compute Engine uses up to 3 weeks of your MIG's load history to feed the machine learning model.
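The timeline above can be expressed as a small Python sketch. The 3-day and 3-week figures come from the text; the function name and the returned structure are hypothetical, and the sketch treats 3 days as a fixed threshold even though the text says only that it can take that long.

```python
MIN_HISTORY_DAYS = 3    # predictions can start affecting scaling after this
MAX_HISTORY_DAYS = 21   # up to 3 weeks of load history feed the model


def prediction_status(days_of_history: int) -> dict:
    """Describe how a group with the given amount of history would scale
    (simplified model for illustration)."""
    return {
        # Before the threshold, scaling relies on real-time data only.
        "uses_predictions": days_of_history >= MIN_HISTORY_DAYS,
        # History older than the maximum window is not used.
        "history_days_used": min(days_of_history, MAX_HISTORY_DAYS),
    }


for days in (1, 3, 10, 40):
    print(days, prediction_status(days))
```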