This tutorial teaches you how to use an
ARIMA_PLUS univariate time series model to forecast the future value for a given column based on the historical values
for that column.
This tutorial forecasts a single time series. Forecasted values are calculated once for each time point in the input data.
This tutorial uses data from the public
bigquery-public-data.google_analytics_sample.ga_sessions sample table. This
table contains obfuscated ecommerce data from the Google Merchandise Store.
This tutorial guides you through completing the following tasks:
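Creating a time series model by using the CREATE MODEL statement.
Evaluating the candidate models by using the ML.ARIMA_EVALUATE function.
Inspecting the model's coefficients by using the ML.ARIMA_COEFFICIENTS function.
Forecasting future values by using the ML.FORECAST function.
Retrieving the separate components of the time series by using the ML.EXPLAIN_FORECAST function.
You can inspect these time series components in order to explain the
forecasted values.
This tutorial uses billable components of Google Cloud, including BigQuery and BigQuery ML.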
For more information about BigQuery costs, see the BigQuery pricing page.
For more information about BigQuery ML costs, see BigQuery ML pricing.
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Roles required to select or create a project
To create a project, you need the Project Creator role
(roles/resourcemanager.projectCreator), which contains the
resourcemanager.projects.create permission. Learn how to grant
roles.
Verify that billing is enabled for your Google Cloud project.
Enable the BigQuery API.
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM
role (roles/serviceusage.serviceUsageAdmin), which
contains the serviceusage.services.enable permission. Learn how to grant
roles.
To create the dataset, you need the bigquery.datasets.create
IAM permission.
To create the model, you need the following permissions:
bigquery.jobs.create
bigquery.models.create
bigquery.models.getData
bigquery.models.updateData
To run inference, you need the following permissions:
bigquery.models.getData
bigquery.jobs.create
For more information about IAM roles and permissions in BigQuery, see Introduction to IAM.
In the Google Cloud console, go to the BigQuery page.
In the Explorer pane, click your project name.
Click View actions > Create dataset.
On the Create dataset page, do the following:
For Dataset ID, enter bqml_tutorial.
For Location type, select Multi-region, and then select US.
Leave the remaining default settings as they are, and click Create dataset.
To create a new dataset, use the
bq mk --dataset command.
Create a dataset named bqml_tutorial with the data location set to US.
bq mk --dataset \
  --location=US \
  --description "BigQuery ML tutorial dataset." \
  bqml_tutorial
Confirm that the dataset was created:
bq ls
Call the datasets.insert
method with a defined dataset resource.
{ "datasetReference": { "datasetId": "bqml_tutorial" } }
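As a minimal sketch of what such a request could look like, the following Python snippet builds the URL and JSON body for a datasets.insert call without sending it. The project ID my-project and the helper name dataset_insert_request are hypothetical, and the location field is added here only to mirror the US location used in the other methods; it is not part of the resource shown above.

```python
import json


def dataset_insert_request(project_id, dataset_id, location="US"):
    """Build (url, body) for a datasets.insert call. Nothing is sent."""
    url = (
        "https://bigquery.googleapis.com/bigquery/v2/"
        f"projects/{project_id}/datasets"
    )
    body = {
        "datasetReference": {"datasetId": dataset_id},
        # Assumption: set the location explicitly to match the tutorial's US dataset.
        "location": location,
    }
    return url, json.dumps(body)


# "my-project" is a placeholder; substitute your own project ID.
url, payload = dataset_insert_request("my-project", "bqml_tutorial")
print("POST", url)
print(payload)
```

You would send the body as an authenticated POST request with the HTTP client of your choice.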
Before trying this sample, follow the BigQuery DataFrames setup instructions in the BigQuery quickstart using BigQuery DataFrames. For more information, see the BigQuery DataFrames reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up ADC for a local development environment.
Before creating the model, you can optionally visualize your input time series data to get a sense of the distribution. You can do this by using Data Studio.
Follow these steps to visualize the time series data:
In the following GoogleSQL query, the
SELECT statement parses the date column from the input
table to the TIMESTAMP type and renames it to parsed_date, and uses
the SUM(...) clause and the GROUP BY date clause to create a daily
totals.visits value.
In the Google Cloud console, go to the BigQuery page.
In the query editor, paste in the following query and click Run:
SELECT
  PARSE_TIMESTAMP("%Y%m%d", date) AS parsed_date,
  SUM(totals.visits) AS total_visits
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
GROUP BY date;
When the query completes, click Open in > Data Studio. Data Studio opens in a new tab. Complete the following steps in the new tab.
In Data Studio, click Insert > Time series chart.
In the Chart pane, choose the Setup tab.
In the Metric section, add the total_visits field, and remove the default Record Count metric. The resulting chart looks similar to the following:
Looking at the chart, you can see that the input time series has a weekly seasonal pattern.
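To make the aggregation in the query concrete, the following Python sketch applies the same logic to a handful of in-memory rows: it groups by the raw date string, sums the visits, and parses each date string into a UTC timestamp. The sample rows and the daily_totals helper are invented for illustration and are not taken from the public table.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical rows: (date string in YYYYMMDD form, totals.visits).
sessions = [
    ("20160801", 1),
    ("20160802", 1),
    ("20160801", 1),
]


def daily_totals(rows):
    """Sum visits per date string, then parse each date to a UTC timestamp."""
    totals = defaultdict(int)
    for date_str, visits in rows:
        totals[date_str] += visits  # plays the role of SUM(...) with GROUP BY date
    return {
        # Plays the role of PARSE_TIMESTAMP("%Y%m%d", date): midnight UTC.
        datetime.strptime(date_str, "%Y%m%d").replace(tzinfo=timezone.utc): total
        for date_str, total in sorted(totals.items())
    }


for parsed_date, total_visits in daily_totals(sessions).items():
    print(parsed_date.isoformat(), total_visits)
```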
Create a time series model to forecast total site visits as represented by
the totals.visits column, and train it on the Google Analytics 360
data.
In the following query, the
OPTIONS(model_type='ARIMA_PLUS', time_series_timestamp_col='parsed_date', ...)
clause indicates that you are creating an
ARIMA-based
time series model. The
auto_arima option
of the CREATE MODEL statement defaults to TRUE, so the auto.ARIMA
algorithm automatically tunes the hyperparameters in the model. The algorithm
fits dozens of candidate models and chooses the best model, which is the model
with the lowest
Akaike information criterion (AIC).
The
data_frequency option
of the CREATE MODEL statement defaults to AUTO_FREQUENCY, so the
training process automatically infers the data frequency of the input time
series. The
decompose_time_series option
of the CREATE MODEL statement defaults to TRUE, so that information about
the time series data is returned when you evaluate the model in the next step.
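The selection rule described above, choosing the candidate with the lowest AIC, can be sketched in a few lines of Python. The AIC is computed with its standard definition, 2k - 2 ln(L), where k is the number of fitted parameters and L is the likelihood. The candidate labels, log-likelihoods, and parameter counts below are made up, and how BigQuery ML counts parameters is not stated in this tutorial.

```python
def aic(log_likelihood, num_params):
    """Akaike information criterion: 2k - 2*ln(L). Lower is better."""
    return 2 * num_params - 2 * log_likelihood


# Hypothetical candidates: (label, log-likelihood, number of fitted parameters).
candidates = [
    ("ARIMA(0,1,1)", -2470.0, 2),
    ("ARIMA(2,1,3)", -2440.0, 6),
    ("ARIMA(5,1,0)", -2455.0, 6),
]

# The best candidate is the one with the smallest AIC.
best = min(candidates, key=lambda c: aic(c[1], c[2]))
print(best[0], aic(best[1], best[2]))  # ARIMA(2,1,3) 4892.0
```

The 2k term penalizes extra parameters, so a more complex model wins only if it improves the log-likelihood enough to offset the penalty.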
Follow these steps to create the model:
In the Google Cloud console, go to the BigQuery page.
In the query editor, paste in the following query and click Run:
CREATE OR REPLACE MODEL `bqml_tutorial.ga_arima_model`
OPTIONS
  (model_type = 'ARIMA_PLUS',
   time_series_timestamp_col = 'parsed_date',
   time_series_data_col = 'total_visits',
   auto_arima = TRUE,
   data_frequency = 'AUTO_FREQUENCY',
   decompose_time_series = TRUE
  ) AS
SELECT
  PARSE_TIMESTAMP("%Y%m%d", date) AS parsed_date,
  SUM(totals.visits) AS total_visits
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
GROUP BY date;
The query takes about 4 seconds to complete, after which you can access the
ga_arima_model model. Because the query uses a CREATE MODEL statement
to create a model, you don't see query results.
Evaluate the time series models by using the ML.ARIMA_EVALUATE
function. The ML.ARIMA_EVALUATE function shows you the evaluation metrics of
all the candidate models evaluated during the process of automatic
hyperparameter tuning.
Follow these steps to evaluate the model:
In the Google Cloud console, go to the BigQuery page.
In the query editor, paste in the following query and click Run:
SELECT * FROM ML.ARIMA_EVALUATE(MODEL `bqml_tutorial.ga_arima_model`);
The results should look similar to the following:
The non_seasonal_p, non_seasonal_d, non_seasonal_q, and has_drift
output columns define an ARIMA model in the training pipeline. The
log_likelihood, AIC, and variance output columns are relevant to the ARIMA
model fitting process.
The auto.ARIMA algorithm uses the
KPSS test to determine the best value
for non_seasonal_d, which in this case is 1. When non_seasonal_d is 1,
the auto.ARIMA algorithm trains 42 different candidate ARIMA models in parallel.
In this example, all 42 candidate models are valid, so the output contains 42
rows, one for each candidate ARIMA model; in cases where some of the models
aren't valid, they are excluded from the output. These candidate models are
returned in ascending order by AIC. The model in the first row has the lowest
AIC, and is considered the best model. The best model is saved as the final
model and is used when you call functions such as ML.FORECAST on the model.
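One search space that is consistent with the count of 42 is every (p, q) pair whose sum is at most 5, each fitted once with a drift term and once without. The following Python sketch enumerates that space. The bound of 5 and the rule that drift is only tried when non_seasonal_d is 1 are assumptions made for illustration; this tutorial states only the total.

```python
from itertools import product

MAX_ORDER = 5  # Assumption: upper bound on non_seasonal_p + non_seasonal_q.


def candidate_orders(d, max_order=MAX_ORDER):
    """List hypothetical (p, d, q, has_drift) candidates for a given d."""
    # Assumption: a drift term is only tried when d is 1.
    drift_options = (False, True) if d == 1 else (False,)
    return [
        (p, d, q, drift)
        for p, q in product(range(max_order + 1), repeat=2)
        if p + q <= max_order
        for drift in drift_options
    ]


print(len(candidate_orders(1)))  # 42
```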
The seasonal_periods column contains information about the seasonal pattern
identified in the time series data. This information is independent of the ARIMA
modeling, so it has the same value across all output rows. It reports a
weekly pattern, which agrees with the results you saw if you chose to
visualize the input data.
The has_holiday_effect, has_spikes_and_dips, and has_step_changes columns
are only populated when decompose_time_series=TRUE. These columns also reflect
information about the input time series data, and are not related to the ARIMA
modeling. These columns also have the same values across all output rows.
The error_message column shows any errors that occurred during the
auto.ARIMA fitting process. One possible cause of an error is that the selected
non_seasonal_p, non_seasonal_d, non_seasonal_q, and has_drift values
are not able to stabilize the time series. To retrieve the error
messages of all the candidate models, set the show_all_candidate_models
argument to TRUE when you call the ML.ARIMA_EVALUATE function.
For more information about the output columns, see
ML.ARIMA_EVALUATE function.
Inspect the time series model's coefficients by using the
ML.ARIMA_COEFFICIENTS function.
Follow these steps to retrieve the model's coefficients:
In the Google Cloud console, go to the BigQuery page.
In the query editor, paste in the following query and click Run:
SELECT * FROM ML.ARIMA_COEFFICIENTS(MODEL `bqml_tutorial.ga_arima_model`);
The ar_coefficients output column shows the model coefficients of the
autoregressive (AR) part of the ARIMA model. Similarly, the ma_coefficients
output column shows the model coefficients of the moving-average (MA) part of
the ARIMA model. Both of these columns contain array values, whose lengths are
equal to non_seasonal_p and non_seasonal_q, respectively. You saw in the
output of the ML.ARIMA_EVALUATE function that the best model has a
non_seasonal_p value of 2 and a non_seasonal_q value of 3. Therefore, in
the ML.ARIMA_COEFFICIENTS output, the ar_coefficients value is a 2-element
array and the ma_coefficients value is a 3-element array. The
intercept_or_drift value is the constant term in the ARIMA model.
For more information about the output columns, see
ML.ARIMA_COEFFICIENTS function.
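To illustrate how these three outputs fit together, the following Python sketch computes a one-step prediction from the textbook ARMA equation: the constant term, plus the AR coefficients applied to recent values, plus the MA coefficients applied to recent errors. All numbers are hypothetical, the helper name arma_one_step is invented, and the sign convention that BigQuery ML uses for the MA terms is not stated in this tutorial. Because non_seasonal_d is 1, the values would be those of the once-differenced series.

```python
def arma_one_step(recent_values, recent_errors,
                  ar_coefficients, ma_coefficients, intercept_or_drift):
    """Textbook ARMA one-step prediction. Inputs are ordered most recent first."""
    assert len(recent_values) >= len(ar_coefficients)
    assert len(recent_errors) >= len(ma_coefficients)
    ar_part = sum(phi * y for phi, y in zip(ar_coefficients, recent_values))
    ma_part = sum(theta * e for theta, e in zip(ma_coefficients, recent_errors))
    return intercept_or_drift + ar_part + ma_part


# Hypothetical coefficients with p = 2 and q = 3, matching the array lengths above.
ar_coefficients = [0.5, -0.25]
ma_coefficients = [0.4, 0.2, 0.1]

prediction = arma_one_step(
    recent_values=[10.0, 8.0],       # differenced values at t-1 and t-2
    recent_errors=[1.0, -2.0, 0.0],  # forecast errors at t-1, t-2, and t-3
    ar_coefficients=ar_coefficients,
    ma_coefficients=ma_coefficients,
    intercept_or_drift=1.0,
)
print(prediction)  # 4.0
```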
Inspect the time series model's coefficients by using the coef_ function.
Forecast future time series values by using the ML.FORECAST
function.
In the following GoogleSQL query, the
STRUCT(30 AS horizon, 0.8 AS confidence_level) clause indicates that the
query forecasts 30 future time points, and generates a prediction interval
with an 80% confidence level.
Follow these steps to forecast data with the model:
In the Google Cloud console, go to the BigQuery page.
In the query editor, paste in the following query and click Run:
SELECT
  *
FROM
  ML.FORECAST(MODEL `bqml_tutorial.ga_arima_model`,
              STRUCT(30 AS horizon, 0.8 AS confidence_level));
The results should look similar to the following:
Forecast future time series values by using the predict function.
The output rows are in chronological order by the
forecast_timestamp column value. In time series forecasting, the prediction
interval, as represented by the prediction_interval_lower_bound and
prediction_interval_upper_bound column values, is as important as the
forecast_value column value. The forecast_value value is the middle point
of the prediction interval. The prediction interval depends on the
standard_error and confidence_level column values.
For more information about the output columns, see
ML.FORECAST function.
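The relationship between forecast_value, standard_error, confidence_level, and the interval bounds can be sketched as follows. This example assumes a symmetric interval built from a standard normal quantile, which is a common construction; the exact computation that BigQuery ML performs is not described in this tutorial, and the input numbers are hypothetical.

```python
from statistics import NormalDist


def prediction_interval(forecast_value, standard_error, confidence_level):
    """Return (lower, upper), assuming a symmetric normal-based interval."""
    # Two-sided interval: an 0.8 confidence level leaves 0.1 in each tail.
    z = NormalDist().inv_cdf((1 + confidence_level) / 2)
    margin = z * standard_error
    return forecast_value - margin, forecast_value + margin


# Hypothetical forecast row.
lower, upper = prediction_interval(
    forecast_value=2700.0, standard_error=120.0, confidence_level=0.8
)
print(round(lower, 2), round(upper, 2))
```

Under this assumption, the forecast value sits at the midpoint of the interval, and raising the confidence level or the standard error widens the interval.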
You can get explainability metrics in addition to forecast data by using the
ML.EXPLAIN_FORECAST function. The ML.EXPLAIN_FORECAST function forecasts
future time series values and also returns all the separate components of the
time series.
Similar to the ML.FORECAST function, the
STRUCT(30 AS horizon, 0.8 AS confidence_level) clause used in the
ML.EXPLAIN_FORECAST function indicates that the query forecasts 30 future
time points and generates a prediction interval with 80% confidence.
Follow these steps to explain the model's results:
In the Google Cloud console, go to the BigQuery page.
In the query editor, paste in the following query and click Run:
SELECT
  *
FROM
  ML.EXPLAIN_FORECAST(MODEL `bqml_tutorial.ga_arima_model`,
                      STRUCT(30 AS horizon, 0.8 AS confidence_level));
The results should look similar to the following:
The output rows are ordered chronologically by the time_series_timestamp
column value.
For more information about the output columns, see
ML.EXPLAIN_FORECAST function.
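One way to use the component columns is to add them up for a row and compare the sum with the time_series_data value. The following Python sketch does this for a single hypothetical row. It assumes that the components combine additively, which this tutorial does not state, so treat it as an exploratory check rather than a description of how the function computes its output. The sum_components helper and the row values are invented.

```python
# Component columns used in this sketch; a real result can contain more.
COMPONENT_COLUMNS = ("trend", "seasonal_period_weekly", "step_changes")


def sum_components(row, columns=COMPONENT_COLUMNS):
    """Add up component columns, treating missing or NULL (None) values as 0."""
    return sum(row.get(name) or 0.0 for name in columns)


# Hypothetical output row.
row = {
    "time_series_data": 2650.0,
    "trend": 2500.0,
    "seasonal_period_weekly": 150.0,
    "step_changes": None,
}

# Assumption: additive decomposition. The remainder is whatever the listed
# components do not account for.
unexplained = row["time_series_data"] - sum_components(row)
print(sum_components(row), unexplained)
```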
You can get explainability metrics in addition to forecast data by using the
predict_explain function. The predict_explain function forecasts
future time series values and also returns all the separate components of the
time series.
Similar to the predict function, the
horizon=30 and confidence_level=0.8 arguments passed to the
predict_explain function indicate that the call forecasts 30 future
time points and generates a prediction interval with 80% confidence.
If you would like to visualize the results, you can use Data Studio as described in the Visualize the input data section to create a chart, using the following columns as metrics:
time_series_data
prediction_interval_lower_bound
prediction_interval_upper_bound
trend
seasonal_period_weekly
step_changes
To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.
Deleting your project removes all datasets and all tables in the project. If you prefer to reuse the project, you can delete the dataset you created in this tutorial:
If necessary, open the BigQuery page in the Google Cloud console.
In the navigation, click the bqml_tutorial dataset you created.
Click Delete dataset on the right side of the window. This action deletes the dataset, the table, and all the data.
In the Delete dataset dialog box, confirm the delete command by typing
the name of your dataset (bqml_tutorial) and then click Delete.
To delete the project:
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2026-06-03 UTC.