Generate embeddings with transformer models in ONNX format

If you create a new project, you're the project owner, and you're granted all of the required Identity and Access Management (IAM) permissions that you need to complete this tutorial.

Convert the transformer model files to ONNX

Optionally, you can follow the steps in this section to manually convert the sentence-transformers/all-MiniLM-L6-v2 model and tokenizer to ONNX. Otherwise, you can use sample files from the public gs://cloud-samples-data Cloud Storage bucket that have already been converted.

If you choose to manually convert the files, you must have a local command-line environment that has Python installed. For more information on installing Python, see Python downloads.

Export the transformer model to ONNX

Use the Hugging Face Optimum CLI to export the sentence-transformers/all-MiniLM-L6-v2 model to ONNX. For more information about exporting models with the Optimum CLI, see Export a model to ONNX with optimum.exporters.onnx.

To export the model, open a command-line environment and follow these steps:

Install the Optimum CLI:
```
pip install optimum[onnx]
```
Export the model. The --model argument specifies the Hugging Face model ID. The --opset argument specifies the ONNXRuntime library version, and is set to 17 to maintain compatibility with the ONNXRuntime library supported by BigQuery.
```
optimum-cli export onnx \
  --model sentence-transformers/all-MiniLM-L6-v2 \
  --task sentence-similarity \
  --opset 17 all-MiniLM-L6-v2/
```

The model file is exported to the all-MiniLM-L6-v2 directory as model.onnx.

Apply quantization to the transformer model

Use the Optimum CLI to apply quantization to the exported transformer model in order to reduce model size and speed up inference. For more information, see Quantization.

To apply quantization to the model, run the following command on the command-line:

optimum-cli onnxruntime quantize \
  --onnx_model all-MiniLM-L6-v2/ \
  --avx512_vnni -o all-MiniLM-L6-v2_quantized

The quantized model file is exported to the all-MiniLM-L6-v2_quantized directory as model_quantized.onnx.

Convert the tokenizer to ONNX

To generate embeddings using a transformer model in ONNX format, you typically use a tokenizer to produce two inputs to the model, input_ids and attention_mask.

To produce these inputs, convert the tokenizer for the sentence-transformers/all-MiniLM-L6-v2 model to ONNX format by using the onnxruntime-extensions library. After you convert the tokenizer, you can perform tokenization directly on raw text inputs to generate ONNX predictions.

To convert the tokenizer, follow these steps on the command-line:

Install the Optimum CLI:
```
pip install optimum[onnx]
```
Using the text editor of your choice, create a file named convert-tokenizer.py. The following example uses the nano text editor:
```
nano convert-tokenizer.py
```

Copy and paste the following Python script into the convert-tokenizer.py file:

from onnxruntime_extensions import gen_processing_models

# Load the Huggingface tokenizer
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

# Export the tokenizer to ONNX using gen_processing_models
onnx_tokenizer_path = "tokenizer.onnx"

# Generate the tokenizer ONNX model, and set the maximum token length.
# Ensure 'max_length' is set to a value less than the model's maximum sequence length, failing to do so will result in error during inference.
tokenizer_onnx_model = gen_processing_models(tokenizer, pre_kwargs={'max_length': 256})[0]

# Modify the tokenizer ONNX model signature.
# This is because certain tokenizers don't support batch inference.
tokenizer_onnx_model.graph.input[0].type.tensor_type.shape.dim[0].dim_value = 1

# Save the tokenizer ONNX model
with open(onnx_tokenizer_path, "wb") as f:
  f.write(tokenizer_onnx_model.SerializeToString())

Save the convert-tokenizer.py file.
Run the Python script to convert the tokenizer:
```
python convert-tokenizer.py
```

The converted tokenizer is exported to the all-MiniLM-L6-v2_quantized directory as tokenizer.onnx.

Upload the converted model files to Cloud Storage

After you have converted the transformer model and the tokenizer, do the following:

Create a Cloud Storage bucket to store the converted files.
Upload the converted transformer model and tokenizer files to your Cloud Storage bucket.

Create a dataset

Create a BigQuery dataset to store your ML model.

BigQuery DataFrames

Before trying this sample, follow the BigQuery DataFrames setup instructions in the BigQuery quickstart using BigQuery DataFrames. For more information, see the BigQuery DataFrames reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up ADC for a local development environment.

import google.cloud.bigquery

bqclient = google.cloud.bigquery.Client()
bqclient.create_dataset("bqml_tutorial", exists_ok=True)

Import the ONNX models into BigQuery

Import the converted tokenizer and the sentence transformer models as BigQuery ML models.

Select one of the following options:

Console

In the Google Cloud console, open BigQuery Studio.

Go to BigQuery Studio
In the query editor, run the following CREATE MODEL statement to create the tokenizer model.
```
 CREATE OR REPLACE MODEL `bqml_tutorial.tokenizer`
  OPTIONS (MODEL_TYPE='ONNX',
   MODEL_PATH='TOKENIZER_BUCKET_PATH')
```
Replace TOKENIZER_BUCKET_PATH with the path to the model that you uploaded to Cloud Storage. If you're using the sample model, replace TOKENIZER_BUCKET_PATH with the following value: gs://cloud-samples-data/bigquery/ml/onnx/all-MiniLM-L6-v2/tokenizer.onnx.

When the operation is complete, you see a message similar to the following: Successfully created model named tokenizer in the Query results pane.
Click Go to model to open the Details pane.
Review the Feature Columns section to see the model inputs and the Label Column to see model outputs.
In the query editor, run the following CREATE MODEL statement to create the all-MiniLM-L6-v2 model.
```
 CREATE OR REPLACE MODEL `bqml_tutorial.all-MiniLM-L6-v2`
  OPTIONS (MODEL_TYPE='ONNX',
   MODEL_PATH='TRANSFORMER_BUCKET_PATH')
```
Replace TRANSFORMER_BUCKET_PATH with the path to the model that you uploaded to Cloud Storage. If you're using the sample model, replace TRANSFORMER_BUCKET_PATH with the following value: gs://cloud-samples-data/bigquery/ml/onnx/all-MiniLM-L6-v2/model_quantized.onnx.

When the operation is complete, you see a message similar to the following: Successfully created model named all-MiniLM-L6-v2 in the Query results pane.
Click Go to model to open the Details pane.
Review the Feature Columns section to see the model inputs and the Label Column to see model outputs.

bq

Use the bq command-line tool query command to run the CREATE MODEL statement.

On the command line, run the following command to create the tokenizer model.
```
bq query --use_legacy_sql=false \
"CREATE OR REPLACE MODEL
`bqml_tutorial.tokenizer`
OPTIONS
(MODEL_TYPE='ONNX',
MODEL_PATH='TOKENIZER_BUCKET_PATH')"
```
Replace TOKENIZER_BUCKET_PATH with the path to the model that you uploaded to Cloud Storage. If you're using the sample model, replace TOKENIZER_BUCKET_PATH with the following value: gs://cloud-samples-data/bigquery/ml/onnx/all-MiniLM-L6-v2/tokenizer.onnx.

When the operation is complete, you see a message similar to the following: Successfully created model named tokenizer.
On the command line, run the following command to create the all-MiniLM-L6-v2 model.
```
bq query --use_legacy_sql=false \
"CREATE OR REPLACE MODEL
`bqml_tutorial.all-MiniLM-L6-v2`
OPTIONS
(MODEL_TYPE='ONNX',
  MODEL_PATH='TRANSFORMER_BUCKET_PATH')"
```
Replace TRANSFORMER_BUCKET_PATH with the path to the model that you uploaded to Cloud Storage. If you're using the sample model, replace TRANSFORMER_BUCKET_PATH with the following value: gs://cloud-samples-data/bigquery/ml/onnx/all-MiniLM-L6-v2/model_quantized.onnx.

When the operation is complete, you see a message similar to the following: Successfully created model named all-MiniLM-L6-v2.

After you import the models, verify that the models appear in the dataset.

bq ls -m bqml_tutorial

The output is similar to the following:

tableId            Type
------------------------
tokenizer          MODEL
all-MiniLM-L6-v2   MODEL

API

Use the jobs.insert method to import the models. Populate the query parameter of the QueryRequest resource in the request body with the CREATE MODEL statement.

Use the following query parameter value to create the tokenizer model.
```
{
"query": "CREATE MODEL `PROJECT_ID :bqml_tutorial.tokenizer` OPTIONS(MODEL_TYPE='ONNX' MODEL_PATH='TOKENIZER_BUCKET_PATH')"
}
```
Replace the following:
- PROJECT_ID with your project ID.
- TOKENIZER_BUCKET_PATH with the path to the model that you uploaded to Cloud Storage. If you're using the sample model, replace TOKENIZER_BUCKET_PATH with the following value: gs://cloud-samples-data/bigquery/ml/onnx/all-MiniLM-L6-v2/tokenizer.onnx.
Use the following query parameter value to create the all-MiniLM-L6-v2 model.
```
{
"query": "CREATE MODEL `PROJECT_ID :bqml_tutorial.all-MiniLM-L6-v2` OPTIONS(MODEL_TYPE='ONNX' MODEL_PATH='TRANSFORMER_BUCKET_PATH')"
}
```
Replace the following:
- PROJECT_ID with your project ID.
- TRANSFORMER_BUCKET_PATH with the path to the model that you uploaded to Cloud Storage. If you're using the sample model, replace TRANSFORMER_BUCKET_PATH with the following value: gs://cloud-samples-data/bigquery/ml/onnx/all-MiniLM-L6-v2/model_quantized.onnx.

BigQuery DataFrames

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up ADC for a local development environment.

Import the tokenizer and sentence transformer models by using the ONNXModel object.

import bigframes
from bigframes.ml.imported import ONNXModel

bigframes.options.bigquery.project = PROJECT_ID

bigframes.options.bigquery.location = "US"

tokenizer = ONNXModel(
  model_path= "TOKENIZER_BUCKET_PATH"
)
imported_onnx_model = ONNXModel(
  model_path="TRANSFORMER_BUCKET_PATH"
)

Replace the following:

PROJECT_ID with your project ID.
TOKENIZER_BUCKET_PATH with the path to the model that you uploaded to Cloud Storage. If you're using the sample model, replace TOKENIZER_BUCKET_PATH with the following value: gs://cloud-samples-data/bigquery/ml/onnx/all-MiniLM-L6-v2/tokenizer.onnx.
TRANSFORMER_BUCKET_PATH with the path to the model that you uploaded to Cloud Storage. If you're using the sample model, replace TRANSFORMER_BUCKET_PATH with the following value: gs://cloud-samples-data/bigquery/ml/onnx/all-MiniLM-L6-v2/model_quantized.onnx.

Generate embeddings with the imported ONNX models

Use the imported tokenizer and the sentence transformer models to generate embeddings based on data from the bigquery-public-data.imdb.reviews public dataset.

Select one of the following options:

Console

Use the ML.PREDICT function to generate embeddings with the models.

The query uses a nested ML.PREDICT call, to process raw text directly through the tokenizer and the embedding model, as follows:

Tokenization (inner query): the inner ML.PREDICT call uses the bqml_tutorial.tokenizer model. It takes the title column from the bigquery-public-data.imdb.reviews public dataset as its text input. The tokenizer model converts the raw text strings into the numerical token inputs that the main model requires, including the input_ids and attention_mask inputs.
Embedding generation (outer query): the outer ML.PREDICT call uses the bqml_tutorial.all-MiniLM-L6-v2 model. The query takes the input_ids and attention_mask columns from the inner query's output as its input.

The SELECT statement retrieves the sentence_embedding column, which is an array of FLOAT values that represent the text's semantic embedding.

In the Google Cloud console, open BigQuery Studio.

Go to BigQuery Studio

In the query editor, run the following query.

SELECT
sentence_embedding
FROM
ML.PREDICT (MODEL `bqml_tutorial.all-MiniLM-L6-v2`,
  (
  SELECT
    input_ids, attention_mask
  FROM
    ML.PREDICT(MODEL `bqml_tutorial.tokenizer`,
      (
      SELECT
        title AS text
      FROM
        `bigquery-public-data.imdb.reviews` limit 10))))

The result is similar to the following:

+-----------------------+
| sentence_embedding    |
+-----------------------+
| -0.02361682802438736  |
| 0.02025664784014225   |
| 0.005168713629245758  |
| -0.026361213997006416 |
| 0.0655381828546524    |
| ...                   |
+-----------------------+

bq

Use the bq command-line tool query command to run a query. The query uses the ML.PREDICT function to generate embeddings with the models.

The query uses a nested ML.PREDICT call, to process raw text directly through the tokenizer and the embedding model, as follows:

Tokenization (inner query): the inner ML.PREDICT call uses the bqml_tutorial.tokenizer model. It takes the title column from the bigquery-public-data.imdb.reviews public dataset as its text input. The tokenizer model converts the raw text strings into the numerical token inputs that the main model requires, including the input_ids and attention_mask inputs.
Embedding generation (outer query): the outer ML.PREDICT call uses the bqml_tutorial.all-MiniLM-L6-v2 model. The query takes the input_ids and attention_mask columns from the inner query's output as its input.

The SELECT statement retrieves the sentence_embedding column, which is an array of FLOAT values that represent the text's semantic embedding.

On the command line, run the following command to run the query.

bq query --use_legacy_sql=false \
'SELECT
sentence_embedding
FROM
ML.PREDICT (MODEL `bqml_tutorial.all-MiniLM-L6-v2`,
  (
  SELECT
    input_ids, attention_mask
  FROM
    ML.PREDICT(MODEL `bqml_tutorial.tokenizer`,
      (
      SELECT
        title AS text
      FROM
        `bigquery-public-data.imdb.reviews` limit 10))))'

The result is similar to the following:

+-----------------------+
| sentence_embedding    |
+-----------------------+
| -0.02361682802438736  |
| 0.02025664784014225   |
| 0.005168713629245758  |
| -0.026361213997006416 |
| 0.0655381828546524    |
| ...                   |
+-----------------------+

BigQuery DataFrames

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up ADC for a local development environment.

Use the predict method to generate embeddings using the ONNX models.

import bigframes.pandas as bpd

df = bpd.read_gbq("bigquery-public-data.imdb.reviews", max_results=10)
df_pred = df.rename(columns={"title": "text"})
tokens = tokenizer.predict(df_pred)
predictions = imported_onnx_model.predict(tokens)
predictions.peek(5)

The output is similar to the following:

Output from the transformer model.

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

Delete the project

Console

In the Google Cloud console, go to the Manage resources page.
Go to Manage resources
In the project list, select the project that you want to delete, and then click Delete.
In the dialog, type the project ID, and then click Shut down to delete the project.

gcloud

Delete a Google Cloud project:

gcloud projects delete PROJECT_ID

Delete individual resources

Alternatively, to remove the individual resources used in this tutorial, do the following:

Delete the imported models.
Optional: Delete the dataset.

What's next

Learn how to use text embeddings for semantic search and retrieval-augmented generation (RAG).
For more information about converting transformers models to ONNX, see Export a model to ONNX with optimum.exporters.onnx.
For more information about importing ONNX models, see The CREATE MODEL statement for ONNX models.
For more information about performing prediction, see The ML.PREDICT function.
For an overview of BigQuery ML, see Introduction to BigQuery ML.
To get started using BigQuery ML, see Create machine learning models in BigQuery ML.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-07-17 UTC.

Objectives

Costs

Before you begin

Required roles

Check for the roles

Grant the roles

Convert the transformer model files to ONNX

Export the transformer model to ONNX

Apply quantization to the transformer model

Convert the tokenizer to ONNX

Upload the converted model files to Cloud Storage

Create a dataset

Console

bq

API

BigQuery DataFrames

Import the ONNX models into BigQuery

Console

bq

API

BigQuery DataFrames

Generate embeddings with the imported ONNX models

Console

bq

BigQuery DataFrames

Clean up

Delete the project

Console

gcloud

Delete individual resources

What's next

Generate embeddings with transformer models in ONNX format Stay organized with collections Save and categorize content based on your preferences.

Objectives

Costs

Before you begin

Required roles

Check for the roles

Grant the roles

Convert the transformer model files to ONNX

Export the transformer model to ONNX

Apply quantization to the transformer model

Convert the tokenizer to ONNX

Upload the converted model files to Cloud Storage

Create a dataset

Console

bq

API

BigQuery DataFrames

Import the ONNX models into BigQuery

Console

bq

API

BigQuery DataFrames

Generate embeddings with the imported ONNX models

Console

bq

BigQuery DataFrames

Clean up

Delete the project

Console

gcloud

Delete individual resources

What's next

Generate embeddings with transformer models in ONNX format