Mistral AI models on Gemini Enterprise Agent Platform are available as fully managed, serverless APIs. To use a Mistral AI model on Agent Platform, send a request directly to the Agent Platform API endpoint. Because Mistral AI models use a managed API, you don't need to provision or manage infrastructure.

You can stream responses to reduce the latency that end users perceive. A streamed response uses server-sent events (SSE) to return the response incrementally.
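The following Bash sketch shows roughly how a client might consume such a stream. It reads SSE text on standard input and prints the payload of each data: line. It rests on two assumptions that this page does not confirm:

- Each event's JSON payload fits on a single data: line.
- The stream may end with a data: [DONE] sentinel. This is common in chat-completion APIs, but it is an assumption here.

The function name sse_data is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the payload of each SSE "data:" line read from stdin.
# Assumptions (not confirmed by this page): each event's JSON payload sits on
# a single "data:" line, and the stream may end with "data: [DONE]".
sse_data() {
  local line payload
  while IFS= read -r line || [[ -n $line ]]; do
    line=${line%$'\r'}                 # tolerate CRLF line endings
    [[ $line == data:* ]] || continue  # skip blank lines, comments, other fields
    payload=${line#data:}
    payload=${payload# }               # SSE drops one optional leading space
    [[ $payload == '[DONE]' ]] && break
    printf '%s\n' "$payload"
  done
}
```

For example, you could pipe the streaming curl command shown later on this page into sse_data. Add curl's -N (--no-buffer) option so that events are printed as they arrive instead of being buffered.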
You pay for Mistral AI models as you use them (pay as you go). For pay-as-you-go pricing, see Mistral AI model pricing on the Gemini Enterprise Agent Platform pricing page.

Available Mistral AI models

The following models are available from Mistral AI to use in Gemini Enterprise Agent Platform. To access a Mistral AI model, go to its Model Garden model card.

Mistral Medium 3

Mistral Medium 3 is a versatile model designed for a wide range of tasks, including programming, mathematical reasoning, understanding long documents, summarization, and dialogue. It excels at complex tasks that require advanced reasoning, visual understanding, or a high level of specialization (for example, creative writing, agentic workflows, and code generation).

It has multimodal capabilities, so it can process visual inputs. It supports dozens of languages, including over 80 programming languages, and it also supports function calling and agentic workflows.

Mistral Medium 3 is optimized for single-node inference, particularly for long-context applications. Its size allows it to achieve high throughput on a single node.

Go to the Mistral Medium 3 model card

Mistral OCR (25.05)

Mistral OCR (25.05) is an optical character recognition (OCR) API for document understanding. It excels at understanding complex document elements, including:

- interleaved imagery
- mathematical expressions
- tables
- advanced layouts such as LaTeX formatting

The model enables deeper understanding of rich documents, such as scientific papers with charts, graphs, equations, and figures.

Mistral OCR (25.05) is well suited for use with a RAG system that takes multimodal documents (such as slides or complex PDFs) as input. You can pair Mistral OCR (25.05) with other Mistral models to reformat the results. This combination keeps the extracted content accurate and presents it in a structured, coherent form suitable for downstream applications and analyses.

Go to the Mistral OCR (25.05) model card

Mistral Small 3.1 (25.03)

Mistral Small 3.1 (25.03) has multimodal capabilities and a context length of up to 128,000 tokens. It can process and understand visual inputs and long documents, which expands its range of applications compared to the previous Mistral AI Small model.

Mistral Small 3.1 (25.03) is a versatile model designed for tasks such as programming, mathematical reasoning, document understanding, and dialogue. It is designed for low-latency applications, to deliver best-in-class efficiency compared to models of the same quality.

Mistral Small 3.1 (25.03) has undergone a full post-training process to align the model with human preferences and needs. This makes it usable out of the box for applications that require chat or precise instruction following.

Go to the Mistral Small 3.1 (25.03) model card

Codestral 2

Codestral 2 is Mistral's code generation model, built specifically for high-precision fill-in-the-middle (FIM) completion. It helps developers write and interact with code through a shared instruction and completion API endpoint. Because it is proficient in code and can also converse in a variety of languages, you can use it to build advanced AI applications for software developers.

The latest release of Codestral 2 delivers measurable upgrades over the prior version, Codestral (25.01), including improved performance on academic benchmarks for short- and long-context FIM completion.

Go to the Codestral 2 model card

Use Mistral AI models

You can use curl commands to send requests to the Gemini Enterprise Agent Platform endpoint using the following model names:

- Mistral Medium 3: mistral-medium-3
- Mistral OCR (25.05): mistral-ocr-2505
- Mistral Small 3.1 (25.03): mistral-small-2503
- Codestral 2: codestral-2

For more information about using the Mistral AI SDK, see the Mistral AI Gemini Enterprise Agent Platform documentation.

For Mistral AI models, a quota applies for each region where the model is available. The quota is specified in queries per minute (QPM) and tokens per minute (TPM). TPM includes both input and output tokens.

Before you begin

To use Mistral AI models with Gemini Enterprise Agent Platform, you must perform the following steps. The Agent Platform API (aiplatform.googleapis.com) must be enabled to use Gemini Enterprise Agent Platform. If you already have an existing project with the Agent Platform API enabled, you can use that project instead of creating a new project.

1. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
   Roles required to select or create a project: To create a project, you need the Project Creator IAM role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
2. Verify that billing is enabled for your Google Cloud project.
3. Enable the Gemini Enterprise Agent Platform API.
   Roles required to enable APIs: To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

After you set up your environment, you can use REST to test a text prompt. The following samples send requests to the publisher model endpoint, first as a streaming call and then as a unary call.
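Make a streaming call to a Mistral AI model

The following sample uses REST to make a streaming call to a Mistral AI model. It sends the request to the publisher model endpoint.

Before using any of the request data, make the following replacements:

- LOCATION: A region that supports Mistral AI models, such as us-central1 or europe-west4.
- PROJECT_ID: Your Google Cloud project ID.
- MODEL: The model name you want to use. In the request body, exclude the @ model version number.
- ROLE: The role associated with a message. You can specify a user or an assistant. The first message must use the user role. The models operate with alternating user and assistant turns. If the final message uses the assistant role, then the response content continues immediately from the content in that message. You can use this to constrain part of the model's response.
- stream: A boolean field in the request body. Set it to true to stream the response and false to return the response all at once. This sample sets it to true.
- CONTENT: The content, such as text, of the user or assistant message.
- MAX_TOKENS: The maximum number of tokens that can be generated in the response. Specify a lower value for shorter responses and a higher value for potentially longer responses.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/mistralai/models/MODEL:streamRawPredict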
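You can check the ROLE rules above before sending a request. The following Bash function is a hypothetical helper, not part of any Google Cloud or Mistral AI tool. It takes the roles of the messages in order and fails in either of these cases:

- The first role isn't user.
- Two consecutive messages share a role.

Ending on an assistant turn is accepted, because the model continues from that message.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any Google or Mistral tool): check a list of
# message roles against the ROLE rules. The first role must be "user", roles
# must alternate, and a final "assistant" turn is allowed (the model continues
# from it).
check_roles() {
  local expected=user role
  if (( $# == 0 )); then
    echo "check_roles: no messages" >&2
    return 1
  fi
  for role in "$@"; do
    if [[ $role != "$expected" ]]; then
      echo "check_roles: expected '$expected' but got '$role'" >&2
      return 1
    fi
    # The next turn must use the other role.
    if [[ $expected == user ]]; then expected=assistant; else expected=user; fi
  done
  return 0
}
```

For example, check_roles user assistant user succeeds. check_roles assistant user fails, because the first message must use the user role.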
Request JSON body:

{
  "model": "MODEL",
  "messages": [
    {
      "role": "ROLE",
      "content": "CONTENT"
    }],
  "max_tokens": MAX_TOKENS,
  "stream": true
}
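Rather than editing request.json by hand, you could generate it. The following Bash sketch prints a body with the same shape as the sample above. Note these caveats:

- Both function names are hypothetical.
- The JSON escaping covers only quotes, backslashes, newlines, carriage returns, and tabs.
- The version suffix in the test value (mistral-small-2503@001) is made up for illustration.

The helper strips anything after @ because the request body must not include the model version number.

```shell
#!/usr/bin/env bash
# Sketch: escape a string for use inside a JSON string literal. Only quotes,
# backslashes, newlines, carriage returns, and tabs are handled; other control
# characters are assumed not to occur in the input.
json_escape() {
  local s="$1" out='' c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      '"')   out+='\"' ;;
      '\')   out+='\\' ;;
      $'\n') out+='\n' ;;
      $'\r') out+='\r' ;;
      $'\t') out+='\t' ;;
      *)     out+=$c ;;
    esac
  done
  printf '%s' "$out"
}

# Hypothetical helper: make_request MODEL ROLE CONTENT MAX_TOKENS STREAM
# prints a request body shaped like the sample above. Any "@version" suffix
# is removed from MODEL because the request body must not include it.
make_request() {
  local model="${1%@*}" role="$2" content="$3" max_tokens="$4" stream="$5"
  printf '{\n  "model": "%s",\n  "messages": [\n    {\n      "role": "%s",\n      "content": "%s"\n    }],\n  "max_tokens": %d,\n  "stream": %s\n}\n' \
    "$(json_escape "$model")" "$role" "$(json_escape "$content")" \
    "$max_tokens" "$stream"
}
```

For example, make_request mistral-small-2503 user 'Hello' 256 true > request.json writes a streaming request body that the curl command below can send.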
To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/mistralai/models/MODEL:streamRawPredict"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
  -Method POST `
  -Headers $headers `
  -ContentType: "application/json; charset=utf-8" `
  -InFile request.json `
  -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/mistralai/models/MODEL:streamRawPredict" | Select-Object -Expand Content

You should receive a streamed response made up of server-sent events (SSE).

Make a unary call to a Mistral AI model
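The following sample uses REST to make a unary call to a Mistral AI model. It sends the request to the publisher model endpoint.

Before using any of the request data, make the following replacements:

- LOCATION: A region that supports Mistral AI models, such as us-central1 or europe-west4.
- PROJECT_ID: Your Google Cloud project ID.
- MODEL: The model name you want to use. In the request body, exclude the @ model version number.
- ROLE: The role associated with a message. You can specify a user or an assistant. The first message must use the user role. The models operate with alternating user and assistant turns. If the final message uses the assistant role, then the response content continues immediately from the content in that message. You can use this to constrain part of the model's response.
- stream: A boolean field in the request body. Set it to true to stream the response and false to return the response all at once. This sample sets it to false.
- CONTENT: The content, such as text, of the user or assistant message.
- MAX_TOKENS: The maximum number of tokens that can be generated in the response. Specify a lower value for shorter responses and a higher value for potentially longer responses.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/mistralai/models/MODEL:rawPredict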
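This request differs from the streaming sample in only two places: the method at the end of the URL (rawPredict instead of streamRawPredict) and the value of stream in the body. The following hypothetical Bash helper builds either URL from the same placeholders. It is a convenience sketch, not an official tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the publisher model endpoint URL used in the
# samples. Streaming calls use the :streamRawPredict method; unary calls use
# :rawPredict. The fourth argument (stream) defaults to false.
endpoint_url() {
  local location="$1" project_id="$2" model="$3" stream="${4:-false}" method
  if [[ $stream == true ]]; then
    method=streamRawPredict
  else
    method=rawPredict
  fi
  printf 'https://%s-aiplatform.googleapis.com/v1/projects/%s/locations/%s/publishers/mistralai/models/%s:%s\n' \
    "$location" "$project_id" "$location" "$model" "$method"
}
```

For example, endpoint_url us-central1 PROJECT_ID mistral-small-2503 false prints the rawPredict URL for us-central1. You could then pass that URL as the last argument to curl.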
Request JSON body:

{
  "model": "MODEL",
  "messages": [
    {
      "role": "ROLE",
      "content": "CONTENT"
    }],
  "max_tokens": MAX_TOKENS,
  "stream": false
}
To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/mistralai/models/MODEL:rawPredict"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
  -Method POST `
  -Headers $headers `
  -ContentType: "application/json; charset=utf-8" `
  -InFile request.json `
  -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/mistralai/models/MODEL:rawPredict" | Select-Object -Expand Content

You should receive a JSON response that contains the complete model output.

Mistral AI model region availability and quotas
| Model | Region | Quotas | Context length |
|---|---|---|---|
| Mistral Medium 3 | us-central1 | | 128,000 tokens |
| Mistral Medium 3 | europe-west4 | | 128,000 tokens |
| Mistral OCR (25.05) | us-central1 | | 30 pages |
| Mistral OCR (25.05) | europe-west4 | | 30 pages |
| Mistral Small 3.1 (25.03) | us-central1 | | 128,000 tokens |
| Mistral Small 3.1 (25.03) | europe-west4 | | 128,000 tokens |
| Codestral 2 | us-central1 | | 128,000 tokens |
| Codestral 2 | europe-west4 | | 128,000 tokens |
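TPM counts both input and output tokens, so you can estimate a per-minute request budget from the token cost of each request. The following Bash sketch does that arithmetic under these assumptions:

- The quota values are hypothetical, because this page does not list the actual numbers.
- Each request is budgeted at its worst case: input tokens plus max_tokens.
- A model's context length bounds input and output tokens together.

```shell
#!/usr/bin/env bash
# Worked example with HYPOTHETICAL numbers: actual quota values are not listed
# on this page. Because TPM counts input plus output tokens, each request is
# budgeted at its worst case, input_tokens + max_tokens.
max_requests_per_minute() {
  local qpm="$1" tpm="$2" input_tokens="$3" max_tokens="$4"
  local per_request=$(( input_tokens + max_tokens ))
  local by_tokens=$(( tpm / per_request ))
  # The effective rate is capped by whichever quota is reached first.
  echo $(( by_tokens < qpm ? by_tokens : qpm ))
}

# Assumption: the context length bounds input plus output tokens together.
fits_context() {
  local context_length="$1" input_tokens="$2" max_tokens="$3"
  (( input_tokens + max_tokens <= context_length ))
}

# Hypothetical quotas of 60 QPM and 100,000 TPM: requests with 3,000 input
# tokens and max_tokens=1000 cost up to 4,000 tokens each, so TPM allows 25
# requests per minute, which is below the QPM limit.
max_requests_per_minute 60 100000 3000 1000   # prints 25
```

With a 128,000-token context length, fits_context 128000 120000 8000 succeeds, and fits_context 128000 120000 8001 fails.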
If you want to increase any of your quotas for Generative AI on Gemini Enterprise Agent Platform, you can use the Google Cloud console to request a quota increase. To learn more about quotas, see the Cloud Quotas overview.