Use models in Model Garden
Discover, test, tune, and deploy models by using Model Garden in the
Google Cloud console. You can also deploy Model Garden models by using the
Google Cloud CLI.
Send test prompts
In the Google Cloud console, go to the Model Garden page.
Find a supported model that you want to deploy, and click its model card.
Click Deploy to open the Deploy model pane.
In the Deploy model pane, specify details for your deployment.
Use or modify the generated model and endpoint names.
Select a location to create your model endpoint in.
Select a machine type to use for each node of your deployment.
To use a Compute Engine reservation, under the Deployment
settings section, select Advanced.
For the Reservation type field, select a reservation type. The
reservation must match your specified machine specs.
Automatically use created reservation: Gemini Enterprise Agent Platform
automatically selects an allowed reservation with matching properties.
If there's no capacity in the automatically selected reservation,
Gemini Enterprise Agent Platform uses the general Google Cloud resource pool.
Select specific reservations: Gemini Enterprise Agent Platform uses a specific
reservation. If there's no capacity for your selected reservation, an
error is thrown.
Don't use (default): Gemini Enterprise Agent Platform uses the general
Google Cloud resource pool. This value has the same effect as not
specifying a reservation.
List the models that you can deploy and record the model ID to deploy. You
can optionally list the supported Hugging Face models in
Model Garden and even filter them by model names. The output
doesn't include any tuned models.
import vertexai
from vertexai import model_garden

# TODO(developer): Update and un-comment below lines
# PROJECT_ID = "your-project-id"
vertexai.init(project=PROJECT_ID, location="us-central1")

# List deployable models, optionally list Hugging Face models only or filter by model name.
deployable_models = model_garden.list_deployable_models(list_hf_models=False, model_filter="gemma")
print(deployable_models)
# Example response:
# ['google/gemma2@gemma-2-27b','google/gemma2@gemma-2-27b-it', ...]
View the deployment specifications for a model by using the model ID from
the previous step. You can view the machine type, accelerator type, and
container image URI that Model Garden has verified for a particular
model.
import vertexai
from vertexai import model_garden

# TODO(developer): Update and un-comment below lines
# PROJECT_ID = "your-project-id"
# model = "google/gemma3@gemma-3-1b-it"
vertexai.init(project=PROJECT_ID, location="us-central1")

# For Hugging Face models, the format is the Hugging Face model name, as in
# "meta-llama/Llama-3.3-70B-Instruct".
# Go to https://console.cloud.google.com/vertex-ai/model-garden to find all deployable
# model names.
model = model_garden.OpenModel(model)
deploy_options = model.list_deploy_options()
print(deploy_options)
# Example response:
# [
#   dedicated_resources {
#     machine_spec {
#       machine_type: "g2-standard-12"
#       accelerator_type: NVIDIA_L4
#       accelerator_count: 1
#     }
#   }
#   container_spec {
#     ...
#   }
#   ...
# ]
Deploy a model to an endpoint. Model Garden uses the default
deployment configuration unless you specify additional arguments and values.
import vertexai
from vertexai import model_garden

# TODO(developer): Update and un-comment below lines
# PROJECT_ID = "your-project-id"
vertexai.init(project=PROJECT_ID, location="us-central1")

open_model = model_garden.OpenModel("google/gemma3@gemma-3-12b-it")
endpoint = open_model.deploy(
    machine_type="g2-standard-48",
    accelerator_type="NVIDIA_L4",
    accelerator_count=4,
    accept_eula=True,
)

# Optional. Run predictions on the deployed endpoint.
# endpoint.predict(instances=[{"prompt": "What is Generative AI?"}])
REST
List all deployable models and then get the ID of the model to deploy. You can
then deploy the model with its default configuration and endpoint. Or, you can
choose to customize your deployment, such as setting a specific machine type or
using a dedicated endpoint.
1. List models that you can deploy
Before using any of the request data,
make the following replacements:
PROJECT_ID: Your Google Cloud project ID.
QUERY_PARAMETERS: To list Model Garden
models, add the following query parameters
listAllVersions=True&filter=can_deploy(true). To list
Hugging Face models, set the filter to
alt=json&is_hf_wildcard(true)+AND+labels.VERIFIED_DEPLOYMENT_CONFIG%3DVERIFIED_DEPLOYMENT_SUCCEED&listAllVersions=True.
HTTP method and URL:
GET https://us-central1-aiplatform.googleapis.com/v1/publishers/*/models?QUERY_PARAMETERS
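As a rough illustration of how the list request above is assembled, the following Python sketch builds the URL and headers without sending anything. The function name `build_list_request` is hypothetical, and passing the project ID in the `x-goog-user-project` header is an assumption about where PROJECT_ID is used, because the URL itself has no project placeholder.

```python
# Endpoint and query string copied from the request description above.
LIST_MODELS_URL = "https://us-central1-aiplatform.googleapis.com/v1/publishers/*/models"
MODEL_GARDEN_QUERY = "listAllVersions=True&filter=can_deploy(true)"


def build_list_request(project_id: str, access_token: str) -> tuple[str, dict[str, str]]:
    """Return the URL and headers for the list call. Nothing is sent."""
    url = f"{LIST_MODELS_URL}?{MODEL_GARDEN_QUERY}"
    headers = {
        "Authorization": f"Bearer {access_token}",
        # Assumption: the project ID is supplied as the quota project header.
        "x-goog-user-project": project_id,
    }
    return url, headers


url, headers = build_list_request("my-project", "ACCESS_TOKEN_PLACEHOLDER")
print(url)
```

You could hand the resulting URL and headers to any HTTP client; the token value here is only a placeholder.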
To send your request, choose one of these options:
Deploy a model from Model Garden or a model from Hugging Face. You can
also customize the deployment by specifying additional JSON fields.
Deploy a model with its default configuration.
Before using any of the request data,
make the following replacements:
LOCATION: A region where the
model is deployed.
PROJECT_ID: Your Google Cloud project ID.
MODEL_ID: The ID of the model to
deploy, which you can get from listing all the deployable models. The ID
uses the following format:
publishers/PUBLISHER_NAME/models/MODEL_NAME@MODEL_VERSION.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy
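To make the placeholder substitution concrete, here is a minimal Python sketch that fills LOCATION and PROJECT_ID into the deploy URL pattern shown above. The helper name `build_deploy_url` and the sample values are hypothetical; the sketch builds only the URL and says nothing about the request body.

```python
def build_deploy_url(project_id: str, location: str) -> str:
    """Fill LOCATION and PROJECT_ID into the deploy URL pattern from the text."""
    if not project_id or not location:
        raise ValueError("project_id and location must be non-empty")
    return (
        f"https://{location}-aiplatform.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}:deploy"
    )


deploy_url = build_deploy_url("my-project", "us-central1")
print(deploy_url)
```

Note that LOCATION appears twice in the pattern, once in the host name and once in the resource path, so both must use the same region.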
To send your request, choose one of these options:
curl
Save the request body in a file named request.json.
Run the following command in the terminal to create or overwrite
this file in the current directory:
Before using any of the request data,
make the following replacements:
LOCATION: A region where the
model is deployed.
PROJECT_ID: Your Google Cloud project ID.
MODEL_ID: The ID of the Hugging Face
model to deploy, which you can get from listing all the deployable
models. The ID uses the following format:
PUBLISHER_NAME/MODEL_NAME.
ACCESS_TOKEN: If the model is
gated, provide an access
token.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy
To send your request, choose one of these options:
curl
Save the request body in a file named request.json.
Run the following command in the terminal to create or overwrite
this file in the current directory:
Before using any of the request data,
make the following replacements:
LOCATION: A region where the
model is deployed.
PROJECT_ID: Your Google Cloud project ID.
MODEL_ID: The ID of the model to
deploy, which you can get from listing all the deployable models. The ID
uses the following format:
publishers/PUBLISHER_NAME/models/MODEL_NAME@MODEL_VERSION, such as
google/gemma@gemma-2b or
stabilityai/stable-diffusion-xl-base-1.0.
MACHINE_TYPE: Defines the set
of resources to deploy for your model, such as g2-standard-4.
ACCELERATOR_TYPE:
Specifies accelerators to add to your deployment to help improve performance
when working with intensive workloads, such as NVIDIA_L4.
ACCELERATOR_COUNT: The
number of accelerators to use in your deployment.
reservation_affinity_type: To use an existing
Compute Engine reservation for your deployment, specify any
reservation or a specific one. If you specify this value, don't specify
spot.
spot: Whether to use spot VMs for your deployment.
IMAGE_URI: The location of the
container image to use, such as
us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/pytorch-vllm-serve:20241016_0916_RC00_maas
CONTAINER_ARGS: Arguments
to pass to the container during the deployment.
CONTAINER_PORT: A port
number for your container.
fast_tryout_enabled: When testing a model, you can choose to
use a faster deployment. This option is available only for highly used
models with certain machine types. If enabled, you can't specify model or
deployment configurations.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy
To send your request, choose one of these options:
curl
Save the request body in a file named request.json.
Run the following command in the terminal to create or overwrite
this file in the current directory:
Before you begin, specify a quota project to run the following commands. The
commands you run are counted against the quotas for that project. For more
information, see Set the quota project.
List the models that you can deploy by running the gcloud ai model-garden
models list command. This command lists all model IDs and shows which ones
you can self-deploy.
gcloud ai model-garden models list
In the output, find the model ID to deploy. The following example shows an
abbreviated output.
MODEL_ID CAN_DEPLOY CAN_PREDICT
google/gemma2@gemma-2-27b Yes No
google/gemma2@gemma-2-27b-it Yes No
google/gemma2@gemma-2-2b Yes No
google/gemma2@gemma-2-2b-it Yes No
google/gemma2@gemma-2-9b Yes No
google/gemma2@gemma-2-9b-it Yes No
google/gemma3@gemma-3-12b-it Yes No
google/gemma3@gemma-3-12b-pt Yes No
google/gemma3@gemma-3-1b-it Yes No
google/gemma3@gemma-3-1b-pt Yes No
google/gemma3@gemma-3-27b-it Yes No
google/gemma3@gemma-3-27b-pt Yes No
google/gemma3@gemma-3-4b-it Yes No
google/gemma3@gemma-3-4b-pt Yes No
google/gemma3n@gemma-3n-e2b Yes No
google/gemma3n@gemma-3n-e2b-it Yes No
google/gemma3n@gemma-3n-e4b Yes No
google/gemma3n@gemma-3n-e4b-it Yes No
google/gemma@gemma-1.1-2b-it Yes No
google/gemma@gemma-1.1-2b-it-gg-hf Yes No
google/gemma@gemma-1.1-7b-it Yes No
google/gemma@gemma-1.1-7b-it-gg-hf Yes No
google/gemma@gemma-2b Yes No
google/gemma@gemma-2b-gg-hf Yes No
google/gemma@gemma-2b-it Yes No
google/gemma@gemma-2b-it-gg-hf Yes No
google/gemma@gemma-7b Yes No
google/gemma@gemma-7b-gg-hf Yes No
google/gemma@gemma-7b-it Yes No
google/gemma@gemma-7b-it-gg-hf Yes No
The output doesn't include any tuned models or Hugging Face models. To view
which Hugging Face models are supported, add the
--can-deploy-hugging-face-models flag.
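If you capture the listing in a script, a small parser can pull out the deployable model IDs. The following Python sketch assumes the output is whitespace-separated with the three columns shown in the abbreviated example; the function name `deployable_model_ids` is hypothetical, and the real output format may differ.

```python
def deployable_model_ids(listing: str) -> list[str]:
    """Return the MODEL_ID of every row whose CAN_DEPLOY column is 'Yes'."""
    ids = []
    for line in listing.splitlines():
        columns = line.split()
        # Skip blank lines, the header row, and anything that is not a 3-column row.
        if len(columns) != 3 or columns[0] == "MODEL_ID":
            continue
        model_id, can_deploy, _can_predict = columns
        if can_deploy == "Yes":
            ids.append(model_id)
    return ids


# Two rows taken from the example output, plus one made-up non-deployable row.
sample = """\
MODEL_ID CAN_DEPLOY CAN_PREDICT
google/gemma2@gemma-2-27b Yes No
google/gemma3@gemma-3-12b-it Yes No
example/not-deployable@v1 No Yes
"""
print(deployable_model_ids(sample))
```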
To view the deployment specifications for a model, run the gcloud ai
model-garden models list-deployment-config command. You can view the
machine type, accelerator type, and container image URI that
Model Garden supports for a particular model.
Replace MODEL_ID with the model ID from the previous list
command, such as google/gemma@gemma-2b or
stabilityai/stable-diffusion-xl-base-1.0.
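The two example IDs follow slightly different shapes: one carries a version after `@`, the other does not. As a sketch of how a script might tell them apart, the following Python splits an ID into its parts. The `ModelId` type and `parse_model_id` function are hypothetical helpers, not part of any Google Cloud library.

```python
from typing import NamedTuple, Optional


class ModelId(NamedTuple):
    publisher: str
    name: str
    version: Optional[str]  # None when the ID has no "@VERSION" suffix


def parse_model_id(model_id: str) -> ModelId:
    """Split 'PUBLISHER/NAME@VERSION' or 'PUBLISHER/NAME' into its parts."""
    publisher, slash, rest = model_id.partition("/")
    if not slash or not publisher or not rest:
        raise ValueError(f"expected PUBLISHER/NAME[@VERSION], got {model_id!r}")
    name, _, version = rest.partition("@")
    if not name:
        raise ValueError(f"missing model name in {model_id!r}")
    return ModelId(publisher, name, version or None)


print(parse_model_id("google/gemma@gemma-2b"))
print(parse_model_id("stabilityai/stable-diffusion-xl-base-1.0"))
```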
Deploy a model to an endpoint by running the gcloud ai model-garden models
deploy command. Model Garden generates a display name for your
endpoint and uses the default deployment configuration unless you specify
additional arguments and values.
To run the command asynchronously, include the --asynchronous flag.
MODEL_ID: The model ID from the previous list command. For
Hugging Face models, use the Hugging Face model URL format, such as
stabilityai/stable-diffusion-xl-base-1.0.
MACHINE_TYPE: Defines the set of resources to deploy for your
model, such as g2-standard-4.
ACCELERATOR_TYPE: Specifies accelerators to add to your
deployment to help improve performance when working with intensive
workloads, such as NVIDIA_L4.
ENDPOINT_NAME: A name for the deployed Gemini Enterprise Agent Platform
endpoint.
HF_ACCESS_TOKEN: For Hugging Face models, if the model is
gated, provide an access token.
RESERVATION_RESOURCE_NAME: To use a specific
Compute Engine reservation, specify the name of your
reservation. If you specify a specific reservation, you can't specify
any-reservation.
The output includes the deployment configuration that Model Garden
used, the endpoint ID, and the deployment operation ID, which you can use to
check the deployment status.
Using the default deployment configuration:
Machine type: g2-standard-12
Accelerator type: NVIDIA_L4
Accelerator count: 1
The project has enough quota. The current usage of quota for accelerator type NVIDIA_L4 in region us-central1 is 0 out of 28.
Deploying the model to the endpoint. To check the deployment status, you can try one of the following methods:
1) Look for endpoint `ENDPOINT_DISPLAY_NAME` at the [Agent Platform] -> [Online prediction] tab in Cloud Console
2) Use `gcloud ai operations describe OPERATION_ID --region=LOCATION` to find the status of the deployment long-running operation
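The second method in the output above can be scripted. This Python sketch only assembles the argument list for the `gcloud ai operations describe` command quoted in that output; the helper name `describe_operation_command` and the sample operation ID are hypothetical, and the sketch doesn't run the command.

```python
import shlex


def describe_operation_command(operation_id: str, location: str) -> list[str]:
    """Build the argv for the status check quoted in the deploy output."""
    return [
        "gcloud", "ai", "operations", "describe",
        operation_id,
        f"--region={location}",
    ]


cmd = describe_operation_command("1234567890", "us-central1")
print(shlex.join(cmd))
```

Keeping the command as a list, rather than a single string, lets you pass it to `subprocess.run(cmd, check=True)` without shell quoting concerns.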
To see details about your deployment, run the gcloud ai endpoints list
--list-model-garden-endpoints-only command:
Replace LOCATION_ID with the region where you deployed the model.
The output includes all endpoints that were created from
Model Garden and includes information such as the endpoint ID,
endpoint name, and whether the endpoint is associated with a deployed model.
To find your deployment, look for the endpoint name that was returned from
the previous command.
After you apply the configuration, Terraform provisions a new
Agent Platform endpoint and deploys the specified open model.
Clean Up
To delete the endpoint and model deployment, run the following command:
terraform destroy
Deploy a partner model and make prediction requests
In the Google Cloud console, go to the Model Garden page and use
the Model collections filter to view the Self-deploy partner models.
Choose from the list of self-deploy partner models, and purchase the model by
clicking Enable.
You must deploy on the partner's required machine types, as described in the
"Recommended hardware configuration" section on their Model Garden
model card. When deployed, the model serving resources are located in a secure
Google-managed project.
You can deploy models from Model Garden to a Private Service Connect
(PSC) endpoint to create a secure and private connection to your model. When
deployed with a PSC Network Endpoint Group, this setup can also be integrated
with an internal or external regional Application Load Balancer. Follow these
steps to configure a PSC endpoint for your model.
In the Google Cloud console, go to the Model Garden page.
Click Deploy model.
In the Deploy model pane, the predefined deployment settings are based on a
public dedicated endpoint.
Select the edit settings option, which enables further deployment options,
including private access.
Configure your deployment settings.
Select a location to create your model endpoint in.
Accept or modify the generated model and endpoint name.
Selecting a machine type for your deployment is optional, as the
recommended configuration has been pre-selected for you.
For the Reservation type field, select a reservation type. The
reservation must match your specified machine specs.
Automatically use created reservation: Gemini Enterprise Agent Platform selects an
available reservation with matching properties. If no capacity is
available in the selected reservation, Gemini Enterprise Agent Platform uses the
general Google Cloud resource pool.
Select specific reservations: Gemini Enterprise Agent Platform uses a specific
reservation. If no capacity is available in your selected reservation,
the deployment fails.
No reservation (default): Gemini Enterprise Agent Platform uses the general
Google Cloud resource pool.
Configure your availability policies.
Standard: Ideal for most workloads.
Spot: Ideal for fault-tolerant workloads.
Flex Start: Uses Dynamic Workload Scheduling (DWS) to manage and
prioritize resource allocation requests.
Configure endpoint Access for private networking.
Select Private (Private Service Connect)
Select Project IDs. To grant access to other projects, enter their
Project IDs here. If you leave this blank, the endpoint can only be
accessed from within the current project.
Click Deploy
To view your deployment, go to the Model Garden page and select
View my endpoints and models. Your deployment is listed in the My Endpoints
section. Make sure that you have selected the correct region so that your
endpoint is visible. Select the endpoint. Its status shows as
Deploying and changes to Ready when the deployment completes.
Get the endpoint ID, then open Cloud Shell and run the following command to
get the service attachment URI:
user@cloudshell:$ gcloud ai endpoints describe 2124795225560842240 --region=europe-west4 | grep -i serviceAttachment:
Using endpoint [https://europe-west4-aiplatform.googleapis.com/]
serviceAttachment: projects/o9457b320a852208e-tp/regions/europe-west4/serviceAttachments/gkedpm-52065579567eaf39bfe24f25f7981d
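If you want to extract the service attachment in a script instead of using grep, a short Python sketch like the following could work. It assumes the describe output contains a `serviceAttachment:` line shaped like the one above; the function name `find_service_attachment` is hypothetical.

```python
import re
from typing import Optional

# Matches a line such as "serviceAttachment: projects/.../serviceAttachments/NAME".
_SERVICE_ATTACHMENT = re.compile(
    r"^\s*serviceAttachment:\s*(\S+)\s*$", re.IGNORECASE | re.MULTILINE
)


def find_service_attachment(describe_output: str) -> Optional[str]:
    """Return the first service attachment URI found, or None if there is none."""
    match = _SERVICE_ATTACHMENT.search(describe_output)
    return match.group(1) if match else None


# Sample text copied from the Cloud Shell output above.
sample_output = """\
Using endpoint [https://europe-west4-aiplatform.googleapis.com/]
      serviceAttachment: projects/o9457b320a852208e-tp/regions/europe-west4/serviceAttachments/gkedpm-52065579567eaf39bfe24f25f7981d
"""
print(find_service_attachment(sample_output))
```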
After you get the service attachment, you have the following options to access
the model:
Deploy a PSC endpoint in the same VPC as the granted project. This approach
allows for reachability over hybrid networking and within the same VPC. It's
important to note that PSC endpoints are not reachable over VPC peering.
Deploy a Private Service Connect (PSC) Network Endpoint Group (NEG) within
the same VPC as the allowed project. This lets you expose
the model through an internal or external load balancer, which provides
several benefits:
Access over VPC peering: The load balancer can be accessed across peered
VPC networks.
Security features: You get support for Cloud Armor and Model Armor to help
protect your endpoint.
Traffic management: It enables advanced traffic routing, such as host and
path rewriting.
Centralized access: A single Application Load Balancer can be used to
route traffic to the model using path rules.
View or manage an endpoint
To view and manage your endpoint, go to the Agent Platform
Online prediction page.
Agent Platform lists all endpoints in your project for a particular
region. Click an endpoint to view its details such as which models are deployed
to the endpoint.
Undeploy models and delete resources
To stop a deployed model from using resources in your project, undeploy your
model from its endpoint. You must undeploy a model before you can delete the
endpoint and the model.
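The ordering constraint above (undeploy first, then delete the endpoint and the model) can be captured in a small helper. The following Python sketch returns the gcloud commands in that order without running them. The function name `cleanup_commands` and the sample IDs are hypothetical, and the exact flag spellings are an assumption about the gcloud CLI; check `gcloud ai endpoints --help` and `gcloud ai models --help` before relying on them.

```python
def cleanup_commands(
    location: str, endpoint_id: str, deployed_model_id: str, model_id: str
) -> list[list[str]]:
    """Return gcloud argv lists in the order the cleanup must happen."""
    region = f"--region={location}"
    return [
        # 1. Undeploy the model so that it stops using resources.
        ["gcloud", "ai", "endpoints", "undeploy-model", endpoint_id, region,
         f"--deployed-model-id={deployed_model_id}"],
        # 2. Delete the endpoint, which is now empty.
        ["gcloud", "ai", "endpoints", "delete", endpoint_id, region],
        # 3. Delete the model itself.
        ["gcloud", "ai", "models", "delete", model_id, region],
    ]


commands = cleanup_commands("us-central1", "1234", "5678", "9012")
for argv in commands:
    print(" ".join(argv))
```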
Undeploy models
Undeploy a model from its endpoint.
Console
In the Google Cloud console, go to the Endpoints tab on the Online
prediction page.
In the following code, replace:
PROJECT_ID with your project ID
LOCATION with your region, for example, "us-central1"
ENDPOINT_ID with your endpoint ID
from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=LOCATION)

# To find out which endpoints are available, un-comment the line below:
# endpoints = aiplatform.Endpoint.list()

endpoint = aiplatform.Endpoint(ENDPOINT_ID)
endpoint.undeploy_all()
gcloud
In these commands, replace:
PROJECT_ID with your project name
LOCATION_ID with the region where you deployed the model and
endpoint
ENDPOINT_ID with the endpoint ID
MODEL_ID with the model ID from the list model command
DEPLOYED_MODEL_ID with the deployed model ID
Find the endpoint ID that is associated with your deployment by running the
gcloud ai endpoints list command.
Run the gcloud ai endpoints undeploy-model command to undeploy
the model from the endpoint by using the endpoint ID and the deployed model
ID from the previous commands.
In the following code, replace:
PROJECT_ID with your project ID
LOCATION with your region, for example, "us-central1"
ENDPOINT_ID with your endpoint ID
from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=LOCATION)

# To find out which endpoints are available, un-comment the line below:
# endpoints = aiplatform.Endpoint.list()

endpoint = aiplatform.Endpoint(ENDPOINT_ID)
endpoint.delete()
gcloud
In these commands, replace:
PROJECT_ID with your project name
LOCATION_ID with the region where you deployed the model and
endpoint
ENDPOINT_ID with the endpoint ID
Get the endpoint ID to delete by running the
gcloud ai endpoints list command. This command lists the
endpoint IDs for all endpoints in your project.
In the following code, replace:
PROJECT_ID with your project ID
LOCATION with your region, for example, "us-central1"
MODEL_ID with your model ID
from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=LOCATION)

# To find out which models are available in Model Registry, un-comment the line below:
# models = aiplatform.Model.list()

model = aiplatform.Model(MODEL_ID)
model.delete()
gcloud
In these commands, replace:
PROJECT_ID with your project name
LOCATION_ID with the region where you deployed the model and
endpoint
MODEL_ID with the model ID from the list model command
Last updated 2026-06-10 UTC.