Use models in Model Garden

Discover, test, tune, and deploy models by using Model Garden in the Google Cloud console. You can also deploy Model Garden models by using the Google Cloud CLI.

Send test prompts

In the Google Cloud console, go to the Model Garden page.

Go to Model Garden
Find a supported model that you want to test and click View details.
Click Open prompt design.

You're taken to the Prompt design page.
In Prompt, enter the prompt that you want to test.
Optional: Configure the model parameters.
Click Submit.

Tune a model

In the Google Cloud console, go to the Model Garden page.

Go to Model Garden
In Search models, enter Gemma 3, then click the magnifying glass to search.
Click View details on the Gemma 3 model card.
Click Open fine-tuning pipeline.

You're taken to the Agent Platform Pipelines page.
To start tuning, click Create run.

Tune in a notebook

The model cards for most open source foundation models and fine-tunable models support tuning in a notebook.

In the Google Cloud console, go to the Model Garden page.

Go to Model Garden
Find a supported model that you want to tune and go to its model card.
Click Open notebook.

Deploy an open model

You can deploy a model by using its model card in the Google Cloud console or programmatically.

For more information about setting up the Google Gen AI SDK or Google Cloud CLI, see the Google Gen AI SDK overview or Install the Google Cloud CLI.

Console

REST

List all deployable models and then get the ID of the model to deploy. You can then deploy the model with its default configuration and endpoint. Or, you can choose to customize your deployment, such as setting a specific machine type or using a dedicated endpoint.

1. List models that you can deploy

Before using any of the request data, make the following replacements:

PROJECT_ID: Your Google Cloud project ID.
QUERY_PARAMETERS: To list Model Garden models, add the following query parameters listAllVersions=True&filter=can_deploy(true). To list Hugging Face models, set the filter to alt=json&is_hf_wildcard(true)+AND+labels.VERIFIED_DEPLOYMENT_CONFIG%3DVERIFIED_DEPLOYMENT_SUCCEED&listAllVersions=True.

HTTP method and URL:

GET https://us-central1-aiplatform.googleapis.com/v1/publishers/*/models?QUERY_PARAMETERS

To send your request, choose one of these options:

curl

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login , or by using Cloud Shell, which automatically logs you into the gcloud CLI . You can check the currently active account by running gcloud auth list.

Execute the following command:

curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "x-goog-user-project: PROJECT_ID" \
     "https://us-central1-aiplatform.googleapis.com/v1/publishers/*/models?QUERY_PARAMETERS"

PowerShell

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login . You can check the currently active account by running gcloud auth list.

Execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_ID" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://us-central1-aiplatform.googleapis.com/v1/publishers/*/models?QUERY_PARAMETERS" | Select-Object -Expand Content

You receive a JSON response similar to the following.

{
  "publisherModels": [
    {
      "name": "publishers/google/models/gemma3",
      "versionId": "gemma-3-1b-it",
      "openSourceCategory": "GOOGLE_OWNED_OSS_WITH_GOOGLE_CHECKPOINT",
      "supportedActions": {
        "openNotebook": {
          "references": {
            "us-central1": {
              "uri": "https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/model_garden/model_garden_gradio_streaming_chat_completions.ipynb"
            }
          },
          "resourceTitle": "Notebook",
          "resourceUseCase": "Chat Completion Playground",
          "resourceDescription": "Chat with deployed Gemma 2 endpoints via Gradio UI."
        },
        "deploy": {
          "modelDisplayName": "gemma-3-1b-it",
          "containerSpec": {
            "imageUri": "us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/pytorch-vllm-serve:20250312_0916_RC01",
            "args": [
              "python",
              "-m",
              "vllm.entrypoints.api_server",
              "--host=0.0.0.0",
              "--port=8080",
              "--model=gs://vertex-model-garden-restricted-us/gemma3/gemma-3-1b-it",
              "--tensor-parallel-size=1",
              "--swap-space=16",
              "--gpu-memory-utilization=0.95",
              "--disable-log-stats"
            ],
            "env": [
              {
                "name": "MODEL_ID",
                "value": "google/gemma-3-1b-it"
              },
              {
                "name": "DEPLOY_SOURCE",
                "value": "UI_NATIVE_MODEL"
              }
            ],
            "ports": [
              {
                "containerPort": 8080
              }
            ],
            "predictRoute": "/generate",
            "healthRoute": "/ping"
          },
          "dedicatedResources": {
            "machineSpec": {
              "machineType": "g2-standard-12",
              "acceleratorType": "NVIDIA_L4",
              "acceleratorCount": 1
            }
          },
          "publicArtifactUri": "gs://vertex-model-garden-restricted-us/gemma3/gemma3.tar.gz",
          "deployTaskName": "vLLM 128K context",
          "deployMetadata": {
            "sampleRequest": "{\n    \"instances\": [\n        {\n          \"@requestFormat\": \"chatCompletions\",\n          \"messages\": [\n              {\n                  \"role\": \"user\",\n                  \"content\": \"What is machine learning?\"\n              }\n          ],\n          \"max_tokens\": 100\n        }\n    ]\n}\n"
          }
        },
        ...

2. Deploy a model

Deploy a model from Model Garden or a model from Hugging Face. You can also customize the deployment by specifying additional JSON fields.

Deploy a model with its default configuration.

Before using any of the request data, make the following replacements:

LOCATION: A region where the model is deployed.
PROJECT_ID: Your Google Cloud project ID.
MODEL_ID: The ID of the model to deploy, which you can get from listing all the deployable models. The ID uses the following format: publishers/PUBLISHER_NAME/models/ MODEL_NAME@MODEL_VERSION.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy

Request JSON body:

{
  "publisher_model_name": "MODEL_ID",
  "model_config": {
    "accept_eula": "true"
  }
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json. Run the following command in the terminal to create or overwrite this file in the current directory:

cat > request.json << 'EOF'
{
  "publisher_model_name": "MODEL_ID",
  "model_config": {
    "accept_eula": "true"
  }
}
EOF

Then execute the following command to send your REST request:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy"

PowerShell

Save the request body in a file named request.json. Run the following command in the terminal to create or overwrite this file in the current directory:

@'
{
  "publisher_model_name": "MODEL_ID",
  "model_config": {
    "accept_eula": "true"
  }
}
'@  | Out-File -FilePath request.json -Encoding utf8

Then execute the following command to send your REST request:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy" | Select-Object -Expand Content

You receive a JSON response similar to the following.

{
  "name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.DeployOperationMetadata",
    "genericMetadata": {
      "createTime": "2025-03-13T21:44:44.538780Z",
      "updateTime": "2025-03-13T21:44:44.538780Z"
    },
    "publisherModel": "publishers/google/models/gemma3@gemma-3-1b-it",
    "destination": "projects/PROJECT_ID/locations/LOCATION",
    "projectNumber": "PROJECT_ID"
  }
}

Deploy a Hugging Face model

Before using any of the request data, make the following replacements:

LOCATION: A region where the model is deployed.
PROJECT_ID: Your Google Cloud project ID.
MODEL_ID: The Hugging Face model ID model to deploy, which you can get from listing all the deployable models. The ID uses the following format: PUBLISHER_NAME/MODEL_NAME.
ACCESS_TOKEN: If the model is gated, provide an access token.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy

Request JSON body:

{
  "hugging_face_model_id": "MODEL_ID",
  "hugging_face_access_token": "ACCESS_TOKEN",
  "model_config": {
    "accept_eula": "true"
  }
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json. Run the following command in the terminal to create or overwrite this file in the current directory:

cat > request.json << 'EOF'
{
  "hugging_face_model_id": "MODEL_ID",
  "hugging_face_access_token": "ACCESS_TOKEN",
  "model_config": {
    "accept_eula": "true"
  }
}
EOF

Then execute the following command to send your REST request:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy"

PowerShell

Save the request body in a file named request.json. Run the following command in the terminal to create or overwrite this file in the current directory:

@'
{
  "hugging_face_model_id": "MODEL_ID",
  "hugging_face_access_token": "ACCESS_TOKEN",
  "model_config": {
    "accept_eula": "true"
  }
}
'@  | Out-File -FilePath request.json -Encoding utf8

Then execute the following command to send your REST request:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy" | Select-Object -Expand Content

You receive a JSON response similar to the following.

{
  "name": "projects/PROJECT_ID/locations/us-central1LOCATION/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.DeployOperationMetadata",
    "genericMetadata": {
      "createTime": "2025-03-13T21:44:44.538780Z",
      "updateTime": "2025-03-13T21:44:44.538780Z"
    },
    "publisherModel": "publishers/PUBLISHER_NAME/model/MODEL_NAME",
    "destination": "projects/PROJECT_ID/locations/LOCATION",
    "projectNumber": "PROJECT_ID"
  }
}

Deploy a model with customizations

Before using any of the request data, make the following replacements:

LOCATION: A region where the model is deployed.
PROJECT_ID: Your Google Cloud project ID.
MODEL_ID: The ID of the model to deploy, which you can get from listing all the deployable models. The ID uses the following format: publishers/PUBLISHER_NAME/models/ MODEL_NAME@MODEL_VERSION, such as google/gemma@gemma-2b or stabilityai/stable-diffusion-xl-base-1.0.
MACHINE_TYPE: Defines the set of resources to deploy for your model, such as g2-standard-4.
ACCELERATOR_TYPE: Specifies accelerators to add to your deployment to help improve performance when working with intensive workloads, such as NVIDIA_L4
ACCELERATOR_COUNT: The number of accelerators to use in your deployment.
reservation_affinity_type: To use an existing Compute Engine reservation for your deployment, specify any reservation or a specific one. If you specify this value, don't specify spot.
spot: Whether to use spot VMs for your deployment.
IMAGE_URI: The location of the container image to use, such as us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/pytorch-vllm-serve:20241016_0916_RC00_maas
CONTAINER_ARGS: Arguments to pass to the container during the deployment.
CONTAINER_PORT: A port number for your container.
fast_tryout_enabled: When testing a model, you can choose to use a faster deployment. This option is available only for the highly-used models with certain machine types. If enabled, you cannot specify model or deployment configurations.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy

Request JSON body:

{
  "publisher_model_name": "MODEL_ID",
  "deploy_config": {
    "dedicated_resources": {
      "machine_spec": {
        "machine_type": "MACHINE_TYPE",
        "accelerator_type": "ACCELERATOR_TYPE",
        "accelerator_count": ACCELERATOR_COUNT,
        "reservation_affinity": {
          "reservation_affinity_type": "ANY_RESERVATION"
        }
      },
      "spot": "false"
    }
  },
  "model_config": {
    "accept_eula": "true",
    "container_spec": {
      "image_uri": "IMAGE_URI",
      "args": [CONTAINER_ARGS ],
      "ports": [
        {
          "container_port": CONTAINER_PORT
        }
      ]
    }
  },
  "deploy_config": {
    "fast_tryout_enabled": false
  },
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json. Run the following command in the terminal to create or overwrite this file in the current directory:

cat > request.json << 'EOF'
{
  "publisher_model_name": "MODEL_ID",
  "deploy_config": {
    "dedicated_resources": {
      "machine_spec": {
        "machine_type": "MACHINE_TYPE",
        "accelerator_type": "ACCELERATOR_TYPE",
        "accelerator_count": ACCELERATOR_COUNT,
        "reservation_affinity": {
          "reservation_affinity_type": "ANY_RESERVATION"
        }
      },
      "spot": "false"
    }
  },
  "model_config": {
    "accept_eula": "true",
    "container_spec": {
      "image_uri": "IMAGE_URI",
      "args": [CONTAINER_ARGS ],
      "ports": [
        {
          "container_port": CONTAINER_PORT
        }
      ]
    }
  },
  "deploy_config": {
    "fast_tryout_enabled": false
  },
}
EOF

Then execute the following command to send your REST request:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy"

PowerShell

Save the request body in a file named request.json. Run the following command in the terminal to create or overwrite this file in the current directory:

@'
{
  "publisher_model_name": "MODEL_ID",
  "deploy_config": {
    "dedicated_resources": {
      "machine_spec": {
        "machine_type": "MACHINE_TYPE",
        "accelerator_type": "ACCELERATOR_TYPE",
        "accelerator_count": ACCELERATOR_COUNT,
        "reservation_affinity": {
          "reservation_affinity_type": "ANY_RESERVATION"
        }
      },
      "spot": "false"
    }
  },
  "model_config": {
    "accept_eula": "true",
    "container_spec": {
      "image_uri": "IMAGE_URI",
      "args": [CONTAINER_ARGS ],
      "ports": [
        {
          "container_port": CONTAINER_PORT
        }
      ]
    }
  },
  "deploy_config": {
    "fast_tryout_enabled": false
  },
}
'@  | Out-File -FilePath request.json -Encoding utf8

Then execute the following command to send your REST request:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy" | Select-Object -Expand Content

You receive a JSON response similar to the following.

{
  "name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.DeployOperationMetadata",
    "genericMetadata": {
      "createTime": "2025-03-13T21:44:44.538780Z",
      "updateTime": "2025-03-13T21:44:44.538780Z"
    },
    "publisherModel": "publishers/google/models/gemma3@gemma-3-1b-it",
    "destination": "projects/PROJECT_ID/locations/LOCATION",
    "projectNumber": "PROJECT_ID"
  }
}

gcloud

Before you begin, specify a quota project to run the following commands. The commands you run are counted against the quotas for that project. For more information, see Set the quota project.

Console

In the Google Cloud console, go to the Model Garden page.

Go to Model Garden
To find a specific model, enter its name in the Model Garden search box.
To view all the models that you can self-deploy, in the Model collections section of the filter pane, select Self-deploy partner models. The resulting list includes all the self-deployable partner models.
Click the name of the model to deploy, which opens its model card.
Click Deploy options.
In the Deploy on Agent Platform pane, configure your deployment such as the location and machine type.
Click Deploy.

After the deployment is complete, you can request predictions by using the SDK or API. Additional instructions are available in the "Documentation" section on the model card.

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

1-click model deployment

You can self-deploy a partner model with 1-click deployment.

View the deployment specifications for a model by using the model ID from the previous step. You can view the machine type, accelerator type, and container image URI that Model Garden has verified for a particular model.

import vertexai
from vertexai import model_garden

vertexai.init(project=PROJECT_ID, location="us-central1")

model = model_garden.PartnerModel(model)
deploy_options = model.list_deploy_options()
print(deploy_options)

Deploy a model with 1-click deployment.

from vertexai import model_garden

# Deploy model
model = model_garden.PartnerModel(f"{PUBLISHER}/{MODEL}@{VERSION}")

endpoint = model.deploy(
    machine_type=MACHINE_TYPE,
    accelerator_type=ACCELERATOR_TYPE,
    accelerator_count=ACCELERATOR_COUNT,
    min_replica_count=1,
    max_replica_count=1,
)

Multi-step deployment

You can also upload a partner model, create an endpoint, and manually deploy the model.

In your code, replace the following placeholders:

LOCATION: The region where you plan to deploy the model and endpoint.
PROJECT_ID: Your project ID.
DISPLAY_NAME: A descriptive name for the associated resource.
PUBLISHER_NAME: The name of partner that provides the model to upload or deploy.
PUBLISHER_MODEL_NAME: The name of the model to upload.
MACHINE_TYPE: Defines the set of resources to deploy for your model, such as g2-standard-4. You must match one of the confirgurations provided by the partner.
ACCELERATOR_TYPE: Specifies accelerators to add to your deployment to help improve performance when working with intensive workloads, such as NVIDIA_L4. You must match one of the confirgurations provided by the partner.
ACCELERATOR_COUNT: The number of accelerators to use. You must match one of the confirgurations provided by the partner.
REQUEST_PAYLOAD: The fields and values to include in your prediction request. View the partner's Model Garden model card to see the available fields.

from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=LOCATION)

# Upload a model
model = aiplatform.Model.upload(
    display_name="DISPLAY_NAME_MODEL",
    model_garden_source_model_name = f"publishers/PUBLISHER_NAME/models/PUBLISHER_MODEL_NAME",
)

# Create endpoint
my_endpoint = aiplatform.Endpoint.create(display_name="DISPLAY_NAME_ENDPOINT")

# Deploy model
MACHINE_TYPE = "MACHINE_TYPE"  # @param {type: "string"}
ACCELERATOR_TYPE = "ACCELERATOR_TYPE" # @param {type: "string"}
ACCELERATOR_COUNT = ACCELERATOR_COUNT # @param {type: "number"}

model.deploy(
    endpoint=my_endpoint,
    deployed_model_display_name="DISPLAY_NAME_DEPLOYED_MODEL",
    traffic_split={"0": 100},
    machine_type=MACHINE_TYPE,
    accelerator_type=ACCELERATOR_TYPE,
    accelerator_count=ACCELERATOR_COUNT,
    min_replica_count=1,
    max_replica_count=1,
)

# Unary call for predictions
PAYLOAD = {
    REQUEST_PAYLOAD
}

request = json.dumps(PAYLOAD)

response = my_endpoint.raw_predict(
    body = request,
    headers = {'Content-Type':'application/json'}
)

print(response)

# Streaming call for predictions
PAYLOAD = {
    REQUEST_PAYLOAD
}

request = json.dumps(PAYLOAD)

for stream_response in my_endpoint.stream_raw_predict(
    body = request,
    headers = {'Content-Type':'application/json'}
):
    print(stream_response)

REST

In the sample curl commands, replace the following placeholders:

LOCATION: The region where you plan to deploy the model and endpoint.
PROJECT_ID: Your project ID.
DISPLAY_NAME: A descriptive name for the associated resource.
PUBLISHER_NAME: The name of partner that provides the model to upload or deploy.
PUBLISHER_MODEL_NAME: The name of the model to upload.
ENDPOINT_ID: The ID of the endpoint.
MACHINE_TYPE: Defines the set of resources to deploy for your model, such as g2-standard-4. You must match one of the confirgurations provided by the partner.
ACCELERATOR_TYPE: Specifies accelerators to add to your deployment to help improve performance when working with intensive workloads, such as NVIDIA_L4. You must match one of the confirgurations provided by the partner.
ACCELERATOR_COUNT: The number of accelerators to use. You must match one of the confirgurations provided by the partner.
REQUEST_PAYLOAD: The fields and values to include in your prediction request. View the partner's Model Garden model card to see the available fields.

Upload a model to add it to your Model Registry.

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/models:upload \
-d '{
"model": {
  "displayName": "DISPLAY_NAME_MODEL",
  "baseModelSource": {
    "modelGardenSource": {
      "publicModelName": f"publishers/PUBLISHER_NAME/models/PUBLISHER_MODEL_NAME",
    }
  }
}
}'

Create an endpoint.

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/endpoints \
-d '{
"displayName": "DISPLAY_NAME_ENDPOINT"
}'

Deploy the uploaded model to the endpoint.

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID:deployModel \
-d '{
"deployedModel": {
  "model": f"projects/PROJECT_ID/locations/LOCATION/models/MODEL_ID",
  "displayName": "DISPLAY_NAME_DEPLOYED_MODEL",
  "dedicatedResources": {
   "machineSpec": {
      "machineType": "MACHINE_TYPE",
      "acceleratorType": "ACCELERATOR_TYPE",
      "acceleratorCount":"ACCELERATOR_COUNT",
   },
   "minReplicaCount": 1,
   "maxReplicaCount": 1
  },
},
"trafficSplit": {
  "0": 100
}
}'

After the model is deployed, you can make an unary or streaming call for predictions. View the partner's Model Garden model card to see which API methods are supported.

Sample unary call:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID:rawPredict \
-d 'REQUEST_PAYLOAD'

Sample streaming call:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID:streamRawPredict \
-d 'REQUEST_PAYLOAD'

Console

gcloud

Console

gcloud

Console

View code samples

Most of the model cards for task-specific solutions models contain code samples that you can copy and test.

Create a vision app

The model cards for applicable computer vision models support creating a vision application.

Use models in Model Garden Stay organized with collections Save and categorize content based on your preferences.

Send test prompts

Tune a model

Tune in a notebook

Deploy an open model

Console

Python

REST

1. List models that you can deploy

curl

PowerShell

2. Deploy a model

Deploy a model with its default configuration.

curl

PowerShell

Deploy a Hugging Face model

curl

PowerShell

Deploy a model with customizations

curl

PowerShell

gcloud

Terraform

Deploy a model

Apply the Configuration

Clean Up

Deploy a partner model and make prediction requests

Console

Python

1-click model deployment

Multi-step deployment

REST

Deploy a model to a private endpoint

View or manage an endpoint

Undeploy models and delete resources

Undeploy models

Console

Python

gcloud

Delete endpoints

Console

Python

gcloud

Delete models

Console

Python

gcloud

View code samples

Create a vision app