Use models in Model Garden

Discover, test, tune, and deploy models by using Model Garden in the Google Cloud console. You can also deploy Model Garden models by using the Google Cloud CLI.

Send test prompts

  1. In the Google Cloud console, go to the Model Garden page.

    Go to Model Garden

  2. Find a supported model that you want to test and click View details.

  3. Click Open prompt design.

    You're taken to the Prompt design page.

  4. In Prompt, enter the prompt that you want to test.

  5. Optional: Configure the model parameters.

  6. Click Submit.

Tune a model

  1. In the Google Cloud console, go to the Model Garden page.

    Go to Model Garden

  2. In Search models, enter Gemma 3, then click the magnifying glass to search.

  3. Click View details on the Gemma 3 model card.

  4. Click Open fine-tuning pipeline.

    You're taken to the Agent Platform Pipelines page.

  5. To start tuning, click Create run.

Tune in a notebook

The model cards for most open source foundation models and fine-tunable models support tuning in a notebook.

  1. In the Google Cloud console, go to the Model Garden page.

    Go to Model Garden

  2. Find a supported model that you want to tune and go to its model card.

  3. Click Open notebook.

Deploy an open model

You can deploy a model by using its model card in the Google Cloud console or programmatically.

For more information about setting up the Google Gen AI SDK or Google Cloud CLI, see the Google Gen AI SDK overview or Install the Google Cloud CLI.

Console

  • In the Google Cloud console, go to the Model Garden page.

    Go to Model Garden

  • Find a supported model that you want to deploy, and click its model card.

  • Click Deploy to open the Deploy model pane.

  • In the Deploy model pane, specify details for your deployment.

    1. Use or modify the generated model and endpoint names.
    2. Select a location to create your model endpoint in.
    3. Select a machine type to use for each node of your deployment.
    4. To use a Compute Engine reservation, under the Deployment settings section, select Advanced.

    For the Reservation type field, select a reservation type. The reservation must match your specified machine specs.

  • Click Deploy.

    Python

    To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

    1. List the models that you can deploy and record the model ID to deploy. You can optionally list the supported Hugging Face models in Model Garden and even filter them by model names. The output doesn't include any tuned models.

      
      import vertexai
      from vertexai import model_garden
      
      # TODO(developer): Update and un-comment below lines
      # PROJECT_ID = "your-project-id"
      vertexai.init(project=PROJECT_ID, location="us-central1")
      
      # List deployable models, optionally list Hugging Face models only or filter by model name.
      deployable_models = model_garden.list_deployable_models(list_hf_models=False, model_filter="gemma")
      print(deployable_models)
      # Example response:
      # ['google/gemma2@gemma-2-27b','google/gemma2@gemma-2-27b-it', ...]
      
    2. View the deployment specifications for a model by using the model ID from the previous step. You can view the machine type, accelerator type, and container image URI that Model Garden has verified for a particular model.

      
      import vertexai
      from vertexai import model_garden
      
      # TODO(developer): Update and un-comment below lines
      # PROJECT_ID = "your-project-id"
      # model = "google/gemma3@gemma-3-1b-it"
      vertexai.init(project=PROJECT_ID, location="us-central1")
      
      # For Hugging Face modelsm the format is the Hugging Face model name, as in
      # "meta-llama/Llama-3.3-70B-Instruct".
      # Go to https://console.cloud.google.com/vertex-ai/model-garden to find all deployable
      # model names.
      
      model = model_garden.OpenModel(model)
      deploy_options = model.list_deploy_options()
      print(deploy_options)
      # Example response:
      # [
      #   dedicated_resources {
      #     machine_spec {
      #       machine_type: "g2-standard-12"
      #       accelerator_type: NVIDIA_L4
      #       accelerator_count: 1
      #     }
      #   }
      #   container_spec {
      #     ...
      #   }
      #   ...
      # ]
      
    3. Deploy a model to an endpoint. Model Garden uses the default deployment configuration unless you specify additional argument and values.

      
      import vertexai
      from vertexai import model_garden
      
      # TODO(developer): Update and un-comment below lines
      # PROJECT_ID = "your-project-id"
      vertexai.init(project=PROJECT_ID, location="us-central1")
      
      open_model = model_garden.OpenModel("google/gemma3@gemma-3-12b-it")
      endpoint = open_model.deploy(
          machine_type="g2-standard-48",
          accelerator_type="NVIDIA_L4",
          accelerator_count=4,
          accept_eula=True,
      )
      
      # Optional. Run predictions on the deployed endoint.
      # endpoint.predict(instances=[{"prompt": "What is Generative AI?"}])
      

    REST

    List all deployable models and then get the ID of the model to deploy. You can then deploy the model with its default configuration and endpoint. Or, you can choose to customize your deployment, such as setting a specific machine type or using a dedicated endpoint.

    1. List models that you can deploy

    Before using any of the request data, make the following replacements:

    HTTP method and URL:

    GET https://us-central1-aiplatform.googleapis.com/v1/publishers/*/models?QUERY_PARAMETERS

    To send your request, choose one of these options:

    curl

    Execute the following command:

    curl -X GET \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "x-goog-user-project: PROJECT_ID" \
    "https://us-central1-aiplatform.googleapis.com/v1/publishers/*/models?QUERY_PARAMETERS"

    PowerShell

    Execute the following command:

    $cred = gcloud auth print-access-token
    $headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_ID" }

    Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://us-central1-aiplatform.googleapis.com/v1/publishers/*/models?QUERY_PARAMETERS" | Select-Object -Expand Content

    You receive a JSON response similar to the following.

    {
      "publisherModels": [
        {
          "name": "publishers/google/models/gemma3",
          "versionId": "gemma-3-1b-it",
          "openSourceCategory": "GOOGLE_OWNED_OSS_WITH_GOOGLE_CHECKPOINT",
          "supportedActions": {
            "openNotebook": {
              "references": {
                "us-central1": {
                  "uri": "https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/model_garden/model_garden_gradio_streaming_chat_completions.ipynb"
                }
              },
              "resourceTitle": "Notebook",
              "resourceUseCase": "Chat Completion Playground",
              "resourceDescription": "Chat with deployed Gemma 2 endpoints via Gradio UI."
            },
            "deploy": {
              "modelDisplayName": "gemma-3-1b-it",
              "containerSpec": {
                "imageUri": "us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/pytorch-vllm-serve:20250312_0916_RC01",
                "args": [
                  "python",
                  "-m",
                  "vllm.entrypoints.api_server",
                  "--host=0.0.0.0",
                  "--port=8080",
                  "--model=gs://vertex-model-garden-restricted-us/gemma3/gemma-3-1b-it",
                  "--tensor-parallel-size=1",
                  "--swap-space=16",
                  "--gpu-memory-utilization=0.95",
                  "--disable-log-stats"
                ],
                "env": [
                  {
                    "name": "MODEL_ID",
                    "value": "google/gemma-3-1b-it"
                  },
                  {
                    "name": "DEPLOY_SOURCE",
                    "value": "UI_NATIVE_MODEL"
                  }
                ],
                "ports": [
                  {
                    "containerPort": 8080
                  }
                ],
                "predictRoute": "/generate",
                "healthRoute": "/ping"
              },
              "dedicatedResources": {
                "machineSpec": {
                  "machineType": "g2-standard-12",
                  "acceleratorType": "NVIDIA_L4",
                  "acceleratorCount": 1
                }
              },
              "publicArtifactUri": "gs://vertex-model-garden-restricted-us/gemma3/gemma3.tar.gz",
              "deployTaskName": "vLLM 128K context",
              "deployMetadata": {
                "sampleRequest": "{\n    \"instances\": [\n        {\n          \"@requestFormat\": \"chatCompletions\",\n          \"messages\": [\n              {\n                  \"role\": \"user\",\n                  \"content\": \"What is machine learning?\"\n              }\n          ],\n          \"max_tokens\": 100\n        }\n    ]\n}\n"
              }
            },
            ...
    

    2. Deploy a model

    Deploy a model from Model Garden or a model from Hugging Face. You can also customize the deployment by specifying additional JSON fields.

    Deploy a model with its default configuration.

    Before using any of the request data, make the following replacements:

    HTTP method and URL:

    POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy

    Request JSON body:

    {
      "publisher_model_name": "MODEL_ID",
      "model_config": {
        "accept_eula": "true"
      }
    }
    

    To send your request, choose one of these options:

    curl

    Save the request body in a file named request.json. Run the following command in the terminal to create or overwrite this file in the current directory:

    cat > request.json << 'EOF'
    {
      "publisher_model_name": "MODEL_ID",
      "model_config": {
        "accept_eula": "true"
      }
    }
    EOF

    Then execute the following command to send your REST request:

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d @request.json \
    "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy"

    PowerShell

    Save the request body in a file named request.json. Run the following command in the terminal to create or overwrite this file in the current directory:

    @'
    {
      "publisher_model_name": "MODEL_ID",
      "model_config": {
        "accept_eula": "true"
      }
    }
    '@  | Out-File -FilePath request.json -Encoding utf8

    Then execute the following command to send your REST request:

    $cred = gcloud auth print-access-token
    $headers = @{ "Authorization" = "Bearer $cred" }

    Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy" | Select-Object -Expand Content

    You receive a JSON response similar to the following.

    {
      "name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID",
      "metadata": {
        "@type": "type.googleapis.com/google.cloud.aiplatform.v1.DeployOperationMetadata",
        "genericMetadata": {
          "createTime": "2025-03-13T21:44:44.538780Z",
          "updateTime": "2025-03-13T21:44:44.538780Z"
        },
        "publisherModel": "publishers/google/models/gemma3@gemma-3-1b-it",
        "destination": "projects/PROJECT_ID/locations/LOCATION",
        "projectNumber": "PROJECT_ID"
      }
    }
    

    Deploy a Hugging Face model

    Before using any of the request data, make the following replacements:

    HTTP method and URL:

    POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy

    Request JSON body:

    {
      "hugging_face_model_id": "MODEL_ID",
      "hugging_face_access_token": "ACCESS_TOKEN",
      "model_config": {
        "accept_eula": "true"
      }
    }
    

    To send your request, choose one of these options:

    curl

    Save the request body in a file named request.json. Run the following command in the terminal to create or overwrite this file in the current directory:

    cat > request.json << 'EOF'
    {
      "hugging_face_model_id": "MODEL_ID",
      "hugging_face_access_token": "ACCESS_TOKEN",
      "model_config": {
        "accept_eula": "true"
      }
    }
    EOF

    Then execute the following command to send your REST request:

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d @request.json \
    "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy"

    PowerShell

    Save the request body in a file named request.json. Run the following command in the terminal to create or overwrite this file in the current directory:

    @'
    {
      "hugging_face_model_id": "MODEL_ID",
      "hugging_face_access_token": "ACCESS_TOKEN",
      "model_config": {
        "accept_eula": "true"
      }
    }
    '@  | Out-File -FilePath request.json -Encoding utf8

    Then execute the following command to send your REST request:

    $cred = gcloud auth print-access-token
    $headers = @{ "Authorization" = "Bearer $cred" }

    Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy" | Select-Object -Expand Content

    You receive a JSON response similar to the following.

    {
      "name": "projects/PROJECT_ID/locations/us-central1LOCATION/operations/OPERATION_ID",
      "metadata": {
        "@type": "type.googleapis.com/google.cloud.aiplatform.v1.DeployOperationMetadata",
        "genericMetadata": {
          "createTime": "2025-03-13T21:44:44.538780Z",
          "updateTime": "2025-03-13T21:44:44.538780Z"
        },
        "publisherModel": "publishers/PUBLISHER_NAME/model/MODEL_NAME",
        "destination": "projects/PROJECT_ID/locations/LOCATION",
        "projectNumber": "PROJECT_ID"
      }
    }
    

    Deploy a model with customizations

    Before using any of the request data, make the following replacements:

    HTTP method and URL:

    POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy

    Request JSON body:

    {
      "publisher_model_name": "MODEL_ID",
      "deploy_config": {
        "dedicated_resources": {
          "machine_spec": {
            "machine_type": "MACHINE_TYPE",
            "accelerator_type": "ACCELERATOR_TYPE",
            "accelerator_count": ACCELERATOR_COUNT,
            "reservation_affinity": {
              "reservation_affinity_type": "ANY_RESERVATION"
            }
          },
          "spot": "false"
        }
      },
      "model_config": {
        "accept_eula": "true",
        "container_spec": {
          "image_uri": "IMAGE_URI",
          "args": [CONTAINER_ARGS ],
          "ports": [
            {
              "container_port": CONTAINER_PORT
            }
          ]
        }
      },
      "deploy_config": {
        "fast_tryout_enabled": false
      },
    }
    

    To send your request, choose one of these options:

    curl

    Save the request body in a file named request.json. Run the following command in the terminal to create or overwrite this file in the current directory:

    cat > request.json << 'EOF'
    {
      "publisher_model_name": "MODEL_ID",
      "deploy_config": {
        "dedicated_resources": {
          "machine_spec": {
            "machine_type": "MACHINE_TYPE",
            "accelerator_type": "ACCELERATOR_TYPE",
            "accelerator_count": ACCELERATOR_COUNT,
            "reservation_affinity": {
              "reservation_affinity_type": "ANY_RESERVATION"
            }
          },
          "spot": "false"
        }
      },
      "model_config": {
        "accept_eula": "true",
        "container_spec": {
          "image_uri": "IMAGE_URI",
          "args": [CONTAINER_ARGS ],
          "ports": [
            {
              "container_port": CONTAINER_PORT
            }
          ]
        }
      },
      "deploy_config": {
        "fast_tryout_enabled": false
      },
    }
    EOF

    Then execute the following command to send your REST request:

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d @request.json \
    "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy"

    PowerShell

    Save the request body in a file named request.json. Run the following command in the terminal to create or overwrite this file in the current directory:

    @'
    {
      "publisher_model_name": "MODEL_ID",
      "deploy_config": {
        "dedicated_resources": {
          "machine_spec": {
            "machine_type": "MACHINE_TYPE",
            "accelerator_type": "ACCELERATOR_TYPE",
            "accelerator_count": ACCELERATOR_COUNT,
            "reservation_affinity": {
              "reservation_affinity_type": "ANY_RESERVATION"
            }
          },
          "spot": "false"
        }
      },
      "model_config": {
        "accept_eula": "true",
        "container_spec": {
          "image_uri": "IMAGE_URI",
          "args": [CONTAINER_ARGS ],
          "ports": [
            {
              "container_port": CONTAINER_PORT
            }
          ]
        }
      },
      "deploy_config": {
        "fast_tryout_enabled": false
      },
    }
    '@  | Out-File -FilePath request.json -Encoding utf8

    Then execute the following command to send your REST request:

    $cred = gcloud auth print-access-token
    $headers = @{ "Authorization" = "Bearer $cred" }

    Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy" | Select-Object -Expand Content

    You receive a JSON response similar to the following.

    {
      "name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID",
      "metadata": {
        "@type": "type.googleapis.com/google.cloud.aiplatform.v1.DeployOperationMetadata",
        "genericMetadata": {
          "createTime": "2025-03-13T21:44:44.538780Z",
          "updateTime": "2025-03-13T21:44:44.538780Z"
        },
        "publisherModel": "publishers/google/models/gemma3@gemma-3-1b-it",
        "destination": "projects/PROJECT_ID/locations/LOCATION",
        "projectNumber": "PROJECT_ID"
      }
    }
    

    gcloud

    Before you begin, specify a quota project to run the following commands. The commands you run are counted against the quotas for that project. For more information, see Set the quota project.

    1. List the models that you can deploy by running the gcloud ai model-garden models list command. This command lists all model IDs and which ones you can self deploy.

      gcloud ai model-garden models list
      

      In the output, find the model ID to deploy. The following example shows an abbreviated output.

      MODEL_ID                                      CAN_DEPLOY  CAN_PREDICT
      google/gemma2@gemma-2-27b                     Yes         No
      google/gemma2@gemma-2-27b-it                  Yes         No
      google/gemma2@gemma-2-2b                      Yes         No
      google/gemma2@gemma-2-2b-it                   Yes         No
      google/gemma2@gemma-2-9b                      Yes         No
      google/gemma2@gemma-2-9b-it                   Yes         No
      google/gemma3@gemma-3-12b-it                  Yes         No
      google/gemma3@gemma-3-12b-pt                  Yes         No
      google/gemma3@gemma-3-1b-it                   Yes         No
      google/gemma3@gemma-3-1b-pt                   Yes         No
      google/gemma3@gemma-3-27b-it                  Yes         No
      google/gemma3@gemma-3-27b-pt                  Yes         No
      google/gemma3@gemma-3-4b-it                   Yes         No
      google/gemma3@gemma-3-4b-pt                   Yes         No
      google/gemma3n@gemma-3n-e2b                   Yes         No
      google/gemma3n@gemma-3n-e2b-it                Yes         No
      google/gemma3n@gemma-3n-e4b                   Yes         No
      google/gemma3n@gemma-3n-e4b-it                Yes         No
      google/gemma@gemma-1.1-2b-it                  Yes         No
      google/gemma@gemma-1.1-2b-it-gg-hf            Yes         No
      google/gemma@gemma-1.1-7b-it                  Yes         No
      google/gemma@gemma-1.1-7b-it-gg-hf            Yes         No
      google/gemma@gemma-2b                         Yes         No
      google/gemma@gemma-2b-gg-hf                   Yes         No
      google/gemma@gemma-2b-it                      Yes         No
      google/gemma@gemma-2b-it-gg-hf                Yes         No
      google/gemma@gemma-7b                         Yes         No
      google/gemma@gemma-7b-gg-hf                   Yes         No
      google/gemma@gemma-7b-it                      Yes         No
      google/gemma@gemma-7b-it-gg-hf                Yes         No
      

      The output doesn't include any tuned models or Hugging Face models. To view which Hugging Face models are supported, add the --can-deploy-hugging-face-models flag.

    2. To view the deployment specifications for a model, run the gcloud ai model-garden models list-deployment-config command. You can view the machine type, accelorator type, and container image URI that Model Garden supports for a particular model.

      gcloud ai model-garden models list-deployment-config \
          --model=MODEL_ID
      

      Replace MODEL_ID with the model ID from the previous list command, such as google/gemma@gemma-2b or stabilityai/stable-diffusion-xl-base-1.0.

    3. Deploy a model to an endpoint by running the gcloud ai model-garden models deploy command. Model Garden generates a display name for your endpoint and uses the default deployment configuration unless you specify additional argument and values.

      To run the command asynchronously, include the --asynchronous flag.

      gcloud ai model-garden models deploy \
          --model=MODEL_ID \
          [--machine-type=MACHINE_TYPE] \
          [--accelerator-type=ACCELERATOR_TYPE] \
          [--endpoint-display-name=ENDPOINT_NAME] \
          [--hugging-face-access-token=HF_ACCESS_TOKEN] \
          [--reservation-affinity reservation-affinity-type=any-reservation] \
          [--reservation-affinity reservation-affinity-type=specific-reservation, key="compute.googleapis.com/reservation-name", values=RESERVATION_RESOURCE_NAME] \
          [--asynchronous]
      

      Replace the following placeholders:

      The output includes the deployment configuration that Model Garden used, the endpoint ID, and the deployment operation ID, which you can use to check the deployment status.

      Using the default deployment configuration:
       Machine type: g2-standard-12
       Accelerator type: NVIDIA_L4
       Accelerator count: 1
      
      The project has enough quota. The current usage of quota for accelerator type NVIDIA_L4 in region us-central1 is 0 out of 28.
      
      Deploying the model to the endpoint. To check the deployment status, you can try one of the following methods:
      1) Look for endpoint `ENDPOINT_DISPLAY_NAME` at the [Agent Platform] -> [Online prediction] tab in Cloud Console
      2) Use `gcloud ai operations describe OPERATION_ID --region=LOCATION` to find the status of the deployment long-running operation
      
    4. To see details about your deployment, run the gcloud ai endpoints list --list-model-garden-endpoints-only command:

      gcloud ai endpoints list --list-model-garden-endpoints-only \
          --region=LOCATION_ID
      

      Replace LOCATION_ID with the region where you deployed the model.

      The output includes all endpoints that were created from Model Garden and includes information such as the endpoint ID, endpoint name, and whether the endpoint is associated with a deployed model. To find your deployment, look for the endpoint name that was returned from the previous command.

      Terraform

      To learn how to apply or remove a Terraform configuration, see Basic Terraform commands. For more information, see the Terraform provider reference documentation.

      Deploy a model

      The following example deploys the gemma-3-1b-it model to a new Agent Platform endpoint in us-central1 by using default configurations.

      terraform {
        required_providers {
          google = {
            source = "hashicorp/google"
            version = "6.45.0"
          }
        }
      }
      
      provider "google" {
        region  = "us-central1"
      }
      
      resource "google_vertex_ai_endpoint_with_model_garden_deployment" "gemma_deployment" {
        publisher_model_name = "publishers/google/models/gemma3@gemma-3-1b-it"
        location = "us-central1"
        model_config {
          accept_eula = True
        }
      }
      

      To deploy a model with customization, see Agent Platform Endpoint with Model Garden Deployment for details.

      Apply the Configuration

      terraform init
      terraform plan
      terraform apply
      

      After you apply the configuration, Terraform provisions a new Agent Platform endpoint and deploys the specified open model.

      Clean Up

      To delete the endpoint and model deployment, run the following command:

      terraform destroy
      

      Deploy a partner model and make prediction requests

      In the Google Cloud console, go to the Model Garden page and use the Model collections filter to view the Self-deploy partner models. Choose from the list of self-deploy partner models, and purchase the model by clicking Enable.

      You must deploy on the partner's required machine types, as described in the "Recommended hardware configuration" section on their Model Garden model card. When deployed, the model serving resources are located in a secure Google-managed project.

      Console

      1. In the Google Cloud console, go to the Model Garden page.

        Go to Model Garden

      2. To find a specific model, enter its name in the Model Garden search box.

      3. To view all the models that you can self-deploy, in the Model collections section of the filter pane, select Self-deploy partner models. The resulting list includes all the self-deployable partner models.

      4. Click the name of the model to deploy, which opens its model card.

      5. Click Deploy options.

      6. In the Deploy on Agent Platform pane, configure your deployment such as the location and machine type.

      7. Click Deploy.

      After the deployment is complete, you can request predictions by using the SDK or API. Additional instructions are available in the "Documentation" section on the model card.

      Python

      To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

      1-click model deployment

      You can self-deploy a partner model with 1-click deployment.

      1. View the deployment specifications for a model by using the model ID from the previous step. You can view the machine type, accelerator type, and container image URI that Model Garden has verified for a particular model.

        import vertexai
        from vertexai import model_garden
        
        vertexai.init(project=PROJECT_ID, location="us-central1")
        
        model = model_garden.PartnerModel(model)
        deploy_options = model.list_deploy_options()
        print(deploy_options)
        
      2. Deploy a model with 1-click deployment.

        from vertexai import model_garden
        
        # Deploy model
        model = model_garden.PartnerModel(f"{PUBLISHER}/{MODEL}@{VERSION}")
        
        endpoint = model.deploy(
            machine_type=MACHINE_TYPE,
            accelerator_type=ACCELERATOR_TYPE,
            accelerator_count=ACCELERATOR_COUNT,
            min_replica_count=1,
            max_replica_count=1,
        )
        

      Multi-step deployment

      You can also upload a partner model, create an endpoint, and manually deploy the model.

      In your code, replace the following placeholders:

      • LOCATION: The region where you plan to deploy the model and endpoint.
      • PROJECT_ID: Your project ID.
      • DISPLAY_NAME: A descriptive name for the associated resource.
      • PUBLISHER_NAME: The name of partner that provides the model to upload or deploy.
      • PUBLISHER_MODEL_NAME: The name of the model to upload.
      • MACHINE_TYPE: Defines the set of resources to deploy for your model, such as g2-standard-4. You must match one of the confirgurations provided by the partner.
      • ACCELERATOR_TYPE: Specifies accelerators to add to your deployment to help improve performance when working with intensive workloads, such as NVIDIA_L4. You must match one of the confirgurations provided by the partner.
      • ACCELERATOR_COUNT: The number of accelerators to use. You must match one of the confirgurations provided by the partner.
      • REQUEST_PAYLOAD: The fields and values to include in your prediction request. View the partner's Model Garden model card to see the available fields.
      from google.cloud import aiplatform
      
      aiplatform.init(project=PROJECT_ID, location=LOCATION)
      
      # Upload a model
      model = aiplatform.Model.upload(
          display_name="DISPLAY_NAME_MODEL",
          model_garden_source_model_name = f"publishers/PUBLISHER_NAME/models/PUBLISHER_MODEL_NAME",
      )
      
      # Create endpoint
      my_endpoint = aiplatform.Endpoint.create(display_name="DISPLAY_NAME_ENDPOINT")
      
      # Deploy model
      MACHINE_TYPE = "MACHINE_TYPE"  # @param {type: "string"}
      ACCELERATOR_TYPE = "ACCELERATOR_TYPE" # @param {type: "string"}
      ACCELERATOR_COUNT = ACCELERATOR_COUNT # @param {type: "number"}
      
      model.deploy(
          endpoint=my_endpoint,
          deployed_model_display_name="DISPLAY_NAME_DEPLOYED_MODEL",
          traffic_split={"0": 100},
          machine_type=MACHINE_TYPE,
          accelerator_type=ACCELERATOR_TYPE,
          accelerator_count=ACCELERATOR_COUNT,
          min_replica_count=1,
          max_replica_count=1,
      )
      
      # Unary call for predictions
      PAYLOAD = {
          REQUEST_PAYLOAD
      }
      
      request = json.dumps(PAYLOAD)
      
      response = my_endpoint.raw_predict(
          body = request,
          headers = {'Content-Type':'application/json'}
      )
      
      print(response)
      
      # Streaming call for predictions
      PAYLOAD = {
          REQUEST_PAYLOAD
      }
      
      request = json.dumps(PAYLOAD)
      
      for stream_response in my_endpoint.stream_raw_predict(
          body = request,
          headers = {'Content-Type':'application/json'}
      ):
          print(stream_response)
      

      REST

      List all deployable models and then get the ID of the model to deploy. You can then deploy the model with its default configuration and endpoint. Or, you can choose to customize your deployment, such as setting a specific machine type or using a dedicated endpoint.

      In the sample curl commands, replace the following placeholders:

      • LOCATION: The region where you plan to deploy the model and endpoint.
      • PROJECT_ID: Your project ID.
      • DISPLAY_NAME: A descriptive name for the associated resource.
      • PUBLISHER_NAME: The name of partner that provides the model to upload or deploy.
      • PUBLISHER_MODEL_NAME: The name of the model to upload.
      • ENDPOINT_ID: The ID of the endpoint.
      • MACHINE_TYPE: Defines the set of resources to deploy for your model, such as g2-standard-4. You must match one of the confirgurations provided by the partner.
      • ACCELERATOR_TYPE: Specifies accelerators to add to your deployment to help improve performance when working with intensive workloads, such as NVIDIA_L4. You must match one of the confirgurations provided by the partner.
      • ACCELERATOR_COUNT: The number of accelerators to use. You must match one of the confirgurations provided by the partner.
      • REQUEST_PAYLOAD: The fields and values to include in your prediction request. View the partner's Model Garden model card to see the available fields.
      1. Upload a model to add it to your Model Registry.

        curl -X POST \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        -H "Content-Type: application/json" \
        https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/models:upload \
        -d '{
        "model": {
          "displayName": "DISPLAY_NAME_MODEL",
          "baseModelSource": {
            "modelGardenSource": {
              "publicModelName": f"publishers/PUBLISHER_NAME/models/PUBLISHER_MODEL_NAME",
            }
          }
        }
        }'
        
      2. Create an endpoint.

        curl -X POST \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        -H "Content-Type: application/json" \
        https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/endpoints \
        -d '{
        "displayName": "DISPLAY_NAME_ENDPOINT"
        }'
        
      3. Deploy the uploaded model to the endpoint.

        curl -X POST \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        -H "Content-Type: application/json" \
        https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID:deployModel \
        -d '{
        "deployedModel": {
          "model": f"projects/PROJECT_ID/locations/LOCATION/models/MODEL_ID",
          "displayName": "DISPLAY_NAME_DEPLOYED_MODEL",
          "dedicatedResources": {
           "machineSpec": {
              "machineType": "MACHINE_TYPE",
              "acceleratorType": "ACCELERATOR_TYPE",
              "acceleratorCount":"ACCELERATOR_COUNT",
           },
           "minReplicaCount": 1,
           "maxReplicaCount": 1
          },
        },
        "trafficSplit": {
          "0": 100
        }
        }'
        
      4. After the model is deployed, you can make an unary or streaming call for predictions. View the partner's Model Garden model card to see which API methods are supported.

        • Sample unary call:
        curl -X POST \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        -H "Content-Type: application/json" \
        https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID:rawPredict \
        -d 'REQUEST_PAYLOAD'
        
        • Sample streaming call:
        curl -X POST \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        -H "Content-Type: application/json" \
        https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID:streamRawPredict \
        -d 'REQUEST_PAYLOAD'
        

      Deploy a model to a private endpoint

      You can deploy models from Model Garden to a Private Service Connect (PSC) endpoint to create a secure and private connection to your model. This setup can also be integrated with an internal and external regional Application Load Balancer when deployed with a PSC Network Endpoint Group. Follow the steps below to configure a PSC endpoint for your model, ensuring private connectivity.

      1. In the Google Cloud console, go to the Model Garden page.

        Go to Model Garden

      2. Find a model to deploy and click its model card.

      3. Click Deploy model. In the Deploy model pane, the predefined deployment settings are based on a public dedicated endpoint.

      4. Select edit setting that enables further deployment options including private access.

      5. Configure your deployment settings.

        • Select a location to create your model endpoint in.
        • Accept or modify the generated model and endpoint name.
        • Selecting a machine type for your deployment is optional, as the recommended configuration has been pre-selected for you.
        • For the Reservation type field, select a reservation type. The reservation must match your specified machine specs.
          • Automatically use created reservation: Gemini Enterprise Agent Platform selects an available reservation with matching properties. If no capacity is available in the selected reservation, Gemini Enterprise Agent Platform uses the general Google Cloud resource pool.
          • Select specific reservations: Gemini Enterprise Agent Platform uses a specific reservation. If no capacity is available in your selected reservation, the deployment fails.
          • No reservation (default): Gemini Enterprise Agent Platform uses the general Google Cloud resource pool.
        • Configure your availability policies.
          • Standard: Ideal for most workloads.
          • Spot: Ideal for fault-tolerant workloads.
          • Flex Start: Uses Dynamic Workload Scheduling (DWS) to manage and prioritize resource allocation requests
      6. Configure endpoint Access for private networking.

        • Select Private (Private Service Connect)
        • Select Project IDs. To grant access to other projects, enter their Project IDs here. If you leave this blank, the endpoint can only be accessed from within the current project.
      7. Click Deploy

      8. To view your deployment, go to the Model Garden page and select View my endpoints and models to see it listed in the My Endpoints section. Make sure you have selected the correct region to ensure your endpoint is visible. Select the endpoint, the status will show as Deploying and will change to Ready once completed.

      9. Obtain the Endpoint ID, then open Cloud Shell and perform the following to obtain the Private Service Attachment URI:

        gcloud ai endpoints describe ENDPOINT_ID --region=REGION  | grep -i serviceAttachment:
        

        Below is an example:

        user@cloudshell:$ gcloud ai endpoints describe 2124795225560842240 --region=europe-west4 | grep -i serviceAttachment:
        Using endpoint [https://europe-west4-aiplatform.googleapis.com/]
            serviceAttachment: projects/o9457b320a852208e-tp/regions/europe-west4/serviceAttachments/gkedpm-52065579567eaf39bfe24f25f7981d
        

      After you get the service attachment, you have the following options to access the model:

      View or manage an endpoint

      To view and manage your endpoint, go to the Agent Platform Online prediction page.

      Go to Online prediction

      Agent Platform lists all endpoints in your project for a particular region. Click an endpoint to view its details such as which models are deployed to the endpoint.

      Undeploy models and delete resources

      To stop a deployed model from using resources in your project, undeploy your model from its endpoint. You must undeploy a model before you can delete the endpoint and the model.

      Undeploy models

      Undeploy a model from its endpoint.

    Console

  • In the Google Cloud console, go to the Endpoints tab on the Online prediction page.

    Go to Endpoints

  • In the Region drop-down list, choose the region where your endpoint is located.

  • Click the endpoint name to open the details page.

  • On the row for the model, click Actions, and then select Undeploy model from endpoint.

  • In the Undeploy model from endpoint dialog, click Undeploy.

    Python

    To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

    In your code, replace:

    from google.cloud import aiplatform
    
    aiplatform.init(project=PROJECT_ID, location=LOCATION)
    
    # To find out which endpoints are available, un-comment the line below:
    # endpoints = aiplatform.Endpoint.list()
    
    endpoint = aiplatform.Endpoint(ENDPOINT_ID)
    endpoint.undeploy_all()
    

    gcloud

    In these commands, replace:

    1. Find the endpoint ID that is associated with your deployment by running the gcloud ai endpoints list command.

      gcloud ai endpoints list \
          --project=PROJECT_ID \
          --region=LOCATION_ID
      
    2. Find the model ID by running the gcloud ai models list command.

      gcloud ai models list \
          --project=PROJECT_ID \
          --region=LOCATION_ID
      
    3. Use the model ID from the previous command to get the deployed model ID by running the gcloud ai models describe command.

      gcloud ai models describe MODEL_ID \
          --project=PROJECT_ID \
          --region=LOCATION_ID
      

      The abbreviated output looks like the following example. In the output, the ID is called deployedModelId.

      Using endpoint [https://us-central1-aiplatform.googleapis.com/]
      artifactUri: [URI removed]
      baseModelSource:
        modelGardenSource:
          publicModelName: publishers/google/models/gemma2
      ...
      deployedModels:
      -   deployedModelId: '1234567891234567891'
        endpoint: projects/12345678912/locations/us-central1/endpoints/12345678912345
      displayName: gemma2-2b-it-12345678912345
      etag: [ETag removed]
      modelSourceInfo:
        sourceType: MODEL_GARDEN
      name: projects/123456789123/locations/us-central1/models/gemma2-2b-it-12345678912345
      ...
      
    4. Run the gcloud ai endpoints undeploy-model command to undeploy the model from the endpoint by using the endpoint ID and the deployed model ID from the previous commands.

      gcloud ai endpoints undeploy-model ENDPOINT_ID \
          --project=PROJECT_ID \
          --region=LOCATION_ID \
          --deployed-model-id=DEPLOYED_MODEL_ID
      

      This command produces no output.

      Delete endpoints

      Delete the Gemini Enterprise Agent Platform endpoint that was associated with your model deployment.

    Console

  • In the Google Cloud console, go to the Endpoints tab on the Online prediction page.

    Go to Endpoints

  • In the Region drop-down list, choose the region your endpoint is located.

  • At the end of the endpoint's row, click Actions, and then select Delete endpoint.

  • In the confirmation prompt, click Confirm.

    Python

    To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

    In your code, replace:

    from google.cloud import aiplatform
    
    aiplatform.init(project=PROJECT_ID, location=LOCATION)
    
    # To find out which endpoints are available, un-comment the line below:
    # endpoints = aiplatform.Endpoint.list()
    
    endpoint = aiplatform.Endpoint(ENDPOINT_ID)
    endpoint.delete()
    

    gcloud

    In these commands, replace:
    1. Get the endpoint ID to delete by running the gcloud ai endpoints list command. This command lists the endpoint IDs for all endpoints in your project.

      gcloud ai endpoints list \
          --project=PROJECT_ID \
          --region=LOCATION_ID
      
    2. Run the gcloud ai endpoints delete command to delete the endpoint.

      gcloud ai endpoints delete ENDPOINT_ID \
          --project=PROJECT_ID \
          --region=LOCATION_ID
      

      When prompted, type y to confirm. This command produces no output.

      Delete models

      Delete the model resource that was associated with your model deployment.

    Console

  • Go to the Model Registry page from the Agent Platform section in the Google Cloud console.

    Go to the Model Registry page

  • In the Region drop-down list, choose the region where you deployed your model.

  • On the row for your model, click Actions and then select Delete model.

    When you delete the model, all associated model versions and evaluations are deleted from your Google Cloud project.

  • In the confirmation prompt, click Delete.

    Python

    To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

    In your code, replace:

    from google.cloud import aiplatform
    
    aiplatform.init(project=PROJECT_ID, location=LOCATION)
    
    # To find out which models are available in Model Registry, un-comment the line below:
    # models = aiplatform.Model.list()
    
    model = aiplatform.Model(MODEL_ID)
    model.delete()
    

    gcloud

    In these commands, replace:
    1. Find the model ID to delete by running the gcloud ai models list command.

      gcloud ai models list \
          --project=PROJECT_ID \
          --region=LOCATION_ID
      
    2. Run the gcloud ai models delete command to delete the model by providing the model ID and the model's location.

      gcloud ai models delete MODEL_ID \
          --project=PROJECT_ID \
          --region=LOCATION_ID
      

    View code samples

    Most of the model cards for task-specific solutions models contain code samples that you can copy and test.

    1. In the Google Cloud console, go to the Model Garden page.

      Go to Model Garden

    2. Find a supported model that you want to view code samples for and click the Documentation tab.

    3. The page scrolls to the documentation section with sample code embedded in place.

    Create a vision app

    The model cards for applicable computer vision models support creating a vision application.

    1. In the Google Cloud console, go to the Model Garden page.

      Go to Model Garden

    2. Find a vision model in the Task specific solutions section that you want to use to create a vision application and click View details.

    3. Click Build app.

      You're taken to Vertex AI Vision.

    4. In Application name, enter a name for your application and click Continue.

    5. Select a billing plan and click Create.

      You're taken to Vertex AI Vision Studio where you can continue creating your computer vision application.