Create an AI-optimized A4X Max instance

This document describes how to create standalone Compute Engine instances that use A4X Max accelerator-optimized machine types. To learn about the options for creating compute instances and clusters, see the Deployment options overview page.

A4X Max instance type

A Compute Engine instance, or compute instance, is a computing resource hosted on Google's infrastructure that can be either a virtual machine (VM) or a bare metal instance. A4X Max instances are available as bare metal instances, which differ from VM instances by providing direct, non-virtualized access to the underlying physical hardware. To learn more about the A4X Max machine type, see A4X Max series in the Compute Engine documentation.

Limitations

Before you begin

Before creating A4X Max instances, if you haven't already done so, complete the following steps:

Required roles

This predefined role contains the permissions required to create compute instances. To see the exact permissions that are required, expand the Required permissions section:

A4X Max fundamentals

An A4X Max cluster is organized into a hierarchy of blocks and sub-blocks to facilitate large-scale, non-blocking network performance. Understanding this topology is key when reserving capacity and deploying workloads.

Overview

Topology (`gpuTopology`)	Number of GPUs	Number of instances
`1x72`	72	18

Tip: To use an NVLink domain with the 1x72 topology, either run the single-instance creation command 18 times or call the instances.bulkInsert method, which creates multiple instances with a single API request. For the bulk workflow, see Create A4X Max instances in bulk.
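
As a quick sanity check on the table above, the following sketch derives how many GPUs each instance contributes to the NVLink domain. The variable names are illustrative; the two values come from the table.

```shell
# Values taken from the topology table (1x72).
gpus_in_domain=72
instances_in_domain=18

# Integer division: GPUs contributed by each instance.
gpus_per_instance=$(( gpus_in_domain / instances_in_domain ))
echo "GPUs per instance: ${gpus_per_instance}"
```

The result, 4, appears consistent with the 4g suffix in the a4x-maxgpu-4g-metal machine type name.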

Creating an instance with the A4X Max machine type includes the following steps:

1. Create VPC networks
2. Create a compact placement policy
3. Create an A4X Max instance

Create VPC networks

To set up the network for A4X Max machine types, create two VPC networks: one for the two IDPF network interfaces and one for the eight MRDMA network interfaces.

Create the networks either manually by following the instruction guides or automatically by using the provided script.

Create a compact placement policy

Create an A4X Max instance

To obtain a GPU topology of 1x72, create 18 A4X Max instances. When you create the instances, apply the compact placement policy that specifies the gpuTopology field. Applying the policy ensures that Compute Engine creates all 18 A4X Max instances in one sub-block to use an NVLink domain. If a sub-block lacks capacity for an A4X Max instance, then the request to create the instance fails.
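
The 18-instance requirement can be sketched as a dry-run loop that only prints the commands it would run. This is a minimal sketch, not a complete workflow: NAME_PREFIX and the zero-padded naming scheme are hypothetical, and OTHER_FLAGS stands in for the remaining flags of the single-instance command in this section.

```shell
# Hypothetical values for illustration; replace them with your own.
NAME_PREFIX="a4x-max-node"
INSTANCE_COUNT=18          # instances required for the 1x72 topology
POLICY_NAME="POLICY_NAME"  # compact placement policy that sets gpuTopology

# Print one create command per instance without running anything.
print_create_commands() {
  i=1
  while [ "$i" -le "$INSTANCE_COUNT" ]; do
    # Zero-padded names (for example, a4x-max-node-07) sort in creation order.
    name=$(printf '%s-%02d' "$NAME_PREFIX" "$i")
    echo "gcloud compute instances create ${name} --resource-policies=${POLICY_NAME} OTHER_FLAGS"
    i=$(( i + 1 ))
  done
}

print_create_commands
```

Printing the commands first makes it easy to review all 18 names and confirm that each one references the same placement policy. Because a request fails when the sub-block lacks capacity, a real run would likely stop at the first failed command instead of continuing.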

gcloud

To create the A4X Max instance, use the gcloud compute instances create command.

gcloud compute instances create INSTANCE_NAME  \
    --machine-type=a4x-maxgpu-4g-metal \
    --image-family=IMAGE_FAMILY \
    --image-project=IMAGE_PROJECT \
    --zone=ZONE \
    --boot-disk-type=hyperdisk-balanced \
    --boot-disk-size=DISK_SIZE \
    --scopes=cloud-platform \
    --network-interface=nic-type=IDPF,network=IDPF_NETWORK_PREFIX-net,stack-type=IPV6_ONLY,subnet=IDPF_NETWORK_PREFIX-sub-0 \
    --network-interface=nic-type=IDPF,network=IDPF_NETWORK_PREFIX-net,stack-type=IPV6_ONLY,subnet=IDPF_NETWORK_PREFIX-sub-1,no-address \
    --network-interface=subnet=default-subnet-1-RDMA_NAME_PREFIX-net,stack-type=IPV6_ONLY,nic-type=MRDMA \
    --network-interface=subnet=default-subnet-1-RDMA_NAME_PREFIX-net,stack-type=IPV6_ONLY,nic-type=MRDMA \
    --network-interface=subnet=default-subnet-1-RDMA_NAME_PREFIX-net,stack-type=IPV6_ONLY,nic-type=MRDMA \
    --network-interface=subnet=default-subnet-1-RDMA_NAME_PREFIX-net,stack-type=IPV6_ONLY,nic-type=MRDMA \
    --network-interface=subnet=default-subnet-1-RDMA_NAME_PREFIX-net,stack-type=IPV6_ONLY,nic-type=MRDMA \
    --network-interface=subnet=default-subnet-1-RDMA_NAME_PREFIX-net,stack-type=IPV6_ONLY,nic-type=MRDMA \
    --network-interface=subnet=default-subnet-1-RDMA_NAME_PREFIX-net,stack-type=IPV6_ONLY,nic-type=MRDMA \
    --network-interface=subnet=default-subnet-1-RDMA_NAME_PREFIX-net,stack-type=IPV6_ONLY,nic-type=MRDMA \
    --reservation-affinity=specific \
    --reservation=RESERVATION \
    --provisioning-model=RESERVATION_BOUND \
    --instance-termination-action=TERMINATION_ACTION \
    --maintenance-policy=TERMINATE \
    --restart-on-failure \
    --resource-policies=POLICY_NAME

Replace the following:

INSTANCE_NAME: the name of the A4X Max instance.
IMAGE_FAMILY: the image family of the OS image that you want to use. For a list of supported operating systems, see Operating system details.
IMAGE_PROJECT: the project ID of the OS image.
ZONE: the zone in which the machine type that you want to use is available. You must use a zone in the same region as the compact placement policy. For information about regions, see GPU availability by regions and zones.
DISK_SIZE: the size of the boot disk in GB.
IDPF_NETWORK_PREFIX: the name prefix that you specified when creating the VPC networks and subnets that use IDPF NICs.
RDMA_NAME_PREFIX: the name prefix that you specified when creating the VPC networks and subnets that use RDMA NICs.
RESERVATION: the reservation name, a block, or a subblock within a reservation. To get the reservation name or the available blocks, see View reserved capacity. Based on your requirements for instance placement, choose one of the following:
- To create A4X Max instances on any single block:
```
projects/RESERVATION_OWNER_PROJECT_ID/reservations/RESERVATION_NAME
```
- To create A4X Max instances on a specific block:
```
projects/RESERVATION_OWNER_PROJECT_ID/reservations/RESERVATION_NAME/reservationBlocks/RESERVATION_BLOCK_NAME
```
- To create A4X Max instances in a specific subblock:
```
projects/RESERVATION_OWNER_PROJECT_ID/reservations/RESERVATION_NAME/reservationBlocks/RESERVATION_BLOCK_NAME/reservationSubBlocks/RESERVATION_SUBBLOCK_NAME
```
Tip: If the reservation exists in the current project, then you can omit projects/RESERVATION_OWNER_PROJECT_ID/reservations/ from the reservation value.
TERMINATION_ACTION: whether Compute Engine stops (STOP) or deletes (DELETE) the A4X Max instance at the end of the reservation period.
POLICY_NAME: the name of the compact placement policy.
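
The three RESERVATION formats listed above differ only in how many path segments they append. The following sketch assembles the value from its parts; the function name reservation_value and the sample names (my-project, my-reservation, and so on) are hypothetical.

```shell
# Build the value for --reservation from its parts.
# Usage: reservation_value OWNER_PROJECT RESERVATION [BLOCK [SUBBLOCK]]
reservation_value() {
  # A sub-block only makes sense inside a named block.
  if [ -n "${4:-}" ] && [ -z "${3:-}" ]; then
    echo "a sub-block requires a block name" >&2
    return 1
  fi
  value="projects/$1/reservations/$2"
  if [ -n "${3:-}" ]; then
    value="${value}/reservationBlocks/$3"
  fi
  if [ -n "${4:-}" ]; then
    value="${value}/reservationSubBlocks/$4"
  fi
  printf '%s\n' "$value"
}

# Any single block in the reservation.
reservation_value my-project my-reservation
# A specific block.
reservation_value my-project my-reservation my-block
# A specific sub-block.
reservation_value my-project my-reservation my-block my-subblock
```

The output of the function can be passed to the --reservation flag, for example as --reservation="$(reservation_value my-project my-reservation my-block)".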

REST

To create the A4X Max instance, make a POST request to the instances.insert method.

POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances
{
  "machineType": "projects/PROJECT_ID/zones/ZONE/machineTypes/a4x-maxgpu-4g-metal",
  "name": "INSTANCE_NAME",
  "disks":[
    {
      "boot":true,
      "initializeParams":{
        "diskSizeGb": "DISK_SIZE",
        "diskType": "hyperdisk-balanced",
        "sourceImage": "projects/IMAGE_PROJECT/global/images/family/IMAGE_FAMILY"
      },
      "mode": "READ_WRITE",
      "type": "PERSISTENT"
    }
  ],
  "serviceAccounts": [
    {
      "email": "default",
      "scopes": [
        "https://www.googleapis.com/auth/cloud-platform"
      ]
    }
  ],
  "networkInterfaces": [
    {
      "accessConfigs": [
        {
          "name": "external-nat",
          "type": "ONE_TO_ONE_NAT"
        }
      ],
      "network": "projects/NETWORK_PROJECT_ID/global/networks/IDPF_NETWORK_PREFIX-net",
      "nicType": "IDPF",
      "stackType": "IPV6_ONLY",
      "subnetwork": "projects/NETWORK_PROJECT_ID/regions/REGION/subnetworks/IDPF_NETWORK_PREFIX-sub-0"
    },
    {
      "network": "projects/NETWORK_PROJECT_ID/global/networks/IDPF_NETWORK_PREFIX-net",
      "nicType": "IDPF",
      "stackType": "IPV6_ONLY",
      "subnetwork": "projects/NETWORK_PROJECT_ID/regions/REGION/subnetworks/IDPF_NETWORK_PREFIX-sub-1"
    },
    {
      "subnetwork": "projects/NETWORK_PROJECT_ID/regions/REGION/subnetworks/default-subnet-1-RDMA_NAME_PREFIX-net",
      "nicType": "MRDMA",
      "stackType": "IPV6_ONLY"
    },
    {
      "subnetwork": "projects/NETWORK_PROJECT_ID/regions/REGION/subnetworks/default-subnet-1-RDMA_NAME_PREFIX-net",
      "nicType": "MRDMA",
      "stackType": "IPV6_ONLY"
    },
    {
      "subnetwork": "projects/NETWORK_PROJECT_ID/regions/REGION/subnetworks/default-subnet-1-RDMA_NAME_PREFIX-net",
      "nicType": "MRDMA",
      "stackType": "IPV6_ONLY"
    },
    {
      "subnetwork": "projects/NETWORK_PROJECT_ID/regions/REGION/subnetworks/default-subnet-1-RDMA_NAME_PREFIX-net",
      "nicType": "MRDMA",
      "stackType": "IPV6_ONLY"
    },
    {
      "subnetwork": "projects/NETWORK_PROJECT_ID/regions/REGION/subnetworks/default-subnet-1-RDMA_NAME_PREFIX-net",
      "nicType": "MRDMA",
      "stackType": "IPV6_ONLY"
    },
    {
      "subnetwork": "projects/NETWORK_PROJECT_ID/regions/REGION/subnetworks/default-subnet-1-RDMA_NAME_PREFIX-net",
      "nicType": "MRDMA",
      "stackType": "IPV6_ONLY"
    },
    {
      "subnetwork": "projects/NETWORK_PROJECT_ID/regions/REGION/subnetworks/default-subnet-1-RDMA_NAME_PREFIX-net",
      "nicType": "MRDMA",
      "stackType": "IPV6_ONLY"
    },
    {
      "subnetwork": "projects/NETWORK_PROJECT_ID/regions/REGION/subnetworks/default-subnet-1-RDMA_NAME_PREFIX-net",
      "nicType": "MRDMA",
      "stackType": "IPV6_ONLY"
    }
  ],
  "reservationAffinity":{
    "consumeReservationType": "SPECIFIC_RESERVATION",
    "key": "compute.googleapis.com/reservation-name",
    "values":[
      "RESERVATION"
    ]
  },
  "scheduling":{
    "provisioningModel": "RESERVATION_BOUND",
    "instanceTerminationAction": "TERMINATION_ACTION",
    "onHostMaintenance": "TERMINATE",
    "automaticRestart": true
  },
  "resourcePolicies": [
    "projects/PROJECT_ID/regions/REGION/resourcePolicies/POLICY_NAME"
  ]
}

Replace the following:

PROJECT_ID: the project ID of the project where you want to create the A4X Max instance.
ZONE: the zone in which the machine type that you want to use is available. You must use a zone in the same region as the compact placement policy. For information about regions, see GPU availability by regions and zones.
INSTANCE_NAME: the name of the A4X Max instance.
DISK_SIZE: the size of the boot disk in GB.
IMAGE_PROJECT: the project ID of the OS image.
IMAGE_FAMILY: the image family of the OS image that you want to use. For a list of supported operating systems, see Operating system details.
NETWORK_PROJECT_ID: the project ID of the network.
IDPF_NETWORK_PREFIX: the name prefix that you specified when creating the VPC networks and subnets that use IDPF NICs.
REGION: the region of the subnetwork.
RDMA_NAME_PREFIX: the name prefix that you specified when creating the VPC networks and subnets that use RDMA NICs.
RESERVATION: the reservation name, a block, or a subblock within a reservation. To get the reservation name or the available blocks, see View reserved capacity. Based on your requirements for instance placement, choose one of the following:
- To create A4X Max instances on any single block:
```
projects/RESERVATION_OWNER_PROJECT_ID/reservations/RESERVATION_NAME
```
- To create A4X Max instances on a specific block:
```
projects/RESERVATION_OWNER_PROJECT_ID/reservations/RESERVATION_NAME/reservationBlocks/RESERVATION_BLOCK_NAME
```
- To create A4X Max instances in a specific subblock:
```
projects/RESERVATION_OWNER_PROJECT_ID/reservations/RESERVATION_NAME/reservationBlocks/RESERVATION_BLOCK_NAME/reservationSubBlocks/RESERVATION_SUBBLOCK_NAME
```
Tip: If the reservation exists in the current project, then you can omit projects/RESERVATION_OWNER_PROJECT_ID/reservations/ from the reservation value.
TERMINATION_ACTION: whether Compute Engine stops (STOP) or deletes (DELETE) the A4X Max instance at the end of the reservation period.
PROJECT_ID: the project ID of the compact placement policy.
REGION: the region of the compact placement policy.
POLICY_NAME: the name of the compact placement policy.
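
The eight MRDMA entries in the networkInterfaces array of the request body are identical, so they can be generated rather than typed by hand. The following is a minimal sketch under that assumption; the function name mrdma_interfaces is hypothetical, and the subnetwork URL uses the same placeholders as the request body.

```shell
# Print COUNT identical MRDMA interface objects as a comma-separated JSON fragment.
# Usage: mrdma_interfaces COUNT SUBNETWORK_URL
mrdma_interfaces() {
  n=1
  while [ "$n" -le "$1" ]; do
    # Every object except the last is followed by a comma.
    if [ "$n" -eq "$1" ]; then sep=""; else sep=","; fi
    printf '    {\n      "subnetwork": "%s",\n      "nicType": "MRDMA",\n      "stackType": "IPV6_ONLY"\n    }%s\n' "$2" "$sep"
    n=$(( n + 1 ))
  done
}

SUBNET="projects/NETWORK_PROJECT_ID/regions/REGION/subnetworks/default-subnet-1-RDMA_NAME_PREFIX-net"
mrdma_interfaces 8 "$SUBNET"
```

The printed fragment is meant to be pasted after the two IDPF entries inside networkInterfaces. Generating it reduces the chance of a copy-and-paste slip in one of the eight entries.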
