This document describes the Compute Engine instances in the accelerator-optimized machine family that have Tensor Processing Units (TPUs). TPUs are Google's custom-developed, application-specific integrated circuits (ASICs) that are optimized specifically for artificial intelligence (AI) and machine learning (ML) workloads.
Compute Engine supports the following TPU versions:
- TPU7x
- TPU v6e
- TPU v5p
Each machine type within a version has a specific topology and a specific number of attached TPU chips.
Fundamentals of TPU architecture
Understanding the fundamentals of TPU architecture helps you choose the right TPU version and machine type for your workload.
TPU chip: A TPU chip is a specialized accelerator designed by Google for machine learning. Each TPU chip contains one or more TensorCores that handle large matrix operations. Each TensorCore contains one or more matrix-multiply units (MXUs), which use a systolic array architecture to perform thousands of multiply-accumulate operations per cycle without going back to memory for every intermediate result. Although the chip is primarily used for high-speed matrix processing, it also includes vector and scalar units for general computation and control flow operations.
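To make the systolic multiply-accumulate idea concrete, the following pure-Python sketch simulates a small systolic array computing C = A × B. It is a teaching model under simplifying assumptions (one multiply-accumulate per processing element per cycle, skewed input feeding, and an output-stationary dataflow chosen for illustration), not a description of how an MXU is actually wired. The function name `systolic_matmul` is hypothetical.

```python
def systolic_matmul(a, b):
    """Simulate an output-stationary systolic array computing a @ b.

    Processing element (PE) (i, j) keeps accumulator c[i][j]. Row i of `a`
    enters from the left delayed by i cycles and column j of `b` enters from
    the top delayed by j cycles, so matching operands meet at PE (i, j).
    """
    n, k = len(a), len(a[0])
    m = len(b[0])
    assert len(b) == k, "inner dimensions must match"
    c = [[0] * m for _ in range(n)]
    # Registers holding the operands currently passing through each PE.
    a_reg = [[0] * m for _ in range(n)]
    b_reg = [[0] * m for _ in range(n)]
    total_cycles = n + m + k - 2  # cycles until the last operand pair meets
    for t in range(total_cycles):
        # Move every `a` value one PE to the right (iterate right-to-left
        # so each register copies its neighbor's value from the last cycle).
        for i in range(n):
            for j in range(m - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
        # Move every `b` value one PE down (bottom-to-top for the same reason).
        for j in range(m):
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
        # Feed skewed inputs at the edges; zeros pad outside the valid window.
        for i in range(n):
            step = t - i
            a_reg[i][0] = a[i][step] if 0 <= step < k else 0
        for j in range(m):
            step = t - j
            b_reg[0][j] = b[step][j] if 0 <= step < k else 0
        # Every PE performs one multiply-accumulate in the same cycle,
        # using only operands received from its neighbors.
        for i in range(n):
            for j in range(m):
                c[i][j] += a_reg[i][j] * b_reg[i][j]
    return c


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

In each simulated cycle, all processing elements work in parallel and operands move only between neighbors, which is the property that lets the array reuse data instead of reloading it from memory for every multiplication.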
TPU Pod: A TPU Pod is a contiguous set of TPUs grouped together over a specialized network. The number of TPU chips in a TPU Pod depends on the TPU version.
TPU VM: A TPU VM is a Linux virtual machine that runs on a TPU host and has access to the underlying TPUs. You can connect directly to TPU VMs using SSH. You have root access to the VM, so you can run arbitrary code. You can access compiler and runtime debug logs and error messages.
TPU slice: A logical group of interconnected TPU chips, accessed through one or more TPU VMs. Slices have one of the following scopes:
- Single-host slice: A slice consisting of one host machine. In general, this maps to one TPU VM.
- Multi-host slice: A slice consisting of multiple TPU VMs interconnected using a high-speed inter-chip interconnect (ICI).
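Whether a slice is single-host or multi-host follows from how many chips the slice has and how many chips one host machine attaches. The sketch below takes `chips_per_host` as a value you supply for your machine type; it is an assumption, not something looked up from a Compute Engine API, and the helper names `hosts_needed` and `slice_scope` are hypothetical.

```python
import math


def hosts_needed(num_chips: int, chips_per_host: int) -> int:
    """Return how many host machines (TPU VMs) a slice spans.

    `chips_per_host` is supplied by the caller; check the machine type
    documentation for the real value of your TPU version.
    """
    if num_chips <= 0 or chips_per_host <= 0:
        raise ValueError("chip counts must be positive")
    return math.ceil(num_chips / chips_per_host)


def slice_scope(num_chips: int, chips_per_host: int) -> str:
    """Classify a slice as single-host or multi-host."""
    if hosts_needed(num_chips, chips_per_host) == 1:
        return "single-host"
    return "multi-host"


print(slice_scope(16, 4), hosts_needed(16, 4))
```

For example, with a hypothetical 4 chips per host, a 16-chip slice spans four TPU VMs, which then communicate over ICI.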
TPU cube: A 4x4x4 topology of interconnected TPU chips. This applies only to 3D topologies.
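Topologies are written as dimensions such as `4x4x4`. As a sketch, the chip count is the product of the dimensions, and a 3D topology whose dimensions are all multiples of 4 can be tiled by 4x4x4 cubes. The helper names and the example topology strings are hypothetical and don't imply that any particular TPU version offers them.

```python
from math import prod

CUBE_EDGE = 4  # a TPU cube is 4x4x4 chips


def parse_topology(topology: str) -> tuple[int, ...]:
    """Parse a topology string such as '4x4x8' into a tuple of ints."""
    dims = tuple(int(d) for d in topology.lower().split("x"))
    if any(d <= 0 for d in dims):
        raise ValueError(f"invalid topology: {topology!r}")
    return dims


def chip_count(topology: str) -> int:
    """Total number of chips: the product of the dimensions."""
    return prod(parse_topology(topology))


def cube_count(topology: str) -> int:
    """Number of 4x4x4 cubes that tile a 3D topology, or 0 if none apply.

    Cubes apply only to 3D topologies, so a 2D topology, or a 3D topology
    with a dimension that isn't a multiple of 4, returns 0.
    """
    dims = parse_topology(topology)
    if len(dims) != 3 or any(d % CUBE_EDGE for d in dims):
        return 0
    return prod(d // CUBE_EDGE for d in dims)


for t in ("2x2", "4x4x4", "4x4x8"):
    print(t, chip_count(t), cube_count(t))
```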
SparseCore: SparseCores are dataflow processors that accelerate models using sparse operations. A primary use case is accelerating recommendation models, which rely heavily on embeddings.
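Recommendation models typically look up a few rows from very large embedding tables for each example and combine them. The plain-Python sketch below shows that gather-and-reduce pattern (a sum-combined "embedding bag") to illustrate the kind of sparse operation SparseCores are meant to accelerate. It doesn't use any SparseCore API, and the function and variable names are hypothetical.

```python
def embedding_bag_sum(table, batch_ids):
    """Gather rows of `table` for each example's feature IDs and sum them.

    table: list of embedding vectors (one list of floats per ID).
    batch_ids: one list of IDs per example; the lists can have different
    lengths (ragged), which is typical for sparse features.
    """
    dim = len(table[0])
    outputs = []
    for ids in batch_ids:
        pooled = [0.0] * dim
        for idx in ids:  # sparse gather: read only the listed rows
            row = table[idx]
            for d in range(dim):  # reduce: sum the gathered rows
                pooled[d] += row[d]
        outputs.append(pooled)
    return outputs


table = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
print(embedding_bag_sum(table, [[0, 2], [1], []]))
```

Only the rows named in `batch_ids` are read, so the work grows with the number of IDs rather than the size of the table. This irregular access pattern differs from the dense matrix multiplications that the MXUs handle.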
TPU versions: The exact architecture of a TPU chip depends on the TPU version that you use. Each TPU version also supports different slice sizes and configurations.
For information about how TPUs work, see the TPU architecture document in the Cloud TPU documentation.
Recommended TPU versions by workload type
| TPU version | Primary workload types |
|---|---|
| TPU7x (Ironwood) | |
| TPU v6e (Trillium) | |
| TPU v5p | |
Consumption options
To optimize resource utilization and cost while balancing workload performance, Compute Engine supports the following TPU consumption options:
On-demand: to consume TPUs without arranging capacity in advance. Before requesting resources, you must have enough on-demand quota for the specific type and quantity of TPU VMs. On-demand is the most flexible consumption option; however, there is no guarantee that enough on-demand resources will be available to fulfill your request.
Spot VMs: to get significant discounts by provisioning Spot VMs. Spot VMs can be preempted at any time, with a 30-second warning. For more information, see About Spot VMs.
Flex-start: to provision Flex-start VMs for up to seven days, with Compute Engine automatically allocating the hardware on a best-effort basis based on availability. For more information, see About Flex-start VMs.
Future reservation: to request a future reservation for one year or longer. For more information, see Request a future reservation for one year or longer in the Cloud TPU documentation.
Future reservation in calendar mode: to provision TPU resources for a specified time period of up to 90 days. For more information, see About future reservation requests in calendar mode.
On-demand is the default consumption option for TPUs if you don't specify another option.
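The duration limits in the consumption option descriptions can be encoded as a quick planning filter. This is a hypothetical helper, not a Compute Engine API: the option labels are illustrative strings rather than `gcloud` flag values, it assumes one year equals 365 days, it treats on-demand and Spot as having no duration bound because none is stated above, and it ignores quota, TPU-version availability, and allowlists.

```python
from typing import Optional

# (min_days, max_days) per option, taken from the option descriptions.
# None means no bound is stated. 365 days per year is an assumption.
DURATION_BOUNDS: dict[str, tuple[Optional[int], Optional[int]]] = {
    "on-demand": (None, None),
    "spot": (None, None),  # no stated duration bound, but can be preempted
    "flex-start": (None, 7),
    "future-reservation-calendar": (None, 90),
    "future-reservation": (365, None),
}


def options_for_duration(days: int) -> list[str]:
    """Return the options whose stated duration bounds allow `days`."""
    if days <= 0:
        raise ValueError("duration must be positive")
    matches = []
    for option, (min_days, max_days) in DURATION_BOUNDS.items():
        if min_days is not None and days < min_days:
            continue
        if max_days is not None and days > max_days:
            continue
        matches.append(option)
    return matches


print(options_for_duration(30))
```

Passing 30 days, for example, excludes Flex-start (limited to seven days) and one-year future reservations, leaving on-demand, Spot, and calendar mode.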
For information about the provisioning model that underlies each consumption option, see About VM provisioning models.
Consumption option availability by TPU version
The following table summarizes the availability of each consumption option by TPU version.
| TPU version | On-demand | Spot | Flex-start | On-demand reservations | Future reservations | Future reservations in calendar mode |
|---|---|---|---|---|---|---|
| 1 | 1 | 1 | ||||
1 Spot, Flex-start, and future reservations in calendar mode for TPU7x are restricted by an allowlist. To request access, contact your account team or the sales team.