This tutorial shows you how to use GPUs on Dataflow to process Landsat 8 satellite images and render them as JPEG files. The tutorial is based on the example Processing Landsat satellite images with GPUs.
This tutorial uses billable components of Google Cloud, including Dataflow, Cloud Build, Artifact Registry, and Cloud Storage.
Use the pricing calculator to generate a cost estimate based on your projected usage.
Install the Google Cloud CLI.
If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.
To initialize the gcloud CLI, run the following command:
gcloud init
Create or select a Google Cloud project.
Roles required to select or create a project: to create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission.
Create a Google Cloud project:
gcloud projects create PROJECT_ID
Replace PROJECT_ID with a name for the Google Cloud project you are creating.
Select the Google Cloud project that you created:
gcloud config set project PROJECT_ID
Replace PROJECT_ID with your Google Cloud project name.
Verify that billing is enabled for your Google Cloud project.
Enable the Dataflow, Cloud Build, and Artifact Registry APIs:
gcloud services enable dataflow.googleapis.com cloudbuild.googleapis.com artifactregistry.googleapis.com
Roles required to enable APIs: to enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission.
If you're using a local shell, then create local authentication credentials for your user account:
gcloud auth application-default login
You don't need to do this if you're using Cloud Shell.
If an authentication error is returned, and you are using an external identity provider (IdP), confirm that you have signed in to the gcloud CLI with your federated identity.
Grant roles to your user account. Run the following command once for each of the following IAM roles: roles/iam.serviceAccountUser
gcloud projects add-iam-policy-binding PROJECT_ID --member="user:USER_IDENTIFIER" --role=ROLE
Replace the following:
PROJECT_ID: Your project ID.
USER_IDENTIFIER: The identifier for your user account. For example, myemail@example.com.
ROLE: The IAM role that you grant to your user account.
Grant roles to your Compute Engine default service account. Run the following command once for each of the following IAM roles: roles/dataflow.admin, roles/dataflow.worker, roles/bigquery.dataEditor, roles/pubsub.editor, roles/storage.objectAdmin, and roles/artifactregistry.reader.
gcloud projects add-iam-policy-binding PROJECT_ID --member="serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com" --role=SERVICE_ACCOUNT_ROLE
Replace the following:
PROJECT_ID: Your project ID.
PROJECT_NUMBER: Your project number. To find your project number, see Identify projects.
SERVICE_ACCOUNT_ROLE: Each individual role.
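Because the service account binding has to be run once per role, it can help to generate the six commands rather than type them by hand. The following is a minimal Python sketch that only builds the command lines; it doesn't call gcloud. The function names binding_commands and printable_commands are hypothetical helpers introduced for illustration, and the project ID and project number passed to them are placeholders.

```python
import shlex

# Roles that this tutorial grants to the Compute Engine default service account.
SERVICE_ACCOUNT_ROLES = [
    "roles/dataflow.admin",
    "roles/dataflow.worker",
    "roles/bigquery.dataEditor",
    "roles/pubsub.editor",
    "roles/storage.objectAdmin",
    "roles/artifactregistry.reader",
]


def binding_commands(project_id, project_number, roles=SERVICE_ACCOUNT_ROLES):
    """Return one gcloud argument list per role (hypothetical helper)."""
    member = (
        f"serviceAccount:{project_number}-compute@developer.gserviceaccount.com"
    )
    return [
        [
            "gcloud", "projects", "add-iam-policy-binding", project_id,
            f"--member={member}",
            f"--role={role}",
        ]
        for role in roles
    ]


def printable_commands(project_id, project_number):
    """Return the same commands as shell-quoted strings for review."""
    return [shlex.join(cmd) for cmd in binding_commands(project_id, project_number)]
```

Printing the result of printable_commands("my-project", "123456789012") gives six lines that you can review and paste into a shell. Each argument list could also be passed to subprocess.run if you prefer to run the commands from Python.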
Download the starter files, and then create your Artifact Registry repository.
Download the starter files and then change directories.
Clone the python-docs-samples repository.
git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git
Navigate to the sample code directory.
cd python-docs-samples/dataflow/gpu-examples/tensorflow-landsat
Create an Artifact Registry repository so that you can upload artifacts. Each repository can contain artifacts for a single supported format.
All repository content is encrypted using either Google-owned and Google-managed encryption keys or customer-managed encryption keys. Artifact Registry uses Google-owned and Google-managed encryption keys by default and no configuration is required for this option.
You must have at least Artifact Registry Writer access to the repository.
Run the following command to create a new repository. The command uses the
--async flag and returns immediately, without waiting for the operation in
progress to complete.
gcloud artifacts repositories create REPOSITORY \
--repository-format=docker \
--location=LOCATION \
--async
Replace REPOSITORY with a name for your repository and LOCATION with the location for the repository. For each repository location in a project, repository names must be unique.
Before you can push or pull images, configure Docker to authenticate requests for Artifact Registry. To set up authentication to Docker repositories, run the following command:
gcloud auth configure-docker LOCATION-docker.pkg.dev
The command updates your Docker configuration. You can now connect with Artifact Registry in your Google Cloud project to push images.
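Images in an Artifact Registry Docker repository are addressed as LOCATION-docker.pkg.dev/PROJECT_ID/REPOSITORY/IMAGE:TAG, which is why the authentication command targets the LOCATION-docker.pkg.dev host. As a small illustration, the following Python sketch assembles such a path. The helper name image_uri and the example values are assumptions made for this sketch; the image name that the sample's build.yaml actually uses might differ.

```python
def image_uri(location, project_id, repository, image, tag="latest"):
    """Build an Artifact Registry Docker image path (hypothetical helper)."""
    # The registry host is derived from the repository location.
    host = f"{location}-docker.pkg.dev"
    return f"{host}/{project_id}/{repository}/{image}:{tag}"
```

For example, image_uri("us-central1", "my-project", "my-repo", "landsat-gpu") returns a path on the us-central1-docker.pkg.dev host.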
Cloud Build allows you to build a Docker image using a Dockerfile and save it into Artifact Registry, where the image is accessible to other Google Cloud products.
Build the container image by using the build.yaml config file.
gcloud builds submit --config build.yaml
The following code block demonstrates how to launch this Dataflow pipeline with GPUs.
Run the Dataflow pipeline by using the run.yaml config file.
export PROJECT=PROJECT_NAME
export BUCKET=BUCKET_NAME
export JOB_NAME="satellite-images-$(date +%Y%m%d-%H%M%S)"
export OUTPUT_PATH="gs://$BUCKET/samples/dataflow/landsat/output-images/"
export REGION="us-central1"
export GPU_TYPE="nvidia-tesla-t4"
gcloud builds submit \
--config run.yaml \
--substitutions _JOB_NAME=$JOB_NAME,_OUTPUT_PATH=$OUTPUT_PATH,_REGION=$REGION,_GPU_TYPE=$GPU_TYPE \
--no-source
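Replace the following:
PROJECT_NAME: the Google Cloud project name
BUCKET_NAME: the Cloud Storage bucket name (without the gs:// prefix)
After you run this pipeline, wait for the command to finish. If you exit your shell, you might lose the environment variables that you've set.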
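The value passed to --substitutions is a comma-separated list of KEY=VALUE pairs built from the exported variables. The following Python sketch reproduces that string so that you can inspect it before submitting the build. The function name build_substitutions is a hypothetical helper, the defaults mirror the values exported in the command, and the check on the bucket name is an added safeguard rather than something the sample does.

```python
from datetime import datetime


def build_substitutions(bucket, region="us-central1",
                        gpu_type="nvidia-tesla-t4", now=None):
    """Return the --substitutions value used with run.yaml (hypothetical helper)."""
    if bucket.startswith("gs://"):
        # The bucket name is expected without the gs:// prefix.
        raise ValueError("pass the bucket name without the gs:// prefix")
    now = now or datetime.now()
    values = {
        # Same pattern as: satellite-images-$(date +%Y%m%d-%H%M%S)
        "_JOB_NAME": f"satellite-images-{now:%Y%m%d-%H%M%S}",
        "_OUTPUT_PATH": f"gs://{bucket}/samples/dataflow/landsat/output-images/",
        "_REGION": region,
        "_GPU_TYPE": gpu_type,
    }
    return ",".join(f"{key}={value}" for key, value in values.items())
```

Passing a fixed now value makes the job name reproducible, which is convenient when you compare the generated string with the one your shell produces.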
To avoid sharing the GPU between multiple worker processes, this sample uses a machine type with 1 vCPU. The memory requirements of the pipeline are addressed by using 13 GB of extended memory. For more information, read GPUs and worker parallelism.
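To make the worker shape concrete, the following Python sketch derives a custom machine type name and a GPU request from the numbers in the preceding paragraph. It assumes the Compute Engine custom machine type naming scheme (custom-VCPUS-MEMORY_MB, with an -ext suffix for extended memory) and the Dataflow worker_accelerator service option. The helper name gpu_worker_flags is hypothetical, and the sample's run.yaml might pass the equivalent settings with different flag names.

```python
def gpu_worker_flags(gpu_type="nvidia-tesla-t4", gpu_count=1,
                     vcpus=1, memory_gb=13):
    """Return pipeline flags for a single-vCPU GPU worker (hypothetical helper)."""
    # Custom machine types express memory in MB: 13 GB -> 13312 MB.
    memory_mb = memory_gb * 1024
    # The -ext suffix requests extended memory beyond the per-vCPU default limit.
    machine_type = f"custom-{vcpus}-{memory_mb}-ext"
    # Ask Dataflow to attach the GPU and install the NVIDIA driver on each worker.
    accelerator = f"type:{gpu_type};count:{gpu_count};install-nvidia-driver"
    return [
        f"--machine_type={machine_type}",
        f"--dataflow_service_options=worker_accelerator={accelerator}",
    ]
```

With one vCPU, each worker runs a single process, so that process has the GPU to itself. The extended memory keeps the single-vCPU machine from running short on RAM.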
The pipeline in tensorflow-landsat/main.py processes Landsat 8 satellite images and renders them as JPEG files. Use the following steps to view these files.
List the output JPEG files with details by using the Google Cloud CLI.
gcloud storage ls "gs://$BUCKET/samples/dataflow/landsat/output-images/" --long --readable-sizes
Copy the files into your local directory.
mkdir outputs
gcloud storage cp "gs://$BUCKET/samples/dataflow/landsat/output-images/*" outputs/
Open these image files with the image viewer of your choice.
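Before you open the files, you might want a quick check that the downloads are really JPEG images. The following Python sketch inspects the first three bytes of each file, which for JPEG data are FF D8 FF. The function names looks_like_jpeg and summarize_outputs are hypothetical helpers, and the default directory assumes that you copied the files into outputs/ as in the preceding step.

```python
from pathlib import Path

# JPEG files begin with the start-of-image marker FF D8, followed by FF.
JPEG_MAGIC = b"\xff\xd8\xff"


def looks_like_jpeg(path):
    """Return True if the file starts with the JPEG magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(len(JPEG_MAGIC)) == JPEG_MAGIC


def summarize_outputs(directory="outputs"):
    """Map each file name in the directory to whether it looks like a JPEG."""
    return {
        path.name: looks_like_jpeg(path)
        for path in sorted(Path(directory).iterdir())
        if path.is_file()
    }
```

A file that maps to False was probably truncated during the copy or is not an image, so it is worth downloading again.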
To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.
The easiest way to eliminate billing is to delete the project that you created for the tutorial.
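To delete the project, run the following command, replacing PROJECT_ID with your project ID:
gcloud projects delete PROJECT_ID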
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies.
Last updated 2026-06-09 UTC.