This page describes how to prepare Avro files that you exported from non- Spanner databases and then import those files into Spanner. These procedures include information for both GoogleSQL-dialect databases and PostgreSQL-dialect databases. If you want to import a Spanner database that you previously exported, see Import Spanner Avro files.
The process uses Dataflow; it imports data from a Cloud Storage bucket that contains a set of Avro files and a JSON manifest file that specifies the destination tables and Avro files that populate each table.
To import a Spanner database, first you need to enable the Spanner, Cloud Storage, Compute Engine, and Dataflow APIs:
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM
role (roles/serviceusage.serviceUsageAdmin), which
contains the serviceusage.services.enable permission. Learn how to grant
roles.
You also need enough quota and the required IAM permissions.
The quota requirements for import jobs are as follows:
Compute Engine: Before running your import job, you must set up initial quotas for Compute Engine, which Dataflow uses. These quotas represent the maximum number of resources that you allow Dataflow to use for your job. Recommended starting values are:
Generally, you don't have to make any other adjustments. Dataflow provides autoscaling so that you only pay for the actual resources used during the import. If your job can make use of more resources, the Dataflow UI displays a warning icon. The job should finish even if there is a warning icon.
To get the permissions that you need to export a database, ask your administrator to grant you the following IAM roles on your Dataflow worker service account:
roles/spanner.viewer)roles/dataflow.worker)roles/storage.admin)roles/spanner.databaseReader)roles/spanner.databaseAdmin)
The import process brings data in from Avro files located in a Cloud Storage bucket. You can export data in Avro format from any source and can use any available method to do so.
To export data from a non-Spanner database to Avro files, follow these steps:
Keep the following things in mind when exporting your data:
Each column in your Avro files must use one of the following column types:
ARRAYBOOLBYTES*DOUBLEFLOATINTLONG†STRING‡* A column of type BYTES is used to import a Spanner NUMERIC; see
the following recommended mappings section for details.
†,‡ You can import a LONG storing a timestamp or a STRING storing a
timestamp as a Spanner TIMESTAMP; see the following
recommended mappings section for details.
You don't have to include or generate any metadata when you export the Avro files.
You don't have to follow any particular naming convention for your files.
If you don't export your files directly to Cloud Storage, you must upload the Avro files to a Cloud Storage bucket. For detailed instructions, see Upload objects to your Cloud Storage.
To import Avro files from a non-Spanner database to Spanner, follow these steps:
spanner-export.json file in your Cloud Storage bucket.Before you run your import, you must create the target table in Spanner and define its schema.
You must create a schema that uses the appropriate column type for each column in the Avro files.
| Avro column type | Spanner column type |
|---|---|
ARRAY |
ARRAY |
BOOL |
BOOL |
BYTES |
|
DOUBLE |
FLOAT64 |
FLOAT |
FLOAT64 |
INT |
INT64 |
LONG |
|
STRING |
|
| Avro column type | Spanner column type |
|---|---|
ARRAY |
ARRAY |
BOOL |
BOOLEAN |
BYTES |
|
DOUBLE |
DOUBLE PRECISION |
FLOAT |
DOUBLE PRECISION |
INT |
BIGINT |
LONG |
|
STRING |
|
You must also create a file named spanner-export.json in your
Cloud Storage bucket. This file specifies the database dialect and
contains a tables array that lists
the name and data file locations for each table.
The contents of the file have the following format:
{ "tables": [ { "name": "TABLE1", "dataFiles": [ "RELATIVE/PATH/TO/TABLE1_FILE1", "RELATIVE/PATH/TO/TABLE1_FILE2" ] }, { "name": "TABLE2", "dataFiles": ["RELATIVE/PATH/TO/TABLE2_FILE1"] } ], "dialect":"DATABASE_DIALECT" }
Where DATABASE_DIALECT = {GOOGLE_STANDARD_SQL | POSTGRESQL}
If the dialect element is omitted, the dialect defaults to
GOOGLE_STANDARD_SQL.
To start your import job, follow the instructions for using the Google Cloud CLI to run a job with the Avro to Spanner template.
After you have started an import job, you can see details about the job in the Google Cloud console.
After the import job is finished, add any necessary secondary indexes and foreign-keys.
You might want to choose a different region based on the location of your Cloud Storage bucket. To avoid outbound data transfer charges, choose a region that matches your Cloud Storage bucket's location.
If your Cloud Storage bucket location is a region, you can take advantage of free network usage by choosing the same region for your import job, assuming that region is available.
If your Cloud Storage bucket location is a dual-region, you can take advantage of free network usage by choosing one of the two regions that make up the dual-region for your import job, assuming one of the regions is available.
After you start an import job, you can view details of the job, including logs, in the Dataflow section of the Google Cloud console.
To see details for any import or export jobs that you ran within the last week, including any jobs that are running now:
In the database Import/Export page, click the job name in the Dataflow job name column:

The Google Cloud console displays details of the Dataflow job.
To view a job that you ran more than one week ago:
Go to the Dataflow jobs page in the Google Cloud console.
Find your job in the list, then click its name.
The Google Cloud console displays details of the Dataflow job.
To view a Dataflow job's logs, navigate to the job's details page, then click Logs to the right of the job's name.
If a job fails, look for errors in the logs. If there are errors, the error count displays next to Logs:

To view job errors:
Click the error count next to Logs.
The Google Cloud console displays the job's logs. You may need to scroll to see the errors.
Locate entries with the error icon
.
Click an individual log entry to expand its contents.
For more information about troubleshooting Dataflow jobs, see Troubleshoot your pipeline.
If you see the following errors in your job logs:
com.google.cloud.spanner.SpannerException: NOT_FOUND: Session not found --or-- com.google.cloud.spanner.SpannerException: DEADLINE_EXCEEDED: Deadline expired before operation could complete.
Check the 99% Write latency in the Monitoring tab of your Spanner database in the Google Cloud console. If it is showing high (multiple second) values, then it indicates that the instance is overloaded, causing writes to timeout and fail.
One cause of high latency is that the Dataflow job is running using too many workers, putting too much load on the Spanner instance.
To specify a limit on the number of Dataflow workers, instead of using the Import/Export tab in the instance details page of your Spanner database in the Google Cloud console, you must start the import using the Dataflow Cloud Storage Avro to Spanner template and specify the maximum number of workers as described:If you are using the Dataflow console, the Max workers parameter is located in the Optional parameters section of the Create job from template page.
Run the gcloud dataflow jobs run
command, and specify the max-workers argument. For example:
gcloud dataflow jobs run my-import-job \
--gcs-location='gs://dataflow-templates/latest/GCS_Avro_to_Cloud_Spanner' \
--region=us-central1 \
--parameters='instanceId=test-instance,databaseId=example-db,inputDir=gs://my-gcs-bucket' \
--max-workers=10 \
--network=network-123
The following error might occur when you export your Spanner databases:
Workflow failed. Causes: Error: Message: Invalid value for field 'resource.properties.networkInterfaces[0].subnetwork': ''. Network interface must specify a subnet if the network resource is in custom subnet mode. HTTP Code: 400
This error occurs because Spanner assumes that you intend to use
an auto mode VPC network named default in the same project as the
Dataflow job. If you don't have a default VPC network in the
project, or if your VPC network is in a custom mode VPC network, then you must
create a Dataflow job and
specify an alternate network or subnetwork.
If you have followed the suggestions in initial settings, you should generally not have to make any other adjustments. If your job is running slowly, there are a few other optimizations you can try:
Optimize the job and data location: Run your Dataflow job in the same region where your Spanner instance and Cloud Storage bucket are located.
Ensure sufficient Dataflow resources: If the
relevant Compute Engine quotas
limit your Dataflow job's resources, the job's
Dataflow page in the Google Cloud console
displays a warning icon
and log
messages:

In this situation, increasing the quotas for CPUs, in-use IP addresses, and standard persistent disk might shorten the run time of the job, but you might incur more Compute Engine charges.
Check the Spanner CPU utilization: If you see that the CPU utilization for the instance is over 65%, you can increase the compute capacity in that instance. The capacity adds more Spanner resources and the job should speed up, but you incur more Spanner charges.
Several factors influence the time it takes to complete an import job.
Spanner database size: Processing more data takes more time and resources.
Spanner database schema, including:
Data location: Data is transferred between Spanner and Cloud Storage using Dataflow. Ideally all three components are located in the same region. If the components are not in the same region, moving the data across regions slows the job down.
Number of Dataflow workers: Optimal Dataflow workers are necessary for good performance. By using autoscaling, Dataflow chooses the number of workers for the job depending on the amount of work that needs to be done. The number of workers will, however, be capped by the quotas for CPUs, in-use IP addresses, and standard persistent disk. The Dataflow UI displays a warning icon if it encounters quota caps. In this situation, progress is slower, but the job should still complete. Autoscaling can overload Spanner leading to errors when there is a large amount of data to import.
Existing load on Spanner: An import job adds significant CPU load on a Spanner instance. If the instance already has a substantial existing load, then the job runs more slowly.
Amount of Spanner compute capacity: If the CPU utilization for the instance is over 65%, then the job runs more slowly.
When starting a Spanner import job, Dataflow workers must be set to an optimal value for good performance. Too many workers overloads Spanner and too few workers results in an underwhelming import performance.
The maximum number of workers is heavily dependent on the data size, but ideally, the total Spanner CPU utilization should be between 70% to 90%. This provides a good balance between Spanner efficiency and error-free job completion.
To achieve that utilization target in the majority of schemas and scenarios, we recommend a max number of worker vCPUs between 4-6x the number of Spanner nodes.
For example, for a 10 node Spanner instance, using n1-standard-2 workers, you would set max workers to 25, giving 50 vCPUs.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2026-06-09 UTC.