Create Amazon S3 BigLake tables
This document describes how to create an Amazon Simple Storage Service (Amazon S3) BigLake table. A
BigLake table lets you use
access delegation to query data in Amazon S3. Access delegation
decouples access to the BigLake table from access to the
underlying datastore.
This predefined role contains
the permissions required to create an external table. To see the exact permissions that are
required, expand the Required permissions section:
Required permissions
The following permissions are required to create an external table:
Optional: To delete tables automatically, select the Enable table expiration checkbox and set
the Default maximum table age in days. Data in Amazon S3
is not deleted when the table expires.
If you want to use default collation,
expand the Advanced options section and then select the Enable default collation option.
The --project_id parameter overrides the default project.
Replace the following:
LOCATION: the location of your dataset
For information about supported regions, see
Locations.
After you
create a dataset, you can't change its location. You can set a default
value for the location by using the
.bigqueryrc file.
PROJECT_ID: your project ID
DATASET_NAME: the name of the dataset that
you want to create
To create a dataset in a project other than your default project, add the
project ID to the dataset name in the following format:
PROJECT_ID:DATASET_NAME.
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Dataset;
import com.google.cloud.bigquery.DatasetInfo;

// Sample to create an AWS dataset
public class CreateDatasetAws {

  public static void main(String[] args) {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "MY_PROJECT_ID";
    String datasetName = "MY_DATASET_NAME";
    // Note: As of now location only supports aws-us-east-1
    String location = "aws-us-east-1";
    createDatasetAws(projectId, datasetName, location);
  }

  public static void createDatasetAws(String projectId, String datasetName, String location) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      DatasetInfo datasetInfo =
          DatasetInfo.newBuilder(projectId, datasetName).setLocation(location).build();

      Dataset dataset = bigquery.create(datasetInfo);
      System.out.println(
          "Aws dataset created successfully :" + dataset.getDatasetId().getDataset());
    } catch (BigQueryException e) {
      System.out.println("Aws dataset was not created. \n" + e.toString());
    }
  }
}
You can create a BigLake table for Hive partitioned data in
Amazon S3. After you create an externally partitioned table, you can't
change the partition key. You need to recreate the table to change the
partition key.
To create a BigLake table based on Hive partitioned data,
select one of the following options:
Delta Lake is an open source table format that supports petabyte scale data
tables. Delta Lake tables can be queried as both temporary and permanent tables,
and are supported as BigLake
tables.
Schema synchronization
Delta Lake maintains a canonical schema as part of its metadata. You
can't update a schema using a JSON metadata file. To update the schema:
Delta Lake tables are only supported on
BigQuery Omni and have the associated
limitations.
You can't update a table with a new JSON metadata file. You must use an
auto-detect schema table update operation. See Schema
synchronization for more information.
BigLake security features only protect Delta Lake
tables when accessed through BigQuery services.
Create a Delta Lake table
The following example creates an external table by using the CREATE EXTERNAL
TABLE
statement with the
Delta Lake format:
CREATE [OR REPLACE] EXTERNAL TABLE table_name
WITH CONNECTION connection_name
OPTIONS (
format = 'DELTA_LAKE',
uris = ["parent_directory"]
);
Replace the following:
table_name: The name of the table.
connection_name: The name of the connection. The connection must
identify either an
Amazon S3 or a
Blob Storage source.
parent_directory: The URI of the parent directory.
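As a minimal sketch of how the three placeholders fit into the statement, the following Java helper fills in the template above. The class name `DeltaLakeDdl`, its methods, and the sample table, connection, and bucket names are hypothetical and not part of any BigQuery library. The sketch assumes that wrapping the table and connection paths in backticks is acceptable for your names.

```java
import java.util.Objects;

/**
 * Hypothetical helper, not part of the BigQuery client library: fills in the
 * CREATE EXTERNAL TABLE template for a Delta Lake table.
 */
class DeltaLakeDdl {

  /** Wraps a table or connection path in backticks; rejects embedded backticks. */
  static String quoteIdentifier(String name) {
    Objects.requireNonNull(name, "name");
    if (name.isEmpty() || name.contains("`")) {
      throw new IllegalArgumentException("Invalid identifier: " + name);
    }
    return "`" + name + "`";
  }

  /** Renders the statement; orReplace selects the optional OR REPLACE clause. */
  static String createExternalTable(
      String tableName, String connectionName, String parentDirectory, boolean orReplace) {
    Objects.requireNonNull(parentDirectory, "parentDirectory");
    // Escape backslashes and double quotes so the URI stays one string literal.
    String uri = parentDirectory.replace("\\", "\\\\").replace("\"", "\\\"");
    return String.join(
        "\n",
        "CREATE " + (orReplace ? "OR REPLACE " : "") + "EXTERNAL TABLE "
            + quoteIdentifier(tableName),
        "WITH CONNECTION " + quoteIdentifier(connectionName),
        "OPTIONS (",
        "  format = 'DELTA_LAKE',",
        "  uris = [\"" + uri + "\"]",
        ");");
  }

  public static void main(String[] args) {
    // All three values below are placeholders chosen for illustration.
    System.out.println(
        createExternalTable(
            "mydataset.delta_table",
            "aws-us-east-1.my_connection",
            "s3://mybucket/delta_table",
            false));
  }
}
```

The helper only builds the statement text. Running it against BigQuery, for example through the client library's query API, is left out of this sketch.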
BigQuery Omni transfer with Delta Lake
The following example uses the LOAD DATA
statement to load data to the appropriate table:
LOAD DATA [INTO | OVERWRITE] table_name
FROM FILES (
format = 'DELTA_LAKE',
uris = ["parent_directory"]
)
WITH CONNECTION connection_name;
You can use VPC Service Controls perimeters to restrict access from
BigQuery Omni to an external cloud service as an extra layer of
defense. For example, VPC Service Controls perimeters can limit exports from
your BigQuery Omni tables to a specific Amazon S3 bucket
or Blob Storage container.
Ensure that you have the required permissions to configure service perimeters.
To view a list of IAM roles required to
configure VPC Service Controls, see Access control with
IAM in the
VPC Service Controls documentation.
Set up VPC Service Controls using the Google Cloud console
In the Google Cloud console navigation menu, click Security, and then
click VPC Service Controls.
To set up VPC Service Controls for BigQuery Omni,
follow the steps in the Create a service
perimeter
guide, and when you are in the Egress rules pane, follow these steps:
In the Egress rules panel, click Add rule.
In the From attributes of the API client section, select an option
from the Identity list.
Select To attributes of external resources.
To add an external resource, click Add external resources.
In the Add external resource dialog, for External resource name,
enter a valid resource name. For example:
For Amazon Simple Storage Service (Amazon S3): s3://BUCKET_NAME
Replace BUCKET_NAME with the name of your Amazon S3 bucket.
For Azure Blob Storage: azure://myaccount.blob.core.windows.net/CONTAINER_NAME
Replace CONTAINER_NAME with the name of your Blob Storage
container.
Select the methods that you want to allow on your external resources:
If you want to allow all methods, select All methods in the
Methods list.
If you want to allow specific methods, select Selected method,
click Select methods, and then select the methods that you
want to allow on your external resources.
Click Create perimeter.
Set up VPC Service Controls using the gcloud CLI
To set up VPC Service Controls using the gcloud CLI, follow these
steps:
An access policy is an organization-wide container
for access levels and service perimeters. For information about setting a
default access policy or getting an access policy name, see Managing an access
policy.
Create the egress policy input file
An egress rule block defines the allowed access from within a perimeter to resources
outside of that perimeter. For external resources, the externalResources property
defines the external resource paths allowed access from within your
VPC Service Controls perimeter.
Egress rules can be configured using
a JSON file, or a YAML file. The following sample uses the .yaml format:
egressTo: lists allowed service operations on Google Cloud resources
in specified projects outside the perimeter.
operations: list accessible services and actions or methods that a
client satisfying the from block conditions is allowed to access.
serviceName: set bigquery.googleapis.com for BigQuery Omni.
methodSelectors: list methods that a client satisfying the from conditions
can access. For restrictable methods and permissions for services, see
Supported service method restrictions.
method: a valid service method, or "*" to allow all serviceName methods.
permission: a valid service permission, such as "*",
externalResource.read, or externalResource.write. Access to resources
outside the perimeter is allowed for operations that require this permission.
externalResources: lists external resources that clients inside a perimeter
can access. Replace EXTERNAL_RESOURCE_PATH with either a valid
Amazon S3 bucket, such as s3://bucket_name, or a
Blob Storage container path, such as
azure://myaccount.blob.core.windows.net/container_name.
egressFrom: lists allowed service operations on Google Cloud
resources in specified projects within the perimeter.
identityType or identities: defines the identity types that can access the
specified resources outside the perimeter. Replace IDENTITY_TYPE
with one of the following valid values:
ANY_IDENTITY: to allow all identities.
ANY_USER_ACCOUNT: to allow all users.
ANY_SERVICE_ACCOUNT: to allow all service accounts.
identities: lists service accounts that can access the specified resources
outside the perimeter.
serviceAccount (optional): replace SERVICE_ACCOUNT with the
service account that can access the specified resources outside the
perimeter.
Examples
The following example is a policy that allows egress operations from inside the
perimeter to the s3://mybucket Amazon S3 location in AWS.
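As an illustration of such a policy, the following Java sketch renders an egress rule in YAML from the fields described earlier (egressTo, operations, serviceName, methodSelectors, externalResources, egressFrom, identityType). The class `EgressPolicyYaml` and its `render` method are hypothetical. The choice of ANY_IDENTITY and the "*" method selector in `main` is an assumption for the example, and the exact indentation is one valid YAML layout rather than the only accepted one.

```java
import java.util.List;

/**
 * Hypothetical helper: renders a single egress rule for BigQuery Omni as YAML.
 * The layout follows the field descriptions in this document.
 */
class EgressPolicyYaml {

  static String render(String identityType, String method, List<String> externalResources) {
    if (externalResources.isEmpty()) {
      throw new IllegalArgumentException("At least one external resource is required");
    }
    StringBuilder yaml = new StringBuilder();
    yaml.append("- egressTo:\n");
    yaml.append("    operations:\n");
    // BigQuery Omni uses the BigQuery API service name.
    yaml.append("    - serviceName: bigquery.googleapis.com\n");
    yaml.append("      methodSelectors:\n");
    yaml.append("      - method: \"").append(method).append("\"\n");
    yaml.append("    externalResources:\n");
    for (String resource : externalResources) {
      // For example s3://BUCKET_NAME or an azure:// container path.
      yaml.append("    - ").append(resource).append('\n');
    }
    yaml.append("  egressFrom:\n");
    yaml.append("    identityType: ").append(identityType).append('\n');
    return yaml.toString();
  }

  public static void main(String[] args) {
    // Assumed values: all identities, all methods, one Amazon S3 bucket.
    System.out.print(render("ANY_IDENTITY", "*", List.of("s3://mybucket")));
  }
}
```

The printed text can be saved as a file such as egress.yaml and reviewed before it is supplied to a service perimeter.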
To add the egress policy when you create a new service perimeter, use the
gcloud access-context-manager perimeters create command.
For example, the following command creates a new
perimeter named omniPerimeter that includes the project with project number
12345, restricts the BigQuery API, and adds an egress policy
defined in the egress.yaml file:
To add the egress policy to an existing service perimeter, use the
gcloud access-context-manager perimeters update command.
For example, the following command adds an egress policy defined in the
egress.yaml file to an existing service perimeter named omniPerimeter:
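The two invocations described above can be sketched as argument lists in Java. The class `PerimeterCommands` is hypothetical, and the perimeter title "Omni Perimeter" is an assumed value that the text does not specify. The flags used are `--title`, `--resources`, `--restricted-services`, `--egress-policies` (on create), and `--set-egress-policies` (on update). Depending on your setup, you might also need to pass an access policy with `--policy`.

```java
import java.util.List;

/**
 * Hypothetical helper: assembles gcloud argument lists for creating or
 * updating a service perimeter with an egress policy file.
 */
class PerimeterCommands {

  /** Arguments for creating a perimeter that restricts the BigQuery API. */
  static List<String> create(
      String perimeter, String title, String projectNumber, String egressFile) {
    return List.of(
        "gcloud", "access-context-manager", "perimeters", "create", perimeter,
        "--title=" + title,
        "--resources=projects/" + projectNumber,
        "--restricted-services=bigquery.googleapis.com",
        "--egress-policies=" + egressFile);
  }

  /** Arguments for setting the egress policy on an existing perimeter. */
  static List<String> update(String perimeter, String egressFile) {
    return List.of(
        "gcloud", "access-context-manager", "perimeters", "update", perimeter,
        "--set-egress-policies=" + egressFile);
  }

  public static void main(String[] args) {
    // The title is an assumed value; the other values come from the text above.
    System.out.println(create("omniPerimeter", "Omni Perimeter", "12345", "egress.yaml"));
    System.out.println(update("omniPerimeter", "egress.yaml"));
  }
}
```

Keeping each argument as its own list element avoids shell quoting problems for values that contain spaces. The lists can be passed to `java.lang.ProcessBuilder` if you want to run the commands from Java.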
As a BigQuery administrator, you can create an S3 bucket policy to
grant BigQuery Omni access to your Amazon S3 resources.
This ensures that only authorized BigQuery Omni VPCs can interact with
your Amazon S3 resources, which enhances the security of your data.
Apply an S3 bucket policy for BigQuery Omni VPC
To apply an S3 bucket policy, use the AWS CLI or Terraform:
Last updated 2026-06-03 UTC.