Stay organized with collections
Save and categorize content based on your preferences.
Loading externally partitioned data
BigQuery can load data that is stored in Cloud Storage using a Hive
partitioning layout. Hive partitioning means that the external data is
organized into multiple files, with a naming convention to separate files into
different partitions. For more information, see
Supported data layouts.
By default, the data is not partitioned in BigQuery after
you load it, unless you explicitly create a
partitioned table.
Load Hive partitioned data
To load Hive partitioned data, choose one of the following options:
In the Explorer pane, expand your project, click Datasets, and then select a dataset.
Click more_vertActions,
and then click Create table. This opens the Create table pane.
In the Source section, specify the following details:
For Create table from, select Google Cloud Storage.
For Select file from Cloud Storage bucket, enter the path to the
Cloud Storage folder, using wildcards.
For example, my_bucket/my_files*. The Cloud Storage bucket must be in the
same location as the dataset that contains the table you want to create, append, or overwrite.
From the File format list, select the file type.
Select the Source data partitioning checkbox, and then for Select
Source URI Prefix, enter the Cloud Storage URI prefix. For example,
gs://my_bucket/my_files.
In the Partition inference mode section, select one of the following
options:
Automatically infer types: set the partition schema detection mode
to AUTO.
All columns are strings: set the partition schema detection mode to
STRINGS.
Provide my own: set the partition schema detection mode to
CUSTOM and manually enter the schema
information for the partition keys. For more information, see
Provide
a custom partition key schema.
Optional: To require a partition filter on all queries for this table,
select the Require partition filter checkbox. Requiring a partition
filter can reduce cost and improve performance. For more information, see
Requiring
predicate filters on partition keys in queries.
In the Destination section, specify the following details:
For Project, select the project in which you want to create the
table.
For Dataset, select the dataset in which you want to create the
table.
For Table, enter the name of the table that you want to create.
For Table type, select
Native table .
In the Schema section, enter the schema
definition.
To enable the auto detection of schema,
select Auto detect.
To ignore rows with extra column values that do not match the schema,
expand the Advanced options section and select Unknown values.
Click Create table.
SQL
To create an externally partitioned table, use the WITH PARTITION COLUMNS clause of the LOAD DATA statement to specify the partition schema details.
The partition key schema is encoded immediately following the source URI
prefix. Use the following format to specify
--hive_partitioning_source_uri_prefix:
Set the source URI prefix to gs://my_bucket/my_table/
for AUTO or STRINGS Hive partitioning modes. For CUSTOM,
provide gs://my_bucket/my_table/{dt:DATE}/{val:INTEGER}.
Use the URI gs://my_bucket/my_table/dt=2019-10-31/*.
Data is loaded with dt and val columns included, with
values 2019-10-31 and 1, respectively.
To load only data from specific files, do the following:
Set the source URI prefix to gs://my_bucket/my_table/
for AUTO or STRINGS Hive partitioning modes. For CUSTOM,
provide gs://my_bucket/my_table/{dt:DATE}/{val:INTEGER}.
Use the URIs
gs://my_bucket/my_table/dt=2017-10-31/val=3/file3,gs://my_bucket/my_table/dt=2016-10-31/val=4/file4.
Data is loaded from both files with the dt and val
columns filled in.
Hive partition keys appear as normal columns when you query data from
Cloud Storage.
The data must follow a default Hive partitioned layout.
For example, the following files follow the default layout—the
key-value pairs are configured as directories with an equal sign (=) as a separator,
and the partition keys are always in the same order:
The common source URI prefix in this example is gs://my_bucket/my_table.
Unsupported data layouts
If the partition key names are not encoded in the directory path,
partition schema detection fails. For example, consider the following path,
which does not encode the partition key names:
gs://my_bucket/my_table/2019-10-31/en/my_filename
Files where the schema is not in a consistent order also fail detection.
For example, consider the following two files with inverted partition
key encodings:
STRINGS: Key names are automatically converted to STRING type.
CUSTOM: Partition key schema is encoded as specified in the source URI
prefix.
Custom partition key schema
To use a CUSTOM schema, you must specify the schema in the source URI prefix
field. Using a CUSTOM schema lets you specify the type for each partition key.
The values must validly parse as the specified type or the query fails.
For example, if you set the source_uri_prefix flag to
gs://my_bucket/my_table/{dt:DATE}/{val:STRING},
BigQuery treats val as a STRING, dt as a DATE, and
uses gs://my_bucket/my_table as the source URI prefix for the matched files.
Limitations
Hive partitioning support is built assuming a common source URI prefix for
all URIs that ends immediately before partition encoding, as follows:
gs://BUCKET/PATH_TO_TABLE/.
The directory structure of a Hive partitioned table is assumed to have the same
partitioning keys appear in the same order, with a maximum of ten
partition keys per table.
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2026-06-03 UTC."],[],[]]