Note: This documentation applies to the Standard, Plus, and Frontline editions of Gemini Enterprise. For information about the Business edition, see the Gemini Enterprise - Business edition Help Center.
Import from BigQuery
You can create data stores from BigQuery tables in two ways:
One-time ingestion: You import data from a BigQuery table into a
data store. The data in the data store does not change unless you manually
refresh the data.
Periodic ingestion: You import data from one or more BigQuery
tables, and you set a sync frequency that determines how often the data
stores are updated with the most recent data from the BigQuery
dataset.
The following table compares the two ways that you can import BigQuery
data into Gemini Enterprise data stores.
| One-time ingestion | Periodic ingestion |
| --- | --- |
| Generally available (GA). | Public preview. |
| Data must be refreshed manually. | Data updates automatically every 1, 3, or 5 days. Data cannot be manually refreshed. |
| Gemini Enterprise creates a single data store from one BigQuery table. | Gemini Enterprise creates a data connector for a BigQuery dataset and a data store (called an entity data store) for each table specified. For each data connector, the tables must have the same data type (for example, structured) and be in the same BigQuery dataset. |
| Data from multiple tables can be combined in one data store by first ingesting data from one table and then more data from another source or BigQuery table. | Because manual data import is not supported, the data in an entity data store can only be sourced from one BigQuery table. |
| Data source access control is supported. | Data source access control is not supported. The imported data can contain access controls but these controls won't be respected. |
| You can create a data store using either the Google Cloud console or the API. | You must use the console to create data connectors and their entity data stores. |
| CMEK-compliant. | CMEK-compliant. |
Before you begin
To import data from a source Google Cloud project that's different from the
Google Cloud project with the Gemini Enterprise data store, grant the following
Identity and Access Management (IAM) roles to the
service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com
service account in the project that contains the Gemini Enterprise data store:
On the Select a data source page, select BigQuery.
Select what kind of data you are importing.
Click One time.
In the BigQuery path field, click Browse, select a table that you
have prepared for ingesting, and then click Select.
Alternatively, enter the table location directly in the BigQuery path
field.
Click Continue.
If you are doing a one-time import of structured data:
Map fields to key properties.
If there are important fields missing from the schema, use Add new
field to add them.
To check the status of your ingestion, go to the Data Stores page
and click your data store name to see details about it on its Data page.
When the status column on the Activity tab changes from In progress
to Import completed, the ingestion is complete.
Depending on the size of your data, ingestion can take several
minutes to several hours.
REST
To use the command line to create a data store and import data from
BigQuery, follow these steps.
DATA_STORE_ID: the ID of the data store that you want to create. This ID can contain only lowercase letters, digits, underscores, and hyphens.
DATA_STORE_DISPLAY_NAME: the display name of the data store that you want to create.
Optional: If you're uploading unstructured data and want to configure document
parsing or to turn on document chunking for RAG, specify the
documentProcessingConfig
object and include it in your data store creation request. Configuring an
OCR parser for PDFs is recommended if you're ingesting scanned PDFs. For how
to configure parsing or chunking options, see Parse and chunk
documents.
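As a rough illustration of the data store creation step, the following Python sketch builds the request URL and JSON body without sending anything. The ID rule comes from the placeholder description above. The endpoint path, the `global` location, the `default_collection` collection, and the `industryVertical` and `solutionTypes` values are assumptions based on the public Discovery Engine REST API and are not stated on this page, so check them against the API reference before use. The function name is hypothetical.

```python
import json
import re

# Assumed base URL of the Discovery Engine REST API (not stated on this page).
API_ROOT = "https://discoveryengine.googleapis.com/v1"

# From the text: only lowercase letters, digits, underscores, and hyphens.
_DATA_STORE_ID_RE = re.compile(r"[a-z0-9_-]+")


def build_create_data_store_request(project_id, data_store_id, display_name,
                                    document_processing_config=None):
    """Return (url, body) for a data store creation call. Nothing is sent."""
    if not _DATA_STORE_ID_RE.fullmatch(data_store_id):
        raise ValueError(
            "DATA_STORE_ID can contain only lowercase letters, digits, "
            "underscores, and hyphens"
        )
    # Assumed resource path: global location and default collection.
    url = (
        f"{API_ROOT}/projects/{project_id}/locations/global"
        f"/collections/default_collection/dataStores"
        f"?dataStoreId={data_store_id}"
    )
    body = {
        "displayName": display_name,
        "industryVertical": "GENERIC",              # assumed value
        "solutionTypes": ["SOLUTION_TYPE_SEARCH"],  # assumed value
    }
    # Optional: pass the documentProcessingConfig object through unchanged.
    if document_processing_config is not None:
        body["documentProcessingConfig"] = document_processing_config
    return url, body


if __name__ == "__main__":
    url, body = build_create_data_store_request(
        "my-project", "bq_store-1", "BigQuery store"
    )
    print(url)
    print(json.dumps(body, indent=2))
```

The sketch passes `documentProcessingConfig` through as an opaque dictionary on purpose, because its structure is described in Parse and chunk documents rather than here.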
Import data from BigQuery.
If you defined a schema, make sure the data conforms to that schema.
If the BigQuery table is not under
PROJECT_ID, you need to grant the service account
service-<project number>@gcp-sa-discoveryengine.iam.gserviceaccount.com
the "BigQuery Data Viewer" role on the
BigQuery table. For example, if you are importing
a BigQuery table from source project "123" to
destination project "456", grant
service-456@gcp-sa-discoveryengine.iam.gserviceaccount.com
permissions for the BigQuery table under
project "123".
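To make the cross-project example concrete, here is a minimal Python sketch that derives the service account name from the destination project number and describes the grant that is needed. The helper names are hypothetical, and the role ID `roles/bigquery.dataViewer` is the IAM identifier for BigQuery Data Viewer. The sketch only builds a description; it does not call any IAM API.

```python
def discovery_engine_service_account(project_number: str) -> str:
    """Service account of the project that holds the data store."""
    if not project_number.isdigit():
        raise ValueError("project number must consist of digits only")
    return (
        f"service-{project_number}"
        "@gcp-sa-discoveryengine.iam.gserviceaccount.com"
    )


def describe_cross_project_grant(source_project: str,
                                 destination_project_number: str) -> dict:
    """Describe which account needs which role, and where to grant it."""
    return {
        # The grant is made on the BigQuery table in the source project.
        "grant_in_project": source_project,
        "member": "serviceAccount:"
        + discovery_engine_service_account(destination_project_number),
        "role": "roles/bigquery.dataViewer",  # BigQuery Data Viewer
    }


if __name__ == "__main__":
    # Mirrors the example in the text: source "123", destination "456".
    print(describe_cross_project_grant("123", "456"))
```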
DATA_SCHEMA: optional. Values are document
and custom. The default is document.
document: the BigQuery table
that you use must conform to the default BigQuery
schema provided in
Prepare data for ingesting.
You can define the ID of each document yourself,
while wrapping all the data in the jsonData string.
custom: Any BigQuery table
schema is accepted, and Gemini Enterprise automatically
generates the IDs for each document that is imported.
ERROR_DIRECTORY: optional. A Cloud Storage directory
for error information about the import—for example,
gs://<your-gcs-bucket>/directory/import_errors. Google recommends
leaving this field empty to let Gemini Enterprise
automatically create a temporary directory.
RECONCILIATION_MODE: optional. Values are FULL and
INCREMENTAL. Default is INCREMENTAL. Specifying INCREMENTAL
causes an incremental refresh of data from BigQuery
to your data store. This does an upsert operation, which adds new
documents and replaces existing documents with updated documents
with the same ID. Specifying FULL causes a full rebase of the
documents in your data store. In other words, new and updated
documents are added to your data store, and documents that are not
in BigQuery are removed from your data store. The
FULL mode is helpful if you want to automatically delete documents
that you no longer need.
AUTO_GENERATE_IDS: optional. Specifies whether to
automatically generate document IDs. If set to true, document IDs
are generated based on a hash of the payload. Note that generated
document IDs might not remain consistent over multiple imports. If
you auto-generate IDs over multiple imports, Google highly
recommends setting reconciliationMode to FULL to maintain
consistent document IDs.
Specify autoGenerateIds only when bigquerySource.dataSchema is
set to custom. Otherwise an INVALID_ARGUMENT error is
returned. If you don't specify autoGenerateIds or set it to
false, you must specify idField. Otherwise the documents fail to
import.
ID_FIELD: optional. Specifies which field contains the
document IDs. For BigQuery sources, idField
indicates the name of the column in the BigQuery
table that contains the document IDs.
Specify idField only when: (1) bigquerySource.dataSchema is set
to custom, and (2) autoGenerateIds is set to false or is
unspecified. Otherwise an INVALID_ARGUMENT error is returned.
The BigQuery column must be of string type, and its values must be
between 1 and 63 characters long and conform to RFC-1034. Otherwise, the
documents fail to import.
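The parameter rules above interact in ways that are easy to get wrong, so the following Python sketch encodes them as local checks and also models the two reconciliation modes on plain dictionaries. It is an illustration under assumptions, not the service's implementation: `reconciliationMode`, `autoGenerateIds`, `idField`, and `bigquerySource.dataSchema` are named in the text, while `projectId`, `datasetId`, `tableId`, and `errorConfig.gcsPrefix` are field names assumed from the public Discovery Engine API. The sketch also reads the "you must specify idField" rule as applying to the custom schema only. Both function names are hypothetical.

```python
def build_import_request(project_id, dataset_id, table_id, *,
                         data_schema="document",
                         reconciliation_mode="INCREMENTAL",
                         auto_generate_ids=None, id_field=None,
                         error_directory=None):
    """Build an import request body, enforcing the rules described above."""
    if data_schema not in ("document", "custom"):
        raise ValueError("dataSchema must be 'document' or 'custom'")
    if reconciliation_mode not in ("INCREMENTAL", "FULL"):
        raise ValueError("reconciliationMode must be 'INCREMENTAL' or 'FULL'")
    # The service is described as returning INVALID_ARGUMENT in these cases.
    if data_schema != "custom" and (auto_generate_ids is not None
                                    or id_field is not None):
        raise ValueError("autoGenerateIds and idField require dataSchema 'custom'")
    if auto_generate_ids and id_field is not None:
        raise ValueError("idField cannot be combined with autoGenerateIds=true")
    # Assumed reading: a custom schema needs one source of document IDs.
    if data_schema == "custom" and not auto_generate_ids and id_field is None:
        raise ValueError("specify idField unless autoGenerateIds is true")

    body = {
        "bigquerySource": {
            "projectId": project_id,  # assumed field name
            "datasetId": dataset_id,  # assumed field name
            "tableId": table_id,      # assumed field name
            "dataSchema": data_schema,
        },
        "reconciliationMode": reconciliation_mode,
    }
    if auto_generate_ids is not None:
        body["autoGenerateIds"] = auto_generate_ids
    if id_field is not None:
        body["idField"] = id_field
    if error_directory is not None:
        body["errorConfig"] = {"gcsPrefix": error_directory}  # assumed names
    return body


def reconcile(store, incoming, mode):
    """Toy model of reconciliation: both arguments map document ID to content."""
    if mode == "INCREMENTAL":
        merged = dict(store)
        merged.update(incoming)  # upsert: add new IDs, replace matching IDs
        return merged
    if mode == "FULL":
        return dict(incoming)    # documents missing from the source are removed
    raise ValueError("mode must be 'INCREMENTAL' or 'FULL'")


if __name__ == "__main__":
    print(build_import_request("my-project", "my_dataset", "my_table",
                               data_schema="custom", id_field="doc_id"))
    print(reconcile({"a": 1, "b": 2}, {"b": 20, "c": 3}, "FULL"))
```

In the toy model, an incremental import of `{"b": 20, "c": 3}` into `{"a": 1, "b": 2}` keeps document `a`, while a full import drops it.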
The following procedure describes how to create a BigQuery
data store
that periodically syncs data from a BigQuery dataset. If your dataset
has multiple tables, you can add them to the BigQuery data store
you are creating. Each table you add is referred to as an entity.
Gemini Enterprise creates a separate data store for each entity. Therefore,
when you create the data store using the Google Cloud console, you get a
collection of data stores representing these ingested data entities.
Data from the dataset is synced periodically to the entity data stores. You can
specify synchronization daily, every three days, or every five days.
Console
To create a data store that periodically syncs data
from a BigQuery dataset to Gemini Enterprise, follow these
steps:
In the Google Cloud console, go to the Gemini Enterprise page.
Select the Sync frequency, which determines how often the
Gemini Enterprise connector syncs with the BigQuery
dataset. You can change the frequency later.
In the BigQuery dataset path field, click Browse, and then select the dataset
that contains the tables that you have prepared for
ingesting. Alternatively, enter the dataset location directly
in the BigQuery dataset path field. The format for the path is
projectname.datasetname.
In the Tables to sync field, click Browse, and then select a table
that contains the data that you want for your data store.
If there are additional tables in the dataset that you want to use for
data stores, click Add table and specify those tables too.
Click Continue.
Choose a region for your data store, enter a name for your data connector,
and click Create.
You have now created a data connector, which periodically syncs data
with the BigQuery dataset, and one or more entity
data stores. The data stores have the same names as the BigQuery
tables.
To check the status of your ingestion, go to the Data Stores page
and click your data connector name to see details about it on its Data
page > Data ingestion activity tab. When the status column on the
Activity tab changes from In progress to succeeded, the first
ingestion is complete.
Depending on the size of your data, ingestion can take several
minutes to several hours.
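The console steps above expect the dataset path in the form projectname.datasetname. As a small aid for checking a path before pasting it into the BigQuery dataset path field, the following Python sketch splits and validates that format. The function name is hypothetical, and the sketch assumes that neither part contains a dot, so it rejects full table paths and does not handle legacy domain-scoped project IDs.

```python
def parse_dataset_path(path: str) -> tuple[str, str]:
    """Split 'projectname.datasetname' into (project, dataset)."""
    parts = path.strip().split(".")
    # Exactly two non-empty parts are expected; a third part would
    # indicate a table path rather than a dataset path.
    if len(parts) != 2 or not all(parts):
        raise ValueError(
            f"expected 'projectname.datasetname', got {path!r}"
        )
    return parts[0], parts[1]


if __name__ == "__main__":
    print(parse_dataset_path("my-project.my_dataset"))
```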
After you set up your data source and import data the first time, the data store
syncs data from that source at a frequency that you select during setup.
About an hour after the data connector is created, the first sync occurs.
The next sync then occurs around 24 hours, 72 hours,
or 120 hours later.
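The timing described above can be sketched in Python as follows. This is only an approximation for planning purposes: the text says "about an hour" and "around" 24, 72, or 120 hours, so real sync times will drift, and the sketch assumes later syncs keep the same interval. The function and constant names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Sync frequency in days mapped to the approximate interval in hours.
SYNC_INTERVAL_HOURS = {1: 24, 3: 72, 5: 120}


def approximate_sync_times(created_at, frequency_days, count=3):
    """Estimate the first `count` sync times for a data connector."""
    if frequency_days not in SYNC_INTERVAL_HOURS:
        raise ValueError("frequency_days must be 1, 3, or 5")
    # The first sync occurs about an hour after the connector is created.
    first_sync = created_at + timedelta(hours=1)
    interval = timedelta(hours=SYNC_INTERVAL_HOURS[frequency_days])
    return [first_sync + i * interval for i in range(count)]


if __name__ == "__main__":
    created = datetime(2025, 1, 1, 0, 0, tzinfo=timezone.utc)
    for when in approximate_sync_times(created, frequency_days=3):
        print(when.isoformat())
```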
Next steps
To attach your data store to an app, create an app and select your data store
following the steps in
Create a search app.
To preview how your search results appear after your app and data store are
set up, see
Preview search results.
Last updated 2026-06-09 UTC.