Import from BigQuery

You can create data stores from BigQuery tables in two ways:

One-time ingestion: You import data from a BigQuery table into a data store. The data in the data store does not change unless you manually refresh the data.
Periodic ingestion: You import data from one or more BigQuery tables, and you set a sync frequency that determines how often the data stores are updated with the most recent data from the BigQuery dataset.

The following table compares the two ways that you can import BigQuery data into Gemini Enterprise data stores.

One-time ingestion	Periodic ingestion
Generally available (GA).	Public preview.
Data must be refreshed manually.	Data updates automatically every 1, 3, or 5 days. Data cannot be manually refreshed.
Gemini Enterprise creates a single data store from one table in BigQuery.	Gemini Enterprise creates a data connector for a BigQuery dataset and a data store (called an entity data store) for each table specified. For each data connector, the tables must have the same data type (for example, structured) and be in the same BigQuery dataset.
Data from multiple tables can be combined in one data store by first ingesting data from one table and then more data from another source or BigQuery table.	Because manual data import is not supported, the data in an entity data store can only be sourced from one BigQuery table.
Data source access control is supported.	Data source access control is not supported. The imported data can contain access controls but these controls won't be respected.
You can create a data store using either the Google Cloud console or the API.	You must use the console to create data connectors and their entity data stores.
CMEK-compliant.	CMEK-compliant.

Before you begin

To import data from a source Google Cloud project that's different from the Google Cloud project with the Gemini Enterprise data store, grant the following Identity and Access Management (IAM) roles to the service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com service account in the project that contains the Gemini Enterprise data store:

Import once from BigQuery

To ingest data from a BigQuery table, use the following steps to create a data store and ingest data using either the Google Cloud console or the API.

Before importing your data, review Prepare data for ingesting.

REST

To use the command line to create a data store and import data from BigQuery, follow these steps.

ID_FIELD: optional. Specifies which fields are the document IDs. For BigQuery source files, idField indicates the name of the column in the BigQuery table that contains the document IDs.

Specify idField only when: (1) bigquerySource.dataSchema is set to custom, and (2) auto_generate_ids is set to false or is unspecified. Otherwise an INVALID_ARGUMENT error is returned.

The value of the BigQuery column name must be of string type, must be between 1 and 63 characters, and must conform to RFC-1034. Otherwise, the documents fail to import.
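The idField rules above can be sketched as a pre-flight check. This is a minimal illustration, not the service's own validation: the helper names are hypothetical, `autoGenerateIds` is an assumed camelCase spelling of `auto_generate_ids`, the RFC-1034 requirement is read as the RFC 1034 label syntax, and the constraint is read as applying to each ID value in the chosen column.

```python
import re

# Assumption: "conform to RFC-1034" is read as the RFC 1034 label syntax:
# starts with a letter, ends with a letter or digit, and contains only
# letters, digits, and hyphens in between.
_RFC1034_LABEL = re.compile(r"[A-Za-z]([A-Za-z0-9-]*[A-Za-z0-9])?")


def is_valid_document_id(value):
    """Return True if value is a 1 to 63 character string matching the label syntax."""
    if not isinstance(value, str):
        return False
    if not 1 <= len(value) <= 63:
        return False
    return _RFC1034_LABEL.fullmatch(value) is not None


def id_field_allowed(data_schema, auto_generate_ids=None):
    """Return True if idField may be set: custom schema, and IDs not auto-generated."""
    return data_schema == "custom" and not auto_generate_ids


def build_import_options(data_schema, id_field=None, auto_generate_ids=None):
    """Build a partial request body (hypothetical helper; key names are assumptions
    except idField and bigquerySource.dataSchema, which the text above mentions)."""
    if id_field is not None and not id_field_allowed(data_schema, auto_generate_ids):
        raise ValueError(
            "idField requires dataSchema 'custom' and auto_generate_ids false or unset"
        )
    body = {"bigquerySource": {"dataSchema": data_schema}}
    if auto_generate_ids is not None:
        body["autoGenerateIds"] = auto_generate_ids
    if id_field is not None:
        body["idField"] = id_field
    return body
```

Running a check like this before sending the request surfaces a misconfigured idField locally instead of as an INVALID_ARGUMENT response, and `is_valid_document_id` can be applied to sample rows to catch IDs that would fail to import.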

Connect to BigQuery with periodic syncing

The following procedure describes how to create a BigQuery data store that periodically syncs data from a BigQuery dataset. If your dataset has multiple tables, you can add them to the BigQuery data store you are creating. Each table you add is referred to as an entity. Gemini Enterprise creates a separate data store for each entity. Therefore, when you create the data store using the Google Cloud console, you get a collection of data stores representing these ingested data entities.

Data from the dataset is synced periodically to the entity data stores. You can specify synchronization daily, every three days, or every five days.

Console

To create a data store that periodically syncs data from a BigQuery dataset to Gemini Enterprise, follow these steps:

In the Google Cloud console, go to the Gemini Enterprise page.

In the navigation menu, click Data Stores.
Click Create Data Store.
On the Source page, select BigQuery.
Select the kind of data that you are importing.
Click Periodic.
Select the Sync frequency, which is how often you want the Gemini Enterprise connector to sync with the BigQuery dataset. You can change the frequency later.
In the BigQuery dataset path field, click Browse, and then select the dataset that contains the tables that you have prepared for ingesting. Alternatively, enter the dataset location directly in the BigQuery dataset path field. The format for the path is projectname.datasetname.
In the Tables to sync field, click Browse, and then select a table that contains the data that you want for your data store.
Note:
Make sure that the data in the tables matches the kind of data that you selected in step 5.
If there is a mismatch, you won't know until one of the following happens:
- You get errors when the connector tries to import data.
- You see unexpected search results. This happens if the selected type was structured but should have been unstructured or structured with metadata. The data is imported but the content URL or metadata is not recognized and is treated as a string.
After a data store is created, you cannot update the selected BigQuery tables. To update the table list, you must delete the existing data store and create a new one.
If there are additional tables in the dataset that you want to use for data stores, click Add table and specify those tables too.
Click Continue.
Choose a region for your data store, enter a name for your data connector, and click Create.
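The dataset path format used in the steps above (projectname.datasetname) can be checked before it is pasted into the console. The following is a small sketch with a hypothetical helper name; it assumes that neither the project name nor the dataset name contains a period.

```python
def parse_dataset_path(path):
    """Split 'projectname.datasetname' into a (project, dataset) tuple.

    Assumption: neither part contains a period, so the path has exactly
    two non-empty, dot-separated parts.
    """
    parts = path.strip().split(".")
    if len(parts) != 2 or not all(parts):
        raise ValueError(
            f"expected 'projectname.datasetname', got {path!r}"
        )
    return parts[0], parts[1]
```

For example, `parse_dataset_path("my-project.my_dataset")` returns the project and dataset names separately, while a table-level path with three parts is rejected.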

You have now created a data connector, which periodically syncs data with the BigQuery dataset. You have also created one or more entity data stores, which have the same names as the BigQuery tables.
To check the status of your ingestion, go to the Data Stores page and click your data connector name to see details about it on its Data page > Data ingestion activity tab. When the status column on the Data ingestion activity tab changes from In progress to Succeeded, the first ingestion is complete.

Depending on the size of your data, ingestion can take several minutes to several hours.

After you set up your data source and import data the first time, the data store syncs data from that source at the frequency that you selected during setup. The first sync occurs about an hour after the data connector is created. Each subsequent sync then occurs around 24 hours, 72 hours, or 120 hours after the previous one, depending on the selected frequency.
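The timing described above can be approximated with a short calculation. This is only a sketch with a hypothetical function name: it assumes the first sync lands exactly one hour after creation and later syncs are spaced by exactly 24, 72, or 120 hours, whereas the actual times are approximate.

```python
from datetime import datetime, timedelta

# Sync frequency in days mapped to the approximate gap between syncs in hours.
SYNC_INTERVAL_HOURS = {1: 24, 3: 72, 5: 120}


def estimated_sync_times(created_at, frequency_days, count=3):
    """Return the first `count` estimated sync times for a data connector.

    Assumption: the first sync is one hour after `created_at`, and each
    later sync follows the previous one by the selected interval.
    """
    if frequency_days not in SYNC_INTERVAL_HOURS:
        raise ValueError("frequency_days must be 1, 3, or 5")
    first_sync = created_at + timedelta(hours=1)
    interval = timedelta(hours=SYNC_INTERVAL_HOURS[frequency_days])
    return [first_sync + i * interval for i in range(count)]
```

For a connector created at 09:00 on January 1 with a three-day frequency, the estimate places syncs at about 10:00 on January 1, January 4, and January 7.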
