Create a search data store

This page describes how to create a data store and ingest data for custom search apps in Agent Search. Go to the section for the source that you plan to use.

To sync data from a third-party data source instead, see Connect a third-party data source.

For troubleshooting information, see Troubleshoot data ingestion.

To create data stores and connect data for Gemini Enterprise apps, see Introduction to connectors and data stores.

Create a data store using website content

Use the following procedure to create a data store and index websites.

To use a website data store after creating it, you must attach it to an app that has Enterprise features turned on. You can turn on Enterprise Edition for an app when you create it. This incurs additional costs. See Create a search app and About advanced features.

Before you begin

If your website uses a robots.txt file, update it. For more information, see how to prepare your website's robots.txt file.

Procedure

Console

To use the Google Cloud console to make a data store and index websites, follow these steps:

  1. In the Google Cloud console, go to the AI Applications page.

  2. In the navigation menu, click Data Stores.

  3. Click Create data store.

  4. On the Source page, select Website Content.

  5. Choose whether to turn on Advanced website indexing for this data store. If you turn advanced website indexing on now, you can't turn it off later.

    Advanced website indexing provides additional features such as search summarization, search with follow-ups, and extractive answers. Advanced website indexing incurs additional cost, and requires that you verify domain ownership for any website that you index. For more information, see Advanced website indexing and Pricing.

  6. In the Sites to include field, enter the URL patterns matching the websites that you want to include in your data store. Include one URL pattern per line, without comma separators. For example, example.com/docs/*

  7. Optional: In the Sites to exclude field, enter URL patterns that you want to exclude from your data store.

    Excluded sites take priority over included sites. For example, if you include example.com/docs/* but exclude example.com, no pages are indexed. For more information, see Website data.

  8. Click Continue.

  9. Select a location for your data store.

    • When you create a basic website search data store, this is always set to global (Global).
    • When you create a data store with advanced website indexing, you can select a location. Because the websites that are indexed must be public, Google strongly recommends that you select global (Global) as your location. This ensures maximum availability of all search and answering services and eliminates the limitations of regional data stores.
  10. Enter a name for your data store.

  11. Click Create. Agent Search creates your data store and displays your data stores on the Data Stores page.

  12. To view information about your data store, click the name of your data store in the Name column. Your data store page appears.

    • If you turned on Advanced website indexing, a warning appears prompting you to verify the domains in your data store.
    • If you have a quota shortfall (the number of pages in the websites that you specified exceeds the "Number of documents per project" quota for your project), an additional warning appears prompting you to upgrade your quota.
  13. To verify the domains for the URL patterns in your data store, follow the instructions on the Verify website domains page.

  14. To upgrade your quota, follow these steps:

    1. Click Upgrade quota. The IAM and Admin page of the Google Cloud console appears.
    2. Follow the instructions at Request a quota adjustment in the Google Cloud documentation. The quota to increase is Number of documents in the Discovery Engine API service.
    3. After submitting your request for a higher quota limit, go back to the AI Applications page and click Data Stores in the navigation menu.
    4. Click the name of your data store in the Name column. The Status column indicates that indexing is in progress for the websites that had surpassed the quota. When the Status column for a URL shows Indexed, advanced website indexing features are available for that URL or URL pattern.

    For more information, see Quota for web page indexing in the "Quotas and limits" page.
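
Step 7 of the console procedure states that excluded patterns take priority over included ones. The sketch below models that precedence in Python. The matching rule is a simplifying assumption and not Agent Search's actual matcher: a pattern that contains * is treated as a glob, and a pattern without a wildcard is assumed to cover that site or path and everything under it. The function names matches and is_indexed are hypothetical.

```python
from fnmatch import fnmatchcase


def matches(pattern: str, url: str) -> bool:
    """Return True if url falls under pattern (simplified, assumed semantics)."""
    if "*" in pattern:
        # Glob-style match; in fnmatch, "*" also spans "/".
        return fnmatchcase(url, pattern)
    # Assumption: a pattern without a wildcard covers itself and everything below it.
    prefix = pattern.rstrip("/")
    return url == prefix or url.startswith(prefix + "/")


def is_indexed(url: str, include: list[str], exclude: list[str]) -> bool:
    """Exclusions are checked first, so they always win over inclusions."""
    if any(matches(p, url) for p in exclude):
        return False
    return any(matches(p, url) for p in include)


include = ["example.com/docs/*"]
print(is_indexed("example.com/docs/intro", include, exclude=[]))               # True
print(is_indexed("example.com/docs/intro", include, exclude=["example.com"]))  # False
```

The second call reproduces the case described in step 7: excluding example.com removes every page, including the ones that the include pattern would otherwise select.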

Python

For more information, see the Agent Search Python API reference documentation.

To authenticate to Agent Search, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store


from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

Import websites

from google.api_core.client_options import ClientOptions

from google.cloud import discoveryengine_v1 as discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# NOTE: Do not include http or https protocol in the URI pattern
# uri_pattern = "cloud.google.com/generative-ai-app-builder/docs/*"

# For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.SiteSearchEngineServiceClient(
    client_options=client_options
)

# The full resource name of the data store's site search engine
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/siteSearchEngine
site_search_engine = client.site_search_engine_path(
    project=project_id, location=location, data_store=data_store_id
)

# Target Site to index
target_site = discoveryengine.TargetSite(
    provided_uri_pattern=uri_pattern,
    # Options: INCLUDE, EXCLUDE
    type_=discoveryengine.TargetSite.Type.INCLUDE,
    exact_match=False,
)

# Make the request
operation = client.create_target_site(
    parent=site_search_engine,
    target_site=target_site,
)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.CreateTargetSiteMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)
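
The NOTE in the sample says that the URI pattern must not include the http or https protocol, and the console procedure expects one pattern per line. The following is a minimal sketch of a helper that cleans user-supplied patterns before they are passed as uri_pattern. The name normalize_uri_patterns is hypothetical, and trimming whitespace and skipping blank lines is a convenience assumption rather than documented behavior.

```python
import re

# Matches a leading http:// or https:// in any letter case.
_SCHEME = re.compile(r"^https?://", re.IGNORECASE)


def normalize_uri_patterns(text: str) -> list[str]:
    """Split newline-separated patterns and drop any http:// or https:// prefix."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # Skip blank lines.
        patterns.append(_SCHEME.sub("", line))
    return patterns


raw = """
https://cloud.google.com/generative-ai-app-builder/docs/*
example.com/docs/*
"""
print(normalize_uri_patterns(raw))
```

Each returned string can then be used as the provided_uri_pattern of one TargetSite.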

Next steps

Import from BigQuery

Agent Search supports searching across BigQuery data.

You can create data stores from BigQuery tables in two ways: one-time ingestion and periodic ingestion.

The following table compares the two ways that you can import BigQuery data into Agent Search data stores.

One-time ingestion | Periodic ingestion
Data must be refreshed manually. | Data updates automatically every 1, 3, or 5 days. Data cannot be manually refreshed.
Agent Search creates a single data store from one BigQuery table. | Agent Search creates a data connector for a BigQuery dataset and a data store (called an entity data store) for each table specified. For each data connector, the tables must have the same data type (for example, structured) and be in the same BigQuery dataset.
Data from multiple tables can be combined in one data store by first ingesting data from one table and then more data from another source or BigQuery table. | Because manual data import is not supported, the data in an entity data store can only be sourced from one BigQuery table.
Data source access control is supported. | Data source access control is not supported. The imported data can contain access controls but these controls won't be respected.
You can create a data store using either the Google Cloud console or the API. | You must use the console to create data connectors and their entity data stores.
CMEK-compliant. | CMEK-compliant.

Before you begin

To import data from a source Google Cloud project that's different from the Google Cloud project with the Agent Search data store, grant the following Identity and Access Management (IAM) roles to the service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com service account in the project that contains the Agent Search data store:

Import once from BigQuery

To ingest data from a BigQuery table, use the following steps to create a data store and ingest data using either the Google Cloud console or the API.

Before importing your data, review Prepare data for ingesting.

Console

To use the Google Cloud console to ingest data from BigQuery, follow these steps:

  1. In the Google Cloud console, go to the AI Applications page.

  2. Go to the Data Stores page.

  3. Click Create data store.

  4. On the Source page, select BigQuery.

  5. In the What kind of data are you importing section, select the type of data that you're importing.

  6. Select One time in the Synchronization frequency section.

  7. In the BigQuery path field, click Browse, select a table that you have prepared for ingesting, and then click Select. Alternatively, enter the table location directly in the BigQuery path field.

  8. Click Continue.

  9. If you are doing a one-time import of structured data:

    1. Map fields to key properties.

    2. If there are important fields missing from the schema, use Add new field to add them.

      For more information, see About auto-detect and edit.

    3. Click Continue.

  10. Choose a region for your data store.

  11. Enter a name for your data store.

  12. Click Create.

  13. To check the status of your ingestion, go to the Data Stores page and click your data store name to see details about it on its Data page. When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.

    Depending on the size of your data, ingestion can take several minutes to several hours.

REST

To use the command line to create a data store and import data from BigQuery, follow these steps.

  1. Create a data store.

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -H "X-Goog-User-Project: PROJECT_ID" \
    "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
    -d '{
      "displayName": "DATA_STORE_DISPLAY_NAME",
      "industryVertical": "GENERIC",
      "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
    }'
    

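    Replace the uppercase placeholders in the command (PROJECT_ID, DATA_STORE_ID, and DATA_STORE_DISPLAY_NAME) with your own values.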

    Optional: If you're uploading unstructured data and want to configure document parsing or turn on document chunking for RAG, specify the documentProcessingConfig object and include it in your data store creation request. If you're ingesting scanned PDFs, configuring an OCR parser for PDFs is recommended. To learn how to configure parsing or chunking options, see Parse and chunk documents.

  2. Import data from BigQuery.

    If you defined a schema, make sure the data conforms to that schema.

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
    -d '{
      "bigquerySource": {
        "projectId": "PROJECT_ID",
        "datasetId": "DATASET_ID",
        "tableId": "TABLE_ID",
        "dataSchema": "DATA_SCHEMA",
        "aclEnabled": "BOOLEAN"
      },
      "reconciliationMode": "RECONCILIATION_MODE",
      "autoGenerateIds": "AUTO_GENERATE_IDS",
      "idField": "ID_FIELD",
      "errorConfig": {
        "gcsPrefix": "ERROR_DIRECTORY"
      }
    }'
    

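    Replace the uppercase placeholders in the command (for example, PROJECT_ID, DATA_STORE_ID, DATASET_ID, and TABLE_ID) with your own values.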
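
The two curl requests above can also be assembled programmatically, which keeps the JSON body well-formed. The following sketch builds only the URL and body with the Python standard library and does not send anything. The function names are hypothetical, and the regional host pattern is an assumption carried over from the Python sample earlier on this page.

```python
import json
from urllib.parse import urlencode


def collection_url(project_id: str, location: str = "global") -> str:
    """Return the base URL of the default collection for a project and location."""
    host = (
        "discoveryengine.googleapis.com"
        if location == "global"
        else f"{location}-discoveryengine.googleapis.com"
    )
    return (
        f"https://{host}/v1/projects/{project_id}"
        f"/locations/{location}/collections/default_collection"
    )


def create_data_store_request(project_id, data_store_id, display_name, location="global"):
    """Build the URL and JSON body for step 1 (create a data store)."""
    query = urlencode({"dataStoreId": data_store_id})
    url = f"{collection_url(project_id, location)}/dataStores?{query}"
    body = {
        "displayName": display_name,
        "industryVertical": "GENERIC",
        "solutionTypes": ["SOLUTION_TYPE_SEARCH"],
    }
    return url, json.dumps(body)


def bigquery_import_request(project_id, data_store_id, dataset_id, table_id,
                            data_schema, reconciliation_mode, location="global"):
    """Build the URL and JSON body for step 2 (import from BigQuery)."""
    url = (
        f"{collection_url(project_id, location)}/dataStores/{data_store_id}"
        "/branches/0/documents:import"
    )
    body = {
        "bigquerySource": {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": table_id,
            "dataSchema": data_schema,
        },
        "reconciliationMode": reconciliation_mode,
    }
    return url, json.dumps(body)


url, body = create_data_store_request("my-project", "my-store", "My Data Store")
print(url)
print(body)
```

The returned URL and body can be passed to any HTTP client together with the Authorization header shown in the curl commands.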

Console

To use the console to ingest data from Cloud SQL, follow these steps:

  1. In the Google Cloud console, go to the AI Applications page.

  2. Go to the Data Stores page.

  3. Click New data store.

  4. On the Source page, select Cloud SQL.

  5. Specify the project ID, instance ID, database ID, and table ID of the data that you plan to import.

  6. Click Browse and choose an intermediate Cloud Storage location to export data to, and then click Select. Alternatively, enter the location directly in the gs:// field.

  7. Select whether to turn on serverless export. Serverless export incurs additional cost. For information about serverless export, see Minimize the performance impact of exports in the Cloud SQL documentation.

  8. Click Continue.

  9. Choose a region for your data store.

  10. Enter a name for your data store.

  11. Click Create.

  12. To check the status of your ingestion, go to the Data Stores page and click your data store name to see details about it on its Data page. When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.

    Depending on the size of your data, ingestion can take several minutes or several hours.

REST

To use the command line to create a data store and ingest data from Cloud SQL, follow these steps:

  1. Create a data store.

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -H "X-Goog-User-Project: PROJECT_ID" \
    "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
    -d '{
      "displayName": "DISPLAY_NAME",
      "industryVertical": "GENERIC",
      "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
    }'
    

    Replace the uppercase placeholders in the command (PROJECT_ID, DATA_STORE_ID, and DISPLAY_NAME) with your own values.

  2. Import data from Cloud SQL.

      curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
      -d '{
        "cloudSqlSource": {
          "projectId": "SQL_PROJECT_ID",
          "instanceId": "INSTANCE_ID",
          "databaseId": "DATABASE_ID",
          "tableId": "TABLE_ID",
          "gcsStagingDir": "STAGING_DIRECTORY"
        },
        "reconciliationMode": "RECONCILIATION_MODE",
        "autoGenerateIds": "AUTO_GENERATE_IDS",
        "idField": "ID_FIELD"
      }'
    

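    Replace the uppercase placeholders in the command (for example, SQL_PROJECT_ID, INSTANCE_ID, DATABASE_ID, and TABLE_ID) with your own values.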
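
Bodies passed to curl with -d must be strictly valid JSON, and a trailing comma after the last member of an object is an easy mistake to make when editing these templates by hand. As a hedged sketch, the following check uses Python's json module to catch such errors before a request is sent. The helper check_json_body is hypothetical and not part of any Google tool.

```python
import json


def check_json_body(body: str) -> tuple[bool, str]:
    """Return (True, "") if body parses as JSON, else (False, error message)."""
    try:
        json.loads(body)
    except json.JSONDecodeError as err:
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    return True, ""


good = '{"cloudSqlSource": {"tableId": "TABLE_ID"}, "idField": "ID_FIELD"}'
bad = '{"cloudSqlSource": {"tableId": "TABLE_ID"}, "idField": "ID_FIELD",}'

print(check_json_body(good))    # (True, '')
print(check_json_body(bad)[0])  # False
```

The exact error text depends on the Python version, so the sketch reports it without relying on its wording.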

Console

To use the console to ingest data from Spanner, follow these steps:

  1. In the Google Cloud console, go to the AI Applications page.

  2. Go to the Data Stores page.

  3. Click New data store.

  4. On the Source page, select Cloud Spanner.

  5. Specify the project ID, instance ID, database ID, and table ID of the data that you plan to import.

  6. Select whether to turn on Data Boost. For information about Data Boost, see Data Boost overview in the Spanner documentation.

  7. Click Continue.

  8. Choose a region for your data store.

  9. Enter a name for your data store.

  10. Click Create.

  11. To check the status of your ingestion, go to the Data Stores page and click your data store name to see details about it on its Data page. When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.

    Depending on the size of your data, ingestion can take several minutes or several hours.

REST

To use the command line to create a data store and ingest data from Spanner, follow these steps:

  1. Create a data store.

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -H "X-Goog-User-Project: PROJECT_ID" \
    "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
    -d '{
      "displayName": "DISPLAY_NAME",
      "industryVertical": "GENERIC",
      "solutionTypes": ["SOLUTION_TYPE_SEARCH"],
      "contentConfig": "CONTENT_REQUIRED"
    }'
    

    Replace the uppercase placeholders in the command (PROJECT_ID, DATA_STORE_ID, and DISPLAY_NAME) with your own values.

  2. Import data from Spanner.

      curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
      -d '{
        "cloudSpannerSource": {
          "projectId": "SPANNER_PROJECT_ID",
          "instanceId": "INSTANCE_ID",
          "databaseId": "DATABASE_ID",
          "tableId": "TABLE_ID",
          "enableDataBoost": "DATA_BOOST_BOOLEAN"
        },
        "reconciliationMode": "RECONCILIATION_MODE",
        "autoGenerateIds": "AUTO_GENERATE_IDS",
        "idField": "ID_FIELD"
      }'
    

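    Replace the uppercase placeholders in the command (for example, SPANNER_PROJECT_ID, INSTANCE_ID, DATABASE_ID, and TABLE_ID) with your own values.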

Console

To use the console to ingest data from Firestore, follow these steps:

  1. In the Google Cloud console, go to the AI Applications page.

  2. Go to the Data Stores page.

  3. Click New data store.

  4. On the Source page, select Firestore.

  5. Specify the project ID, database ID, and collection ID of the data that you plan to import.

  6. Click Continue.

  7. Choose a region for your data store.

  8. Enter a name for your data store.

  9. Click Create.

  10. To check the status of your ingestion, go to the Data Stores page and click your data store name to see details about it on its Data page. When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.

    Depending on the size of your data, ingestion can take several minutes or several hours.

REST

To use the command line to create a data store and ingest data from Firestore, follow these steps:

  1. Create a data store.

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -H "X-Goog-User-Project: PROJECT_ID" \
    "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
    -d '{
      "displayName": "DISPLAY_NAME",
      "industryVertical": "GENERIC",
      "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
    }'
    

    Replace the uppercase placeholders in the command (PROJECT_ID, DATA_STORE_ID, and DISPLAY_NAME) with your own values.

  2. Import data from Firestore.

      curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
      -d '{
        "firestoreSource": {
          "projectId": "FIRESTORE_PROJECT_ID",
          "databaseId": "DATABASE_ID",
          "collectionId": "COLLECTION_ID"
        },
        "reconciliationMode": "RECONCILIATION_MODE",
        "autoGenerateIds": "AUTO_GENERATE_IDS",
        "idField": "ID_FIELD"
      }'
    

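    Replace the uppercase placeholders in the command (for example, FIRESTORE_PROJECT_ID, DATABASE_ID, and COLLECTION_ID) with your own values.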

REST

To use the command line to create a data store and ingest data from Bigtable, follow these steps:

  1. Create a data store.

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -H "X-Goog-User-Project: PROJECT_ID" \
    "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
    -d '{
      "displayName": "DISPLAY_NAME",
      "industryVertical": "GENERIC",
      "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
    }'
    

    Replace the uppercase placeholders in the command (PROJECT_ID, DATA_STORE_ID, and DISPLAY_NAME) with your own values.

  2. Import data from Bigtable.

      curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
      -d '{
        "bigtableSource": {
          "projectId": "BIGTABLE_PROJECT_ID",
          "instanceId": "INSTANCE_ID",
          "tableId": "TABLE_ID",
          "bigtableOptions": {
            "keyFieldName": "KEY_FIELD_NAME",
            "families": {
              "key": "KEY",
              "value": {
                "fieldName": "FIELD_NAME",
                "encoding": "ENCODING",
                "type": "TYPE",
                "columns": [
                  {
                    "qualifier": "QUALIFIER",
                    "fieldName": "FIELD_NAME",
                    "encoding": "COLUMN_ENCODING",
                    "type": "COLUMN_VALUES_TYPE"
                  }
                ]
              }
             }
             ...
          }
        },
        "reconciliationMode": "RECONCILIATION_MODE",
        "autoGenerateIds": "AUTO_GENERATE_IDS",
        "idField": "ID_FIELD"
      }'
    

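    Replace the uppercase placeholders in the command (for example, BIGTABLE_PROJECT_ID, INSTANCE_ID, TABLE_ID, and KEY_FIELD_NAME) with your own values.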
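
The bigtableOptions object above is deeply nested, so it can be easier to generate than to type. A minimal sketch follows. It assumes that families is a JSON map keyed by column family name, which is what the key/value notation in the template appears to describe; confirm this against the API reference before relying on it. The helper names and the family name "cf1" are illustrative, and the other values are the placeholders from the template.

```python
import json


def column(qualifier: str, field_name: str, encoding: str, value_type: str) -> dict:
    """Build one entry of the "columns" list inside a column family."""
    return {
        "qualifier": qualifier,
        "fieldName": field_name,
        "encoding": encoding,
        "type": value_type,
    }


def bigtable_options(key_field_name: str, families: dict[str, list[dict]]) -> dict:
    """Build bigtableOptions; families maps a family name to its columns (assumed shape)."""
    return {
        "keyFieldName": key_field_name,
        "families": {
            name: {"columns": columns} for name, columns in families.items()
        },
    }


options = bigtable_options(
    "KEY_FIELD_NAME",
    {"cf1": [column("QUALIFIER", "FIELD_NAME", "COLUMN_ENCODING", "COLUMN_VALUES_TYPE")]},
)
print(json.dumps({"bigtableSource": {"bigtableOptions": options}}, indent=2))
```

The family-level fieldName, encoding, and type members from the template are omitted here for brevity.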

Console

To use the console to ingest data from AlloyDB for PostgreSQL, follow these steps:

  1. In the Google Cloud console, go to the AI Applications page.

  2. In the navigation menu, click Data Stores.

  3. Click Create data store.

  4. On the Source page, select AlloyDB.

  5. Specify the project ID, location ID, cluster ID, database ID, and table ID of the data that you plan to import.

  6. Click Continue.

  7. Choose a region for your data store.

  8. Enter a name for your data store.

  9. Click Create.

  10. To check the status of your ingestion, go to the Data Stores page and click your data store name to see details about it on its Data page. When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.

    Depending on the size of your data, ingestion can take several minutes or several hours.

REST

To use the command line to create a data store and ingest data from AlloyDB for PostgreSQL, follow these steps:

  1. Create a data store.

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -H "X-Goog-User-Project: PROJECT_ID" \
    "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
    -d '{
      "displayName": "DISPLAY_NAME",
      "industryVertical": "GENERIC",
      "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
    }'
    

    Replace the uppercase placeholders in the command (PROJECT_ID, DATA_STORE_ID, and DISPLAY_NAME) with your own values.

  2. Import data from AlloyDB for PostgreSQL.

      curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
      -d '{
        "alloydbSource": {
          "projectId": "ALLOYDB_PROJECT_ID",
          "locationId": "LOCATION_ID",
          "clusterId": "CLUSTER_ID",
          "databaseId": "DATABASE_ID",
          "tableId": "TABLE_ID"
        },
        "reconciliationMode": "RECONCILIATION_MODE",
        "autoGenerateIds": "AUTO_GENERATE_IDS",
        "idField": "ID_FIELD"
      }'
    

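    Replace the uppercase placeholders in the command (for example, ALLOYDB_PROJECT_ID, LOCATION_ID, CLUSTER_ID, and TABLE_ID) with your own values.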

REST

To use the command line to create a data store and import structured JSON data, follow these steps.

  1. Create a data store.

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -H "X-Goog-User-Project: PROJECT_ID" \
    "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
    -d '{
      "displayName": "DATA_STORE_DISPLAY_NAME",
      "industryVertical": "GENERIC",
      "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
    }'


    Replace the uppercase placeholders in the command (PROJECT_ID, DATA_STORE_ID, and DATA_STORE_DISPLAY_NAME) with your own values.

  2. Import structured data.

    There are a few approaches that you can use to upload data, including:

  • Upload a JSON document.

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents?documentId=DOCUMENT_ID" \
    -d '{
      "jsonData": "JSON_DOCUMENT_STRING"
    }'
    

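    Replace the uppercase placeholders in the command (PROJECT_ID, DATA_STORE_ID, DOCUMENT_ID, and JSON_DOCUMENT_STRING) with your own values.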

  • Upload a JSON object.

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents?documentId=DOCUMENT_ID" \
    -d '{
      "structData": JSON_DOCUMENT_OBJECT
    }'
    

    Replace JSON_DOCUMENT_OBJECT with the JSON document as a JSON object. The object must conform to the JSON schema that you provided in the previous step. For example:

     {
       "title": "test title",
       "categories": [
         "cat_1",
         "cat_2"
       ],
       "uri": "test uri"
     }
    
  • Update with a JSON document.

    curl -X PATCH \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID" \
    -d '{
      "jsonData": "JSON_DOCUMENT_STRING"
    }'
    
  • Update with a JSON object.

    curl -X PATCH \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID" \
    -d '{
      "structData": JSON_DOCUMENT_OBJECT
    }'
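
The jsonData and structData forms above differ only in how the document is encoded: jsonData carries the document as a JSON string, so its quotes end up escaped inside the request body, while structData embeds the document as a JSON object. The following short sketch uses the example document from this section; document_body is a hypothetical helper.

```python
import json

# The example document shown earlier in this section.
document = {
    "title": "test title",
    "categories": ["cat_1", "cat_2"],
    "uri": "test uri",
}


def document_body(doc: dict, as_string: bool) -> str:
    """Return a request body that uses jsonData (string) or structData (object)."""
    if as_string:
        # jsonData: the document is serialized first, then embedded as a string.
        return json.dumps({"jsonData": json.dumps(doc)})
    # structData: the document is embedded directly as a JSON object.
    return json.dumps({"structData": doc})


print(document_body(document, as_string=True))
print(document_body(document, as_string=False))
```

Letting json.dumps produce the inner string avoids escaping the quotes in JSON_DOCUMENT_STRING by hand.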
    

Next steps

Troubleshoot data ingestion

If you are having problems with data ingestion, review these tips:

Create a data store using Terraform

You can use Terraform to create an empty data store. After the empty data store is created, you can ingest data into it using the Google Cloud console or API commands.

To learn how to apply or remove a Terraform configuration, see Basic Terraform commands.

To create an empty data store using Terraform, see google_discovery_engine_data_store.

Connect a third-party data source

Connecting third-party data sources to Agent Search is no longer supported.

For instructions, see Connect a third-party data source in the Gemini Enterprise documentation.