Getting Started

This guide will help you quickly get familiar with the basics of the Indexing API by completing a simple task of configuring a datasource, and indexing document and enabling the datasource to show up search.

Authentication

All indexing API endpoints require a Bearer Auth token. You can obtain a token scoped to the datasource you want to interact with. Obtain the API token from your Glean Admin. Store this token in a secure place.

info

Glean admins can manage Indexing API tokens via the API tokens page within Workspace Settings (Workspace > Setup > API tokens > Indexing tokens tab). Please refer to the Managing API tokens tutorial to get more information on API tokens.

Determine the host you need to connect to. This will be the URL of the backend for your Glean deployment, for example, customer-be.glean.com

Using the Python SDK

In the tutorials, we use curl and python for example API requests. You can also use the python SDK for production code.

  1. Install the indexing API python sdk locally
    Copy
    Copied
    pip install https://app.glean.com/meta/indexing_api_client.zip
  2. Set up the API client
    Copy
    Copied
    import glean_indexing_api_client as indexing_api
    # Configure host and Bearer authorization: BearerAuth
    configuration = indexing_api.Configuration(
    host="http://customer-be.glean.com/api/index/v1", access_token="<YOUR_API_TOKEN>"
    )
    api_client = indexing_api.ApiClient(configuration)

You can find documentation for the python SDK under the SDK reference

Set up a datasource

The first step is to create a datasource that you can use to index documents, identity information, etc.

Option 1: Using the Glean custom app setup page
Note

Only Glean admins can set up custom apps using Glean's Workspace Settings.
If you are not a Glean admin, you can work with your Glean admin to set up a custom app or
follow Option 2 to set up a datasource using the /adddatasource API endpoint.

To set up a custom app, Admins can navigate to Workspace Settings, then click on the Data sources section.

Click on the "Add data source" button in the top right corner. In the modal that appears, click on "Custom" at the bottom of the list.

The key fields to quickly set up a datasource are listed under the "Data source basics" section. Some example values are shown in the screenshot below.

Enable search results

Once the values are set, click on the "Publish" button to create the custom app.

Option 2: Using the /adddatasource endpoint

When creating a datasource, the key fields you need to set are the following:

name A unique identifier used to refer to the datasource.

displayName The datasource name shown in search results in the UI.

datasourceCategory The type of this datasource. This affects how results are ranked. More details on how to select a category can be inferred from Datasource category documentation.

urlRegex A regex that captures the view URLs of documents in the datasource as accurately as possible. Avoid regexes that are too broad, and will capture URLs from other datasources, or regexes that are too narrow, and will not capture documents from this datasource.

isUserReferencedByEmail This should be set to true if you want to refer to user identities using emails directly. If you have your own notion of user ids, this can be set to false. This affects how Glean interprets permissions attached to documents.

cURLpython
Copy
Copied
curl -X POST https://customer-be.glean.com/api/index/v1/adddatasource \
  -H 'Authorization: Basic <Token>' \
  -d '
{
  "name": "gleantest",
  "displayName": "Glean Test",
  "datasourceCategory": "PUBLISHED_CONTENT",
  "urlRegex": "^https://bluesky.test.*",
  "objectDefinitions": [
    {
      "name": "EngineeringDoc",
      "docCategory": "PUBLISHED_CONTENT"
    }
  ],
  "isUserReferencedByEmail": true
}'
Copy
Copied
import glean_indexing_api_client as indexing_api
from glean_indexing_api_client.api import datasources_api
from glean_indexing_api_client.model.custom_datasource_config import CustomDatasourceConfig
from glean_indexing_api_client.model.object_definition import ObjectDefinition
datasource_api = datasources_api.DatasourcesApi(api_client)
# Datasource config supports many fields for customization, but a bare minimum
# config should be ok to get started.
datasource_config = CustomDatasourceConfig(
  name='gleantest',
  display_name='Glean Test',
  datasource_category='PUBLISHED_CONTENT',
  url_regex='^http://bluesky.test.*',
  object_definitions=[
            ObjectDefinition(
                doc_category='PUBLISHED_CONTENT',
                name='EngineeringDoc'
              )
          ],
  # Permissions will be specified by email addresses instead of a
  # datasource-specific ID.
  is_user_referenced_by_email=True,
)
try:
  datasource_api.adddatasource_post(datasource_config)
except indexing_api.ApiException as e:
  print('Exception when calling DatasourcesApi->adddatasource_post: %s\\n' % e)

You can learn about more datasource customization options at here

Index a document for the datasource

Once the datasource is configured, we can index documents. A document has the following key fields.

id This is a unique identifier for the document within the datasource. The id can only contain alphanumeric characters (underscores are not allowed). The id should be stable, meaning that the same document must keep the same id across uploads. If an id is not provided, we use a hash of the viewURL as the id.

objectType Type of object within the datasource. For example, a drive might have objects of type file and folder.

title Title of the document.

body Searchable document body. This might be shown in search result snippets.

viewURL A unique URL that can used to view the document in a browser. This viewURL must also match the urlRegex specified while creating the datasource.

permissions This can be used to control visibility of the document in search results for different Glean users. For simplicity, in this tutorial, we will only index a document with anonymous access using permissions.allowAnonymousAccess.

cURLpython
Copy
Copied
curl -X POST   https://customer-be.glean.com/api/index/v1/indexdocument \
  -H 'Authorization: Basic <Token>' \
  -d '
{
  "document": {
    "datasource": "gleantest",
    "objectType": "EngineeringDoc",
    "id": "blueskytest-1",
    "title": "Getting started with Blue Sky",
    "body": {
      "mimeType": "text/plain",
      "textContent": "This doc will help you get familiar with Blue Sky API"
    },
    "permissions": {
      "allowAnonymousAccess": true
    },
    "viewURL": "https://bluesky.test/blueskytest-1"
  }
}'
Copy
Copied
import glean_indexing_api_client as indexing_api
from glean_indexing_api_client.api import documents_api
from glean_indexing_api_client.model.index_document_request import IndexDocumentRequest
from glean_indexing_api_client.model.document_definition import DocumentDefinition
from glean_indexing_api_client.model.content_definition import ContentDefinition
from glean_indexing_api_client.model.user_reference_definition import (
    UserReferenceDefinition,
)
from glean_indexing_api_client.model.document_permissions_definition import (
    DocumentPermissionsDefinition,
)

request = IndexDocumentRequest(
    # DocumentDefinition has many fields, we show the usage of a few basic ones.
    document=DocumentDefinition(
        datasource="gleantest",
        object_type="EngineeringDoc",
        title="This doc will help you get familiar with Blue Sky API",
        id="blueskytest-1",
        view_url="https://bluesky.test/blueskytest-1",
        body=ContentDefinition(mime_type="text/plain", text_content="This doc will help you get familiar with Blue Sky API"),
        permissions=DocumentPermissionsDefinition(
          allow_anonymous_access=True
        ),
    )
)
documents_api = documents_api.DocumentsApi(api_client)
try:
    documents_api.indexdocument_post(request)
except indexing_api.ApiException as e:
    print("Exception when calling DocumentsApi->indexdocument_post: %s\n" % e)

To learn about more document customization options here

To learn more about how to set up user identities, and more complex permissions, read Setting permissions

To index documents in bulk, you can use Bulk indexing

For helpful troubleshooting tips, read Troubleshooting

Enable search results for the datasource

To be able to search for the indexed document in Glean, it must be enabled show up in search results.

Glean Admins can go to the 'Overview' tab on the setup page for their custom app to enable the datasource to show up in search results.

Results can be configured to be visible to all teammates or to their configured test group

Note

Document permissions are respected even when results are enabled for All teammates or Test group only. Enabling results for All teammates / Test group only does not mean configured permissions for documents will be overridden.

Enable search results

Success!

Success!

Once these steps are done, you should be able to search for the indexed document in Glean when logged in as the user added above. Note that it can take a few minutes for the document to reflect in the index after an /indexdocument call.

More examples

Take a look at our open-sourced repository for more code examples: Github repository badge

Refer to the Wikipedia toy example which indexes some relevant articles from Wikipedia using the Indexing API.

attention

Help us improve by contributing to the open-sourced repository by suggesting more examples and creating issues with feedback!