Getting Started

This guide helps you get familiar with the basics by walking through a simple task: configuring a datasource and indexing one document.

Authentication

All indexing API endpoints require a Bearer token, which can be scoped to the datasource you want to interact with. Obtain the API token from your Glean admin (who can access it under Workspace Settings > Setup > API tokens), and store it in a secure place.

Next, determine the host you need to connect to. This is the URL of the backend for your Glean deployment, for example, customer-be.glean.com.
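As a minimal sketch of how the token and host fit together, the snippet below builds (but does not send) an authenticated request using only the Python standard library. The helper name `build_indexing_request` and the environment variable names `GLEAN_HOST` and `GLEAN_API_TOKEN` are assumptions made for illustration and are not part of the Glean API. The `/api/index/v1` path prefix is taken from the examples in this guide, and the `Content-Type` header is assumed as the usual convention for JSON bodies.

```python
import json
import os
import urllib.request


def build_indexing_request(host, token, endpoint, payload):
    """Build (without sending) a POST request for an indexing API endpoint.

    Hypothetical helper for illustration; not part of the Glean SDK.
    """
    url = f"https://{host}/api/index/v1/{endpoint}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            # Assumed: conventional header for a JSON request body.
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Assumed environment variable names; keeping the token in the environment
# keeps it out of source control.
host = os.environ.get("GLEAN_HOST", "customer-be.glean.com")
token = os.environ.get("GLEAN_API_TOKEN", "YOUR_API_TOKEN")

request = build_indexing_request(host, token, "adddatasource", {"name": "gleantest"})
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out here so the sketch can run without a live deployment.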

Using the SDK

In the tutorials, we use curl for example API requests, but you can use the Python SDK for production code.

Install the indexing API Python SDK locally.

pip install https://app.glean.com/meta/indexing_api_client.zip

Set up the API client

import glean_indexing_api_client as indexing_api

# Configure host and Bearer authorization: BearerAuth
configuration = indexing_api.Configuration(
  host="https://customer-be.glean.com/api/index/v1", access_token="YOUR_API_TOKEN"
)

api_client = indexing_api.ApiClient(configuration)

Set up a datasource

The first step is to create a datasource that you can use to index documents, identity information, etc.

When creating a datasource, the key fields you need to set are the following:

name A unique identifier used to refer to the datasource.

displayName The datasource name shown in search results in the UI.

datasourceCategory The type of this datasource. Affects how results are ranked. More details on how to select this: Selecting datasource category

urlRegex A regex that captures the view URLs of documents in the datasource as accurately as possible. Avoid regexes that are too broad, which will capture URLs from other datasources, and regexes that are too narrow, which will miss documents from this datasource.

isUserReferencedByEmail This should be set to true if you want to refer to user identities using emails directly. If you have your own notion of user ids, this can be set to false.
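Before creating the datasource, it can help to sanity-check a candidate urlRegex against a few URLs that should and should not match. The sketch below uses Python's `re` module as a stand-in; Glean's own regex engine and matching semantics may differ. The helper `check_url_regex` and the sample URLs are hypothetical.

```python
import re


def check_url_regex(url_regex, should_match, should_not_match):
    """Return (missed, leaked): URLs the regex wrongly rejects or wrongly accepts.

    Hypothetical helper; uses Python's re.match as an approximation.
    """
    pattern = re.compile(url_regex)
    missed = [url for url in should_match if pattern.match(url) is None]
    leaked = [url for url in should_not_match if pattern.match(url) is not None]
    return missed, leaked


# View URLs that belong to this datasource (illustrative).
own_urls = ["https://bluesky.test/blueskytest-1"]
# URLs that belong to some other datasource (illustrative).
other_urls = ["https://docs.example.com/page"]

# The regex used in this tutorial: neither list reports a problem.
print(check_url_regex(r"^https://bluesky.test.*", own_urls, other_urls))

# A regex that is too broad also captures the other datasource's URL.
print(check_url_regex(r"^https://.*", own_urls, other_urls))
```

An empty `missed` list means the regex is not too narrow for the samples, and an empty `leaked` list means it is not too broad for them.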

cURL
curl -X POST https://customer-be.glean.com/api/index/v1/adddatasource \
  -H 'Authorization: Bearer <Token>' \
  -d '
{
  "name": "gleantest",
  "displayName": "Glean Test",
  "datasourceCategory": "PUBLISHED_CONTENT",
  "urlRegex": "^https://bluesky.test.*",
  "isUserReferencedByEmail": true
}'
Python
import glean_indexing_api_client as indexing_api
from glean_indexing_api_client.api import datasources_api
from glean_indexing_api_client.model.custom_datasource_config import CustomDatasourceConfig

# api_client is the ApiClient created in "Set up the API client" above.
datasource_api = datasources_api.DatasourcesApi(api_client)
# Datasource config supports many fields for customization, but a bare minimum
# config should be ok to get started.
datasource_config = CustomDatasourceConfig(
  name='gleantest',
  display_name='Glean Test',
  datasource_category='PUBLISHED_CONTENT',
  # Must match the view URLs of the documents indexed later (https).
  url_regex='^https://bluesky.test.*',
  # Permissions will be specified by email addresses instead of a
  # datasource-specific ID.
  is_user_referenced_by_email=True,
)
try:
  datasource_api.adddatasource_post(datasource_config)
except indexing_api.ApiException as e:
  print('Exception when calling DatasourcesApi->adddatasource_post: %s\n' % e)

You can learn about more datasource customization options here.

Index a document for the datasource

Once the datasource is configured, we can index documents. A document has the following key fields.

id This is a unique identifier for the document within the datasource. The id can only contain alphanumeric characters (underscores are not allowed). The id should be stable, meaning that the same document must keep the same id across uploads. If an id is not provided, we use a hash of the viewURL as the id.

objectType Type of object within the datasource. For example, a drive might have objects of type file and folder.

title Title of the document.

body Searchable document body. This might be shown in search result snippets.

viewURL A unique URL that can be used to view the document in a browser. This viewURL must also match the urlRegex specified while creating the datasource.

permissions This can be used to control visibility of the document in search results for different Glean users. For simplicity, in this tutorial, we will only index a document with anonymous access using permissions.allowAnonymousAccess.
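The key fields above can be assembled into a request body programmatically, which avoids hand-written JSON mistakes such as trailing commas. The following is a minimal sketch using only the standard library; the helper `build_index_document_payload` is hypothetical, and the local viewURL check uses Python's `re` module, which may not behave identically to Glean's matching.

```python
import json
import re


def build_index_document_payload(datasource, object_type, doc_id, title,
                                 text, view_url, url_regex):
    """Assemble an /indexdocument request body as a plain dict.

    Hypothetical helper; raises if view_url does not match url_regex.
    """
    if re.match(url_regex, view_url) is None:
        raise ValueError(f"viewURL {view_url!r} does not match {url_regex!r}")
    return {
        "document": {
            "datasource": datasource,
            "objectType": object_type,
            "id": doc_id,
            "title": title,
            "body": {"mimeType": "text/plain", "textContent": text},
            # Anonymous access, as used in this tutorial.
            "permissions": {"allowAnonymousAccess": True},
            "viewURL": view_url,
        }
    }


payload = build_index_document_payload(
    datasource="gleantest",
    object_type="EngineeringDoc",
    doc_id="blueskytest-1",
    title="Getting started with Blue Sky",
    text="This doc will help you get familiar with Blue Sky API",
    view_url="https://bluesky.test/blueskytest-1",
    url_regex=r"^https://bluesky.test.*",
)
# json.dumps always emits valid JSON, so the body can be sent as-is.
body = json.dumps(payload, indent=2)
print(body)
```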

cURL
curl -X POST https://customer-be.glean.com/api/index/v1/indexdocument \
  -H 'Authorization: Bearer <Token>' \
  -d '
{
  "document": {
    "datasource": "gleantest",
    "objectType": "EngineeringDoc",
    "id": "blueskytest-1",
    "title": "Getting started with Blue Sky",
    "body": {
      "mimeType": "text/plain",
      "textContent": "This doc will help you get familiar with Blue Sky API"
    },
    "permissions": {
      "allowAnonymousAccess": true
    },
    "viewURL": "https://bluesky.test/blueskytest-1"
  }
}'
Python
import glean_indexing_api_client as indexing_api
from glean_indexing_api_client.api import documents_api
from glean_indexing_api_client.model.index_document_request import IndexDocumentRequest
from glean_indexing_api_client.model.document_definition import DocumentDefinition
from glean_indexing_api_client.model.content_definition import ContentDefinition
from glean_indexing_api_client.model.document_permissions_definition import (
    DocumentPermissionsDefinition,
)

request = IndexDocumentRequest(
    # DocumentDefinition has many fields; we show the usage of a few basic ones.
    document=DocumentDefinition(
        datasource="gleantest",
        object_type="EngineeringDoc",
        title="Getting started with Blue Sky",
        id="blueskytest-1",
        view_url="https://bluesky.test/blueskytest-1",
        body=ContentDefinition(
            mime_type="text/plain",
            text_content="This doc will help you get familiar with Blue Sky API",
        ),
        permissions=DocumentPermissionsDefinition(
            allow_anonymous_access=True
        ),
    )
)
# api_client is the ApiClient created in "Set up the API client" above.
# Named document_api so it does not shadow the imported documents_api module.
document_api = documents_api.DocumentsApi(api_client)
try:
    document_api.indexdocument_post(request)
except indexing_api.ApiException as e:
    print("Exception when calling DocumentsApi->indexdocument_post: %s\n" % e)

To learn about more document customization options, see here.

To learn more about how to set up user identities, and more complex permissions, read Setting permissions

To index documents in bulk, you can use Bulk indexing

For helpful troubleshooting tips, read Troubleshooting

Final steps

For the indexed document to show up in the Glean UI, the following additional steps are necessary. Please contact your Glean CSM to get these done.

(1) The datasource must be added to the queryapi.datasources config to enable it for search.

(2) There must be a render config for the datasource so that the UI knows how to display it. A basic render config like this should work.

  default: {
    icon: {
      primary: {
        iconType: IconType.DATASOURCE
      }
    },
    meta: {
      keys: [DocumentDatumType.AUTHOR]
    }
  }

For now, Glean will set this up internally, but in the future this will be made configurable via the Glean Admin Console.

Once these steps are done, you should be able to search for the indexed document in Glean. Because the document was indexed with anonymous access, it should be visible to any logged-in Glean user. Note that it can take a few minutes for the document to be reflected in the index after an /indexdocument call.
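Because indexing is not immediate, a script that verifies the upload may want to poll, not check once. The sketch below is a generic polling loop; `wait_until` is a hypothetical helper, and `fake_is_searchable` is a stand-in for whatever check you use to confirm the document is searchable (this guide does not prescribe one). The timeout and interval values are illustrative.

```python
import time


def wait_until(check, timeout_seconds=300.0, interval_seconds=15.0):
    """Call check() repeatedly until it returns True or the timeout elapses.

    Hypothetical helper; returns True on success, False on timeout.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_seconds)


# Stand-in check that succeeds on the third attempt, simulating index delay.
attempts = {"count": 0}


def fake_is_searchable():
    attempts["count"] += 1
    return attempts["count"] >= 3


found = wait_until(fake_is_searchable, timeout_seconds=5, interval_seconds=0.01)
print(found, attempts["count"])
```

In real use, the defaults of a few minutes total with a pause of several seconds between attempts are closer to the delay described above than the short values used in this demonstration.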