Getting Started
This guide will help you quickly get familiar with the basics of the Indexing API by completing a simple task of configuring a datasource, and indexing document and enabling the datasource to show up search.
Authentication
All indexing API endpoints require a Bearer Auth token. You can obtain a token scoped to the datasource you want to interact with. Obtain the API token from your Glean Admin. Store this token in a secure place.
info
Glean admins can manage Indexing API tokens via the API tokens page within Workspace Settings (Workspace > Setup > API tokens > Indexing tokens tab). Please refer to the Managing API tokens tutorial to get more information on API tokens.
Determine the host you need to connect to. This will be the URL of the backend for your Glean deployment, for example, customer-be.glean.com
Using the Python SDK
In the tutorials, we use curl and python for example API requests. You can also use the python SDK for production code.
-
Install the indexing API python sdk locally
pip install https://app.glean.com/meta/indexing_api_client.zip
-
Set up the API client
import glean_indexing_api_client as indexing_api # Configure host and Bearer authorization: BearerAuth configuration = indexing_api.Configuration( host="http://customer-be.glean.com/api/index/v1", access_token="<YOUR_API_TOKEN>" ) api_client = indexing_api.ApiClient(configuration)
You can find documentation for the python SDK under the SDK reference
Set up a datasource
The first step is to create a datasource that you can use to index documents, identity information, etc.
Option 1: Using the Glean custom app setup page
Note
Only Glean admins can set up custom apps using Glean's Workspace Settings.
If you are not a Glean admin, you can work with your Glean admin to set up a custom app or
follow Option 2 to set up a datasource using the /adddatasource
API endpoint.
To set up a custom app, Admins can navigate to Workspace Settings, then click on the Data sources section.
Click on the "Add data source" button in the top right corner. In the modal that appears, click on "Custom" at the bottom of the list.
The key fields to quickly set up a datasource are listed under the "Data source basics" section. Some example values are shown in the screenshot below.
Once the values are set, click on the "Publish" button to create the custom app.
Option 2: Using the /adddatasource endpoint
When creating a datasource, the key fields you need to set are the following:
name
A unique identifier used to refer to the datasource.
displayName
The datasource name shown in search results in the UI.
datasourceCategory
The type of this datasource. This affects how results are ranked. More details on how
to select a category can be inferred from Datasource
category documentation.
urlRegex
A regex that captures the view URLs of documents in the datasource as accurately
as possible. Avoid regexes that are too broad, and will capture URLs from other
datasources, or regexes that are too narrow, and will not capture documents from this
datasource.
isUserReferencedByEmail
This should be set to true if you want to refer to user identities using emails
directly. If you have your own notion of user ids, this can be set to false. This affects how Glean interprets permissions attached to documents.
curl -X POST https://customer-be.glean.com/api/index/v1/adddatasource \
-H 'Authorization: Basic <Token>' \
-d '
{
"name": "gleantest",
"displayName": "Glean Test",
"datasourceCategory": "PUBLISHED_CONTENT",
"urlRegex": "^https://bluesky.test.*",
"objectDefinitions": [
{
"name": "EngineeringDoc",
"docCategory": "PUBLISHED_CONTENT"
}
],
"isUserReferencedByEmail": true
}'
import glean_indexing_api_client as indexing_api
from glean_indexing_api_client.api import datasources_api
from glean_indexing_api_client.model.custom_datasource_config import CustomDatasourceConfig
from glean_indexing_api_client.model.object_definition import ObjectDefinition
datasource_api = datasources_api.DatasourcesApi(api_client)
# Datasource config supports many fields for customization, but a bare minimum
# config should be ok to get started.
datasource_config = CustomDatasourceConfig(
name='gleantest',
display_name='Glean Test',
datasource_category='PUBLISHED_CONTENT',
url_regex='^http://bluesky.test.*',
object_definitions=[
ObjectDefinition(
doc_category='PUBLISHED_CONTENT',
name='EngineeringDoc'
)
],
# Permissions will be specified by email addresses instead of a
# datasource-specific ID.
is_user_referenced_by_email=True,
)
try:
datasource_api.adddatasource_post(datasource_config)
except indexing_api.ApiException as e:
print('Exception when calling DatasourcesApi->adddatasource_post: %s\\n' % e)
You can learn about more datasource customization options at here
Index a document for the datasource
Once the datasource is configured, we can index documents. A document has the following key fields.
id
This is a unique identifier for the document within the datasource. The id can
only contain alphanumeric characters (underscores are not allowed). The id
should be stable, meaning that the same document must keep the same id across
uploads. If an id is not provided, we use a hash of the viewURL
as the id.
objectType
Type of object within the datasource. For example, a drive might have objects of
type file and folder.
title
Title of the document.
body
Searchable document body. This might be shown in search result snippets.
viewURL
A unique URL that can used to view the document in a browser. This viewURL must
also match the urlRegex specified while creating the datasource.
permissions
This can be used to control visibility of the document in search results for
different Glean users. For simplicity, in this tutorial, we will only index a
document with anonymous access using permissions.allowAnonymousAccess
.
curl -X POST https://customer-be.glean.com/api/index/v1/indexdocument \
-H 'Authorization: Basic <Token>' \
-d '
{
"document": {
"datasource": "gleantest",
"objectType": "EngineeringDoc",
"id": "blueskytest-1",
"title": "Getting started with Blue Sky",
"body": {
"mimeType": "text/plain",
"textContent": "This doc will help you get familiar with Blue Sky API"
},
"permissions": {
"allowAnonymousAccess": true
},
"viewURL": "https://bluesky.test/blueskytest-1"
}
}'
import glean_indexing_api_client as indexing_api
from glean_indexing_api_client.api import documents_api
from glean_indexing_api_client.model.index_document_request import IndexDocumentRequest
from glean_indexing_api_client.model.document_definition import DocumentDefinition
from glean_indexing_api_client.model.content_definition import ContentDefinition
from glean_indexing_api_client.model.user_reference_definition import (
UserReferenceDefinition,
)
from glean_indexing_api_client.model.document_permissions_definition import (
DocumentPermissionsDefinition,
)
request = IndexDocumentRequest(
# DocumentDefinition has many fields, we show the usage of a few basic ones.
document=DocumentDefinition(
datasource="gleantest",
object_type="EngineeringDoc",
title="This doc will help you get familiar with Blue Sky API",
id="blueskytest-1",
view_url="https://bluesky.test/blueskytest-1",
body=ContentDefinition(mime_type="text/plain", text_content="This doc will help you get familiar with Blue Sky API"),
permissions=DocumentPermissionsDefinition(
allow_anonymous_access=True
),
)
)
documents_api = documents_api.DocumentsApi(api_client)
try:
documents_api.indexdocument_post(request)
except indexing_api.ApiException as e:
print("Exception when calling DocumentsApi->indexdocument_post: %s\n" % e)
To learn about more document customization options here
To learn more about how to set up user identities, and more complex permissions, read Setting permissions
To index documents in bulk, you can use Bulk indexing
For helpful troubleshooting tips, read Troubleshooting
Enable search results for the datasource
To be able to search for the indexed document in Glean, it must be enabled show up in search results.
Glean Admins can go to the 'Overview' tab on the setup page for their custom app to enable the datasource to show up in search results.
Results can be configured to be visible to all teammates or to their configured test group
Note
Document permissions are respected even when results are enabled for All teammates
or Test group only
.
Enabling results for All teammates
/ Test group only
does not mean configured permissions for documents will be overridden.
Success!
Success!
Once these steps are done, you should be able to search for the indexed document in Glean when logged in as the user added above. Note that it can take a few minutes for the document to reflect in the index after an /indexdocument call.
More examples
Take a look at our open-sourced repository for more code examples:
Refer to the Wikipedia toy example which indexes some relevant articles from Wikipedia using the Indexing API.
attention
Help us improve by contributing to the open-sourced repository by suggesting more examples and creating issues with feedback!