Getting Started
This guide helps you get familiar with the basics by walking through a simple task: configuring a datasource and indexing one document.
Authentication
All indexing API endpoints require a Bearer Auth token, which can be scoped to the datasource you want to interact with. Obtain the API token from your Glean Admin (who can access it under Workspace Settings > Setup > API tokens), and store it in a secure place.
Determine the host you need to connect to. This will be the URL of the backend for your Glean deployment, for example, customer-be.glean.com.
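As a minimal sketch of how the token and host from this section might be kept out of source code, the snippet below reads both from environment variables and derives the base URL and Authorization header used by the requests later in this guide. The variable names GLEAN_HOST and GLEAN_API_TOKEN and the helper build_indexing_settings are assumptions made for illustration; they are not defined by Glean.

```python
import os


def build_indexing_settings(env=None):
    """Return (base_url, headers) for indexing API calls.

    GLEAN_HOST and GLEAN_API_TOKEN are hypothetical variable names chosen
    for this sketch; use whatever secret storage your team prefers.
    """
    env = os.environ if env is None else env
    host = env["GLEAN_HOST"]        # for example, customer-be.glean.com
    token = env["GLEAN_API_TOKEN"]  # token obtained from your Glean Admin
    base_url = f"https://{host}/api/index/v1"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return base_url, headers
```

Calling build_indexing_settings() with no arguments reads the real environment and raises KeyError if either variable is missing, which surfaces a misconfiguration before any request is sent.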
Using the SDK
The tutorials use curl for example API requests, but you can use the Python SDK for production code.
Install the indexing API Python SDK locally.
pip install https://app.glean.com/meta/indexing_api_client.zip
Set up the API client
import glean_indexing_api_client as indexing_api
# Configure host and Bearer authorization: BearerAuth
configuration = indexing_api.Configuration(
    host="https://customer-be.glean.com/api/index/v1",
    access_token="YOUR_API_TOKEN",
)
api_client = indexing_api.ApiClient(configuration)
Set up a datasource
The first step is to create a datasource that you can use to index documents, identity information, etc.
When creating a datasource, the key fields you need to set are the following:
name
A unique identifier used to refer to the datasource.
displayName
The datasource name shown in search results in the UI.
datasourceCategory
The type of this datasource. Affects how results are ranked. More details on how
to select this: Selecting datasource category
urlRegex
A regex that captures the view URLs of documents in the datasource as accurately
as possible. Avoid regexes that are too broad and capture URLs from other
datasources, and regexes that are too narrow and miss documents from this
datasource.
isUserReferencedByEmail
This should be set to true if you want to refer to user identities using emails
directly. If you have your own notion of user ids, this can be set to false.
curl -X POST https://customer-be.glean.com/api/index/v1/adddatasource \
  -H 'Authorization: Bearer <Token>' \
  -H 'Content-Type: application/json' \
  -d '
{
  "name": "gleantest",
  "displayName": "Glean Test",
  "datasourceCategory": "PUBLISHED_CONTENT",
  "urlRegex": "^https://bluesky.test.*",
  "isUserReferencedByEmail": true
}'
from glean_indexing_api_client.api import datasources_api
from glean_indexing_api_client.model.custom_datasource_config import CustomDatasourceConfig
# Reuses indexing_api and api_client from the client setup above.
datasource_api = datasources_api.DatasourcesApi(api_client)
# Datasource config supports many fields for customization, but a bare minimum
# config should be ok to get started.
datasource_config = CustomDatasourceConfig(
    name='gleantest',
    display_name='Glean Test',
    datasource_category='PUBLISHED_CONTENT',
    # Same regex as the curl example; it must match the https view URLs
    # of the documents indexed later.
    url_regex='^https://bluesky.test.*',
    # Permissions will be specified by email addresses instead of a
    # datasource-specific ID.
    is_user_referenced_by_email=True,
)
try:
    datasource_api.adddatasource_post(datasource_config)
except indexing_api.ApiException as e:
    print('Exception when calling DatasourcesApi->adddatasource_post: %s\n' % e)
You can learn more about datasource customization options here.
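The urlRegex guidance above (not too broad, not too narrow) can be checked locally before creating the datasource. The following is a minimal sketch using Python's re module; the helper check_url_regex and the sample URLs are illustrative assumptions, and Glean's own regex engine and matching semantics may differ from Python's.

```python
import re

# Value used in the examples above. Note the unescaped dot in "bluesky.test".
URL_REGEX = r"^https://bluesky.test.*"
# A stricter variant, shown only for comparison (an assumption, not from the guide).
STRICT_URL_REGEX = r"^https://bluesky\.test/.*"


def check_url_regex(pattern, should_match, should_not_match):
    """Return (missed, leaked): URLs the pattern wrongly rejects or wrongly accepts."""
    compiled = re.compile(pattern)
    missed = [url for url in should_match if not compiled.match(url)]
    leaked = [url for url in should_not_match if compiled.match(url)]
    return missed, leaked


own_urls = ["https://bluesky.test/blueskytest-1"]
foreign_urls = ["https://other.example/doc-1", "https://blueskyXtest/doc-1"]

# The unescaped dot also accepts the look-alike host "blueskyXtest".
missed, leaked = check_url_regex(URL_REGEX, own_urls, foreign_urls)
# The escaped variant accepts only URLs under bluesky.test.
strict_missed, strict_leaked = check_url_regex(STRICT_URL_REGEX, own_urls, foreign_urls)
```

An empty missed list means the pattern is not too narrow for the sample URLs, and an empty leaked list means it is not too broad for them. The quality of the check depends on how representative the sample URLs are.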
Index a document for the datasource
Once the datasource is configured, we can index documents. A document has the following key fields.
id
This is a unique identifier for the document within the datasource. The id can
only contain alphanumeric characters (underscores are not allowed). The id
should be stable, meaning that the same document must keep the same id across
uploads. If an id is not provided, we use a hash of the viewURL as the id.
objectType
Type of object within the datasource. For example, a drive might have objects of
type file and folder.
title
Title of the document.
body
Searchable document body. This might be shown in search result snippets.
viewURL
A unique URL that can be used to view the document in a browser. This viewURL must
also match the urlRegex specified while creating the datasource.
permissions
This can be used to control visibility of the document in search results for
different Glean users. For simplicity, in this tutorial, we will only index a
document with anonymous access using permissions.allowAnonymousAccess.
curl -X POST https://customer-be.glean.com/api/index/v1/indexdocument \
  -H 'Authorization: Bearer <Token>' \
  -H 'Content-Type: application/json' \
  -d '
{
  "document": {
    "datasource": "gleantest",
    "objectType": "EngineeringDoc",
    "id": "blueskytest-1",
    "title": "Getting started with Blue Sky",
    "body": {
      "mimeType": "text/plain",
      "textContent": "This doc will help you get familiar with Blue Sky API"
    },
    "permissions": {
      "allowAnonymousAccess": true
    },
    "viewURL": "https://bluesky.test/blueskytest-1"
  }
}'
from glean_indexing_api_client.api import documents_api
from glean_indexing_api_client.model.index_document_request import IndexDocumentRequest
from glean_indexing_api_client.model.document_definition import DocumentDefinition
from glean_indexing_api_client.model.content_definition import ContentDefinition
from glean_indexing_api_client.model.document_permissions_definition import (
    DocumentPermissionsDefinition,
)
request = IndexDocumentRequest(
    # DocumentDefinition has many fields; we show the usage of a few basic ones.
    document=DocumentDefinition(
        datasource="gleantest",
        object_type="EngineeringDoc",
        title="Getting started with Blue Sky",
        id="blueskytest-1",
        view_url="https://bluesky.test/blueskytest-1",
        body=ContentDefinition(
            mime_type="text/plain",
            text_content="This doc will help you get familiar with Blue Sky API",
        ),
        permissions=DocumentPermissionsDefinition(
            allow_anonymous_access=True
        ),
    )
)
# Named document_api so the imported documents_api module is not shadowed.
document_api = documents_api.DocumentsApi(api_client)
try:
    document_api.indexdocument_post(request)
except indexing_api.ApiException as e:
    print("Exception when calling DocumentsApi->indexdocument_post: %s\n" % e)
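For readers who want to see what the curl request contains without installing the SDK, here is a minimal standard-library sketch that assembles the same /indexdocument request. The helper build_index_request is a hypothetical name, and the local check of viewURL against urlRegex uses Python's re module, which may not behave exactly like Glean's matcher. The request is only built, not sent, so the sketch runs offline.

```python
import json
import re
import urllib.request

BASE_URL = "https://customer-be.glean.com/api/index/v1"
URL_REGEX = r"^https://bluesky.test.*"  # urlRegex used when creating the datasource


def build_index_request(base_url, token, document, url_regex):
    """Build (but do not send) a POST request for the /indexdocument endpoint."""
    # The guide requires viewURL to match the datasource's urlRegex.
    if not re.match(url_regex, document["viewURL"]):
        raise ValueError("viewURL does not match the datasource urlRegex")
    # json.dumps always emits valid JSON, which avoids hand-editing mistakes
    # such as trailing commas.
    body = json.dumps({"document": document}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/indexdocument",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


document = {
    "datasource": "gleantest",
    "objectType": "EngineeringDoc",
    "id": "blueskytest-1",
    "title": "Getting started with Blue Sky",
    "body": {
        "mimeType": "text/plain",
        "textContent": "This doc will help you get familiar with Blue Sky API",
    },
    "permissions": {"allowAnonymousAccess": True},
    "viewURL": "https://bluesky.test/blueskytest-1",
}

http_request = build_index_request(BASE_URL, "YOUR_API_TOKEN", document, URL_REGEX)
# urllib.request.urlopen(http_request) would send it; omitted here on purpose.
```

Passing the built request to urllib.request.urlopen sends it. For production code the guide recommends the Python SDK, which remains the better choice.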
You can learn more about document customization options here.
To learn more about how to set up user identities, and more complex permissions, read Setting permissions
To index documents in bulk, you can use Bulk indexing
For helpful troubleshooting tips, read Troubleshooting
Final steps
For the indexed document to show up in the Glean UI, the following additional steps are necessary. Please contact your Glean CSM to get these done.
(1) The datasource must be added to the queryapi.datasources config to enable it for search.
(2) There must be a render config for the datasource so that the UI knows how to display it. A basic render config like this should work.
default: {
  icon: {
    primary: {
      iconType: IconType.DATASOURCE
    }
  },
  meta: {
    keys: [DocumentDatumType.AUTHOR]
  }
}
For now, Glean will set this up internally, but in the future this will be made configurable via the Glean Admin Console.
Once these steps are done, you should be able to search for the indexed document in Glean. Note that it can take a few minutes for the document to appear in the index after an /indexdocument call.
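Because the document can take a few minutes to appear, an automated check should poll, not assert immediately. The sketch below is a generic polling helper. The name wait_until, the default timings, and the check callable are assumptions for illustration; this guide does not describe an API for confirming that a document is searchable, so the check is left to the caller.

```python
import time


def wait_until(check, timeout_s=300.0, interval_s=15.0, sleep=time.sleep):
    """Call check() until it returns True or timeout_s seconds have elapsed.

    check is a caller-supplied function (hypothetical here) that returns True
    once the indexed document is visible. Returns True on success and False
    on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

The sleep parameter exists so that tests can substitute a no-op instead of waiting. In normal use, calling wait_until(my_check) with the defaults polls every 15 seconds for up to five minutes.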