To get started with the Indexing API, you first need to create a datasource that will store all the data you will be indexing.

Using the Glean Custom App Setup Page

Only Glean admins can set up custom apps using Glean’s admin console.
If you are not a Glean admin, you can either work with your Glean admin to set up a custom app or
set up a datasource yourself using the /adddatasource API endpoint, as described under “Using the /adddatasource API Endpoint”.

To set up a custom app, admins can navigate to the admin console, then click on the Data sources section.

Click on the “Add data source” button in the top right corner. In the modal that appears, click on “Custom” at the bottom of the list.

The key fields to quickly set up a datasource are listed under the “Data source basics” section. Some example values are shown in the screenshot below.

Once the values are set, click on the “Publish” button to create the custom app.

Using the /adddatasource API Endpoint

When creating a datasource, the key fields you need to set are the following:

Field: Description
name: A unique identifier used to refer to the datasource.
displayName: The datasource name shown in search results in the UI.
datasourceCategory: The type of this datasource. This affects how results are ranked. Details on how to select a category can be found in the Datasource Categories documentation.
urlRegex: A regex that captures the view URLs of documents in the datasource as accurately as possible. Avoid regexes that are too broad (they will capture URLs from other datasources) or too narrow (they will miss documents from this datasource).
isUserReferencedByEmail: Set this to true if you want to refer to user identities directly by email. If you have your own notion of user IDs, set it to false. This affects how Glean interprets permissions attached to documents.
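
The advice on urlRegex can be checked locally before a datasource is created. The sketch below is illustrative only: it uses Python's re module, which may not behave identically to the regex engine Glean uses, and the STRICT pattern, the captured helper, and the sample URLs are hypothetical names and values introduced here.

```python
import re

# Pattern in the style of the example request. The dots are unescaped,
# so each "." matches any character, not only a literal dot.
LOOSE = r"^https://bluesky.test.*"

# Hypothetical tighter variant (an assumption, not from the Glean docs):
# literal dots and a required "/" after the host.
STRICT = r"^https://bluesky\.test/.*"


def captured(pattern, urls):
    """Return the URLs that the pattern matches from the start of the string."""
    compiled = re.compile(pattern)
    return [url for url in urls if compiled.match(url)]


# Made-up URLs: the first belongs to the datasource, the other two do not.
sample_urls = [
    "https://bluesky.test/docs/1",
    "https://blueskyXtest.example/doc",
    "https://bluesky.testing.example/x",
]

print(captured(LOOSE, sample_urls))
print(captured(STRICT, sample_urls))
```

Under these assumptions, the loose pattern accepts all three URLs, while the stricter one accepts only the URL that belongs to the datasource. Running a check like this against real view URLs from your own and neighboring datasources is one way to catch a pattern that is too broad or too narrow.
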
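
For example, the following request creates a datasource named gleantest:

curl -X POST https://customer-be.glean.com/api/index/v1/adddatasource \
  -H 'Authorization: Basic <Token>' \
  -H 'Content-Type: application/json' \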
  -d '
    {
      "name": "gleantest",
      "displayName": "Glean Test",
      "datasourceCategory": "PUBLISHED_CONTENT",
      "urlRegex": "^https://bluesky.test.*",
      "objectDefinitions": [
        {
          "name": "EngineeringDoc",
          "docCategory": "PUBLISHED_CONTENT"
        }
      ],
      "isUserReferencedByEmail": true
    }'
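
The same request can be assembled from a script. The following is a minimal sketch using only the Python standard library; the helper name build_add_datasource_request is hypothetical, the host and Authorization scheme simply mirror the curl example above, and the request is built but not sent.

```python
import json
import urllib.request


def build_add_datasource_request(base_url, token, config):
    """Build (but do not send) a POST request for the /adddatasource endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/api/index/v1/adddatasource",
        data=json.dumps(config).encode("utf-8"),
        headers={
            # Same scheme as the curl example; adjust if your deployment differs.
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Same field values as the curl example.
config = {
    "name": "gleantest",
    "displayName": "Glean Test",
    "datasourceCategory": "PUBLISHED_CONTENT",
    "urlRegex": "^https://bluesky.test.*",
    "objectDefinitions": [
        {"name": "EngineeringDoc", "docCategory": "PUBLISHED_CONTENT"}
    ],
    "isUserReferencedByEmail": True,
}

request = build_add_datasource_request(
    "https://customer-be.glean.com", "<Token>", config
)

# To actually send it (needs a valid token and network access):
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Keeping the configuration in a dictionary makes it easy to validate or reuse the field values before the request is sent.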

You can learn more about datasource customization options here.

For more information about datasource categories, see Datasource Categories.