Debugging / Troubleshooting using the Indexing API

These endpoints return data about the state of your documents and their permissions, metadata about your datasource, and more, to help you debug issues related to the Indexing API.

Get datasource config

cURL:
curl -X POST https://customer-be.glean.com/api/index/v1/getdatasourceconfig \
  -H 'Authorization: Bearer <Token>' \
  -d '
{
  "datasource": "gleantest"
}'
Python:
import glean_indexing_api_client
from glean_indexing_api_client.api import datasources_api
from glean_indexing_api_client.model.get_datasource_config_request import GetDatasourceConfigRequest
from glean_indexing_api_client.model.custom_datasource_config import CustomDatasourceConfig
from pprint import pprint

# api_client is assumed to be an already configured glean_indexing_api_client.ApiClient.
datasource_api = datasources_api.DatasourcesApi(api_client)
get_datasource_config_request = GetDatasourceConfigRequest(
    datasource="gleantest"
)

try:
    api_response = datasource_api.getdatasourceconfig_post(get_datasource_config_request)
    pprint(api_response)
except glean_indexing_api_client.ApiException as e:
    print("Exception when calling DatasourcesApi->getdatasourceconfig_post: %s\n" % e)

Sample response:

{
  "name": "CUSTOM_GLEANTEST",
  "displayName": "Glean Test Datasource",
  "homeUrl": "https://www.gleantest.com",
  "objectDefinitions": [
    {
      "name": "EngineeringDoc",
      "propertyDefinitions": [
        {
          "name": "Org",
          "displayLabel": "Organization",
          "propertyType": "TEXT",
          "uiOptions": "SEARCH_RESULT",
          "hideUiFacet": false
        }
      ]
    }
  ],
  "urlRegex": "https://www.gleantest.com/.*",
  "datasourceCategory": "PUBLISHED_CONTENT",
  "isOnPrem": false,
  "isUserReferencedByEmail": true,
  "isEntityDatasource": false,
  "isTestDatasource": false
}
  • The /getdatasourceconfig endpoint returns the current state of the datasource config.
  • Use it to verify that the config is correct. If you need to change the config, call the /adddatasource endpoint again to override the previous config.
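
One way to double-check the config is to compare the fields you care about against the response. The following is a minimal sketch, not part of the Indexing API client: the helper name diff_datasource_config and the expected_config values are illustrative assumptions, and actual_config is a subset of the sample response above.

```python
def diff_datasource_config(expected: dict, actual: dict) -> dict:
    """Return {field: (expected_value, actual_value)} for every mismatched field.

    Hypothetical helper: it only compares the top-level fields listed in `expected`.
    """
    return {
        key: value_pair
        for key, value_pair in (
            (key, (value, actual.get(key))) for key, value in expected.items()
        )
        if value_pair[0] != value_pair[1]
    }


# Subset of the sample /getdatasourceconfig response shown above.
actual_config = {
    "name": "CUSTOM_GLEANTEST",
    "displayName": "Glean Test Datasource",
    "homeUrl": "https://www.gleantest.com",
    "urlRegex": "https://www.gleantest.com/.*",
    "datasourceCategory": "PUBLISHED_CONTENT",
    "isUserReferencedByEmail": True,
}

# Fields you expect to have configured (illustrative values).
expected_config = {
    "displayName": "Glean Test Datasource",
    "datasourceCategory": "PUBLISHED_CONTENT",
    "isUserReferencedByEmail": True,
}

mismatches = diff_datasource_config(expected_config, actual_config)
print(mismatches or "Config matches expectations")
```

A non-empty result lists each field whose value differs, which tells you what to correct before calling /adddatasource again.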

Get document count

Fetches document count for the specified custom datasource.

cURL:
curl -X POST https://customer-be.glean.com/api/index/v1/getdocumentcount \
  -H 'Authorization: Bearer <Token>' \
  -d '
{
  "datasource": "gleantest"
}'
Python:
import glean_indexing_api_client
from glean_indexing_api_client.api import troubleshooting_api
from glean_indexing_api_client.model.get_document_count_request import GetDocumentCountRequest
from glean_indexing_api_client.model.get_document_count_response import GetDocumentCountResponse
from pprint import pprint

# api_client is assumed to be an already configured glean_indexing_api_client.ApiClient.
troubleshoot_api = troubleshooting_api.TroubleshootingApi(api_client)
get_document_count_request = GetDocumentCountRequest(
    datasource="gleantest"
)

try:
    api_response = troubleshoot_api.getdocumentcount_post(get_document_count_request)
    pprint(api_response)
except glean_indexing_api_client.ApiException as e:
    print("Exception when calling TroubleshootingApi->getdocumentcount_post: %s\n" % e)

Sample response:

{
  "documentCount": 100000
}
  • Note that /getdocumentcount returns the number of uploaded documents, which may differ from the number of documents visible to you in the Glean UI. Documents are processed asynchronously to populate permissions and content before they become visible. Any processing error, such as invalid permissions, keeps the document from becoming visible in the Glean UI.
  • To see whether a particular document has been indexed, use the /getdocumentstatus endpoint as described below.

Get document upload and indexing status

Fetches the current upload and indexing status of documents.

cURL:
curl -X POST https://customer-be.glean.com/api/index/v1/getdocumentstatus \
  -H 'Authorization: Bearer <Token>' \
  -d '
{
  "datasource": "gleantest",
  "objectType": "EngineeringDoc",
  "docId": "eng-doc-1"
}'
Python:
import glean_indexing_api_client
from glean_indexing_api_client.api import troubleshooting_api
from glean_indexing_api_client.model.get_document_status_request import GetDocumentStatusRequest
from glean_indexing_api_client.model.get_document_status_response import GetDocumentStatusResponse
from pprint import pprint

# api_client is assumed to be an already configured glean_indexing_api_client.ApiClient.
troubleshoot_api = troubleshooting_api.TroubleshootingApi(api_client)
get_document_status_request = GetDocumentStatusRequest(
    datasource="gleantest",
    object_type="EngineeringDoc",
    doc_id="eng-doc-1",
)

try:
    api_response = troubleshoot_api.getdocumentstatus_post(get_document_status_request)
    pprint(api_response)
except glean_indexing_api_client.ApiException as e:
    print("Exception when calling TroubleshootingApi->getdocumentstatus_post: %s\n" % e)

Sample response:

{
  "uploadStatus": "UPLOADED",
  "lastUploadedAt": 1663882181,
  "indexingStatus": "INDEXED",
  "lastIndexedAt": 1664834539
}

The response can report the following statuses:

NOT_UPLOADED and NOT_INDEXED

  • The document specified in the request has not been uploaded, i.e., it has never been successfully pushed to Glean.

UPLOADED and NOT_INDEXED

  • The document has been uploaded but is not in our Elasticsearch index. There are two main reasons for this:
    • The processing framework has not picked up the document yet because too little time has passed since the upload. Document processing usually takes 15-20 minutes. If the lastUploadedAt timestamp is less than 20 minutes old, you can assume that the document will be processed and become visible in the Glean UI in due course.
    • Processing the document failed. This is the likely cause when more than 20 minutes have passed since the lastUploadedAt timestamp. It is usually due to invalid permissions and/or invalid document content. Contact your Glean representative for further details if you encounter this.

UPLOADED and INDEXED

  • Your document has been successfully uploaded and indexed into our system.
  • If you are still not able to view the document, please make sure that you have the right permissions to the document. You can use the /checkdocumentaccess endpoint to debug permissions (as described below).

UNKNOWN_STATUS

  • This can occur if either the SQL or the Elasticsearch instance is unavailable. Retry the request after some time.
  • Contact your Glean representative if this is a recurring issue.
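
The decision rules above can be sketched as a small helper that turns a /getdocumentstatus response into a next step. This is only an illustration under stated assumptions: the function name and the returned messages are hypothetical, lastUploadedAt is assumed to be in epoch seconds (consistent with the sample response), UNKNOWN_STATUS is assumed to appear as the value of uploadStatus or indexingStatus, and the 20-minute threshold is the upper end of the usual processing time mentioned above.

```python
import time
from typing import Optional

# Upper end of the usual 15-20 minute processing time, in seconds.
PROCESSING_WINDOW_SECONDS = 20 * 60


def interpret_document_status(status: dict, now: Optional[float] = None) -> str:
    """Map a /getdocumentstatus response to a short diagnosis (hypothetical helper)."""
    now = time.time() if now is None else now
    upload = status.get("uploadStatus")
    indexing = status.get("indexingStatus")

    # Assumption: UNKNOWN_STATUS is reported in either status field.
    if "UNKNOWN_STATUS" in (upload, indexing):
        return "unknown: retry later; contact Glean if this keeps happening"
    if upload == "NOT_UPLOADED":
        return "not uploaded: the document was never successfully pushed"
    if upload == "UPLOADED" and indexing == "NOT_INDEXED":
        # Assumption: lastUploadedAt is an epoch timestamp in seconds.
        age_seconds = now - status["lastUploadedAt"]
        if age_seconds < PROCESSING_WINDOW_SECONDS:
            return "pending: uploaded recently, wait for processing"
        return "failed: likely a processing error, contact Glean"
    if upload == "UPLOADED" and indexing == "INDEXED":
        return "indexed: if not visible, check permissions"
    return "unrecognized: unexpected status combination"


# The sample response shown above.
sample_status = {
    "uploadStatus": "UPLOADED",
    "lastUploadedAt": 1663882181,
    "indexingStatus": "INDEXED",
    "lastIndexedAt": 1664834539,
}
print(interpret_document_status(sample_status))
```

Passing now explicitly makes the helper easy to test; in normal use it defaults to the current time.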

Check document access

Checks whether a given user has access to a document in a custom datasource.

cURL:
curl -X POST https://customer-be.glean.com/api/index/v1/checkdocumentaccess \
  -H 'Authorization: Bearer <Token>' \
  -d '
{
    "datasource": "gleantest",
    "objectType": "EngineeringDoc",
    "docId": "eng-doc-1",
    "userEmail": "myuser@bluesky.test"
}'
Python:
import glean_indexing_api_client
from glean_indexing_api_client.api import troubleshooting_api
from glean_indexing_api_client.model.check_document_access_request import CheckDocumentAccessRequest
from glean_indexing_api_client.model.check_document_access_response import CheckDocumentAccessResponse
from pprint import pprint

# api_client is assumed to be an already configured glean_indexing_api_client.ApiClient.
troubleshoot_api = troubleshooting_api.TroubleshootingApi(api_client)
check_document_access_request = CheckDocumentAccessRequest(
    datasource="gleantest",
    object_type="EngineeringDoc",
    doc_id="eng-doc-1",
    user_email="myuser@bluesky.test",
) 

try:
    api_response = troubleshoot_api.checkdocumentaccess_post(check_document_access_request)
    pprint(api_response)
except glean_indexing_api_client.ApiException as e:
    print("Exception when calling TroubleshootingApi->checkdocumentaccess_post: %s\n" % e)

Sample response:

{
  "hasAccess": true
}
  • The /checkdocumentaccess endpoint returns true if the user with the specified email has access to the specified document, and false otherwise.
  • If you expect a user to have access to the document but the API reports otherwise, make sure the document permissions are set properly (refer to the permissions API documentation).
  • If proper permissions were specified during document indexing and the user still doesn't have access to the document, make sure the user was indexed before the document was indexed.
  • If the above steps do not resolve the issue, contact your Glean representative for help.
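
When several users report a missing document, it can help to run the same check for each of them. The sketch below assumes you wrap the /checkdocumentaccess call in a function that takes a user email and returns the hasAccess value; the names users_without_access and fake_check_access, and the second email address, are hypothetical, and the stub stands in for the real call so the example runs without network access.

```python
from typing import Callable, Iterable, List


def users_without_access(
    check_access: Callable[[str], bool], user_emails: Iterable[str]
) -> List[str]:
    """Return the emails for which check_access reports no access (hypothetical helper)."""
    return [email for email in user_emails if not check_access(email)]


# Stand-in results; in practice each value would come from the hasAccess
# field of a /checkdocumentaccess response for that user.
fake_results = {
    "myuser@bluesky.test": True,
    "otheruser@bluesky.test": False,
}


def fake_check_access(user_email: str) -> bool:
    """Stub that replaces the real API call for illustration."""
    return fake_results.get(user_email, False)


denied = users_without_access(fake_check_access, fake_results)
print(denied)
```

Each email in the result is a candidate for the permission and user-indexing checks listed above.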

Get user count

Fetches user count for the specified custom datasource.

cURL:
curl -X POST https://customer-be.glean.com/api/index/v1/getusercount \
  -H 'Authorization: Bearer <Token>' \
  -d '
{
  "datasource": "gleantest"
}'
Python:
import glean_indexing_api_client
from glean_indexing_api_client.api import troubleshooting_api
from glean_indexing_api_client.model.get_user_count_request import GetUserCountRequest
from glean_indexing_api_client.model.get_user_count_response import GetUserCountResponse
from pprint import pprint

# api_client is assumed to be an already configured glean_indexing_api_client.ApiClient.
troubleshoot_api = troubleshooting_api.TroubleshootingApi(api_client)
get_user_count_request = GetUserCountRequest(
    datasource="gleantest",
)

try:
    api_response = troubleshoot_api.getusercount_post(get_user_count_request)
    pprint(api_response)
except glean_indexing_api_client.ApiException as e:
    print("Exception when calling TroubleshootingApi->getusercount_post: %s\n" % e)

Sample response:

{
  "userCount": 2500
}
  • The /getusercount endpoint returns the number of uploaded users for the specified datasource. Note that this can differ from the actual number of users who have access to the datasource, since some users might not be indexed due to invalid data.