Debugging / Troubleshooting using the Indexing API

These endpoints return data about the state of your documents and their permissions, metadata about your datasource, and more, to help you debug Indexing API issues.

Get datasource config

cURL:
curl -X POST https://customer-be.glean.com/api/index/v1/getdatasourceconfig \
  -H 'Authorization: Bearer <Token>' \
  -d '
{
  "datasource": "gleantest"
}'
Python:
import glean_indexing_api_client
from glean_indexing_api_client.api import datasources_api
from glean_indexing_api_client.model.get_datasource_config_request import GetDatasourceConfigRequest
from pprint import pprint

# Assumes api_client is an already configured glean_indexing_api_client.ApiClient.
datasource_api = datasources_api.DatasourcesApi(api_client)
get_datasource_config_request = GetDatasourceConfigRequest(
    datasource="gleantest"
) 

try:
    api_response = datasource_api.getdatasourceconfig_post(get_datasource_config_request)
    pprint(api_response)
except glean_indexing_api_client.ApiException as e:
    print("Exception when calling DatasourcesApi->getdatasourceconfig_post: %s\n" % e)

Sample response:

{
  "name": "CUSTOM_GLEANTEST",
  "displayName": "Glean Test Datasource",
  "homeUrl": "https://www.gleantest.com",
  "objectDefinitions": [
    {
      "name": "EngineeringDoc",
      "propertyDefinitions": [
        {
          "name": "Org",
          "displayLabel": "Organization",
          "propertyType": "TEXT",
          "uiOptions": "SEARCH_RESULT",
          "hideUiFacet": false
        }
      ]
    }
  ],
  "urlRegex": "https://www.gleantest.com/.*",
  "datasourceCategory": "PUBLISHED_CONTENT",
  "isOnPrem": false,
  "isUserReferencedByEmail": true,
  "isEntityDatasource": false,
  "isTestDatasource": false
}
  • The /getdatasourceconfig endpoint returns the current state of the datasource config.
  • Use it to double-check that the config is correct. To change the config, call the /adddatasource endpoint again to overwrite the previous config.
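
As a rough illustration of double-checking the config, the sketch below compares a parsed /getdatasourceconfig response against the values you expect and tests a document URL against urlRegex. The helper names find_config_mismatches and url_matches_datasource are hypothetical and not part of the Glean client, and treating urlRegex as a full-match pattern is an assumption.

```python
import re


def find_config_mismatches(actual: dict, expected: dict) -> dict:
    """Return {field: (expected, actual)} for each expected field that differs."""
    return {
        field: (want, actual.get(field))
        for field, want in expected.items()
        if actual.get(field) != want
    }


def url_matches_datasource(config: dict, url: str) -> bool:
    """Test a URL against urlRegex (assumption: the whole URL must match)."""
    return re.fullmatch(config["urlRegex"], url) is not None


# Subset of the sample response above.
config = {
    "name": "CUSTOM_GLEANTEST",
    "urlRegex": "https://www.gleantest.com/.*",
    "datasourceCategory": "PUBLISHED_CONTENT",
    "isOnPrem": False,
    "isTestDatasource": False,
}

expected = {"datasourceCategory": "PUBLISHED_CONTENT", "isTestDatasource": False}
print(find_config_mismatches(config, expected))
print(url_matches_datasource(config, "https://www.gleantest.com/docs/1"))
```

An empty result from find_config_mismatches means every expected field matched; any mismatch can be corrected by calling /adddatasource again.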

Debug datasource status

Gather information about the datasource's overall status.

Sample request:

cURL:
curl -X POST https://customer-be.glean.com/api/index/v1/debug/gleantest/status \
  -H 'Authorization: Bearer <Token>'
Python:
import glean_indexing_api_client
from glean_indexing_api_client.api import troubleshooting_api
from pprint import pprint
troubleshoot_api = troubleshooting_api.TroubleshootingApi(api_client)
try:
    api_response = troubleshoot_api.debug_datasource_status_post("gleantest")
    pprint(api_response)
except glean_indexing_api_client.ApiException as e:
    print("Exception when calling TroubleshootingApi->debug_datasource_status_post: %s\n" % e)

documents contains information about counts, bulk upload history and processing history for documents in the datasource.

identity contains information about counts and processing history for users, groups, and memberships in the datasource.

Sample response:
{
  "documents": {
    "bulkUploadHistory": [
      {
        "uploadId": "upload-id-1234567890",
        "startTime": "2024-02-08T12:00:00.000Z",
        "endTime": "2024-02-08T12:05:00.000Z",
        "status": "SUCCESSFUL"
      }
    ],
    "counts": {
      "uploaded": [
        {
          "objectType": "Article",
          "count": 15
        }
      ],
      "indexed": [
        {
          "objectType": "Article",
          "count": 15
        }
      ]
    },
    "processingHistory": [
      {
        "startTime": "2024-02-08T12:00:00.000Z",
        "endTime": "2024-02-08T12:00:05.000Z"
      }
    ]
  },
  "identity": {
    "processingHistory": [
      {
        "startTime": "2024-02-08T12:00:00.000Z",
        "endTime": "2024-02-08T12:05:00.000Z"
      }
    ],
    "users": {
      "bulkUploadHistory": [
        {
          "uploadId": "upload-users-1234567890",
          "startTime": "2024-02-08T12:00:00.000Z",
          "endTime": "2024-02-08T12:05:00.000Z",
          "status": "SUCCESSFUL"
        }
      ],
      "counts": {
        "uploaded": 5
      }
    },
    "groups": {
      "bulkUploadHistory": [],
      "counts": {
        "uploaded": 3
      }
    },
    "memberships": {
      "bulkUploadHistory": [],
      "counts": {
        "uploaded": 2
      }
    }
  }
}
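
One way to read the documents.counts section is to compare uploaded and indexed counts per object type. The sketch below does this on a parsed response; indexing_backlog is a hypothetical helper name, and the code assumes the response has the shape shown in the sample above.

```python
def indexing_backlog(status: dict) -> dict:
    """Map each objectType to the number of uploaded but not yet indexed documents."""
    counts = status.get("documents", {}).get("counts", {})
    uploaded = {c["objectType"]: c["count"] for c in counts.get("uploaded", [])}
    indexed = {c["objectType"]: c["count"] for c in counts.get("indexed", [])}
    return {
        object_type: count - indexed.get(object_type, 0)
        for object_type, count in uploaded.items()
    }


# Counts taken from the sample response above.
sample_status = {
    "documents": {
        "counts": {
            "uploaded": [{"objectType": "Article", "count": 15}],
            "indexed": [{"objectType": "Article", "count": 15}],
        }
    }
}

print(indexing_backlog(sample_status))
```

A non-zero value for an object type suggests documents that are still waiting to be processed or that failed processing; the processingHistory entries help tell the two apart.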

Debug datasource document

Returns information that helps debug issues with a particular document, such as the document's upload and indexing status and its uploaded permissions.

cURL:
curl -X POST https://customer-be.glean.com/api/index/v1/debug/gleantest/document \
  -H 'Authorization: Bearer <Token>' \
  -d '{
    "objectType": "Article",
    "docId": "art123"
  }'
Python:
import glean_indexing_api_client
from glean_indexing_api_client.api import troubleshooting_api
from glean_indexing_api_client.model.debug_document_request import DebugDocumentRequest
from glean_indexing_api_client.model.debug_document_response import DebugDocumentResponse
from pprint import pprint
troubleshoot_api = troubleshooting_api.TroubleshootingApi(api_client)
debug_document_request = DebugDocumentRequest(
    datasource="gleantest",
    object_type="Article",
    doc_id="art123"
)
try:
    api_response = troubleshoot_api.debug_datasource_document_post(debug_document_request)
    pprint(api_response)
except glean_indexing_api_client.ApiException as e:
    print("Exception when calling TroubleshootingApi->debug_datasource_document_post: %s\n" % e)
  • status contains information about the document's upload and indexing status.
  • uploadedPermissions contains the exact permissions that were uploaded for the document. This is useful for debugging cases where the document is not visible to a user because of incorrect uploaded permissions. The schema is identical to the permissions schema for a document.
Sample response:
{
  "status": {
    "uploadStatus": "UPLOADED",
    "lastUploadedAt": "2024-02-08T12:00:00.000Z",
    "indexingStatus": "INDEXED",
    "lastIndexedAt": "2024-02-08T12:00:00.000Z"
  },
  "uploadedPermissions": {
    "allowedUsers": [
      {
        "email": "u1@gleantest.com",
        "name": "User 1"
      },
      {
        "email": "u2@gleantest.com",
        "name": "User 2"
      }
    ]
  }
}
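
To check quickly whether a user appears in the uploaded permissions, something like the following sketch could be run on the parsed response. The helper name is_explicitly_allowed is hypothetical. It only inspects allowedUsers as shown in the sample above; the permissions schema may grant access through other fields, which this sketch does not evaluate.

```python
def is_explicitly_allowed(debug_response: dict, email: str) -> bool:
    """Return True if the email is listed in uploadedPermissions.allowedUsers."""
    allowed = debug_response.get("uploadedPermissions", {}).get("allowedUsers", [])
    # Assumption: emails are compared case-insensitively.
    return any(user.get("email", "").lower() == email.lower() for user in allowed)


# Permissions taken from the sample response above.
sample_document = {
    "uploadedPermissions": {
        "allowedUsers": [
            {"email": "u1@gleantest.com", "name": "User 1"},
            {"email": "u2@gleantest.com", "name": "User 2"},
        ]
    }
}

print(is_explicitly_allowed(sample_document, "u1@gleantest.com"))
print(is_explicitly_allowed(sample_document, "u3@gleantest.com"))
```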

Interpreting the status field

NOT_UPLOADED and NOT_INDEXED

The document specified in the request has not been uploaded, i.e., it has never been successfully pushed to Glean. Check your upload code to make sure that the document was uploaded successfully and that the API returned no non-200 responses while uploading documents.

UPLOADED and NOT_INDEXED

The document has been uploaded but not indexed in our search index. This indicates that the document has not been processed by Glean yet.

How to proceed?

To monitor processing history for documents, you can use the /debug/{datasource}/status endpoint as described above.
Note: Document processing for small-sized datasources may be fast-tracked by Glean.

warning

In rare cases, if the processing history indicates that the document should have been processed, it is possible that there was an error in processing the document. You should contact your Glean representative for further details if you encounter this.

UPLOADED and INDEXED

  • Your document has been successfully uploaded and indexed into our system.
  • If a user is still not able to view the document, make sure that the user has the right permissions to the document.
How to proceed?
  • Make sure permissions are set accurately (refer to the documentation on configuring permissions).
  • Use /checkdocumentaccess to validate document access for a user, as described below.
  • Look at the uploadedPermissions field to validate expected permissions for the document.
  • Use /debug/{datasource}/status to check the identity processing history for the datasource, as described above.
  • Use the /debug/{datasource}/user endpoint to debug permissions for a given user, as described below.

UNKNOWN_STATUS

  • This can occur if Glean is transiently or partially unavailable. Retry the request after some time.
  • Contact your Glean representative if this is a recurring issue.
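
The status combinations above can be summarized in a small lookup. This is a sketch only: next_step and NEXT_STEPS are hypothetical names, the advice strings paraphrase this section, and the assumption that UNKNOWN_STATUS can appear in either field is not confirmed by the sample response.

```python
NEXT_STEPS = {
    ("NOT_UPLOADED", "NOT_INDEXED"): "Check your upload code for non-200 responses.",
    ("UPLOADED", "NOT_INDEXED"): "Check processing history via /debug/{datasource}/status.",
    ("UPLOADED", "INDEXED"): "Check permissions via uploadedPermissions and /checkdocumentaccess.",
}


def next_step(status: dict) -> str:
    """Suggest a next debugging step from the status object of the response."""
    upload = status.get("uploadStatus")
    indexing = status.get("indexingStatus")
    # Assumption: UNKNOWN_STATUS may appear in either field.
    if "UNKNOWN_STATUS" in (upload, indexing):
        return "Retry after some time; contact your Glean representative if it recurs."
    return NEXT_STEPS.get(
        (upload, indexing),
        "Unexpected combination; contact your Glean representative.",
    )


print(next_step({"uploadStatus": "UPLOADED", "indexingStatus": "NOT_INDEXED"}))
```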

Debug datasource user

Returns information that helps debug issues with a particular user, such as the user's permissions, groups, and memberships.

cURL:
curl -X POST https://customer-be.glean.com/api/index/v1/debug/gleantest/user \
  -H 'Authorization: Bearer <Token>' \
  -d '{
    "email": "u1@gleantest.com"
  }'
Python:
import glean_indexing_api_client
from glean_indexing_api_client.api import troubleshooting_api
from glean_indexing_api_client.model.debug_user_request import DebugUserRequest
from glean_indexing_api_client.model.debug_user_response import DebugUserResponse
from pprint import pprint
troubleshoot_api = troubleshooting_api.TroubleshootingApi(api_client)
debug_user_request = DebugUserRequest(
    datasource="gleantest",
    email="u1@gleantest.com")
try:
    api_response = troubleshoot_api.debug_datasource_user_post(debug_user_request)
    pprint(api_response)
except glean_indexing_api_client.ApiException as e:
    print("Exception when calling TroubleshootingApi->debug_datasource_user_post: %s\n" % e)
Sample response:
{
  "status": {
    "isActiveUser": true,
    "uploadStatus": "UPLOADED",
    "lastUploadedAt": "2024-02-08T12:00:00.000Z"
  },
  "uploadedGroups": [
    {
      "name": "group1"
    }
  ]
}
  • status contains information about the user's upload status and whether the user is active.
  • uploadedGroups lists the groups that the user was uploaded as a member of.
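
As an illustration, the two fields can be checked together to spot common causes of missing access. The sketch below assumes the response shape from the sample above; user_issues is a hypothetical helper, and the list of expected groups is something you supply yourself.

```python
def user_issues(debug_response: dict, expected_groups: list) -> list:
    """Return human-readable problems found in a parsed user debug response."""
    issues = []
    status = debug_response.get("status", {})
    if status.get("uploadStatus") != "UPLOADED":
        issues.append("user not uploaded")
    if not status.get("isActiveUser", False):
        issues.append("user inactive")
    uploaded = {group["name"] for group in debug_response.get("uploadedGroups", [])}
    missing = sorted(set(expected_groups) - uploaded)
    if missing:
        issues.append("missing groups: " + ", ".join(missing))
    return issues


# Values taken from the sample response above.
sample_user = {
    "status": {"isActiveUser": True, "uploadStatus": "UPLOADED"},
    "uploadedGroups": [{"name": "group1"}],
}

print(user_issues(sample_user, ["group1", "group2"]))
```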

Check document access

Checks whether a given user has access to a document in a custom datasource.

cURL:
curl -X POST https://customer-be.glean.com/api/index/v1/checkdocumentaccess \
  -H 'Authorization: Bearer <Token>' \
  -d '
{
    "datasource": "gleantest",
    "objectType": "EngineeringDoc",
    "docId": "eng-doc-1",
    "userEmail": "myuser@bluesky.test"
}'
Python:
import glean_indexing_api_client
from glean_indexing_api_client.api import troubleshooting_api
from glean_indexing_api_client.model.check_document_access_request import CheckDocumentAccessRequest
from glean_indexing_api_client.model.check_document_access_response import CheckDocumentAccessResponse
from pprint import pprint

troubleshoot_api = troubleshooting_api.TroubleshootingApi(api_client)
check_document_access_request = CheckDocumentAccessRequest(
    datasource="gleantest",
    object_type="EngineeringDoc",
    doc_id="eng-doc-1",
    user_email="myuser@bluesky.test",
) 

try:
    api_response = troubleshoot_api.checkdocumentaccess_post(check_document_access_request)
    pprint(api_response)
except glean_indexing_api_client.ApiException as e:
    print("Exception when calling TroubleshootingApi->checkdocumentaccess_post: %s\n" % e)

Sample response:

{
  "hasAccess": true
}
  • The /checkdocumentaccess endpoint returns true if the user corresponding to the specified email has access to the specified document, and false otherwise.
How to proceed?
  • Make sure permissions are set accurately (refer to the documentation on configuring permissions).
  • Use /debug/{datasource}/document to validate expected permissions for the document, as described above.
  • Use /debug/{datasource}/status to check the identity processing history for the datasource, as described above.
  • Use the /debug/{datasource}/user endpoint to debug permissions for a given user, as described above.
  • If proper permissions were specified during document indexing and the user still doesn't have access to the document, make sure the user was indexed before the document was indexed.
warning

If you still feel that the user should have access to the document but is not able to view it, please contact your Glean representative for further details.
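
If you are not using the Python client, the /checkdocumentaccess request from this section could be assembled with the standard library alone. The sketch below only builds the request and does not send it. build_indexing_request is a hypothetical helper, and the explicit Content-Type header is an assumption, since the cURL examples do not set it.

```python
import json
import urllib.request


def build_indexing_request(
    base_url: str, endpoint: str, token: str, payload: dict
) -> urllib.request.Request:
    """Build (but do not send) a POST request for an Indexing API endpoint."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{endpoint.lstrip('/')}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            # Assumption: the cURL examples do not set this header explicitly.
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_indexing_request(
    "https://customer-be.glean.com/api/index/v1",
    "checkdocumentaccess",
    "<Token>",
    {
        "datasource": "gleantest",
        "objectType": "EngineeringDoc",
        "docId": "eng-doc-1",
        "userEmail": "myuser@bluesky.test",
    },
)
print(request.get_method(), request.full_url)

# Sending is left to the caller, for example:
#   with urllib.request.urlopen(request) as response:
#       has_access = json.load(response)["hasAccess"]
```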

Get document count

attention

Make use of the /debug/{datasource}/status endpoint to get richer information about the documents in the datasource.

Fetches document count for the specified custom datasource.

cURL:
curl -X POST https://customer-be.glean.com/api/index/v1/getdocumentcount \
  -H 'Authorization: Bearer <Token>' \
  -d '
{
  "datasource": "gleantest"
}'
Python:
import glean_indexing_api_client
from glean_indexing_api_client.api import troubleshooting_api
from glean_indexing_api_client.model.get_document_count_request import GetDocumentCountRequest
from glean_indexing_api_client.model.get_document_count_response import GetDocumentCountResponse
from pprint import pprint
troubleshoot_api = troubleshooting_api.TroubleshootingApi(api_client)
get_document_count_request = GetDocumentCountRequest(
    datasource="gleantest"
)

try:
    api_response = troubleshoot_api.getdocumentcount_post(get_document_count_request)
    pprint(api_response)
except glean_indexing_api_client.ApiException as e:
    print("Exception when calling TroubleshootingApi->getdocumentcount_post: %s\n" % e)

Sample response:

{
  "documentCount": 100000
}
  • Note that /getdocumentcount returns the number of uploaded documents. This may differ from the number of documents visible to you in the Glean UI. Documents are processed asynchronously to populate permissions and content before they become visible in the Glean UI, and any processing error (for example, invalid permissions) leaves the document not visible in the Glean UI.
  • To see whether a particular document has been indexed, use the /getdocumentstatus endpoint as described below.

Get document upload and indexing status

attention

Make use of the /debug/{datasource}/document endpoint to get richer information about the document.

Fetches the current upload and indexing status of documents.

cURL:
curl -X POST https://customer-be.glean.com/api/index/v1/getdocumentstatus \
  -H 'Authorization: Bearer <Token>' \
  -d '
{
  "datasource": "gleantest",
  "objectType": "EngineeringDoc",
  "docId": "eng-doc-1"
}'
Python:
import glean_indexing_api_client
from glean_indexing_api_client.api import troubleshooting_api
from glean_indexing_api_client.model.get_document_status_request import GetDocumentStatusRequest
from pprint import pprint
troubleshoot_api = troubleshooting_api.TroubleshootingApi(api_client)
get_document_status_request = GetDocumentStatusRequest(
    datasource="gleantest",
    object_type="EngineeringDoc",
    doc_id="eng-doc-1",
)

try:
    api_response = troubleshoot_api.getdocumentstatus_post(get_document_status_request)
    pprint(api_response)
except glean_indexing_api_client.ApiException as e:
    print("Exception when calling TroubleshootingApi->getdocumentstatus_post: %s\n" % e)

Sample response:

{
  "uploadStatus": "UPLOADED",
  "lastUploadedAt": 1663882181,
  "indexingStatus": "INDEXED",
  "lastIndexedAt": 1664834539
}

The following are the possible responses:

NOT_UPLOADED and NOT_INDEXED

  • The document specified in the request has not been uploaded, i.e., it has never been successfully pushed to Glean.

UPLOADED and NOT_INDEXED

  • The document has been uploaded but not indexed in our Elasticsearch index. There are two major reasons for a document being uploaded but not indexed:
    • The document has not yet been picked up by our processing framework because not enough time has passed since the upload. Document processing usually takes 15-20 minutes, so if the lastUploadedAt timestamp is less than 20 minutes before the current time, you can assume that the document will be processed and become visible in the Glean UI.
    • There has been an error in processing the document. This is the likely cause when more than 20 minutes have passed since the lastUploadedAt timestamp. It is usually due to invalid permissions and/or invalid document content. Contact your Glean representative for further details if you encounter this.
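
The timing rule above can be expressed as a small check. This is a sketch under two assumptions: lastUploadedAt is a Unix timestamp in seconds (as the sample response suggests), and 20 minutes is used as the cutoff. likely_still_processing is a hypothetical helper name.

```python
import time

PROCESSING_WINDOW_SECONDS = 20 * 60  # upper end of the usual 15-20 minute window


def likely_still_processing(status: dict, now: float = None) -> bool:
    """Return True if an UPLOADED/NOT_INDEXED document is still within the window."""
    now = time.time() if now is None else now
    if status.get("uploadStatus") != "UPLOADED":
        return False
    if status.get("indexingStatus") != "NOT_INDEXED":
        return False
    # Assumption: lastUploadedAt is expressed in epoch seconds.
    return now - status["lastUploadedAt"] < PROCESSING_WINDOW_SECONDS


pending = {
    "uploadStatus": "UPLOADED",
    "lastUploadedAt": 1663882181,
    "indexingStatus": "NOT_INDEXED",
}

# Ten minutes after upload: still inside the window.
print(likely_still_processing(pending, now=1663882181 + 600))
# One hour after upload: outside the window, so a processing error is more likely.
print(likely_still_processing(pending, now=1663882181 + 3600))
```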

UPLOADED and INDEXED

  • Your document has been successfully uploaded and indexed into our system.
  • If you are still not able to view the document, make sure that you have the right permissions to the document. You can use the /checkdocumentaccess endpoint to debug permissions (as described above).

UNKNOWN_STATUS

  • This can occur if either the SQL or Elasticsearch instances are unavailable. Retry the request after some time.
  • Contact your Glean representative if this is a recurring issue.

Get user count

attention

Make use of the /debug/{datasource}/status endpoint to get richer information about the users in the datasource.

Fetches user count for the specified custom datasource.

cURL:
curl -X POST https://customer-be.glean.com/api/index/v1/getusercount \
  -H 'Authorization: Bearer <Token>' \
  -d '
{
  "datasource": "gleantest"
}'
Python:
import glean_indexing_api_client
from glean_indexing_api_client.api import troubleshooting_api
from glean_indexing_api_client.model.get_user_count_request import GetUserCountRequest
from pprint import pprint

troubleshoot_api = troubleshooting_api.TroubleshootingApi(api_client)
get_user_count_request = GetUserCountRequest(
    datasource="gleantest",
)

try:
    api_response = troubleshoot_api.getusercount_post(get_user_count_request)
    pprint(api_response)
except glean_indexing_api_client.ApiException as e:
    print("Exception when calling TroubleshootingApi->getusercount_post: %s\n" % e)

Sample response:

{
  "userCount": 2500
}
  • /getusercount returns the number of uploaded users for the specified datasource. Note that this can differ from the actual number of users who have access to the datasource, since some users might not be indexed due to invalid data.