Debugging / Troubleshooting using the Indexing API
These endpoints return data about the state of your documents and their permissions, metadata about your datasource, and so on, which helps you debug Indexing API issues.
Get datasource config
cURL
curl -X POST https://customer-be.glean.com/api/index/v1/getdatasourceconfig \
  -H 'Authorization: Bearer <Token>' \
  -d '
{
  "datasource": "gleantest"
}'
Python
import glean_indexing_api_client
from glean_indexing_api_client.api import datasources_api
from glean_indexing_api_client.model.get_datasource_config_request import GetDatasourceConfigRequest
from glean_indexing_api_client.model.custom_datasource_config import CustomDatasourceConfig
from pprint import pprint

# api_client is assumed to be an API client that was configured earlier.
datasource_api = datasources_api.DatasourcesApi(api_client)
get_datasource_config_request = GetDatasourceConfigRequest(
    datasource="gleantest"
)
try:
    # Fetch the currently stored config for the datasource.
    api_response = datasource_api.getdatasourceconfig_post(get_datasource_config_request)
    pprint(api_response)
except glean_indexing_api_client.ApiException as e:
    print("Exception when calling DatasourcesApi->getdatasourceconfig_post: %s\n" % e)
Sample response:
{
  "name": "CUSTOM_GLEANTEST",
  "displayName": "Glean Test Datasource",
  "homeUrl": "https://www.gleantest.com",
  "objectDefinitions": [
    {
      "name": "EngineeringDoc",
      "propertyDefinitions": [
        {
          "name": "Org",
          "displayLabel": "Organization",
          "propertyType": "TEXT",
          "uiOptions": "SEARCH_RESULT",
          "hideUiFacet": false
        }
      ]
    }
  ],
  "urlRegex": "https://www.gleantest.com/.*",
  "datasourceCategory": "PUBLISHED_CONTENT",
  "isOnPrem": false,
  "isUserReferencedByEmail": true,
  "isEntityDatasource": false,
  "isTestDatasource": false
}
- The /getdatasourceconfig endpoint returns the current state of the datasource config.
- Use it to double-check that the config is correct. If you need to change the config, call the /adddatasource endpoint again to override the previous config.
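One way to double-check the config is to compare the response against the values you intended to set. The following is a minimal sketch, not part of the Glean client: diff_config is a hypothetical helper, the actual dictionary reuses fields from the sample response above, and the expected values are made up for illustration.

```python
from typing import Any, Dict, Tuple


def diff_config(expected: Dict[str, Any], actual: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {field: (expected_value, actual_value)} for every mismatch.

    Only the fields listed in `expected` are checked; a field that is
    absent from `actual` is reported with an actual value of None.
    """
    return {
        key: (want, actual.get(key))
        for key, want in expected.items()
        if actual.get(key) != want
    }


# Subset of the sample /getdatasourceconfig response shown above.
actual = {
    "name": "CUSTOM_GLEANTEST",
    "displayName": "Glean Test Datasource",
    "homeUrl": "https://www.gleantest.com",
    "urlRegex": "https://www.gleantest.com/.*",
    "datasourceCategory": "PUBLISHED_CONTENT",
    "isUserReferencedByEmail": True,
}

# Hypothetical values you meant to configure (illustration only).
expected = {
    "displayName": "Glean Test Datasource",
    "urlRegex": "https://docs.gleantest.com/.*",
    "isUserReferencedByEmail": True,
}

mismatches = diff_config(expected, actual)
for field, (want, got) in mismatches.items():
    print(f"{field}: expected {want!r}, got {got!r}")
```

Any field that is printed is a candidate for a corrected /adddatasource call.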
Get document count
Fetches document count for the specified custom datasource.
cURL
curl -X POST https://customer-be.glean.com/api/index/v1/getdocumentcount \
  -H 'Authorization: Bearer <Token>' \
  -d '
{
  "datasource": "gleantest"
}'
Python
import glean_indexing_api_client
from glean_indexing_api_client.api import troubleshooting_api
from glean_indexing_api_client.model.get_document_count_request import GetDocumentCountRequest
from glean_indexing_api_client.model.get_document_count_response import GetDocumentCountResponse
from pprint import pprint

# api_client is assumed to be an API client that was configured earlier.
troubleshoot_api = troubleshooting_api.TroubleshootingApi(api_client)
get_document_count_request = GetDocumentCountRequest(
    datasource="gleantest"
)
try:
    # Fetch the number of uploaded documents for the datasource.
    api_response = troubleshoot_api.getdocumentcount_post(get_document_count_request)
    pprint(api_response)
except glean_indexing_api_client.ApiException as e:
    print("Exception when calling TroubleshootingApi->getdocumentcount_post: %s\n" % e)
Sample response:
{
  "documentCount": 100000
}
- Note that /getdocumentcount returns the number of uploaded documents. This may not equal the number of documents visible to you in the Glean UI. Documents are processed asynchronously to populate permissions and content before they become visible in the Glean UI. Any processing error, such as invalid permissions, keeps the document from appearing in the Glean UI.
- To check whether a particular document has been indexed, use the /getdocumentstatus endpoint as described below.
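If the uploaded count and what you see in the Glean UI disagree, one approach is to sample a few document IDs, query /getdocumentstatus for each, and tally the results. The sketch below only covers the tallying step and makes no API calls; tally_statuses is a hypothetical helper and the sampled responses are made-up data in the shape of the /getdocumentstatus response.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def tally_statuses(responses: Iterable[Dict[str, Any]]) -> Counter:
    """Count responses by their (uploadStatus, indexingStatus) pair."""
    return Counter(
        (r.get("uploadStatus"), r.get("indexingStatus")) for r in responses
    )


# Hypothetical /getdocumentstatus responses for three sampled documents.
sampled = [
    {"uploadStatus": "UPLOADED", "indexingStatus": "INDEXED"},
    {"uploadStatus": "UPLOADED", "indexingStatus": "NOT_INDEXED"},
    {"uploadStatus": "UPLOADED", "indexingStatus": "INDEXED"},
]

tally = tally_statuses(sampled)
for (upload, indexing), count in tally.most_common():
    print(f"{upload} / {indexing}: {count}")
```

A large share of UPLOADED / NOT_INDEXED entries in the sample would suggest that the gap comes from documents that are still being processed or that failed processing.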
Get document upload and indexing status
Fetches the current upload and indexing status of documents.
cURL
curl -X POST https://customer-be.glean.com/api/index/v1/getdocumentstatus \
  -H 'Authorization: Bearer <Token>' \
  -d '
{
  "datasource": "gleantest",
  "objectType": "EngineeringDoc",
  "docId": "eng-doc-1"
}'
Python
import glean_indexing_api_client
from glean_indexing_api_client.api import troubleshooting_api
from glean_indexing_api_client.model.get_document_status_request import GetDocumentStatusRequest
from glean_indexing_api_client.model.get_document_status_response import GetDocumentStatusResponse
from pprint import pprint

# api_client is assumed to be an API client that was configured earlier.
troubleshoot_api = troubleshooting_api.TroubleshootingApi(api_client)
get_document_status_request = GetDocumentStatusRequest(
    datasource="gleantest",
    object_type="EngineeringDoc",
    doc_id="eng-doc-1",
)
try:
    # Fetch the upload and indexing status of a single document.
    api_response = troubleshoot_api.getdocumentstatus_post(get_document_status_request)
    pprint(api_response)
except glean_indexing_api_client.ApiException as e:
    print("Exception when calling TroubleshootingApi->getdocumentstatus_post: %s\n" % e)
Sample response:
{
  "uploadStatus": "UPLOADED",
  "lastUploadedAt": 1663882181,
  "indexingStatus": "INDEXED",
  "lastIndexedAt": 1664834539
}
The possible responses are:
NOT_UPLOADED and NOT_INDEXED
- The document specified in the request has not been uploaded, i.e., it has never been successfully pushed to Glean.
UPLOADED and NOT_INDEXED
- The document has been uploaded but not indexed in our Elasticsearch index. The two main reasons for this are:
  - The processing framework has not picked up the document yet because not enough time has passed since the upload. Document processing usually takes 15-20 minutes, so if the lastUploadedAt timestamp is less than 20 minutes old, you can assume that the document will be processed and become visible in the Glean UI.
  - Processing the document failed. This is the likely cause when more than 20 minutes have passed since the lastUploadedAt timestamp. It is usually due to invalid permissions and/or invalid document content. Contact your Glean representative for further details if you encounter this.
UPLOADED and INDEXED
- Your document has been successfully uploaded and indexed.
- If you still cannot view the document, make sure that you have the right permissions for it. You can use the /checkdocumentaccess endpoint to debug permissions (as described below).
UNKNOWN_STATUS
- This can occur if either the SQL or Elasticsearch instances are unavailable. Retry the request after some time.
- Contact your Glean representative if this is a recurring issue.
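The cases above can be sketched as a small decision function. This is an illustration under stated assumptions, not part of the Glean client: diagnose is a hypothetical helper, lastUploadedAt is assumed to be a Unix timestamp in seconds, UNKNOWN_STATUS is assumed to be able to appear in either status field, and the 20-minute window is the upper end of the processing time quoted above.

```python
import time
from typing import Any, Dict, Optional

# Upper end of the 15-20 minute processing estimate, in seconds.
PROCESSING_WINDOW_SECONDS = 20 * 60


def diagnose(status: Dict[str, Any], now: Optional[float] = None) -> str:
    """Map a /getdocumentstatus response to a suggested next step."""
    now = time.time() if now is None else now
    upload = status.get("uploadStatus")
    indexing = status.get("indexingStatus")

    # Assumption: UNKNOWN_STATUS may be reported in either field.
    if "UNKNOWN_STATUS" in (upload, indexing):
        return "unknown: retry later; contact Glean if it keeps happening"
    if upload == "NOT_UPLOADED":
        return "not uploaded: the document was never successfully pushed"
    if upload == "UPLOADED" and indexing == "INDEXED":
        return "indexed: if it is still not visible, check permissions"
    if upload == "UPLOADED" and indexing == "NOT_INDEXED":
        # Assumption: lastUploadedAt is a Unix timestamp in seconds.
        age_seconds = now - float(status.get("lastUploadedAt", 0))
        if age_seconds < PROCESSING_WINDOW_SECONDS:
            return "pending: still within the processing window"
        return "stuck: likely a processing error; contact Glean"
    return "unrecognized status combination"


# Made-up response for a document that was uploaded but is not indexed yet.
sample = {
    "uploadStatus": "UPLOADED",
    "lastUploadedAt": 1663882181,
    "indexingStatus": "NOT_INDEXED",
}
print(diagnose(sample, now=1663882181 + 5 * 60))   # 5 minutes after upload
print(diagnose(sample, now=1663882181 + 60 * 60))  # 1 hour after upload
```

Passing now explicitly keeps the function deterministic in tests; in normal use it falls back to the current time.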
Check document access
Checks whether a given user has access to a document in a custom datasource.
cURL
curl -X POST https://customer-be.glean.com/api/index/v1/checkdocumentaccess \
  -H 'Authorization: Bearer <Token>' \
  -d '
{
  "datasource": "gleantest",
  "objectType": "EngineeringDoc",
  "docId": "eng-doc-1",
  "userEmail": "myuser@bluesky.test"
}'
Python
import glean_indexing_api_client
from glean_indexing_api_client.api import troubleshooting_api
from glean_indexing_api_client.model.check_document_access_request import CheckDocumentAccessRequest
from glean_indexing_api_client.model.check_document_access_response import CheckDocumentAccessResponse
from pprint import pprint

# api_client is assumed to be an API client that was configured earlier.
troubleshoot_api = troubleshooting_api.TroubleshootingApi(api_client)
check_document_access_request = CheckDocumentAccessRequest(
    datasource="gleantest",
    object_type="EngineeringDoc",
    doc_id="eng-doc-1",
    user_email="myuser@bluesky.test",
)
try:
    # Ask whether the given user can access the given document.
    api_response = troubleshoot_api.checkdocumentaccess_post(check_document_access_request)
    pprint(api_response)
except glean_indexing_api_client.ApiException as e:
    print("Exception when calling TroubleshootingApi->checkdocumentaccess_post: %s\n" % e)
Sample response:
{
  "hasAccess": true
}
- The /checkdocumentaccess endpoint returns true if the user corresponding to the specified email has access to the specified document, and false otherwise.
- If you expect a user to have access to the document but the API reports otherwise, make sure the document permissions are set properly (refer to the documentation on the permissions API).
- If proper permissions were specified during document indexing and the user still doesn't have access to the document, make sure the user was indexed before the document was indexed.
- If the above steps do not resolve the issue, contact your Glean representative for help.
Get user count
Fetches user count for the specified custom datasource.
cURL
curl -X POST https://customer-be.glean.com/api/index/v1/getusercount \
  -H 'Authorization: Bearer <Token>' \
  -d '
{
  "datasource": "gleantest"
}'
Python
import glean_indexing_api_client
from glean_indexing_api_client.api import troubleshooting_api
from glean_indexing_api_client.model.get_user_count_request import GetUserCountRequest
from glean_indexing_api_client.model.get_user_count_response import GetUserCountResponse
from pprint import pprint

# api_client is assumed to be an API client that was configured earlier.
troubleshoot_api = troubleshooting_api.TroubleshootingApi(api_client)
get_user_count_request = GetUserCountRequest(
    datasource="gleantest",
)
try:
    # Fetch the number of uploaded users for the datasource.
    api_response = troubleshoot_api.getusercount_post(get_user_count_request)
    pprint(api_response)
except glean_indexing_api_client.ApiException as e:
    print("Exception when calling TroubleshootingApi->getusercount_post: %s\n" % e)
Sample response:
{
  "userCount": 2500
}
- /getusercount returns the number of uploaded users for the specified datasource. Note that this can differ from the actual number of users who have access to the datasource, since some users might not be indexed due to invalid data.