Bulk Indexing
The bulk index documents endpoint is used to index all the documents of a custom datasource using a series of /bulkindexdocuments requests with a common uploadId. Bulk indexing fully replaces the entire list of documents stored in Glean. After a successful bulk upload, all documents that were not part of the most recent upload are deleted asynchronously.
There are similar bulk indexing endpoints for other objects like users, groups, employees, and teams as well.
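The paging scheme described above, a series of /bulkindexdocuments requests sharing one uploadId, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Glean's client code: the helper name build_bulk_pages, the page size, and the body fields datasource, isFirstPage, isLastPage, and documents are illustrative assumptions. Only uploadId is named on this page, so check the API reference for the actual schema.

```python
import uuid
from typing import Any, Dict, List


def build_bulk_pages(
    datasource: str,
    documents: List[Dict[str, Any]],
    page_size: int = 100,
) -> List[Dict[str, Any]]:
    """Split a full document set into request bodies that share one uploadId.

    Field names other than uploadId are assumptions made for illustration.
    """
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    if not documents:
        # A bulk upload replaces the whole corpus, so refuse an empty upload
        # rather than risk wiping the datasource by accident.
        raise ValueError("refusing to build an empty bulk upload")

    upload_id = str(uuid.uuid4())  # one ID shared by every request of this upload
    chunks = [
        documents[start:start + page_size]
        for start in range(0, len(documents), page_size)
    ]
    pages = []
    for index, chunk in enumerate(chunks):
        pages.append({
            "uploadId": upload_id,
            "datasource": datasource,                 # assumed field name
            "isFirstPage": index == 0,                # assumed field name
            "isLastPage": index == len(chunks) - 1,   # assumed field name
            "documents": chunk,                       # assumed field name
        })
    return pages


# Example: five placeholder documents, two per request -> three requests.
docs = [{"id": f"doc-{n}"} for n in range(5)]
for page in build_bulk_pages("gleantest", docs, page_size=2):
    print(page["uploadId"], len(page["documents"]), page["isLastPage"])
```

Generating the uploadId once and reusing it is what ties the requests together as a single upload. Rejecting an empty document list is a design choice of this sketch, made because a completed bulk upload deletes everything that was not part of it.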
Choosing /indexdocuments vs /bulkindexdocuments
When deciding between /indexdocuments and /bulkindexdocuments, it’s important to understand their primary functions and use cases:
- /bulkindexdocuments: This endpoint is designed for completely refreshing the datasource. It deletes all existing documents and replaces them with the new ones provided. Use this endpoint when you need to replace the existing corpus and upload all documents anew.
- /indexdocuments: This endpoint is intended for incremental updates. It allows you to add a batch of new documents or update existing ones without affecting the other documents in the index. Choose this option when you want to keep the existing documents intact while adding or updating specific documents.
When to use each endpoint:
- Use /bulkindexdocuments:
  - When you need to perform a full refresh of the datasource.
  - When all existing documents need to be replaced with a new set of documents.
- Use /indexdocuments:
  - When you need to add new documents to the existing index.
  - When you need to update specific documents while keeping the rest of the index unchanged.
By selecting the appropriate endpoint based on your needs, you can efficiently manage your document indexing process.
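The difference between the two endpoints can be illustrated with a small in-memory model. This is only a sketch of the semantics described above: the index is modeled as a plain dictionary, the function names are hypothetical, and no Glean API is called.

```python
from typing import Dict, Iterable, Tuple

Index = Dict[str, str]  # document ID -> title


def incremental_index(index: Index, batch: Iterable[Tuple[str, str]]) -> Index:
    """Model of /indexdocuments: add or update the batch, keep everything else."""
    updated = dict(index)   # copy so the original index is left untouched
    updated.update(batch)   # new IDs are added, existing IDs are overwritten
    return updated


def bulk_index(index: Index, full_upload: Iterable[Tuple[str, str]]) -> Index:
    """Model of /bulkindexdocuments: the upload becomes the entire corpus.

    Documents in `index` that are missing from the upload are dropped. In
    Glean this deletion happens asynchronously; the model does it immediately.
    """
    return dict(full_upload)


existing = {"doc-1": "Onboarding", "doc-2": "Runbook"}
batch = [("doc-2", "Runbook v2"), ("doc-3", "Design notes")]

print(incremental_index(existing, batch))
# {'doc-1': 'Onboarding', 'doc-2': 'Runbook v2', 'doc-3': 'Design notes'}
print(bulk_index(existing, batch))
# {'doc-2': 'Runbook v2', 'doc-3': 'Design notes'}
```

With the same batch, the incremental model keeps doc-1 while the bulk model drops it, which is why a bulk upload must always contain the complete document set.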
Making your first successful request to /bulkindexdocuments
Here is a sample request to the /bulkindexdocuments endpoint.
Let’s look at the different fields you need to successfully index documents to Glean. Note that this is just a sample request with the minimal fields required to index content. For an exhaustive list of fields, see the API reference for this endpoint.
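As a rough illustration of what such a request could look like from Python, the sketch below builds (but does not send) a single-page bulk upload using only the standard library. Everything except the endpoint name and uploadId is an assumption made for illustration: the base URL and token are placeholders, bearer-token authentication is assumed, and the document fields (id, title, body, permissions) and paging flags are hypothetical names that should be checked against the API reference.

```python
import json
import urllib.request
import uuid
from typing import Any, Dict

# Hypothetical placeholders; substitute the values for your own deployment.
BASE_URL = "https://glean.example.invalid/api/index/v1"
API_TOKEN = "YOUR_INDEXING_API_TOKEN"


def build_document(doc_id: str, title: str, text: str) -> Dict[str, Any]:
    """Build one document with a minimal set of (assumed) fields."""
    return {
        "id": doc_id,
        "title": title,
        "body": {"mimeType": "text/plain", "textContent": text},
        "permissions": {"allowAnonymousAccess": True},
    }


def build_request(body: Dict[str, Any]) -> urllib.request.Request:
    """Wrap a request body in a POST to the /bulkindexdocuments endpoint."""
    return urllib.request.Request(
        url=f"{BASE_URL}/bulkindexdocuments",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_TOKEN}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


body = {
    "uploadId": str(uuid.uuid4()),  # shared by all requests of one upload
    "datasource": "gleantest",      # assumed field; hypothetical datasource name
    "isFirstPage": True,            # assumed paging flags: this single
    "isLastPage": True,             # request is the whole upload
    "documents": [build_document("doc-1", "Onboarding", "Welcome to the team.")],
}
request = build_request(body)
print(request.get_method(), request.full_url)
```

To actually send the request, pass it to urllib.request.urlopen(request) once the placeholders have been replaced with real values. Building the request separately from sending it keeps the payload easy to inspect before anything reaches the index.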
Next steps
- You can check the status of your documents using the debugging/troubleshooting APIs. Refer to their documentation for details on how to use them.
- For an indexed document to show up in the Glean UI, the datasource must be enabled for search. For now, Glean needs to enable it internally, but in the future this will be available through the Glean Admin Console. Once these steps are done, you should be able to search for the indexed document in Glean when logged in as a user who has permission to view it.
- Note that it takes around 15-20 minutes for documents to be indexed and appear in the Glean UI.