Setting a test datasource

Before going live with a production datasource you might want to set up a test/staging datasource. You can use a test datasource to experiment with the indexing API, or to stage your changes. This test datasource lives in the production environment but doesn’t affect it. You can do this by ensuring that:

  1. All ranking signals are turned off from a test datasource. This ensures that the production is not affected by a test datasource.
  2. The test datasource is only visible to select users so that a developer can check results and permissions easily.

Setting up a test datasource is quite easy, you have to go through almost the same steps as you would go through setting up a normal datasource, but there are some differences. The steps are:

1. Setting isTestDatasource to true

When setting up the datasource using /adddatasource(link), remember to set the isTestDatasource field to true.

curlpython
Copy
Copied
curl -X POST https://customer-be.glean.com/api/index/v1/adddatasource \
...
	"isTestDatasource" : "true”
...
}
Copy
Copied
from glean_indexing_api_client.api import datasources_api
from glean_indexing_api_client.model.custom_datasource_config import CustomDatasourceConfig
datasource_api = datasources_api.DatasourcesApi(api_client)

datasource_config = CustomDatasourceConfig(
  ...,  # other fields
  is_test_datasource=True
)
try:
  datasource_api.adddatasource_post(datasource_config)
except indexing_api.ApiException as e:
  print('Exception when calling DatasourcesApi->adddatasource_post: %s\\n' % e)

Glean turns off ranking signals from a test datasource. For example, Glean uses document references(link to a document) as one of its ranking signals. A document that is referenced more in other documents will be given more importance then a less referenced one. Glean stops all such signals for a test datasource so that it does not pollute production rankings.

2. Using /betausers endpoint to give select visibilty to users.

At this stage your test datasource is not affecting the production and it is neither visible to anyone. Now you would like to give selective visibility to it. This can be done using the /betausers(link) endpoint. This endpoint allows you to control the people who can have visibility to the test datasource. An example command would look like :

curlpython
Copy
Copied
curl -X POST https://customer-be.glean.com/api/index/v1/betausers \
-H 'Authorization: Bearer <Token>' \
-d '
{
    "datasource": “gleantest”,
    "emails": [
        "user1@example.com",
        "user2@example.com"
    ]
}
Copy
Copied
from glean_indexing_api_client.api import permissions_api
from glean_indexing_api_client.model.greenlist_users_request import GreenlistUsersRequest
permission_api = permissions_api.PermissionsApi(api_client)
greenlist_users_request = GreenlistUsersRequest(
    datasource=“gleantest”,
    emails=[
        "user1@example.com",
        "user2@example.com"
    ]
) 

try:
    permission_api.betausers_post(greenlist_users_request)
except glean_indexing_api_client.ApiException as e:
    print("Exception when calling PermissionsApi->betausers_post: %s\n" % e)

This command would give visibility of the test datasource to user1 and user2. This would allow you to test your custom datasource among a small number of users.

Note: For a document to be visible to beta users, they must also have appropriate permissions to view it. For more details about setting permissions, follow the Setting Permissions tutorial.