Search API: Filters walkthrough

Filters narrow down your search results. We support general filters that apply to every document, as well as filters that are specific to individual datasources.

This guide will focus on our general filters and how to utilize them.

General Filters: How to use

To filter search results to documents that match certain fields via the /search REST API, construct a list of facetFilter objects and pass it in the facetFilters field of the requestOptions object.

Each facetFilter object has the following relevant fields:

  1. fieldName - the name of the field we are filtering by (eg “from” to facet by user, “type” for document type, etc). fieldName should be unique in the list of facetFilter objects.
  2. values - a list of facetFilterValue objects. All values are OR’d between the same field name (we AND between different field names).
  3. groupName - not relevant, don’t use.

A facetFilterValue object has the following relevant fields:

  1. value - string value that results are being filtered to.
  2. relationType - can take on the values below:
    1. “LT” - Less than.
    2. “GT” - Greater than.
    3. “EQUALS” - default value.
      “LT” and “GT” can only be used for time filters (see examples below). Every other filter should use “EQUALS”.
  3. isNegated - not supported, don’t use.
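The two object shapes above can be assembled programmatically. The following is a minimal Python sketch, not an official client: build_facet_filters is a hypothetical helper name. It groups values by fieldName so that each field name appears only once in the list, as required above.

```python
def build_facet_filters(conditions):
    """Build a facetFilters list from (fieldName, value) or
    (fieldName, value, relationType) tuples.

    Values that share a fieldName are grouped into one facetFilter object
    (they are OR'd); different fieldNames become separate objects (AND'd).
    """
    grouped = {}
    for condition in conditions:
        field_name, value = condition[0], condition[1]
        # relationType defaults to EQUALS, as documented above.
        relation_type = condition[2] if len(condition) > 2 else "EQUALS"
        grouped.setdefault(field_name, []).append(
            {"relationType": relation_type, "value": value}
        )
    return [
        {"fieldName": name, "values": values}
        for name, values in grouped.items()
    ]


facet_filters = build_facet_filters([
    ("type", "pdf"),
    ("type", "document"),
    ("from", "userone@glean.com"),
])
```

Here facet_filters contains two objects: one for type with two OR'd values, and one for from.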

Basic Example

To return only PDF documents, send the following in the facetFilters field. This is equivalent to adding “type:pdf” to your search query.

[
  {
    "fieldName": "type",
    "values": [
      {
        "relationType": "EQUALS",
        "value": "pdf"
      }
    ]
  }
]

Universal Field Names

Topbar facet field names

last_updated_at:
from:
my:history
collection:
has:golink
type:

Entity field names

businessunit:
city:
country:
industry:
location:
region:
roletype:
startafter:
startbefore:
state:
title:
reportsto:

Exceptions to the basic example

Time filters

Time filters are the only filters that use a relationType other than “EQUALS”. The fieldName is always “last_updated_at”, and different relationTypes specify different time ranges.

We support 2 types of values: specific dates and special values.

Specific dates

Use the “GT” and “LT” relationTypes to specify a date range. A range can also be open-ended (only a GT or only an LT). Each date value must be a string in the form YYYY-MM-DD. Note that GT and LT are exclusive (e.g. {relationType=“GT”, value=“2023-06-17”} matches dates from 2023-06-18 onward).

Each date is treated as a full day: it begins at the start of the day (12:00 am) and ends at the end of the day (11:59:59 pm).

Closed date range example for filtering to documents from dates 6/16, 6/17, 6/18, 6/19:

[
   {
      "fieldName":"last_updated_at",
      "values":[
         {
            "relationType":"GT",
            "value":"2023-06-15"
         },
         {
            "relationType":"LT",
            "value":"2023-06-20"
         }
      ]
   }
]
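Because GT and LT exclude the boundary date, an inclusive range has to be widened by one day on each side. The sketch below shows one way to do that in Python; inclusive_date_range_filter is a hypothetical helper name, not part of the API.

```python
from datetime import date, timedelta


def inclusive_date_range_filter(first_day=None, last_day=None):
    """Return a last_updated_at facetFilter covering first_day..last_day
    inclusive. Either bound may be None for an open-ended range.
    """
    if first_day is None and last_day is None:
        raise ValueError("at least one bound is required")
    one_day = timedelta(days=1)
    values = []
    if first_day is not None:
        # GT is exclusive, so step back one day to include first_day.
        values.append({"relationType": "GT",
                       "value": (first_day - one_day).isoformat()})
    if last_day is not None:
        # LT is exclusive, so step forward one day to include last_day.
        values.append({"relationType": "LT",
                       "value": (last_day + one_day).isoformat()})
    return {"fieldName": "last_updated_at", "values": values}


# Documents from 6/16 through 6/19, matching the closed range example above.
closed = inclusive_date_range_filter(date(2023, 6, 16), date(2023, 6, 19))
```

Passing only first_day or only last_day produces an open-ended range with a single GT or LT value.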

Open date range example for filtering to documents from dates 6/11 onwards:

[
   {
      "fieldName":"last_updated_at",
      "values":[
         {
            "relationType":"GT",
            "value":"2023-06-10"
         }
      ]
   }
]

Special Values

For the relation type EQUALS, we allow the special values past_day, past_week, past_month, yesterday, today, past_n_days, past_n_weeks, past_n_months, and past_n_years, where n is a number (e.g. 5 in past_5_days). Every past_ prefixed value can also be written with the last_ prefix, which means the same thing (e.g. last_week is a valid substitute for past_week).

For the relation type LT, we allow the values past_day, past_week, past_month, yesterday, and today.

For the relation type GT, we allow the value yesterday.

If you pass in an invalid special value, you will get a 422 error letting you know you have an invalid operator. Invalid special values for time filters are the only case in which a 422 error is returned.
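A client can check a special value before sending the request and avoid the 422 round trip. The sketch below encodes the lists above in Python. It is illustrative only: is_valid_special_value is a hypothetical name, the n in past_n_days is assumed to be one or more digits, and the last_ synonym is assumed to apply to LT as well as EQUALS.

```python
import re

# Special values accepted with EQUALS (after last_ is normalized to past_).
_EQUALS_PATTERN = re.compile(
    r"yesterday|today"
    r"|past_(?:day|week|month)"
    r"|past_\d+_(?:days|weeks|months|years)"
)
_LT_VALUES = {"past_day", "past_week", "past_month", "yesterday", "today"}
_GT_VALUES = {"yesterday"}


def is_valid_special_value(value, relation_type="EQUALS"):
    """Return True if value is a documented special value for relation_type."""
    # last_ is documented as a synonym for past_; normalize before checking.
    if value.startswith("last_"):
        value = "past_" + value[len("last_"):]
    if relation_type == "EQUALS":
        return _EQUALS_PATTERN.fullmatch(value) is not None
    if relation_type == "LT":
        return value in _LT_VALUES
    if relation_type == "GT":
        return value in _GT_VALUES
    return False
```

This only covers special values. Specific YYYY-MM-DD dates used with GT and LT would need a separate check.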

If you are used to operators and values in the query string, the samples below show how a query string value translates to its REST API equivalent.

Sample:

updated:today becomes

[{"fieldName": "last_updated_at", "values": [{"relationType": "EQUALS", "value": "today"}]}]

before:past_week becomes

[{"fieldName": "last_updated_at", "values": [{"relationType": "LT", "value": "past_week"}]}]

after:yesterday becomes

[{"fieldName": "last_updated_at", "values": [{"relationType": "GT", "value": "yesterday"}]}]
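The three translations above follow one pattern: the operator selects the relationType and the text after the colon becomes the value. A minimal Python sketch of that mapping follows. The function name is hypothetical, and only the three operators shown in the samples are handled.

```python
# Operator-to-relationType mapping taken from the three samples above.
_OPERATOR_TO_RELATION = {"updated": "EQUALS", "before": "LT", "after": "GT"}


def time_operator_to_facet_filters(term):
    """Translate a term such as 'before:past_week' into a facetFilters list."""
    operator, separator, value = term.partition(":")
    if not separator or not value or operator not in _OPERATOR_TO_RELATION:
        raise ValueError(f"unsupported time operator term: {term!r}")
    return [{
        "fieldName": "last_updated_at",
        "values": [{
            "relationType": _OPERATOR_TO_RELATION[operator],
            "value": value,
        }],
    }]


filters = time_operator_to_facet_filters("before:past_week")
```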

Timezone considerations

We factor in the user’s timezone for all queries, except when the values past_week, past_year, past_month, or past_day are used with the relationType EQUALS. These are handled by a built-in Elastic date aggregation that doesn’t account for timezone.

History filter

The my: facet only ever takes the value “history”. It filters results to documents the user has viewed before. The object always looks like this:

{"fieldName": "suggested", "groupName": "", "values": [{"relationType": "EQUALS", "value": "my history"}]}

From filter (or any user filter)

To target a specific user, for example when two people share the name “User One”, use the email address that the person authenticated to Glean with as the value. For example, user@glean.com and userone@glean.com would filter by different “User One”s even though they have the same name.

Sample query:

from:"User one" updated:today type:document

requestOptions.facetFilters:
[
  {
    "fieldName": "from",
    "values": [
      {
        "relationType": "EQUALS",
        "value": "userone@glean.com"
      }
    ]
  },
  {
    "fieldName": "last_updated_at",
    "values": [
      {
        "relationType": "EQUALS",
        "value": "today"
      }
    ]
  },
  {
    "fieldName": "type",
    "values": [
      {
        "relationType": "EQUALS",
        "value": "document"
      }
    ]
  }
]

Datasource-specific filters

Apart from the general filters discussed above, some filters are specific to one datasource. For example, Confluence has the author facet, and Slack has the channel facet. There are also custom facets defined for custom datasources pushed via the API.

To discover these datasource-specific facets, use the Glean UI to filter by the datasource you’re curious about. The sidebar of the Glean UI lists the facets and facet values for your search results (see image below).

[Image: Datasource specific facets]

Getting possible facets via Search API

To retrieve the facets programmatically (for example with curl), send a /search request like the one below. It returns the values that we use to populate the sidebar:

{
  "query": "test",
  "pageSize": 10,
  "requestOptions": {
    "facetBucketSize": 3000,
    "facetFilters": [
      {
        "fieldName": "app",
        "values": [
          {
            "value": "confluence",
            "relationType": "EQUALS"
          }
        ]
      }
    ]
  }
}

This returns a top-level field, facetResults. facetResults has the following relevant fields:

  1. sourceName - same as the facet fieldName in the facetFilters object
  2. operatorName - not relevant
  3. buckets - a list of facet bucket objects corresponding to a facet value
    1. The facet bucket object has the following relevant fields:
      1. count - the number of search results that would be returned if filtering by the facet value
      2. value - the facet value (e.g. “engineering” for the space facet)
        1. stringValue - the string value
        2. intValue - the integer value (not common)
        3. displayLabel - alternative value used for display in the UI
        4. iconConfig - optional image used to represent the facet value, such as a profile picture for people facet values.

Sample facetResults entry:

{
  "sourceName": "space",
  "operatorName": "SelectMultiple",
  "buckets": [
    {
      "count": 3,
      "value": {
        "stringValue": "engineering"
      }
    }
  ]
}

To get all of the facets you can use with a particular datasource, look at all of the sourceNames in the facetResults returned.
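Collecting the sourceNames and their bucket values from a response can be sketched as follows in Python. This assumes facetResults is a list of objects shaped like the sample entry; facets_by_source is a hypothetical helper name, not part of the API.

```python
def facets_by_source(facet_results):
    """Map each sourceName to a list of (facet value, count) pairs."""
    facets = {}
    for result in facet_results:
        pairs = []
        for bucket in result.get("buckets", []):
            value = bucket.get("value", {})
            # Prefer stringValue; fall back to the less common intValue.
            raw = value.get("stringValue", value.get("intValue"))
            pairs.append((raw, bucket.get("count", 0)))
        facets[result["sourceName"]] = pairs
    return facets


sample = [{
    "sourceName": "space",
    "operatorName": "SelectMultiple",
    "buckets": [{"count": 3, "value": {"stringValue": "engineering"}}],
}]
print(facets_by_source(sample))  # {'space': [('engineering', 3)]}
```

The keys of the returned dictionary are the field names you can use as fieldName in a facetFilter for that datasource.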

If latency is a concern and you only want facetResults, send a request with pageSize set to 0 and the initiator set to FACETS (sourceInfo.initiator in the sample below). This retrieves no documents, only facetResults. The sample request body below gathers facetResults for all Confluence-specific facets:

{
  "query": "test",
  "pageSize": 0,
  "requestOptions": {
    "facetBucketSize": 3000,
    "facetFilters": [
      {
        "fieldName": "app",
        "values": [
          {
            "value": "confluence",
            "relationType": "EQUALS"
          }
        ]
      }
    ]
  },
  "sourceInfo": {
    "initiator": "FACETS"
  }
}
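The facets-only request body can be generated for any datasource. The Python sketch below only builds and serializes the body; facets_only_request_body is a hypothetical helper name, and the transport (base URL, authentication headers) is left out because it depends on your deployment.

```python
import json


def facets_only_request_body(query, app, bucket_size=3000):
    """Build a /search body that asks for facetResults but no documents."""
    return {
        "query": query,
        "pageSize": 0,  # no documents are needed, only facetResults
        "requestOptions": {
            "facetBucketSize": bucket_size,
            "facetFilters": [{
                "fieldName": "app",
                "values": [{"value": app, "relationType": "EQUALS"}],
            }],
        },
        "sourceInfo": {"initiator": "FACETS"},
    }


body = facets_only_request_body("test", "confluence")
payload = json.dumps(body)  # serialized body, ready to POST to /search
```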