Custom search with ElasticPress: How to limit results to full-text matches

If you’re finding that your ElasticPress search results include too many posts with low relevance, you might want to customize how results are matched. By default, ElasticPress first returns matches for all words and then adds results that matched only a subset of the search terms (down to even a single word). In this post, we will look at how to customize search in ElasticPress so that it only returns posts that match all of the searched words.

The ep_formatted_args filter

Under the hood, ElasticPress integrates with the WP_Query object and translates a normal WordPress search request to one that Elasticsearch can understand (in JSON format).

In other words, when you search with ElasticPress, your requests don’t look like this:

$query = new WP_Query( array( 'author' => 123 ) );

But more like this:

{
    "query" : {
        "term" : { "author" : "123" }
    }
}

Fortunately, ElasticPress’ job is to translate this for you (among many other things!), so you can focus on what matters to you.

However, one size does not fit all. Foreseeing this, ElasticPress provides a handful of actions and filters that let you customize your search to pretty much any extent you want. In this case, the filter that matters to us is the ep_formatted_args filter.

The ep_formatted_args filter hooks in exactly when ElasticPress is translating a WordPress Query into one that Elasticsearch understands.
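
To make this concrete, here is a minimal sketch of hooking into the filter without changing anything. The callback name my_inspect_formatted_args is made up for this example, and the function_exists() guard is only there so the snippet also runs outside WordPress. It simply logs the query so you can see what ElasticPress is about to send:

```php
/**
 * Hypothetical pass-through callback: logs the query and returns it unchanged.
 *
 * @param array $formatted_args Elasticsearch query, still as a PHP array.
 * @param array $args           Original WP_Query arguments.
 * @return array The untouched query arguments.
 */
function my_inspect_formatted_args( $formatted_args, $args ) {
    // Log the query as JSON so it can be compared with the Elasticsearch docs.
    error_log( json_encode( $formatted_args ) );

    // A filter callback must always return the (possibly modified) value.
    return $formatted_args;
}

// add_filter() only exists inside WordPress, so guard the call.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'ep_formatted_args', 'my_inspect_formatted_args', 10, 2 );
}
```

Once you have seen the logged query, you can remove this callback again; it is only meant for inspection.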

Tweaking ElasticPress’ default query

Let’s start by checking how the query would be written in Elasticsearch JSON, as described in the Elasticsearch guide: https://www.elastic.co/guide/en/elasticsearch/guide/current/match-multi-word.html

{
    "query": {
        "match": {
            "title": {
                "query":    "BROWN DOG!",
                "operator": "and"
            }
        }
    }
}

We can see that the important part is incorporating the operator “AND.”
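
Since ElasticPress builds its queries as PHP arrays before they become JSON, it can help to see the same match query written that way. This is just an illustration of how nested arrays map to JSON with json_encode(), not code taken from ElasticPress:

```php
// The same "match" query expressed as a PHP array.
$match_query = array(
    'query' => array(
        'match' => array(
            'title' => array(
                'query'    => 'BROWN DOG!',
                'operator' => 'and',
            ),
        ),
    ),
);

// json_encode() turns nested associative arrays into JSON objects.
$json = json_encode( $match_query );

echo $json, PHP_EOL;
// {"query":{"match":{"title":{"query":"BROWN DOG!","operator":"and"}}}}
```

Adding or changing a key in the array, such as 'operator', is all it takes to change the JSON that is eventually sent.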

Let’s take a peek at the default query ElasticPress builds when searching for terms out of the box:

"query": {
    "function_score": {
        "query": {
            "bool": {
                "should": [
                    {
                        "multi_match": {
                            "query": "brown dog",
                            "type": "phrase",
                            "fields": [
                                "post_title",
                                "post_content",
                                "post_excerpt",
                                "terms.post_tag.name",
                                "terms.category.name",
                                "post_author.login"
                            ],
                            "boost": 4
                        }
                    },

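We see a couple of issues here. First, the clauses sit under should, so a post that matches any one of them is returned, and each additional match only raises its score. Partial matches are therefore added to the results rather than excluded. To limit the results to exact matches, we either have to change the clause type from should to must, or make sure that every should clause on its own requires all of the words.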

Next, we can see the query is looking for a phrase. This can be a problem if the words we are looking for do not appear directly one after another. Remember, we’re looking for all posts that contain “brown” and “dog,” not necessarily just posts where “dog” follows “brown.”

Lastly, the query is missing the “and” operator, which is what produces the “full-text match” effect we are after.
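
To see the difference between these three behaviors, here is a toy sketch in plain PHP. It does not reproduce how Elasticsearch analyzes or scores text, and the toy_* function names are invented for this example. It only mimics the idea of matching any word, all words, or the exact phrase:

```php
// Toy helper: lowercase the text and split it into words.
function toy_tokenize( $text ) {
    return preg_split( '/[^a-z0-9]+/', strtolower( $text ), -1, PREG_SPLIT_NO_EMPTY );
}

// Toy version of a match with the "or" (any word) or "and" (all words) operator.
function toy_match( $text, $search, $operator = 'or' ) {
    $words  = array_unique( toy_tokenize( $search ) );
    $tokens = toy_tokenize( $text );
    $hits   = count( array_intersect( $words, $tokens ) );

    return 'and' === $operator ? $hits === count( $words ) : $hits > 0;
}

// Toy version of a phrase match: the words must be adjacent and in order.
function toy_match_phrase( $text, $search ) {
    $haystack = ' ' . implode( ' ', toy_tokenize( $text ) ) . ' ';
    $needle   = ' ' . implode( ' ', toy_tokenize( $search ) ) . ' ';

    return false !== strpos( $haystack, $needle );
}

$post = 'The dog was brown and friendly';

var_dump( toy_match( $post, 'brown dog', 'or' ) );         // bool(true)
var_dump( toy_match( $post, 'brown dog', 'and' ) );        // bool(true)
var_dump( toy_match_phrase( $post, 'brown dog' ) );        // bool(false)
var_dump( toy_match( 'A brown cat', 'brown dog', 'or' ) ); // bool(true)
var_dump( toy_match( 'A brown cat', 'brown dog', 'and' ) ); // bool(false)
```

The post about the friendly dog is exactly the kind of result we want to keep: it contains both words, even though they never appear as a phrase. The post about the cat is the kind of partial match we want to drop.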

Back to the ep_formatted_args filter

Now that we know what we need to tweak in the final query, we have to go back to the point before the query was translated into a JSON object.

If we analyze what the $formatted_args argument looks like at this point, we will notice that it is a PHP array.
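
The exact contents depend on your ElasticPress version and enabled features, so the following is only a rough, trimmed-down sketch of what such an array might look like for a “brown dog” search. The phrase clause mirrors the JSON shown earlier; the second clause with 'fuzziness' => 'auto' is an assumption, included because it is the kind of clause the solution in this post looks for:

```php
// Hypothetical, trimmed-down shape of $formatted_args for a "brown dog" search.
$formatted_args = array(
    'query' => array(
        'bool' => array(
            'should' => array(
                // Phrase clause, as in the JSON shown earlier.
                array(
                    'multi_match' => array(
                        'query'  => 'brown dog',
                        'type'   => 'phrase',
                        'fields' => array( 'post_title', 'post_content', 'post_excerpt' ),
                        'boost'  => 4,
                    ),
                ),
                // Assumed fuzzy clause; your real query may differ.
                array(
                    'multi_match' => array(
                        'query'     => 'brown dog',
                        'fields'    => array( 'post_title', 'post_content', 'post_excerpt' ),
                        'fuzziness' => 'auto',
                    ),
                ),
            ),
        ),
    ),
);

// Each entry under "should" is one clause a filter callback can inspect or modify.
foreach ( $formatted_args['query']['bool']['should'] as $index => $clause ) {
    $type = isset( $clause['multi_match']['type'] ) ? $clause['multi_match']['type'] : '(no type set)';
    echo $index, ': ', $type, PHP_EOL;
}
```

Logging the real array on your own site is the most reliable way to confirm its shape before you start editing it.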

So, to inject the tweaks we want to make, we have to edit this array. Let’s dive into the solution and go over what we are doing!

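/**
 * Add the "and" operator to multi_match clauses that use automatic fuzziness.
 *
 * @param array $formatted_args Elasticsearch query, still as a PHP array.
 * @param array $args           Original WP_Query arguments.
 * @return array The query arguments, modified or not.
 */
function set_to_exact( $formatted_args, $args ) {
    // Only proceed if the query contains a list of "should" clauses.
    if (
        ! empty( $formatted_args['query']['bool']['should'] ) &&
        is_array( $formatted_args['query']['bool']['should'] )
    ) {
        // Walk through each clause of the bool query.
        foreach ( $formatted_args['query']['bool']['should'] as $key => $value ) {
            // Target the multi_match clause that uses automatic fuzziness.
            if (
                ! empty( $value['multi_match']['fuzziness'] ) &&
                'auto' === $value['multi_match']['fuzziness']
            ) {
                // Require all searched words to match instead of any of them.
                $formatted_args['query']['bool']['should'][ $key ]['multi_match']['operator'] = 'and';
            }
        }
    }
    // A filter callback must always return the value it was given.
    return $formatted_args;
}
// Register the callback at priority 10, accepting both arguments.
add_filter( 'ep_formatted_args', 'set_to_exact', 10, 2 );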
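
The JSON we looked at earlier nests the bool query under function_score, while the function above looks for it directly under query. Which shape you get at the time your callback runs may depend on your ElasticPress version and enabled features, so treat that as something to verify on your own site. As a hedged sketch, a variant that checks both locations could look like this (my_require_all_words is a hypothetical name, and both array paths are assumptions based on the examples in this post):

```php
/**
 * Hypothetical variant: finds the "should" clauses whether or not they are
 * wrapped in "function_score", then applies the same tweak as before.
 *
 * @param array $formatted_args Elasticsearch query, still as a PHP array.
 * @param array $args           Original WP_Query arguments (unused here).
 * @return array The query arguments, modified or not.
 */
function my_require_all_words( $formatted_args, $args = array() ) {
    // Pick whichever path exists; both are assumptions about the array shape.
    if ( isset( $formatted_args['query']['function_score']['query']['bool']['should'] ) ) {
        $should = &$formatted_args['query']['function_score']['query']['bool']['should'];
    } elseif ( isset( $formatted_args['query']['bool']['should'] ) ) {
        $should = &$formatted_args['query']['bool']['should'];
    } else {
        return $formatted_args; // Nothing to change.
    }

    if ( is_array( $should ) ) {
        foreach ( $should as $key => $clause ) {
            // Same condition as above: only touch the clause with automatic fuzziness.
            if (
                isset( $clause['multi_match']['fuzziness'] ) &&
                'auto' === $clause['multi_match']['fuzziness']
            ) {
                $should[ $key ]['multi_match']['operator'] = 'and';
            }
        }
    }
    unset( $should ); // Break the reference before returning.

    return $formatted_args;
}

// Quick check against a made-up query that uses the function_score wrapper.
$sample = array(
    'query' => array(
        'function_score' => array(
            'query' => array(
                'bool' => array(
                    'should' => array(
                        array( 'multi_match' => array( 'query' => 'brown dog', 'fuzziness' => 'auto' ) ),
                    ),
                ),
            ),
        ),
    ),
);
$result = my_require_all_words( $sample );
echo $result['query']['function_score']['query']['bool']['should'][0]['multi_match']['operator'], PHP_EOL; // and

// add_filter() only exists inside WordPress, so guard the call.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'ep_formatted_args', 'my_require_all_words', 10, 2 );
}
```

If you use a variant like this, register it instead of set_to_exact rather than alongside it, since both make the same change.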

When you search again, you should see the functionality you want in action!

Conclusion

The ep_formatted_args filter is a very powerful hook in ElasticPress, and the best way to take advantage of it is to have a solid understanding of how search queries are expressed as JSON in Elasticsearch. Once you know what you are trying to achieve, it is easy to apply the customizations you need.

Happy searching!

Is there another search case you would like us to address? Reach out to us in the GitHub repository: https://github.com/10up/ElasticPress/issues or send us a Tweet.
