Understanding the default ElasticPress/Elasticsearch search query

For both the Post Search and Autosuggest Features, ElasticPress performs the same default Elasticsearch query. You can customize this query by changing the settings under ElasticPress -> Search Fields & Weighting in the WordPress Dashboard, or by using the hooks and filters available while the query is built.

Breaking Down the Default Query

On a generic WordPress site with Posts and Pages available for search, and with no changes made through filters or the Weighting Engine, ElasticPress generates the following default query:

JSON
{
	"from": 0,
	"size": 10,
	"post_filter": {
		"bool": {
			"must": [
				{
					"terms": {
						"post_type.raw": [
							"post",
							"page"
						]
					}
				},
				{
					"terms": {
						"post_status": [
							"publish"
						]
					}
				},
				{
					"bool": {
						"must_not": [
							{
								"terms": {
									"meta.ep_exclude_from_search.raw": [
										"1"
									]
								}
							}
						]
					}
				}
			]
		}
	},
	"query": {
		"function_score": {
			"query": {
				"bool": {
					"should": [
						{
							"bool": {
								"must": [
									{
										"bool": {
											"should": [
												{
													"multi_match": {
														"query": "elasticpress for wordpress search",
														"type": "phrase",
														"fields": [
															"post_title^1",
															"post_excerpt^1",
															"post_content^1",
															"post_author.display_name^1",
															"terms.post_tag.name^1",
															"terms.category.name^1",
															"terms.ep_custom_result.name^9999"
														],
														"boost": 3
													}
												},
												{
													"multi_match": {
														"query": "elasticpress for wordpress search",
														"fields": [
															"post_title^1",
															"post_excerpt^1",
															"post_content^1",
															"post_author.display_name^1",
															"terms.post_tag.name^1",
															"terms.category.name^1",
															"post_title.suggest^1",
															"term_suggest^1"
														],
														"operator": "and",
														"boost": 1,
														"fuzziness": "auto"
													}
												},
												{
													"multi_match": {
														"query": "elasticpress for wordpress search",
														"type": "cross_fields",
														"fields": [
															"post_title^1",
															"post_excerpt^1",
															"post_content^1",
															"post_author.display_name^1",
															"terms.post_tag.name^1",
															"terms.category.name^1",
															"terms.ep_custom_result.name^1"
														],
														"boost": 1,
														"analyzer": "standard",
														"tie_breaker": 0.5,
														"operator": "and"
													}
												}
											]
										}
									}
								],
								"filter": [
									{
										"match": {
											"post_type.raw": "post"
										}
									}
								]
							}
						},
						{
							"bool": {
								"must": [
									{
										"bool": {
											"should": [
												{
													"multi_match": {
														"query": "elasticpress for wordpress search",
														"type": "phrase",
														"fields": [
															"post_title^1",
															"post_excerpt^1",
															"post_content^1",
															"post_author.display_name^1",
															"terms.ep_custom_result.name^9999"
														],
														"boost": 3
													}
												},
												{
													"multi_match": {
														"query": "elasticpress for wordpress search",
														"fields": [
															"post_title^1",
															"post_excerpt^1",
															"post_content^1",
															"post_author.display_name^1",
															"post_title.suggest^1"
														],
														"operator": "and",
														"boost": 1,
														"fuzziness": "auto"
													}
												},
												{
													"multi_match": {
														"query": "elasticpress for wordpress search",
														"type": "cross_fields",
														"fields": [
															"post_title^1",
															"post_excerpt^1",
															"post_content^1",
															"post_author.display_name^1",
															"terms.ep_custom_result.name^1"
														],
														"boost": 1,
														"analyzer": "standard",
														"tie_breaker": 0.5,
														"operator": "and"
													}
												}
											]
										}
									}
								],
								"filter": [
									{
										"match": {
											"post_type.raw": "page"
										}
									}
								]
							}
						}
					]
				}
			},
			"functions": [
				{
					"exp": {
						"post_date_gmt": {
							"scale": "14d",
							"decay": 0.25,
							"offset": "7d"
						}
					}
				},
				{
					"weight": 0.001
				}
			],
			"score_mode": "sum",
			"boost_mode": "multiply"
		}
	},
	"sort": [
		{
			"_score": {
				"order": "desc"
			}
		}
	]
}

Query Paging

To begin with, we have some parameters that specify the number of records to return along with the offset. In other words, basic pagination:

JSON
{
	...
	"from": 0, // Offset of the first result to return, used for paging.
	"size": 10, // Maximum number of results to return in this request.
	...
}
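
As a minimal sketch of how a page number maps onto these two parameters (the function name `paging_params` and the 1-based page convention are assumptions made for illustration, not part of ElasticPress):

```python
def paging_params(page: int, per_page: int = 10) -> dict:
    """Translate a 1-based page number into Elasticsearch "from"/"size" values."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive integers")
    # "from" is the number of results to skip before the first one returned.
    return {"from": (page - 1) * per_page, "size": per_page}


print(paging_params(1))  # {'from': 0, 'size': 10}
print(paging_params(3))  # {'from': 20, 'size': 10}
```

The first page therefore reproduces the values in the default query, and each later page skips the results already shown.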

Post Filter

The post_filter section applies all the restrictions we need for the query. These constraints could also be applied in the query section that follows, but Elasticsearch applies post_filter to the search hits only after aggregations have been calculated. Because the Filters feature reuses these same filters when it builds its aggregations, ElasticPress keeps them in post_filter.

In the default query we apply three different filters: a post_type list, a post_status list, and the ep_exclude_from_search meta field, which is set by the Exclude from search results checkbox.

JSON
{
	...
	"post_filter": {
		"bool": {
			"must": [
				{
					"terms": {
						"post_type.raw": [
							"post",
							"page"
						]
					}
				},
				{
					"terms": {
						"post_status": [
							"publish"
						]
					}
				},
				{
					"bool": {
						"must_not": [
							{
								"terms": {
									"meta.ep_exclude_from_search.raw": [
										"1"
									]
								}
							}
						]
					}
				}
			]
		}
	},
	...
}
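
To make the combined effect of the three filters concrete, here is a rough Python sketch of the same logic applied to in-memory documents. The document shape and the function name `passes_post_filter` are assumptions for illustration only; Elasticsearch evaluates these clauses against its index, not against Python dictionaries.

```python
def passes_post_filter(doc: dict,
                       post_types=("post", "page"),
                       post_statuses=("publish",)) -> bool:
    """Mimic the default post_filter: all three conditions must hold."""
    # must_not terms: reject documents flagged as excluded from search.
    excluded = "1" in doc.get("meta", {}).get("ep_exclude_from_search", [])
    return (
        doc.get("post_type") in post_types          # terms on post_type.raw
        and doc.get("post_status") in post_statuses  # terms on post_status
        and not excluded
    )


# Hypothetical sample documents.
docs = [
    {"ID": 1, "post_type": "post", "post_status": "publish"},
    {"ID": 2, "post_type": "page", "post_status": "draft"},
    {"ID": 3, "post_type": "post", "post_status": "publish",
     "meta": {"ep_exclude_from_search": ["1"]}},
    {"ID": 4, "post_type": "attachment", "post_status": "publish"},
]

visible_ids = [d["ID"] for d in docs if passes_post_filter(d)]
print(visible_ids)  # [1]
```

Only the published Post that has not been excluded from search survives all three filters.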

After this section, you’ll see the main query portion. As you might suspect, this is where the magic happens! You’ll see a series of nested bool statements that can seem quite confusing, but there’s a good reason they’re there. If you look closely, you’ll see that the query section actually contains two almost identical queries within the should clause.

Per-Post Type Queries

The only difference between these two query clauses is the post type filter: the first query handles Posts, and the second query is for Pages. Having two separate queries might seem a bit redundant, but if you end up applying separate weighting for each post type, it can be really useful! Here’s the filter from the first sub-query (for Posts):

JSON
{
	...
	"query": {
		"function_score": {
			"query": {
				"bool": {
					"should": [
						{
							"bool": {
								...
								"filter": [
									{
										"match": {
											"post_type.raw": "post"
										}
									}
								]
							}
						},
						...
					]
				}
			},
			...
		}
	},
	...
}
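
The sketch below illustrates why one sub-query per post type is convenient: each post type can carry its own field weights. It is deliberately simplified (a single multi_match per post type instead of three), and the function name and the example weights are hypothetical, not ElasticPress code or defaults.

```python
import json


def per_post_type_clauses(search_term: str, fields_by_type: dict) -> list:
    """Build one bool clause per post type, each with its own weighted fields."""
    clauses = []
    for post_type, fields in fields_by_type.items():
        clauses.append({
            "bool": {
                # Simplified: the real default query nests three multi_match clauses here.
                "must": [{"multi_match": {"query": search_term, "fields": fields}}],
                # Restrict this clause (and its weights) to a single post type.
                "filter": [{"match": {"post_type.raw": post_type}}],
            }
        })
    return clauses


# Hypothetical weighting: titles count double for Posts only.
weights = {
    "post": ["post_title^2", "post_content^1"],
    "page": ["post_title^1", "post_content^1"],
}
should = per_post_type_clauses("elasticpress for wordpress search", weights)
print(json.dumps(should, indent=2))
```

With identical weights the two clauses differ only in their filter, which is the situation shown in the default query.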

Optional Date Decay Function (Weight by Date)

Within the Post Search Feature, ElasticPress offers the option to weight results by date. If this option is selected, the following function is applied to all result scores:

JSON
{
	...
	"query": {
		"function_score": {
			...
			"functions": [
				{
					"exp": {
						"post_date_gmt": {
							"scale": "14d",
							"decay": 0.25,
							"offset": "7d"
						}
					}
				},
				{
					"weight": 0.001
				}
			],
			"score_mode": "sum",
			"boost_mode": "multiply"
		}
	},
	...
}

With these defaults, posts published within the last seven days (the offset) keep their full score. Older posts are scored progressively lower: 14 days past the offset (the scale), that is, at 21 days old, the multiplier has dropped to 0.25 (the decay), and it keeps shrinking as posts age further. This does not exclude older posts, but it significantly decreases their scores, so they are likely to appear lower in search results. If your site is news-based or otherwise favors new content, you will likely want to leave this feature enabled. If you have more evergreen content, however, you should disable the feature to remove date-based weighting.

If you want to adjust any of those values, use one or more of the epwr_boost_mode, epwr_offset, epwr_scale, and epwr_decay filters.
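
To get a feel for the numbers, the following sketch reproduces the multiplier from Elasticsearch's documented exponential decay formula, combined with the extra 0.001 weight through score_mode sum. The function name is hypothetical, and it assumes the decay origin is the time of the search, which is the Elasticsearch default for date fields when no origin is given.

```python
import math


def date_decay_multiplier(age_days: float,
                          scale_days: float = 14.0,
                          decay: float = 0.25,
                          offset_days: float = 7.0,
                          weight: float = 0.001) -> float:
    """Approximate the factor the relevance score is multiplied by."""
    # No penalty inside the offset window.
    distance = max(0.0, age_days - offset_days)
    # Exponential decay: the factor equals `decay` at `scale_days` past the offset.
    lam = math.log(decay) / scale_days
    exp_score = math.exp(lam * distance)
    # score_mode "sum" adds both function scores; boost_mode "multiply"
    # then multiplies the query score by this total.
    return exp_score + weight


for age in (0, 7, 21, 35):
    print(age, round(date_decay_multiplier(age), 4))
# 0 1.001
# 7 1.001
# 21 0.251
# 35 0.0635
```

A larger scale or a decay value closer to 1 flattens the curve, while a larger offset widens the window in which posts keep their full score.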

Finding Our Results: The multi_match Query

In the first sub-query, the post type filter ensures that all the weighting and configuration in the must clause (nested in the top-level should clause) applies only to the Post post type. Under the default configuration, this must clause is identical to the one for the Page post type, but the two can diverge once different weightings are applied to each post type.

JSON
{
	...
	"query": {
		"function_score": {
			"query": {
				"bool": {
					"should": [
						{
							"bool": {
								"must": [
									{
										"bool": {
											"should": [
												{
													"multi_match": {
														"query": "elasticpress for wordpress search",
														"type": "phrase",
														"fields": [
															"post_title^1",
															"post_excerpt^1",
															"post_content^1",
															"post_author.display_name^1",
															"terms.post_tag.name^1",
															"terms.category.name^1",
															"terms.ep_custom_result.name^9999"
														],
														"boost": 3
													}
												},
												{
													"multi_match": {
														"query": "elasticpress for wordpress search",
														"fields": [
															"post_title^1",
															"post_excerpt^1",
															"post_content^1",
															"post_author.display_name^1",
															"terms.post_tag.name^1",
															"terms.category.name^1",
															"post_title.suggest^1",
															"term_suggest^1"
														],
														"operator": "and",
														"boost": 1,
														"fuzziness": "auto"
													}
												},
												{
													"multi_match": {
														"query": "elasticpress for wordpress search",
														"type": "cross_fields",
														"fields": [
															"post_title^1",
															"post_excerpt^1",
															"post_content^1",
															"post_author.display_name^1",
															"terms.post_tag.name^1",
															"terms.category.name^1",
															"terms.ep_custom_result.name^1"
														],
														"boost": 1,
														"analyzer": "standard",
														"tie_breaker": 0.5,
														"operator": "and"
													}
												}
											]
										}
									}
								],
								...
							}
						},
						...
					]
				}
			},
			...
		}
	},
	...
}

In this part of the query, you can see we perform three separate multi_match queries. multi_match is an Elasticsearch query type that searches through a set of fields, which by default in ElasticPress means the Post’s title, excerpt, content, author’s name, post tags and categories, and, if set, custom search results.

For the first clause, the search type is phrase, which means Elasticsearch will look for the entire phrase as typed out in the search query, in this case “elasticpress for wordpress search.”

The first clause also has a boost value of 3, which means that the score of any exact phrase match is multiplied by 3. The sort clause at the end of the query orders results by _score, which is built from these queries, so an exact phrase match of “elasticpress for wordpress search” within a single post field will get a high score and appear near the top of our results.

The second multi_match query is similar to the first, but it uses the Elasticsearch default type, best_fields. With the operator set to "and", all words must be present in the same field to match. The main difference from the first clause is that the words do not need to be adjacent: for the second clause, “elasticpress something for something wordpress something search” would also be a match.

This second query also sets fuzziness to auto. See the section on fuzziness at the end of this document for more about that functionality.

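Lastly, the third multi_match is a cross_fields search. Here the words can be spread across fields: if one field has “elasticpress for”, a second one has “wordpress”, and a third one has “search”, the document is considered a match, as long as every word appears in at least one of the fields.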
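
The difference between the three clauses can be illustrated with a toy matcher. This is a heavily simplified model written for this explanation: it splits on whitespace, ignores analyzers, fuzziness, and scoring, and the function names and sample documents are hypothetical.

```python
def tokens(text: str) -> list:
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def phrase_match(query: str, doc: dict) -> bool:
    """type "phrase": the words appear adjacent and in order within one field."""
    q = tokens(query)
    for value in doc.values():
        t = tokens(value)
        if any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1)):
            return True
    return False


def best_fields_and_match(query: str, doc: dict) -> bool:
    """best_fields with operator "and": every word appears in the same field."""
    q = set(tokens(query))
    return any(q <= set(tokens(value)) for value in doc.values())


def cross_fields_and_match(query: str, doc: dict) -> bool:
    """cross_fields with operator "and": every word appears in some field."""
    seen = set()
    for value in doc.values():
        seen.update(tokens(value))
    return set(tokens(query)) <= seen


query = "elasticpress for wordpress search"
exact = {"post_title": "ElasticPress for WordPress search"}
scattered = {"post_content": "elasticpress something for something wordpress something search"}
spread = {"post_title": "ElasticPress for developers",
          "post_excerpt": "WordPress tips",
          "post_content": "Better search"}

for name, doc in (("exact", exact), ("scattered", scattered), ("spread", spread)):
    print(name, phrase_match(query, doc),
          best_fields_and_match(query, doc),
          cross_fields_and_match(query, doc))
# exact True True True
# scattered False True True
# spread False False True
```

Each clause is more permissive than the previous one, which is why the strictest clause (the phrase match) carries the highest boost.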

Sort Order

JSON
{
	...
	"sort": [ // Order by _score, descending, so the best matches come first
		{
			"_score": {
				"order": "desc"
			}
		}
	],
	...
}

Each search result receives a score from Elasticsearch, depending on the field weight, the date decay (if applied), and the boost applied to the multi_match clauses. This clause makes Elasticsearch order results using that score. Setting it to post_date, for example, would ignore the score and simply order results by date.
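
As a small sketch of that alternative, the snippet below swaps the sort clause of a query body (represented as a Python dictionary) for a date-based one. The helper name `with_date_sort` is hypothetical; how you would actually change the sort in ElasticPress depends on your setup.

```python
import copy


def with_date_sort(query: dict, order: str = "desc") -> dict:
    """Return a copy of the query body sorted by post_date instead of _score."""
    if order not in ("asc", "desc"):
        raise ValueError('order must be "asc" or "desc"')
    updated = copy.deepcopy(query)  # leave the original query untouched
    updated["sort"] = [{"post_date": {"order": order}}]
    return updated


default_query = {"from": 0, "size": 10, "sort": [{"_score": {"order": "desc"}}]}
by_date = with_date_sort(default_query)
print(by_date["sort"])        # [{'post_date': {'order': 'desc'}}]
print(default_query["sort"])  # [{'_score': {'order': 'desc'}}]
```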

ElasticPress Versions, Search Algorithm Evolution, and Fuzziness

Over time, the ElasticPress plugin has used different search algorithms: up to version 3.4 we used a less restrictive algorithm, version 3.5 implemented a version without fuzziness, and in 4.0 we implemented a more balanced version. To use different search algorithms, please check this article.

One big difference between versions is the use of fuzziness. Depending on how your users search, you may or may not want fuzzy matching enabled. While fuzzy matching seems like a great feature (and it can be), it can leave many items at the end of your search results that are only loosely related to the search query, if at all. Consider, for example, this (partial) list of matches for the search term dogs with a fuzziness value of 1: dog, does, cogs, hogs, logs, pogs, digs. A lot of those words aren’t even remotely related to the term dogs, but they’ll appear (later) in your search results, or even right at the top if your site doesn’t actually have anything related to the word(s) searched for.
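
The following sketch checks that list with a plain Levenshtein edit distance and shows the edit budget that fuzziness auto grants by term length (no edits for 1 or 2 characters, one edit for 3 to 5, two edits for longer terms). It is an approximation for illustration: Elasticsearch also counts a transposition of two adjacent characters as a single edit by default, and only terms that actually exist in the index can match.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def auto_fuzziness(term: str) -> int:
    """Maximum number of edits allowed by fuzziness "auto" for a term."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


candidates = ["dog", "does", "cogs", "hogs", "logs", "pogs", "digs", "cats"]
allowed = auto_fuzziness("dogs")  # 1 edit for a four-letter term
matches = [word for word in candidates if edit_distance("dogs", word) <= allowed]
print(matches)  # ['dog', 'does', 'cogs', 'hogs', 'logs', 'pogs', 'digs']
```

Every word from the list is a single edit away from dogs, while cats (three edits) is not, which shows how short search terms can pull in unrelated words.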

If you want, it is possible to disable fuzziness entirely, as explained in this article.