Aggregations API for Grouping Data

Publishers and eCommerce store owners often need to display the count of items (products, posts, etc.) within a category. A store owner may want to show color filters where each clickable filter displays the number of products it contains: “there are 15 shirts available in red.” A blogger may list the blog’s categories in a widget and show the number of posts within each one. With WordPress core functionality alone, calculating those totals means running a number of slow MySQL queries. With Elasticsearch, we can get the same information efficiently through the Aggregations API.

ElasticPress integrates with the Aggregations API using special WP_Query parameters. Here’s an example query that finds all published posts containing the phrase “ElasticPress” and also computes group totals for every category assigned to the matching posts. For example, if five posts containing the phrase “ElasticPress” were assigned to the category “summer”, the query’s aggregation data would include the category “summer” with a total of 5.

$query = new WP_Query( [
  's'              => 'ElasticPress',
  'post_status'    => 'publish',
  'ep_integrate'   => true,
  'posts_per_page' => 10,
  'aggs'           => [
     'name' => 'category_stats',
     'aggs' => [
        'terms' => [
           'field' => 'terms.category.slug',
        ],
     ],
  ],
  'orderby'        => '_score',
] );

The query itself will return matching posts, as usual. To retrieve the aggregation results, we hook into an action fired by ElasticPress called ep_retrieve_aggregations. That action hook should be set up before executing the query, like so:

add_action( 'ep_retrieve_aggregations', function ( array $aggregations ) {
    $agg_response = isset( $aggregations['category_stats']['buckets'] ) ? $aggregations['category_stats']['buckets'] : [];
}, 10, 1 );
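
In the callback above, $agg_response is a local variable, so it is gone once the closure returns. One possible way to keep the buckets around for later use is a small collector object. This is a minimal sketch, not part of ElasticPress: the class name Category_Stats_Collector and its methods are hypothetical, and the function_exists() guard is only there so the snippet also loads outside WordPress.

```php
<?php
// Hypothetical helper (not part of ElasticPress): remembers the buckets from
// the most recent aggregation response so they can be read after the query.
final class Category_Stats_Collector {
    /** @var array List of buckets, each with 'key' and 'doc_count'. */
    private $buckets = [];

    /** Callback for the ep_retrieve_aggregations action. */
    public function capture( array $aggregations ) {
        $this->buckets = isset( $aggregations['category_stats']['buckets'] )
            ? $aggregations['category_stats']['buckets']
            : [];
    }

    /** Returns the buckets captured so far (empty array if none). */
    public function get_buckets() {
        return $this->buckets;
    }
}

$collector = new Category_Stats_Collector();

// Register before running the query; guarded so the file loads without WordPress.
if ( function_exists( 'add_action' ) ) {
    add_action( 'ep_retrieve_aggregations', [ $collector, 'capture' ], 10, 1 );
}
```

After the WP_Query has run, $collector->get_buckets() would return the same list the closure reads from $aggregations['category_stats']['buckets'].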

As you can see, ep_retrieve_aggregations is passed an array of aggregations, which looks like this:

"category_stats": {
  "doc_count_error_upper_bound": 0,
  "sum_other_doc_count": 5,
  "buckets": [
     {
       "key": "category1",
       "doc_count": 717
     },
     {
       "key": "category2",
       "doc_count": 443
     },
     {
       "key": "category3",
       "doc_count": 389
     },
     {
       "key": "category4",
       "doc_count": 278
     },
     {
       "key": "category5",
       "doc_count": 112
     }
    ...
  ]
}

Our aggregation data contains three keys.

The most useful one is “buckets”. It contains our statistics in descending order of document (post) count. Because we aggregated on terms.category.slug, each bucket’s “key” is a category slug. In the scenario above, for all published posts matching “ElasticPress”, the category “category1” is assigned to 717 of those posts, “category2” to 443 posts, and so on.
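
For lookups by category, it can help to flatten the buckets into a slug => count array. The following is a rough sketch; the function name category_counts_from_buckets is hypothetical, and the sample input simply mirrors the first two buckets of the response shown earlier.

```php
<?php
/**
 * Hypothetical helper: flatten terms-aggregation buckets into slug => count.
 * Buckets missing either 'key' or 'doc_count' are skipped.
 */
function category_counts_from_buckets( array $buckets ) {
    $counts = [];
    foreach ( $buckets as $bucket ) {
        if ( isset( $bucket['key'], $bucket['doc_count'] ) ) {
            $counts[ (string) $bucket['key'] ] = (int) $bucket['doc_count'];
        }
    }
    return $counts;
}

// Sample input copied from the example response.
$counts = category_counts_from_buckets( [
    [ 'key' => 'category1', 'doc_count' => 717 ],
    [ 'key' => 'category2', 'doc_count' => 443 ],
] );
// $counts['category1'] is 717; a slug that has no bucket is simply not set.
```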

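The second key, sum_other_doc_count, is the combined document count of all buckets (categories) that were not included in the response, because a terms aggregation returns only the top buckets. In our example scenario it means that the categories left out of “buckets” account for 5 documents (posts) in total.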

The third key, doc_count_error_upper_bound, is an upper bound on how far off the reported document counts may be. Counts in a terms aggregation can be approximate, so a value of 0, as in our example, means the counts shown are exact.

Using this data, we can output category totals for visitors to see.
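
As one possible illustration, the buckets could be turned into a simple HTML list. This is only a sketch: render_category_totals and the category-totals class name are hypothetical, and it escapes with PHP’s htmlspecialchars() so it runs outside WordPress. In a theme, esc_html() would be the usual choice.

```php
<?php
/**
 * Hypothetical renderer: builds an HTML list with one "slug (count)" item
 * per bucket. Returns an empty string when there are no buckets.
 */
function render_category_totals( array $buckets ) {
    if ( empty( $buckets ) ) {
        return '';
    }

    $items = '';
    foreach ( $buckets as $bucket ) {
        $items .= sprintf(
            '<li>%s (%d)</li>',
            htmlspecialchars( (string) $bucket['key'], ENT_QUOTES, 'UTF-8' ),
            (int) $bucket['doc_count']
        );
    }

    return '<ul class="category-totals">' . $items . '</ul>';
}

// Sample input copied from the example response.
$html = render_category_totals( [
    [ 'key' => 'category1', 'doc_count' => 717 ],
] );
```

The list shows slugs because that is what the aggregation was grouped on; displaying category names would need an extra lookup, which is left out here.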

As always, 10up provides professional consultation services for ElasticPress and tricky features such as aggregations. Head over to our contact form and tell us how we can help.