How can I use a filter in connection with an aggregation in Elasticsearch?
The official documentation gives only trivial examples for filters and for aggregations, and no formal description of the query DSL (compare it with, e.g., the PostgreSQL documentation).
By trial and error I arrived at the following query, which Elasticsearch accepts (no parsing errors) but which ignores the given filters:
    {
      "filter": {
        "and": [
          { "term": { "_type": "logs" } },
          { "term": { "dc": "eu-west-12" } },
          { "term": { "status": "204" } },
          {
            "range": {
              "@timestamp": {
                "from": 1398169707,
                "to": 1400761707
              }
            }
          }
        ]
      },
      "size": 0,
      "aggs": {
        "time_histo": {
          "date_histogram": {
            "field": "@timestamp",
            "interval": "1h"
          },
          "aggs": {
            "name": {
              "percentiles": {
                "field": "upstream_response_time",
                "percents": [98.0]
              }
            }
          }
        }
      }
    }

Some people suggest using query instead of filter. But the official documentation generally recommends the opposite for filtering on exact values. Another issue with query: while filters offer an and, query does not.
Can somebody point me to documentation, a blog, or a book that describes writing non-trivial queries: at least an aggregation plus multiple filters?
Elasticsearch aggregations let you group your data and compute statistics over it (such as sums and averages) as part of a simple search query. An aggregation can be viewed as a unit of work that builds analytical information across a set of documents.
You set up criteria and sub-criteria as buckets, then attach metrics that calculate values for the documents in each bucket.
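To make the bucket-plus-metric idea concrete, here is a plain-Python analogy. It does not talk to Elasticsearch; the helper name `bucket_and_average` and the sample documents are made up for illustration. It groups documents by one field (the bucket) and computes a count, sum, and average of another field (the metrics) per group.

```python
from collections import defaultdict


def bucket_and_average(docs, bucket_field, metric_field):
    """Group docs by bucket_field and compute simple metrics per group.

    Hypothetical helper: an in-memory analogy for a bucket aggregation
    with metric sub-aggregations, not an Elasticsearch API.
    """
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[bucket_field]].append(doc[metric_field])
    return {
        key: {
            "doc_count": len(values),
            "sum": sum(values),
            "avg": sum(values) / len(values),
        }
        for key, values in groups.items()
    }


# Made-up sample documents, reusing field names from the question.
docs = [
    {"status": "204", "upstream_response_time": 0.5},
    {"status": "204", "upstream_response_time": 1.5},
    {"status": "500", "upstream_response_time": 3.0},
]

result = bucket_and_average(docs, "status", "upstream_response_time")
for key, metrics in result.items():
    print(key, metrics)
```

In Elasticsearch the same shape appears as an outer bucket aggregation (for example `date_histogram`) with a metric aggregation (for example `percentiles`) nested under its `aggs` key.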
I ended up using a filter aggregation, not a filtered query. So now I have three nested aggs elements.
I also use a bool filter instead of and, as recommended by @alex-brasetvik, because of http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/
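The switch from and to bool is mechanical: the list of clauses under `and` moves under `bool.must`. A minimal sketch of that rewrite in Python follows; `and_to_bool` is a hypothetical helper name, and it only handles the plain list form of `and` used in the question.

```python
import json


def and_to_bool(filter_clause):
    """Rewrite {"and": [...]} as {"bool": {"must": [...]}}.

    Hypothetical helper. Filters that are not a list-style "and"
    are returned unchanged; nested "and" filters are rewritten too.
    """
    clauses = filter_clause.get("and")
    if not isinstance(clauses, list):
        return filter_clause
    return {"bool": {"must": [and_to_bool(clause) for clause in clauses]}}


and_filter = {
    "and": [
        {"term": {"dc": "eu-west-12"}},
        {"term": {"status": "204"}},
    ]
}

bool_filter = and_to_bool(and_filter)
print(json.dumps(bool_filter, indent=2))
```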
My final implementation:
    {
      "aggs": {
        "filtered": {
          "filter": {
            "bool": {
              "must": [
                { "term": { "_type": "logs" } },
                { "term": { "dc": "eu-west-12" } },
                { "term": { "status": "204" } },
                {
                  "range": {
                    "@timestamp": {
                      "from": 1398176502000,
                      "to": 1400768502000
                    }
                  }
                }
              ]
            }
          },
          "aggs": {
            "time_histo": {
              "date_histogram": {
                "field": "@timestamp",
                "interval": "1h"
              },
              "aggs": {
                "name": {
                  "percentiles": {
                    "field": "upstream_response_time",
                    "percents": [98.0]
                  }
                }
              }
            }
          }
        }
      },
      "size": 0
    }
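Two things changed relative to the query in the question. First, the filter now lives inside a filter aggregation that wraps the other aggregations; a top-level filter is applied to the search hits only after aggregations have been computed, which is why it appeared to be ignored. Second, the range bounds are a thousand times larger, which is consistent with epoch milliseconds rather than epoch seconds (Elasticsearch reads numeric date values as milliseconds by default). The sketch below builds the same request body in Python. The helper names `epoch_millis` and `filtered_aggs_body` are hypothetical, and the UTC datetimes are my reading of the timestamps in the final query.

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_millis(dt):
    """Whole milliseconds since the Unix epoch for a timezone-aware datetime."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def filtered_aggs_body(filter_clause, aggs, name="filtered"):
    """Wrap aggs in a filter aggregation so the filter applies to them.

    Hypothetical helper; "name" is the arbitrary label of the outer
    aggregation, as in the final query above.
    """
    return {"size": 0, "aggs": {name: {"filter": filter_clause, "aggs": aggs}}}


# Assumed time window: 30 days starting 2014-04-22 14:21:42 UTC.
start = datetime(2014, 4, 22, 14, 21, 42, tzinfo=timezone.utc)
end = start + timedelta(days=30)

filter_clause = {
    "bool": {
        "must": [
            {"term": {"_type": "logs"}},
            {"term": {"dc": "eu-west-12"}},
            {"term": {"status": "204"}},
            {"range": {"@timestamp": {"from": epoch_millis(start),
                                      "to": epoch_millis(end)}}},
        ]
    }
}

aggs = {
    "time_histo": {
        "date_histogram": {"field": "@timestamp", "interval": "1h"},
        "aggs": {
            "name": {
                "percentiles": {"field": "upstream_response_time",
                                "percents": [98.0]}
            }
        },
    }
}

body = filtered_aggs_body(filter_clause, aggs)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a search request. Building it from small pieces keeps the three nesting levels of aggs (filter, histogram, percentiles) easier to follow than in a hand-written literal.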