
query vs filter and usage of correct expression in the query or filter

I've seen lots of questions on S.O., and read in the documentation, that "filters are cached" while queries are not, that "queries are applied on all values", that "filters are applied after queries if outside the query object", etc.

Bottom line is that the documentation sucks and the DSL is very difficult to grasp. I'm trying to optimize some queries using the Kibana Dev Tools search profiler, but my local data sets must be too small to measure the actual performance difference (I'm getting results in both directions), and I don't have a multi-node test cluster with a real, large data set to work against.

In this trivial case, all three queries return the same results. I want to understand the difference, and why you would ever prefer a query over a filter when the clause could be placed in a filter instead.

GET foo11/_search
{
  "query": {
    "bool": {
      "filter": {
        "match" : {
          "in_stock" : true
        }
      }
    }
  }
}

GET foo11/_search
{
  "query": {
    "bool": {
      "filter": {
        "term" : {
          "in_stock" : true
        }
      }
    }
  }
}


GET foo11/_search
{
  "query": {
    "bool": {
      "must": {
        "match" : {
          "in_stock" : true
        }
      }
    }
  }
}

What is the performance difference between these 3 cases? Can I actually prove that one is better or worse than the other?

What is the difference between:

"match" : {
  "in_stock" : true
}

vs

"term" : {
  "in_stock" : true
}

Avner Barr asked Oct 18 '25 14:10


1 Answer

There are a couple of different questions and concepts to unpack there.

Match vs Term

A match query performs analysis (for example removing common stop words, or stemming to remove trailing "ing", "es", etc.) on the search values you provide before looking them up in the index. The goal of analysis is to make words that mean roughly the same thing match: for example, if you search for "bananas" but you indexed "banana", it would still find it. Which steps actually run depends on the analyzer configured for the field; the default standard analyzer only tokenizes and lowercases, so stemming needs an analyzer such as english. It's worth noting that for this to work, analysis ALSO has to happen on the field when you index the data, which is what a text type field in Elasticsearch does.

A term query is an exact match without any analysis performed. This is more like what you would be used to in a relational database. Term queries are performed against keyword fields and other data type fields (numeric, boolean, date). If you need to match both ways, you can index the same field as both text and keyword (a multi-field).
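To make the difference concrete, here is a toy sketch in plain Python. It is not how Elasticsearch is implemented: the analyze function below is a made-up stand-in (lowercase, split on non-letters, strip a trailing "s"), and the helper names are hypothetical. It only shows why a match-style lookup finds "bananas" in an analyzed field while a term-style lookup needs the exact stored value.

```python
import re

# Toy stand-in for an analyzer: lowercase, split on non-letters, and strip a
# trailing "s". Real Elasticsearch analyzers are configurable and far more
# sophisticated; this is an illustration only.
def analyze(text):
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]

def build_index(docs, analyzed):
    """Map each indexed token to the set of document ids that contain it."""
    index = {}
    for doc_id, value in docs.items():
        # Keyword-like fields store the whole value untouched.
        tokens = analyze(value) if analyzed else [value]
        for token in tokens:
            index.setdefault(token, set()).add(doc_id)
    return index

def match(index, text):
    """Match-like lookup: analyze the search text, then look up each token."""
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return hits

def term(index, value):
    """Term-like lookup: look up the value exactly as given, no analysis."""
    return index.get(value, set())

docs = {1: "Banana bread", 2: "Bananas Foster"}
text_index = build_index(docs, analyzed=True)      # behaves like a text field
keyword_index = build_index(docs, analyzed=False)  # behaves like a keyword field

print(match(text_index, "bananas"))          # both documents
print(term(keyword_index, "bananas"))        # nothing: no stored value equals "bananas"
print(term(keyword_index, "Banana bread"))   # only document 1
```

Note the last trap: a term lookup against the analyzed index with "Bananas" also finds nothing, because the stored token is "banana". That is the usual reason a term query against a text field appears to return no results.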

Query vs Filter

A query in Elasticsearch is a series of search clauses that are scored and ranked against each other based upon their relevance. In other words: given the words you asked me to search for, which documents seem the most relevant?

A filter in Elasticsearch restricts the set of records against which a query runs and does not perform scoring. You can think of it as a first pass that determines which records to check before you do the more expensive computation of how relevant your search terms are to each document.

The other important difference, which you mentioned, is that filter results can be cached and reused, while scored queries are not. So if you have broad restrictions that limit the searchable set of documents, put them in a filter to take advantage of caching and the time saved by skipping scoring, and make the "human text" search portion a query. For example: filter to only the products in cookbooks, then query for titles containing the word bananas.

Performance Measurements

It can be difficult to measure query performance because there are a lot of moving parts in the mix. The best approach, if you have the time, is to index a representative (and reasonably large) amount of data on a single node and do your initial testing against that before scaling out. You might also want to look at Rally, Elasticsearch's tool for performance testing.

https://github.com/elastic/rally
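If you just want a quick comparison before setting up Rally, something like the sketch below may help. It relies on the "took" field (milliseconds) that Elasticsearch includes in every search response. run_search is a hypothetical callable you would write yourself to send the body to your cluster and return the parsed JSON response; the repeat and warmup counts are arbitrary.

```python
import statistics

def median_took_ms(run_search, body, repeats=20, warmup=3):
    """Run the same search several times and return the median `took` value.

    `run_search` is a hypothetical callable: it sends `body` to the cluster
    and returns the parsed JSON search response as a dict.
    """
    for _ in range(warmup):
        run_search(body)  # discard the first runs so caches can warm up
    samples = [run_search(body)["took"] for _ in range(repeats)]
    return statistics.median(samples)
```

Run it once per variant (filter + match, filter + term, must + match) against the same index and compare the medians. On a small local data set the numbers will likely be within noise of each other, which matches what you are seeing.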

Putting it all together

For your example above, since the field you are searching is a boolean, you would want a term query, not a match query. You can also put it in a filter clause, because there is no relevance scoring to do for a single boolean. If you wanted to combine it with other text searching, you might add a match clause in a query context to your JSON body.
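As a rough sketch of that combination, the helper below builds the request body as a Python dict. The in_stock field comes from your question; the "name" text field and the build_search_body helper are assumptions made up for illustration.

```python
import json

def build_search_body(in_stock, text=None, text_field="name"):
    """Build a bool query: exact boolean in filter context, free text in query context.

    `text_field` defaults to "name", a hypothetical text field that is not
    part of the original question.
    """
    # Filter context: yes/no restriction, no scoring.
    bool_query = {"filter": [{"term": {"in_stock": in_stock}}]}
    if text is not None:
        # Query context: analyzed and scored for relevance.
        bool_query["must"] = [{"match": {text_field: text}}]
    return {"query": {"bool": bool_query}}

# Print the JSON body for in-stock products matching "bananas".
print(json.dumps(build_search_body(True, text="bananas"), indent=2))
```

The printed JSON can be pasted under GET foo11/_search in the Kibana console. Without the text argument the body contains only the filter, which is the form I would use for your three examples.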

Ryan Widmaier answered Oct 21 '25 06:10