Elasticsearch partitioned indices skipped versus match no docs query

Question

We have indices that are partitioned by year, e.g.:

items-2019
items-2020

Consider the following data:

POST items-2019/_doc
{
  "@timestamp": "2019-01-01"
}

POST items-2020/_doc
{
  "@timestamp": "2020-01-01"
}


POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "items-*",
        "alias": "items"
      }
    }
  ]
}

When I query the data and explicitly sort the results, the items-2020 shard is skipped:

GET items/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "lt": "2020-01-01"
      }
    }
  },
  "sort": {
    "@timestamp": "desc"
  }
}

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 1,    <--- skipped
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "items-2019",
        "_type" : "_doc",
        "_id" : "BTdSb3UBRFH0Yqe1vm_W",
        "_score" : null,
        "_source" : {
          "@timestamp" : "2019-01-01"
        },
        "sort" : [
          1546300800000
        ]
      }
    ]
  }
}

However, when I don't sort the results explicitly, no shard is skipped; instead, ES issues a MatchNoDocsQuery against the items-2020 shard:

GET items/_search
{
  "profile": "true",
  "query": {
    "range": {
      "@timestamp": {
        "lt": "2020-01-01"
      }
    }
  }
}

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,    <--- nothing skipped
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "items-2019",
        "_type" : "_doc",
        "_id" : "BTdSb3UBRFH0Yqe1vm_W",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2019-01-01"
        }
      }
    ]
  },
  "profile" : {
    "shards" : [
      {
        "id" : "[Axyv60mYQEGAREa2TwbgMQ][items-2019][0]",
        "searches" : [
          {
            "query" : [
              {
                "type" : "ConstantScoreQuery",
                "description" : "ConstantScore(DocValuesFieldExistsQuery [field=@timestamp])",
                "time_in_nanos" : 69525,
                "breakdown" : {
                  "set_min_competitive_score_count" : 0,
                  "match_count" : 0,
                  "shallow_advance_count" : 0,
                  "set_min_competitive_score" : 0,
                  "next_doc" : 3766,
                  "match" : 0,
                  "next_doc_count" : 1,
                  "score_count" : 1,
                  "compute_max_score_count" : 0,
                  "compute_max_score" : 0,
                  "advance" : 4123,
                  "advance_count" : 1,
                  "score" : 1123,
                  "build_scorer_count" : 2,
                  "create_weight" : 29745,
                  "shallow_advance" : 0,
                  "create_weight_count" : 1,
                  "build_scorer" : 30768
                },
                "children" : [
                  {
                    "type" : "DocValuesFieldExistsQuery",
                    "description" : "DocValuesFieldExistsQuery [field=@timestamp]",
                    "time_in_nanos" : 18317,
                    "breakdown" : {
                      "set_min_competitive_score_count" : 0,
                      "match_count" : 0,
                      "shallow_advance_count" : 0,
                      "set_min_competitive_score" : 0,
                      "next_doc" : 1474,
                      "match" : 0,
                      "next_doc_count" : 1,
                      "score_count" : 0,
                      "compute_max_score_count" : 0,
                      "compute_max_score" : 0,
                      "advance" : 1541,
                      "advance_count" : 1,
                      "score" : 0,
                      "build_scorer_count" : 2,
                      "create_weight" : 1184,
                      "shallow_advance" : 0,
                      "create_weight_count" : 1,
                      "build_scorer" : 14118
                    }
                  }
                ]
              }
            ],
            "rewrite_time" : 4660,
            "collector" : [
              {
                "name" : "SimpleTopScoreDocCollector",
                "reason" : "search_top_hits",
                "time_in_nanos" : 22374
              }
            ]
          }
        ],
        "aggregations" : [ ]
      },
      {
        "id" : "[Axyv60mYQEGAREa2TwbgMQ][items-2020][0]",
        "searches" : [
          {
            "query" : [
              {
                "type" : "MatchNoDocsQuery",
                "description" : """MatchNoDocsQuery("User requested "match_none" query.")""", <-- here
                "time_in_nanos" : 4166,
                "breakdown" : {
                  "set_min_competitive_score_count" : 0,
                  "match_count" : 0,
                  "shallow_advance_count" : 0,
                  "set_min_competitive_score" : 0,
                  "next_doc" : 0,
                  "match" : 0,
                  "next_doc_count" : 0,
                  "score_count" : 0,
                  "compute_max_score_count" : 0,
                  "compute_max_score" : 0,
                  "advance" : 0,
                  "advance_count" : 0,
                  "score" : 0,
                  "build_scorer_count" : 1,
                  "create_weight" : 1791,
                  "shallow_advance" : 0,
                  "create_weight_count" : 1,
                  "build_scorer" : 2375
                }
              }
            ],
            "rewrite_time" : 4353,
            "collector" : [
              {
                "name" : "SimpleTopScoreDocCollector",
                "reason" : "search_top_hits",
                "time_in_nanos" : 12887
              }
            ]
          }
        ],
        "aggregations" : [ ]
      }
    ]
  }
}

So there are a couple of questions here:

Does skipping truly skip shards?
How are skipped shards and MatchNoDocsQuery different?
What's the cost of MatchNoDocsQuery?
How does sorting allow shards to be skipped?
If we sort results, do we really completely skip shards and not even touch them during search?

Val · Accepted Answer

That's a great deal of questions bundled into one, but here's my attempt:

Does skipping truly skip shards?

How does sorting allow shards to be skipped?

If we sort results, do we really completely skip shards and not even touch them during search?

Yes, ES tries to be smart enough to figure out which shards to hit before actually sending the query to those shards. The _search_shards API helps here, but it is not the only mechanism, as can be seen from the explanation in issue #51852.

If you search the issues for the keywords can_match, skip and shard, you'll find plenty of other optimizations implemented all over the place that aim at making the ES execution plan smarter and faster.

If you want to see how this is coded, you can start in the SearchService.canMatch() method. That's where the service can decide whether the query can be rewritten to MatchNoDocsQuery. If you add a suggest or global aggregation (which must visit all documents no matter what), you'll see that shards are not skipped any more, even with the sort present.
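To make the pre-filter idea concrete, here is a minimal Python sketch of the decision. It is a toy model, not the actual SearchService.canMatch() code. It assumes each shard knows the smallest and largest @timestamp it holds; the ShardStats class and the can_match_lt function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ShardStats:
    """Hypothetical per-shard summary: smallest and largest @timestamp indexed."""
    index: str
    min_ts: date
    max_ts: date


def can_match_lt(shard: ShardStats, upper_bound: date) -> bool:
    """A range query `lt upper_bound` can only match if the shard's
    smallest value lies strictly below the bound."""
    return shard.min_ts < upper_bound


# The two documents from the question, one per yearly index.
shards = [
    ShardStats("items-2019", date(2019, 1, 1), date(2019, 1, 1)),
    ShardStats("items-2020", date(2020, 1, 1), date(2020, 1, 1)),
]

bound = date(2020, 1, 1)  # "lt": "2020-01-01"
queried = [s.index for s in shards if can_match_lt(s, bound)]
skipped = [s.index for s in shards if not can_match_lt(s, bound)]

print("queried:", queried)
print("skipped:", skipped)
```

In this model the items-2020 shard is never queried, which corresponds to the "skipped" : 1 counter in the sorted response.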

What's the cost of MatchNoDocsQuery?

I wouldn't worry about it, as it's not only negligible, but out of your hands.

How does sorting allow shards to be skipped?

As explained in issue #51852, which I linked above: "This change will rewrite the shard queries to match none if the bottom sort value computed in prior shards is better than all values in the shard." In other words, ES is smart enough to know which shards can contain competitive hits, depending on the sort value. In your case, since the sort on the timestamp rules out all values from 2020, ES knows that the shard(s) of the 2020 index can be excluded since none of their documents will match.
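The quoted sentence can be illustrated with a small simulation. This is only a sketch of the idea under simplifying assumptions (shards are visited one after another, each shard is a plain list of epoch-millisecond sort values, and the sort is descending); search_desc is a hypothetical helper, not an Elasticsearch function.

```python
import heapq
from typing import Dict, List, Tuple


def search_desc(shards: Dict[str, List[int]], size: int) -> Tuple[List[int], List[str]]:
    """Visit shards in order for a descending sort that keeps `size` hits.

    Returns the top hits and the names of the shards whose query could
    have been rewritten to match none."""
    top: List[int] = []        # min-heap of the best `size` values seen so far
    rewritten: List[str] = []
    for name, values in shards.items():
        # Once the heap is full, top[0] is the bottom sort value. If it beats
        # every value in this shard, the shard cannot contribute a hit.
        if len(top) == size and all(v < top[0] for v in values):
            rewritten.append(name)
            continue
        for v in values:
            if len(top) < size:
                heapq.heappush(top, v)
            elif v > top[0]:
                heapq.heapreplace(top, v)
    return sorted(top, reverse=True), rewritten


# Sort values are epoch milliseconds, as in the "sort" array of the response.
shards = {
    "items-2020": [1577836800000, 1580515200000],
    "items-2019": [1546300800000],
}
hits, rewritten = search_desc(shards, size=2)
print(hits)
print(rewritten)
```

Here the two hits from items-2020 already fill the top 2, so the single older value in items-2019 cannot compete and that shard's query would be rewritten to match none.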

Another possibility is to leverage index sorting so that documents are sorted at indexing time. Documents are sorted within each segment of the index, and every time segments are merged, the new merged segment needs to be sorted again, so this can have performance implications. Test before use!
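As a rough sketch, index sorting is configured through the index.sort.field and index.sort.order settings, and it has to be defined when the index is created. The snippet below only builds and prints such a request body in Python; the index name items-2021 is a hypothetical example, and sending the request to a cluster is left out.

```python
import json

# Body for a create-index request that sorts documents by @timestamp,
# newest first, inside each segment. The sort field needs a mapping
# (here a date field) in the same request.
create_index_body = {
    "settings": {
        "index": {
            "sort.field": "@timestamp",
            "sort.order": "desc",
        }
    },
    "mappings": {
        "properties": {
            "@timestamp": {"type": "date"}
        }
    },
}

# Printed in the same console style as the requests in the question.
print("PUT items-2021")  # hypothetical index name
print(json.dumps(create_index_body, indent=2))
```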

Tags:

elasticsearch

Evaldas Buinauskas

1 Answers

Val
