
ElasticSearch: Highlights every word in phrase query

How can I get Elasticsearch to highlight only the words that caused the document to be returned?

I have the following index mapping:

{
  "mappings": {
    "document": {
      "properties": {
        "content": {
          "type": "string",
          "fields": {
            "english": {
              "type": "string",
              "analyzer": "english"
            }
          }
        }
      }
    }
  }
}

Let's say I have indexed the following text:

Nuclear power is the use of nuclear reactions that release nuclear energy[5] to generate heat, which most frequently is then used in steam turbines to produce electricity in a nuclear power station. The term includes nuclear fission, nuclear decay and nuclear fusion. Presently, the nuclear fission of elements in the actinide series of the periodic table produce the vast majority of nuclear energy in the direct service of humankind, with nuclear decay processes, primarily in the form of geothermal energy, and radioisotope thermoelectric generators, in niche uses making up the rest.

Then I search for "nuclear elements"~2

I only want "nuclear fission of elements", or parts of it, to be highlighted, but instead every single occurrence of "nuclear" is highlighted.

This is my query if it helps:

{
  "fields": [
  ],
  "query": {
    "query_string": {
      "query": "\"nuclear elements\"~2",
      "fields": [
        "content.english"
      ]
    }
  },
  "highlight": {
    "pre_tags": [
      "<em class='h'>"
    ],
    "post_tags": [
      "</em>"
    ],
    "fragment_size": 500,
    "number_of_fragments": 20,
    "fields": {
      "content.english": {}
    }
  }
} 
Alex Lyman asked Aug 31 '25 03:08



1 Answer

This is a highlighting bug in ES 2.1. It was introduced by an earlier change (#13239, mentioned in the quote below) and has since been fixed by a pull request.

According to an ES developer:

This is a bug that I introduced in #13239 while thinking that the differences were due to changes in Lucene: extractUnknownQuery is also called when span extraction already succeeded, so we should only fall back to Weight.extractTerms if no spans have been extracted yet.

Highlighting works as expected in older versions up to 2.0, and it should work as expected again in future versions that include the fix.
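
To make the expected behaviour concrete, here is a rough Python sketch of which token positions a phrase query with slop should mark for highlighting. It is not how Elasticsearch or Lucene implement this. The helper names (tokenize, normalize, proximity_highlights) are hypothetical, the "analyzer" is a crude stand-in for the english analyzer (lowercasing plus stripping a trailing "s"), and only in-order matches are considered.

```python
import re


def tokenize(text):
    # Simplified tokenizer (assumption): letters only, lowercased.
    # Stopwords such as "of" are kept, so they still occupy a position.
    return [w.lower() for w in re.findall(r"[A-Za-z]+", text)]


def normalize(token):
    # Crude stand-in for stemming (assumption): drop a trailing "s".
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def proximity_highlights(text, first, second, slop):
    """Return the token positions that take part in an in-order match of
    `first` followed by `second` with at most `slop` positions in between."""
    tokens = [normalize(t) for t in tokenize(text)]
    first = normalize(first.lower())
    second = normalize(second.lower())
    hits = set()
    for i, tok in enumerate(tokens):
        if tok != first:
            continue
        # The second term may sit anywhere from i + 1 to i + 1 + slop.
        for j in range(i + 1, min(i + 2 + slop, len(tokens))):
            if tokens[j] == second:
                hits.update((i, j))
    return hits


sample = "nuclear decay and nuclear fission of elements"
print(sorted(proximity_highlights(sample, "nuclear", "elements", 2)))  # [3, 6]
```

In the sample, only the second "nuclear" (position 3) and "elements" (position 6) are reported. The first "nuclear" is too far from "elements" to match, so it should not be highlighted. The buggy 2.1 behaviour corresponds to highlighting position 0 as well.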

ChintanShah25 answered Sep 02 '25 19:09
