How can I get Elasticsearch to highlight only the words that caused the document to be returned?
I have the following index mapping:
{
  "mappings": {
    "document": {
      "properties": {
        "content": {
          "type": "string",
          "fields": {
            "english": {
              "type": "string",
              "analyzer": "english"
            }
          }
        }
      }
    }
  }
}
Let's say I have indexed:
Nuclear power is the use of nuclear reactions that release nuclear energy[5] to generate heat, which most frequently is then used in steam turbines to produce electricity in a nuclear power station. The term includes nuclear fission, nuclear decay and nuclear fusion. Presently, the nuclear fission of elements in the actinide series of the periodic table produce the vast majority of nuclear energy in the direct service of humankind, with nuclear decay processes, primarily in the form of geothermal energy, and radioisotope thermoelectric generators, in niche uses making up the rest.
And search for "nuclear elements"~2
I want only "nuclear fission of elements", or parts of that phrase, to be highlighted, but every single occurrence of "nuclear" is highlighted instead.
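To make the expected behaviour concrete, here is a minimal Python sketch of position-aware highlighting. It is only an illustration under simplifying assumptions: the function names `proximity_pairs` and `highlight` are hypothetical, matching is in-order only, and there is no stemming or stop-word removal, so it does not reproduce what the `english` analyzer or Lucene's slop calculation actually do.

```python
import re


def proximity_pairs(tokens, first, second, slop):
    """Return (i, j) index pairs where `first` is followed by `second`
    with at most `slop` other tokens in between (in-order only)."""
    pairs = []
    for i, token in enumerate(tokens):
        if token != first:
            continue
        # Furthest index that still keeps the gap within `slop` tokens.
        last = min(i + slop + 1, len(tokens) - 1)
        for j in range(i + 1, last + 1):
            if tokens[j] == second:
                pairs.append((i, j))
    return pairs


def highlight(text, first, second, slop):
    """Wrap only the tokens that take part in a proximity match."""
    # Naive tokenizer (assumption): lowercase, letters only.
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = {k for pair in proximity_pairs(tokens, first, second, slop) for k in pair}
    return " ".join(f"<em>{t}</em>" if k in hits else t for k, t in enumerate(tokens))


sample = "The nuclear fission of elements and nuclear decay"
print(highlight(sample, "nuclear", "elements", 2))
# prints: the <em>nuclear</em> fission of <em>elements</em> and nuclear decay
```

The second "nuclear" is left alone because no "elements" follows it within the allowed distance, which is the result I expect from the highlighter.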
This is my query if it helps:
{
  "fields": [],
  "query": {
    "query_string": {
      "query": "\"nuclear elements\"~2",
      "fields": [
        "content.english"
      ]
    }
  },
  "highlight": {
    "pre_tags": [
      "<em class='h'>"
    ],
    "post_tags": [
      "</em>"
    ],
    "fragment_size": 500,
    "number_of_fragments": 20,
    "fields": {
      "content.english": {}
    }
  }
}
This is a highlighting bug in ES 2.1, which was caused by this change. It has been fixed by this pull request.
According to an ES developer:
This is a bug that I introduced in #13239 while thinking that the differences were due to changes in Lucene: extractUnknownQuery is also called when span extraction already succeeded, so we should only fall back to Weight.extractTerms if no spans have been extracted yet.
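The logic described in that quote can be modelled roughly as follows. This is a toy Python model, not the actual Lucene or Elasticsearch code: the function `terms_to_highlight`, its parameters, and the hard-coded token indices are all hypothetical and only illustrate why running the fallback after a successful span extraction highlights every occurrence of a term.

```python
def terms_to_highlight(span_hits, fallback_hits, buggy=False):
    """Pick the token indices to highlight.

    span_hits:     indices found by position-aware (span) extraction
    fallback_hits: indices of every query term, ignoring positions
    buggy:         if True, mimic the described bug (hypothetical flag)
    """
    if buggy:
        # Fallback is applied even though span extraction succeeded,
        # so position-unaware hits leak into the result.
        return sorted(set(span_hits) | set(fallback_hits))
    # Described fix: use the fallback only when no spans were extracted.
    return sorted(span_hits) if span_hits else sorted(fallback_hits)


# Tokens: the(0) nuclear(1) fission(2) of(3) elements(4) and(5) nuclear(6) decay(7)
span_hits = [1, 4]         # "nuclear ... elements" within the slop
fallback_hits = [1, 4, 6]  # every "nuclear" or "elements"

print(terms_to_highlight(span_hits, fallback_hits, buggy=True))   # [1, 4, 6]
print(terms_to_highlight(span_hits, fallback_hits, buggy=False))  # [1, 4]
```

With the bug, the stray "nuclear" at index 6 is highlighted as well; with the fix, only the tokens that belong to the proximity match are.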
Highlighting works as expected in older versions up to 2.0, and it should work as expected again in future versions that include the fix.