elasticsearch n-gram example clarification

With reference to the example at https://www.elastic.co/guide/en/elasticsearch/guide/current/ngrams-compound-words.html:

Searching for "Adler" returns results, as expected. According to the guide, a search for "Adler" becomes a query for the three terms adl, dle, and ler.

But why does a query for "Zdler" also return results, even though zdl is not one of the indexed terms?

GET /my_index/my_type/_search
{
    "query": {
        "match": {
            "text": {
                "query": "zdler"
            }
        }
    }
}

Applying a match query for "Adler" returns the record, as expected.

However, a match query for "Zdler" also returns the record (because dle and ler match). Even setting "minimum_should_match": "100%" still returns the record, which is not expected.
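
The overlap between the two sets of trigrams can be checked with a short Python sketch. This is not Elasticsearch code: the `trigrams` helper is a hypothetical stand-in for the `trigrams` analyzer (lowercase plus 3-character grams) and assumes a single-word input.

```python
def trigrams(word: str) -> list:
    """Lowercase the word and return its overlapping 3-character grams."""
    w = word.lower()
    return [w[i:i + 3] for i in range(len(w) - 2)]


indexed = set(trigrams("Adler"))      # terms stored for the document
query = trigrams("Zdler")             # terms produced at search time

print(trigrams("Adler"))              # ['adl', 'dle', 'ler']
print(query)                          # ['zdl', 'dle', 'ler']
print(sorted(indexed & set(query)))   # ['dle', 'ler']
```

Two of the three query terms are present in the index, which is enough for a match query that only needs some of its terms to hit.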

Applying a term query for "Adler" returns nothing, which is also not expected:

POST /my_index/my_type/_search
{
    "query": {
        "term": {
            "text": {
                "value": "Adler"
            }
        }
    }
}

How can I return the record only for a search on "Adler" and not for "Zdler"?

These are the index settings:

 "settings": {
  "index": {
    "number_of_shards": "5",
    "provided_name": "my_index",
    "creation_date": "1501069624443",
    "analysis": {
      "filter": {
        "trigrams_filter": {
          "type": "ngram",
          "min_gram": "3",
          "max_gram": "3"
        }
      },
      "analyzer": {
        "trigrams": {
          "filter": [
            "lowercase",
            "trigrams_filter"
          ],
          "type": "custom",
          "tokenizer": "standard"
        }
      }
    },
    "number_of_replicas": "1",
    "uuid": "Z5BXi_RjTACzTsR_-Nu9tw",
    "version": {
      "created": "5040099"
    }
  }
}

And these are the mappings:

{
  "my_index": {
    "mappings": {
      "my_type": {
        "properties": {
          "text": {
            "type": "text",
            "analyzer": "trigrams"
          }
        }
      }
    }
  }
}

Akshata asked Dec 03 '25 08:12

2 Answers

The solution is to apply the standard analyzer at search time. The query below returns the record, while a search for "zdler" returns no results.

GET /my_index_2/my_type/_search
{
    "query": {
        "match": {
            "text": {
                "query": "adler",
                "analyzer": "standard"
            }
        }
    }
}

Akshata answered Dec 06 '25 17:12


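A match query applies the field's analyzer to the input before running the query. For the input "zdler", this produces tokens that are then matched against the inverted index. That is not the case with a term query, which does not apply the field analyzer to the input value.

With the trigrams analyzer, the match query breaks "adler" into "adl", "dle", and "ler", which are then matched against the inverted index. Likewise, "zdler" becomes "zdl", "dle", and "ler", so two of its three terms match. A term query instead looks up the literal value, and no such term exists in an index that holds only lowercase trigrams.

Compare the following two queries: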

POST index5/_search
{
  "query": {
    "match": {
      "text": "zdler"
    }
  }
}


POST index5/_search
{
  "query": {
    "term": {
      "text": {
        "value": "zdler"
      }
    }
  }
}
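
What the two queries above do differently can be sketched in Python against a toy inverted index. This is only an illustration under simplifying assumptions: `trigrams`, `match_query`, and `term_query` are hypothetical helpers, the index holds one single-word document, and the match query is modeled as "any term matches", without scoring.

```python
def trigrams(text: str) -> list:
    """Stand-in for the trigrams analyzer: lowercase, then 3-character grams."""
    w = text.lower()
    return [w[i:i + 3] for i in range(len(w) - 2)]


# Toy inverted index: term -> ids of the documents that contain it.
inverted_index = {}
for term in trigrams("Adler"):
    inverted_index.setdefault(term, set()).add(1)


def match_query(text: str) -> set:
    """Analyze the input, then return documents containing any resulting term."""
    hits = set()
    for term in trigrams(text):
        hits |= inverted_index.get(term, set())
    return hits


def term_query(value: str) -> set:
    """Look the value up verbatim, with no analysis applied."""
    return inverted_index.get(value, set())


print(match_query("zdler"))  # {1}
print(term_query("zdler"))   # set()
print(term_query("Adler"))   # set()
print(term_query("adl"))     # {1}
```

In this model the term query only finds the document when the value is itself one of the stored trigrams, which is why a term query for "Adler" comes back empty.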

user3775217 answered Dec 06 '25 16:12