Ignoring Apostrophes (Possessive) In ElasticSearch

Question

I'm trying to get the user-submitted queries "Joe Frankles", "Joe Frankle" and "Joe Frankle's" to match the original text "Joe Frankle's". Right now we're indexing the field that holds this text with (Tire / Ruby format):

{ :type => 'string', :analyzer => 'snowball' }

and searching with:

query { string downcased_query, :default_operator => 'AND' }

I tried this unsuccessfully:

          create :settings => {
              :analysis => {
                :char_filter => {
                   :remove_accents => {
                     :type => "mapping",
                     :mappings => ["`=>", "'=>"]
                   }
                },
                :analyzer => {
                  :myanalyzer => {
                    :type => 'custom',
                    :tokenizer => 'standard',
                    :char_filter => ['remove_accents'],
                    :filter => ['standard', 'lowercase', 'stop', 'snowball', 'ngram']
                  }
                },
                :default => {
                  :type => 'myanalyzer'
                }
            }
          },

Simon Steinberger · Accepted Answer

There are two official ways of handling possessive apostrophes:

1) Use the "possessive_english" stemmer as described in the ES docs: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-stemmer-tokenfilter.html

Example:

{
  "index" : {
    "analysis" : {
        "analyzer" : {
            "my_analyzer" : {
                "tokenizer" : "standard",
                "filter" : ["standard", "lowercase", "my_stemmer"]
            }
        },
        "filter" : {
            "my_stemmer" : {
                "type" : "stemmer",
                "name" : "possessive_english"
            }
        }
    }
  }
}

You can add other stemmers or snowball after the "possessive_english" filter if you like. This should work, but the code is untested.
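Since the question uses Tire's Ruby hash format, here is a minimal sketch of the first option written that way, with snowball appended after the possessive stemmer. It has not been run against a live cluster. The constant name POSSESSIVE_SETTINGS is made up for this example, and my_analyzer / my_stemmer are just the labels from the JSON above.

```ruby
require 'json'

# Index settings as a plain Ruby hash, mirroring the JSON example and
# appending the built-in "snowball" token filter after the possessive stemmer.
# POSSESSIVE_SETTINGS is a hypothetical name chosen for this sketch.
POSSESSIVE_SETTINGS = {
  :analysis => {
    :analyzer => {
      :my_analyzer => {
        :type      => 'custom',
        :tokenizer => 'standard',
        :filter    => ['standard', 'lowercase', 'my_stemmer', 'snowball']
      }
    },
    :filter => {
      :my_stemmer => {
        :type => 'stemmer',
        :name => 'possessive_english'
      }
    }
  }
}

# Print the JSON that these settings correspond to, for comparison.
puts JSON.pretty_generate(POSSESSIVE_SETTINGS)
```

The hash could then be passed as the :settings value of create, as in the question. The field mapping would presumably also need :analyzer => 'my_analyzer' in place of 'snowball' for the new analyzer to be used.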

2) Use the "word_delimiter" filter:

{
  "index" : {
    "analysis" : {
        "analyzer" : {
            "my_analyzer" : {
                "tokenizer" : "standard",
                "filter" : ["standard", "lowercase", "my_word_delimiter"]
            }
        },
        "filter" : {
            "my_word_delimiter" : {
                "type" : "word_delimiter",
                "preserve_original": "true"
            }
        }
    }
  }
}

Works for me :-) ES docs: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-word-delimiter-tokenfilter.html

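Both will cut off the trailing "'s". With "word_delimiter" this comes from its stem_english_possessive option, which defaults to true.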
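To see why stripping the possessive alone does not cover "Joe Frankles", here is a toy approximation of the analysis chain in plain Ruby. It is not Elasticsearch's implementation: the normalize helper is hypothetical, it only handles the ASCII apostrophe, and the plural step is a crude stand-in for a real stemmer such as snowball.

```ruby
# Toy approximation of the analysis chain, NOT Elasticsearch's implementation:
# lowercase, split on whitespace, strip a trailing possessive "'s",
# then drop one trailing plural "s" as a crude stand-in for a real stemmer.
def normalize(text)
  text.downcase.split.map do |token|
    token = token.sub(/'s\z/, '')            # "frankle's" -> "frankle"
    if token.length > 3 && token.end_with?('s') && !token.end_with?('ss')
      token = token.chop                     # "frankles"  -> "frankle"
    end
    token
  end
end

# The three queries from the question.
["Joe Frankles", "Joe Frankle", "Joe Frankle's"].each do |query|
  puts "#{query.inspect} -> #{normalize(query).inspect}"
end
```

All three queries come out as ["joe", "frankle"] here. Without the plural step, "Joe Frankles" would stay ["joe", "frankles"], which is why combining the possessive handling with a stemmer is likely needed for that query.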

Tags: elasticsearch, tire

LMH
