I'm trying to get user-submitted queries for "Joe Frankles", "Joe Frankle", and "Joe Frankle's" to match the original text "Joe Frankle's". Right now we're indexing the field that contains this text with (Tire / Ruby format):
{ :type => 'string', :analyzer => 'snowball' }
and searching with:
query { string downcased_query, :default_operator => 'AND' }
I tried this unsuccessfully:
create :settings => {
  :analysis => {
    :char_filter => {
      :remove_accents => {
        :type => "mapping",
        :mappings => ["`=>", "'=>"]
      }
    },
    :analyzer => {
      :myanalyzer => {
        :type => 'custom',
        :tokenizer => 'standard',
        :char_filter => ['remove_accents'],
        :filter => ['standard', 'lowercase', 'stop', 'snowball', 'ngram']
      }
    },
    :default => {
      :type => 'myanalyzer'
    }
  }
},
There are two official ways of handling possessive apostrophes:
1) Use the "possessive_english" stemmer as described in the ES docs: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-stemmer-tokenfilter.html
Example:
{
  "index" : {
    "analysis" : {
      "analyzer" : {
        "my_analyzer" : {
          "tokenizer" : "standard",
          "filter" : ["standard", "lowercase", "my_stemmer"]
        }
      },
      "filter" : {
        "my_stemmer" : {
          "type" : "stemmer",
          "name" : "possessive_english"
        }
      }
    }
  }
}
You can use other stemmers or snowball in addition to the "possessive_english" filter if you like. This should work, but the code is untested.
2) Use the "word_delimiter" filter:
{
  "index" : {
    "analysis" : {
      "analyzer" : {
        "my_analyzer" : {
          "tokenizer" : "standard",
          "filter" : ["standard", "lowercase", "my_word_delimiter"]
        }
      },
      "filter" : {
        "my_word_delimiter" : {
          "type" : "word_delimiter",
          "preserve_original": "true"
        }
      }
    }
  }
}
Works for me :-) ES docs: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-word-delimiter-tokenfilter.html
Both will cut off "'s".
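For reference, option 1 could be written in the Tire / Ruby hash format used in the question roughly as follows. This is an untested sketch, not a verified Tire configuration: the constant name POSSESSIVE_SETTINGS is made up for illustration, the analyzer and filter names are the ones from the JSON above, and adding 'snowball' after the possessive stemmer is an assumption based on the remark that other stemmers can be combined with it.

```ruby
require 'json'

# Hypothetical constant name; 'my_analyzer' and 'my_stemmer' come from the JSON example above.
POSSESSIVE_SETTINGS = {
  :analysis => {
    :analyzer => {
      :my_analyzer => {
        :type      => 'custom',
        :tokenizer => 'standard',
        # The possessive stemmer runs before snowball, so the "'s" is removed
        # before the general stemmer sees the token (ordering is an assumption).
        :filter    => ['standard', 'lowercase', 'my_stemmer', 'snowball']
      }
    },
    :filter => {
      :my_stemmer => {
        :type => 'stemmer',
        :name => 'possessive_english'
      }
    }
  }
}

# Print the JSON that would be sent as index settings, to compare with the example above.
puts JSON.pretty_generate(POSSESSIVE_SETTINGS)
```

Assuming the same Tire calls as in the question, the hash would be passed as create :settings => POSSESSIVE_SETTINGS, and the field mapped with { :type => 'string', :analyzer => 'my_analyzer' } instead of 'snowball'.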