
Lucene analyzer for first name

Is there a Lucene analyzer out there that tokenizes name parts together with their short-name equivalents (e.g. Mike and Michael, Rich and Richard, Suzie and Susan)?

Fuzzy matching on Levenshtein distance is one solution I know of, and some implementers seem to pair fuzzy matching with the Soundex algorithm. Surely somebody has taken a swipe at just plain listing all of these short names somewhere?

EDIT: The toughest part of this question is where to get the synonym data.

Jonathan Schneider asked Dec 11 '25 09:12


1 Answer

I am not aware of any specific nickname filter out there.

A SynonymFilter would make one reasonably easy to build, though, if you had a data source for it. This appears to be a good source of nickname data:

https://code.google.com/p/nickname-and-diminutive-names-lookup/

You would need to generate the SynonymMap to pass into the SynonymFilter constructor, which should look something like this (I think):

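import org.apache.lucene.analysis.synonym.SynonymMap;
import org.apache.lucene.util.CharsRef;

SynonymMap.Builder builder = new SynonymMap.Builder(true); // true: drop duplicate rules
// add(input, output, includeOrig): with false, the nickname token is replaced by the full name
builder.add(new CharsRef("Mike"), new CharsRef("Michael"), false);
builder.add(new CharsRef("Rich"), new CharsRef("Richard"), false);
builder.add(new CharsRef("Suzie"), new CharsRef("Susan"), false);
SynonymMap map = builder.build(); // build() can throw IOException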
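
To go from a nickname data file to the builder.add calls above, something along these lines might work. This is only a rough sketch using the plain Java standard library: NicknameTable and parse are hypothetical names, not part of Lucene or of the linked project, and the line format (full name first, then its nicknames, comma-separated) is an assumption that should be checked against the actual data file.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Lucene or of the linked nickname project.
final class NicknameTable {

    // Assumed line format: full name first, then its nicknames,
    // comma-separated, e.g. "michael,mike,mickey". Verify against the real file.
    static Map<String, Set<String>> parse(List<String> lines) {
        Map<String, Set<String>> nicknameToFull = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().toLowerCase(Locale.ROOT).split("\\s*,\\s*");
            if (parts.length < 2 || parts[0].isEmpty()) {
                continue; // skip blank lines and names that list no nicknames
            }
            String fullName = parts[0];
            for (int i = 1; i < parts.length; i++) {
                if (parts[i].isEmpty()) {
                    continue; // tolerate stray double commas
                }
                // One nickname can map to several full names, so keep a set.
                nicknameToFull
                    .computeIfAbsent(parts[i], k -> new LinkedHashSet<>())
                    .add(fullName);
            }
        }
        return nicknameToFull;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> table = parse(List.of(
            "michael,mike,mickey",
            "richard,rich,rick",
            "susan,sue,suzie"));
        // Each (nickname, fullName) pair would become one call like
        // builder.add(new CharsRef(nickname), new CharsRef(fullName), false).
        table.forEach((nick, fulls) -> System.out.println(nick + " -> " + fulls));
    }
}
```

The sketch lowercases everything, so it assumes the analyzer lowercases tokens before the synonym filter sees them. Keeping a set per nickname matters because some short names are ambiguous and would need one synonym rule per full name.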
femtoRgon answered Dec 14 '25 06:12


