I use NLTK 3.0.4 and noticed that the lemmas for the words boss and bosses are different.
from nltk.stem.wordnet import WordNetLemmatizer
wnl = WordNetLemmatizer()
print(wnl.lemmatize("boss", "n"))
# prints "bos"
print(wnl.lemmatize("bosses", "n"))
# prints "boss"
From my point of view this is weird behavior, especially since boss is a known word in WordNet and there is a rule to keep ss.
Does anyone have an explanation, or is this just a bug? How should I deal with it?
Looking at the function (_morphy()) that generates the possible analyses for a given word, I found that there is no rule included to keep ss. Bos is also a base form in WordNet. Substitution rules:
MORPHOLOGICAL_SUBSTITUTIONS = {
    NOUN: [('s', ''), ('ses', 's'), ('ves', 'f'), ('xes', 'x'),
           ('zes', 'z'), ('ches', 'ch'), ('shes', 'sh'),
           ('men', 'man'), ('ies', 'y')],
    VERB: [('s', ''), ('ies', 'y'), ('es', 'e'), ('es', ''),
           ('ed', 'e'), ('ed', ''), ('ing', 'e'), ('ing', '')],
    ADJ: [('er', ''), ('est', ''), ('er', 'e'), ('est', 'e')],
    ADV: []}
Calling wnl.lemmatize("boss", "n"):
Since a suitable base form (Bos) can be found by applying the substitution rules, it is returned. If Bos were not included in WordNet, the lemma for boss would be boss, since no shorter form could be found.
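To make the mechanism concrete, here is a minimal sketch that imitates the behavior described above without using NLTK. It is not NLTK's actual code. The noun rules are copied from the listing, but TOY_NOUN_LEMMAS is a hypothetical stand-in for WordNet's noun lemmas, and candidate_forms, toy_lemmatize and guarded_lemmatize are names made up for illustration. It also assumes the lemmatizer picks the shortest candidate that is a known lemma, which is what the observed output suggests.

```python
# Noun substitution rules, copied from the listing above.
NOUN_RULES = [('s', ''), ('ses', 's'), ('ves', 'f'), ('xes', 'x'),
              ('zes', 'z'), ('ches', 'ch'), ('shes', 'sh'),
              ('men', 'man'), ('ies', 'y')]

# Hypothetical stand-in for WordNet's noun lemmas (an assumption for
# illustration only; real WordNet lookups are case-insensitive, hence "bos").
TOY_NOUN_LEMMAS = {"boss", "bos", "dog"}


def candidate_forms(word):
    """Return the word itself plus every form produced by one matching rule."""
    rewritten = [word[:-len(old)] + new
                 for old, new in NOUN_RULES
                 if word.endswith(old)]
    return [word] + rewritten


def toy_lemmatize(word, lexicon=TOY_NOUN_LEMMAS):
    """Pick the shortest candidate found in the lexicon, else the word itself."""
    known = [form for form in candidate_forms(word) if form in lexicon]
    return min(known, key=len) if known else word


def guarded_lemmatize(word, lexicon=TOY_NOUN_LEMMAS):
    """Possible workaround: leave a word alone if it is already a known lemma."""
    if word in lexicon:
        return word
    return toy_lemmatize(word, lexicon)


print(toy_lemmatize("boss"))      # "bos": the ('s', '') rule yields a known, shorter lemma
print(toy_lemmatize("bosses"))    # "boss": the ('ses', 's') rule yields the only known lemma
print(guarded_lemmatize("boss"))  # "boss": the guard keeps the original word
```

With real NLTK, a similar guard could check whether the word already has noun synsets in WordNet before calling the lemmatizer. Whether that trade-off is acceptable depends on the data, since plural forms that happen to be lemmas themselves would also be left unchanged.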