How can I get lemmas for Arabic words? I tried the ISRI Arabic Stemmer from NLTK but it returns roots of words:
from nltk.stem.isri import ISRIStemmer
st = ISRIStemmer()
# Python 3: print is a function, and str literals are already Unicode
print(st.stem('اعلاميون'))
It returns the root علم,
but I want the lemma اعلامي.
The state of the art is the Farasa lemmatizer.
According to its authors, Farasa outperforms the MADAMIRA lemmatizer: it gives a +7% relative gain in accuracy over MADAMIRA on the lemmatization task.
You can read more about the Farasa lemmatizer in the paper: https://arxiv.org/pdf/1710.06700.pdf
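A note on reading the "+7% relative gain" figure: a relative gain is measured against the baseline's accuracy, so it is not the same as a 7-point jump in absolute accuracy. The sketch below shows the arithmetic. The function name `relative_gain` and both accuracy values are made up for illustration; they are not numbers from the paper.

```python
def relative_gain(baseline_acc: float, new_acc: float) -> float:
    """Return the improvement of new_acc over baseline_acc as a fraction of the baseline."""
    if baseline_acc <= 0:
        raise ValueError("baseline accuracy must be positive")
    return (new_acc - baseline_acc) / baseline_acc


# Hypothetical accuracies, chosen only to illustrate the formula.
# They are NOT the figures reported for MADAMIRA or Farasa.
baseline = 0.80
improved = 0.856

absolute_points = (improved - baseline) * 100
relative_percent = relative_gain(baseline, improved) * 100

print(f"absolute gain: {absolute_points:.1f} points")
print(f"relative gain: {relative_percent:.1f}%")
```

With these placeholder values, a gain of under 6 absolute points already corresponds to a 7% relative gain, so check the paper's tables for the actual accuracies before comparing systems.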