
What lemmatizer can I use for Arabic text in Python?

How can I get lemmas for Arabic words? I tried the ISRI Arabic Stemmer from NLTK but it returns roots of words:

from nltk.stem.isri import ISRIStemmer

st = ISRIStemmer()
# ISRI is a root-extraction stemmer, so this prints the root rather than the lemma
print(st.stem(u'اعلاميون'))

It returns the root علم, but I want the lemma اعلامي.

msm asked Oct 21 '25 11:10


1 Answer

The current state of the art is the Farasa Lemmatizer.

According to its authors, the Farasa Lemmatizer is more accurate than the MADAMIRA lemmatizer, with a relative accuracy gain of about 7% on the lemmatization task.

You can read more about the Farasa Lemmatizer in this paper: https://arxiv.org/pdf/1710.06700.pdf
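To make the root vs. lemma distinction from the question concrete, here is a toy sketch. It is not Farasa's algorithm and not a usable lemmatizer: `toy_lemma` and `PLURAL_SUFFIXES` are hypothetical names made up for this illustration, and the suffix list covers only a few regular plural/dual endings. It only shows that a lemma keeps the word's derivational pattern, while a root stemmer such as ISRI strips it away.

```python
# Toy illustration only: NOT Farasa's algorithm and not a real lemmatizer.
# Hypothetical suffix list covering a few regular plural/dual endings.
PLURAL_SUFFIXES = ("ون", "ين", "ات", "ان")


def toy_lemma(word, min_stem_len=3):
    """Strip at most one regular plural/dual suffix and keep the rest of the word."""
    for suffix in PLURAL_SUFFIXES:
        # Only strip when enough letters remain to plausibly be a word.
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


print(toy_lemma("اعلاميون"))  # prints اعلامي
```

Suffix stripping like this breaks quickly on real text (broken plurals, clitics, words that merely end in one of these letter pairs, feminine plurals whose singular ends in ة), which is why a trained lemmatizer such as the one described in the paper is the practical choice.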

disooqi answered Oct 23 '25 00:10


