
Removing Arabic Diacritics using Python

Tags: python, arabic

I want to filter my text by removing Arabic diacritics using Python.

For example:

Before filtering: اللَّهمَّ اغْفِرْ لنَا ولوالدِينَا
After filtering: اللهم اغفر لنا ولوالدينا

I have found that this can be done using CAMeL Tools but I am not sure how.

Asked by user15295803, Apr 18 '26 08:04


2 Answers

You can use the PyArabic library like this:

import pyarabic.araby as araby

before_filter="اللَّهمَّ اغْفِرْ لنَا ولوالدِينَا"
after_filter = araby.strip_diacritics(before_filter)

print(after_filter)
# will print : اللهم اغفر لنا ولوالدينا

You can try different strip filters:

araby.strip_harakat(before_filter)  # 'اللّهمّ اغفر لنا ولوالدينا'
araby.strip_lastharaka(before_filter)  # 'اللَّهمَّ اغْفِرْ لنَا ولوالدِينَا'
araby.strip_shadda(before_filter)  # 'اللَهمَ اغْفِرْ لنَا ولوالدِينَا'
araby.strip_small(before_filter)  # 'اللَّهمَّ اغْفِرْ لنَا ولوالدِينَا'
araby.strip_tashkeel(before_filter)  # 'اللهم اغفر لنا ولوالدينا'
araby.strip_tatweel(before_filter)  # 'اللَّهمَّ اغْفِرْ لنَا ولوالدِينَا'
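
To see why the filters listed above give different results, it may help to look at which marks each one targets. The following is a rough standard-library sketch, not PyArabic's actual implementation: the code point groups and the helper names (`strip_chars`, `remove_harakat`, `remove_all_marks`) are assumptions chosen for illustration.

```python
# Approximate code point groups (assumption: not PyArabic's own tables).
TANWIN = "\u064b\u064c\u064d"        # fathatan, dammatan, kasratan
SHORT_VOWELS = "\u064e\u064f\u0650"  # fatha, damma, kasra
SUKUN = "\u0652"
SHADDA = "\u0651"
TATWEEL = "\u0640"                   # elongation character, not a diacritic


def strip_chars(text, chars):
    """Delete every character listed in `chars` from `text`."""
    return text.translate({ord(c): None for c in chars})


def remove_harakat(text):
    """Drop vowel marks but keep the shadda (similar in spirit to strip_harakat)."""
    return strip_chars(text, TANWIN + SHORT_VOWELS + SUKUN)


def remove_all_marks(text):
    """Drop vowel marks and the shadda (similar in spirit to strip_tashkeel)."""
    return strip_chars(text, TANWIN + SHORT_VOWELS + SUKUN + SHADDA)


sample = "اللَّهمَّ اغْفِرْ لنَا ولوالدِينَا"
print(remove_harakat(sample))    # vowel marks gone, shadda kept
print(remove_all_marks(sample))  # fully undiacritized text
```

The only difference between the two helpers is whether the shadda is in the set of deleted characters, which is why one result still shows doubled-consonant marks and the other does not.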
Answered by Seddik Mekki, Apr 20 '26 21:04


You don't need a library for this; a plain regex is enough:

import re
text = 'اللَّهمَّ اغْفِرْ لنَا ولوالدِينَا '    
# Strip tanwin, harakat, shadda, sukun (U+064B-U+0652), tatweel (U+0640) and the ligature U+FC62
output = re.sub('[\u064b-\u0652\u0640\ufc62]', '', text)
print(output)
#اللهم اغفر لنا ولوالدينا 
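
If maintaining a hand-written character class feels fragile, one possible alternative is to filter by Unicode category with the standard `unicodedata` module. This is a sketch under the assumption that every nonspacing mark (category `Mn`) in the text should go; the function name `strip_marks` is made up for this example.

```python
import unicodedata

TATWEEL = "\u0640"  # elongation character; a letter modifier, not a combining mark


def strip_marks(text):
    """Remove all nonspacing combining marks (category Mn) and the tatweel."""
    return "".join(
        ch for ch in text
        if unicodedata.category(ch) != "Mn" and ch != TATWEEL
    )


text = "اللَّهمَّ اغْفِرْ لنَا ولوالدِينَا"
print(strip_marks(text))
```

This catches marks that the character class in the regex does not list, such as the superscript alef (U+0670), but it also strips combining marks from any other script in the input, so it is only suitable when that is acceptable.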
Answered by hmghaly, Apr 20 '26 22:04