Separating between Hebrew and English strings

Question

I have a huge list of strings, some in Hebrew and some in English, and I want to extract only the Hebrew ones, but I couldn't find a regex example that works with Hebrew.

I tried the brute-force method of checking every ASCII letter against each string:

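import string
data = []
for s in slist:  # slist is the list of mixed Hebrew/English strings
    found = False
    # look for each ASCII letter anywhere in s
    for c in string.ascii_letters:
        if c in s:
            found = True
    if not found:  # no English letter at all, so keep s
        data.append(s)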

It works, but it is very slow, and my list is HUGE. I also tried comparing only the first character of each string against string.ascii_letters. That was much faster, but it only filters out strings that start with an English letter and leaves the "mixed" strings in. I only want the strings that are "pure" Hebrew.

I'm sure this can be done much better... Help, anyone?

P.S.: I'd prefer to do it within a Python program, but a grep command that does the same would also help.
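The loop above is slow mainly because it runs one full substring search per ASCII letter (52 scans of every string). A minimal sketch of a single-pass alternative, assuming slist holds str objects; the function name keep_without_ascii_letters is hypothetical. Like the original loop, it only rejects ASCII letters, so digits and punctuation are kept:

```python
import string

# Built once; membership tests against a frozenset are O(1).
ASCII_LETTERS = frozenset(string.ascii_letters)

def keep_without_ascii_letters(strings):
    """Return the strings that contain no ASCII (English) letters."""
    # isdisjoint() walks each string once and stops at the first ASCII letter,
    # instead of running one substring search per letter.
    return [s for s in strings if ASCII_LETTERS.isdisjoint(s)]

slist = ["שלום", "hello", "שלום world", "עולם 123"]
print(keep_without_ascii_letters(slist))  # ['שלום', 'עולם 123']
```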

Błotosmętek · Accepted Answer

To check whether a string contains any ASCII letters (i.e. non-Hebrew), use:

re.search('[' + string.ascii_letters + ']', s)

If this returns a match (anything other than None), your string is not pure Hebrew.
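Since re.search returns either a match object or None, filtering the whole list means keeping the strings for which it returns None. A minimal sketch, assuming slist holds str objects; the helper name is_free_of_english is hypothetical. Compiling the pattern once is a small convenience (the re module also caches recently used patterns):

```python
import re
import string

# One character class containing a-z and A-Z, compiled once and reused.
ASCII_LETTER_RE = re.compile('[' + string.ascii_letters + ']')

def is_free_of_english(s):
    """True when s has no ASCII letter, i.e. search() finds no match."""
    return ASCII_LETTER_RE.search(s) is None

slist = ["שלום", "hello", "שלום world"]
data = [s for s in slist if is_free_of_english(s)]
print(data)  # ['שלום']
```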

Sufian Latif · Answer

This one should work:

import re
data = [s for s in slist if re.match('^[a-zA-Z ]+$', s)]

This picks all the strings that consist only of lowercase and uppercase English letters and spaces. If the strings may also contain digits or punctuation marks, add those characters to the character class in the regex.
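To illustrate what this pattern selects, and how widening the character class changes the result, here is a small sketch. The sample strings and the extra characters in the relaxed class (digits and a few punctuation marks) are assumptions chosen for illustration:

```python
import re

slist = ["hello world", "שלום", "hello שלום", "hello, world"]

# Letters and spaces only: 'hello, world' is rejected because of the comma.
strict = [s for s in slist if re.match('^[a-zA-Z ]+$', s)]

# Hypothetical widened class that also allows digits and . , ! ? ' -
relaxed = [s for s in slist if re.match(r"^[a-zA-Z0-9 .,!?'-]+$", s)]

print(strict)   # ['hello world']
print(relaxed)  # ['hello world', 'hello, world']
```

Both lists keep only English strings, which is the opposite of what the question asks for.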

Edit: I just noticed that this keeps the English-only strings, but you need it the other way round. You can try this instead:

data = [s for s in slist if not re.match('^.*[a-zA-Z].*$', s)]

This will discard any string that contains at least one English letter.
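One caveat: `.` does not match a newline, so with re.match the pattern '^.*[a-zA-Z].*$' only finds English letters on a string's first line. re.search('[a-zA-Z]', s) looks through the whole string. If "pure Hebrew" should instead be checked positively, a sketch under the assumption that it means characters from the Unicode Hebrew block (U+0590 to U+05FF) plus whitespace only; under that assumption, digits and punctuation make a string fail:

```python
import re

# Any ASCII letter anywhere in the string, including after a newline.
HAS_ENGLISH = re.compile(r'[a-zA-Z]')

# Assumption: "pure Hebrew" = Hebrew block U+0590-U+05FF plus whitespace.
PURE_HEBREW = re.compile(r'[\u0590-\u05FF\s]+')

slist = ["שלום עולם", "hello", "שלום\nworld", "שלום 123"]

no_english = [s for s in slist if not HAS_ENGLISH.search(s)]
pure_hebrew = [s for s in slist if PURE_HEBREW.fullmatch(s)]

print(no_english)   # ['שלום עולם', 'שלום 123']
print(pure_hebrew)  # ['שלום עולם']
```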

Tags: python, regex, hebrew

Ofer Sadan