 

Finding letters in string, not followed by a number... possibly using RE?

I am trying to extract the runs of letters in a string that neither directly follow a number nor are directly followed by one.

Here's an example string:

string = "ts0060_LOD-70234_lr2_billboards_rgba_over_s3d_lf_v5_2Kdciufa_lnh"

This is what I have so far:

re.findall(r"[a-z]+", string.lower())

which gives this result:

['ts', 'lod', 'lr', 'billboards', 'rgba', 'over', 's', 'd', 'lf', 'v', 'kdciufa', 'lnh']

... but the result I am looking for is something more like this:

['lod', 'billboards', 'rgba', 'over', 'lf', 'lnh']

Is there a way of achieving this using regular expressions?

Many thanks,

iGwok, asked Oct 16 '25 12:10


2 Answers

Use negative look-arounds:

re.findall(r"(?<![\da-z])[a-z]+(?![\da-z])", string.lower())

This matches each run of lower-case letters that is neither immediately preceded nor immediately followed by another letter or a digit.

Demo:

>>> import re
>>> string = "ts0060_LOD-70234_lr2_billboards_rgba_over_s3d_lf_v5_2Kdciufa_lnh"
>>> re.findall(r"(?<![\da-z])[a-z]+(?![\da-z])", string.lower())
['lod', 'billboards', 'rgba', 'over', 'lf', 'lnh']
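If the original casing matters, one possible variation (a sketch, not part of the answer above) is to drop the `.lower()` call and pass `re.IGNORECASE` instead, so the same look-arounds also treat upper-case letters as letters. The variable names `pattern` and `original_case` are only illustrative.

```python
import re

string = "ts0060_LOD-70234_lr2_billboards_rgba_over_s3d_lf_v5_2Kdciufa_lnh"

# Same look-arounds as above, but matched case-insensitively
# instead of lower-casing the input first.
pattern = re.compile(r"(?<![\da-z])[a-z]+(?![\da-z])", re.IGNORECASE)

original_case = pattern.findall(string)
print(original_case)
# ['LOD', 'billboards', 'rgba', 'over', 'lf', 'lnh']
```

Lower-casing each match afterwards gives the same list as the demo above.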
Martijn Pieters, answered Oct 18 '25 01:10


An alternative to using findall is to split the string into individual words, and then filter out any words containing non-alphabetical characters.

import re

string = "ts0060_LOD-70234_lr2_billboards_rgba_over_s3d_lf_v5_2Kdciufa_lnh"

# split on non-alphanumeric characters
words = re.split(r"[^a-z0-9]", string.lower())
print("words:", words)

# keep only the words made up entirely of letters
# (list() is needed on Python 3, where filter() returns an iterator)
filtered_words = list(filter(str.isalpha, words))
print("filtered words:", filtered_words)

Result:

words: ['ts0060', 'lod', '70234', 'lr2', 'billboards', 'rgba', 'over', 's3d', 'lf', 'v5', '2kdciufa', 'lnh']
filtered words: ['lod', 'billboards', 'rgba', 'over', 'lf', 'lnh']
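For reuse, the split-and-filter idea could be wrapped in a small helper. This is only a sketch: the function name `pure_letter_words` is made up for illustration, and it assumes Python 3. It also shows that runs of separators are harmless, since the empty strings they produce fail `isalpha()`.

```python
import re


def pure_letter_words(text):
    """Return the chunks of text that consist only of the letters a-z."""
    # Split on anything that is not a lower-case letter or a digit;
    # consecutive separators produce empty strings.
    chunks = re.split(r"[^a-z0-9]", text.lower())
    # str.isalpha() is False for "" and for any chunk containing a digit.
    return [chunk for chunk in chunks if chunk.isalpha()]


print(pure_letter_words("ts0060_LOD-70234_lr2_billboards"))
# ['lod', 'billboards']
print(pure_letter_words("a__b--c3"))
# ['a', 'b']
```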
Kevin, answered Oct 18 '25 02:10


