I have a string that looks like this:
24 (prem)-42-48 (6 ext)
and what I want to get out of it is:
['24 prem', '42', '48', '6 ext']
I can get the numbers like this:
import re
MyString = r'24 (prem)-42-48 (6 ext)'
Splits = re.findall(r'(\d+)', MyString)  # ['24', '42', '48', '6']
but I lose the succeeding text.
I can also do this:
import re
MyString = r'24 (prem)-42-48 (6 ext)'
Splits = re.split(r'[:\-]', MyString)  # ['24 (prem)', '42', '48 (6 ext)']
but that leaves "(6 ext)" stuck to "48" instead of giving it its own item.
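For the original target, a single re.findall call that captures each number together with an optional word after it may be enough. This is a minimal sketch, assuming the trailing text is always one run of ASCII letters (as with "prem" and "ext") and that everything between a number and its word is punctuation or spaces:

```python
import re

s = "24 (prem)-42-48 (6 ext)"

# Group 1: the number. \W* skips punctuation/spaces such as " (" or "-".
# Group 2: an optional run of letters right after that (assumed to be the label).
pairs = re.findall(r"(\d+)\W*([A-Za-z]*)", s)
print(pairs)  # [('24', 'prem'), ('42', ''), ('48', ''), ('6', 'ext')]

# Join number and label with a space, skipping empty labels.
result = [" ".join(part for part in pair if part) for pair in pairs]
print(result)  # ['24 prem', '42', '48', '6 ext']
```

Skipping the empty group is what turns ('42', '') into '42' rather than '42 '.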
EDIT after seeing responses:
I think the simplest way for me to handle this would be to split before each number and then just use str.replace to get rid of the "(", ")" and "-" characters.
So, is there a simple regex that splits the string just before each number (i.e. before its first digit)?
The result from performing it on
'24 (prem)-42-48 (6 ext)'
would be
['24 (prem)-', '42-', '48 (', '6 ext)']
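One way to do that split is a zero-width match: a lookahead for a digit combined with a negative lookbehind for a digit, so the split happens only at the start of each number and not between "2" and "4". A sketch, assuming Python 3.7 or newer (earlier versions of re.split do not split on empty matches), followed by the str.replace clean-up described above:

```python
import re

s = "24 (prem)-42-48 (6 ext)"

# (?=\d): next char is a digit; (?<!\d): previous char is not a digit.
# The match at position 0 produces a leading '' that is filtered out.
pieces = [p for p in re.split(r"(?<!\d)(?=\d)", s) if p]
print(pieces)  # ['24 (prem)-', '42-', '48 (', '6 ext)']

# Remove the punctuation, then trim leftover spaces.
cleaned = [p.replace("(", "").replace(")", "").replace("-", "").strip() for p in pieces]
print(cleaned)  # ['24 prem', '42', '48', '6 ext']
```

re.findall(r"\d+\D*", s) returns the same four pieces without lookarounds and also works on older Python 3 versions.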
To get that result you don't need regexps: all you need to do is replace the unwanted characters with spaces and split the string on whitespace:
>>> s = "24 (prem)-42-48 (6 ext)"
>>> l = s.replace('(', ' ').replace(')', ' ').replace('-', ' ').split()
>>> l
['24', 'prem', '42', '48', '6', 'ext']
Here's a version using str.translate for Python 3 (both arguments to str.maketrans must have the same length, hence three spaces):
>>> s.translate(str.maketrans("()-", "   ")).split()
['24', 'prem', '42', '48', '6', 'ext']
Here's a version using regexps (matching with + instead of * avoids empty matches, so nothing needs to be filtered out):
>>> import re
>>> re.findall(r'[^-() ]+', s)
['24', 'prem', '42', '48', '6', 'ext']
I'm assuming, though, that '24 prem' and '6 ext' in your result list are a typo; otherwise there's no generic way to do what you want. For this particular string you can still get it from l with:
>>> [" ".join(l[:2])] + l[2:-2] + [" ".join(l[-2:])]
['24 prem', '42', '48', '6 ext']
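The slicing above is tied to this exact string. If you are willing to assume the rule "a non-numeric token belongs to the number just before it", a loop over the same tokens can build the grouped list for other inputs too. A sketch under that assumption; group_tokens is a hypothetical helper name:

```python
def group_tokens(s):
    # Hypothetical helper. Tokenize as above: turn '(', ')' and '-' into spaces.
    tokens = s.translate(str.maketrans("()-", "   ")).split()
    groups = []
    for tok in tokens:
        # A number (or a leading word with nothing before it) starts a new item.
        if tok.isdigit() or not groups:
            groups.append(tok)
        else:
            # Any other token is attached to the preceding number.
            groups[-1] += " " + tok
    return groups

print(group_tokens("24 (prem)-42-48 (6 ext)"))  # ['24 prem', '42', '48', '6 ext']
```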