I'm a very green Python user, so go easy on me; the docs haven't helped me understand what I'm missing. Similar to "RE split multiple arguments | (or) returns none python", I need to split a string on multiple delimiters. The answers to that question only cover keeping none of the delimiters or keeping both - I need to keep only one of them. Note that the question is from 2012, so it likely concerns a much earlier version of Python than the 3.6 I'm using.
My data:
line = 'APPLE,ORANGE CHERRY APPLE'
I want a list returned that looks like:
['APPLE', ',', 'ORANGE', 'CHERRY', 'APPLE']
I need to keep the comma so I can remove duplicate components later. I have that part working if I could just get the list created properly. Here's what I've got.
import re
parts = re.split(r'\s|(,)', line)  # named parts so the built-in list is not shadowed
print(parts)
My logic here is split on space and comma but only keep the comma - makes sense to me. Nope:
['APPLE', ',', 'ORANGE', None, 'CHERRY', None, 'APPLE']
I've also tried what is mentioned in the above linked question, to put the entire group in a capture:
re.split(r'(\s|(,))',line)
Nope again:
['APPLE', ',', ',', 'ORANGE', ' ', None, 'CHERRY', ' ', None, 'APPLE']
What am I missing? I know it's related to how my capture groups are set up but I can't figure it out. Thanks in advance!
I suggest using a matching approach with
re.findall(r'[^,\s]+|,', line)
The [^,\s]+|, pattern matches:
[^,\s]+ - one or more chars other than a comma and whitespace
| - or
, - a comma.
See a Python demo:
import re
line = 'APPLE,ORANGE CHERRY APPLE'
l = re.findall(r'[^,\s]+|,', line)
print(l) # => ['APPLE', ',', 'ORANGE', 'CHERRY', 'APPLE']
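As for why the original re.split attempts produced None: when the pattern contains capture groups, re.split also returns the text of every group for each match, and a group that took no part in the match (the comma group, whenever \s was what matched) shows up as None. If you would rather keep the split approach, a minimal sketch is to filter those placeholders out afterwards. The names raw and tokens below are only illustrative.

```python
import re

line = 'APPLE,ORANGE CHERRY APPLE'

# Each match contributes the text of the (,) group to the result.
# When \s matched instead of the comma, that group is empty-handed
# and re.split reports it as None.
raw = re.split(r'\s|(,)', line)

# Keep only truthy items: this drops the None placeholders and also
# any empty strings that adjacent delimiters (e.g. ', ') would produce.
tokens = [t for t in raw if t]

print(tokens)  # => ['APPLE', ',', 'ORANGE', 'CHERRY', 'APPLE']
```

For this input the result is the same as the re.findall version; which one reads better is mostly a matter of taste.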