I'm trying to split up a string of characters into a list while excluding certain substrings.
For example:
>>> sentences = '<s>I like dogs.</s><s>It\'s Monday today.</s>'
>>> substring1 = '<s>'
>>> substring2 = '</s>'
>>> print(split_string(sentences))
['<s>', 'I', ' ', 'l', 'i', 'k', 'e', ' ', 'd', 'o', 'g', 's',
'.', '</s>', '<s>', 'I', 't', "'", 's', ' ', 'M', 'o', 'n', 'd',
'a', 'y', ' ', 't', 'o', 'd', 'a', 'y', '.', '</s>']
As you can see, the string is split up into characters, except for the listed substrings. How can I do this in Python?
You could use re.findall for this. :)
import re
sentences = '<s>I like dogs.</s><s>It\'s Monday today.</s>'
print(re.findall(r'<\/?s>|.',sentences))
OUTPUT
['<s>', 'I', ' ', 'l', 'i', 'k', 'e', ' ', 'd', 'o', 'g', 's', '.', '</s>', '<s>', 'I', 't', "'", 's', ' ', 'M', 'o', 'n', 'd', 'a', 'y', ' ', 't', 'o', 'd', 'a', 'y', '.', '</s>']
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With