I've been searching around for a while now, but I can't seem to find the answer to this small problem.
I have this code that is supposed to split the string after every three words:
import re

def splitTextToTriplet(Text):
    x = re.split('^((?:\S+\s+){2}\S+).*',Text)
    return x

print(splitTextToTriplet("Do you know how to sing"))
Currently the output is as such:
['', 'Do you know', '']
But I am actually expecting this output:
['Do you know', 'how to sing']
And if I print(splitTextToTriplet("Do you know how to")), it should also output:
['Do you know', 'how to']
How can I change the regex so that it produces the expected output?
I believe re.split might not be the best approach for this since look-behind cannot take variable-length patterns.
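To see where the `['', 'Do you know', '']` result comes from, here is a small sketch that reuses the pattern from the question. The variable names `text`, `pattern` and `match` are only for illustration. The pattern is anchored with `^` and ends in `.*`, so its single match covers the whole string; `re.split` then returns the (empty) text before the match, the captured group, and the (empty) text after it.

```python
import re

text = "Do you know how to sing"
pattern = r'^((?:\S+\s+){2}\S+).*'

# The one and only match spans the entire input, because .* consumes
# everything after the first three words.
match = re.match(pattern, text)
print(match.group(0) == text)   # True
print(match.group(1))           # Do you know

# re.split cuts at the match and also returns the captured group, so
# nothing is left on either side of the cut.
print(re.split(pattern, text))  # ['', 'Do you know', '']
```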
Instead, you could use str.split and then join the words back together in groups of three.
def splitTextToTriplet(string):
    words = string.split()
    grouped_words = [' '.join(words[i: i + 3]) for i in range(0, len(words), 3)]
    return grouped_words

splitTextToTriplet("Do you know how to sing")
# ['Do you know', 'how to sing']
splitTextToTriplet("Do you know how to")
# ['Do you know', 'how to']
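Be advised, though, that str.split treats every run of whitespace the same way, so if some of your whitespace characters are line breaks (or tabs, or repeated spaces), that information is lost: each group is rejoined with single spaces.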
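If the original whitespace inside each group matters, one possible variation is to split with a capturing group so the separators are kept. This is only a sketch: `split_keep_whitespace` and its parameter `n` are hypothetical names, and the whitespace between two groups is still dropped.

```python
import re

def split_keep_whitespace(text, n=3):
    """Group words n at a time, keeping the original whitespace inside each group."""
    stripped = text.strip()
    if not stripped:
        return []
    # A capturing group makes re.split return the separators too:
    # even indices hold words, odd indices hold the whitespace between them.
    tokens = re.split(r'(\s+)', stripped)
    # Each group spans n words and the n-1 separators between them; the
    # separator that follows the group (index i + 2n - 1) is skipped.
    return [''.join(tokens[i:i + 2 * n - 1]) for i in range(0, len(tokens), 2 * n)]

print(split_keep_whitespace("Do you\nknow how to"))
# ['Do you\nknow', 'how to']
```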
I used re.findall to get the output you expected. To make the function more generic, I replaced splitTextToTriplet with splitTextonWords, which takes numberOfWords as a parameter:
import re

def splitTextonWords(Text, numberOfWords=1):
    if (numberOfWords > 1):
        text = Text.lstrip()
        # Raw strings keep \S and \s from being read as string escapes.
        # Note: (?!=\s*) is a negative lookahead for a literal '=';
        # it does not affect the examples below.
        pattern = r'(?:\S+\s*){1,' + str(numberOfWords - 1) + r'}\S+(?!=\s*)'
        x = re.findall(pattern, text)
    elif (numberOfWords == 1):
        x = Text.split()
    else:
        # Group sizes below 1 are not meaningful
        x = None
    return x

print(splitTextonWords("Do you know how to sing", 3))
print(splitTextonWords("Do you know how to", 3))
print(splitTextonWords("Do you know how to sing how to dance how to", 3))
print(splitTextonWords("A sentence this code will fail at", 3))
print(splitTextonWords("A sentence this code will fail at ", 3))
print(splitTextonWords(" A sentence this code will fail at s", 3))
print(splitTextonWords(" A sentence this code will fail at s", 4))
print(splitTextonWords(" A sentence this code will fail at s", 2))
print(splitTextonWords(" A sentence this code will fail at s", 1))
print(splitTextonWords(" A sentence this code will fail at s", 0))
Output:
['Do you know', 'how to sing']
['Do you know', 'how to']
['Do you know', 'how to sing', 'how to dance', 'how to']
['A sentence this', 'code will fail', 'at']
['A sentence this', 'code will fail', 'at']
['A sentence this', 'code will fail', 'at s']
['A sentence this code', 'will fail at s']
['A sentence', 'this code', 'will fail', 'at s']
['A', 'sentence', 'this', 'code', 'will', 'fail', 'at', 's']
None
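One edge case worth knowing about with the pattern above: because `\s*` allows zero whitespace, the last group has to contain at least two non-space characters, so a leftover single one-character word (for example the `d` in `"a b c d"` with groups of 3) is silently dropped. A possible alternative, sketched below under the hypothetical name `split_on_words`, requires whitespace between words and makes the extra words optional instead.

```python
import re

def split_on_words(text, n=3):
    # Hypothetical helper, not the function from the answer above.
    if n < 1:
        return None  # same convention as above for invalid group sizes
    # One word, then up to n-1 more words, each preceded by mandatory whitespace.
    pattern = r'\S+(?:\s+\S+){0,%d}' % (n - 1)
    return re.findall(pattern, text)

# Pattern from the answer above with numberOfWords=3: the trailing "d" is lost.
print(re.findall(r'(?:\S+\s*){1,2}\S+(?!=\s*)', "a b c d"))
# ['a b c']

print(split_on_words("a b c d", 3))
# ['a b c', 'd']
print(split_on_words(" A sentence this code will fail at s", 3))
# ['A sentence this', 'code will fail', 'at s']
```

As with the pattern above, whitespace inside each group is returned exactly as it appears in the input.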