
Python: Split string every three words

I've been searching around for a while now, but I can't seem to find the answer to this small problem.

I have this code that is supposed to split the string after every three words:

import re

def splitTextToTriplet(Text):
    x = re.split('^((?:\S+\s+){2}\S+).*',Text)
    return x


print(splitTextToTriplet("Do you know how to sing"))

Currently the output is:

['', 'Do you know', '']

But I am actually expecting this output:

['Do you know', 'how to sing'] 

And if I call print(splitTextToTriplet("Do you know how to")), it should output:

['Do you know', 'how to'] 

How can I change the regex so that it produces the expected output?

Lieberta asked May 04 '26 20:05


2 Answers

I believe re.split might not be the best approach here: splitting at the gap after every third word would need a look-behind, and look-behind in Python's re module cannot take variable-length patterns.
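To see where the ['', 'Do you know', ''] in the question comes from: re.split cuts the string at each match, and because the pattern ends in .*, its single match spans the whole string. What remains is an empty string on each side, plus the text of the capturing group, which re.split includes in its result. A minimal sketch reproducing this (the variable names are only illustrative):

```python
import re

text = "Do you know how to sing"
pattern = r'^((?:\S+\s+){2}\S+).*'  # the pattern from the question

match = re.match(pattern, text)
whole_span = match.span()      # the match covers the entire string
first_three = match.group(1)   # text captured by the group

parts = re.split(pattern, text)
print(whole_span)   # (0, 23)
print(parts)        # ['', 'Do you know', '']
```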

Instead, you could use str.split and then join the words back together.

def splitTextToTriplet(string):
    words = string.split()
    grouped_words = [' '.join(words[i: i + 3]) for i in range(0, len(words), 3)]
    return grouped_words

splitTextToTriplet("Do you know how to sing")
# ['Do you know', 'how to sing']

splitTextToTriplet("Do you know how to")
# ['Do you know', 'how to'] 

Be advised, though, that str.split discards the original whitespace: if some of the separators are line breaks, that information is lost in the process.
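If the original separators matter, one possible alternative is to let re.findall capture each group directly, so that whatever whitespace sits between the words of a group is kept as-is. This is only a sketch: the function name split_keep_whitespace and the words_per_group parameter are hypothetical, not part of the answer above.

```python
import re

def split_keep_whitespace(text, words_per_group=3):
    # Hypothetical helper: groups words without rewriting the separators.
    if words_per_group < 1:
        raise ValueError("words_per_group must be at least 1")
    # One word, then up to (words_per_group - 1) further "whitespace + word" pairs.
    pattern = r'\S+(?:\s+\S+){0,%d}' % (words_per_group - 1)
    return re.findall(pattern, text)

print(split_keep_whitespace("Do you know how to sing"))
# ['Do you know', 'how to sing']
print(split_keep_whitespace("Do you\nknow how to"))
# ['Do you\nknow', 'how to']
```

Note that the whitespace between two groups is still dropped; only the separators inside a group survive.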

Olivier Melançon answered May 07 '26 09:05


I used re.findall to get the output you expected. To make the function more generic, I replaced splitTextToTriplet with splitTextonWords, which takes numberOfWords as a parameter:

import re

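def splitTextonWords(Text, numberOfWords=1):
    if (numberOfWords > 1):
        text = Text.lstrip()
        # Raw string: one word, then up to numberOfWords-1 further
        # "whitespace + word" pairs, so a lone trailing word still matches.
        pattern = r'\S+(?:\s+\S+){0,' + str(numberOfWords - 1) + '}'
        x = re.findall(pattern, text)
    elif (numberOfWords == 1):
        # One word per item: plain whitespace splitting is enough.
        x = Text.split()
    else:
        # Word counts below 1 are not meaningful.
        x = None
    return x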

print(splitTextonWords("Do you know how to sing", 3))
print(splitTextonWords("Do you know how to", 3))
print(splitTextonWords("Do you know how to sing how to dance how to", 3))
print(splitTextonWords("A sentence this code will fail at", 3))
print(splitTextonWords("A sentence this             code will fail at ", 3))
print(splitTextonWords("   A sentence this code will fail at s", 3))
print(splitTextonWords("   A sentence this code will fail at s", 4))
print(splitTextonWords("   A sentence this code will fail at s", 2))
print(splitTextonWords("   A sentence this code will fail at s", 1))
print(splitTextonWords("   A sentence this code will fail at s", 0))

Output:

['Do you know', 'how to sing']
['Do you know', 'how to']
['Do you know', 'how to sing', 'how to dance', 'how to']
['A sentence this', 'code will fail', 'at']
['A sentence this', 'code will fail', 'at']
['A sentence this', 'code will fail', 'at s']
['A sentence this code', 'will fail at s']
['A sentence', 'this code', 'will fail', 'at s']
['A', 'sentence', 'this', 'code', 'will', 'fail', 'at', 's']
None

TigerTV.ru answered May 07 '26 09:05