I need to split strings into words, then join each pair of consecutive words, like so:
"This is my subject string"
Would go to:
"This is"
"is my"
"my subject"
"subject string"
The strings would be anywhere from 5 to 250 words long. This would also be running over a lot of data, 1GB or so. Is there an efficient way to do this in Python?
I've seen lots of advice about which methods are most efficient for various things, so I wanted to ask first.
You could do it with the split method and list comprehensions:
text = "This is my subject string"
words = text.split() #note that split without arguments splits on whitespace
pairs = [words[i]+' '+words[i+1] for i in range(len(words)-1)]
print(pairs)
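The same pairing can also be written with zip, which avoids the index arithmetic. Below is a minimal sketch as a generator, so pairs are produced one at a time instead of being held in a second list. The name word_pairs is made up for illustration, and the sketch assumes that for the 1GB case you pass in one string at a time rather than the whole data set:

```python
def word_pairs(text):
    """Yield each pair of consecutive words in text, joined by a space."""
    words = text.split()  # no arguments: split on any run of whitespace
    # zip stops at the shorter sequence, so the last word is never paired past the end
    for first, second in zip(words, words[1:]):
        yield first + ' ' + second

print(list(word_pairs("This is my subject string")))
# ['This is', 'is my', 'my subject', 'subject string']
```

A string with fewer than two words simply yields nothing, so no special case is needed.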
There's an itertools recipe called pairwise built exactly for this, so you may as well use it. From Python 3.10 on it is also available directly as itertools.pairwise.
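>>> from itertools import tee
>>> def pairwise(iterable):
...     "s -> (s0,s1), (s1,s2), (s2, s3), ..."
...     a, b = tee(iterable)   # two independent iterators over the same items
...     next(b, None)          # advance the second one by a single item
...     return zip(a, b)       # on Python 2, use itertools.izip here to stay lazy
...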
>>> list(pairwise(text.split()))
[('This', 'is'), ('is', 'my'), ('my', 'subject'), ('subject', 'string')]
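Note that the recipe returns tuples, while the question asks for joined strings. Here is a rough sketch of that last step, assuming Python 3 (where zip is already lazy); joined_pairs is just an illustrative name, not part of itertools:

```python
from itertools import tee

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)  # shift the second iterator forward by one item
    return zip(a, b)

def joined_pairs(text):
    """Return consecutive word pairs of text as space-joined strings."""
    return [' '.join(pair) for pair in pairwise(text.split())]

print(joined_pairs("This is my subject string"))
# ['This is', 'is my', 'my subject', 'subject string']
```

As for which variant is fastest on your data, the only way to know for sure is to time them on a representative sample, for example with the timeit module.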