I am trying to read a file that consists of a single 2.9 GB line of comma-separated values. The usual idiom reads a file line by line, with each iteration stopping at `'\n'`:
    with open('eggs.txt', 'rb') as file:
        for line in file:
            print(line)
How can I instead iterate over "lines" that stop at `', '` (or any other character/string)?
I don't think there is a built-in way to do this. You will have to read the file block by block with `file.read(block_size)`, split each block on the separator, and manually rejoin fragments that span block boundaries.

Note that you can still run out of memory if no separator appears for a long stretch. (The same problem applies to reading a file line by line when it contains a very long line.)
Here's an example implementation:
    def split_file(file, sep=",", block_size=16384):
        """Yield the pieces of `file` delimited by `sep` (file opened in text mode)."""
        last_fragment = ""
        while True:
            block = file.read(block_size)
            if not block:  # end of file
                break
            block_fragments = iter(block.split(sep))
            # The first fragment of this block may continue a field that
            # started in the previous block, so append it instead of yielding.
            last_fragment += next(block_fragments)
            for fragment in block_fragments:
                yield last_fragment
                last_fragment = fragment
        # Whatever is left after the final block is the last field.
        yield last_fragment
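Here is a quick sketch of how you might exercise the generator. It feeds it an in-memory `io.StringIO` instead of a real file (the sample data and the tiny `block_size` are just illustrative, chosen so that fields get split across block boundaries and have to be rejoined):

    import io

    def split_file(file, sep=",", block_size=16384):
        last_fragment = ""
        while True:
            block = file.read(block_size)
            if not block:
                break
            block_fragments = iter(block.split(sep))
            last_fragment += next(block_fragments)
            for fragment in block_fragments:
                yield last_fragment
                last_fragment = fragment
        yield last_fragment

    # block_size=4 forces "spam" and "eggs" to straddle block boundaries,
    # demonstrating that fragments are rejoined correctly.
    buf = io.StringIO("spam,ham,eggs")
    print(list(split_file(buf, sep=",", block_size=4)))  # -> ['spam', 'ham', 'eggs']

Since the separator here is a `str`, open the real file in text mode (`open('eggs.txt')`); with the `'rb'` mode from the question you would need byte separators (`sep=b","`) and `last_fragment = b""` instead.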