
How can I choose the line separator when reading a file?

Tags: python

I am trying to read a file which contains one single 2.9 GB long line separated by commas. This code would read the file line by line, with each print stopping at '\n':

with open('eggs.txt', 'rb') as file:
    for line in file:
        print(line)

How can I instead iterate over "lines" that stop at ', ' (or any other character/string)?

RetroCode asked Jan 24 '26 06:01


1 Answer

I don't think there is a built-in way to achieve this. You will have to read the file block by block with file.read(block_size), split each block at the separator, and manually rejoin the fragments that span block boundaries.
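As a quick check of the first point, here is a minimal sketch (the temporary sample file and its contents are made up for illustration). open() validates its newline argument and only accepts None, '', '\n', '\r' and '\r\n', so passing ', ' raises ValueError rather than changing the line separator:

```python
import os
import tempfile

# Write a tiny comma-separated sample file (contents are illustrative only).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("spam, eggs, ham")
    path = tmp.name

try:
    # An arbitrary separator is not a legal value for newline.
    open(path, newline=", ")
    rejected = False
except ValueError:
    rejected = True
finally:
    os.remove(path)

print(rejected)
```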

Note that you still might run out of memory if you don't encounter a comma for a long time. (The same problem applies to reading a file line by line, when encountering a very long line.)

Here's an example implementation:

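def split_file(file, sep=",", block_size=16384):
    # Yield the sep-delimited fragments of an open file, one block at a time.
    # sep must have the type that file.read() returns: str for a file opened
    # in text mode, bytes for one opened with 'rb'.
    buffer = sep[:0]  # empty str or bytes, same type as sep
    while True:
        block = file.read(block_size)
        if not block:
            break
        # Prepend the unfinished tail of the previous block, so a separator
        # that straddles two blocks (e.g. ", ") is still found.
        buffer += block
        *fragments, buffer = buffer.split(sep)
        yield from fragments
    # Whatever follows the last separator is the final fragment.
    yield buffer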
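
To try the approach without a multi-gigabyte file, the sketch below feeds small in-memory streams to a compact variant of the generator. The name iter_fields, the sample data and the tiny block size are illustrative assumptions, and the function is restated only so the snippet runs on its own. Unlike a plain per-block split, this variant re-splits the unfinished tail together with the next block, so a multi-character separator such as ', ' is matched even when it is cut in half by a read boundary:

```python
import io


def iter_fields(stream, sep, block_size=16384):
    """Yield sep-delimited fields from stream, reading block_size units per call."""
    pending = sep[:0]  # empty str or bytes, matching the type of sep
    while True:
        block = stream.read(block_size)
        if not block:
            break
        # Split the unfinished tail plus the new block; the last piece may be
        # incomplete, so hold it back until more data (or end of file) arrives.
        *complete, pending = (pending + block).split(sep)
        yield from complete
    yield pending


# Text mode: with block_size=5 the first ", " straddles two reads.
text_fields = list(iter_fields(io.StringIO("spam, eggs, ham"), ", ", block_size=5))
print(text_fields)

# Binary mode, as with open('eggs.txt', 'rb'): the separator must be bytes too.
byte_fields = list(iter_fields(io.BytesIO(b"spam, eggs, ham"), b", ", block_size=5))
print(byte_fields)
```

With a real file, the stream would be the object returned by open(), and the default block size is probably a more sensible choice than 5. Memory use is bounded by the block size plus the longest single field.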

Sven Marnach answered Jan 26 '26 20:01


