I am trying to read a file that consists of a single 2.9 GB line of comma-separated values. The usual idiom reads a file line by line, with each iteration stopping at `'\n'`:
    with open('eggs.txt', 'rb') as file:
        for line in file:
            print(line)
How can I instead iterate over "lines" that stop at `', '` (or any other character/string)?
I don't think there is a built-in way to do this. You will have to read the file block by block with `file.read(block_size)`, split each block on the separator, and manually rejoin fragments that span block boundaries.

Note that you can still run out of memory if no separator appears for a long stretch. (The same problem applies to reading a file line by line when it contains a very long line.)
Here's an example implementation:
    def split_file(file, sep=",", block_size=16384):
        """Yield the pieces of `file` delimited by `sep` (file opened in text mode)."""
        last_fragment = ""
        while True:
            block = file.read(block_size)
            if not block:  # end of file
                break
            block_fragments = iter(block.split(sep))
            # The first fragment of this block may continue a field that
            # started in the previous block, so append it instead of yielding.
            last_fragment += next(block_fragments)
            for fragment in block_fragments:
                yield last_fragment
                last_fragment = fragment
        # Whatever is left after the final block is the last field.
        yield last_fragment
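Here is a quick sketch of how you might exercise the generator. It feeds it an in-memory `io.StringIO` instead of a real file (the sample data and the tiny `block_size` are just illustrative, chosen so that fields get split across block boundaries and have to be rejoined):

    import io

    def split_file(file, sep=",", block_size=16384):
        last_fragment = ""
        while True:
            block = file.read(block_size)
            if not block:
                break
            block_fragments = iter(block.split(sep))
            last_fragment += next(block_fragments)
            for fragment in block_fragments:
                yield last_fragment
                last_fragment = fragment
        yield last_fragment

    # block_size=4 forces "spam" and "eggs" to straddle block boundaries,
    # demonstrating that fragments are rejoined correctly.
    buf = io.StringIO("spam,ham,eggs")
    print(list(split_file(buf, sep=",", block_size=4)))  # -> ['spam', 'ham', 'eggs']

Since the separator here is a `str`, open the real file in text mode (`open('eggs.txt')`); with the `'rb'` mode from the question you would need byte separators (`sep=b","`) and `last_fragment = b""` instead.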