I'm trying to jury-rig the Amazon S3 python library to allow chunked handling of large files. Right now it does a "self.body = http_response.read()", so if you have a 3G file you're going to read the entire thing into memory before getting any control over it.
My current approach is to try to keep the interface for the library the same but provide a callback after reading each chunk of data. Something like the following:
data = []
while True:
    chunk = http_response.read(CHUNKSIZE)
    if not chunk:
        break
    if callback:
        callback(chunk)
    data.append(chunk)
Now I need to do something like:
self.body = ''.join(data)
Is join the right way to do this or is there another (better) way of putting all the chunks together?
''.join() is the best way to combine the chunks. The alternative boils down to repeated concatenation, which is O(n**2) because strings are immutable and a new string has to be created at every concatenation. Granted, recent versions of CPython optimize repeated concatenation with += down to O(n), but that optimization only makes it roughly equivalent to ''.join(), which is explicitly O(n) in the number of bytes.
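As a rough illustration of the pattern, here is a minimal, self-contained sketch of the chunked read finished off with a single ''.join(). The function name read_in_chunks, the CHUNKSIZE value, and the io.StringIO stand-in for http_response are my own assumptions for the demo, not names from the S3 library:

import io

CHUNKSIZE = 64 * 1024  # assumed chunk size, tune as needed

def read_in_chunks(fileobj, callback=None):
    # Collect chunks in a list and join once at the end: O(n) total,
    # instead of rebuilding an ever-longer string on every iteration.
    chunks = []
    while True:
        chunk = fileobj.read(CHUNKSIZE)
        if not chunk:
            break
        if callback:
            callback(chunk)
        chunks.append(chunk)
    return ''.join(chunks)

# Example with an in-memory stream; a real http_response is used the same way.
body = read_in_chunks(io.StringIO('a' * 1000000), callback=lambda c: None)
assert len(body) == 1000000

The design point is simply that list.append is cheap and the single join pass touches each byte once, so the cost stays linear regardless of how many chunks the response is split into.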