Python: efficiently join chunks of bytes into one big chunk?

I'm trying to jury-rig the Amazon S3 Python library to allow chunked handling of large files. Right now it does a "self.body = http_response.read()", so if you have a 3 GB file you're going to read the entire thing into memory before getting any control over it.

My current approach is to try to keep the interface for the library the same but provide a callback after reading each chunk of data. Something like the following:

data = []
while True:
    chunk = http_response.read(CHUNKSIZE)
    if not chunk:
        break
    if callback:
        callback(chunk)
    data.append(chunk)

Now I need to do something like:

self.body = ''.join(data)

Is join the right way to do this, or is there another (better) way of putting all the chunks together?
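
For reference, here's a minimal self-contained sketch of the pattern described above, with an illustrative progress callback (the read_in_chunks and make_progress_printer names are just for illustration, not part of the S3 library):

CHUNKSIZE = 8192  # 8 KB per read; any reasonable buffer size works

def read_in_chunks(http_response, callback=None):
    """Read a response object chunk by chunk, calling callback(chunk)
    after each read, and return the whole body as one string."""
    data = []
    while True:
        chunk = http_response.read(CHUNKSIZE)
        if not chunk:
            break
        if callback:
            callback(chunk)
        data.append(chunk)
    return ''.join(data)

# Example callback: report how many bytes have arrived so far.
def make_progress_printer():
    total = [0]
    def print_progress(chunk):
        total[0] += len(chunk)
        print('received %d bytes so far' % total[0])
    return print_progress

# body = read_in_chunks(http_response, make_progress_printer())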

asked by Parand


1 Answer

''.join() is the best method for joining chunks of data. The alternative boils down to repeated concatenation, which is O(n**2) because strings are immutable and a new string has to be created for every concatenation. Granted, recent versions of CPython optimize repeated concatenation when it's done with +=, making it O(n), but that optimization only makes it roughly equivalent to ''.join() anyway, which is explicitly O(n) over the number of bytes.
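
To make the difference concrete, here's a small illustrative sketch comparing the two approaches (the chunk size and count are arbitrary, and on recent CPython the concatenation optimization may mask the quadratic behaviour):

import time

chunks = ['x' * 1024] * 2000  # 2000 fake 1 KB chunks (~2 MB total)

# Repeated concatenation: each iteration can copy everything accumulated
# so far, which is O(n**2) overall unless the interpreter optimizes it.
start = time.time()
body = ''
for chunk in chunks:
    body = body + chunk
print('concatenation: %.3f seconds' % (time.time() - start))

# ''.join(): the total length is computed once and each byte is copied
# once, so it is O(n) regardless of interpreter version.
start = time.time()
body = ''.join(chunks)
print('join:          %.3f seconds' % (time.time() - start))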

answered by Devin Jeanpierre