I'm trying to jury-rig the Amazon S3 python library to allow chunked handling of large files. Right now it does a "self.body = http_response.read()", so if you have a 3G file you're going to read the entire thing into memory before getting any control over it.
My current approach is to try to keep the interface for the library the same but provide a callback after reading each chunk of data. Something like the following:
data = []
while True:
    chunk = http_response.read(CHUNKSIZE)
    if not chunk:
        break
    if callback:
        callback(chunk)
    data.append(chunk)
Now I need to do something like:
self.body = ''.join(data)
Is join the right way to do this or is there another (better) way of putting all the chunks together?
''.join() is the best way to combine the chunks. The alternative boils down to repeated concatenation, which is O(n**2) because strings are immutable and a new string has to be created at every concatenation. Granted, recent versions of CPython optimize repeated concatenation with += down to O(n), but that optimization only makes it roughly equivalent to ''.join(), which is explicitly O(n) in the number of bytes.
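As a rough illustration of the pattern, here is a minimal, self-contained sketch of the chunked read finished off with a single ''.join(). The function name read_in_chunks, the CHUNKSIZE value, and the io.StringIO stand-in for http_response are my own assumptions for the demo, not names from the S3 library:

import io

CHUNKSIZE = 64 * 1024  # assumed chunk size, tune as needed

def read_in_chunks(fileobj, callback=None):
    # Collect chunks in a list and join once at the end: O(n) total,
    # instead of rebuilding an ever-longer string on every iteration.
    chunks = []
    while True:
        chunk = fileobj.read(CHUNKSIZE)
        if not chunk:
            break
        if callback:
            callback(chunk)
        chunks.append(chunk)
    return ''.join(chunks)

# Example with an in-memory stream; a real http_response is used the same way.
body = read_in_chunks(io.StringIO('a' * 1000000), callback=lambda c: None)
assert len(body) == 1000000

The design point is simply that list.append is cheap and the single join pass touches each byte once, so the cost stays linear regardless of how many chunks the response is split into.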