I have a list of byte strings in Python 3 (they are audio chunks). I want to combine them into one big byte string. The simple implementation below is rather slow. How can I do it better?
chunks = []
while not audio.ends():
    chunks.append(bytes(audio.next_buffer()))
    do_some_chunk_processing()

all_audio = b''
for ch in chunks:
    all_audio += ch
How can I do this faster?
Use bytearray()
from time import time

c = b'\x02\x03\x05\x07' * 500  # test data: one 2000-byte chunk

# Method 1: repeated concatenation onto a bytes string
bytes_string = b''
st = time()
for _ in range(10**4):
    bytes_string += c
print("string concat -> took {} sec".format(time() - st))

# Method 2: extend a bytearray, then convert once at the end
bytes_arr = bytearray()
st = time()
for _ in range(10**4):
    bytes_arr.extend(c)
# convert bytes_arr to a bytes string
bytes_string = bytes(bytes_arr)
print("bytearray extend/concat -> took {} sec".format(time() - st))
The benchmark on my machine (Windows 10, 7th-gen Core i7) shows:
string concat -> took 67.28 sec
bytearray extend/concat -> took 0.089 sec
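The code is fairly self-explanatory: instead of using string += next_block, use bytearray.extend(next_block). Since bytes objects are immutable, each += has to build a new object and copy everything accumulated so far, whereas a bytearray grows in place. After building the bytearray, call bytes(bytearray) to get the final bytes string.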
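Applied to the loop from the question, this could look roughly like the sketch below. FakeAudio is a hypothetical stand-in for the asker's audio object, written only so the example runs; the only parts taken from the question are the ends() and next_buffer() method names. The helper name collect_audio is also an assumption.

```python
class FakeAudio:
    """Hypothetical stand-in for the audio source in the question."""

    def __init__(self, n_chunks, chunk):
        self._remaining = n_chunks
        self._chunk = chunk

    def ends(self):
        return self._remaining == 0

    def next_buffer(self):
        self._remaining -= 1
        return self._chunk


def collect_audio(audio):
    """Accumulate every buffer into a bytearray, then convert once."""
    buf = bytearray()
    while not audio.ends():
        buf.extend(audio.next_buffer())  # appends in place
        # any per-chunk processing from the question would go here
    return bytes(buf)  # single final copy into an immutable bytes object


all_audio = collect_audio(FakeAudio(1000, b'\x02\x03\x05\x07' * 500))
print(len(all_audio))
```

If the rest of the program accepts a bytearray, the final bytes() conversion can be skipped, which saves one copy.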
One approach you could try and measure would be to use bytes.join:
all_audio = b''.join(chunks)
The reason this might be faster is that join does a pre-pass over the chunks to find out how big all_audio needs to be, allocates exactly that size once, and then copies every chunk into place in one go.
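Since the suggestion is to measure, here is a minimal sketch that times both approaches with timeit on the same data. The chunk count, the number of repetitions, and the function names with_join and with_bytearray are arbitrary choices for illustration; actual timings will vary by machine and Python version.

```python
from timeit import timeit

chunk = b'\x02\x03\x05\x07' * 500
chunks = [chunk] * 1000  # assumed test data: 1000 chunks of 2000 bytes each


def with_join(parts):
    """Concatenate with a single bytes.join call."""
    return b''.join(parts)


def with_bytearray(parts):
    """Concatenate by extending a bytearray, then converting to bytes."""
    buf = bytearray()
    for part in parts:
        buf.extend(part)
    return bytes(buf)


# Both approaches must produce identical output before timing them.
assert with_join(chunks) == with_bytearray(chunks)

for fn in (with_join, with_bytearray):
    elapsed = timeit(lambda: fn(chunks), number=20)
    print("{} -> {:.4f} sec for 20 runs".format(fn.__name__, elapsed))
```

join fits best when all chunks are already in a list, as in the question; the bytearray version is the natural choice when chunks arrive one at a time and no list is kept.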