I am using the PIGZ library. https://zlib.net/pigz/
I compressed large files using multiple threads per file with this library and now I want to decompress those files using multiple threads per file too. As per the documentation:
Decompression can’t be parallelized, at least not without specially prepared deflate streams for that purpose.
However, the documentation doesn't specify how to do that, and I'm finding it difficult to find information on this.
How would I create these "specially prepared deflate streams" that PIGZ can utilise for decompression?
pigz does not currently support parallel decompression, so it wouldn't help to specially prepare such a deflate stream.
The main reason this has not been implemented is that, in most situations, decompression is fast enough to be I/O bound, not processor bound. That is not the case for compression, which can be much slower than decompression, and where parallel compression can therefore speed things up quite a bit.
You could write your own parallel decompressor using zlib and pthreads. pigz 2.3.4 and later will in fact make a specially prepared stream for parallel decompression when given the --independent (-i) option. That makes the blocks independently decompressible and puts two sync markers in front of each, so they can be found quickly by scanning the compressed data.

The uncompressed size of a block is set with --blocksize or -b. You might want to make that size larger than the default, e.g. 1M instead of 128K, to reduce the compression impact of using -i. Some testing will tell you how much your compression ratio is reduced by using -i.
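The independent-block idea can be sketched in Python with zlib. This is not pigz's actual on-disk format (a real decompressor would also parse the gzip header and scan the stream for the markers); it is a minimal illustration where a Z_FULL_FLUSH stands in for pigz's block boundaries: it emits the 00 00 ff ff sync marker and resets the compressor's history, so each fragment can be inflated on its own, here on separate threads:

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

# Hypothetical input: three chunks to compress as one raw deflate stream
# made of independently decompressible blocks (the same idea as pigz -i).
chunks = [b"alpha " * 20000, b"beta " * 20000, b"gamma " * 20000]

comp = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=-15)  # raw deflate
parts = []
for c in chunks:
    # Z_FULL_FLUSH flushes a 00 00 ff ff sync marker and resets the
    # compression history, so decompression can restart at this point.
    parts.append(comp.compress(c) + comp.flush(zlib.Z_FULL_FLUSH))

def inflate(part):
    # Each part is a self-contained raw deflate fragment, so a fresh
    # decompressor can handle it without seeing the earlier parts.
    return zlib.decompressobj(wbits=-15).decompress(part)

with ThreadPoolExecutor() as pool:
    recovered = list(pool.map(inflate, parts))

assert recovered == chunks
```

A real parallel decompressor would locate the markers by scanning the compressed data, then hand each block to a worker as above; CPython's zlib releases the GIL during (de)compression, so threads do give real parallelism here.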
(By the way, pigz is not a library, it is a command-line utility.)