The gzip file format contains the (uncompressed/original) file size encoded in the last 4 bytes of the compressed file. The "gzip -l" command reports the compressed and uncompressed sizes, the compression ratio, the original filename.
Looking around stackoverflow, there are a couple of mentions of decoding the size encoded in the last 4 bytes.
What is the encoding of the size? Big-endian (most significant byte first), Little-endian (least significant byte first), and is the value signed or unsigned?
This code snippet seems to be working for me,
FILE* fh; //assume file handle opened
unsigned char szbuf[4];
struct stat statbuf;
fstat(fn,&statbuf);
unsigned long clen=statbuf.st_size;
fseek(fh,clen-4,SEEK_SET);
int count=fread(szbuf,1,4,fh);
unsigned long ulen = ((((((szbuf[4-1] << 8) | szbuf[3-1]) << 8) | szbuf[2-1]) << 8) | szbuf[1-1]);
Here are a couple of related posts, which seem to imply little-endian, and unsigned long (0..4GB-1).
Determine uncompressed size of GZIP file
GZIPOutputStream not updating Gzip size bytes
Determine size of file in gzip
Gzip.org has more information about Gzip
RFC says it's modulo 2^32 which means uint32_t, and experimentation using a .Net GZipStream gives it as little-endian.
RFC 1952
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With