How is the gzip file size encoded?

Question

The gzip file format stores the uncompressed (original) file size in the last 4 bytes of the compressed file. The "gzip -l" command reports the compressed and uncompressed sizes, the compression ratio, and the original filename.

Looking around Stack Overflow, I found a couple of mentions of decoding the size from the last 4 bytes.

What is the encoding of the size? Is it big-endian (most significant byte first) or little-endian (least significant byte first), and is the value signed or unsigned?

This code snippet seems to work for me:

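FILE* fh; // assume the file was already opened with fopen(path, "rb")
unsigned char szbuf[4];
struct stat statbuf;
fstat(fileno(fh), &statbuf); // fstat() takes a file descriptor, not a FILE*
long clen = (long)statbuf.st_size; // compressed length in bytes
fseek(fh, clen - 4, SEEK_SET); // the size occupies the last 4 bytes
size_t count = fread(szbuf, 1, 4, fh); // should return 4
// Assemble little-endian: szbuf[0] is the least significant byte.
// The casts avoid shifting a promoted int into its sign bit.
unsigned long ulen = ((unsigned long)szbuf[3] << 24) | ((unsigned long)szbuf[2] << 16) | ((unsigned long)szbuf[1] << 8) | (unsigned long)szbuf[0];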

Here are a couple of related posts, which seem to imply little-endian and unsigned long (0..4GB-1):

Determine uncompressed size of GZIP file

GZIPOutputStream not updating Gzip size bytes

Determine size of file in gzip

Gzip.org has more information about Gzip

Medinoc · Accepted Answer

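RFC 1952 defines the trailing ISIZE field as the size of the original (uncompressed) input modulo 2^32, which means an unsigned 32-bit value (uint32_t). Experimenting with a .NET GZipStream shows it is stored little-endian, which matches the RFC's rule that multi-byte numbers are stored least significant byte first.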

RFC 1952
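To make the answer concrete, here is a minimal sketch in C of reading that field as an unsigned 32-bit little-endian value. The function names gzip_decode_isize and gzip_read_isize are made up for illustration, and the sketch assumes a platform where fseek() with SEEK_END works on binary streams (POSIX systems do).

```c
#include <stdint.h>
#include <stdio.h>

/* Decode the 4-byte ISIZE trailer: least significant byte first.
   Each byte is widened to uint32_t before shifting so that the
   top byte never lands in the sign bit of a signed int. */
uint32_t gzip_decode_isize(const unsigned char b[4])
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* Read ISIZE from the last 4 bytes of the file at `path`.
   Returns 0 on success and stores the value in *out;
   returns -1 if the file cannot be opened, sought, or read. */
int gzip_read_isize(const char *path, uint32_t *out)
{
    unsigned char tail[4];
    FILE *fh = fopen(path, "rb");
    if (fh == NULL)
        return -1;

    int ok = fseek(fh, -4L, SEEK_END) == 0
          && fread(tail, 1, sizeof tail, fh) == sizeof tail;
    fclose(fh);
    if (!ok)
        return -1;

    *out = gzip_decode_isize(tail);
    return 0;
}
```

Because the stored value is the size modulo 2^32, an original file of 4 GiB or more wraps around, so the result is only a lower bound for large inputs. For a file made of several concatenated gzip members, the trailer describes only the last member.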

Tags: c++, c, encoding, gzip

ChuckCottrill
