Incremental Checksums

Tags: checksum

I am looking for a checksum algorithm where, for a large block of data, the checksum is equal to the sum of the checksums of all the smaller component blocks. Most of what I have found comes from RFCs 1141 and 1624, which do provide this functionality. Does anyone have experience with these checksumming techniques or a similar one?


asked Jul 23 '09 18:07

Steve Severance

2 Answers

If it's just a matter of quickly combining the checksums of the smaller blocks to get the checksum of the larger message (not necessarily by plain summation), you can do this with a CRC-type (or similar) algorithm.

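The core of the CRC-32 algorithm is as simple as this one-bit update step (the widely used CRC-32 variant additionally applies a non-zero initial value, bit reflection and a final XOR, which are left out here):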

uint32_t update(uint32_t state, unsigned bit)
{
    if (((state >> 31) ^ bit) & 1) state = (state << 1) ^ 0x04C11DB7;
    else                           state = (state << 1);
    return state;
}
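To make the bit step concrete, here is a minimal sketch that feeds a whole buffer through it. The helper name `checksum`, the zero initial state and the most-significant-bit-first order are assumptions of this sketch, not something the answer specifies, so the result will not match the CRC-32 values produced by zlib or similar libraries.

```c
#include <stddef.h>
#include <stdint.h>

/* One-bit step, as given in the answer. */
uint32_t update(uint32_t state, unsigned bit)
{
    if (((state >> 31) ^ bit) & 1) state = (state << 1) ^ 0x04C11DB7u;
    else                           state = (state << 1);
    return state;
}

/* Hypothetical helper: raw checksum of a buffer.
   Assumptions: state starts at zero, each byte is fed MSB first,
   and no final XOR is applied. */
uint32_t checksum(const unsigned char *data, size_t len)
{
    uint32_t state = 0;
    for (size_t i = 0; i < len; ++i)
        for (int k = 7; k >= 0; --k)
            state = update(state, (data[i] >> k) & 1u);
    return state;
}
```

With these assumptions, the single byte 0x01 yields 0x04C11DB7, because the message polynomial 1 times x^32 reduces to the low 32 bits of G.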

Mathematically, the state represents a polynomial over the field GF(2) that is always kept reduced modulo the generator polynomial. Given a new bit b, the old state is transformed into the new state like this:

state --> (state * x^1 + b * x^32) mod G

where G is the generator polynomial and addition is done in GF(2), i.e. with XOR. This checksum is linear in the sense that you can write the message M as a sum (XOR) of equally long messages A, B, C, ... like this

  10110010 00000000 00000000 = A =    a     00000000 00000000
  00000000 10010001 00000000 = B = 00000000    b     00000000
  00000000 00000000 11000101 = C = 00000000 00000000    c
-------------------------------------------------------------
= 10110010 10010001 11000101 = M =    a        b        c

with the following properties

         M  =          A  +          B  +          C
checksum(M) = checksum(A) + checksum(B) + checksum(C)

Again, the + here means addition in GF(2), which you can implement with a bitwise XOR.
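The linearity property can be checked on the three-byte example above. The following is only a sketch: the helper `checksum` and the function `linearity_holds` are hypothetical names, and the property relies on the state starting at zero (with a non-zero initial value the plain XOR of three checksums would no longer match).

```c
#include <stddef.h>
#include <stdint.h>

/* One-bit step, as given in the answer. */
uint32_t update(uint32_t state, unsigned bit)
{
    if (((state >> 31) ^ bit) & 1) state = (state << 1) ^ 0x04C11DB7u;
    else                           state = (state << 1);
    return state;
}

/* Hypothetical helper: zero initial state, bytes fed MSB first. */
uint32_t checksum(const unsigned char *data, size_t len)
{
    uint32_t state = 0;
    for (size_t i = 0; i < len; ++i)
        for (int k = 7; k >= 0; --k)
            state = update(state, (data[i] >> k) & 1u);
    return state;
}

/* Returns 1 if checksum(M) == checksum(A) ^ checksum(B) ^ checksum(C)
   for the example message M = 10110010 10010001 11000101. */
int linearity_holds(void)
{
    const unsigned char A[3] = {0xB2, 0x00, 0x00};
    const unsigned char B[3] = {0x00, 0x91, 0x00};
    const unsigned char C[3] = {0x00, 0x00, 0xC5};
    const unsigned char M[3] = {0xB2, 0x91, 0xC5};
    return checksum(M, 3) ==
           (checksum(A, 3) ^ checksum(B, 3) ^ checksum(C, 3));
}
```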

Finally, it's possible to compute checksum(B) from checksum(b) and the position of the subblock b within B. Leading zeros are the simple part: they don't affect the checksum at all (assuming the state starts at zero), so checksum(0000xxxx) is the same as checksum(xxxx). Computing the checksum of a message padded with trailing zeros (zeros appended on the right) from the checksum of the unpadded message is a bit more complicated, but not by much:

zero_pad(old_check_sum, number_of_zeros)
  := ( old_check_sum *  x^{number_of_zeros}        ) mod G
   = ( old_check_sum * (x^{number_of_zeros} mod G) ) mod G

So, getting the checksum of a zero-padded message is just a matter of multiplying the "checksum polynomial" of the unpadded message by another polynomial, (x^{number_of_zeros} mod G), that depends only on the number of zeros you want to append. You could precompute these factors in a table or use the square-and-multiply algorithm to compute the power quickly.
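One possible way to put the zero-padding formula to work is sketched below. All function names (`mulmod`, `xpow`, `zero_pad`, `combine`, `checksum`) are hypothetical, and the sketch assumes the raw checksum with a zero initial state and MSB-first bit order. `xpow` uses square-and-multiply, and `combine` joins the checksums of two adjacent blocks by padding the first with as many zero bits as the second block is long and XORing in the second checksum.

```c
#include <stddef.h>
#include <stdint.h>

#define POLY 0x04C11DB7u

/* One-bit step, as given in the answer. */
uint32_t update(uint32_t state, unsigned bit)
{
    if (((state >> 31) ^ bit) & 1) state = (state << 1) ^ POLY;
    else                           state = (state << 1);
    return state;
}

/* Hypothetical helper: zero initial state, bytes fed MSB first. */
uint32_t checksum(const unsigned char *data, size_t len)
{
    uint32_t state = 0;
    for (size_t i = 0; i < len; ++i)
        for (int k = 7; k >= 0; --k)
            state = update(state, (data[i] >> k) & 1u);
    return state;
}

/* (a * b) mod G over GF(2), Horner-style from the top bit of b. */
uint32_t mulmod(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int i = 31; i >= 0; --i) {
        r = (r & 0x80000000u) ? (r << 1) ^ POLY : (r << 1); /* r * x mod G */
        if ((b >> i) & 1u) r ^= a;
    }
    return r;
}

/* x^n mod G by square-and-multiply. */
uint32_t xpow(uint64_t n)
{
    uint32_t result = 1u; /* the polynomial 1 */
    uint32_t base = 2u;   /* the polynomial x */
    while (n) {
        if (n & 1u) result = mulmod(result, base);
        base = mulmod(base, base);
        n >>= 1;
    }
    return result;
}

/* Checksum of the message followed by nbits zero bits. */
uint32_t zero_pad(uint32_t old_check_sum, uint64_t nbits)
{
    return mulmod(old_check_sum, xpow(nbits));
}

/* Checksum of A followed by B, given both checksums and B's bit length. */
uint32_t combine(uint32_t sum_a, uint32_t sum_b, uint64_t b_bits)
{
    return zero_pad(sum_a, b_bits) ^ sum_b;
}
```

For example, `combine(checksum(p, 1), checksum(p + 1, 2), 16)` should equal `checksum(p, 3)` for any three-byte buffer `p`. The cost of `combine` grows with the logarithm of the block length rather than with the length itself.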

Suggested reading: A Painless Guide to CRC Error Detection Algorithms


answered Nov 09 '22 06:11

sellibitze

I have only used Adler/Fletcher checksums which work as you describe.
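As a rough illustration of how an Adler-style checksum can be combined from sub-block checksums, here is a sketch for Adler-32. The function `adler32_join` is a hypothetical name introduced for this example, not an API of any particular library, and the combination is not a plain sum: it also needs the length of the second block.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u

/* Plain Adler-32: a = 1 + sum of bytes, b = sum of all intermediate a. */
uint32_t adler32(const unsigned char *data, size_t len)
{
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; ++i) {
        a = (a + data[i]) % ADLER_MOD;
        b = (b + a) % ADLER_MOD;
    }
    return (b << 16) | a;
}

/* Hypothetical helper: Adler-32 of block1 followed by block2, given
   only the two checksums and the byte length of block2. */
uint32_t adler32_join(uint32_t ad1, uint32_t ad2, size_t len2)
{
    uint32_t a1 = ad1 & 0xFFFFu, b1 = ad1 >> 16;
    uint32_t a2 = ad2 & 0xFFFFu, b2 = ad2 >> 16;

    /* Both a values include the initial 1, so subtract one of them. */
    uint32_t a = (a1 + a2 + ADLER_MOD - 1u) % ADLER_MOD;

    /* Every byte of block2 sees block1's byte sum (a1 - 1) in its a. */
    uint64_t carry = (uint64_t)(len2 % ADLER_MOD) *
                     ((a1 + ADLER_MOD - 1u) % ADLER_MOD);
    uint32_t b = (uint32_t)((b1 + b2 + carry) % ADLER_MOD);

    return (b << 16) | a;
}
```

Joining the checksums of "Wiki" and "pedia" this way gives the same value as computing Adler-32 over "Wikipedia" directly.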

There is a nice comparison of crypto++ hash/checksum implementations here.

answered Nov 09 '22 06:11

Rob Elliott
