
How to combine sha hashes?

I have ordered chunks of data, each hashed individually with sha256. I want to combine those hashes into one sha256 hash. Should I just feed the hashes into sha256 as data, or is there another way that's better from a math/crypto standpoint? It might seem like a trivial question, but intuitions are often wrong when it comes to crypto.

Edit: The purpose of this is to form a sort of blockchain, although that term is pretty overloaded these days. It's for integrity purposes, not proof of work. The idea is to hash the blocks at the follower nodes, combine the hashes into one on the cluster leader to get a hash representing the chain as a whole, and then prepend that to the new blocks to be hashed.

It's a little odd in that it's a distributed system, so the "whole chain hash" is usually a little stale. I know the hash representing the chain, as known to a node, at the time a block was created at that node, but several blocks could "hook onto the chain" at that particular hash. Those are then ordered and combined into the system hash, which eventually gets prepended to new blocks.

I'm using Go, if that matters.

Eloff asked Oct 14 '25 15:10



1 Answer

If you are trying to recreate the hash of a large payload (e.g. a 1 GB file) that has been split into chunks (e.g. 10 MB each), the hash (MD5, SHA-256, etc.) needs to be computed over the entire payload. So in this example, you cannot combine the 100 chunk hashes to recreate the hash of the original file. However...

You could send 2 values with each chunk:

  • the individual chunk's hash (like you are doing now)
  • the intermediate hash state, as your service sweeps through the file to create each chunk payload: at the beginning and end of the chunk

As the chunks are streamed in, one can verify the seams: the hash state at the end of chunk N must match the hash state at the beginning of chunk N+1.

Finalizing the hash state at the end of the last chunk yields the hash for the entire payload.

Why do it like this? Because the hash can be computed in real time as the file chunks are received, rather than as a separate, time-consuming process after all the chunks have arrived.


Edit: based on comments:

Here's a crude hash-state solution:

Create a large random file (100MB):

dd if=/dev/urandom of=large.bin bs=1048576 count=100

Using an external tool to verify the hash:

$ shasum -a 256 large.bin 
4cc76e41bbd82a05f97fc03c7eb3d1f5d98f4e7e24248d7944f8caaf8dc55c5c  large.bin

Running this playground code on the above file:

...
...
...
offset: 102760448   hash: 8ae7928735716a60ae0c4e923b8f0db8f33a5b89f6b697093ea97f003c85bb56  state: 736861032a24f8927fc4aa17527e1919aba8ea40c0407d5452c752a82a99c06149fd8d35000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000006200000
offset: 103809024   hash: fbbfd2794cd944b276a04a89b49a5e2c8006ced9ff710cc044bed949fee5899f  state: 73686103bdde167db6a5b09ebc69a5abce51176e635add81e190aa64edceb280f82d6c08000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000006300000
offset: 104857600   hash: 4cc76e41bbd82a05f97fc03c7eb3d1f5d98f4e7e24248d7944f8caaf8dc55c5c  state: 73686103c29dbc4aaaa7aa1ce65b9dfccbf0e3a18a89c95fd50c1e02ac1c73271cfdc3e0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000006400000

The final hash matches.

Trying with an offset and an intermediate hash state. The program seeks to this offset in the file and resumes the hash calculation from that point:

$ ./hash -o 102760448 -s "736861032a24f8927fc4aa17527e1919aba8ea40c0407d5452c752a82a99c06149fd8d35000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000006200000"
offset:  103809024  hash: fbbfd2794cd944b276a04a89b49a5e2c8006ced9ff710cc044bed949fee5899f  state: 73686103bdde167db6a5b09ebc69a5abce51176e635add81e190aa64edceb280f82d6c08000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000006300000
offset:  104857600  hash: 4cc76e41bbd82a05f97fc03c7eb3d1f5d98f4e7e24248d7944f8caaf8dc55c5c  state: 73686103c29dbc4aaaa7aa1ce65b9dfccbf0e3a18a89c95fd50c1e02ac1c73271cfdc3e0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000006400000

We get the same final hash as before.

Note: this does expose the hash internal state, so be mindful of the security implications this may entail. With a large chunk-size, this should not be an issue.

colm.anseo answered Oct 17 '25 11:10



