I'm interested in optimizing the hashing of some large files (optimizing wall clock time). The I/O has been optimized well enough already and the I/O device (local SSD) is only tapped at about 25% of capacity, while one of the CPU cores is completely maxed-out.
I have more cores available, and in the future will likely have even more cores. So far I've only been able to tap into more cores if I happen to need multiple hashes of the same file, say an MD5 AND a SHA256 at the same time. I can use the same I/O stream to feed two or more hash algorithms, and I get the faster algorithms done for free (as far as wall clock time). As I understand most hash algorithms, each new bit changes the entire result, and it is inherently challenging/impossible to do in parallel.
Are any of the mainstream hash algorithms parallelizable?
 Are there any non-mainstream hashes that are parallelizable (and that have at least a sample implementation available)?  
As future CPUs will trend toward more cores and a leveling off in clock speed, is there any way to improve the performance of file hashing? (other than liquid nitrogen cooled overclocking?) or is it inherently non-parallelizable?
Some common hashing algorithms include MD5, SHA-1, SHA-2, NTLM, and LANMAN. MD5: This is the fifth version of the Message Digest algorithm. MD5 creates 128-bit outputs. MD5 was a very commonly used hashing algorithm.
Probably the one most commonly used is SHA-256, which the National Institute of Standards and Technology (NIST) recommends using instead of MD5 or SHA-1. The SHA-256 algorithm returns hash value of 256-bits, or 64 hexadecimal digits.
SHA-256 is the most famous of all cryptographic hash functions because it's used extensively in blockchain technology. SHA-256 Hashing algorithm was developed by the National Security Agency (NSA) in 2001.
Regardless of use, NIST encourages application and protocol designers to use the SHA-2 family of hash functions for all new applications and protocols.
There is actually a lot of research going on in this area. The US National Institute of Standards and Technology is currently holding a competition to design the next-generation of government-grade hash function. Most of the proposals for that are parallelizable.
One example: http://www.schneier.com/skein1.2.pdf
Wikipedia's description of current status of the contest: http://en.wikipedia.org/wiki/SHA-3
What kind of SSD do you have ? My C implementation of MD5 runs at 400 MB/s on a single Intel Core2 core (2.4 GHz, not the latest Intel). Do you really have SSD which support a bandwidth of 1.6 GB/s ? I want the same !
Tree hashing can be applied on any hash function. There are a few subtleties and the Skein specification tries to deal with them, integrating some metadata in the function itself (this does not change much things for performance), but the "tree mode" of Skein is not "the" Skein as submitted to SHA-3. Even if Skein is selected as SHA-3, the output of a tree-mode hash would not be the same as the output of "plain Skein".
Hopefully, a standard will be defined at some point, to describe generic tree hashing. Right now there is none. However, some protocols have been defined with support for a custom tree hashing with the Tiger hash function, under the name "TTH" (Tiger Tree Hash) or "THEX" (Tree Hash Exchange Format). The specification for TTH appears to be a bit elusive; I find some references to drafts which have either moved or disappeared for good.
Still, I am a bit dubious about the concept. It is kind of neat, but provides a performance boost only if you can read data faster than what a single core can process, and, given the right function and the right implementation, a single core can hash quite a lot of data per second. A tree hash spread over several cores requires having the data sent to the proper cores, and 1.6 GB/s is not the smallest bandwidth ever.
SHA-256 and SHA-512 are not very fast. Among the SHA-3 candidates, assuming an x86 processor in 64-bit mode, some of them achieve high speed (more than 300 MB/s on my 2.4 GHz Intel Core2 Q6600, with a single core -- that's what I can get out of SHA-1, too), e.g. BMW, SHABAL or Skein. Cryptographically speaking, these designs are a bit too new, but MD5 and SHA-1 are already cryptographically "broken" (quite effectively in the case of MD5, rather theoretically for SHA-1) so any of the round-2 SHA-3 candidates should be fine.
When I put my "seer" cap, I foresee that processors will keep on becoming faster than RAM, to the point that hashing cost will be dwarfed out by memory bandwidth: the CPU will have clock cycles to spare while it waits for the data from the main RAM. At some point, the whole threading model (one big RAM for many cores) will have to be amended.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With