I am using Java to generate the MD5 hash for some files. I need to generate one MD5 for several files with a total size of about 1 gigabyte. Here's my code:
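import java.io.IOException;
import java.io.SequenceInputStream;
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

private String generateMD5(SequenceInputStream inputStream){
    if(inputStream==null){
        return null;
    }
    MessageDigest md;
    try {
        int read =0;
        byte[] buf = new byte[2048];
        md = MessageDigest.getInstance("MD5");
        // read() returns -1 at end of stream
        while((read = inputStream.read(buf))!=-1){
            md.update(buf,0,read);
        }
        byte[] hashValue = md.digest();
        // Format the 16 raw digest bytes as 32 hex characters;
        // new String(hashValue) would decode them as text and garble the hash
        return String.format("%032x", new BigInteger(1, hashValue));
    } catch (NoSuchAlgorithmException e) {
        return null;
    } catch (IOException e) {
        return null;
    }finally{
        try {
            inputStream.close();
        } catch (IOException e) {
            // ...
        }
    } 
}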
This seems to run forever. How can I make it more efficient?
It generally takes 3-4 hours to transfer via NC and then 40 minutes to get the md5sum. The security of the hash is not an issue in this case.
Although originally designed as a cryptographic message authentication code algorithm for use on the internet, MD5 hashing is no longer considered reliable for use as a cryptographic checksum because security experts have demonstrated techniques capable of easily producing MD5 collisions on commercial off-the-shelf ...
You may want to use the Fast MD5 library. It's much faster than Java's built-in MD5 provider and getting a hash is as simple as:
String hash = MD5.asHex(MD5.getHash(new File(filename)));
Be aware that the slow speed may also be due to slow File I/O.
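If file I/O is part of the problem, one cheap thing to try with the built-in provider is a much larger read buffer than 2048 bytes. The following is only a sketch: the class name Md5Sketch, the helper names, and the 1 MiB buffer size are assumptions made for illustration, not measured optimum values, and the gain (if any) depends on your disk and JVM.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Md5Sketch {
    // Assumed buffer size (1 MiB); tune it by measuring on your own hardware.
    private static final int BUFFER_SIZE = 1 << 20;

    // Reads the stream to its end and returns the MD5 digest as 32 hex characters.
    // Checked exceptions are wrapped so the caller decides how to handle failure.
    static String md5Hex(InputStream in) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] buf = new byte[BUFFER_SIZE];
            int read;
            while ((read = in.read(buf)) != -1) {
                md.update(buf, 0, read);
            }
            return toHex(md.digest());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Converts raw digest bytes to lowercase hex, two characters per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

To hash several files as one stream, as in the question, pass a SequenceInputStream built over the individual file streams to md5Hex and close it afterwards.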
I rewrote your code with NIO. It looks roughly like this:
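import java.io.FileInputStream;
import java.io.IOException;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

private static String generateMD5(FileInputStream inputStream){
    if(inputStream==null){
        return null;
    }
    MessageDigest md;
    try {
        md = MessageDigest.getInstance("MD5");
        FileChannel channel = inputStream.getChannel();
        ByteBuffer buff = ByteBuffer.allocate(2048);
        while(channel.read(buff) != -1)
        {
            // Switch the buffer to reading, feed it to the digest, then reset it
            buff.flip();
            md.update(buff);
            buff.clear();
        }
        byte[] hashValue = md.digest();
        // Format the 16 raw digest bytes as 32 hex characters;
        // new String(hashValue) would decode them as text and garble the hash
        return String.format("%032x", new BigInteger(1, hashValue));
    }
    catch (NoSuchAlgorithmException e)
    {
        return null;
    } 
    catch (IOException e) 
    {
        return null;
    }
    finally
    {
        try {
            // Closing the stream also closes its channel
            inputStream.close();
        } catch (IOException e) {
        }
    } 
}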
On my machine, it takes about 30 s to generate the MD5 hash for a large file. I also tested your original code, and the results indicate that NIO doesn't improve the performance of the program.
I then timed the I/O and the MD5 computation separately. The measurements indicate that slow file I/O is the bottleneck: about 5/6 of the total time is spent on I/O.
With the Fast MD5 library mentioned by @Sticky, it takes only 15 s to generate the MD5 hash, which is a remarkable improvement.
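For anyone who wants to repeat the I/O versus hashing measurement, here is a rough sketch of one way to do it. It is not the exact code behind the numbers above; the class name Md5TimingSketch, the method name measure, and the 64 KiB buffer are assumptions made for illustration. Keep in mind that the operating system's file cache can make a second run over the same file look much faster than the first.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Md5TimingSketch {
    // Returns {nanoseconds spent in read(), nanoseconds spent in update()/digest()}.
    static long[] measure(InputStream in) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] buf = new byte[64 * 1024]; // assumed buffer size
            long ioNanos = 0;
            long digestNanos = 0;
            while (true) {
                long t0 = System.nanoTime();
                int read = in.read(buf);
                long t1 = System.nanoTime();
                ioNanos += t1 - t0;
                if (read == -1) {
                    break;
                }
                md.update(buf, 0, read);
                digestNanos += System.nanoTime() - t1;
            }
            long t2 = System.nanoTime();
            md.digest(); // finalizing the hash counts as digest time
            digestNanos += System.nanoTime() - t2;
            return new long[] {ioNanos, digestNanos};
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Comparing the two returned values for a FileInputStream over a large file gives a rough idea of whether the disk or the digest dominates on a given machine.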