I use this bash command to search for files and run md5sum on them on my local system. In my opinion it performs badly on large vendor directories. Is there a faster alternative to chaining pipe after pipe?
find ./vendor -type f -print0 | sort -z | xargs -0 md5sum | grep -vf /usr/local/bin/vchecker_ignore > MD5sums
sort introduces blocking here: it has to wait until find completes before it can output anything. find on a large filesystem, especially on an HDD or over NFS, may take a while.
You may want to sort at the very end instead, so that md5sum can run in parallel with find, e.g.:
find ./vendor -type f -print0 | xargs -0 md5sum | grep -vf /usr/local/bin/vchecker_ignore | sort -k2 > MD5sums
md5sum may take some time on large files. You may want to run it with GNU parallel instead of xargs if there are many files or the files are large.
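Spreading the hashing across workers can be sketched with GNU xargs's -P flag, which behaves much like GNU parallel for this job (the throwaway demo/vendor tree below stands in for the real ./vendor):

```shell
# Demo setup: a tiny stand-in for the real ./vendor tree.
mkdir -p demo/vendor/a demo/vendor/b
printf 'one' > demo/vendor/a/one.txt
printf 'two' > demo/vendor/b/two.txt

# Hash batches of up to 50 files with up to 4 concurrent workers (GNU xargs -P).
# The GNU parallel equivalent would be:
#   find demo/vendor -type f -print0 | parallel -0 -X md5sum
find demo/vendor -type f -print0 \
  | xargs -0 -P4 -n50 md5sum \
  | sort -k2 > MD5sums

cat MD5sums
```

Because concurrent workers finish in arbitrary order, the trailing sort -k2 is what restores a deterministic listing.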
You may also want to experiment with line-buffered mode. In that case you have to switch from NUL delimiters to newline delimiters for the filenames (which prohibits newlines in filenames, a rather unusual restriction to hit in practice), since line buffering only works with newline-terminated records. E.g.:
stdbuf -oL find ./vendor -type f | stdbuf -oL grep -vf /usr/local/bin/vchecker_ignore | xargs -n50 -d'\n' md5sum | sort -k2 > MD5sums
The above command filters each filename through grep first and then runs md5sum on batches of 50 files. For small files you may want larger batches (and perhaps to drop both stdbuf -oL entirely); for large files, smaller ones.
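If you want the early filtering without giving up NUL delimiters, GNU grep's -z (--null-data) option treats input and output records as NUL-terminated, so unusual filenames stay safe. A sketch, again with a throwaway demo tree and a hypothetical ignore-pattern file standing in for /usr/local/bin/vchecker_ignore:

```shell
# Demo setup: one file to keep, one whose path matches an ignore pattern.
mkdir -p demo2/vendor
printf 'keep' > demo2/vendor/keep.txt
printf 'skip' > demo2/vendor/skip.txt
printf 'skip' > demo2/ignore_patterns   # stand-in for vchecker_ignore

# grep -z keeps the stream NUL-delimited end to end (GNU grep),
# so xargs -0 still handles every possible filename correctly.
find demo2/vendor -type f -print0 \
  | grep -z -vf demo2/ignore_patterns \
  | xargs -0 -n50 md5sum \
  | sort -k2 > MD5sums2

cat MD5sums2
```

This filters before hashing, like the line-buffered variant above, but keeps the original 0-delimiter semantics throughout the pipeline.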