I need to run find for roughly 1500 file names and was wondering if there is a way to execute multiple find commands simultaneously.
Right now I do something like this:
for fil in $(cat my_file)
do
    find . -name "$fil" >> outputfile
done
Is there a way to spawn multiple instances of find to speed up the process? Right now it takes about 7 hours to run this loop, one file at a time.
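For example, I was imagining something along these lines (an untested sketch; it assumes GNU xargs, whose -P option runs several commands in parallel):

# run up to 4 find instances at a time, one per file name in my_file
# (note: parallel finds on a single disk may just compete for I/O,
# and output lines from concurrent finds may interleave)
xargs -P 4 -I{} find . -name '{}' < my_file > outputfile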
Given the 7-hour runtime you mention, I presume the file system contains some millions of files, so that the OS disk buffers loaded by one query are evicted before the next query begins. You can test this hypothesis by timing the same find a few times, as in the following example:
tini ~ > time find . -name IMG_0772.JPG -ls
25430459 9504 lrwxrwxrwx 1 omg omg 9732338 Aug 1 01:33 ./pix/rainbow/IMG_0772.JPG
20341373 5024 -rwxr-xr-x 1 omg omg 5144339 Apr 22 2009 ./pc/2009-04/IMG_0772.JPG
22678808 2848 -rwxr-xr-x 1 omg omg 2916237 Jul 21 21:03 ./pc/2012-07/IMG_0772.JPG
real 0m15.823s
user 0m0.908s
sys 0m1.608s
tini ~ > time find . -name IMG_0772.JPG -ls
25430459 9504 lrwxrwxrwx 1 omg omg 9732338 Aug 1 01:33 ./pix/rainbow/IMG_0772.JPG
20341373 5024 -rwxr-xr-x 1 omg omg 5144339 Apr 22 2009 ./pc/2009-04/IMG_0772.JPG
22678808 2848 -rwxr-xr-x 1 omg omg 2916237 Jul 21 21:03 ./pc/2012-07/IMG_0772.JPG
real 0m0.715s
user 0m0.340s
sys 0m0.368s
In the example, the second find ran much faster because the OS still had buffers in RAM from the first find. [On my small Linux 3.2.0-32 system, according to top, at the moment 2.5 GB of RAM is buffers, 0.3 GB is free, and 3.8 GB is in use (i.e. about 1.3 GB for programs and the OS).]
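As an aside, if you want to reproduce the slow cold-cache timing on demand, on Linux you can ask the kernel to drop its page cache between runs (as root):

sync                                 # flush dirty buffers to disk first
echo 3 > /proc/sys/vm/drop_caches    # drop page cache, dentries, and inodes
time find . -name IMG_0772.JPG -ls   # this run now starts with cold buffers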
Anyhow, to speed up processing, you need to find a way to make better use of OS disk buffering. For example, double or quadruple your system memory. As an alternative, try the locate command. The query
time locate IMG_0772.JPG
consistently takes under a second on my system. You may wish to run updatedb just before starting the job that looks up the 1500 file names; see man updatedb. If the directory . in your find commands covers only a small part of the overall file system, so that the locate database includes numerous irrelevant files, use the various prune options when you run updatedb to minimize the size of the database that locate has to scan; afterwards, run a plain updatedb to restore the other filenames to the locate database. Using locate you probably can cut the run time to 20 minutes.
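Putting the pieces together, the whole job might look something like the sketch below (untested; it reuses my_file and outputfile from your question, and note that locate matches substrings of path names, unlike find -name's exact match on the base name):

updatedb                     # refresh the locate database first (may need root)
while IFS= read -r fil
do
    locate -b "$fil"         # -b: match against the base name only
done < my_file > outputfile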