I have a list of configuration files:
cfg1.cfg
cfg2.cfg
cfg3.cfg
cfg4.cfg
cfg5.cfg
cfg6.cfg
cfg7.cfg
...
that serve as input for two scripts:
script1.sh
script2.sh
which I run sequentially as follows:
script1.sh cfgX.cfg && script2.sh cfgX.cfg
where X=1, 2, 3, ...
These scripts are not parallelised and take a long time to run. How can I launch them in parallel, let's say 4 at a time, so that I do not kill the server I run them on?
For just one script I tried a brute force approach similar to:
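export COUNTER_LIMIT=4
export COUNTER=1
for each in *.cfg                       # glob instead of parsing ls output
do
    INSTRUCTION="./script1.sh $each"
    if (( COUNTER >= COUNTER_LIMIT ))
    then
        # every COUNTER_LIMIT-th job runs in the foreground, followed by a 600 s pause
        $INSTRUCTION &&
        export COUNTER=$(( COUNTER - COUNTER_LIMIT ))
        echo
        sleep 600s
    else
        # all other jobs are started in the background, 5 s apart
        $INSTRUCTION &
        sleep 5s
    fi
    echo "$COUNTER"
    export COUNTER=$(( COUNTER + 1 ))
done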
(The sleeps are there because, for some reason, the scripts cannot be started at the same time.)
So, how can I make sure that the double ampersand in
script1.sh cfgX.cfg && script2.sh cfgX.cfg
doesn't block the brute-force parallelisation?
I also accept better and simpler approaches ;)
Cheers, jorge
UPDATE
I should have mentioned that the config files are not necessarily named sequentially and can have any name; I just named them like this to keep the example as simple as possible.
parallel --jobs 4 \
--load 50% \
--bar \
--eta "( echo 1st-for-{}; echo 2nd-for-{} )" < aListOfAdHocArguments.txt
0% 0:5=0s
1st-for-Abraca
2nd-for-Abraca
20% 1:4=0s
1st-for-Dabra
2nd-for-Dabra
40% 2:3=0s
1st-for-Hergot
2nd-for-Hergot
60% 3:2=0s
1st-for-Fagot
2nd-for-Fagot
80% 4:1=0s
100% 5:0=0s
Q : How can I launch them in parallel, let's say 4 at the time, so I do not kill the server where I run them?
A lovely task for GNU parallel.
First, let's check the localhost ecosystem (exosystems, i.e. executing parallel jobs on ssh-connected remote hosts, are possible, yet exceed the scope of this post):
parallel --number-of-cpus
parallel --number-of-cores
parallel --show-limits
For configuration options beyond --jobs 4, e.g. --memfree or --noswap, --load <max-load>, --keep-order, and --results <aFile> or --output-as-files, see:
man parallel
parallel --jobs 4 \
--bar \
--eta "( script1.sh cfg{}.cfg && script2.sh cfg{}.cfg )" ::: {1..123}
Here the two scripts are emulated by just a pair of tandem echos over down-counted indexes, so the progress bar is barely visible and the Estimated-Time-of-Arrival (--eta) indications are almost instant:
parallel --jobs 4 \
--load 50% \
--bar \
--eta "( echo 1st-for-cfg-{}; echo 2nd-for-cfg-{} )" ::: {10..0}
0% 0:11=0s 7
1st-for-cfg-10
2nd-for-cfg-10
9% 1:10=0s 6
1st-for-cfg-9
2nd-for-cfg-9
18% 2:9=0s 5
1st-for-cfg-8
2nd-for-cfg-8
27% 3:8=0s 4
1st-for-cfg-7
2nd-for-cfg-7
36% 4:7=0s 3
1st-for-cfg-6
2nd-for-cfg-6
45% 5:6=0s 2
1st-for-cfg-5
2nd-for-cfg-5
54% 6:5=0s 1
1st-for-cfg-4
2nd-for-cfg-4
63% 7:4=0s 0
1st-for-cfg-3
2nd-for-cfg-3
72% 8:3=0s 0
1st-for-cfg-2
2nd-for-cfg-2
81% 9:2=0s 0
1st-for-cfg-1
2nd-for-cfg-1
90% 10:1=0s 0
1st-for-cfg-0
2nd-for-cfg-0
You added:
I should have mentioned that the config files are not necessarily sequentially named and can have any name, I just made them like this to make the example as simple as possible.
Feeding the argument list on standard input (< list_of_arguments) solves this ex-post change of the problem definition:
parallel [options] [command [arguments]] < list_of_arguments
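This is fairly simple with find and xargs. The following runs four processes in parallel and, for any given config file, completes script1.sh before running script2.sh (which only runs if script1.sh succeeded). The file name is passed to sh -c as a positional parameter ($1) instead of being spliced into the command string, so names containing spaces or quotes stay intact:
find . -name '*.cfg' -print0 | xargs -0 -P 4 -n 1 sh -c 'script1.sh "$1" && script2.sh "$1"' sh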
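If no extra tool is wanted at all, the brute-force loop from the question can be fixed in plain bash by backgrounding the whole && chain as one unit and throttling with wait -n (bash 4.3 or newer). This is a minimal sketch, not a drop-in replacement: MAX_JOBS, START_DELAY and the helper names run_chain and throttled_run are hypothetical, and the stub scripts only exist so that the example runs on its own.

```shell
#!/usr/bin/env bash
# Sketch without extra tools (needs bash >= 4.3 for `wait -n`):
# run "script1.sh CFG && script2.sh CFG" for every *.cfg file,
# keeping at most MAX_JOBS chains running at once.
MAX_JOBS=4        # hypothetical values, tune for the server
START_DELAY=0.2   # pause between starts, like the question's "sleep 5s"

run_chain() {
    # The whole && chain is backgrounded as one unit, so script2.sh waits
    # only for script1.sh on the same file, not for other files.
    ./script1.sh "$1" && ./script2.sh "$1"
}

throttled_run() {
    local cfg running=0
    for cfg in "$@"; do
        if (( running >= MAX_JOBS )); then
            wait -n                      # returns when any one chain finishes
            running=$(( running - 1 ))
        fi
        run_chain "$cfg" &
        running=$(( running + 1 ))
        sleep "$START_DELAY"
    done
    wait                                 # wait for the chains still running
}

# --- self-contained demo with hypothetical stub scripts ---
workdir=$(mktemp -d)
cd "$workdir" || exit 1
log="$workdir/run.log"
cat > script1.sh <<EOF
#!/bin/sh
echo "1 \$1" >> "$log"
sleep 0.3
case "\$1" in *broken*) exit 1 ;; esac
EOF
cat > script2.sh <<EOF
#!/bin/sh
echo "2 \$1" >> "$log"
EOF
chmod +x script1.sh script2.sh
touch a.cfg b.cfg c.cfg d.cfg e.cfg 'my config.cfg' broken.cfg

shopt -s nullglob                        # no .cfg files -> empty loop, not "*.cfg"
throttled_run *.cfg
```

With the real scripts, drop the demo part and call throttled_run *.cfg from the directory holding the config files, e.g. with START_DELAY=5.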