I have the following GNU parallel command.
parallel --gnu --jobs 4 \
normalize-by-median.py \
-k 20 -C 20 --paired -N 4 -x 6e9 \
--out pdom-{}-diginorm.fq \
pdom-{}.fq.gz \
::: 200bp 500bp 1kb 3kb 8kb
I would like to compress the output before writing to disk. Normally I would just pipe this to gzip -c, but unfortunately this particular Python script does not have the option to send output to stdout. I then thought I could use process substitution instead. I tried the following.
parallel --gnu --jobs 4 \
normalize-by-median.py \
-k 20 -C 20 --paired -N 4 -x 6e9 \
--out >(gzip -c - > pdom-{}-diginorm.fq.gz) \
pdom-{}.fq.gz \
::: 200bp 500bp 1kb 3kb 8kb
However, the curly braces in this latter example get interpreted literally by the subprocess, rather than as a placeholder for the GNU parallel arguments. Is there any way I can get this to work?
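The problem is that your shell interprets >() before GNU Parallel even starts, so GNU Parallel never sees the {} inside it. Quote the process substitution so that it reaches GNU Parallel as plain text. GNU Parallel then replaces {} and hands the resulting command line to a shell, which performs the process substitution (that shell must support it, e.g. bash):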
parallel --gnu --jobs 4 \
normalize-by-median.py \
-k 20 -C 20 --paired -N 4 -x 6e9 \
--out '>(gzip -c - > pdom-{}-diginorm.fq.gz)' \
pdom-{}.fq.gz \
::: 200bp 500bp 1kb 3kb 8kb
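To see why the quoting matters, here is a minimal sketch of the order of events. It does not call GNU Parallel or normalize-by-median.py: `sed` stands in for GNU Parallel's replacement of {}, `echo` stands in for the Python script, and `200bp` is just one of the arguments from the question. It assumes bash is installed.

```shell
# The single-quoted template stays plain text in the calling shell, braces included.
template='echo --out >(gzip -c > /dev/null) pdom-{}.fq.gz'
printf 'template:          %s\n' "$template"

# Stand-in for GNU Parallel's replacement step (sed is used purely for illustration).
command_line=$(printf '%s' "$template" | sed 's/{}/200bp/g')
printf 'after replacement: %s\n' "$command_line"

# Only now does a shell parse >(...) and perform the process substitution.
# The program receives a path to a pipe (for example under /dev/fd) instead of the >(...) text.
expanded=$(bash -c "$command_line")
printf 'program receives:  %s\n' "$expanded"
```

Without the quotes, the first shell would perform the process substitution immediately, and the literal {} inside it would never reach GNU Parallel.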
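With versions newer than 20140822 you can also pass the file names themselves and use --plus, which enables the {..} replacement string ({} with two extensions removed, so pdom-200bp.fq.gz becomes pdom-200bp):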
parallel --plus --gnu --jobs 4 \
normalize-by-median.py \
-k 20 -C 20 --paired -N 4 -x 6e9 \
--out '>(gzip > {..}-diginorm.fq.gz)' \
{} \
::: pdom-*
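As a rough illustration of what {..} produces, the sketch below emulates it with plain shell parameter expansion. The file name is a hypothetical input matching the question's naming scheme. This only mimics the result and is not how GNU Parallel implements it.

```shell
# Hypothetical input name, following the pdom-<size>.fq.gz pattern from the question.
input='pdom-200bp.fq.gz'

# Emulate {..}: strip the last extension twice (.gz, then .fq).
stem=${input%.*}
stem=${stem%.*}

# Build the output name used in the command above.
output="${stem}-diginorm.fq.gz"
printf '%s -> %s\n' "$input" "$output"
# prints: pdom-200bp.fq.gz -> pdom-200bp-diginorm.fq.gz
```

Running the real command with `parallel --dry-run` prints the generated command lines without executing them, which is a quick way to check the names. Note that the glob pdom-* would also match earlier pdom-*-diginorm.fq.gz outputs if they sit in the same directory, so a narrower pattern may be safer on a rerun.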