We use a script that prints bash commands into a file that is then run on an HPC system. The commands are supposed to run through a large text file containing geographic coordinates separated by whitespace and extract a specific region from that file (e.g. extract all lines with an x coordinate between xmin and xmax and a y coordinate between ymin and ymax).
Ideally, I'd like to use awk for that like so (from memory since I don't have my computer available at the moment):
awk -v xmin=-13000 -v xmax=13000 -v ymin=-500 -v ymax=500 -F ' ' {if ($1 > xmin && $1 < xmax && $2 > ymin && $2 < ymax) print $1 $2} $infile > $outfile
That would probably execute fine. However, as suggested by the title, we save this line indirectly for 25 regions, each with their own xmin, xmax etc. There are more operations following after that (using GMT calls etc). Here's a little snippet:
xmin=-13000
xmax=13000
ymin=-500
ymax=500
infile=./full_file.txt
outfile=./filtered_file.yxy
srcfile=./region_1.txt
echo """awk -v xmin=$xmin -v xmax=$xmax -v ymin=$ymin -v ymax=$ymax -F ' ' {if ($1 > $xmin && $1 < $xmax && $2 > $ymin && $2 < $ymax) print $1 $2} $infile > $outfile""" >> $srcfile
Obviously, this raises errors when run, because the shell expands $1 and $2 before awk ever sees them. I've tried escaping the awk column identifiers, but to no avail; perhaps I didn't understand the pattern correctly. Could someone point me to a solution that allows us to keep the indirect approach?
IIUC, you have to either escape each dollar sign like this:
{if (\$1 > xmin && \$1 < xmax
or temporarily close the double quotes and put the dollar sign in single quotes:
"{if ("'$1'" > xmin && "'$1'" < xmax"
or use the Bash-specific %q printf specifier:
$ read
awk -v xmin=-13000 -v xmax=13000 -v ymin=-500 -v ymax=500 -F ' ' {if ($1 > xmin && $1 < xmax && $2 > ymin && $2 < ymax) print $1 $2} $infile > $outfile
$ printf "%q\n" "$REPLY"
awk\ -v\ xmin=-13000\ -v\ xmax=13000\ -v\ ymin=-500\ -v\ ymax=500\ -F\ \'\ \'\ \{if\ \(\$1\ \>\ xmin\ \&\&\ \$1\ \<\ xmax\ \&\&\ \$2\ \>\ ymin\ \&\&\ \$2\ \<\ ymax\)\ print\ \$1\ \$2\}\ \$infile\ \>\ \$outfile
$ echo awk\ -v\ xmin=-13000\ -v\ xmax=13000\ -v\ ymin=-500\ -v\ ymax=500\ -F\ \'\ \'\ \{if\ \(\$1\ \>\ xmin\ \&\&\ \$1\ \<\ xmax\ \&\&\ \$2\ \>\ ymin\ \&\&\ \$2\ \<\ ymax\)\ print\ \$1\ \$2\}\ \$infile\ \>\ \$outfile
awk -v xmin=-13000 -v xmax=13000 -v ymin=-500 -v ymax=500 -F ' ' {if ($1 > xmin && $1 < xmax && $2 > ymin && $2 < ymax) print $1 $2} $infile > $outfile
I also think it would be good to enclose the awk code in single quotes (') if you don't want the shell to expand variables.
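Combining the backslash escapes with a single-quoted awk program, the echo line from the question could be sketched roughly as follows. This assumes the upper bounds were meant to be xmax and ymax and that a space between the two output columns is wanted (hence print $1, $2); srcfile points at a temporary file here purely for illustration.

```shell
#!/usr/bin/env bash
xmin=-13000 xmax=13000 ymin=-500 ymax=500
infile=./full_file.txt
outfile=./filtered_file.yxy
srcfile=$(mktemp)   # stand-in for ./region_1.txt (assumption for this sketch)

# Inside double quotes the shell expands $xmin, $infile, etc. immediately,
# while \$1 and \$2 are written out as literal $1 and $2 for awk.
# The single quotes are ordinary characters at this point; they only
# protect the awk program when the generated line is run later.
echo "awk -v xmin=$xmin -v xmax=$xmax -v ymin=$ymin -v ymax=$ymax '\$1 > xmin && \$1 < xmax && \$2 > ymin && \$2 < ymax { print \$1, \$2 }' $infile > $outfile" >> "$srcfile"

# Show what was written
cat "$srcfile"
```

The generated line carries the numbers and file names already filled in, so it no longer depends on any shell variables when it runs on the cluster.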
Creating a separate temporary script seems superfluous. Just loop over the parameters.
while read -r xmin xmax ymin ymax infile outfile
do
    awk -v xmin="$xmin" -v xmax="$xmax" -v ymin="$ymin" -v ymax="$ymax" \
        '$1 > xmin && $1 < xmax && $2 > ymin && $2 < ymax { print $1 $2 }' "$infile" > "$outfile"
done <<____
-13000 13000 -500 500 full_file.txt filtered_file.yxy
17 42 19 21 littlefile.txt other.yxy
-27350 27350 -123 123 another.txt moar.yxy
____
The ____ is just a cute alternative to the more conventional EOF heredoc delimiter. Each line in the here document should be one set of values for the variables in the read.
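To see the loop in action, here is a small self-contained sketch. The sample coordinates are made up for illustration, and print $1, $2 is used on the assumption that you want a space between the two columns in the output.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data: one "x y" pair per line
cat > full_file.txt <<'EOF'
-20000 0
-100 10
500 499
500 600
12999 -499
13000 0
EOF

# One region per line of the here document
while read -r xmin xmax ymin ymax infile outfile
do
    awk -v xmin="$xmin" -v xmax="$xmax" -v ymin="$ymin" -v ymax="$ymax" \
        '$1 > xmin && $1 < xmax && $2 > ymin && $2 < ymax { print $1, $2 }' \
        "$infile" > "$outfile"
done <<____
-13000 13000 -500 500 full_file.txt filtered_file.yxy
____

cat filtered_file.yxy
```

With this sample, only the lines -100 10, 500 499 and 12999 -499 survive; the comparisons are strict, so a point sitting exactly on a boundary (13000 0) is dropped.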
If you really want to print each snippet to a separate file (perhaps to submit each to run on a different cluster node, for example), maybe learn to use printf instead of echo.
while read -r xmin xmax ymin ymax infile outfile srcfile
do
    printf 'awk -v xmin="%i" -v xmax="%i" -v ymin="%i" -v ymax="%i" \
'"'"'$1 > xmin && $1 < xmax && $2 > ymin && $2 < ymax { print $1 $2 }'"'"' "./%s" > "./%s"\n' \
        "$xmin" "$xmax" "$ymin" "$ymax" "$infile" "$outfile" >>"./$srcfile"
done <<____
-13000 13000 -500 500 full_file.txt filtered_file.yxy region1.txt
17 42 19 21 littlefile.txt other.yxy region2.txt
-27350 27350 -123 123 another.txt moar.yxy region3.txt
____
(though printing commands to .txt files is still really weird).
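If you go this route, it is probably worth checking a generated file before submitting it. The following is only a sketch with made-up sample data: it writes one command file with the same printf pattern (on a single line, and with print $1, $2 as an assumed output format), parses it with bash -n, and then runs it.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data
printf '%s\n' '-100 10' '500 600' '20000 0' > full_file.txt

# Generate one command file; '"'"' emits a literal single quote
printf 'awk -v xmin="%i" -v xmax="%i" -v ymin="%i" -v ymax="%i" '"'"'$1 > xmin && $1 < xmax && $2 > ymin && $2 < ymax { print $1, $2 }'"'"' "./%s" > "./%s"\n' \
    -13000 13000 -500 500 full_file.txt filtered_file.yxy >> ./region1.txt

bash -n ./region1.txt   # parse only: catches quoting mistakes without running anything
bash ./region1.txt      # actually run the generated command
cat ./filtered_file.yxy
```

Here only -100 10 ends up in filtered_file.yxy, since the other two sample points fall outside the region.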
For what it's worth, the triple quotes in your attempt do nothing useful. Python (for example) has this syntax, but in the shell, """ simply parses into an empty string inside a pair of quotes "" followed by an opening double quote ".
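A quick way to convince yourself (the variable names are just for illustration):

```shell
#!/usr/bin/env bash
name=world

# """...""" is "" (empty) + "..." + "" (empty), so it behaves like plain "..."
triple="""hello $name"""
plain="hello $name"

printf '%s\n' "$triple"
[ "$triple" = "$plain" ] && echo "identical"
```

Variables are expanded in both cases, which is exactly why $1 and $2 disappeared from the generated line in the question.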
Similarly, the printf example above demonstrates one way to produce a literal single quote inside a single-quoted string. 'foo'"'"'bar' is (single-quoted) foo next to double-quoted ' next to single-quoted bar, which when pasted together produces foo'bar.
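As a small illustration, here is that idiom next to a common alternative that uses a backslash-escaped quote outside of any quotes (the second form is not used above; it is shown only for comparison):

```shell
#!/usr/bin/env bash
a='foo'"'"'bar'   # 'foo' + "'" + 'bar'
b='foo'\''bar'    # 'foo' + \' + 'bar'

# Both lines print foo'bar
printf '%s\n' "$a" "$b"
```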
I also slightly refactored your Awk script to make it more idiomatic, and fixed the missing quoting.