I have a table schema: the column names as a comma-separated list. For clarity, I have put them one per line below.
$ cat cols_name.txt
id
resp
x_amt
rate1
rate2
rate3
pay1
pay2
rate_r1
rate_r2
x_rate1
x_rate2
x_rate3
x_rate_r1
x_rate_r2
x_pay1
x_pay2
rev1
x_rev1
I need to find the columns that have a matching x_-prefixed column (for example, pay1 -> x_pay1) and list each pair together as an intermediate output, like this:
x_rate1 rate1
x_rate2 rate2
x_rate3 rate3
x_pay1 pay1
x_pay2 pay2
x_rate_r1 rate_r1
x_rate_r2 rate_r2
x_rev1 rev1
Then, finally, print the frequency of each base name (the column name without its trailing number):
pay 2
rate 3
rate_r 2
rev 1
My attempt to get the intermediate output, using the awk command below, is not working:
awk ' NR==FNR { if( $1~/^x_/ ) a[$1]=1 ; next } $1~/"x_" a[$1]/ { print $0 } ' cols_name.txt cols_name.txt
It does not print anything. Could you please help me fix it?
Here is a single-pass awk solution:
awk '/^x_/ {xk[$0]; next} {s=$0; sub(/[0-9]+$/, "", s); xv[$0]=s} END {for (i in xv) if ("x_" i in xk) {print "x_" i, i; ++fq[xv[i]]}; print "== Summary =="; for (i in fq) print i, fq[i]}' file
x_rev1 rev1
x_rate1 rate1
x_rate2 rate2
x_rate3 rate3
x_rate_r1 rate_r1
x_pay1 pay1
x_rate_r2 rate_r2
x_pay2 pay2
== Summary ==
rate_r 2
rate 3
rev 1
pay 2
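The order of the lines above is not guaranteed, because awk's for (i in array) visits keys in an unspecified order. If the frequency list should come out sorted, as in the question, one option is to pipe the counts through sort. This is a minimal sketch, assuming the sample input is saved as cols_name.txt as in the question; freq.txt is a hypothetical file name used only to hold the result:

```shell
# Recreate the sample input from the question.
printf '%s\n' id resp x_amt rate1 rate2 rate3 pay1 pay2 rate_r1 rate_r2 \
  x_rate1 x_rate2 x_rate3 x_rate_r1 x_rate_r2 x_pay1 x_pay2 rev1 x_rev1 > cols_name.txt

# Count matched pairs per base name, then sort for a stable order.
# freq.txt is a hypothetical output file, not part of the original answer.
awk '
/^x_/ { xk[$0]; next }                 # remember every x_-prefixed column
      { plain[$0] }                    # remember every other column
END {
    for (c in plain)
        if (("x_" c) in xk) {          # c has an x_ counterpart
            base = c
            sub(/[0-9]+$/, "", base)   # strip the trailing number
            ++fq[base]
        }
    for (b in fq) print b, fq[b]
}' cols_name.txt | LC_ALL=C sort > freq.txt

cat freq.txt
```

With the sample input this should list pay 2, rate 3, rate_r 2 and rev 1, in that order. LC_ALL=C is there only to make the sort order independent of the locale.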
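The same one-liner in a more readable form:
awk '
# Lines starting with x_: remember them as keys of xk.
/^x_/ {
    xk[$0]
    next
}
# Every other line: map the column name to its base name (trailing digits removed).
{
    s = $0
    sub(/[0-9]+$/, "", s)
    xv[$0] = s
}
END {
    # Print each column that has an x_ counterpart and count it under its base name.
    for (i in xv)
        if (("x_" i) in xk) {
            print "x_" i, i
            ++fq[xv[i]]
        }
    print "== Summary =="
    # Iteration order of "for (i in fq)" is unspecified.
    for (i in fq)
        print i, fq[i]
}' file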
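For completeness, the two-pass attempt from the question can also be repaired. It prints nothing because everything between the slashes in /"x_" a[$1]/ is taken as literal regex text rather than evaluated as an expression, so no line matches. This is a minimal sketch of one possible fix, which tests array membership with in instead; pairs.txt is a hypothetical file name used only to hold the result:

```shell
# Recreate the sample input from the question.
printf '%s\n' id resp x_amt rate1 rate2 rate3 pay1 pay2 rate_r1 rate_r2 \
  x_rate1 x_rate2 x_rate3 x_rate_r1 x_rate_r2 x_pay1 x_pay2 rev1 x_rev1 > cols_name.txt

# Pass 1 (NR == FNR): collect every x_-prefixed column name in array a.
# Pass 2: for each line, check whether "x_" plus that name was collected.
# pairs.txt is a hypothetical output file, not part of the original post.
awk '
NR == FNR { if ($1 ~ /^x_/) a[$1] = 1; next }
("x_" $1) in a { print "x_" $1, $1 }
' cols_name.txt cols_name.txt > pairs.txt

cat pairs.txt
```

Because the second pass reads the file in its original order, the pairs should come out in the same order as the intermediate output shown in the question.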