
awk - pull out pair columns and get the count of occurrences

Tags: linux, awk

I have a table schema: a comma-separated list of column names. For clarity, I have put one column name per line below.

$ cat cols_name.txt
id
resp
x_amt
rate1
rate2
rate3
pay1
pay2
rate_r1
rate_r2
x_rate1
x_rate2
x_rate3
x_rate_r1
x_rate_r2
x_pay1
x_pay2
rev1
x_rev1

I need to find the columns that have an x_-prefixed counterpart (for example, pay1 -> x_pay1) and list each pair together as intermediate output, like this:

x_rate1 rate1
x_rate2 rate2
x_rate3 rate3
x_pay1 pay1
x_pay2 pay2
x_rate_r1 rate_r1
x_rate_r2 rate_r2
x_rev1 rev1

Finally, I need to print how many pairs share each base name (the name without its trailing digits):

 pay 2
 rate 3
 rate_r 2
 rev 1

My attempt at the intermediate output, the awk command below, does not work:

awk ' NR==FNR { if( $1~/^x_/ ) a[$1]=1 ; next }  $1~/"x_" a[$1]/ { print $0 } ' cols_name.txt cols_name.txt

It prints nothing. Could you please help me fix it?

stack0114106 asked Dec 12 '25 03:12


1 Answer

Here is a single-pass awk command that does it:

 awk '/^x_/ {xk[$0]; next} {s=$0; sub(/[0-9]+$/, "", s); xv[$0]=s} END {for (i in xv) if ("x_" i in xk) {print "x_" i, i; ++fq[xv[i]]}; print "== Summary =="; for (i in fq) print i, fq[i]}' file

x_rev1 rev1
x_rate1 rate1
x_rate2 rate2
x_rate3 rate3
x_rate_r1 rate_r1
x_pay1 pay1
x_rate_r2 rate_r2
x_pay2 pay2
== Summary ==
rate_r 2
rate 3
rev 1
pay 2
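
Note that the lines above come out in whatever order `for (i in ...)` happens to visit the array; awk leaves that order unspecified, so it can differ between implementations. If a stable order matters, one option is to sort the output. The following is only a sketch under a few assumptions: a POSIX `sort` is available, the input is recreated as `cols_name.txt`, and the function names `pairs_sorted` and `freq_sorted` are made up for illustration.

```shell
# Recreate the sample input from the question.
printf '%s\n' id resp x_amt rate1 rate2 rate3 pay1 pay2 rate_r1 rate_r2 \
    x_rate1 x_rate2 x_rate3 x_rate_r1 x_rate_r2 x_pay1 x_pay2 rev1 x_rev1 > cols_name.txt

# Hypothetical helper: print "x_name name" pairs, sorted bytewise for a stable order.
pairs_sorted() {
    awk '/^x_/ { xk[$0]; next }
         { xv[$0] }
         END { for (i in xv) if (("x_" i) in xk) print "x_" i, i }' "$1" |
        LC_ALL=C sort
}

# Hypothetical helper: count pairs per base name (trailing digits stripped), also sorted.
freq_sorted() {
    pairs_sorted "$1" |
        awk '{ s = $2; sub(/[0-9]+$/, "", s); ++fq[s] }
             END { for (k in fq) print k, fq[k] }' |
        LC_ALL=C sort
}

pairs_sorted cols_name.txt
echo "== Summary =="
freq_sorted cols_name.txt
```

With `LC_ALL=C` the summary comes out as pay, rate, rate_r, rev, which is the order asked for in the question.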

A more readable form of the single-pass command:

awk '
/^x_/ {
   xk[$0]
   next
}
{
   s = $0
   sub(/[0-9]+$/, "", s)
   xv[$0] = s
}
END {
   for (i in xv)
      if ("x_" i in xk) {
         print "x_" i, i
         ++fq[xv[i]]
      }
   print "== Summary =="
   for (i in fq)
      print i, fq[i]
}' file
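
As for why the two-pass attempt in the question prints nothing: everything between the slashes in `/"x_" a[$1]/` is a literal regular expression, so awk looks for the text `"x_" a` followed by `$` or `1` rather than evaluating the concatenation or the array lookup. A membership test with `in` does what was intended. Below is a minimal sketch of that fix, assuming the same `cols_name.txt` input; the function name `list_pairs` and the array name `seen` are chosen here for illustration.

```shell
# Recreate the sample input from the question.
printf '%s\n' id resp x_amt rate1 rate2 rate3 pay1 pay2 rate_r1 rate_r2 \
    x_rate1 x_rate2 x_rate3 x_rate_r1 x_rate_r2 x_pay1 x_pay2 rev1 x_rev1 > cols_name.txt

# Hypothetical helper: read the same file twice.
# Pass 1 (NR == FNR): remember every x_-prefixed name.
# Pass 2: print each name whose x_-prefixed counterpart was seen in pass 1.
list_pairs() {
    awk 'NR == FNR { if ($1 ~ /^x_/) seen[$1]; next }
         ("x_" $1) in seen { print "x_" $1, $1 }' "$1" "$1"
}

list_pairs cols_name.txt
```

Because the second pass prints in file order, this yields the eight pairs in the same order as the intermediate output shown in the question, starting with `x_rate1 rate1`.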
anubhava answered Dec 14 '25 15:12


