Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

bash script: find max, group by and sort by column

Tags:

bash

shell

I have a file that looks like this:

b, 20, 434
a, 20, 23
a, 10, 123
a, 20, 423
a, 10, 89
b, 20, 88
b, 10, 133
b, 10, 99
  1. Find the max of col 3 for every unique combination of col1 and col2. (e.g. max of col3 for all a,10)
  2. Group the output by col1 (all a rows together)
  3. Sort the output by col2.

That is, the output should be file should be

a, 10, 123
a, 20, 423
b, 10, 133
b, 20, 434

How can I do this in a bash script? Thanks for your help.

like image 988
jitihsk Avatar asked Oct 28 '25 01:10

jitihsk


2 Answers

This does the job:

< input sort -k3,3gr | sort -k1,1 -k2,2g -u

It sorts numerically in reverse order on the third field and then sorts on the first and second field taking just the first occurrence (-u for unique).

You do not need padding, i.e. if you add to the input a line like

a, 3, 31

The output is:

a, 3, 31
a, 10, 123
a, 20, 423
b, 10, 133
b, 20, 434
like image 64
Gismo Ranas Avatar answered Oct 30 '25 15:10

Gismo Ranas


This will slightly modify the whitespace, but perhaps that is acceptable:

awk '$3 > a[$1,$2] { a[$1,$2] = $3 } END {for( k in a) print k a[k]}' input |
    sort -n -t, -k1,1 -k2,2

But that solution is highly dependent on the whitespace in the input, so it would probably be better to do something like:

awk '$3 > a[$1","$2] { a[$1","$2] = $3 } 
    END {for( k in a) print k "," a[k]}' FS=, input |
    sort -n -t, -k1,1 -k2,2
like image 42
William Pursell Avatar answered Oct 30 '25 13:10

William Pursell