bash script: find max, group by and sort by column

Question

I have a file that looks like this:

b, 20, 434
a, 20, 23
a, 10, 123
a, 20, 423
a, 10, 89
b, 20, 88
b, 10, 133
b, 10, 99

Find the max of col 3 for every unique combination of col1 and col2. (e.g. max of col3 for all a,10)
Group the output by col1 (all a rows together)
Sort the output by col2.

That is, the output should be file should be

a, 10, 123
a, 20, 423
b, 10, 133
b, 20, 434

How can I do this in a bash script? Thanks for your help.

Gismo Ranas · Accepted Answer

This does the job:

< input sort -k3,3gr | sort -k1,1 -k2,2g -u

It sorts numerically in reverse order on the third field and then sorts on the first and second field taking just the first occurrence (-u for unique).

You do not need padding, i.e. if you add to the input a line like

a, 3, 31

The output is:

a, 3, 31
a, 10, 123
a, 20, 423
b, 10, 133
b, 20, 434

William Pursell · Answer

This will slightly modify the whitespace, but perhaps that is acceptable:

awk '$3 > a[$1,$2] { a[$1,$2] = $3 } END {for( k in a) print k a[k]}' input |
    sort -n -t, -k1,1 -k2,2

But that solution is highly dependent on the whitespace in the input, so it would probably be better to do something like:

awk '$3 > a[$1","$2] { a[$1","$2] = $3 } 
    END {for( k in a) print k "," a[k]}' FS=, input |
    sort -n -t, -k1,1 -k2,2

bash script: find max, group by and sort by column

Tags:

bash

shell

jitihsk

2 Answers

Gismo Ranas

William Pursell

Recent Activity

Donate For Us

bash script: find max, group by and sort by column

Tags:

bash

shell

jitihsk

2 Answers

Gismo Ranas

William Pursell

Related questions

Recent Activity

Donate For Us