 

Output name of column with max value per line

Tags:

awk

I have

chr pos C T A G
NC_044998.1     3732    21 0 0 0
NC_044998.1     3733    22 0 2 0
NC_044998.1     3734    22 0 5 0
NC_044998.1     3735    22 0 0 0
NC_044998.1     3736    0 0 7 0
NC_044998.1     3737    0 0 0 22
NC_044998.1     3738    20 0 0 0
NC_044998.1     3739    1 0 22 0
NC_044998.1     3740    0 22 0 0
NC_044998.1     3741    22 0 0 0

I need to output, for each line, the max value in fields $3 to $6 as well as the name of the column it comes from,

so that I get:

chr pos max ref
NC_044998.1     3732 21 C
NC_044998.1     3733 22 C
NC_044998.1     3734 22 C
NC_044998.1     3735 22 C
NC_044998.1     3736 7 A
NC_044998.1     3737 22 G
NC_044998.1     3738 20 C
NC_044998.1     3739 22 A
NC_044998.1     3740 22 T
NC_044998.1     3741 22 C 

I'm trying to adapt this:

awk 'NR == 1 {for (c = 3; c <= NF; i++) headers[c] = $c; next} {maxc=3;for(c=4;c<=NF;c++)if($c>$maxc){maxc=c} printf "max:%s, %s\n", $maxc, headers[maxc]}'

but it only outputs the max value.

I have also tried:

awk '{maxc=3;for(c=4;c<=NF;c++)if($c>$maxc){maxc=c; $maxc = headers[c]} printf "max:%s, column:%s, column:%s\n",$maxc, maxc, headers[maxc]}'

Another issue I'm trying to figure out is what to do when two or more columns tie for the max. In that case I would like to print the max and the names of all the tied columns.

Madza Farias-Virgens asked Nov 01 '25 01:11


1 Answer

With your shown samples, please try the following awk code, written and tested in GNU awk.

awk -v startField="3" -v endField="6" '
FNR==1{
  for(i=startField;i<=endField;i++){
    heading[i]=$i
  }
  NF=(startField-1)
  print $0,"max","ref"
  next
}
{
  max=maxInd=""
  for(i=startField;i<=endField;i++){
    maxInd=(max<$i?i:maxInd)
    max=(max<$i?$i:max)
  }
  NF=(startField-1)
  print $0,max,heading[maxInd]
}
'  Input_file

Advantages of this code's approach:

  • You can set the starting and ending field numbers through the variables startField and endField, so nothing inside the main awk code needs to change.
  • Since nothing is hardcoded, if you later want the maximum of fields 10 through 20, you do NOT need to mention fields 1 to 9 explicitly to get them printed. The code takes care of that by setting NF to startField-1.
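Shrinking NF to print the leading fields, as the code above does, depends on awk rebuilding $0 when NF is decreased. GNU awk does this, but not every awk implementation has historically done so. Below is a minimal sketch that avoids assigning NF. It assumes the same startField/endField variables (with startField of at least 2) and the sample layout from the question. It joins the leading fields in a small function and prints the header line and max value the question asks for. The names max_ref and prefix are hypothetical, and `$i + 0` is used to force a numeric comparison.

```shell
# Hypothetical helper: print the leading fields, the max of fields
# startField..endField, and the name of the column holding that max.
# The leading fields are joined in a loop, so NF is never reassigned.
max_ref() {
  awk -v startField="3" -v endField="6" '
  # Join fields 1..startField-1 with OFS (assumes startField >= 2).
  function prefix(   i, s) {
    s = $1
    for (i = 2; i < startField; i++) s = s OFS $i
    return s
  }
  FNR == 1 {
    # Header line: remember each column name by its field number.
    for (i = startField; i <= endField; i++) heading[i] = $i
    print prefix(), "max", "ref"
    next
  }
  {
    # Start with the first field in range; move only on a strictly larger value.
    maxInd = startField
    for (i = startField + 1; i <= endField; i++)
      if ($i + 0 > $maxInd + 0) maxInd = i
    print prefix(), $maxInd, heading[maxInd]
  }
  ' "$@"
}

# Usage on the sample data from the question (read from stdin).
max_ref <<'EOF'
chr pos C T A G
NC_044998.1     3732    21 0 0 0
NC_044998.1     3733    22 0 2 0
NC_044998.1     3734    22 0 5 0
NC_044998.1     3735    22 0 0 0
NC_044998.1     3736    0 0 7 0
NC_044998.1     3737    0 0 0 22
NC_044998.1     3738    20 0 0 0
NC_044998.1     3739    1 0 22 0
NC_044998.1     3740    0 22 0 0
NC_044998.1     3741    22 0 0 0
EOF
```

Ties go to the leftmost column, because only a strictly larger value replaces the current maximum.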

Detailed explanation: here is the first script again, with a comment on each step.

awk -v startField="3" -v endField="6" '   ##Start the awk program; startField and endField set the range of fields to search for the maximum.
FNR==1{                                   ##If this is the first (header) line of Input_file.
  for(i=startField;i<=endField;i++){      ##Traverse only the fields in which to look for the max value.
    heading[i]=$i                         ##Store each column name in array heading, indexed by field number.
  }
  NF=(startField-1)                       ##Keep only the fields before startField.
  print $0,"max","ref"                    ##Print the output header line.
  next                                    ##next skips all further statements for this line.
}
{
  max=maxInd=""                           ##Reset max and maxInd for each line.
  for(i=startField;i<=endField;i++){      ##Traverse only the fields in which to look for the max value.
    maxInd=(max<$i?i:maxInd)              ##Set maxInd to the current field number if the current field value is greater than max, else keep maxInd as it is.
    max=(max<$i?$i:max)                   ##Set max to the current field value if it is greater than max, else keep max as it is.
  }
  NF=(startField-1)                       ##Set NF (number of fields of the current line) to startField-1, which drops the fields from startField on and rebuilds $0.
  print $0,max,heading[maxInd]            ##Print the remaining leading fields, the max value, and the column name stored at index maxInd.
}
'  Input_file                             ##Mention the Input_file name here.
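Because `max<$i` is a strict comparison, the script above keeps only the first of several tied columns. The question also asks for the names of all tied columns. Below is a minimal sketch that handles ties with two passes over the range: the first pass finds the maximum, and the second collects every column equal to it. It assumes the same startField/endField variables (startField of at least 2) and joins tied names with a comma, which is an arbitrary choice. The names max_ref_ties and prefix are hypothetical.

```shell
# Hypothetical helper: print the leading fields, the max of fields
# startField..endField, and the names of ALL columns equal to that max,
# joined with commas.
max_ref_ties() {
  awk -v startField="3" -v endField="6" '
  # Join fields 1..startField-1 with OFS (assumes startField >= 2).
  function prefix(   i, s) {
    s = $1
    for (i = 2; i < startField; i++) s = s OFS $i
    return s
  }
  FNR == 1 {
    # Header line: remember each column name by its field number.
    for (i = startField; i <= endField; i++) heading[i] = $i
    print prefix(), "max", "ref"
    next
  }
  {
    # First pass: find the largest value in the range (numeric via + 0).
    max = $startField + 0
    for (i = startField + 1; i <= endField; i++)
      if ($i + 0 > max) max = $i + 0
    # Second pass: collect the name of every column equal to that value.
    refs = ""
    for (i = startField; i <= endField; i++)
      if ($i + 0 == max) {
        if (refs != "") refs = refs ","
        refs = refs heading[i]
      }
    print prefix(), max, refs
  }
  ' "$@"
}

# Usage on the first rows of the sample data from the question.
max_ref_ties <<'EOF'
chr pos C T A G
NC_044998.1     3732    21 0 0 0
NC_044998.1     3733    22 0 2 0
NC_044998.1     3736    0 0 7 0
EOF
```

The sample rows have no ties, so they print a single name each. For a hypothetical line with 5 5 0 5 in the C, T, A and G columns, the sketch prints 5 C,T,G after the leading fields.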
RavinderSingh13 answered Nov 03 '25 22:11


