I am trying to calculate a range of percentiles (5th-99th) in Bash for a text file that contains 5 values, one per line.
Input
34.5
32.2
33.7
30.4
31.8
Attempted Code
awk '{s[NR-1]=$1} END{print s[int(0.05-0.99)]}' input
Expected Output
99th    34.5
97th    34.4
95th    34.3
90th    34.2
80th    33.9
70th    33.4
60th    32.8
50th    32.2
40th    32.0
30th    31.9
20th    31.5
10th    31.0
5th     30.7
To calculate percentiles from 5 values, you need to map each sorted value to a percentile and then interpolate linearly between those points. The result is a piecewise linear function (PWLF).
F(100) = 34.5, F(75) = 33.7, F(50) = 32.2, F(25) = 31.8, F(0) = 30.4
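Mapping any other x in the range 0..100 requires linear interpolation between F(L) and F(H), where L is the highest mapped percentile <= x and H is the next mapped percentile above it (H = L+25 here): F(x) = F(L) + (F(H) - F(L)) * (x - L) / (H - L).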
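As a quick check of the interpolation rule, take the 90th percentile: it lies between F(75) = 33.7 and F(100) = 34.5, so F(90) = 33.7 + (34.5 - 33.7) * (90 - 75) / (100 - 75) = 34.18, which rounds to the 34.2 in the expected output. A minimal sketch of that single step follows; the helper name interp and its argument order are made up for this illustration and are not part of the solution:

```shell
# interp X X_L Y_L X_H Y_H: linear interpolation between two knots.
# (interp is a hypothetical helper name used only for this illustration.)
interp() {
  awk -v x="$1" -v xl="$2" -v yl="$3" -v xh="$4" -v yh="$5" \
    'BEGIN { printf "%.1f\n", yl + (yh - yl) * (x - xl) / (xh - xl) }'
}

# 90th percentile, between F(75) = 33.7 and F(100) = 34.5
interp 90 75 33.7 100 34.5
```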
awk '
#!/usr/bin/gawk -f
# The "#!" line above only matters when this program is saved as a script file
# (the interpreter path is system-dependent). asort() requires GNU awk (gawk).
# The 25-point spacing assumes exactly 5 input values.

  # PWLF interpolation function: takes a value x and two arrays for X & Y.
  # The extra parameters after py are local variables.
function pwlf(x, px, py,    p_l, p_h, x_l, x_h, y_l, y_h) {
  # Index of the highest knot at or below x (knots are 25 apart)
  p_l = 1+int(x/25)
  p_h = p_l+1
  x_l = px[p_l]
  x_h = px[p_h]
  y_l = py[p_l]
  y_h = py[p_h]
  return y_l+(y_h-y_l)*(x-x_l)/(x_h-x_l)
}
  # Read the input into the yy array (as numbers)
{ yy[++n] = $1 + 0 }
  # Print the table
END {
  # Sort the values of yy in ascending order
  ny = asort(yy) ;
  # Create the xx array 0, 25, ..., 100
  for (i=1 ; i<=ny ; i++) xx[i]=25*(i-1)
  # Prepare the list of requested percentiles
  ns = split("99 97 95 90 80 70 60 50 40 30 20 10 5", pv)
  for (i=1 ; i<=ns ; i++) printf "%dth %.1f\n",  pv[i], pwlf(pv[i], xx, yy) ;
}
' input
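Technically this is a Bash one-liner, but based on the comments to the OP it is better to put the awk program (everything between the single quotes) into script.awk and run that instead. For the script to be directly executable, the '#!' line must be the first line of the file and must pass -f to the interpreter (for example '#!/usr/bin/gawk -f'; the path depends on your system), and the file needs execute permission (chmod +x).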
/path/to/script.awk < input 
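The same idea can be sketched without hard-coding the spacing of 25, so that it works for any number of values. This is only a sketch under stated assumptions: the function name percentiles is hypothetical, the file must hold one numeric value per line and at least two values, and sorting is delegated to sort -n so that gawk's asort() is not needed.

```shell
# percentiles FILE P1 P2 ...: print interpolated percentiles of the values in FILE.
# (percentiles is a hypothetical helper name; assumes one number per line, >= 2 lines.)
percentiles() {
  file=$1; shift
  sort -n "$file" | awk -v plist="$*" '
    { y[NR] = $1 }                      # values arrive already sorted
    END {
      if (NR < 2) exit 1                # interpolation needs at least two knots
      step = 100 / (NR - 1)             # spacing between knots (25 for 5 values)
      np = split(plist, p, " ")
      for (i = 1; i <= np; i++) {
        lo = 1 + int(p[i] / step)       # index of the knot at or below p[i]
        if (lo >= NR) lo = NR - 1       # keep p = 100 inside the last segment
        x_l = (lo - 1) * step
        printf "%dth %.1f\n", p[i], y[lo] + (y[lo + 1] - y[lo]) * (p[i] - x_l) / step
      }
    }'
}

# Usage with the five sample values from the question
printf '%s\n' 34.5 32.2 33.7 30.4 31.8 > input
percentiles input 99 50 5
```

With the sample data the knots are again 25 apart, so the results match the table in the question for the requested percentiles.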