How to remove outliers from groups based on percentile

I have a DataFrame df like this, but longer and with many more distinct values in the type column.

type weight
a 35.1
a 36.7
b 100.2
b 99.3
b 102.0
b 5.0
a 38.2
a 250.8

I want to remove outlier rows from df using the 95th percentile, computed separately for each value in the type column.

For a single value of type, I do it like this:

import numpy as np

my_perc = 95  # np.percentile takes a percentage in [0, 100]
temp = df[df['type'] == 'a']
temp[temp.weight < np.percentile(temp.weight, my_perc)]

Now I would like to do this automatically for the whole of df, applying the threshold separately within each group of the type column.

I also tried this:

df[df.groupby(['type'])['weight'] < np.percentile(df.weight, my_perc)]

But it doesn't work.

Do you have any idea for this?

sdom asked Nov 01 '25 21:11



1 Answer

OK, this seems to solve it (note that quantile takes a fraction, so 95 becomes 0.95):

my_perc = 0.95
df[df.groupby('type')['weight'].transform(lambda x: x < x.quantile(my_perc))]
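
For anyone who wants to try this on the sample from the question, here is a minimal self-contained sketch of the same idea. The names threshold and filtered are only illustrative, and it assumes pandas' default (linear) quantile interpolation. It computes the per-group threshold first and then compares, which is equivalent to the one-liner above but makes the intermediate values easier to inspect.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "type": ["a", "a", "b", "b", "b", "b", "a", "a"],
    "weight": [35.1, 36.7, 100.2, 99.3, 102.0, 5.0, 38.2, 250.8],
})

my_perc = 0.95  # Series.quantile expects a fraction in [0, 1]

# Per-group 95th percentile, broadcast back so every row carries
# the threshold of its own group (same index as df)
threshold = df.groupby("type")["weight"].transform(lambda s: s.quantile(my_perc))

# Keep only rows strictly below their own group's threshold
filtered = df[df["weight"] < threshold]
print(filtered)
```

On this sample the rows with weight 250.8 (type a) and 102.0 (type b) are dropped. Note that this only trims the upper tail, so the low value 5.0 in group b stays; filtering it as well would need a second, lower quantile.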
sdom answered Nov 04 '25 10:11
