
Replace value based on condition within groups in a dataframe

Tags:

python

pandas

My DataFrame looks something like this:

df

group cat_col
g1    r
g1    nr
g1    r
g1    nr
g2    nr
g2    nr

I need to replace "nr" with "r" whenever the group has at least one "r". In this case, it should return:

df_new

group cat_col
g1    r
g1    r
g1    r
g1    r
g2    nr
g2    nr

I know this question is elementary, but I've been stuck on it for hours and haven't figured out how to solve it. Does anyone know how?

asked Dec 05 '25 13:12 by rebar


2 Answers

Use groupby.transform:

df.cat_col.groupby(df.group).transform(lambda g: 'r' if g.eq('r').any() else g)

0     r
1     r
2     r
3     r
4    nr
5    nr
Name: cat_col, dtype: object

If you only need to replace nr with r (leaving any other values in the group untouched):

df.cat_col = df.cat_col.groupby(df.group).transform(
  lambda g: g.replace('nr', 'r') if g.eq('r').any() else g
)
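To see what the second variant changes, here is a self-contained sketch that runs both lambdas side by side. The frame is a hypothetical variation on the question's data: g1 also holds an 'x', which the first variant overwrites and the second keeps. The function names overwrite_group and replace_nr_only are illustrative only.

```python
import pandas as pd

# Hypothetical data: like the question's frame, but g1 also contains 'x'
df = pd.DataFrame({
    'group': ['g1', 'g1', 'g1', 'g2', 'g2'],
    'cat_col': ['r', 'nr', 'x', 'nr', 'nr'],
})


def overwrite_group(g):
    # A scalar return value is broadcast by transform to every row of the group
    return 'r' if g.eq('r').any() else g


def replace_nr_only(g):
    # Only 'nr' is turned into 'r'; any other value in the group is kept
    return g.replace('nr', 'r') if g.eq('r').any() else g


by_group = df['cat_col'].groupby(df['group'])
overwritten = by_group.transform(overwrite_group)
replaced = by_group.transform(replace_nr_only)

print(overwritten.tolist())  # ['r', 'r', 'r', 'nr', 'nr']
print(replaced.tolist())     # ['r', 'r', 'x', 'nr', 'nr']
```

With only "r" and "nr" in the data, as in the question, both variants give the same result.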
answered Dec 08 '25 04:12 by Psidom


We can also use groupby.transform to check whether any value in each group equals r, and then use the resulting Boolean index to set cat_col to r on every row of those groups:

m = df['cat_col'].eq('r').groupby(df['group']).transform('any')
df.loc[m, 'cat_col'] = 'r'
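Why transform rather than a plain aggregation? A short sketch on the question's data: groupby(...).any() returns one value per group, while transform('any') broadcasts each group's result back onto the original rows, so it lines up with df and can be passed straight to .loc. The names is_r, per_group and per_row are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['g1', 'g1', 'g1', 'g1', 'g2', 'g2'],
    'cat_col': ['r', 'nr', 'r', 'nr', 'nr', 'nr'],
})
is_r = df['cat_col'].eq('r')

# Aggregation: one Boolean per group, indexed by the group label
per_group = is_r.groupby(df['group']).any()
print(per_group.to_dict())  # {'g1': True, 'g2': False}

# transform: the same per-group result repeated on every original row
per_row = is_r.groupby(df['group']).transform('any')
print(per_row.tolist())  # [True, True, True, True, False, False]

# The row-aligned mask can be passed directly to .loc
df.loc[per_row, 'cat_col'] = 'r'
print(df['cat_col'].tolist())  # ['r', 'r', 'r', 'r', 'nr', 'nr']
```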

Or, using the same Boolean index, replace only nr with r within those groups (handy when there are several values to map, since replace accepts a dict):

m = df['cat_col'].eq('r').groupby(df['group']).transform('any')
df.loc[m, 'cat_col'] = df.loc[m, 'cat_col'].replace({'nr': 'r'})
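For comparison, the same row mask can be built without transform: collect the groups that contain an 'r', then test each row's group with isin. This is a swapped-in alternative rather than part of the answer above, sketched on the question's data; groups_with_r and to_fix are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['g1', 'g1', 'g1', 'g1', 'g2', 'g2'],
    'cat_col': ['r', 'nr', 'r', 'nr', 'nr', 'nr'],
})

# Labels of the groups that contain at least one 'r'
groups_with_r = df.loc[df['cat_col'].eq('r'), 'group'].unique()

# Rows in one of those groups that currently hold 'nr'
to_fix = df['group'].isin(groups_with_r) & df['cat_col'].eq('nr')
df.loc[to_fix, 'cat_col'] = 'r'

print(df['cat_col'].tolist())  # ['r', 'r', 'r', 'r', 'nr', 'nr']
```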

df:

  group cat_col
0    g1       r
1    g1       r
2    g1       r
3    g1       r
4    g2      nr
5    g2      nr

The Boolean index, built step by step and shown as a DataFrame:

steps_df = pd.DataFrame({
    # Find where cat_col is r
    'step 1': df['cat_col'].eq('r'),
    # Find groups which have an r value
    'step 2': df['cat_col'].eq('r').groupby(df['group']).transform('any')
})
   step 1  step 2
0    True    True
1   False    True
2    True    True
3   False    True
4   False   False
5   False   False
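The table can be extended with a third column that combines step 2 with a check for nr, which marks exactly the rows the conditional replace changes. A sketch, assuming the same df as in the setup; the 'step 3' column and the in_r_group name are additions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['g1', 'g1', 'g1', 'g1', 'g2', 'g2'],
    'cat_col': ['r', 'nr', 'r', 'nr', 'nr', 'nr'],
})

in_r_group = df['cat_col'].eq('r').groupby(df['group']).transform('any')
steps_df = pd.DataFrame({
    # Find where cat_col is r
    'step 1': df['cat_col'].eq('r'),
    # Find groups which have an r value
    'step 2': in_r_group,
    # Rows that actually change: inside such a group and currently nr
    'step 3': in_r_group & df['cat_col'].eq('nr'),
})
print(steps_df)
```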

Setup (DataFrame and imports):

import pandas as pd

df = pd.DataFrame({
    'group': ['g1', 'g1', 'g1', 'g1', 'g2', 'g2'],
    'cat_col': ['r', 'nr', 'r', 'nr', 'nr', 'nr']
})
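Putting the setup and the mask approach together, the rule can be wrapped in a small reusable function. replace_in_groups_with is a hypothetical helper, not a pandas API; it returns a modified copy and leaves the input frame unchanged.

```python
import pandas as pd


def replace_in_groups_with(df, group_col, value_col, trigger, old, new):
    """Hypothetical helper: return a copy of df in which `old` becomes `new`
    in value_col, but only inside groups (by group_col) containing `trigger`."""
    out = df.copy()
    # Row-aligned mask: True on every row of a group that contains `trigger`
    has_trigger = (
        out[value_col].eq(trigger).groupby(out[group_col]).transform('any')
    )
    out.loc[has_trigger & out[value_col].eq(old), value_col] = new
    return out


df = pd.DataFrame({
    'group': ['g1', 'g1', 'g1', 'g1', 'g2', 'g2'],
    'cat_col': ['r', 'nr', 'r', 'nr', 'nr', 'nr'],
})

df_new = replace_in_groups_with(df, 'group', 'cat_col', trigger='r', old='nr', new='r')
print(df_new)
```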
answered Dec 08 '25 04:12 by Henry Ecker


