My dataframe is something like
df
group cat_col
g1 r
g1 nr
g1 r
g1 nr
g2 nr
g2 nr
I need to replace "nr" with "r" whenever the group has at least one "r". In this case, I need it to return:
df_new
group cat_col
g1 r
g1 r
g1 r
g1 r
g2 nr
g2 nr
I know this question is elementary, but I've been stuck for hours and haven't figured out how to solve it. Does anyone know how?
Use groupby.transform. This sets every value in a group to r when the group contains at least one r:
df.cat_col.groupby(df.group).transform(lambda g: 'r' if g.eq('r').any() else g)
0 r
1 r
2 r
3 r
4 nr
5 nr
Name: cat_col, dtype: object
If you only need to replace nr with r (leaving any other values in the group untouched):
df.cat_col = df.cat_col.groupby(df.group).transform(
    lambda g: g.replace('nr', 'r') if g.eq('r').any() else g
)
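The difference between the two lambdas only shows up when a group holds a value other than r or nr. The sketch below uses made-up data with a third label, x, which is an assumption for illustration and not part of the question:

```python
import pandas as pd

# Hypothetical data: the label 'x' does not appear in the original question.
df = pd.DataFrame({
    'group': ['g1', 'g1', 'g1', 'g2', 'g2'],
    'cat_col': ['r', 'nr', 'x', 'nr', 'x'],
})

# Variant 1: overwrite every value in a group that contains 'r'.
overwrite_all = df['cat_col'].groupby(df['group']).transform(
    lambda g: 'r' if g.eq('r').any() else g
)

# Variant 2: only turn 'nr' into 'r' in a group that contains 'r'.
replace_only = df['cat_col'].groupby(df['group']).transform(
    lambda g: g.replace('nr', 'r') if g.eq('r').any() else g
)

print(overwrite_all.tolist())  # ['r', 'r', 'r', 'nr', 'x']
print(replace_only.tolist())   # ['r', 'r', 'x', 'nr', 'x']
```

On the data in the question both variants give the same result, because the column only contains r and nr.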
We can also use groupby transform to check whether any value in each group equals r, then use the resulting Boolean index to set every row in those groups to r:
m = df['cat_col'].eq('r').groupby(df['group']).transform('any')
df.loc[m, 'cat_col'] = 'r'
Or use the same Boolean index to replace only nr with r (useful when the column holds other values that should be left alone, or when several values need replacing):
m = df['cat_col'].eq('r').groupby(df['group']).transform('any')
df.loc[m, 'cat_col'] = df.loc[m, 'cat_col'].replace({'nr': 'r'})
df:
group cat_col
0 g1 r
1 g1 r
2 g1 r
3 g1 r
4 g2 nr
5 g2 nr
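If this is needed in more than one place, the mask-based approach above could be wrapped in a small helper. This is only a sketch: the function name promote_label and its parameters are hypothetical and do not come from pandas or from the answers above.

```python
import pandas as pd


def promote_label(df, group_col, value_col, target, to_replace=None):
    """Return a copy of df where groups containing `target` are updated.

    If to_replace is None, every value in a matching group becomes `target`;
    otherwise only the values listed in to_replace are changed.
    (Hypothetical helper, written for illustration.)
    """
    out = df.copy()
    # True for every row whose group has at least one `target` value.
    m = out[value_col].eq(target).groupby(out[group_col]).transform('any')
    if to_replace is None:
        out.loc[m, value_col] = target
    else:
        mapping = {old: target for old in to_replace}
        out.loc[m, value_col] = out.loc[m, value_col].replace(mapping)
    return out


df = pd.DataFrame({
    'group': ['g1', 'g1', 'g1', 'g1', 'g2', 'g2'],
    'cat_col': ['r', 'nr', 'r', 'nr', 'nr', 'nr'],
})

df_new = promote_label(df, 'group', 'cat_col', 'r', to_replace=['nr'])
print(df_new)
```

Returning a copy keeps the original df unchanged, which matches the df / df_new naming in the question.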
The steps that build the Boolean index, shown side by side in a DataFrame:
steps_df = pd.DataFrame({
    # Find where cat_col is r
    'step 1': df['cat_col'].eq('r'),
    # Find groups which have an r value
    'step 2': df['cat_col'].eq('r').groupby(df['group']).transform('any')
})
step 1 step 2
0 True True
1 False True
2 True True
3 False True
4 False False
5 False False
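As a cross-check on step 2, the same mask can also be built without groupby by first collecting the groups that contain r and then testing membership with isin. This is an alternative sketch, not the method used in the answer above; the variable names are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['g1', 'g1', 'g1', 'g1', 'g2', 'g2'],
    'cat_col': ['r', 'nr', 'r', 'nr', 'nr', 'nr'],
})

# Groups that contain at least one 'r'.
groups_with_r = df.loc[df['cat_col'].eq('r'), 'group'].unique()

# Rows that belong to any of those groups.
m_isin = df['group'].isin(groups_with_r)

# The groupby/transform mask from step 2, for comparison.
m_transform = df['cat_col'].eq('r').groupby(df['group']).transform('any')

# Both masks select the same rows.
print(m_isin.tolist() == m_transform.tolist())
```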
Setup (DataFrame and imports):
import pandas as pd
df = pd.DataFrame({
    'group': ['g1', 'g1', 'g1', 'g1', 'g2', 'g2'],
    'cat_col': ['r', 'nr', 'r', 'nr', 'nr', 'nr']
})