I have a dataframe
df = pd.DataFrame([["A",0,"ret"],["C",2,"rem"],["B",1,"ret"],["A",0,"rem"],["B",0,"rem"],["D",0,"rem"],["C",2,"rem"],["D",0,"rem"],["D",0,"rem"]],columns=["id","val1","val2"])
id val1 val2
A 0 ret
C 2 rem
B 1 ret
A 0 rem
B 0 rem
D 0 rem
C 2 rem
D 0 rem
D 0 rem
Remove the id group where val1 is 0 in all the rows of group and val2 is rem in all the rows of group. Here for id D, val1 is 0 for all the rows and val2 is rem for all the rows so remove D id.
Expected Output
df_out = pd.DataFrame([["A",0,"ret"],["C",2,"rem"],["B",1,"ret"],["A",0,"rem"],["B",0,"rem"],["C",2,"rem"]],columns=["id","val1","val2"])
id val1 val2
A 0 ret
C 2 rem
B 1 ret
A 0 rem
B 0 rem
C 2 rem
How to do it in pandas?
You can use boolean indexing with two masks:
# is there at least one non 0 per group?
m1 = df['val1'].ne(0).groupby(df['id']).transform('any')
# is there at least one non-rem?
m2 = df['val2'].ne('rem').groupby(df['id']).transform('any')
# keep is any is True
out = df[m1|m2]
Output:
id val1 val2
0 A 0 ret
1 C 2 rem
2 B 1 ret
3 A 0 rem
4 B 0 rem
6 C 2 rem
Intermediates:
id val1 val2 m1 m2
0 A 0 ret False True
1 C 2 rem True False
2 B 1 ret True True
3 A 0 rem False True
4 B 0 rem True True
5 D 0 rem False False
6 C 2 rem True False
7 D 0 rem False False
8 D 0 rem False False
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With