Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Select the dataframe based on multiple conditions on a group like all values in a column are 0 and value = x in another column in pandas

I have a dataframe

df = pd.DataFrame([["A",0,"ret"],["C",2,"rem"],["B",1,"ret"],["A",0,"rem"],["B",0,"rem"],["D",0,"rem"],["C",2,"rem"],["D",0,"rem"],["D",0,"rem"]],columns=["id","val1","val2"])
id val1 val2
A   0   ret
C   2   rem
B   1   ret
A   0   rem
B   0   rem
D   0   rem
C   2   rem
D   0   rem
D   0   rem

Remove the id group where val1 is 0 in all the rows of group and val2 is rem in all the rows of group. Here for id D, val1 is 0 for all the rows and val2 is rem for all the rows so remove D id.

Expected Output

df_out = pd.DataFrame([["A",0,"ret"],["C",2,"rem"],["B",1,"ret"],["A",0,"rem"],["B",0,"rem"],["C",2,"rem"]],columns=["id","val1","val2"])
id val1 val2
A   0   ret
C   2   rem
B   1   ret
A   0   rem
B   0   rem
C   2   rem

How to do it in pandas?

like image 433
Chethan Avatar asked Dec 12 '25 11:12

Chethan


1 Answers

You can use boolean indexing with two masks:

# is there at least one non 0 per group?
m1 = df['val1'].ne(0).groupby(df['id']).transform('any')
# is there at least one non-rem?
m2 = df['val2'].ne('rem').groupby(df['id']).transform('any')

# keep is any is True
out = df[m1|m2]

Output:

  id  val1 val2
0  A     0  ret
1  C     2  rem
2  B     1  ret
3  A     0  rem
4  B     0  rem
6  C     2  rem

Intermediates:

  id  val1 val2     m1     m2
0  A     0  ret  False   True
1  C     2  rem   True  False
2  B     1  ret   True   True
3  A     0  rem  False   True
4  B     0  rem   True   True
5  D     0  rem  False  False
6  C     2  rem   True  False
7  D     0  rem  False  False
8  D     0  rem  False  False
like image 61
mozway Avatar answered Dec 15 '25 04:12

mozway



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!