Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Manipulate row values of subset dataframe in Pandas

Tags:

python

pandas

I have a dataframe df1 like this:

    ID1    ID2
0   foo    bar
1   fizz   buzz

And another df2 like this:

    ID1    ID2    Count    Code   
0   abc    def      7        B
1   fizz   buzz     5        B
2   fizz1  buzz2    9        C
3   foo    bar      6        B
4   foo    bar      6        Z

What I would like to do is filter the second dataframe where ID1 and ID2 match a row in the first dataframe, as a dataframe sub_df, then apply following code sub_df.loc[sub_df["Count"] >= 5, "Code"] = "A" to sub_df:

sub_df:

    ID1    ID2    Count    Code   
1   fizz   buzz     5        B
3   foo    bar      6        B
4   foo    bar      6        Z

Finaly I would like to yield a dataframe df that looks like this:

    ID1    ID2    Count    Code   
0   abc    def      7        B
1   fizz   buzz     5        A
2   fizz1  buzz2    9        C
3   foo    bar      6        A
4   foo    bar      6        A

How could I do that? Thank you.

like image 539
ah bon Avatar asked Jan 01 '26 03:01

ah bon


2 Answers

You could merge both dataframes with an indicator and use it to set a Code as A or not:

df = df2.merge(df1, how='left', on=['ID1','ID2'], indicator='ind')
df.loc[(df["Count"] >= 5) & (df['ind'] == 'both'), "Code"] = "A" 
df = df.drop('ind', axis=1)

print(df2)

     ID1    ID2  Count Code
0    abc    def      7    B
1   fizz   buzz      5    A
2  fizz1  buzz2      9    C
3    foo    bar      6    A
4    foo    bar      6    A
like image 113
yatu Avatar answered Jan 03 '26 16:01

yatu


You can use Series.isin for test membership between combinations of ID1 and ID2 with joined both columns with Series.str.cat:

id2 = df2['ID1'].str.cat(df2['ID2'], sep='_')
id1 = df1['ID1'].str.cat(df1['ID2'], sep='_')

df2.loc[(df2["Count"] >= 5) & id2.isin(id1), "Code"] = "A" 
print (df2)
     ID1    ID2  Count Code
0    abc    def      7    B
1   fizz   buzz      5    A
2  fizz1  buzz2      9    C
3    foo    bar      6    A
4    foo    bar      6    A

EDIT:

Tested and for me working nice:

print (df1)
    ID1   ID2
0   foo   bar
1  fizz  buzz

print (df2)
     ID1    ID2        date  price
0    abc    def  2019-08-01      1
1   fizz   buzz  2019-08-02      2
2  fizz1  buzz2  2019-08-02      3
3    foo    bar  2019-08-03      4
4    foo    bar  2019-08-01      5

df2["date"] = pd.to_datetime(df2["date"])
df2.loc[(df2["date"] != '2019-08-01') & (df2['ID1'].isin(df1['ID1'])), "price"] = np.nan, 
print (df2)
     ID1    ID2       date  price
0    abc    def 2019-08-01    1.0
1   fizz   buzz 2019-08-02    NaN <- set NaN beacuse id
2  fizz1  buzz2 2019-08-02    3.0
3    foo    bar 2019-08-03    NaN <- set NaN beacuse id
4    foo    bar 2019-08-01    5.0 <- not set NaN beacuse id but 2019-08-01
like image 25
jezrael Avatar answered Jan 03 '26 16:01

jezrael



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!