
Merging Dataframes with different column names with aggregated column values

I have two DataFrames that need to be merged on some criteria, but I haven't been able to figure out how to do this.

df1:

id      positive_action    date          volume
id_1    user 1             2016-12-12    19720.735
        user 2             2016-12-12    14740.800

df2:

id      negative_action    date          volume
id_1    user 1             2016-12-12    10.000
        user 3             2016-12-12    10.000

I want:

id      action    date          volume
id_1    user 1    2016-12-12    19730.735
        user 2    2016-12-12    14740.800
        user 3    2016-12-12    10.000

Here:

  1. volume is aggregated (summed) across both DataFrames.
  2. Rows are merged on id, date, and the action (positive_action and negative_action combined into a single column).

How do I achieve this?

CodeGeek123 asked Dec 20 '25 06:12


2 Answers

You can concatenate your DataFrames after renaming the positive_action and negative_action columns to just action, and then perform a groupby with a sum.

import pandas as pd

df1 = df1.rename(columns={'positive_action': 'action'})
df2 = df2.rename(columns={'negative_action': 'action'})
# stack the rows, then sum within each (id, action, date) group
pd.concat([df1, df2]).groupby(['id', 'action', 'date']).sum().reset_index()


     id  action        date     volume
0  id_1  user 1  2016-12-12  19730.735
1  id_1  user 2  2016-12-12  14740.800
2  id_1  user 3  2016-12-12     10.000
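
For anyone who wants to run this end to end, here is a minimal sketch of the same concat-then-groupby idea. It rebuilds the sample data from the question and assumes id is an ordinary column rather than an index; the names stacked and result are only illustrative.

```python
import pandas as pd

# Sample data copied from the question; treating `id` as an ordinary
# column (not an index) is an assumption.
df1 = pd.DataFrame({
    "id": ["id_1", "id_1"],
    "positive_action": ["user 1", "user 2"],
    "date": ["2016-12-12", "2016-12-12"],
    "volume": [19720.735, 14740.800],
})
df2 = pd.DataFrame({
    "id": ["id_1", "id_1"],
    "negative_action": ["user 1", "user 3"],
    "date": ["2016-12-12", "2016-12-12"],
    "volume": [10.000, 10.000],
})

# Rename on the fly (no inplace=True) so df1 and df2 stay untouched,
# then stack the rows of both frames.
stacked = pd.concat(
    [
        df1.rename(columns={"positive_action": "action"}),
        df2.rename(columns={"negative_action": "action"}),
    ],
    ignore_index=True,
)

# Sum only the volume column within each (id, action, date) group.
result = stacked.groupby(["id", "action", "date"], as_index=False)["volume"].sum()
print(result)
```

Selecting ["volume"] before summing keeps any other non-numeric columns out of the aggregation, which may matter if the real frames carry more columns than the sample.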
Ted Petrou answered Dec 21 '25 19:12


This should work:

# Not sure what indexing you are using, so let's remove it
# to get on the same page, so to speak.
df1 = df1.reset_index()
df2 = df2.reset_index()

# do an outer merge to allow mismatches on the actions.
df = df1.merge(
    df2, left_on=['id', 'positive_action', 'date'],
    right_on=['id', 'negative_action', 'date'],
    how='outer',
)


# fill the missing actions from one with the other.
# (Will only happen when one is missing due to the way we merged.)
df['action'] = df['positive_action'].fillna(df['negative_action'])

# drop the old actions (keyword form; the positional axis
# argument to drop() was removed in pandas 2.0)
df = df.drop(columns=['positive_action', 'negative_action'])

# aggregate the volumes (I'm assuming you mean a simple sum)
df['volume'] = df['volume_x'].fillna(0) + df['volume_y'].fillna(0)

# drop the old volumes
df = df.drop(columns=['volume_x', 'volume_y'])

print(df)

The output is:

     id        date     volume  action
0  id_1  2016-12-12  19730.735  user_1
1  id_1  2016-12-12  14740.800  user_2
2  id_1  2016-12-12     10.000  user_3

You can then restore the indexing I may have removed.
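
As a rough sketch of that last step, the snippet below runs the outer-merge approach on the sample data from the question and then puts the index back. Indexing the frames by id is purely an assumption for illustration (the question does not say what the index is), and the names merged and restored are made up here.

```python
import pandas as pd

# Sample data from the question. Indexing by `id` is an assumption made
# only to demonstrate the reset/restore round trip.
df1 = pd.DataFrame({
    "id": ["id_1", "id_1"],
    "positive_action": ["user 1", "user 2"],
    "date": ["2016-12-12", "2016-12-12"],
    "volume": [19720.735, 14740.800],
}).set_index("id")
df2 = pd.DataFrame({
    "id": ["id_1", "id_1"],
    "negative_action": ["user 1", "user 3"],
    "date": ["2016-12-12", "2016-12-12"],
    "volume": [10.000, 10.000],
}).set_index("id")

# Outer merge so actions present in only one frame are kept.
merged = df1.reset_index().merge(
    df2.reset_index(),
    left_on=["id", "positive_action", "date"],
    right_on=["id", "negative_action", "date"],
    how="outer",
)

# Combine the two action columns and sum the two volume columns,
# treating a missing volume as 0.
merged["action"] = merged["positive_action"].fillna(merged["negative_action"])
merged["volume"] = merged["volume_x"].fillna(0) + merged["volume_y"].fillna(0)
merged = merged.drop(
    columns=["positive_action", "negative_action", "volume_x", "volume_y"]
)

# Restore the (assumed) original index.
restored = merged.set_index("id")
print(restored)
```

If the original frames used a different index, such as a MultiIndex on id and the action column, pass that list of columns to set_index instead.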

David Simic answered Dec 21 '25 19:12


