Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Subtract two Pandas DataFrames joined on multiple column values

I am trying to subtract the values of a column in DataFrame A, from the values from a column in DataFrame B, but only if multiple column values are equal to each other.

Assume that (fictionally):

DataFrame A:
Index    Department  Speciality   TargetMonth Capacity
1        Sales       Cars         2019-1      150
2        Sales       Cars         2019-2      120
3        Sales       Furniture    2019-1      110
4        IT          Servers      2019-1      100

DataFrame B:
Index    Department  Speciality   TargetMonth Required
1        Sales       Cars         2019-1      100
2        Sales       Cars         2019-2      120
3        IT          Servers      2019-1      50
4        Sales       Furniture    2019-1      50

I swapped the order of Index 3 and 4 in DataFrame B compared to A on purpose. My goal is to subtract DataFrame B its Required column as being required capacity hours from DataFrame A's Capacity column and resulting in another, not necessarily required to be sorted, list:

Index    Department  Speciality   TargetMonth Result
1        Sales       Cars         2019-1      50
2        Sales       Cars         2019-2      0
3        Sales       Furniture    2019-1      60
4        IT          Servers      2019-1      50

So, technically, subtract only, and only if all column values match each other and not based on order, as some rows may be missing in one list or the other.

I could solve this with some for loops and conditions but I suppose there's a clean and neat Pandas way to solve this with .subtract although it's the "joining" part on which I am currently stuck.

Thanks for your time in advance.

like image 585
Ventus Avatar asked Oct 25 '25 15:10

Ventus


2 Answers

This is why the Index is so useful, subtraction will be aligned on the indices (both rows and columns).

dfA = dfA.set_index(['Department', 'Speciality', 'TargetMonth'])
dfB = dfB.set_index(['Department', 'Speciality', 'TargetMonth'])

dfA.sub(dfB.rename(columns={'Required': 'Capacity'}), fill_value=0)

                                   Capacity
Department Speciality TargetMonth          
IT         Servers    2019-1             50
Sales      Cars       2019-1             50
                      2019-2              0
           Furniture  2019-1             60
like image 189
ALollz Avatar answered Oct 28 '25 03:10

ALollz


I would use merge with keys:

For this solution taking your dataframe A as dfA & dataframe as dfB

   df_result =  pd.merge(dfA, dfB, how='inner', on=['Department','Speciality','TargetMonth'])

This will put the dataframes together based on the keys: ['Department','Speciality','TargetMonth'] and will result in a dataframe where the the keys appear in both dataframes (how = 'inner').

I.E. if there is a key in dfB that is:

   {'Department': 'IT','Speciality':'Furniture','TargetMonth':2019-1}

This value will not appear in dataframe df_result. More Information can be found here - https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html

Then the solution using Pandas vectorization:

   df_result['Result'] = df_result['Capacity'] - df_result['Required']
like image 23
Tiblit Avatar answered Oct 28 '25 05:10

Tiblit



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!