I am trying to subtract the values of a column in DataFrame A, from the values from a column in DataFrame B, but only if multiple column values are equal to each other.
Assume that (fictionally):
DataFrame A:
Index Department Speciality TargetMonth Capacity
1 Sales Cars 2019-1 150
2 Sales Cars 2019-2 120
3 Sales Furniture 2019-1 110
4 IT Servers 2019-1 100
DataFrame B:
Index Department Speciality TargetMonth Required
1 Sales Cars 2019-1 100
2 Sales Cars 2019-2 120
3 IT Servers 2019-1 50
4 Sales Furniture 2019-1 50
I swapped the order of Index 3 and 4 in DataFrame B compared to A on purpose. My goal is to subtract DataFrame B its Required column as being required capacity hours from DataFrame A's Capacity column and resulting in another, not necessarily required to be sorted, list:
Index Department Speciality TargetMonth Result
1 Sales Cars 2019-1 50
2 Sales Cars 2019-2 0
3 Sales Furniture 2019-1 60
4 IT Servers 2019-1 50
So, technically, subtract only, and only if all column values match each other and not based on order, as some rows may be missing in one list or the other.
I could solve this with some for loops and conditions but I suppose there's a clean and neat Pandas way to solve this with .subtract although it's the "joining" part on which I am currently stuck.
Thanks for your time in advance.
This is why the Index is so useful, subtraction will be aligned on the indices (both rows and columns).
dfA = dfA.set_index(['Department', 'Speciality', 'TargetMonth'])
dfB = dfB.set_index(['Department', 'Speciality', 'TargetMonth'])
dfA.sub(dfB.rename(columns={'Required': 'Capacity'}), fill_value=0)
Capacity
Department Speciality TargetMonth
IT Servers 2019-1 50
Sales Cars 2019-1 50
2019-2 0
Furniture 2019-1 60
I would use merge with keys:
For this solution taking your dataframe A as dfA & dataframe as dfB
df_result = pd.merge(dfA, dfB, how='inner', on=['Department','Speciality','TargetMonth'])
This will put the dataframes together based on the keys: ['Department','Speciality','TargetMonth'] and will result in a dataframe where the the keys appear in both dataframes (how = 'inner').
I.E. if there is a key in dfB that is:
{'Department': 'IT','Speciality':'Furniture','TargetMonth':2019-1}
This value will not appear in dataframe df_result. More Information can be found here - https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
Then the solution using Pandas vectorization:
df_result['Result'] = df_result['Capacity'] - df_result['Required']
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With