I have a dataframe as follows:
   name     tag  price
0    x1  tweak1    1.1
1    x1  tweak2    1.2
2    x1    base    1.0
3    x2  tweak1    1.3
4    x2  tweak2    2.4
5    x2    base    2.0
For each name, I want to subtract the base price from the price column and store the result in a new column, as follows:
   name     tag  price  sensitivity
0    x1  tweak1    1.1          0.1
1    x1  tweak2    1.2          0.2
2    x1    base    1.0          0.0
3    x2  tweak1    1.3         -0.7
4    x2  tweak2    2.4          0.4
5    x2    base    2.0          0.0
and then drop the rows with tag base to get:
   name     tag  price  sensitivity
0    x1  tweak1    1.1          0.1
1    x1  tweak2    1.2          0.2
3    x2  tweak1    1.3         -0.7
4    x2  tweak2    2.4          0.4
What is the best way to perform this operation in pandas?
You can try this:
# Within each name group, subtract that group's base price, then filter out the base rows.
(df.groupby('name', group_keys=False)
   .apply(lambda g: g.assign(sensitivity=g.price - g.price[g.tag == "base"].values))
   [lambda x: x.tag != "base"])
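
If you would rather avoid groupby.apply, here is a minimal sketch of the same idea using a lookup instead. It assumes each name has exactly one base row; the variable names (base, result) are mine, and the sample prices follow the expected-output tables in the question (1.3 and 2.4 for the x2 tweaks).

```python
import pandas as pd

# Sample data; x2 tweak prices follow the question's expected-output tables.
df = pd.DataFrame({
    "name": ["x1", "x1", "x1", "x2", "x2", "x2"],
    "tag": ["tweak1", "tweak2", "base", "tweak1", "tweak2", "base"],
    "price": [1.1, 1.2, 1.0, 1.3, 2.4, 2.0],
})

# Base price per name (assumption: exactly one 'base' row per name).
base = df.loc[df["tag"] == "base"].set_index("name")["price"]

# Look up each row's base price by name, subtract it, then drop the base rows.
result = (
    df.assign(sensitivity=df["price"] - df["name"].map(base))
      .loc[lambda d: d["tag"] != "base"]
)
print(result)
```

Because the subtraction is done in floating point, the printed differences may show small rounding error; round the column if you need tidy values.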

Another option is to pivot the table to wide format, do the subtraction, and then transform it back to long format:
wide_df = df.pivot_table(['price'], 'name', 'tag')
# Note: after stacking, the 'price' column holds the differences from base.
(wide_df.sub(wide_df[('price', 'base')], axis=0)
        .drop(('price', 'base'), axis=1)
        .stack(level=1)
        .reset_index())
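
The wide-format route returns the differences in place of the original prices. If you want both columns, a rough sketch like the following might do; it assumes every (name, tag) pair is unique, swaps stack for melt, and the names wide, diff, long_df and result are only for illustration. The sample prices follow the expected-output tables in the question.

```python
import pandas as pd

# Sample data; x2 tweak prices follow the question's expected-output tables.
df = pd.DataFrame({
    "name": ["x1", "x1", "x1", "x2", "x2", "x2"],
    "tag": ["tweak1", "tweak2", "base", "tweak1", "tweak2", "base"],
    "price": [1.1, 1.2, 1.0, 1.3, 2.4, 2.0],
})

# Wide format: one row per name, one column per tag
# (assumption: each (name, tag) pair occurs only once).
wide = df.pivot(index="name", columns="tag", values="price")

# Subtract the base column from every tag column, row by row, then drop it.
diff = wide.sub(wide["base"], axis=0).drop(columns="base")

# Back to long format, naming the values 'sensitivity'.
long_df = diff.reset_index().melt(
    id_vars="name", var_name="tag", value_name="sensitivity"
)

# An inner merge restores the original price and leaves out the base rows.
result = df.merge(long_df, on=["name", "tag"])
print(result)
```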

I'd start by making your index from the 'name' and 'tag' columns.
Then I'd subtract the 'base' cross section; pandas aligns the two on the shared 'name' level for us.
Finally, use assign + drop + reset_index for bookkeeping and formatting.
p = df.set_index(['name', 'tag'])[['price']]
p.assign(sensitivity=p - p.xs('base', level=1)).drop('base', level=1).reset_index()
  name     tag  price  sensitivity
0   x1  tweak1    1.1          0.1
1   x1  tweak2    1.2          0.2
2   x2  tweak1    1.3         -0.7
3   x2  tweak2    2.4          0.4
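
To make the alignment step explicit, here is a sketch of the same approach on a Series, using Series.sub with its level argument instead of relying on implicit alignment. It assumes one base row per name; the names price, base, sensitivity and result are illustrative, and the sample prices follow the expected-output tables in the question.

```python
import pandas as pd

# Sample data; x2 tweak prices follow the question's expected-output tables.
df = pd.DataFrame({
    "name": ["x1", "x1", "x1", "x2", "x2", "x2"],
    "tag": ["tweak1", "tweak2", "base", "tweak1", "tweak2", "base"],
    "price": [1.1, 1.2, 1.0, 1.3, 2.4, 2.0],
})

# Price as a Series indexed by (name, tag).
price = df.set_index(["name", "tag"])["price"]

# Cross section of the base rows: a Series indexed by name only.
base = price.xs("base", level="tag")

# Broadcast the base price across the 'name' level and subtract.
sensitivity = price.sub(base, level="name")

# Put both columns together, drop the base rows, and flatten the index.
result = (
    pd.DataFrame({"price": price, "sensitivity": sensitivity})
      .drop("base", level="tag")
      .reset_index()
)
print(result)
```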