Lets say I have the data like this
df = pd.DataFrame({"MONTHS":[1,2,3,4,5,6], "METRIC":[430,584,648,571,610,535]})
df = pd.DataFrame({"MONTHS":[1,2,3,4,5,6], "METRIC":[430,584,648,571,610,535]}) what I want to do is find the sum of the metric column =3378
MONTHS NEW_METRIC
0 3378
1 3378
2 2948
3 2794
4 2730
5 2807
6 2768
So here basically created a new row with 0 added and the corresponding new metric is the sum and for value 1 it is also the sum for new metric but from month=2 it should subtract the lag value of metric (3378-430) and similiarly for months3 it would be (3378-584) etc till the end. As a first step I wanted to store the sum in result which failed
sum_METRIC = df.agg({"METRIC":"sum"}).collect()[0]
result = sum_METRIC["sum(METRIC)"]
This threw error 'Series' object has no attribute 'collect'.
How do I sum and then use the lag to subtract?
You can substract METRIC for the resulting sum:
In [11]: df["METRIC"].sum() - df["METRIC"].reindex(np.arange(0, 7)).shift(2).fillna(0)
Out[11]:
0 3378.0
1 3378.0
2 2948.0
3 2794.0
4 2730.0
5 2807.0
6 2768.0
Name: METRIC, dtype: float64
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With