Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas sum-lag of a column

Tags:

pandas

Lets say I have the data like this

df = pd.DataFrame({"MONTHS":[1,2,3,4,5,6], "METRIC":[430,584,648,571,610,535]})

df = pd.DataFrame({"MONTHS":[1,2,3,4,5,6], "METRIC":[430,584,648,571,610,535]}) what I want to do is find the sum of the metric column =3378

MONTHS NEW_METRIC
 0       3378
 1       3378
 2       2948
 3       2794
 4       2730
 5       2807
 6       2768

So here basically created a new row with 0 added and the corresponding new metric is the sum and for value 1 it is also the sum for new metric but from month=2 it should subtract the lag value of metric (3378-430) and similiarly for months3 it would be (3378-584) etc till the end. As a first step I wanted to store the sum in result which failed

sum_METRIC = df.agg({"METRIC":"sum"}).collect()[0]
result = sum_METRIC["sum(METRIC)"]

This threw error 'Series' object has no attribute 'collect'.

How do I sum and then use the lag to subtract?

like image 497
viji Avatar asked Jan 26 '26 03:01

viji


1 Answers

You can substract METRIC for the resulting sum:

In [11]: df["METRIC"].sum() - df["METRIC"].reindex(np.arange(0, 7)).shift(2).fillna(0)
Out[11]:
0    3378.0
1    3378.0
2    2948.0
3    2794.0
4    2730.0
5    2807.0
6    2768.0
Name: METRIC, dtype: float64
like image 125
Andy Hayden Avatar answered Jan 28 '26 20:01

Andy Hayden