Suppose I have a DataFrame df.
>>> from pandas import DataFrame
>>> DATA = {'id': [1, 2, 3, 4, 5], 'salary': [1200, 2300, 2400, 3620, 2100]}
>>> df = DataFrame(DATA)
>>> df
   id  salary
0   1    1200
1   2    2300
2   3    2400
3   4    3620
4   5    2100
From this DataFrame df, I can get a new DataFrame, df1, with the cumulative sum of salary (here it simply overwrites df):
>>> df['salary'] = df['salary'].cumsum()
>>> df
   id  salary
0   1    1200
1   2    3500
2   3    5900
3   4    9520
4   5   11620
This is a very common scenario.
Now, what if I am given df1 and have to find df?
   id  salary            id  salary
0   1    1200         0   1    1200
1   2    3500         1   2    2300
2   3    5900   ==>   2   3    2400
3   4    9520         3   4    3620
4   5   11620         4   5    2100
All I need is to recover the actual salary for each id from its cumulative sum.
>>> df
   id  salary
0   1    1200
1   2    3500
2   3    5900
3   4    9520
4   5   11620
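>>> # .diff() leaves NaN in the first row and turns the column into floats,
>>> # so fill the gap with the first cumulative value and cast back to integers.
>>> df['salary'] = df['salary'].diff().fillna(df['salary'].iloc[0]).astype('int64')
>>> df
   id  salary
0   1    1200
1   2    2300
2   3    2400
3   4    3620
4   5    2100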
That said, .fillna does more work than is needed here, because only the first value comes out as NaN after .diff(). So you just need to replace that first value with the cumulative value at .iloc[0].
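One way to skip .fillna altogether is sketched below. It assumes pandas 0.24 or later (for the fill_value argument of Series.shift), and the helper name undo_cumsum is made up for illustration:

```python
import pandas as pd

# Cumulative salaries, as in the question.
df1 = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                    'salary': [1200, 3500, 5900, 9520, 11620]})


def undo_cumsum(cum):
    """Hypothetical helper: recover the original values from a cumulative sum."""
    # Shift the series down by one row and put 0 in the gap at the top.
    # Subtracting then gives the first cumulative value unchanged in row 0
    # and the row-to-row difference everywhere else, so no NaN is produced.
    return cum - cum.shift(1, fill_value=0)


df = df1.copy()
df['salary'] = undo_cumsum(df1['salary'])
print(df)
```

Since no NaN is ever introduced, there is nothing to fill afterwards, and taking .cumsum() of the result should give back the original cumulative column.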