Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Assign cumulative sum maximum to grouping of data points

If I had a pandas data frame like this:

 df = pd.Dataframe()
 df['Values'] = [2, 4, 2, 5, np.nan, np.nan, 5, 8, 2, 1, np.nan, np.nan, np.nan])

I can take the cumulative sum of the data frame like this (restarting the count when I have a new non nan grouping):

 df['Cumulative Sum'] = df.Values.groupby(df.Values.isna().cumsum()).cumsum()

such that I get:

 df['Cumulative Sum'].values
 array([ 2.,  6.,  8., 13., nan, nan,  5., 13., 15., 16., nan, nan, nan])

How do I assign the cumulative sum maximum to each non-nan grouping of data such that I get:

 array([13., 13., 13., 13., nan, nan, 16., 16., 16., 16., nan, nan, nan])

I was trying this where I groupby nan values but was struggling:

 df['Cumulative Sum'].groupby(df['Cumulative Sum'].isna().cummax()).cummax()
like image 503
the struggle Avatar asked Nov 19 '25 10:11

the struggle


1 Answers

Try:

df.groupby(df['Values'].isna().cumsum())['Values']\
  .transform('sum').where(df['Values'].notna())

Output:

0     13.0
1     13.0
2     13.0
3     13.0
4      NaN
5      NaN
6     16.0
7     16.0
8     16.0
9     16.0
10     NaN
11     NaN
12     NaN
Name: Values, dtype: float64

Details.

Use isna to create a boolean series True for NAN, then use cumsum to increment for every NaN creating groups of no-NaN values. Next, groupby this group and sum, use transform to assign that sum to every row in the dataframe. Lastly, you can mask the NaN values using where.

A little better code:

s = df['Values'].isna()
df.groupby(s.cumsum())['Values'].transform('sum').mask(s)
like image 101
Scott Boston Avatar answered Nov 20 '25 22:11

Scott Boston



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!