If I have a pandas DataFrame like this:
import numpy as np
import pandas as pd

df = pd.DataFrame()
df['Values'] = [2, 4, 2, 5, np.nan, np.nan, 5, 8, 2, 1, np.nan, np.nan, np.nan]
I can take the cumulative sum of the data frame like this (restarting the count for each new non-NaN grouping):
df['Cumulative Sum'] = df.Values.groupby(df.Values.isna().cumsum()).cumsum()
such that I get:
df['Cumulative Sum'].values
array([ 2., 6., 8., 13., nan, nan, 5., 13., 15., 16., nan, nan, nan])
How do I assign the maximum of the cumulative sum to each non-NaN grouping of data, such that I get:
array([13., 13., 13., 13., nan, nan, 16., 16., 16., 16., nan, nan, nan])
I was trying this, grouping by the NaN values, but was struggling:
df['Cumulative Sum'].groupby(df['Cumulative Sum'].isna().cummax()).cummax()
Try:
df.groupby(df['Values'].isna().cumsum())['Values']\
.transform('sum').where(df['Values'].notna())
Output:
0 13.0
1 13.0
2 13.0
3 13.0
4 NaN
5 NaN
6 16.0
7 16.0
8 16.0
9 16.0
10 NaN
11 NaN
12 NaN
Name: Values, dtype: float64
Details:
Use isna to create a boolean Series that is True at every NaN, then apply cumsum so the counter increments at each NaN; every run of non-NaN values therefore shares one group label. Group by those labels and sum, using transform to broadcast each group's sum back onto every row. Lastly, mask the original NaN positions with where.
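Step by step, here is a self-contained sketch of the same idea (the intermediate names is_nan, group_id and group_sum are only for illustration):

import numpy as np
import pandas as pd

df = pd.DataFrame({'Values': [2, 4, 2, 5, np.nan, np.nan, 5, 8, 2, 1,
                              np.nan, np.nan, np.nan]})

is_nan = df['Values'].isna()        # True at every NaN
group_id = is_nan.cumsum()          # increments at each NaN, so each non-NaN run shares a label
group_sum = df.groupby(group_id)['Values'].transform('sum')  # each run's total on every row
result = group_sum.where(df['Values'].notna())                # put NaN back at the original gaps

print(result.values)
# [13. 13. 13. 13. nan nan 16. 16. 16. 16. nan nan nan]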
Slightly cleaner code:
s = df['Values'].isna()
df.groupby(s.cumsum())['Values'].transform('sum').mask(s)
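Both versions produce identical output: mask(s) hides the rows where s is True, which is the complement of where(df['Values'].notna()), and storing the boolean Series in s avoids calling isna() twice.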