I'm having trouble coming up with a way to perform a cumulative sum on a column and creating a flag once it hits a certain value.
So given a dataframe:
df = pd.DataFrame([[5,1],[6,1],[30,1],[170,0],[5,1],[10,1]],columns = ['a','b'])
     a  b
0    5  1
1    6  1
2   30  1
3  170  0
4    5  1
5   10  1
For column A, I want to perform the cumulative sum and set the "Flag" column value to 1 if that max value is reached. Upon reaching that max value, it will reset to 0. In this case, the max value is 40. Any cumulative sum over 40 will trigger the reset
Desired Output
     a  b  Flag
0    5  1     0
1   11  1     0
2   41  1     1
3  170  0     1
4    5  1     0
5   15  1     0
Any help would be appreciated!
"Ordinary" cumsum() is here useless, as this function "doesn't know" where to restart summation.
You can do it with the following custom function:
def myCumSum(x, thr):
    if myCumSum.prev >= thr:
        myCumSum.prev = 0
    myCumSum.prev += x
    return myCumSum.prev
This function is "with memory" (from the previous call) - prev, so there is a way to "know" where to restart.
To speed up the execution, define a vectorized version of this function:
myCumSumV = np.vectorize(myCumSum, otypes=[np.int], excluded=['thr'])
Then execute:
threshold = 40
myCumSum.prev = 0  # Set the "previous" value
# Replace "a" column with your cumulative sum
df.a = myCumSumV(df.a.values, threshold)
df['flag'] = df.a.ge(threshold).astype(int)  # Compute "flag" column
The result is:
     a  b  flag
0    5  1     0
1   11  1     0
2   41  1     1
3  170  0     1
4    5  1     0
5   15  1     0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With