
New column with a running number in pandas, incremented conditionally

Question:

Given a dataframe with data such as this:

>>> df
    data
0  START
1   blah
2   blah
3   blah
4   blah
5    END
6  START
7   blah
8   blah
9    END

What is the most efficient way to assign a new column with a running number that gets incremented at every START? This is my desired result:

>>> df
    data  number
0  START       1
1   blah       1
2   blah       1
3   blah       1
4   blah       1
5    END       1
6  START       2
7   blah       2
8   blah       2
9    END       2

What I've Done

This works, but it is pretty slow. It will be applied to a much larger dataframe, and I'm sure there is a better way to do it:

counter = 0
df = df.assign(number = 0)
for i, row in df.iterrows():
    if row['data'] == 'START':
        counter += 1
    df.loc[i, 'number'] = counter

To reproduce the example dataframe

import pandas as pd
data = ['blah'] * 10
data[0], data[6] = ['START'] * 2
data[5], data[-1] = ['END'] * 2

df = pd.DataFrame({'data':data})
sacuL asked Dec 06 '25 13:12



1 Answer

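Here is one way: compare the column to 'START' to get a boolean mask, then take its cumulative sum. Each True counts as 1, so the running total goes up by one at every START.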

df.data.eq('START').cumsum()
Out[74]: 
0    1
1    1
2    1
3    1
4    1
5    1
6    2
7    2
8    2
9    2
Name: data, dtype: int32
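
To see why this works, it can help to look at the two steps separately. The following is only a small sketch; the shortened series `s` and the names `mask` and `running` are made up for illustration and are not part of the original answer:

```python
import pandas as pd

# Hypothetical short series, used instead of the full dataframe.
s = pd.Series(['START', 'blah', 'END', 'START', 'END'])

# Step 1: boolean Series that is True only on the START rows.
mask = s.eq('START')

# Step 2: cumulative sum; True counts as 1 and False as 0,
# so the total steps up by one at each START and stays flat otherwise.
running = mask.cumsum()

print(mask.tolist())     # [True, False, False, True, False]
print(running.tolist())  # [1, 1, 1, 2, 2]
```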

Then assign it back to the dataframe:

df['number']=df.data.eq('START').cumsum()
df
Out[76]: 
    data  number
0  START       1
1   blah       1
2   blah       1
3   blah       1
4   blah       1
5    END       1
6  START       2
7   blah       2
8   blah       2
9    END       2
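
As a sanity check, the vectorized result can be compared against the loop from the question. This is a sketch under the assumption that the dataframe is built exactly as in the question's reproduction snippet; the names `is_start` and `expected` are introduced here for illustration only:

```python
import pandas as pd

# Rebuild the example dataframe from the question.
data = ['blah'] * 10
data[0], data[6] = 'START', 'START'
data[5], data[-1] = 'END', 'END'
df = pd.DataFrame({'data': data})

# Vectorized version: mark the START rows, then keep a running count of them.
is_start = df['data'].eq('START')
df['number'] = is_start.cumsum()

# Reference version: the same counting logic as the question's loop,
# written over plain Python values and kept only for comparison.
counter = 0
expected = []
for value in df['data']:
    if value == 'START':
        counter += 1
    expected.append(counter)

# Both approaches should agree row by row.
assert df['number'].tolist() == expected
print(df)
```

Note that any rows appearing before the first START would get the number 0 with this approach, which matches what the original loop does.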
BENY answered Dec 09 '25 20:12



