I have a pandas DataFrame in which I would like to count the number of consecutive occurrences of a specific string in one column.
Let's say I have the following dataframe.
col1
0 string1
1 string1
2 string1
3 string2
4 string3
5 string3
6 string1
I would like to define a as the maximum number of consecutive occurrences of a given string in col1, for example string1.
In this case, a should be 3 if I search for string1 and 2 if I search for string3.
How can it be achieved?
You can use the usual trick of grouping consecutive values:
df1 = df.groupby((df.col1 != df.col1.shift()).cumsum().rename(None)).col1.agg(['size', 'first'])
# size first
#1 3 string1
#2 1 string2
#3 2 string3
#4 1 string1
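To see why that grouping key works, it can help to build it step by step. The sketch below recreates the example frame from the question (the names starts and run_id are just illustrative) and prints the run label assigned to each row:

```python
import pandas as pd

# Example frame from the question
df = pd.DataFrame({"col1": ["string1", "string1", "string1",
                            "string2", "string3", "string3", "string1"]})

# True wherever a row differs from the previous one. The first row is
# always True because shift() leaves a missing value in that position.
starts = df.col1 != df.col1.shift()

# The running total of those flags gives every consecutive run its own label.
run_id = starts.cumsum()

print(run_id.tolist())
# [1, 1, 1, 2, 3, 3, 4]
```

Rows that share a label belong to the same consecutive run, so grouping on run_id and taking the size gives the length of each run.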
Then sort_values + drop_duplicates to find the largest:
df1 = df1.sort_values('size').drop_duplicates('first', keep='last').set_index('first').rename_axis(None)
# size
#string2 1
#string3 2
#string1 3
So now you can look them up easily:
df1.loc['string1']
#size 3
#Name: string1, dtype: int64
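If you only need the number for one string at a time, the same idea can be wrapped in a small helper. This is a minimal sketch rather than a definitive implementation: max_consecutive is a hypothetical name, and returning 0 for a string that never appears is an assumption about the behaviour you want.

```python
import pandas as pd

def max_consecutive(series, value):
    """Length of the longest consecutive run of `value` in `series` (0 if absent)."""
    # Label each consecutive run, as in the groupby trick above
    run_id = (series != series.shift()).cumsum()
    # One row per run: its length and the value it holds
    runs = series.groupby(run_id).agg(["size", "first"])
    sizes = runs.loc[runs["first"] == value, "size"]
    # Assumption: report 0 when the value does not occur at all
    return int(sizes.max()) if not sizes.empty else 0

df = pd.DataFrame({"col1": ["string1", "string1", "string1",
                            "string2", "string3", "string3", "string1"]})

a = max_consecutive(df.col1, "string1")
print(a)
# 3
```

Note that df1.loc['string1'] above returns a one-element Series; df1.loc['string1', 'size'] gives the plain number.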