Count maximum consecutive occurrences of a string in a dataframe column

Question

I have a pandas DataFrame in which I would like to count the number of consecutive occurrences of a specific string in one column.

Let's say I have the following dataframe.

   col1
0  string1
1  string1
2  string1
3  string2
4  string3
5  string3
6  string1

I would like to define a as the maximum number of consecutive occurrences of a given string in col1, for example string1.

In this case, a should be 3 if I search for string1 and 2 if I search for string3.

How can it be achieved?

ALollz · Accepted Answer

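You can use the usual trick of grouping consecutive values: compare each row with the previous one, then take the cumulative sum so that every run of equal values gets its own group id: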

df1 = df.groupby((df.col1 != df.col1.shift()).cumsum().rename(None)).col1.agg(['size', 'first'])
#   size    first
#1     3  string1
#2     1  string2
#3     2  string3
#4     1  string1
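To see why the grouping key works, it can help to build it step by step. The sketch below rebuilds the question's sample frame; the names starts_new_run and run_id are only illustrative and do not appear in the answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"col1": ["string1", "string1", "string1", "string2",
                            "string3", "string3", "string1"]})

# True wherever a value differs from the row directly above it.
# Row 0 is always True because shift() leaves NaN in the first position.
starts_new_run = df["col1"] != df["col1"].shift()

# The cumulative sum turns those flags into an id that stays constant within a run.
run_id = starts_new_run.cumsum()

print(run_id.tolist())  # [1, 1, 1, 2, 3, 3, 4]
```

Grouping by run_id and taking the size of each group then gives the run lengths shown above.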

Then sort_values + drop_duplicates to find the largest:

df1 = df1.sort_values('size').drop_duplicates('first', keep='last').set_index('first').rename_axis(None)
#         size
#string2     1
#string3     2
#string1     3
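As an alternative to sort_values + drop_duplicates (not what the answer uses, just another way to get the same numbers), a second groupby on the run's value with max should give the longest run per string. A minimal sketch, assuming the same sample frame; the names runs and longest are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["string1", "string1", "string1", "string2",
                            "string3", "string3", "string1"]})

# One row per run: its length ("size") and the value it holds ("first").
run_id = (df["col1"] != df["col1"].shift()).cumsum()
runs = df.groupby(run_id)["col1"].agg(["size", "first"])

# Longest run for each distinct value.
longest = runs.groupby("first")["size"].max()

print(longest.to_dict())  # {'string1': 3, 'string2': 1, 'string3': 2}
```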

So now you can look them up easily:

df1.loc['string1']
#size    3
#Name: string1, dtype: int64
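If only one string is of interest, a small helper can return the number directly. This is a sketch under the assumption that a missing string should give 0; max_consecutive is a hypothetical name, not a pandas function.

```python
import pandas as pd

def max_consecutive(series: pd.Series, value) -> int:
    """Length of the longest run of `value` in `series` (0 if it never occurs)."""
    is_match = series.eq(value)
    # Every non-matching row bumps the id, so consecutive matches share one id.
    run_id = (~is_match).cumsum()
    # Summing the boolean flags per id counts the matches in each run.
    run_lengths = is_match.groupby(run_id).sum()
    return int(run_lengths.max()) if len(run_lengths) else 0

df = pd.DataFrame({"col1": ["string1", "string1", "string1", "string2",
                            "string3", "string3", "string1"]})

print(max_consecutive(df["col1"], "string1"))  # 3
print(max_consecutive(df["col1"], "string3"))  # 2
```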

Tags:

python

pandas

Sd Junk
