
Count maximum consecutive occurrences of a string in a dataframe column

Tags: python, pandas

I have a pandas DataFrame in which I would like to count the number of consecutive occurrences of a specific string in one column.

Let's say I have the following dataframe.

   col1
0  string1
1  string1
2  string1
3  string2
4  string3
5  string3
6  string1

I would like to define a as the maximum number of consecutive occurrences of a given string in col1, for example string1.

In this case, a should be 3 if I search for string1 and 2 if I search for string3.

How can it be achieved?

asked Oct 26 '25 09:10 by Sd Junk


1 Answer

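You can use the usual trick of grouping consecutive values: compare each row with the previous one and take the cumulative sum of the changes, so every run gets its own group key.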

df1 = df.groupby((df.col1 != df.col1.shift()).cumsum().rename(None)).col1.agg(['size', 'first'])
#   size    first
#1     3  string1
#2     1  string2
#3     2  string3
#4     1  string1
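
To see why that group key works, it can help to look at the intermediate steps on their own. This is only a minimal sketch, assuming the example frame from the question; the names changed and run_id are illustrative and not part of the answer's code.

```python
import pandas as pd

# Example frame from the question
df = pd.DataFrame({"col1": ["string1", "string1", "string1",
                            "string2", "string3", "string3", "string1"]})

# True wherever the value differs from the previous row
# (the first row compares against NaN, so it is always True)
changed = df["col1"] != df["col1"].shift()

# The cumulative sum of the change flags gives each consecutive run its own id
run_id = changed.cumsum()

print(run_id.tolist())  # [1, 1, 1, 2, 3, 3, 4]
```

Rows that share a run_id belong to the same consecutive run, which is why grouping on it and taking size gives the run lengths shown above.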

Then use sort_values + drop_duplicates to keep only the largest run for each value:

df1 = df1.sort_values('size').drop_duplicates('first', keep='last').set_index('first').rename_axis(None)
#         size
#string2     1
#string3     2
#string1     3

So now you can look them up easily:

df1.loc['string1']
#size    3
#Name: string1, dtype: int64
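
If only a single value is of interest, the same idea can be wrapped in a small helper. This is a sketch rather than a definitive implementation: the function name max_consecutive is hypothetical, and returning 0 when the value never appears is an assumption about the desired behaviour.

```python
import pandas as pd

def max_consecutive(series, value):
    """Return the length of the longest consecutive run of `value` (0 if absent)."""
    # Flag the rows that hold the value we are looking for
    is_value = series.eq(value)
    # Same run-id trick as above: a new id starts whenever the value changes
    run_id = series.ne(series.shift()).cumsum()
    # Summing the flags per run gives the run length for `value` and 0 for other runs
    run_lengths = is_value.groupby(run_id).sum()
    return int(run_lengths.max()) if len(run_lengths) else 0

df = pd.DataFrame({"col1": ["string1", "string1", "string1",
                            "string2", "string3", "string3", "string1"]})

print(max_consecutive(df["col1"], "string1"))  # 3
print(max_consecutive(df["col1"], "string3"))  # 2
```

Unlike the lookup table, this recomputes the runs on every call, so the table is likely the better choice when many values need to be looked up.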
answered Oct 27 '25 23:10 by ALollz


