Currently I'm working with weekly data for different subjects, but it might have some long streaks without data, so, what I want to do, is to just keep the longest streak of consecutive weeks for every id. My data looks like this:
id    week
1      8
1      15
1      60
1      61
1      62
2      10
2      11
2      12
2      13
2      25
2      26
My expected output would be:
id    week
1      60
1      61
1      62
2      10
2      11
2      12
2      13
I got a bit close, trying to mark with a 1 when week==week.shift()+1. The problem is this approach doesn't mark the first occurrence in a streak, and also I can't filter the longest one:
df.loc[ (df['id'] == df['id'].shift())&(df['week'] == df['week'].shift()+1),'streak']=1
This, according to my example, would bring this:
id    week  streak
1      8     nan
1      15    nan
1      60    nan
1      61    1
1      62    1
2      10    nan
2      11    1
2      12    1
2      13    1
2      25    nan
2      26    1
Any ideas on how to achieve what I want?
Try this:
df['consec'] = df.groupby(['id',df['week'].diff(-1).ne(-1).shift().bfill().cumsum()]).transform('count')
df[df.groupby('id')['consec'].transform('max') == df.consec]
Output:
   id  week  consec
2   1    60       3
3   1    61       3
4   1    62       3
5   2    10       4
6   2    11       4
7   2    12       4
8   2    13       4
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With