I was looking out to extract consecutive rows with specified text repeated continuously for more than 5 times.
ex:
A B C
10 john 1
12 paul 1
23 kishan 1
12 teja 1
12 zebo 1
324 vauh -1
3434 krish -1
232 poo -1
4535 zoo 1
4343 doo 1
342 foo -1
123 soo 1
121 koo -1
34 loo -1
343454 moo -1
565343 noo -1
2323234 voo -1
3434 coo 1
545 xoo 1
6565 zoo 1
232321 qoo 1
34454 woo 1
546556 eoo 1
65665 roo -1
5343 too -1
3232 yoo 1
1212 uoo 1
23355667 ioo 1
787878 joo -1
I am looking out for the below result where the column value 'c' has consecutive 1's repeated more than 4 times as different groups .
Output:
A B C group
10 john 1 1
12 paul 1 1
23 kishan 1 1
12 teja 1 1
12 zebo 1 1
3434 coo 1 2
545 xoo 1 2
6565 zoo 1 2
232321 qoo 1 2
34454 woo 1 2
546556 eoo 1 2
Using masks and factorize:
# identify 1s
m = df['C'].eq(1)
# group consecutive values
g = m.ne(m.shift()).cumsum()
# identify stretches of 5+ 1s
m2 = m & df.groupby(g)['C'].transform('size').ge(5)
out = (df.loc[m2]
.assign(group=pd.factorize(g[m2])[0]+1)
)
Output:
A B C group
0 10 john 1 1
1 12 paul 1 1
2 23 kishan 1 1
3 12 teja 1 1
4 12 zebo 1 1
17 3434 coo 1 2
18 545 xoo 1 2
19 6565 zoo 1 2
20 232321 qoo 1 2
21 34454 woo 1 2
22 546556 eoo 1 2
You can groupby the difference of col C, get the cumsum where the difference does not equal zero and transform the size of each group so you can only keep the groups that are greater than or equal to five
df[df['C'].eq(1) & df.groupby(df['C'].diff().ne(0).cumsum()).transform('size').gt(4)]
A B C
0 10 john 1
1 12 paul 1
2 23 kishan 1
3 12 teja 1
4 12 zebo 1
17 3434 coo 1
18 545 xoo 1
19 6565 zoo 1
20 232321 qoo 1
21 34454 woo 1
22 546556 eoo 1
If you want a group column
# create the groups by calculating the diff and getting the cumsum
df['group'] = df['C'].diff().ne(0).cumsum()
# boolean indexing to keep values where C == 1 AND the size of each group is greater than 4
df[df['C'].eq(1) & df.groupby('group').transform('size').gt(4)]
A B C group
0 10 john 1 1
1 12 paul 1 1
2 23 kishan 1 1
3 12 teja 1 1
4 12 zebo 1 1
17 3434 coo 1 7
18 545 xoo 1 7
19 6565 zoo 1 7
20 232321 qoo 1 7
21 34454 woo 1 7
22 546556 eoo 1 7
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With