I have a pandas dataframe that looks like this:
    A   B   C  D
0   1   2   3  0
1   4   5   6  1
2   7   8   9  2
3  10  10  10  0
4  10  10  10  1
5   1   2   3  0
6   4   5   6  1
7   7   8   8  2
I would like to remove every group of rows whose values in column 'D' do not form the sequence 0, 1, 2 in this exact order.
The new dataframe I would like to obtain should look like this:
   A  B  C  D
0  1  2  3  0
1  4  5  6  1
2  7  8  9  2
3  1  2  3  0
4  4  5  6  1
5  7  8  8  2
Rows 3 and 4 of the original dataframe are dropped because the row that follows them (row 5) does not have 2 in column 'D'.
A possible solution based on numpy:
import numpy as np
w = np.lib.stride_tricks.sliding_window_view(df['D'], 3) # rolling 3-element windows over D
idx = np.flatnonzero((w == (0,1,2)).all(1)) # starting indexes of seq 0, 1, 2
df.iloc[(idx[:, None] + np.arange(3)).ravel()].reset_index(drop=True) # rows of each matching triplet
This uses numpy’s sliding_window_view to create a rolling 3-element view over the D column, then checks which windows match the sequence (0,1,2) by comparing element-wise and applying all along axis 1; the indices of the matching windows are obtained with flatnonzero. These starting indices are then expanded into full triplets with broadcasting, and the corresponding rows are selected from the dataframe using iloc, before reindexing cleanly with reset_index.
Output:
   A  B  C  D
0  1  2  3  0
1  4  5  6  1
2  7  8  9  2
3  1  2  3  0
4  4  5  6  1
5  7  8  8  2
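To check this end to end, here is a self-contained sketch that rebuilds the sample dataframe from the question and runs the same three steps. The name `out` is mine, and calling `.to_numpy()` on the column before windowing is my own precaution rather than part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    "A": [1, 4, 7, 10, 10, 1, 4, 7],
    "B": [2, 5, 8, 10, 10, 2, 5, 8],
    "C": [3, 6, 9, 10, 10, 3, 6, 8],
    "D": [0, 1, 2, 0, 1, 0, 1, 2],
})

# Rolling 3-element windows over D, shape (len(df) - 2, 3)
w = np.lib.stride_tricks.sliding_window_view(df["D"].to_numpy(), 3)

# Start positions of the windows equal to 0, 1, 2
idx = np.flatnonzero((w == (0, 1, 2)).all(axis=1))

# Expand each start into its three row positions and select those rows
out = df.iloc[(idx[:, None] + np.arange(3)).ravel()].reset_index(drop=True)
print(out)
```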
Intermediates:
# w == (0,1,2)
array([[ True,  True,  True],
       [False, False, False],
       [False, False, False],
       [ True,  True, False],
       [False, False, False],
       [ True,  True,  True]])
# idx[:, None]
array([[0],
       [5]])
# + np.arange(3)
array([[0, 1, 2],
       [5, 6, 7]])
# .ravel()
array([0, 1, 2, 5, 6, 7])
To generalize to any sequence, define:
seq = (0,1,2)
n = len(seq)
then use them in place of the hard-coded values:
.sliding_window_view(..., n)
w == seq
np.arange(n)
(thanks @wjandrea)
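The generalization can be wrapped in a small helper. This is only a sketch: the function name `keep_sequence_runs` and its parameters are hypothetical, and the guard for short inputs and the `np.unique` call are my own additions. The `np.unique` call keeps a row from being selected twice when matches overlap, which cannot happen with (0,1,2) but can with a sequence such as (0,0).

```python
import numpy as np
import pandas as pd


def keep_sequence_runs(df, column, seq):
    """Keep only the rows that belong to a complete run of `seq` in `column`.

    Hypothetical helper, not part of pandas or numpy.
    """
    seq = np.asarray(seq)
    n = len(seq)
    values = df[column].to_numpy()

    # No window of length n fits, so nothing can match
    if n == 0 or len(values) < n:
        return df.iloc[0:0].reset_index(drop=True)

    # Rolling n-element windows over the column
    w = np.lib.stride_tricks.sliding_window_view(values, n)

    # Start positions of the windows equal to seq
    idx = np.flatnonzero((w == seq).all(axis=1))

    # Expand starts into row positions; np.unique flattens, sorts,
    # and drops duplicates caused by overlapping matches
    rows = np.unique(idx[:, None] + np.arange(n))
    return df.iloc[rows].reset_index(drop=True)
```

Calling `keep_sequence_runs(df, "D", (0, 1, 2))` on the dataframe from the question should return the same six rows as the output shown earlier.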