Removing rows that don't fit a repeating sequence in a pandas DataFrame

Question

I have a pandas dataframe that looks like this:

    A   B   C   D
0   1   2   3   0
1   4   5   6   1
2   7   8   9   2
3   10  10  10  0
4   10  10  10  1
5   1   2   3   0
6   4   5   6   1
7   7   8   8   2

I would like to remove every group of rows whose values in column 'D' do not follow the sequence 0, 1, 2, in that exact order.

The new dataframe I would like to obtain should look like this:

    A   B   C   D
0   1   2   3   0
1   4   5   6   1
2   7   8   9   2
3   1   2   3   0
4   4   5   6   1
5   7   8   8   2

Rows 3 and 4 of the original dataframe are dropped because, after them, row 5 does not have 2 in column 'D', so the sequence is incomplete.

PaulS · Accepted Answer

A possible solution based on numpy:

import numpy as np

w = np.lib.stride_tricks.sliding_window_view(df['D'], 3) # rolling windows of 3 consecutive values
idx = np.flatnonzero((w == (0,1,2)).all(1)) # starting indexes of seq 0, 1, 2
df.iloc[(idx[:, None] + np.arange(3)).ravel()].reset_index(drop=True) # select the matching triplets

This uses numpy's sliding_window_view to create a rolling 3-element view over the D column. Each window is compared element-wise with (0,1,2), and all along axis 1 keeps only the windows that match in every position; flatnonzero then returns the starting positions of those windows. Broadcasting expands each starting position into a full triplet of row positions, iloc selects those rows from the dataframe, and reset_index(drop=True) renumbers the result from 0.

Output:

   A  B  C  D
0  1  2  3  0
1  4  5  6  1
2  7  8  9  2
3  1  2  3  0
4  4  5  6  1
5  7  8  8  2

Intermediates:

# w == (0,1,2)

array([[ True,  True,  True],
       [False, False, False],
       [False, False, False],
       [ True,  True, False],
       [False, False, False],
       [ True,  True,  True]])

# idx[:, None]

array([[0],
       [5]])

# + np.arange(3)

array([[0, 1, 2],
       [5, 6, 7]])

# .ravel()

array([0, 1, 2, 5, 6, 7])
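
For anyone who wants to reproduce the intermediates above, here is a minimal self-contained sketch. The dataframe is rebuilt by hand from the table in the question, and the names rows and result are only illustrative; calling .to_numpy() on the column is my own choice to make the input type explicit, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "A": [1, 4, 7, 10, 10, 1, 4, 7],
    "B": [2, 5, 8, 10, 10, 2, 5, 8],
    "C": [3, 6, 9, 10, 10, 3, 6, 8],
    "D": [0, 1, 2, 0, 1, 0, 1, 2],
})

# Rolling 3-element windows over column D (one row per window)
w = np.lib.stride_tricks.sliding_window_view(df["D"].to_numpy(), 3)

# Starting positions of the windows that equal (0, 1, 2) in every slot
idx = np.flatnonzero((w == (0, 1, 2)).all(axis=1))

# Expand each starting position into the three row positions it covers
rows = (idx[:, None] + np.arange(3)).ravel()

# Select those rows by position and renumber the index from 0
result = df.iloc[rows].reset_index(drop=True)
print(result)
```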

To make this solution more general, define:

seq = (0,1,2)
n = len(seq)

then replace the hard-coded values in the code above:

.sliding_window_view(..., n)
w == seq
np.arange(n)

(thanks @wjandrea)
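
The generalisation could be wrapped in a small helper, sketched below. The function name keep_sequence_rows and its parameters are hypothetical (they do not come from the answer), and the guard for inputs shorter than the sequence is an addition of mine, since sliding_window_view raises an error when the window is larger than the array. One caveat, as far as I can tell: if the sequence can overlap itself (for example (0, 0)), the same row may be selected more than once.

```python
import numpy as np
import pandas as pd


def keep_sequence_rows(df, column, seq):
    """Keep only the rows whose `column` values form complete runs of `seq`.

    Hypothetical helper: name and signature are illustrative only.
    """
    seq = np.asarray(seq)
    n = len(seq)
    values = df[column].to_numpy()

    # Guard (my addition): no window can fit, so nothing matches
    if n == 0 or len(values) < n:
        return df.iloc[0:0].reset_index(drop=True)

    # Rolling n-element windows, then the start positions of full matches
    w = np.lib.stride_tricks.sliding_window_view(values, n)
    idx = np.flatnonzero((w == seq).all(axis=1))

    # Expand each start position into the n row positions it covers
    rows = (idx[:, None] + np.arange(n)).ravel()
    return df.iloc[rows].reset_index(drop=True)


# Usage with the D column from the question; column x just tracks original positions
sample = pd.DataFrame({"D": [0, 1, 2, 0, 1, 0, 1, 2], "x": range(8)})
filtered = keep_sequence_rows(sample, "D", (0, 1, 2))
print(filtered)
```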

Tags:

python

pandas

dataframe

AjWinston
