Group Pandas dataframe by one column, drop rows based on another column

Question

I have a pandas dataframe which looks like this:

     UNIT        MACHINE
1    a100        001
2    a100        002
3    a100        003
4    a100        001
5    b222        001
6    b222        002
7    b222        002
8    b222        003

I would like to group it based on "UNIT" and drop the rows that do not have the [001, 002, 003] "MACHINE" sequence. That means because UNIT "a100" has the sequence [001, 002, 003, 001], it should be dropped, but the UNIT "b222" remains, because the sequence is right regardless of repetition of MACHINE 002.

The output should look like this:

     UNIT        MACHINE
5    b222        001
6    b222        002
7    b222        002
8    b222        003

The sequence [001, 002, 003] is one of the acceptable MACHINE sequences which I have written here as an example. There are several of such sequences, all of which are monotonically increasing.

How should I combine GroupBy and drop to perform this?

MaxU - stop WAR against UA · Accepted Answer

In [26]: chk_set = set(df.MACHINE.unique())

In [27]: df[df.groupby('UNIT')['MACHINE']
              .transform(lambda x: x.is_monotonic_increasing & chk_set.issubset(set(x)))]
Out[27]:
   UNIT MACHINE
5  b222     001
6  b222     002
7  b222     002
8  b222     003

UPDATE:

assuming you have the following DF:

In [90]: df
Out[90]:
    UNIT MACHINE
1   a100     001
2   a100     002
3   a100     003
4   a100     001
5   b222     001
6   b222     002
7   b222     002
8   b222     003
9     c1     001
10    c1     003
11    c2     078
12    c2     079
13    c2     080
14    c3     078
16    c3     080

and an array of concatenated groups to check against:

In [91]: chk_groups = np.array(['001002003','078079080'])

Solution:

In [92]: df[df.groupby('UNIT')['MACHINE']
              .transform(lambda x: x.is_monotonic_increasing
                                   & np.in1d(x.unique().sum(),chk_groups))]
Out[92]:
    UNIT MACHINE
5   b222     001
6   b222     002
7   b222     002
8   b222     003
11    c2     078
12    c2     079
13    c2     080

Group Pandas dataframe by one column, drop rows based on another column

Tags:

python

pandas

dataframe

group-by

pandas-groupby

sepideh

1 Answers

MaxU - stop WAR against UA

Recent Activity

Donate For Us

Group Pandas dataframe by one column, drop rows based on another column

Tags:

python

pandas

dataframe

group-by

pandas-groupby

sepideh

1 Answers

MaxU - stop WAR against UA

Related questions

Recent Activity

Donate For Us