I have a pandas dataframe which looks like this:
UNIT MACHINE
1 a100 001
2 a100 002
3 a100 003
4 a100 001
5 b222 001
6 b222 002
7 b222 002
8 b222 003
I would like to group it based on "UNIT" and drop the rows that do not have the [001, 002, 003] "MACHINE" sequence. That means because UNIT "a100" has the sequence [001, 002, 003, 001], it should be dropped, but the UNIT "b222" remains, because the sequence is right regardless of repetition of MACHINE 002.
The output should look like this:
UNIT MACHINE
5 b222 001
6 b222 002
7 b222 002
8 b222 003
The sequence [001, 002, 003] is one of the acceptable MACHINE sequences which I have written here as an example. There are several of such sequences, all of which are monotonically increasing.
How should I combine GroupBy and drop to perform this?
In [26]: chk_set = set(df.MACHINE.unique())
In [27]: df[df.groupby('UNIT')['MACHINE']
.transform(lambda x: x.is_monotonic_increasing & chk_set.issubset(set(x)))]
Out[27]:
UNIT MACHINE
5 b222 001
6 b222 002
7 b222 002
8 b222 003
UPDATE:
assuming you have the following DF:
In [90]: df
Out[90]:
UNIT MACHINE
1 a100 001
2 a100 002
3 a100 003
4 a100 001
5 b222 001
6 b222 002
7 b222 002
8 b222 003
9 c1 001
10 c1 003
11 c2 078
12 c2 079
13 c2 080
14 c3 078
16 c3 080
and an array of concatenated groups to check against:
In [91]: chk_groups = np.array(['001002003','078079080'])
Solution:
In [92]: df[df.groupby('UNIT')['MACHINE']
.transform(lambda x: x.is_monotonic_increasing
& np.in1d(x.unique().sum(),chk_groups))]
Out[92]:
UNIT MACHINE
5 b222 001
6 b222 002
7 b222 002
8 b222 003
11 c2 078
12 c2 079
13 c2 080
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With