There is a data frame with a model and an item as a column below
df = pd.DataFrame({'model':['A','A','A','A','A','A','A','B','B','B','B','B','B','B'],
'item':['aa','ab','ab','ab','ac','ad','ad','ba','ba','ba','bb','bb','bb','bc']})
I want to add a serial column to this data frame, but there are some rules
The serial number is reset when the model(A, B) is changed and starts from zero.
Serial cannot exceed 3.
0, 1, 2 values only
After two(2), it starts at zero(0)
In the case of the same item, serial is be the same
what I want is
You want pd.factorize
on item
within each model
group (groupby
). The reset part is just a modulo away:
df['serial'] = df.groupby(['model'])['item'].transform(lambda x: pd.factorize(x)[0]) % 3
Output:
model item serial
0 A aa 0
1 A ab 1
2 A ab 1
3 A ab 1
4 A ac 2
5 A ad 0
6 A ad 0
7 B ba 0
8 B ba 0
9 B ba 0
10 B bb 1
11 B bb 1
12 B bb 1
13 B bc 2
Compare each item with the previous and apply a cumulative sum. The rest is a math operation to adjust to your expected output (count starts from 0 and not exceed 3)
to_serial = lambda x: x.ne(x.shift()).cumsum().sub(1).mod(3)
df['serial'] = df.groupby('model')['item'].transform(to_serial)
print(df)
# Output
model item serial
0 A aa 0
1 A ab 1
2 A ab 1
3 A ab 1
4 A ac 2
5 A ad 0
6 A ad 0
7 B ba 0
8 B ba 0
9 B ba 0
10 B bb 1
11 B bb 1
12 B bb 1
13 B bc 2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With