I have the following dataframe:
event_id occurred_at user_id
19148 2015-10-01 1
19693 2015-10-05 2
20589 2015-10-12 1
20996 2015-10-15 1
20998 2015-10-15 1
23301 2015-10-23 2
23630 2015-10-26 1
25172 2015-11-03 1
31699 2015-12-11 1
32186 2015-12-14 2
43426 2016-01-13 1
68300 2016-04-04 2
71926 2016-04-19 1
I would like to rank the events in chronological order (1 to n) for each user.
I can achieve this with:
df.groupby('user_id')['occurred_at'].rank(method='dense')
However, the following two rows occurred on the same date (for the same user), so they end up with the same rank:
20996 2015-10-15 1
20998 2015-10-15 1
If the event date is the same, I would like to compare the event_id and arbitrarily give the lower rank to the event with the lowest event_id. How can I achieve this easily?
I can post-process the ranks to make sure every rank is used only once, but this seems pretty bulky...
Edit: how to reproduce:
Copy and paste the data into a data.csv file.
Then:
import pandas as pd
df = pd.read_csv('data.csv', delim_whitespace=True)
df['rank'] = df.groupby('user_id')['occurred_at'].rank(method='dense')
>>> df[df['user_id'] == 1]
event_id occurred_at user_id rank
0 19148 2015-10-01 1 1.0
2 20589 2015-10-12 1 2.0
3 20996 2015-10-15 1 3.0 <--
4 20998 2015-10-15 1 3.0 <--
6 23630 2015-10-26 1 4.0
7 25172 2015-11-03 1 5.0
8 31699 2015-12-11 1 6.0
10 43426 2016-01-13 1 7.0
12 71926 2016-04-19 1 8.0
I am using Python 3 and pandas 0.18.1.
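Call sort_values('event_id') prior to grouping, then pass method='first' to rank. With method='first', tied values are ranked in the order they appear, so after sorting by event_id the event with the lower event_id gets the lower rank.
Also note that if occurred_at isn't already datetime, you should convert it to datetime.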
# unnecessary if already datetime, but doesn't hurt to do it anyway
df.occurred_at = pd.to_datetime(df.occurred_at)
df['rank'] = df.sort_values('event_id') \
.groupby('user_id').occurred_at \
.rank(method='first')
df
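To see why method='first' matters here, this minimal sketch compares it with method='dense' on a small made-up Series (the values are illustrative only and are not taken from the question's data):

```python
import pandas as pd

# Hypothetical toy values: the value 3 appears twice, so it is a tie.
s = pd.Series([3, 1, 3, 2])

# 'dense': tied values share the same rank.
dense = s.rank(method='dense')

# 'first': tied values are ranked in the order they appear.
first = s.rank(method='first')

print(dense.tolist())
print(first.tolist())
```

Because ties are broken by position, sorting by event_id before ranking is what makes the lower event_id win.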
from io import StringIO
import pandas as pd
text = """event_id occurred_at user_id
19148 2015-10-01 1
19693 2015-10-05 2
20589 2015-10-12 1
20996 2015-10-15 1
20998 2015-10-15 1
23301 2015-10-23 2
23630 2015-10-26 1
25172 2015-11-03 1
31699 2015-12-11 1
32186 2015-12-14 2
43426 2016-01-13 1
68300 2016-04-04 2
71926 2016-04-19 1"""
df = pd.read_csv(StringIO(text), delim_whitespace=True)
df.occurred_at = pd.to_datetime(df.occurred_at)
df['rank'] = df.sort_values('event_id').groupby('user_id').occurred_at.rank(method='first')
df
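As a possible alternative (not part of the approach above), the same ordering can probably be obtained by sorting on both columns and numbering the rows within each user with cumcount. This is only a sketch: it uses a shuffled subset of the question's rows, and sep=r'\s+' is used in place of delim_whitespace, which newer pandas versions deprecate.

```python
from io import StringIO
import pandas as pd

# A subset of the question's rows, deliberately out of order.
text = """event_id occurred_at user_id
20998 2015-10-15 1
19148 2015-10-01 1
20996 2015-10-15 1
19693 2015-10-05 2
"""
df = pd.read_csv(StringIO(text), sep=r'\s+')
df['occurred_at'] = pd.to_datetime(df['occurred_at'])

# Sort by date, then by event_id so that ties on the date are broken by event_id.
ordered = df.sort_values(['occurred_at', 'event_id'])

# cumcount numbers the rows within each group starting at 0; the result is
# aligned back to df by index when assigned.
df['rank'] = ordered.groupby('user_id').cumcount() + 1

print(df)
```

Unlike rank, this yields integer ranks, and it does not depend on how rank treats ties.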