I have a report_date column that I want to aggregate by its year-week and count the number of unique users.
import pandas as pd
from io import StringIO
datastring = StringIO("""\
report_date user_id
2015-12-01 1
2015-12-01 2
2015-12-01 2
2015-12-02 2
2015-12-02 3
2016-01-01 1
""")
df = pd.read_table(datastring, sep=r'\s+', engine='python')
df['report_date'] = pd.to_datetime(df['report_date'])
Output I want:
2015-48 3
2016-00 1
I have come up with a solution (posted below) but it's relatively slow when using a much larger dataset (> 1MM rows). Curious if there's a better solution to this problem.
Since your column is already a datetime there is no need to convert it to a string and group on that; we can group on the year and week components directly and then just call nunique:
In [108]:
df.groupby([df['report_date'].dt.year, df['report_date'].dt.week])['user_id'].nunique()
Out[108]:
report_date report_date
2015 49 3
2016 53 1
Name: user_id, dtype: int64
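Note that in recent pandas versions `Series.dt.week` has been removed in favour of `Series.dt.isocalendar().week`. A sketch of the same grouping on a current version, with the sample data inlined:

```python
import pandas as pd

df = pd.DataFrame({
    'report_date': pd.to_datetime(
        ['2015-12-01', '2015-12-01', '2015-12-01',
         '2015-12-02', '2015-12-02', '2016-01-01']),
    'user_id': [1, 2, 2, 2, 3, 1],
})

# isocalendar() returns a DataFrame of ISO year/week/day; pairing its
# week with the calendar year reproduces the (2015, 49) / (2016, 53) keys
week = df['report_date'].dt.isocalendar().week
result = df.groupby([df['report_date'].dt.year, week])['user_id'].nunique()
print(result)
```

As in the answer above, 2016-01-01 falls in ISO week 53 (of 2015), which is exactly the edge case the follow-up below works around.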
Alternatively, build the year-week string with strftime and group on that:
(df.assign(report_week=lambda x: x.report_date.dt.strftime('%Y-%W'))
.groupby('report_week')
.user_id
.nunique()
)
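For the sample data this string-based grouping produces exactly the `2015-48` / `2016-00` keys the question asks for, because `%W` numbers weeks from the first Monday of the year (days before it fall in week 00). A self-contained sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'report_date': pd.to_datetime(
        ['2015-12-01', '2015-12-01', '2015-12-01',
         '2015-12-02', '2015-12-02', '2016-01-01']),
    'user_id': [1, 2, 2, 2, 3, 1],
})

# '%Y-%W' gives '2015-48' for the December rows and '2016-00' for
# 2016-01-01, since Jan 1-3 2016 precede the year's first Monday
result = (df.assign(report_week=lambda x: x.report_date.dt.strftime('%Y-%W'))
            .groupby('report_week')
            .user_id
            .nunique())
print(result)
```

A trade-off worth noting: the string formatting is typically slower than grouping on the numeric year/week components, but the keys sort lexicographically and match the requested output format.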
I ended up modifying @EdChum's suggestion to get rid of cases like '2016-53' when report_date is 2016-01-01, by taking the week number modulo 53:
df.groupby([df.report_date.dt.year, df.report_date.dt.week.mod(53)]).user_id.nunique()
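A runnable sketch of that modulo trick, using `isocalendar().week` in place of the removed `dt.week` (an assumption for current pandas versions): week 53 wraps to 0, so 2016-01-01 lands under (2016, 0) rather than (2016, 53).

```python
import pandas as pd

df = pd.DataFrame({
    'report_date': pd.to_datetime(
        ['2015-12-01', '2015-12-01', '2015-12-01',
         '2015-12-02', '2015-12-02', '2016-01-01']),
    'user_id': [1, 2, 2, 2, 3, 1],
})

# 53 % 53 == 0, so the ISO week-53 spillover into the new calendar
# year is folded into a week-0 bucket for that year
week = df.report_date.dt.isocalendar().week.mod(53)
result = df.groupby([df.report_date.dt.year, week])['user_id'].nunique()
print(result)
```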