Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas number of unique users grouped by year-week

I have a report_date column that I want to aggregate by its year-week and count the number of unique users.

import pandas as pd
from io import StringIO

datastring = StringIO("""\
report_date  user_id
2015-12-01         1
2015-12-01         2
2015-12-01         2
2015-12-02         2
2015-12-02         3
2016-01-01         1
""")

df = pd.read_table(datastring, sep='\s\s+', engine='python')
df['report_date'] = pd.to_datetime(df['report_date'])

Output I want:

2015-48    3
2016-00    1

I have come up with a solution (posted below) but it's relatively slow when using a much larger dataset (> 1MM rows). Curious if there's a better solution to this problem.

like image 902
Kamil Sindi Avatar asked Sep 06 '25 03:09

Kamil Sindi


2 Answers

As your column is already datetime there is no need to convert to string and groupby on the string, we can group on the year and week component and then just call nunique:

In [108]:
df.groupby([df['report_date'].dt.year, df['report_date'].dt.week])['user_id'].nunique()

Out[108]:
report_date  report_date
2015         49             3
2016         53             1
Name: user_id, dtype: int64
like image 55
EdChum Avatar answered Sep 07 '25 19:09

EdChum


(df.assign(report_week=lambda x: x.report_date.dt.strftime('%Y-%W'))
  .groupby('report_week')
  .user_id
  .nunique()
)

Edit

I ended up modifying @EdChum's suggestion to get rid of the cases like '2016-53' when report_date is 2016-01-01 by grouping by the week number modulo 53:

df.groupby([df.report_date.dt.year, df.report_date.dt.week.mod(53)]).user_id.nunique()
like image 41
Kamil Sindi Avatar answered Sep 07 '25 19:09

Kamil Sindi