Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Most efficient way to bin a pandas series of date strings by day/month/year?

I find myself often doing something like the below, starting with a dataframe which has a column of dates in string format which I want to bin by some calendar unit (days, months, years etc.) I resort to something like the below because I know resample only works on a DateTimeIndex series.

Is there a more efficient way to bin by dates that is more concise in terms of code and/or faster to process?

    build = pd.to_datetime(df.date_build,'%m/%d/%y')
    build = pd.DatetimeIndex(build)
    final = build.to_series()
    one = final.resample('M',how='count')
like image 518
JL1515 Avatar asked Oct 15 '25 14:10

JL1515


1 Answers

You can groupby the time column directly:

df.groupby(pd.Grouper(key='date_build', freq='M')).size()

Only a little faster in terms of runtime - all those conversion operations are pretty fast.

In [198]: df = pd.DataFrame({'date_build': pd.date_range('1900-1-1', periods=100000)})

In [199]: %timeit pd.DatetimeIndex(df.date_build).to_series().resample('M', how='count')
10 loops, best of 3: 149 ms per loop

In [200]: %timeit df.groupby(pd.Grouper(key='date_build', freq='M')).size()
10 loops, best of 3: 136 ms per loop
like image 87
chrisb Avatar answered Oct 18 '25 03:10

chrisb



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!