I find myself often doing something like the below, starting with a dataframe which has a column of dates in string format which I want to bin by some calendar unit (days, months, years etc.) I resort to something like the below because I know resample
only works on a DateTimeIndex series.
Is there a more efficient way to bin by dates that is more concise in terms of code and/or faster to process?
build = pd.to_datetime(df.date_build,'%m/%d/%y')
build = pd.DatetimeIndex(build)
final = build.to_series()
one = final.resample('M',how='count')
You can groupby the time column directly:
df.groupby(pd.Grouper(key='date_build', freq='M')).size()
Only a little faster in terms of runtime - all those conversion operations are pretty fast.
In [198]: df = pd.DataFrame({'date_build': pd.date_range('1900-1-1', periods=100000)})
In [199]: %timeit pd.DatetimeIndex(df.date_build).to_series().resample('M', how='count')
10 loops, best of 3: 149 ms per loop
In [200]: %timeit df.groupby(pd.Grouper(key='date_build', freq='M')).size()
10 loops, best of 3: 136 ms per loop
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With