I want to create a pandas dataframe from a given start and end date:
import pandas as pd
from pandas.tseries.offsets import MonthEnd
start_date = "2020-05-17"
end_date = "2020-07-23"
For each row in this dataframe, I should have the start day and end day of the month, so the expected output is:
start end month year
2020-05-17 2020-05-31 May 2020
2020-06-01 2020-06-30 June 2020
2020-07-01 2020-07-23 July 2020
I know I have to loop over each month between the interval created by start_date and end_date. While I know how to extract the last day in a date:
def last_day(date: str):
return pd.Timestamp(date) + MonthEnd(1)
I'm stuck over how to run this over the interval. Any suggestion will be appreciated.
You can use pd.date_range and pd.to_datetime:
start = pd.to_datetime([start_date] + pd.date_range(start_date, end_date, freq='MS').tolist())
end = pd.to_datetime(pd.date_range(start_date, end_date, freq='M').tolist() + [end_date])
month = start.strftime('%B')
year = start.year
df = pd.DataFrame({'start': start, 'end': end, 'month': month, 'year': year})
Output:
>>> df
start end month year
0 2020-05-17 2020-05-31 May 2020
1 2020-06-01 2020-06-30 June 2020
2 2020-07-01 2020-07-23 July 2020
You can transform the output from .isocalendar():
r = pd.date_range(start_date, end_date, freq="D").isocalendar()
out = (
r.assign(month=r.index.month)
.reset_index()
.groupby(["year", "month"])["index"]
.agg(("first", "last"))
.reset_index()
)
print(out)
Prints:
year month first last
0 2020 5 2020-05-17 2020-05-31
1 2020 6 2020-06-01 2020-06-30
2 2020 7 2020-07-01 2020-07-23
To have string month names:
out = out.rename(columns={'first':'start', 'last':'end'})
out['month'] = pd.to_datetime(out['month'], format='%m').dt.strftime('%b')
print(out)
Prints:
year month start end
0 2020 May 2020-05-17 2020-05-31
1 2020 Jun 2020-06-01 2020-06-30
2 2020 Jul 2020-07-01 2020-07-23
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With