I have a promotion description dataset with the information about various promotions running and their start date-end date:
promo          item  start_date  end_date
Buy1-get 1     A     2015-01-08  2015-01-12
Buy1-get 1     A     2015-02-16  2015-02-20
Buy1-40% off   B     2016-05-08  2016-05-09
Now I want to reorganise my data for subsequent analysis so that I have a single date variable with the promo information for each day:
date        item  Promo
2015-01-08  A     Buy1-get 1
2015-01-09  A     Buy1-get 1
2015-01-10  A     ......
2015-01-11  ....
2015-01-12
2015-02-16  A     Buy1-get 1
2015-02-17  A     Buy1-get 1
2015-02-18  ....  .......
2015-02-19  .....
..........
2016-05-09  B     Buy1-40% off
Any help would be really appreciated.
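For reproducibility, the sample data above can be loaded into a pandas DataFrame (the answers below refer to it as df):

```python
import pandas as pd

# Promotion table from the question, with the date columns parsed as datetimes
df = pd.DataFrame({
    'promo': ['Buy1-get 1', 'Buy1-get 1', 'Buy1-40% off'],
    'item': ['A', 'A', 'B'],
    'start_date': pd.to_datetime(['2015-01-08', '2015-02-16', '2016-05-08']),
    'end_date': pd.to_datetime(['2015-01-12', '2015-02-20', '2016-05-09']),
})
print(df)
```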
You can use concat on the Series created by date_range for each row from itertuples, and then join the promo and item columns:
df1 = pd.concat([pd.Series(r.Index, pd.date_range(r.start_date, r.end_date))
                 for r in df.itertuples()]).reset_index()
df1.columns = ['date', 'idx']
df1 = df1.set_index('idx')
df1 = df1.join(df[['item', 'promo']]).reset_index(drop=True)
print(df1)
date item promo
0 2015-01-08 A Buy1-get 1
1 2015-01-09 A Buy1-get 1
2 2015-01-10 A Buy1-get 1
3 2015-01-11 A Buy1-get 1
4 2015-01-12 A Buy1-get 1
5 2015-02-16 A Buy1-get 1
6 2015-02-17 A Buy1-get 1
7 2015-02-18 A Buy1-get 1
8 2015-02-19 A Buy1-get 1
9 2015-02-20 A Buy1-get 1
10 2016-05-08 B Buy1-40% off
11 2016-05-09 B Buy1-40% off
Another solution uses melt, then groupby with resample:
df1 = df.reset_index().rename(columns={'index': 'idx'})
df1 = pd.melt(df1, id_vars='idx', value_vars=['start_date', 'end_date'],
              value_name='date').set_index('date')
df1 = (df1.groupby('idx')
          .resample('d')
          .ffill()
          .reset_index(level=1)
          .drop(['idx', 'variable'], axis=1))
df1 = df1.join(df[['item','promo']]).reset_index(drop=True)
print (df1)
date item promo
0 2015-01-08 A Buy1-get 1
1 2015-01-09 A Buy1-get 1
2 2015-01-10 A Buy1-get 1
3 2015-01-11 A Buy1-get 1
4 2015-01-12 A Buy1-get 1
5 2015-02-16 A Buy1-get 1
6 2015-02-17 A Buy1-get 1
7 2015-02-18 A Buy1-get 1
8 2015-02-19 A Buy1-get 1
9 2015-02-20 A Buy1-get 1
10 2016-05-08 B Buy1-40% off
11 2016-05-09 B Buy1-40% off
You could move the non-date columns into the index, since they do not vary with any changes made to the date columns; this comes in handy during the grouping operation. Specifying append=True keeps track of the index the original DataFrame had before. Then stack so that start_date and end_date fall below one another:
mi_ser = df.set_index(['promo', 'item'], append=True).stack()
grouper = mi_ser.index.get_level_values(0)
Construct a new Series whose values and index are both taken from the values of the multi-index Series above. Group it by the first index level and resample at daily frequency, using any aggregation function compatible with datetime values (here, first is chosen):
s = pd.Series(mi_ser.values, mi_ser.values).groupby(grouper).resample('D').first()
idx, val = s.index.get_level_values(0), s.index.get_level_values(1)
Finally, construct a new DataFrame with the values and index defined below. Since the indices are aligned, we can join the two DataFrames along them:
pd.DataFrame(val, idx, ['date']).join(df[['item', 'promo']]).reset_index(drop=True)
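On newer pandas versions (0.25 or later), the same reshape can also be sketched with DataFrame.explode: build a list of daily dates per promotion row, then expand to one row per date. This is an alternative not shown in the answers above, under the assumption that df holds the sample data (reconstructed inline here so the snippet runs on its own):

```python
import pandas as pd

# Sample promotions table from the question
df = pd.DataFrame({
    'promo': ['Buy1-get 1', 'Buy1-get 1', 'Buy1-40% off'],
    'item': ['A', 'A', 'B'],
    'start_date': pd.to_datetime(['2015-01-08', '2015-02-16', '2016-05-08']),
    'end_date': pd.to_datetime(['2015-01-12', '2015-02-20', '2016-05-09']),
})

# One list of daily dates per row, then explode to one row per date
df1 = (df.assign(date=[list(pd.date_range(s, e))
                       for s, e in zip(df['start_date'], df['end_date'])])
         .explode('date')
         .loc[:, ['date', 'item', 'promo']]
         .reset_index(drop=True))
print(df1)
```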
