 

Data manipulation: expanding start_date/end_date ranges in Python pandas

Tags: python, pandas

I have a promotion description dataset with information about various promotions and their start and end dates:

promo         item   start_date   end_date
Buy1-get 1    A      2015-01-08   2015-01-12
Buy1-get 1    A      2015-02-16   2015-02-20
Buy1-40% off  B      2016-05-08   2016-05-09
Now I want to reorganise the data for subsequent analysis so that there is a single date variable, with the promo information attached to each day:

date          item   promo
2015-01-08    A      Buy1-get 1
2015-01-09    A      Buy1-get 1
2015-01-10    A      Buy1-get 1
2015-01-11    A      Buy1-get 1
2015-01-12    A      Buy1-get 1
2015-02-16    A      Buy1-get 1
2015-02-17    A      Buy1-get 1
2015-02-18    A      Buy1-get 1
2015-02-19    A      Buy1-get 1
...
2016-05-09    B      Buy1-40% off

Any help would be really appreciated.
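For reference, the sample data above can be reproduced as a DataFrame like this (a minimal sketch; column names are taken from the table, datetime dtypes assumed):

import pandas as pd

# Sample data from the table above (dates assumed to be datetime64)
df = pd.DataFrame({
    'promo': ['Buy1-get 1', 'Buy1-get 1', 'Buy1-40% off'],
    'item': ['A', 'A', 'B'],
    'start_date': pd.to_datetime(['2015-01-08', '2015-02-16', '2016-05-08']),
    'end_date': pd.to_datetime(['2015-01-12', '2015-02-20', '2016-05-09']),
})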

asked Nov 28 '25 by user2906657

2 Answers

You can concat all the Series created by date_range while iterating rows with itertuples, and then join the promo and item columns back on:

# One Series per promotion row: the date range becomes the index,
# the original row label becomes the value
df1 = (pd.concat([pd.Series(r.Index, pd.date_range(r.start_date, r.end_date))
                  for r in df.itertuples()])
         .reset_index())
df1.columns = ['date', 'idx']
# Use the saved row labels to pull item and promo back in
df1 = df1.set_index('idx')
df1 = df1.join(df[['item', 'promo']]).reset_index(drop=True)
print(df1)
         date item         promo
0  2015-01-08    A    Buy1-get 1
1  2015-01-09    A    Buy1-get 1
2  2015-01-10    A    Buy1-get 1
3  2015-01-11    A    Buy1-get 1
4  2015-01-12    A    Buy1-get 1
5  2015-02-16    A    Buy1-get 1
6  2015-02-17    A    Buy1-get 1
7  2015-02-18    A    Buy1-get 1
8  2015-02-19    A    Buy1-get 1
9  2015-02-20    A    Buy1-get 1
10 2016-05-08    B  Buy1-40% off
11 2016-05-09    B  Buy1-40% off
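To see what each piece contributes, here is roughly what the Series built from the first row looks like (a sketch, assuming the sample data above): the date range becomes the index, and the original row label 0 becomes the value, which is what the later join keys on.

r = next(df.itertuples())  # first promotion row
print(pd.Series(r.Index, pd.date_range(r.start_date, r.end_date)))
# 2015-01-08    0
# 2015-01-09    0
# 2015-01-10    0
# 2015-01-11    0
# 2015-01-12    0
# Freq: D, dtype: int64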

Another solution uses melt and groupby with resample:

# Reshape start/end dates into one column, keyed by the original row number
df1 = df.reset_index().rename(columns={'index': 'idx'})
df1 = (pd.melt(df1, id_vars='idx', value_vars=['start_date', 'end_date'],
               value_name='date')
         .set_index('date'))
# Resample each row's date span to daily frequency, then clean up
df1 = (df1.groupby('idx')
          .resample('d')
          .ffill()
          .reset_index(level=1)
          .drop(['idx', 'variable'], axis=1))
df1 = df1.join(df[['item', 'promo']]).reset_index(drop=True)
print(df1)
         date item         promo
0  2015-01-08    A    Buy1-get 1
1  2015-01-09    A    Buy1-get 1
2  2015-01-10    A    Buy1-get 1
3  2015-01-11    A    Buy1-get 1
4  2015-01-12    A    Buy1-get 1
5  2015-02-16    A    Buy1-get 1
6  2015-02-17    A    Buy1-get 1
7  2015-02-18    A    Buy1-get 1
8  2015-02-19    A    Buy1-get 1
9  2015-02-20    A    Buy1-get 1
10 2016-05-08    B  Buy1-40% off
11 2016-05-09    B  Buy1-40% off
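For intuition, the melted intermediate (before the resample step) looks roughly like this: start_date and end_date are stacked into a single date column, keyed by the original row number idx.

melted = pd.melt(df.reset_index().rename(columns={'index': 'idx'}),
                 id_vars='idx', value_vars=['start_date', 'end_date'],
                 value_name='date')
print(melted)
#    idx    variable       date
# 0    0  start_date 2015-01-08
# 1    1  start_date 2015-02-16
# 2    2  start_date 2016-05-08
# 3    0    end_date 2015-01-12
# 4    1    end_date 2015-02-20
# 5    2    end_date 2016-05-09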
answered Dec 01 '25 by jezrael


You could move the non-date columns onto the index axis, since they stay fixed while the date columns are expanded; this comes in handy during the grouping operation.

Specifying append=True keeps the index that the original DataFrame had before. Then stack, so that start_date and end_date fall below one another:

# MultiIndex: (original row number, promo, item, variable); values: the dates
mi_ser = df.set_index(['promo', 'item'], append=True).stack()
grouper = mi_ser.index.get_level_values(0)  # original row number
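For the sample data, mi_ser then looks roughly like this (a sketch): the MultiIndex carries the row number, promo and item, and the stacked start/end dates become the values.

print(mi_ser)
# 0  Buy1-get 1    A  start_date   2015-01-08
#                     end_date     2015-01-12
# 1  Buy1-get 1    A  start_date   2015-02-16
#                     end_date     2015-02-20
# 2  Buy1-40% off  B  start_date   2016-05-08
#                     end_date     2016-05-09
# dtype: datetime64[ns]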

Construct a new Series whose values and index are both taken from the values of the previous multi-index Series.

Perform a groupby on the first index level, resample to daily frequency, and choose any aggregation function compatible with datetime values (here, first is chosen):

s = pd.Series(mi_ser.values, mi_ser.values).groupby(grouper).resample('D').first()
idx, val = s.index.get_level_values(0), s.index.get_level_values(1)
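As a sketch of what this produces: for each original row, resample fills in every calendar day in the index, while the values for the newly created in-between days are NaT. That is fine, because only the index of s is used in the next step. For group 0, s looks roughly like:

print(s.loc[0])
# 2015-01-08   2015-01-08
# 2015-01-09          NaT
# 2015-01-10          NaT
# 2015-01-11          NaT
# 2015-01-12   2015-01-12
# Freq: D, dtype: datetime64[ns]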

Again, construct a new DataFrame, this time with its values and index as defined below (level 1 of the index of s becomes the date column, level 0 the row number). Since the indices are aligned, we can join the two DataFrames along them.

pd.DataFrame(val, idx, ['date']).join(df[['item', 'promo']]).reset_index(drop=True)

(The resulting DataFrame is the same as the output shown in the first answer.)

answered Dec 01 '25 by Nickil Maveli