I have two data frames like this:
import pandas as pd

df1 = pd.DataFrame({'period': ['2021-02'], 'customer': ['A'], 'product': ['Apple'], 'sales_flag': ['Yes']})
df2 = pd.DataFrame({'period': ['2021-03', '2021-04', '2021-04'], 'customer': ['A', 'A', 'A'], 'product': ['Banana', 'Apple', 'Tangerine'], 'feedback_flag': ['Yes', 'Yes', 'No']})
I want to join them so that the customer gets a row for every period and every product, like this:
period customer product sales_flag feedback_flag
'2021-02' A Apple Yes NULL
'2021-02' A Banana NULL NULL
'2021-02' A Tangerine NULL NULL
'2021-03' A Apple NULL NULL
'2021-03' A Banana NULL Yes
'2021-03' A Tangerine NULL NULL
'2021-04' A Apple NULL Yes
'2021-04' A Banana NULL NULL
'2021-04' A Tangerine NULL No
This is my code, but it didn't work: it only returns the period/product combinations that actually appear in df1 or df2.
df3 = df1.merge(df2, on = ['period', 'customer', 'product'], how = 'outer')
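For reference, here is a quick check of what that outer merge returns with the sample data. It is a self-contained sketch that rebuilds df1 and df2 from above: an outer merge only unions the key combinations that already exist in either frame, so pairs such as ('2021-02', 'Banana') never show up.

```python
import pandas as pd

df1 = pd.DataFrame({'period': ['2021-02'], 'customer': ['A'],
                    'product': ['Apple'], 'sales_flag': ['Yes']})
df2 = pd.DataFrame({'period': ['2021-03', '2021-04', '2021-04'],
                    'customer': ['A', 'A', 'A'],
                    'product': ['Banana', 'Apple', 'Tangerine'],
                    'feedback_flag': ['Yes', 'Yes', 'No']})

# The outer merge keeps the union of existing (period, customer, product) keys;
# it does not create combinations that are absent from both frames.
merged = df1.merge(df2, on=['period', 'customer', 'product'], how='outer')
print(merged)
print(len(merged))  # 4 rows, not the 9 wanted
```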
Do you have any idea how to make it work?
Try an outer merge followed by a groupby apply that reindexes each (period, customer) group on the unique products, then ffill + bfill to fill in period and customer on the newly added rows:
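def reindex_group(g, idx):
    # Make 'product' the index and add a row for every product in idx
    g = g.set_index('product').reindex(idx)
    # The added rows have NaN period/customer; copy them from the existing rows
    g[['period', 'customer']] = g[['period', 'customer']].ffill().bfill()
    return g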
df3 = df1.merge(df2, on=['period', 'customer', 'product'], how='outer')
products = df3['product'].unique()  # every product seen in either frame
df3 = (
    df3.groupby(['period', 'customer'], as_index=False)
    .apply(reindex_group, idx=products)
    .reset_index()
    .drop(columns='level_0')  # drop the group-number level added by apply
)[['period', 'customer', 'product', 'sales_flag', 'feedback_flag']]
df3:
period customer product sales_flag feedback_flag
0 2021-02 A Apple Yes NaN
1 2021-02 A Banana NaN NaN
2 2021-02 A Tangerine NaN NaN
3 2021-03 A Apple NaN NaN
4 2021-03 A Banana NaN Yes
5 2021-03 A Tangerine NaN NaN
6 2021-04 A Apple NaN Yes
7 2021-04 A Banana NaN NaN
8 2021-04 A Tangerine NaN No
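A possible alternative, sketched here as a different technique rather than the approach above: build the full grid of keys with pd.MultiIndex.from_product and reindex the merged frame against it. This assumes every customer should get a row for every period and every product that appears anywhere in the data.

```python
import pandas as pd

df1 = pd.DataFrame({'period': ['2021-02'], 'customer': ['A'],
                    'product': ['Apple'], 'sales_flag': ['Yes']})
df2 = pd.DataFrame({'period': ['2021-03', '2021-04', '2021-04'],
                    'customer': ['A', 'A', 'A'],
                    'product': ['Banana', 'Apple', 'Tangerine'],
                    'feedback_flag': ['Yes', 'Yes', 'No']})

keys = ['period', 'customer', 'product']
merged = df1.merge(df2, on=keys, how='outer')

# Cartesian product of every period, customer and product seen in the data.
# Assumption: every customer should appear in every period with every product.
full_index = pd.MultiIndex.from_product(
    [merged[k].unique() for k in keys], names=keys
)

df3 = (
    merged.set_index(keys)
    .reindex(full_index)  # combinations missing from merged become NaN rows
    .sort_index()         # order by period, then customer, then product
    .reset_index()
)
print(df3)
```

Unlike the groupby version, this grid treats periods and customers independently, so with several customers it would also add period/customer pairs that never occur in the data; keep the groupby approach if that is not wanted. It does avoid groupby.apply, whose handling of the grouping columns has changed across pandas versions.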