These are my DataFrames:
import pandas as pd
df1 = pd.DataFrame(
{
'close': [100, 150, 200, 55, 69, 221, 2210, 111, 120, 140, 150, 170],
'date': [
'2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04',
'2024-01-05', '2024-01-06', '2024-01-07', '2024-01-08',
'2024-01-09', '2024-01-10', '2024-01-11', '2024-01-12',
]
}
)
df2 = pd.DataFrame(
{
'group': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
'close': [100, 105, 112, 117, 55, 65, 221, 211],
'date': [
'2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04',
'2024-01-05', '2024-01-06', '2024-01-07', '2024-01-08'
],
'extend': [
'2024-01-09', '2024-01-09', '2024-01-09', '2024-01-09',
'2024-01-11', '2024-01-11', '2024-01-11', '2024-01-11'
],
}
)
And this is the expected output. I want to extend df2
for each group in group
column:
group close date extend
0 a 100 2024-01-01 2024-01-09
1 a 105 2024-01-02 2024-01-09
2 a 112 2024-01-03 2024-01-09
3 a 117 2024-01-04 2024-01-09
4 a 69 2024-01-05 2024-01-09
5 a 221 2024-01-06 2024-01-09
6 a 2210 2024-01-07 2024-01-09
7 a 111 2024-01-08 2024-01-09
8 a 120 2024-01-09 2024-01-09
7 b 55 2024-01-05 2024-01-11
8 b 65 2024-01-06 2024-01-11
9 b 221 2024-01-07 2024-01-11
10 b 211 2024-01-08 2024-01-11
11 b 120 2024-01-09 2024-01-11
12 b 140 2024-01-10 2024-01-11
13 b 150 2024-01-11 2024-01-11
The logic is:
Each group in df2
has a fixed extend
date. This is the basically the date that each group should be extended using df1
.
For example for group a
, The data should be extended from 2024-01-04 to 2024-01-09. The start point of extending is basically df2.date.iloc[-1]
for each group and the end is the extend
.
This is my attempt that didn't work:
import janitor
def func(df2, df1):
df2['extend_start'] = df2.date.iloc[-1]
df2['extend_start'] = pd.to_datetime(df2.extend_start)
df3 = df2.conditional_join(
df1,
('extend_start', 'date', '<'),
('extend', 'date', '>')
)
return df3
df1['date'] = pd.to_datetime(df1.date)
df2['extend'] = pd.to_datetime(df2.extend)
out = df2.groupby('group').apply(func, df1=df1)
You could add the missing dates with groupby.apply
, then map
the unknown dates:
# ensure datetime
df1['date'] = pd.to_datetime(df1['date'])
df2[['date', 'extend']] = df2[['date', 'extend']].apply(pd.to_datetime)
# fill missing dates
# map missing one with values of df1
out = (df2.set_index('date').groupby('group')
.apply(lambda x: x.reindex(pd.date_range(min(x.index.min(), x['extend'].min()),
max(x.index.max(), x['extend'].max()),
)).rename_axis('date')
.drop(columns=['group'])
.assign(extend=lambda x: x['extend'].ffill(),)
)
.reset_index()
.assign(close=lambda x: x['close'].fillna(x['date'].map(df1.set_index('date')['close'])))
)
Output:
group date close extend
0 a 2024-01-01 100.0 2024-01-09
1 a 2024-01-02 105.0 2024-01-09
2 a 2024-01-03 112.0 2024-01-09
3 a 2024-01-04 117.0 2024-01-09
4 a 2024-01-05 69.0 2024-01-09
5 a 2024-01-06 221.0 2024-01-09
6 a 2024-01-07 2210.0 2024-01-09
7 a 2024-01-08 111.0 2024-01-09
8 a 2024-01-09 120.0 2024-01-09
9 b 2024-01-05 55.0 2024-01-11
10 b 2024-01-06 65.0 2024-01-11
11 b 2024-01-07 221.0 2024-01-11
12 b 2024-01-08 211.0 2024-01-11
13 b 2024-01-09 120.0 2024-01-11
14 b 2024-01-10 140.0 2024-01-11
15 b 2024-01-11 150.0 2024-01-11
Intermediate before the map
step:
group date close extend
0 a 2024-01-01 100.0 2024-01-09
1 a 2024-01-02 105.0 2024-01-09
2 a 2024-01-03 112.0 2024-01-09
3 a 2024-01-04 117.0 2024-01-09
4 a 2024-01-05 NaN 2024-01-09
5 a 2024-01-06 NaN 2024-01-09
6 a 2024-01-07 NaN 2024-01-09
7 a 2024-01-08 NaN 2024-01-09
8 a 2024-01-09 NaN 2024-01-09
9 b 2024-01-05 55.0 2024-01-11
10 b 2024-01-06 65.0 2024-01-11
11 b 2024-01-07 221.0 2024-01-11
12 b 2024-01-08 211.0 2024-01-11
13 b 2024-01-09 NaN 2024-01-11
14 b 2024-01-10 NaN 2024-01-11
15 b 2024-01-11 NaN 2024-01-11
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With