import pandas as pd
import numpy as np
one = pd.read_csv('data1.csv')
two = pd.read_csv('data2.csv')
I wrote the code above, and one shows
A Date
10 2011-01-03
20 2011-01-04
10 2011-01-06
20 2011-01-07
30 2011-01-10
40 2011-01-13
25 2011-01-15
・
・
・
two shows
B Date
15 2011-01-01
15 2011-01-02
15 2011-01-03
25 2011-01-07
35 2011-01-10
10 2011-01-13
25 2011-01-15
・
・
・
I want to put 0 in for the missing dates when the data frames are merged. Right now my code is
one_and_two = pd.merge(one, two, on='Date', how='inner')
print(one_and_two)
and when I run it, one_and_two is
A Date B
0 10 2011-01-03 15
1 20 2011-01-07 25
2 30 2011-01-10 35
3 40 2011-01-13 10
4 25 2011-01-15 25
・
・
・
The ideal output is
A Date B
0 0 2011-01-01 15
1 0 2011-01-02 15
2 10 2011-01-03 15
3 20 2011-01-04 0
4 0 2011-01-05 0
5 10 2011-01-06 0
6 20 2011-01-07 25
7 0 2011-01-08 0
8 0 2011-01-09 0
9 30 2011-01-10 35
・
・
・
The data frames cover 2011-01-01 to 2011-12-31. I want to put 0 in for the missing dates, but how can I do it? What is wrong with my code?
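Your inner join keeps only the dates that appear in both data frames, so every other date is dropped. Use an outer join, then reindex by a defined date range (this assumes the Date column holds datetimes, e.g. read with parse_dates=['Date']):
df = (pd.merge(one, two, on='Date', how='outer')
        .fillna(0)
        .sort_values('Date')
        .set_index('Date'))
df = (df.reindex(pd.date_range('2011-01-01', '2011-12-31', name='Date'), fill_value=0)
        .reset_index()
        .reindex(columns=['A','Date','B']))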
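Reindexing by a date range only lines up with the existing rows if Date is a real datetime column. Since the question reads the files with a plain pd.read_csv, the column may still hold strings, so here is a minimal sketch of converting it first. The in-memory CSV text and the names csv_one, full_range and filled are assumptions made for illustration; they stand in for data1.csv, which is not shown in full.

```python
import io
import pandas as pd

# Hypothetical stand-in for data1.csv (only two rows, for illustration).
csv_one = io.StringIO("A,Date\n10,2011-01-03\n20,2011-01-04\n")

one = pd.read_csv(csv_one)  # Date is read as text here, not as datetime

# Convert the text column to datetime64 so it can match a DatetimeIndex.
one["Date"] = pd.to_datetime(one["Date"])

# Short illustrative range; the question would use 2011-01-01 to 2011-12-31.
full_range = pd.date_range("2011-01-01", "2011-01-05", name="Date")

filled = (one.set_index("Date")
             .reindex(full_range, fill_value=0)
             .reset_index())
print(filled)
```

Passing parse_dates=['Date'] to pd.read_csv should have the same effect as the pd.to_datetime call.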
Or by minimal and maximal dates:
df = (df.reindex(pd.date_range(df.index.min(), df.index.max(), name='Date'), fill_value=0)
        .reset_index()
        .reindex(columns=['A','Date','B']))
print(df)
A Date B
0 0.0 2011-01-01 15.0
1 0.0 2011-01-02 15.0
2 10.0 2011-01-03 15.0
3 20.0 2011-01-04 0.0
4 0.0 2011-01-05 0.0
5 10.0 2011-01-06 0.0
6 20.0 2011-01-07 25.0
7 0.0 2011-01-08 0.0
8 0.0 2011-01-09 0.0
9 30.0 2011-01-10 35.0
10 0.0 2011-01-11 0.0
11 0.0 2011-01-12 0.0
12 40.0 2011-01-13 10.0
13 0.0 2011-01-14 0.0
14 25.0 2011-01-15 25.0
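The values above are floats because the outer join introduces NaN for the dates that exist in only one frame, and NaN forces a float column. If integers are wanted, as in the ideal output of the question, one possible variant is to cast back after filling. This is only a sketch: the in-memory CSV text is a shortened, assumed stand-in for data1.csv and data2.csv, and the names csv_one, csv_two, full_range and result are made up for the example.

```python
import io
import pandas as pd

# Hypothetical stand-ins for data1.csv and data2.csv (first rows only).
csv_one = io.StringIO(
    "A,Date\n10,2011-01-03\n20,2011-01-04\n10,2011-01-06\n"
    "20,2011-01-07\n30,2011-01-10\n"
)
csv_two = io.StringIO(
    "B,Date\n15,2011-01-01\n15,2011-01-02\n15,2011-01-03\n"
    "25,2011-01-07\n35,2011-01-10\n"
)

# parse_dates makes Date a datetime64 column in both frames.
one = pd.read_csv(csv_one, parse_dates=["Date"])
two = pd.read_csv(csv_two, parse_dates=["Date"])

# Short illustrative range; the question would use 2011-01-01 to 2011-12-31.
full_range = pd.date_range("2011-01-01", "2011-01-10", name="Date")

result = (
    pd.merge(one, two, on="Date", how="outer")  # keep dates from either frame
      .set_index("Date")
      .reindex(full_range)                      # add dates missing from both
      .fillna(0)                                # fill every gap with 0
      .astype({"A": int, "B": int})             # back to integers
      .reset_index()
      .reindex(columns=["A", "Date", "B"])      # column order from the question
)
print(result)
```

The cast is safe here only because all NaN values have already been replaced by 0 before astype is called.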