I am reading from two different CSVs each having date values in their columns. After read_csv I want to convert the data to datetime with the to_datetime method. The formats of the dates in each CSV are slightly different, and although the differences are noted and specified in the to_datetime format argument, the one converts fine, while the other returns the following value error.
ValueError: to assemble mappings requires at least that [year, month, day] be sp
ecified: [day,month,year] is missing
first dte.head()
0  10/14/2016  10/17/2016  10/19/2016    8/9/2016  10/17/2016   7/20/2016
1   7/15/2016   7/18/2016   7/20/2016    6/7/2016   7/18/2016   4/19/2016
2   4/15/2016   4/14/2016   4/18/2016   3/15/2016   4/18/2016   1/14/2016
3   1/15/2016   1/19/2016   1/19/2016  10/19/2015   1/19/2016  10/13/2015
4  10/15/2015  10/14/2015  10/19/2015   7/23/2015  10/14/2015   7/15/2015
this dataframe converts fine using the following code:
dte = pd.to_datetime(dte, infer_datetime_format=True)
or
dte = pd.to_datetime(dte[x], format='%m/%d/%Y')
the second dtd.head()
0   2004-01-02 2004-01-02  2004-01-09 2004-01-16  2004-01-23  2004-01-30
1   2004-01-05 2004-01-09  2004-01-16 2004-01-23  2004-01-30  2004-02-06
2   2004-01-06 2004-01-09  2004-01-16 2004-01-23  2004-01-30  2004-02-06
3   2004-01-07 2004-01-09  2004-01-16 2004-01-23  2004-01-30  2004-02-06
4   2004-01-08 2004-01-09  2004-01-16 2004-01-23  2004-01-30  2004-02-06
this csv doesn't convert using either:
dtd = pd.to_datetime(dtd, infer_datetime_format=True)
or
dtd = pd.to_datetime(dtd, format='%Y-%m-%d')
It returns the value error above. Interestingly, however, using the parse_dates and infer_datetime_format as arguments of the read_csv method work fine. What is going on here?
You can stack / pd.to_datetime / unstack
pd.to_datetime(dte.stack()).unstack()

explanationpd.to_datetime works on a string, list, or pd.Series. dte is a pd.DataFrame and is why you are having issues. dte.stack() produces a a pd.Series where all rows are stacked on top of each other. However, in this stacked form, because it is a pd.Series, I can get a vectorized pd.to_datetime to work on it. the subsequent unstack simply reverses the initial stack to get the original form of dte
For me works apply function to_datetime:
print (dtd)
            1           2           3           4           5           6
0                                                                        
0  2004-01-02  2004-01-02  2004-01-09  2004-01-16  2004-01-23  2004-01-30
1  2004-01-05  2004-01-09  2004-01-16  2004-01-23  2004-01-30  2004-02-06
2  2004-01-06  2004-01-09  2004-01-16  2004-01-23  2004-01-30  2004-02-06
3  2004-01-07  2004-01-09  2004-01-16  2004-01-23  2004-01-30  2004-02-06
4  2004-01-08  2004-01-09  2004-01-16  2004-01-23  2004-01-30  2004-02-06
dtd = dtd.apply(pd.to_datetime)
print (dtd)
           1          2          3          4          5          6
0                                                                  
0 2004-01-02 2004-01-02 2004-01-09 2004-01-16 2004-01-23 2004-01-30
1 2004-01-05 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
2 2004-01-06 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
3 2004-01-07 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
4 2004-01-08 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
It works for me:
dtd.apply(lambda x: pd.to_datetime(x,errors = 'coerce', format = '%Y-%m-%d'))
This way you can use function attributes like above (errors and format). See more https://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_datetime.html
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With