I need to read nullable date values in integer-like 'YYYYMMDD' format into pandas and then save the dataframe to Parquet with the column typed as Date32[Day], so that the Athena Glue Crawler classifier recognizes that column as a date. The code below fails when saving the column to Parquet from pandas:
import pandas as pd
dates = [None, "20200710", "20200711", "20200712"]
data_df = pd.DataFrame(dates, columns=['date'])
data_df['date'] = pd.to_datetime(data_df['date']).dt.date
data_df.to_parquet(r'my_path', engine='pyarrow')
I receive the error below:
Traceback (most recent call last):
File "", line 123, in convert_column
result = pa.array(col, type=type_, from_pandas=True, safe=safe)
File "pyarrow\array.pxi", line 265, in pyarrow.lib.array
File "pyarrow\array.pxi", line 80, in pyarrow.lib._ndarray_to_array
TypeError: an integer is required (got type datetime.date)
If I move the None value towards the end of the date list, this works without any issue and pyarrow infers the date column as Date32[Day]. My guess is that since the Pandas column type for dt.date is object and the first value in the column is NaT (not a time), pyarrow is not able to infer the column as Date32[Day] from the Pandas dataframe or from sample values, and infers the column as Integer instead. What is a good way to save this dataframe column to parquet as a Date32[Day] column without sorting the column values? Thanks.
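You are right: the leading NaT is what breaks pyarrow's type inference. One workaround is to replace the NaT values before writing. The code below formats each date as a string and uses an empty string for missing values, so the write succeeds, although the column is then stored as a string rather than as Date32[Day].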
import pandas as pd
dates = [None, "20200710", "20200711", "20200712"]
data_df = pd.DataFrame(dates, columns=['date'])
data_df['date'] = pd.to_datetime(data_df['date']).dt.date
# In addition, add this line to replace NaT with an empty string (dates become strings)
# Change the strftime format as you want (I have used year-month-day)
data_df['date'] = [d.strftime('%Y-%m-%d') if not pd.isnull(d) else '' for d in data_df['date']]
data_df.to_parquet(r'my_path', engine='pyarrow')
I hope this works for you and the error is solved.