The first possible parquet timestamp, 0001-01-01 00:00:00.000 cannot be loaded with Azure Synapse Serverless SQL Pools, as DATETIME2 or any other type. I'm baffled bc the DATETIME2 docs say that this is within the acceptable range:
Property Value Date range 0001-01-01 through 9999-12-31 January 1,1 CE through December 31, 9999 CE Time range 00:00:00 through 23:59:59.9999999
user @RandolphWest found this relevant thread that gets the same error message but they lay the blame on ADF.
Here's how to make a file that will reproduce the error. Alternatively, here's a Dropbox link to the file
from pyspark.sql.functions import to_timestamp
dfx = (
spark.createDataFrame(
data = [ ("1","0001-01-01 00:00:00.000")],
schema=["id","my_ts"])
.withColumn("my_ts",to_timestamp("my_ts"))
)
dfx.write.parquet('abfss://<MY-CONTAINER>@<MY-WORKSPACE>.dfs.core.windows.net/dfx')
display(dfx)
+---+-------------------+
| id| my_ts|
+---+-------------------+
| 1|0001-01-01 00:00:00|
+---+-------------------+
SELECT
TOP 100 *
FROM
OPENROWSET(
BULK 'https://<MY-WORKSPACE>.dfs.core.windows.net/<MY-CONTAINER>/dfx/**',
FORMAT = 'PARQUET'
) AS [result]
Error message
Failed to execute query.
Error: Error handling external file: 'Inserting value to batch for column type DATETIME2 failed.
Invalid argument provided.'.
File/External table name: 'https://<MY-WORKSPACE>.dfs.core.windows.net/<MY-CONTAINER>/dfx/part-00015-f98c11bd-ffb4-45de-b505-0aa226183098-c000.snappy.parquet'.
Statement ID: {F3062AA6-DD96-4271-A093-6D24F8E0C2FE} | Query hash: 0x9CCB08D8E9D29322 | Distributed request ID: {D0BA926E-FAB2-47DD-8309-BE91DFAC48F8}.
Total size of data scanned is 1 megabytes, total size of data moved is 0 megabytes, total size of data written is 0 megabytes.
I had the same error, I've solved this issue saving as datatype string (so do not use to_timestamp) in the adls and then converting to datetime2 into the Synapse query. This sound strange but it actually works. using your code it will be something like this:
SELECT
cast(my_ts as datetime2) as my_ts
FROM
OPENROWSET(
BULK 'https://<MY-WORKSPACE>.dfs.core.windows.net/<MY-CONTAINER>/dfx/**',
FORMAT = 'PARQUET'
) WITH (my_ts as nvarchar(27)) AS [result]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With