I've got two Parquet files.
The first one contains the following column: DECIMAL: decimal(38,18) (nullable = true)
The second one has the same column, but with a different type: DECIMAL: integer (nullable = true)
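For context, those two lines are what printSchema() reports when each file is read on its own (the paths are the same placeholders as in the snippet below):

spark.read.parquet('path_to_file_one').printSchema()
spark.read.parquet('path_to_file_2').printSchema()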
I want to merge them, but I can't simply read them separately and cast that specific column, because this is part of an app that receives lots of distinct Parquet schemas. I need something that covers every scenario.
I am reading both like this:
df = spark.read.format("parquet").load(['path_to_file_one', 'path_to_file_2'])
It fails with the error below when I try to display the data:
Parquet column cannot be converted. Column: [DECIMAL], Expected: DecimalType(38,18), Found: INT32
I am using Azure Databricks.
I have uploaded the parquet files here: https://easyupload.io/m/su37e8
Is there any way I can force Spark to automatically cast a column to the type that the same column has in the other DataFrame?
It should be easy; all the columns are nullable...
This is expected: the schema Spark resolves for the load declares the column as decimal(38,18), and the Parquet reader will not convert files in which the same column is physically stored as INT32.

We found that this is a limitation in Spark when merging files whose column types conflict, such as decimal(38,18) versus integer.
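One generic workaround (a sketch under stated assumptions, not a drop-in fix): read each file separately, pick a target type per column name — here with the simple promotion rule "prefer a decimal type whenever any file declares one", which you would need to extend for other conflicts your app can receive — cast every DataFrame to those types, then union. unionByName with allowMissingColumns requires Spark 3.1+.

from functools import reduce
from pyspark.sql import functions as F
from pyspark.sql.types import DecimalType

paths = ['path_to_file_one', 'path_to_file_2']  # placeholders from the question
dfs = [spark.read.parquet(p) for p in paths]    # read each file on its own

# Choose a target type per column name. The promotion rule is an assumption:
# any decimal wins over a non-decimal (covering the decimal(38,18) vs. integer
# conflict); otherwise the first type seen is kept.
target = {}
for df in dfs:
    for field in df.schema.fields:
        seen = target.get(field.name)
        if seen is None or isinstance(field.dataType, DecimalType):
            target[field.name] = field.dataType

# Cast every column of every DataFrame to its target type.
aligned = [
    df.select([F.col(c).cast(target[c]).alias(c) for c in df.columns])
    for df in dfs
]

# Union everything; allowMissingColumns fills columns a file lacks with nulls.
df = reduce(lambda a, b: a.unionByName(b, allowMissingColumns=True), aligned)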
With the schemas aligned this way, df.show() should display the merged results.
