PySpark - Saving Hive Table - org.apache.spark.SparkException: Cannot recognize hive type string

I am saving a Spark dataframe to a Hive table. The dataframe holds a nested JSON data structure. I am able to save the dataframe as files, but it fails at the point where it creates a Hive table on top of them, with: org.apache.spark.SparkException: Cannot recognize hive type string

I cannot create a Hive table schema first and then insert into it, since the dataframe consists of a couple hundred nested columns.

So I am saving it as:

df.write.partitionBy("dt","file_dt").saveAsTable("df")

I am not able to figure out what the issue is.
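One way to narrow this down is to check the dataframe's column names before writing, since Hive rejects identifiers it cannot parse. A minimal, Spark-free sketch of that check (the helper name `suspect_columns` is mine, not from the post):

```python
def suspect_columns(columns):
    """Return column names that are not valid identifiers
    (e.g. purely numeric names like "1"), which Hive may reject."""
    return [c for c in columns if not c.isidentifier()]

# With a real Spark DataFrame this would be called as:
#     suspect_columns(df.columns)
```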

Gayatri asked Sep 06 '25


1 Answer

The issue turned out to be a few columns named with bare numbers: "1", "2", "3". Removing those columns from the dataframe let me create the Hive table without any errors.
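Instead of dropping such columns, one could also rename them before calling saveAsTable. A hedged sketch of that idea (the `col_` prefix and the helper name are assumptions, not from the answer):

```python
def sanitize_columns(columns):
    """Map purely numeric column names (which Hive cannot accept as
    identifiers) to safe aliases; leave all other names unchanged."""
    return {c: (f"col_{c}" if c.isdigit() else c) for c in columns}

# With a real Spark DataFrame this could be applied as:
#     mapping = sanitize_columns(df.columns)
#     df = df.select([df[c].alias(new) for c, new in mapping.items()])
#     df.write.partitionBy("dt", "file_dt").saveAsTable("df")
```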

Gayatri answered Sep 09 '25