I am trying to insert a nested JSON into Postgres using PySpark, via a DataFrame.
This is my schema:
|-- info: struct (nullable = true)
| |-- Id: string (nullable = true)
| |-- name: string (nullable = true)
| |-- version: long (nullable = true)
| |-- label: string (nullable = true)
| |-- params: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- text: string (nullable = true)
| | | |-- entity: string (nullable = true)
| | | |-- input: struct (nullable = true)
| | | | |-- format: string (nullable = true)
| | | | |-- maxLength: long (nullable = true)
| | | | |-- patterns: array (nullable = true)
| | | | | |-- element: string (containsNull = true)
| | | |-- prompt: struct (nullable = true)
| | | | |-- lang: array (nullable = true)
| | | | | |-- element: string (containsNull = true)
| | | |-- sample: string (nullable = true)
| | | |-- strategy: string (nullable = true)
| | | |-- type: string (nullable = true)
After creating my DataFrame, whenever I try to write to PostgreSQL using the df.write() method, I get the error:
pyspark.sql.utils.IllegalArgumentException: u"Can't get JDBC type for struct<>
Should I convert the JSON to a string? I tried this with the explode function, but since it's a deeply nested JSON, that didn't help. Is there a workaround for this? I am new to this, so any input would help.
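For reference, the write call looks roughly like this; the JDBC URL, table name, and credentials below are placeholders:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# df is the DataFrame with the nested schema shown above, e.g.:
df = spark.read.json("input.json")

# Roughly the write call that fails, since JDBC has no column type
# that maps to Spark's struct/array types:
df.write \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://localhost:5432/mydb") \
    .option("dbtable", "my_table") \
    .option("user", "postgres") \
    .option("password", "postgres") \
    .option("driver", "org.postgresql.Driver") \
    .mode("append") \
    .save()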
I figured out that using the to_json function is a workaround:
from pyspark.sql.functions import to_json, struct
df.select(to_json(struct([df[x] for x in df.columns])).alias("jsonobject"))
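Putting it together, here is a minimal sketch of the workaround; the JDBC URL, table name, and credentials are placeholders, and on the Postgres side the target column can be text or jsonb:

from pyspark.sql.functions import to_json, struct

# Collapse all columns into a single JSON string column,
# which JDBC can map to a text column.
json_df = df.select(to_json(struct([df[x] for x in df.columns])).alias("jsonobject"))

json_df.write \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://localhost:5432/mydb") \
    .option("dbtable", "my_table") \
    .option("user", "postgres") \
    .option("password", "postgres") \
    .option("driver", "org.postgresql.Driver") \
    .mode("append") \
    .save()

# If the target column is jsonb, adding stringtype=unspecified to the
# JDBC URL may be needed so Postgres casts the string implicitly.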
If anyone has a better solution, let me know.