I am trying to insert a nested JSON into Postgres using PySpark, via a DataFrame.
This is my schema:
|-- info: struct (nullable = true)
| |-- Id: string (nullable = true)
| |-- name: string (nullable = true)
| |-- version: long (nullable = true)
| |-- label: string (nullable = true)
| |-- params: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- text: string (nullable = true)
| | | |-- entity: string (nullable = true)
| | | |-- input: struct (nullable = true)
| | | | |-- format: string (nullable = true)
| | | | |-- maxLength: long (nullable = true)
| | | | |-- patterns: array (nullable = true)
| | | | | |-- element: string (containsNull = true)
| | | |-- prompt: struct (nullable = true)
| | | | |-- lang: array (nullable = true)
| | | | | |-- element: string (containsNull = true)
| | | |-- sample: string (nullable = true)
| | | |-- strategy: string (nullable = true)
| | | |-- type: string (nullable = true)
After creating my DataFrame, whenever I try to write to PostgreSQL using the df.write() method, I get the error:
pyspark.sql.utils.IllegalArgumentException: u"Can't get JDBC type for struct<>
Should I convert the JSON to a string? I tried this with the explode function, but since it's a deeply nested JSON, that didn't help. Is there a workaround for this? I am new to this, so any input would help.
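For reference, the write call looks roughly like this; the JDBC URL, table name, and credentials below are placeholders:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# df is the DataFrame with the nested schema shown above, e.g.:
df = spark.read.json("input.json")

# Roughly the write call that fails, since JDBC has no column type
# that maps to Spark's struct/array types:
df.write \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://localhost:5432/mydb") \
    .option("dbtable", "my_table") \
    .option("user", "postgres") \
    .option("password", "postgres") \
    .option("driver", "org.postgresql.Driver") \
    .mode("append") \
    .save()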
I figured out that using the to_json function is a workaround:
from pyspark.sql.functions import to_json, struct
df.select(to_json(struct([df[x] for x in df.columns])).alias("jsonobject"))
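Putting it together, here is a minimal sketch of the workaround; the JDBC URL, table name, and credentials are placeholders, and on the Postgres side the target column can be text or jsonb:

from pyspark.sql.functions import to_json, struct

# Collapse all columns into a single JSON string column,
# which JDBC can map to a text column.
json_df = df.select(to_json(struct([df[x] for x in df.columns])).alias("jsonobject"))

json_df.write \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://localhost:5432/mydb") \
    .option("dbtable", "my_table") \
    .option("user", "postgres") \
    .option("password", "postgres") \
    .option("driver", "org.postgresql.Driver") \
    .mode("append") \
    .save()

# If the target column is jsonb, adding stringtype=unspecified to the
# JDBC URL may be needed so Postgres casts the string implicitly.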
If anyone has a better solution, let me know.