Convert string column to json and parse in pyspark

Question

My DataFrame looks like this:

|ID|Notes|
---------------
|1|'{"Country":"USA","Count":"1000"}'|
|2|{"Country":"USA","Count":"1000"}|

ID : int
Notes : string

When I use from_json() to parse the Notes column, it returns all null values. I need help parsing the Notes column into separate columns in PySpark.

Saideep Arikontham · Accepted Answer

When you use the from_json() function, make sure the column value is exactly a JSON object (dictionary) in string format. In the sample data you have given, the Notes value for id=1 is not valid JSON: it is a string enclosed in additional single quotes. That is why from_json() returns NULL for it. You can see this by applying the following code to the input DataFrame:

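from pyspark.sql import functions as F, types as T
df = df.withColumn("Notes", F.from_json(df.Notes, T.MapType(T.StringType(), T.StringType())))  # parse JSON string into map<string,string>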
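Spark's JSON parser is not Python's, but the same effect is easy to reproduce with Python's standard json module: the extra single quotes mean the value no longer parses as a JSON object. A minimal stdlib-only sketch; the helper name parses_as_object is hypothetical and only for illustration:

```python
import json


def parses_as_object(value: str) -> bool:
    """Return True if value is a JSON object serialized as a string."""
    try:
        return isinstance(json.loads(value), dict)
    except json.JSONDecodeError:
        # Invalid JSON, e.g. a value wrapped in extra single quotes.
        return False


row_1 = """'{"Country":"USA","Count":"1000"}'"""  # id=1: extra single quotes
row_2 = '{"Country":"USA","Count":"1000"}'      # id=2: plain JSON object

print(parses_as_object(row_1))  # False
print(parses_as_object(row_2))  # True
```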


You need to change your input data so that every value in the Notes column has the same format: a JSON object serialized as a string, with nothing around it. This inconsistency is the main cause of the issue. The following is the format that fixes it:

| ID | Notes |
---------------
| 1 | {"Country":"USA","Count":"1000"} |
| 2 | {"Country":"USA","Count":"1000"} |
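If you cannot fix the data at the source, one option is to strip the wrapping quotes before parsing. This assumes the only problem is a single pair of single quotes around the value. In PySpark this could be done with regexp_replace(); the sketch below checks the same regular expression in plain Python, so the rule can be tried without a Spark session. The helper name strip_wrapping_quotes is hypothetical.

```python
import re

# Matches a single quote at the very start or the very end of the value.
# Assumption: the only corruption is one pair of wrapping single quotes.
WRAPPING_QUOTES = re.compile(r"^'|'$")


def strip_wrapping_quotes(value: str) -> str:
    """Remove a leading and/or trailing single quote from a Notes value."""
    return WRAPPING_QUOTES.sub("", value)


# A possible PySpark equivalent using the same pattern:
#   from pyspark.sql.functions import regexp_replace
#   df = df.withColumn("Notes", regexp_replace("Notes", r"^'|'$", ""))

raw = ["""'{"Country":"USA","Count":"1000"}'""",
       '{"Country":"USA","Count":"1000"}']
cleaned = [strip_wrapping_quotes(v) for v in raw]
print(cleaned)
```

Note that this pattern also removes an unmatched quote at either end; tighten it if your data can legitimately start or end with a single quote.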

To parse the Notes values into separate columns in PySpark, you can use the json_tuple() function instead (there is no need for from_json()). It extracts the named fields from a JSON string column and returns them as new columns.

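from pyspark.sql.functions import col, json_tuple

# Run on the original DataFrame, where Notes is still a JSON string
df = df.select(col("id"), json_tuple(col("Notes"), "Country", "Count")) \
    .toDF("id", "Country", "Count")
df.show()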
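To make the null behaviour of json_tuple() concrete, here is a rough plain-Python analogue: for each requested field it returns the value as a string if the input is a valid JSON object, and None (Spark's null) otherwise. The helper json_tuple_py is hypothetical and only mimics the behaviour for this simple case; it is not Spark's implementation.

```python
import json
from typing import Optional, Tuple


def json_tuple_py(value: Optional[str], *fields: str) -> Tuple[Optional[str], ...]:
    """Return one string per field, or None for every field if value isn't a JSON object."""
    try:
        obj = json.loads(value) if value is not None else None
    except json.JSONDecodeError:
        obj = None
    if not isinstance(obj, dict):
        # Malformed input: every extracted column is null.
        return tuple(None for _ in fields)
    result = []
    for f in fields:
        v = obj.get(f)
        if v is None:
            result.append(None)  # missing field -> null
        elif isinstance(v, str):
            result.append(v)
        else:
            # Non-string values come back as compact JSON text.
            result.append(json.dumps(v, separators=(",", ":")))
    return tuple(result)


rows = [(1, """'{"Country":"USA","Count":"1000"}'"""),
        (2, '{"Country":"USA","Count":"1000"}')]
for row_id, notes in rows:
    print(row_id, *json_tuple_py(notes, "Country", "Count"))
```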


NOTE: json_tuple() also returns null if a column value is not in the correct format, so make sure every value is a JSON object serialized as a string, without additional quotes.


Tags: json, dictionary, pyspark, azure-databricks

KM Kavia

1 Answer

Saideep Arikontham
