Spark is not loading all multiline json objects in a single file even with multiline option set to true

Question

My JSON file looks like the one below. It contains two multiline JSON objects in a single file:

{
    "name":"John Doe",
    "id":"123456"
}
{
    "name":"Jane Doe",
    "id":"456789"
}

When I load this file as a multiline JSON DataFrame, I expect two rows, but Spark loads only the first JSON object. How can I load all the multiline JSON objects in a single file?

scala> val rawData = spark.read.option("multiline", true).option("mode", "PERMISSIVE").format("json").load("/tmp/search/baggage/test/1")
scala> rawData.show
+------+--------+
|    id|    name|
+------+--------+
|123456|John Doe|
+------+--------+

scala> rawData.count
res20: Long = 1

blackbishop · Accepted Answer

Your input is not valid JSON: a file holding multiple objects must wrap them in square brackets as an array, with commas between the objects. You can check this with any JSON validator. That is why the multiLine option does not work in this case.
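To illustrate the bracket fix, here is a minimal sketch meant to be pasted into spark-shell (with :paste). It assumes the two objects from the question are wrapped in a single top-level array and written to a temporary file; the temp file name and the local-mode session settings are placeholders for illustration, not part of the original setup.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// In spark-shell this returns the existing session; the local master is an assumption.
val spark = SparkSession.builder().master("local[*]").appName("multiline-json-array").getOrCreate()

// Hypothetical temp file holding the two objects as one valid JSON array.
val path = Files.createTempFile("people-array", ".json")
Files.write(path, """[
  {
    "name":"John Doe",
    "id":"123456"
  },
  {
    "name":"Jane Doe",
    "id":"456789"
  }
]""".getBytes(StandardCharsets.UTF_8))

// With multiLine enabled, the whole file is parsed as one JSON value;
// each object in a top-level array becomes its own row.
val df = spark.read.option("multiLine", true).json(path.toUri.toString)
df.show()
```

With the array wrapper in place, both objects should be loaded as separate rows.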

That said, I think you want the JSON Lines format, where each line holds exactly one JSON object:

{"name":"John Doe","id":"123456"}
{"name":"Jane Doe","id":"456789"}

Spark can read this JSON without setting multiline option:

val df = spark.read.json("file:///your/json/file.json")
df.show()

Output:

+------+--------+
|    id|    name|
+------+--------+
|123456|John Doe|
|456789|Jane Doe|
+------+--------+
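If the file cannot be regenerated as JSON Lines, one possible workaround is to convert it on the fly. The sketch below is an assumption-laden illustration rather than a standard Spark feature: it reads each file whole with wholeTextFiles, uses Jackson (which ships with Spark) to iterate over the concatenated top-level objects, and passes the resulting one-object-per-string Dataset to spark.read.json. The temp file and session settings are hypothetical, and each file must fit in memory on a single executor.

```scala
import scala.collection.JavaConverters._
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import com.fasterxml.jackson.databind.{JsonNode, ObjectMapper}
import org.apache.spark.sql.SparkSession

// In spark-shell this returns the existing session; the local master is an assumption.
val spark = SparkSession.builder().master("local[*]").appName("concatenated-json").getOrCreate()
import spark.implicits._

// Hypothetical temp file reproducing the input from the question.
val path = Files.createTempFile("concatenated", ".json")
Files.write(path, """{
    "name":"John Doe",
    "id":"123456"
}
{
    "name":"Jane Doe",
    "id":"456789"
}""".getBytes(StandardCharsets.UTF_8))

// Read each file as a single string, then split it into individual JSON objects.
val jsonLines = spark.sparkContext
  .wholeTextFiles(path.toUri.toString)
  .flatMap { case (_, text) =>
    // Jackson's readValues iterates over a sequence of root-level JSON values.
    val mapper = new ObjectMapper()
    mapper.readerFor(classOf[JsonNode])
      .readValues[JsonNode](text)
      .asScala
      .map(_.toString) // compact, single-line JSON for each object
      .toList
  }
  .toDS()

// spark.read.json also accepts a Dataset[String] with one JSON document per element.
val df = spark.read.json(jsonLines)
df.show()
```

For large inputs, rewriting the files as JSON Lines once is likely the better choice, since wholeTextFiles cannot split a single file across tasks.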

Tags:

apache-spark

apache-spark-sql

Despicable me
