Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

why spark to_json() not populating null values?

Can try in spark-shell

case class Employee(id: Int, name: String, department: String, salary: Option[Double])
import org.apache.spark.sql.functions._
import spark.implicits._

case class Employee(id: Int, name: String, department: String, salary: Option[Double])
val data = List(Employee(1, "XYZ", "dep1", Some(1234.0)), Employee(0, null, "unknown", None)).toDS()
data.select($"id", to_json(struct($"id",$"name", $"department", $"salary")).as("json_data")).show(false)

return =>

|id |json_data                                                |
+---+---------------------------------------------------------+
|1  |{"id":1,"name":"XYZ","department":"dep1","salary":1234.0}|
|0  |{"id":0,"department":"unknown"}                          |

expecting =>

|id |json_data                                                   |
+---+------------------------------------------------------------+
|1  |{"id":1,"name":"XYZ","department":"dep1","salary":1234.0}   |
|0  |{"id":0,"name": null, "department":"unknown","salary":null} |

null fields(name & salary) also should be populated in resulting json. I don't want to use lit("null") to populate null values

like image 446
Kishorekumar Yakkala Avatar asked Oct 26 '25 05:10

Kishorekumar Yakkala


2 Answers

A feature was recently added to preserve null values when generating JSON, and should be available in the upcoming Spark 3.0 release. See SPARK-29444 for details. In 3.0, you'll be able to control this via:

data.select($"id", to_json(struct($"id",$"name", $"department", $"salary"), Map("ignoreNullFields" -> "false")).as("json_data")).show(false)

AFAIK, there are no plans at present to add this to the 2.x branch.

like image 121
Charlie Flowers Avatar answered Oct 29 '25 07:10

Charlie Flowers


A feature was recently added to preserve null values when generating JSON, and should be available in the upcoming Spark 3.0 release. See SPARK-29444 for details. In 3.0, you'll be able to control this via:

data.select($"id", to_json(struct($"id",$"name", $"department", $"salary"), Map("ignoreNullFields" -> "false")).as("json_data")).show(false)

AFAIK, there are no plans at present to add this to the 2.x branch.

like image 31
Charlie Flowers Avatar answered Oct 29 '25 09:10

Charlie Flowers



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!