You can try this in spark-shell:
import org.apache.spark.sql.functions._
import spark.implicits._
case class Employee(id: Int, name: String, department: String, salary: Option[Double])
val data = List(Employee(1, "XYZ", "dep1", Some(1234.0)), Employee(0, null, "unknown", None)).toDS()
data.select($"id", to_json(struct($"id",$"name", $"department", $"salary")).as("json_data")).show(false)
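This returns:
+---+---------------------------------------------------------+
|id |json_data                                                |
+---+---------------------------------------------------------+
|1  |{"id":1,"name":"XYZ","department":"dep1","salary":1234.0}|
|0  |{"id":0,"department":"unknown"}                          |
+---+---------------------------------------------------------+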
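I am expecting:
+---+---------------------------------------------------------+
|id |json_data                                                |
+---+---------------------------------------------------------+
|1  |{"id":1,"name":"XYZ","department":"dep1","salary":1234.0}|
|0  |{"id":0,"name":null,"department":"unknown","salary":null}|
+---+---------------------------------------------------------+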
The null fields (name and salary) should also be populated in the resulting JSON. I don't want to use lit("null") to populate the null values.
A feature was recently added to preserve null values when generating JSON, and should be available in the upcoming Spark 3.0 release. See SPARK-29444 for details. In 3.0, you'll be able to control this via:
data.select($"id", to_json(struct($"id",$"name", $"department", $"salary"), Map("ignoreNullFields" -> "false")).as("json_data")).show(false)
AFAIK, there are no plans at present to add this to the 2.x branch.
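For reference, here is a minimal sketch of the 3.0 behaviour applied to the question's data, meant to be pasted into spark-shell on Spark 3.0 or later. It shows the per-call option from above and, as an alternative, the session-level setting spark.sql.jsonGenerator.ignoreNullFields that was introduced alongside it. The app name and the value names (fields, perCall, sessionWide) are illustrative only, and the local master is an assumption for running outside the shell.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{struct, to_json}

// In spark-shell this returns the existing session; elsewhere it starts a local one.
val spark = SparkSession.builder().master("local[*]").appName("to-json-nulls").getOrCreate()
import spark.implicits._

case class Employee(id: Int, name: String, department: String, salary: Option[Double])

val data = List(
  Employee(1, "XYZ", "dep1", Some(1234.0)),
  Employee(0, null, "unknown", None)
).toDS()

val fields = struct($"id", $"name", $"department", $"salary")

// Per-call option: keep null fields for this to_json call only.
val perCall = data.select(
  $"id",
  to_json(fields, Map("ignoreNullFields" -> "false")).as("json_data")
)
perCall.show(false)

// Session-wide alternative: change the default used by to_json and the JSON writer.
spark.conf.set("spark.sql.jsonGenerator.ignoreNullFields", "false")
val sessionWide = data.select($"id", to_json(fields).as("json_data"))
sessionWide.show(false)
```

The per-call option is the narrower choice, since the session setting also changes what other JSON output in the same session looks like.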