
Why does Spark SQL translate the String "null" to the object null for Float/Double types?

I have a dataframe containing float and double values.

scala> val df = List((Float.NaN, Double.NaN), (1f, 0d)).toDF("x", "y")
df: org.apache.spark.sql.DataFrame = [x: float, y: double]

scala> df.show
+---+---+
|  x|  y|
+---+---+
|NaN|NaN|
|1.0|0.0|
+---+---+

scala> df.printSchema
root
 |-- x: float (nullable = false)
 |-- y: double (nullable = false)

To replace the NaN values with nulls, I passed "null" as a String in the Map supplied to the fill operation.

scala> val map = df.columns.map((_, "null")).toMap
map: scala.collection.immutable.Map[String,String] = Map(x -> null, y -> null)

scala> df.na.fill(map).printSchema
root
 |-- x: float (nullable = true)
 |-- y: double (nullable = true)


scala> df.na.fill(map).show
+----+----+
|   x|   y|
+----+----+
|null|null|
| 1.0| 0.0|
+----+----+

I got the expected result, but I was not able to understand how or why Spark SQL translates the String "null" into a null object.

asked Sep 08 '25 by himanshuIIITian


2 Answers

If you look into the fill function in Dataset, it checks the datatype of each value and tries to convert it to the datatype of the column's schema. If the value can be converted, it is used; otherwise null is returned.

So fill does not convert the String "null" to the object null; it returns null because an exception occurs while converting.
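A minimal sketch of this behaviour in plain Scala (coerceToDouble is a hypothetical helper approximating what fill does per numeric column, not Spark's actual code):

```scala
import scala.util.Try

// Approximation of fill's coercion for a double column:
// try the conversion, fall back to null on failure.
def coerceToDouble(value: String): java.lang.Double =
  Try(java.lang.Double.valueOf(value)).getOrElse(null)

coerceToDouble("null")    // null    -- "null" is not a parseable number
coerceToDouble("9999.99") // 9999.99 -- parseable, so the value is kept
```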

val map = df.columns.map((_, "WHATEVER")).toMap

gives null, while

val map = df.columns.map((_, "9999.99")).toMap

gives 9999.99.

If you replace the NaN with a value of the same datatype, you get the result as expected.
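For instance, a sketch against the question's DataFrame: na.fill also accepts numeric replacement values directly, and for a Double it replaces null/NaN in all numeric columns.

```scala
// Replace NaN with a Double instead of a String:
df.na.fill(0.0).show()
// +---+---+
// |  x|  y|
// +---+---+
// |0.0|0.0|
// |1.0|0.0|
// +---+---+
```

A per-column Map with typed values works too, e.g. df.na.fill(Map("x" -> 0f, "y" -> 0d)).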

Hope this helps you to understand!

answered Sep 10 '25 by koiralo


It's not that "null" as a String translates to a null object. You can try using the transformation with any String and still get null (with the exception of strings that can directly be cast to double/float, see below). For example, using

val map = df.columns.map((_, "abc")).toMap

would give the same result. My guess is that, since the columns are of type float and double, casting an arbitrary String to those types fails and therefore gives null. Using a number instead would work as expected, e.g.

val map = df.columns.map((_, 1)).toMap

As some strings can directly be cast to double or float those are also possible to use in this case.

val map = df.columns.map((_, "1")).toMap
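The underlying cast semantics can be seen directly in spark-shell (a sketch; Spark SQL's CAST returns null on failure rather than throwing):

```scala
spark.sql("SELECT CAST('abc' AS double) AS bad, CAST('1' AS double) AS good").show()
// +----+----+
// | bad|good|
// +----+----+
// |null| 1.0|
// +----+----+
```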
answered Sep 10 '25 by Shaido