I just started learning Scala and I'm trying to find a way to get the minimum across two or more Columns of the same type in a DataFrame. I have the following code, which gives me the min of one Column and the max of another, each computed separately.
inputDF.select(min($"dropoff_longitude")).show
inputDF.select(max($"pickup_longitude")).show
How do I get the min across both Columns, dropoff_longitude and pickup_longitude? I did it like this:
scala.math.min(
inputDF.select(min($"pickup_longitude")).head.getFloat(0),
inputDF.select(min($"dropoff_longitude")).head.getFloat(0)
)
Is there a better way to do this?
Thank you
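You can use the least and greatest Spark SQL functions in select expressions for this purpose. They compare values across columns within each row, so you can combine them with the min and max aggregates. In your case it will look like this: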
import org.apache.spark.sql.functions._
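// least(...) picks the smaller of the two values in each row;
// min(...) then reduces that per-row column to a single value.
val minLongitude =
  inputDF.select(least($"pickup_longitude", $"dropoff_longitude") as "least_longitude")
    .agg(min($"least_longitude"))
    .head.getFloat(0)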
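If you need this for more than two columns, the same idea can be wrapped in a small helper. The following is only a sketch: the object name MinAcrossColumns and the helper minOf are hypothetical names made up for illustration, and it assumes every listed column is of FloatType (as getFloat in the question implies) and that at least one value is non-null. It also nests least inside min so the result comes from a single agg call.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, least, min}

object MinAcrossColumns {
  // Hypothetical helper: smallest value found in any of the named columns.
  // least(...) needs at least two columns, hence the two required names.
  // Assumes all columns are FloatType and at least one value is non-null.
  def minOf(df: DataFrame, a: String, b: String, more: String*): Float = {
    val cols = (Seq(a, b) ++ more).map(col)
    // least compares across columns row by row; min aggregates over all rows.
    df.agg(min(least(cols: _*))).head.getFloat(0)
  }
}
```

With the DataFrame from the question, usage would be MinAcrossColumns.minOf(inputDF, "pickup_longitude", "dropoff_longitude"). Swapping in greatest and max should give the maximum across the columns in the same way.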