After joining two DataFrames, I find that the column order is not what I expected.
For example, joining two DataFrames with columns [b,c,d,e] and [a,b] on b yields a column order of [b,a,c,d,e].
How can I change the order of the columns (e.g., [a,b,c,d,e])?
I've found ways to do it in Python/R but not Scala or Java. Are there any methods that allow swapping or reordering of dataframe columns?
In Scala you can use the "splat" (:_*) syntax to pass a variable-length list of columns to the DataFrame.select() method.
To address your example, you can get the existing column names via DataFrame.columns, which returns an array of strings. Because the order you want happens to be alphabetical, you can sort that array, convert the names to columns, and then "splat" the result into select():
import org.apache.spark.sql.functions.col  // provides col(name): Column
val mySortedCols = myDF.columns.sorted.map(str => col(str))
// Array[String]=(b,a,c,d,e) => Array[Column]=(a,b,c,d,e)
val myNewDF = myDF.select(mySortedCols:_*)
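Sorting only helps when the target order is alphabetical. For an arbitrary order, one possible sketch is to compute the list of column names in plain Scala first and then select on it. Note that moveToFront is a hypothetical helper name used here for illustration, not part of the Spark API, and it assumes every name you ask for already exists in the DataFrame:

```scala
// Hypothetical helper (not a Spark API): put the `front` columns first,
// then keep every remaining column in its existing order.
def moveToFront(existing: Seq[String], front: Seq[String]): Seq[String] = {
  val missing = front.filterNot(c => existing.contains(c))
  require(missing.isEmpty, s"Unknown columns: ${missing.mkString(", ")}")
  front ++ existing.filterNot(c => front.contains(c))
}

// Column order from the question's join: [b,a,c,d,e]
val reordered = moveToFront(Seq("b", "a", "c", "d", "e"), Seq("a", "b"))
// reordered == Seq("a", "b", "c", "d", "e")
```

With the col import from above, the result could then be passed to select in the same way, e.g. myDF.select(moveToFront(myDF.columns.toSeq, Seq("a", "b")).map(col): _*).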
Another way of doing it is to reorder after your join by selecting the columns by name in the order you want:
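// Outside spark-shell, toDF needs the implicits of your SparkSession (assumed here to be named spark)
import spark.implicits._
case class Person(name: String, age: Int)
val persons = Seq(Person("test", 10)).toDF
persons.show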
+----+---+
|name|age|
+----+---+
|test| 10|
+----+---+
persons.select("age", "name").show
+---+----+
|age|name|
+---+----+
| 10|test|
+---+----+