I'm trying to convert values from rows into separate columns, with each new column filled from another column. For example -
The input dataframe looks like this -
+-----------+
| X | Y | Z |
+-----------+
| 1 | A | a |
| 2 | A | b |
| 3 | A | c |
| 1 | B | d |
| 3 | B | e |
| 2 | C | f |
+-----------+
And the output dataframe should look like this -
+---+------+------+------+
| Y | 1    | 2    | 3    |
+---+------+------+------+
| A | a    | b    | c    |
| B | d    | null | e    |
| C | null | f    | null |
+---+------+------+------+
I've tried to groupBy on Y and collect_list on X and Z, and then zipped X and Z together to get key-value pairs. But some X values may be missing for some values of Y, so to fill them with nulls I cross-joined all possible values of X with all possible values of Y and then joined that back to my original dataframe. This approach is highly inefficient.
Is there a more efficient way to approach this problem? Thanks in advance.
You can simply use groupBy with pivot and first as the aggregate function:
import org.apache.spark.sql.functions._
df.groupBy("Y").pivot("X").agg(first("z"))
Output:
+---+----+----+----+
|Y |1 |2 |3 |
+---+----+----+----+
|B |d |null|e |
|C |null|f |null|
|A |a |b |c |
+---+----+----+----+
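For completeness, here is a self-contained sketch that recreates the sample data and runs the same pivot, assuming a local SparkSession. Passing the distinct X values explicitly to pivot is optional, but it saves Spark the extra job it would otherwise run to discover them.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("pivot-example").master("local[*]").getOrCreate()
import spark.implicits._

// Recreate the sample data from the question
val df = Seq(
  (1, "A", "a"), (2, "A", "b"), (3, "A", "c"),
  (1, "B", "d"), (3, "B", "e"), (2, "C", "f")
).toDF("X", "Y", "Z")

// Listing the pivot values up front avoids the extra pass over the data
// that Spark needs to compute the distinct values of X
df.groupBy("Y")
  .pivot("X", Seq(1, 2, 3))
  .agg(first("Z"))
  .show(false)

Note that the result rows come back in no particular order, as in the output above; add .orderBy("Y") before show if you need them sorted.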