Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find for each row the first non-null value in a group of columns and the column name

I have a dataframe like this:

col1     col2     col3     Other
====================================
NULL     1        2        A
3        4        5        B
NULL     NULL     NULL     C

and I would like to get as result the following one with this rules:

  • For each row find the first non-NULL value and set its value in FirstValue and set its column name in ColName
  • If in a row all values are NULL, FirstValue and ColName are set to NULL
  • keep Other column

Expected result:

FirstValue     ColName      Other
====================================
1             col2         A
3             col1         B
NULL          NULL         C
like image 427
Benjamin Avatar asked Sep 05 '25 01:09

Benjamin


1 Answers

You can use coalesce:

val df2 = df.select(
    coalesce(df.columns.dropRight(1).map(col):_*).as("FirstValue"), 
    coalesce(df.columns.dropRight(1).map(c => when(col(c).isNotNull, lit(c))):_*).as("ColName"), 
    col("Other")
)


df2.show
+----------+-------+-----+
|FirstValue|ColName|Other|
+----------+-------+-----+
|         1|   col2|    A|
|         3|   col1|    B|
|      null|   null|    C|
+----------+-------+-----+
like image 145
mck Avatar answered Sep 07 '25 17:09

mck