I have two dataframes. The first dataframe has an array as its value for column2 and I want to join it with the second dataframe so the numerical values are mapped to their string values. The order of the elements should stay the same since they correspond to the array elements in column3 by index.
df_one:
column1| column2| column3
----------------------------------
"thing1"|[1,2,3..]|[0.1,0.2,0.3..]
"thing2"|[1,2,3..]|[0.1,0.2,0.3..]
"thing3"|[1,2,3..]|[0.1,0.2,0.3..]
...
df_two:
columnA|columnB
---------------
1|"item1"
2|"item2"
3|"item3"
...
Is there a way to join these dataframes and select the columns like so:
column1 | newColumn| column3
----------------------------------------------------
"thing1"|["item1","item2","item3"..]|[0.1,0.2,0.3..]
"thing2"|["item1","item2","item3"..]|[0.1,0.2,0.3..]
"thing3"|["item1","item2","item3"..]|[0.1,0.2,0.3..]
...
As mentioned in the comments, explode on column2 and then join on columnA is the good way to go. Yet, when you group back the data, I am not sure that the order would always be preserved.
To be sure, and avoid costly UDF in python, you could use posexplode to keep track of the position of each element, and then an ordered window function to build the list back:
df_one = spark.createDataFrame([("thing1", [1, 2, 3], "X"), ("thing2", [1, 2, 3], "Y"), ("thing3", [1, 2, 3], "Z")],
["column1", "column2", "column3"])
df_two = spark.createDataFrame([(1, "item1"), (2, "item2"), (3, "item3")],
["columnA", "columnB"])
w = Window.partitionBy("column1").orderBy("pos")
df_one\
.select("*", f.posexplode("column2").alias("pos", "columnA"))\
.join(df_two, ['columnA'])\
.withColumn("newColumn", f.collect_list("columnB").over(w))\
.where(f.col("pos")+1 == f.size(f.col("column2")))\
.select("column1", "newColumn", "column3")\
.show(truncate=False)
+-------+---------------------+-------+
|column1|newColumn |column3|
+-------+---------------------+-------+
|thing1 |[item1, item2, item3]|X |
|thing2 |[item1, item2, item3]|Y |
|thing3 |[item1, item2, item3]|Z |
+-------+---------------------+-------+
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With