How to yield one array element and keep other elements in pyspark DataFrame?

I have a pyspark DataFrame like:

+------------------------+
|                     ids|
+------------------------+
|[101826, 101827, 101576]|
+------------------------+

and I want to explode this DataFrame into:

+------+----------------+
|    id|             ids|
+------+----------------+
|101826|[101827, 101576]|
|101827|[101826, 101576]|
|101576|[101826, 101827]|
+------+----------------+

How can I do this using a PySpark UDF or some other method?

littlely asked Sep 20 '25 12:09


1 Answer

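The easiest way is to copy the array column into ids, explode id so that each element gets its own row, and then use array_except to remove that row's id from ids. The code below assumes the array column in df1 is named id; if it is already named ids, as in the question, skip the first withColumn and explode ids into a new id column instead.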

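from pyspark.sql.functions import array, array_except, col, explode

(
    # keep a copy of the full array in ids
    df1.withColumn('ids', col('id'))
    # one row per array element
    .withColumn('id', explode('id'))
    # drop the current element from the copied array
    .withColumn('ids', array_except(col('ids'), array('id')))
).show(truncate=False)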

+------+----------------+
|id    |ids             |
+------+----------------+
|101826|[101827, 101576]|
|101827|[101826, 101576]|
|101576|[101826, 101827]|
+------+----------------+
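
One caveat worth noting: array_except returns its result without duplicates and drops every element equal to the exploded value, so an input such as [1, 1, 2] would not keep the second 1. If repeated ids matter, a position-based UDF is one possible alternative. The following is only a minimal sketch and not part of the approach above; the names split_each and explode_with_rest are hypothetical, and it assumes the array column is called ids and holds long integers, as in the question.

```python
def split_each(ids):
    """Pair every element with the remaining elements, removing it by position."""
    if ids is None:
        return []
    return [(x, ids[:i] + ids[i + 1:]) for i, x in enumerate(ids)]


def explode_with_rest(df, column="ids"):
    """Return a DataFrame with columns id and ids (sketch, assumes array<long>)."""
    # Imported here so that split_each can be tried out without Spark installed.
    from pyspark.sql.functions import col, explode, udf
    from pyspark.sql.types import ArrayType, LongType, StructField, StructType

    # Each UDF result element is a struct: (id, remaining ids).
    pair = StructType([
        StructField("id", LongType()),
        StructField("ids", ArrayType(LongType())),
    ])
    split_udf = udf(split_each, ArrayType(pair))
    return (
        df.withColumn("pair", explode(split_udf(col(column))))
        .select(col("pair.id").alias("id"), col("pair.ids").alias("ids"))
    )
```

Calling explode_with_rest(df).show(truncate=False) on the question's DataFrame should give the same three rows as above. Python UDFs are generally slower than built-in functions, so the array_except version is the better choice when the ids in each array are unique.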

wwnde answered Sep 22 '25 05:09