How to yield one array element and keep other elements in pyspark DataFrame?

Question

I have a pyspark DataFrame like:

+------------------------+
|                     ids|
+------------------------+
|[101826, 101827, 101576]|
+------------------------+

and I want to explode this DataFrame into:

+------+----------------+
|    id|             ids|
+------+----------------+
|101826|[101827, 101576]|
|101827|[101826, 101576]|
|101576|[101826, 101827]|
+------+----------------+

How can I do this using a PySpark UDF or another method?

wwnde · Accepted Answer

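The easiest way is to copy the array column id into a new column ids, explode id so that each element gets its own row, and then use array_except to remove that row's id from ids. Code below.

from pyspark.sql.functions import array, array_except, col, explode

# df1 is assumed to hold the array in a column named 'id'. If your array column
# is already named 'ids', skip the first step and explode 'ids' into 'id'.
(
    df1.withColumn('ids', col('id'))    # keep a full copy of the array
    .withColumn('id', explode('id'))    # one row per array element
    .withColumn('ids', array_except(col('ids'), array('id')))    # drop this row's id
).show(truncate=False)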

+------+----------------+
|id    |ids             |
+------+----------------+
|101826|[101827, 101576]|
|101827|[101826, 101576]|
|101576|[101826, 101827]|
+------+----------------+
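
One caveat: array_except behaves like a set difference, so it removes every occurrence of the value and also drops duplicates from the result. If the array can contain repeated IDs, removing by position may be closer to what is wanted. Since the question also asks about a UDF, here is a minimal sketch of the per-row logic in plain Python. The function name leave_one_out is hypothetical; it is not part of the answer above or of PySpark.

```python
from typing import List, Tuple


def leave_one_out(ids: List[int]) -> List[Tuple[int, List[int]]]:
    """Pair each element with the remaining elements, removing by position."""
    # Slicing around index i keeps duplicates and preserves the original order.
    return [(value, ids[:i] + ids[i + 1:]) for i, value in enumerate(ids)]


# Plain-Python check of the logic, using the IDs from the question.
for id_, rest in leave_one_out([101826, 101827, 101576]):
    print(id_, rest)
```

To apply this to a DataFrame, one option (assuming the array column is named ids and holds longs) would be to wrap the function with udf(leave_one_out, 'array<struct<id:bigint,ids:array<bigint>>>'), explode the returned array, and select the struct's id and ids fields. The built-in functions in the answer avoid Python UDF overhead, so this route is mainly worth considering when duplicates matter.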

Tags: python, apache-spark-sql, pyspark

littlely
