I have a pyspark DataFrame like:
+------------------------+
|ids                     |
+------------------------+
|[101826, 101827, 101576]|
+------------------------+
and I want to explode this DataFrame like:
+------+----------------+
|id    |ids             |
+------+----------------+
|101826|[101827, 101576]|
|101827|[101826, 101576]|
|101576|[101826, 101827]|
+------+----------------+
How can I do this with a pyspark UDF or some other method?
The easiest way is to explode ids into a new column id, then use array_except to remove each row's own id from the ids array. Code below.

from pyspark.sql.functions import array, array_except, col, explode

(
    df1.withColumn('id', explode('ids'))
       .withColumn('ids', array_except(col('ids'), array('id')))
).show(truncate=False)
+------+----------------+
|id |ids |
+------+----------------+
|101826|[101827, 101576]|
|101827|[101826, 101576]|
|101576|[101826, 101827]|
+------+----------------+
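If you want to sanity-check the pairing without a Spark session, the same explode-and-except logic can be sketched in plain Python (the variable names here are just illustrative):

```python
ids = [101826, 101827, 101576]

# For each element, pair it with the remaining entries -- the
# plain-Python equivalent of explode followed by array_except.
rows = [(x, [y for y in ids if y != x]) for x in ids]

for id_, rest in rows:
    print(id_, rest)
```

Like array_except, the list comprehension drops every occurrence of the exploded value, so duplicate ids would be removed entirely from the remainder.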