I am trying to split a DataFrame column in PySpark. This is the data I have:
from pyspark.sql.functions import split
df = sc.parallelize([[1, 'Foo|10'], [2, 'Bar|11'], [3,'Car|12']]).toDF(['Key', 'Value'])
df = df.withColumn('Splitted', split(df['Value'], '|')[0])
I got
+---+------+--------+
|Key| Value|Splitted|
+---+------+--------+
|  1|Foo|10|       F|
|  2|Bar|11|       B|
|  3|Car|12|       C|
+---+------+--------+
But I want
+---+-----+--------+
|Key|Value|Splitted|
+---+-----+--------+
|  1|   10|     Foo|
|  2|   11|     Bar|
|  3|   12|     Car|
+---+-----+--------+
Can anyone please point out what I am doing wrong?
Also, what if I have a situation like this?
df = sc.parallelize([[1, 'Foo|10|we'], [2, 'Bar|11|we'], [3,'Car|12|we']]).toDF(['Key', 'Value'])
+---+---------+
|Key|    Value|
+---+---------+
|  1|Foo|10|we|
|  2|Bar|11|we|
|  3|Car|12|we|
+---+---------+
PySpark SQL provides the split() function to convert a delimiter-separated string column into an array column (StringType to ArrayType) on a DataFrame. It splits the string on a delimiter such as a space, comma, or pipe; note that the delimiter argument is interpreted as a regular expression, not as a literal string.
To turn the elements of an array column into rows, PySpark provides the explode() function, which produces a new row for each element in the array.
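You forgot to escape the delimiter. split() treats its pattern as a regular expression, and in a regular expression | means alternation, so the pipe has to be escaped (a raw string keeps Python from touching the backslash):
df = df.withColumn('Splitted', split(df['Value'], r'\|')[0])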
If you want the output as
+---+-----+--------+
|Key|Value|Splitted|
+---+-----+--------+
|1  |10   |Foo     |
|2  |11   |Bar     |
|3  |12   |Car     |
+---+-----+--------+
you can split once into an array and then pick the elements by index:
from pyspark.sql import functions as F
df = (df.withColumn('Splitted', F.split(df['Value'], r'\|'))
        .withColumn('Value', F.col('Splitted')[1])
        .withColumn('Splitted', F.col('Splitted')[0]))