I would like to add a string to an existing column. For example, df['col1'] has values such as '1', '2', '3', and I would like to concatenate the string '000' on the left of col1 so that I get a column (new, or replacing the old one, it doesn't matter) with values '0001', '0002', '0003'.
I thought I should use df.withColumn('col1', '000'+df['col1']), but of course it does not work. Is that because PySpark DataFrames are immutable?
This should be an easy task, but I didn't find anything online. I hope someone can give me some help!
Thank you!
PySpark: concatenate using concat(). select() is a transformation in PySpark that returns a new DataFrame with the selected columns. With the concat() function from PySpark SQL, you can concatenate several string columns (for example firstname, middlename, lastname) into a single string column (for example FullName).
In PySpark, to add a new column with a constant value to a DataFrame, use the lit() function, imported with from pyspark.sql.functions import lit. lit() takes the constant value you want to add and returns a Column. To add a NULL / None value, use lit(None).
add_prefix() adds a prefix string to labels in the pandas API on Spark; it does not change the values stored in a column. Called on a DataFrame, it prefixes every column name. Called on a single column (a Series), it prefixes the row labels instead.
To convert an array column to a string, PySpark SQL provides the built-in function concat_ws(), which takes a delimiter of your choice as its first argument and the array column (type Column) as its second argument. To use concat_ws(), import it from pyspark.sql.functions.
    from pyspark.sql.functions import concat, col, lit

    df.select(concat(col("firstname"), lit(" "), col("lastname"))).show(5)
    +------------------------------+
    |concat(firstname,  , lastname)|
    +------------------------------+
    |                Emanuel Panton|
    |              Eloisa Cayouette|
    |                   Cathi Prins|
    |             Mitchel Mozdzierz|
    |               Angla Hartzheim|
    +------------------------------+
    only showing top 5 rows

http://spark.apache.org/docs/2.0.0/api/python/pyspark.sql.html#module-pyspark.sql.functions