
Remove last character from string

I am trying to create a new DataFrame column (b) by removing the last character from column (a). Column a holds strings of different lengths, so I am trying the following code:

from pyspark.sql.functions import *
df.select(substring('a', 1, length('a') -1 ) ).show()

I get a TypeError: 'Column' object is not callable.

It seems to be caused by combining multiple functions, but I can't understand why, since each of them works on its own.

If I hardcode the length, this works:

df.select(substring('a', 1, 10 ) ).show()

Or, if I use length on its own, it works:

df.select(length('a') ).show()

Why can I not combine these functions? Is there an easier way to remove the last character from every row in a column?

asked by David, Oct 24 '25 22:10


1 Answer

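Using Column.substr, which accepts Columns for both arguments. Both arguments must be the same type, so the start position is wrapped in lit(); positions are 1-based:

from pyspark.sql.functions import col, length, lit
df.select(col('a').substr(lit(1), length(col('a')) - 1))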
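To see why this arithmetic is safe for every row, here is a minimal pure-Python model of how Spark SQL's substring positions behave. The helpers `sql_substring` and `drop_last_char` are hypothetical and run without Spark; the model assumes the documented SQL semantics: positions are 1-based, a start position of 0 is treated like 1 (so lit(0) and lit(1) give the same result here), a negative start counts back from the end, and a non-positive length yields an empty string rather than an error.

```python
def sql_substring(s: str, pos: int, length: int) -> str:
    """Hypothetical pure-Python model of Spark SQL substring(str, pos, len).

    Positions are 1-based; pos 0 is treated like pos 1 and a negative pos
    counts back from the end of the string.
    """
    n = len(s)
    # Convert the 1-based SQL position to a 0-based start index.
    if pos > 0:
        start = pos - 1
    elif pos < 0:
        start = n + pos
    else:
        start = 0
    end = start + length
    start = max(start, 0)
    # An empty or negative span returns "" instead of raising.
    if start >= end:
        return ""
    return s[start:min(end, n)]


def drop_last_char(s: str) -> str:
    # Same arithmetic as col('a').substr(lit(1), length(col('a')) - 1).
    return sql_substring(s, 1, len(s) - 1)


for value in ["hello", "a", ""]:
    print(repr(value), "->", repr(drop_last_char(value)))
```

For "hello", "a" and "" this yields "hell", "" and "": one-character and empty strings simply become empty, because a length of 0 or -1 produces an empty span.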

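Or using regexp_extract, where the greedy (.*) captures everything except the final character:

from pyspark.sql.functions import col, regexp_extract
df.select(regexp_extract(col('a'), '(.*).$', 1))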
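The regex behaviour can be checked without a cluster. The sketch below is a hypothetical stand-in, `regexp_extract_model`, that uses Python's re module in place of the Java regex engine Spark uses. The two engines agree on this pattern for single-line strings. The model mirrors Spark's documented rule that regexp_extract returns an empty string when the pattern or the requested group does not match.

```python
import re


def regexp_extract_model(s: str, pattern: str, idx: int) -> str:
    """Hypothetical approximation of Spark's regexp_extract for plain strings.

    Searches for the first match and returns group idx, or "" when the
    pattern (or that group) does not match.
    """
    match = re.search(pattern, s)
    if match is None or match.group(idx) is None:
        return ""
    return match.group(idx)


# Greedy (.*) takes as much as it can while still leaving exactly one
# character for the final '.', so group 1 is the string minus its last character.
for value in ["hello", "a", ""]:
    print(repr(value), "->", repr(regexp_extract_model(value, r"(.*).$", 1)))
```

For "hello", "a" and "" this yields "hell", "" and "". Values containing line breaks behave differently, because '.' does not match a newline in either engine.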

The function substring does not work here because its pos and len parameters must be integers, not Columns: http://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=substring#pyspark.sql.functions.substring

answered by ollik1, Oct 27 '25 14:10


