Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to identify columns based on datatype and convert them in pyspark?

I have a dataframe like as shown below

df = pd.DataFrame({
'date':['11/12/2001','11/12/2002','11/12/2003','11/12/2004','11/12/2005','11/12/2006'],
'readings' : ['READ_1','READ_2','READ_1','READ_3','READ_4','READ_5'],
 'val_date' :['21/12/2001','22/12/2002','23/12/2003','24/12/2004','25/12/2005','26/12/2006'],
 })
spark_df = spark.createDataFrame(df)
spark_df = spark_df.withColumn("date", spark_df["date"].cast(TimestampType()))
spark_df = spark_df.withColumn("val_date", spark_df["val_date"].cast(TimestampType()))

enter image description here

I have a dataframe with column datatypes as shown above

What I would like to do is identify

a) columns that contains the terms date,time in their names and convert their datatype from Timestamp/Datetime to string

and

b) identify columns based on Timestamp or Datetime datatypes and convert them to string type

Though the below works, this isn't elegant and efficient. I have more than 3k columns and cannot do this line by line

spark_df = spark_df.withColumn("date", spark_df["date"].cast(StringType()))
spark_df = spark_df.withColumn("val_date", spark_df["val_date"].cast(StringType()))

I also tried below but doesn't help

selected = [c.cast(StringType()) for c in spark_df.columns if ('date') in c]+['time']
spark_df.select(selected)

Is there anyway to identify columns based on condition a and b given above and convert them all at once?

Your inputs to solve this problem with at least one of the approach would be helpful

like image 701
The Great Avatar asked Sep 08 '25 10:09

The Great


1 Answers

You can do the following:

from pyspark.sql.functions import col

schema = {col: col_type for col, col_type in df.dtypes}
time_cols = [col for col, col_type in schema.items() if col_type in "timestamp date".split() or "date" in col or "time" in col]

for column in time_cols:
    df = df.withColumn(column, col(column).cast("string"))
like image 109
pissall Avatar answered Sep 10 '25 04:09

pissall