I have the following DataFrame:
|-- id: long
|-- user: struct
|    |-- name: string
|    |-- date: string
When I run the following code:
from pyspark.sql import functions as F

df = df.select(
    F.col("id"),
    F.col("user.name"),
    F.col("user.date").cast("date")
)
I get the following schema:
|-- id: long
|-- user.name: string
|-- user.date: date
How can I cast the date without losing the struct? My desired output is:
|-- id: long
|-- user: struct
|    |-- name: string
|    |-- date: date
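Try Column.withField (available in Spark 3.1 and later). It replaces a single field inside a struct and leaves the other fields untouched.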
Example:
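from pyspark.sql.functions import col

# Assumes an active SparkSession named spark (as in the pyspark shell).
ds = "{'id':1, 'user':{'name':'abc','date':'2022-02-12'}}"
df = spark.read.json(spark.sparkContext.parallelize([ds]), multiLine=True)
# Replace only the date field inside the user struct; name is left as-is.
df1 = df.withColumn("user", col("user").withField("date", col("user.date").cast("date")))
df1.show(10, False)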
#+---+-----------------+
#|id |user             |
#+---+-----------------+
#|1  |{2022-02-12, abc}|
#+---+-----------------+
The question "withField in Spark SQL" gives an example of how to do this with the DataFrame DSL, and also points out that withField is not available among the standard SQL functions.