
Change the datatype of a field inside an ArrayType column in PySpark

I want to change the datatype of the field "value", which is inside the ArrayType column "readings", from integer to string. Each element of "readings" is a struct with two fields, "key" and "value".

root
 |-- name: string (nullable = true)
 |-- languagesAtSchool: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- languagesAtSchool1: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- _id: integer (nullable = true)
 |-- languagesAtWork: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- currentState: string (nullable = true)
 |-- previousState: double (nullable = true)
 |-- readings: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- value: integer (nullable = true)
 |    |    |-- key: string (nullable = true)

The expected schema is:

   root
     |-- name: string (nullable = true)
     |-- languagesAtSchool: array (nullable = true)
     |    |-- element: string (containsNull = true)
     |-- languagesAtSchool1: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- _id: integer (nullable = true)
     |-- languagesAtWork: array (nullable = true)
     |    |-- element: string (containsNull = true)
     |-- currentState: string (nullable = true)
     |-- previousState: double (nullable = true)
     |-- readings: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- value: string (nullable = true)
     |    |    |-- key: string (nullable = true)
Afzal Abdul Azeez asked Oct 17 '25 07:10



1 Answer

Use the higher-order function transform to rewrite each element of the array. The snippets below assume these imports:

from pyspark.sql import functions as F
from pyspark.sql.functions import expr

Option 1: suitable when you want to drop some fields, because you name only the fields you need in the struct. Uses a SQL expression.

df1 = df.withColumn('readings', expr('transform(readings, x -> struct(cast(x.value as string) value, x.key))'))

or

Option 2: no field names are typed in the struct; also a SQL expression. Note that this keeps the original struct as a nested field named x and appends the cast value as a second field with an auto-generated name, so the result does not match the expected schema exactly.

df1 = df.withColumn('readings', expr('transform(readings, x -> struct(x, cast(x.value as string)))'))

Option 3: suitable when you don't want to list the struct fields at all. Does not use a SQL expression and requires Spark 3.1 or later.

df1 = df.withColumn('readings', F.transform('readings', lambda x: x.withField('value', x['value'].cast('string'))))

Schema after Option 1 (df1.printSchema()):

root
 |-- name: string (nullable = true)
 |-- languagesAtSchool: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- languagesAtSchool1: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- _id: integer (nullable = true)
 |-- languagesAtWork: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- currentState: string (nullable = true)
 |-- previousState: double (nullable = true)
 |-- readings: array (nullable = true)
 |    |-- element: struct (containsNull = false)
 |    |    |-- value: string (nullable = true)
 |    |    |-- key: string (nullable = true)
wwnde answered Oct 19 '25 21:10



