How to extract the column name and data type from a nested struct type in Spark

Question

How can I extract the column names and data types from a nested struct type in Spark?

I am getting a schema like this:

(events,StructType(
   StructField(beaconType,StringType,true),
   StructField(beaconVersion,StringType,true),
   StructField(client,StringType,true),
   StructField(data,StructType(
      StructField(ad,StructType(
         StructField(adId,StringType,true)
      ))
   ))
))

I want to convert it into the format below:

Array[(String, String)] = Array(
  (client,StringType), 
  (beaconType,StringType), 
  (beaconVersion,StringType), 
  (phase,StringType)
)

Could you please help with this?

Tzach Zohar · Accepted Answer

The question is somewhat unclear, but if you're looking for a way to "flatten" a DataFrame schema (i.e. get an array of all non-struct fields), here's one:

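import org.apache.spark.sql.types._

// Recursively collect every leaf (non-struct) field, descending into nested structs.
def flatten(schema: StructType): Array[StructField] = schema.fields.flatMap { f =>
  f.dataType match {
    case struct: StructType => flatten(struct) // nested struct: recurse into its fields
    case _ => Array(f)                         // leaf field: keep it as-is
  }
}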

For example:

val schema = StructType(Seq(StructField("events", 
  StructType(Seq(
    StructField("beaconVersion", IntegerType, true),
    StructField("client", StringType, true),
    StructField("data", StructType(Seq(
      StructField("ad", StructType(Seq(
        StructField("adId", StringType, true)
      )))
    )))
  )))
))

println(flatten(schema).toList)
// List(StructField(beaconVersion,IntegerType,true), StructField(client,StringType,true), StructField(adId,StringType,true))
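
To get from the flattened fields to the Array[(String, String)] shape asked for in the question, one option is to map each leaf field to its name and the string form of its data type. The following is a minimal sketch, not part of the original answer: nameAndType and flattenWithPath are hypothetical helper names introduced here for illustration, and flatten is repeated so the snippet compiles on its own. It assumes Spark SQL is on the classpath (for example, inside spark-shell).

```scala
import org.apache.spark.sql.types._

// Same recursive flatten as in the answer, repeated so this snippet is self-contained.
def flatten(schema: StructType): Array[StructField] = schema.fields.flatMap { f =>
  f.dataType match {
    case struct: StructType => flatten(struct)
    case _ => Array(f)
  }
}

// Hypothetical helper: (leaf field name, data type rendered as a string) pairs.
def nameAndType(schema: StructType): Array[(String, String)] =
  flatten(schema).map(f => (f.name, f.dataType.toString))

// Hypothetical variant: keeps the full dotted path of each leaf, which avoids
// ambiguity when two nested structs contain fields with the same name.
def flattenWithPath(schema: StructType, prefix: String = ""): Array[(String, String)] =
  schema.fields.flatMap { f =>
    val path = if (prefix.isEmpty) f.name else s"$prefix.${f.name}"
    f.dataType match {
      case struct: StructType => flattenWithPath(struct, path)
      case other => Array((path, other.toString))
    }
  }
```

For a DataFrame df, the schema to pass in would be df.schema. Both helpers only descend into StructType fields, so a struct nested inside an ArrayType or MapType is reported as a single leaf rather than expanded. If the short SQL-style names are preferred over the class-style ones, f.dataType.typeName or f.dataType.simpleString could be used in place of toString.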

Tags:

scala

apache-spark

mahipal
