Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how to extract the column name and data type from nested struct type in spark

How to extract the column name and data type from nested struct type in spark

schema getting like this:

(events,StructType(
   StructField(beaconType,StringType,true),     
   StructField(beaconVersion,StringType,true), 
   StructField(client,StringType,true), 
   StructField(data,StructType(
      StructField(ad,StructType(
         StructField(adId,StringType,true)
      )
   )
)

I want to convert into below format

Array[(String, String)] = Array(
  (client,StringType), 
  (beaconType,StringType), 
  (beaconVersion,StringType), 
  (phase,StringType)

could you please help on this

like image 398
mahipal Avatar asked Sep 06 '25 23:09

mahipal


1 Answers

Question is somewhat unclear, but if you're looking for a way to "flatten" a DataFrame schema (i.e. get an array of all non-struct fields), here's one:

def flatten(schema: StructType): Array[StructField] = schema.fields.flatMap { f =>
  f.dataType match {
    case struct: StructType => flatten(struct)
    case _ => Array(f)
  }
}

For example:

val schema = StructType(Seq(StructField("events", 
  StructType(Seq(
    StructField("beaconVersion", IntegerType, true),
    StructField("client", StringType, true),
    StructField("data", StructType(Seq(
      StructField("ad", StructType(Seq(
        StructField("adId", StringType, true)
      )))
    )))
  )))
))

println(flatten(schema).toList)
// List(StructField(beaconVersion,IntegerType,true), StructField(client,StringType,true), StructField(adId,StringType,true))
like image 131
Tzach Zohar Avatar answered Sep 09 '25 05:09

Tzach Zohar