How to extract column names and data types from a nested struct type in Spark
The schema I'm getting looks like this:
(events,StructType(
  StructField(beaconType,StringType,true),
  StructField(beaconVersion,StringType,true),
  StructField(client,StringType,true),
  StructField(data,StructType(
    StructField(ad,StructType(
      StructField(adId,StringType,true)
    ))
  ))
))
I want to convert it into the format below:
Array[(String, String)] = Array(
  (client,StringType),
  (beaconType,StringType),
  (beaconVersion,StringType),
  (phase,StringType)
)
Could you please help with this?
The question is somewhat unclear, but if you're looking for a way to "flatten" a DataFrame schema (i.e. get an array of all the non-struct leaf fields), here's one way:
import org.apache.spark.sql.types._

// Recursively collect every leaf (non-struct) field in the schema
def flatten(schema: StructType): Array[StructField] = schema.fields.flatMap { f =>
  f.dataType match {
    case struct: StructType => flatten(struct) // descend into nested structs
    case _                  => Array(f)        // keep leaf fields as-is
  }
}
For example:
val schema = StructType(Seq(StructField("events",
  StructType(Seq(
    StructField("beaconVersion", IntegerType, true),
    StructField("client", StringType, true),
    StructField("data", StructType(Seq(
      StructField("ad", StructType(Seq(
        StructField("adId", StringType, true)
      )))
    )))
  )))
))
println(flatten(schema).toList)
// List(StructField(beaconVersion,IntegerType,true), StructField(client,StringType,true), StructField(adId,StringType,true))
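If you specifically need the Array[(String, String)] shape shown in the question, you can map each flattened field to a (name, type) pair. A minimal sketch building on the flatten function above (the pairs name is just illustrative; DataType's toString renders the type the same way it appears in your expected output):

val pairs: Array[(String, String)] =
  flatten(schema).map(f => (f.name, f.dataType.toString))
println(pairs.toList)
// List((beaconVersion,IntegerType), (client,StringType), (adId,StringType))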