I am trying to read json data in spark streaming job.
By default sqlContext.read.json(rdd) is converting all map types to struct types.
|-- legal_name: struct (nullable = true)
 |    |-- first_name: string (nullable = true)
 |    |-- last_name: string (nullable = true)
 |    |-- middle_name: string (nullable = true)
But when i read from hive table using sqlContext
val a = sqlContext.sql("select * from student_record")
below is the schema.
|-- leagalname: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)
Is there any way we can read data using read.json(rdd) and get Map data type?
Is there any option like 
 spark.sql.schema.convertStructToMap?
Any help is appreciated.
You need to explicitly define your schema, when calling read.json.
You can read about the details in Programmatically specifying the schema in the Spark SQL Documentation.
For example in your specific case it would be
import org.apache.spark.sql.types._
val schema = StructType(List(StructField("legal_name",MapType(StringType,StringType,true))))
That would be one column legal_name being a map.
When you have defined you schema you can call
sqlContext.read.json(rdd, schema) to create your data frame from your JSON dataset with the desired schema.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With