I am importing data into Spark from MySQL through JDBC, and one of the columns has a time type (SQL type TIME, JDBC type java.sql.Time) with large hour values (e.g. 168:03:01). Spark converts them to a timestamp format, which causes an error when reading three-digit hours. How should I deal with the TIME type in Spark?
Probably your best shot at this moment is to cast the data before it is actually read by Spark and parse it directly in your application. The JDBC data source allows you to pass a valid subquery as the dbtable option (or the table argument). That means you can do something similar to this:
sqlContext.read.format("jdbc").options(Map(
"url" -> "xxxx",
"dbtable" -> "(SELECT some_field, CAST(time_field AS TEXT) FROM table) tmp",
))
and then use some combination of built-in functions in Spark to convert the resulting string to a type that suits your application.
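For example, a minimal sketch (assuming df is the DataFrame loaded above and the casted column is still named time_field) could split the string on ":" and compute a total number of seconds, since a value like 168:03:01 is really a duration rather than a time of day:

import org.apache.spark.sql.functions._

// Split "168:03:01" into hour, minute, and second components
val parts = split(col("time_field"), ":")

// Add a numeric column with the duration expressed in seconds
val withSeconds = df.withColumn(
  "time_seconds",
  parts.getItem(0).cast("long") * 3600 +
  parts.getItem(1).cast("long") * 60 +
  parts.getItem(2).cast("long")
)

Keeping the value as a number of seconds (or minutes) avoids the timestamp range problem entirely, and you can always format it back into an HH:mm:ss string for display.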