I'm using Apache Spark to read data from a MySQL database on AWS RDS.
It also infers the schema from the database. Unfortunately, one of the table's columns, active, is of type TINYINT(1), and I need to read its values back exactly as they are stored.
Spark maps TINYINT(1) to BooleanType, so every value in active is converted to true or false and I can no longer tell the original values apart.
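Here is a simplified version of my read call (endpoint, database, table, and credentials are placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("rds-read").getOrCreate()

val df = spark.read.format("jdbc")
  .option("url", "jdbc:mysql://my-rds-host:3306/mydb")  // placeholder endpoint/database
  .option("dbtable", "my_table")                        // placeholder table name
  .option("user", "user")
  .option("password", "password")
  .load()

df.printSchema()
// root
//  |-- active: boolean (nullable = true)   <- TINYINT(1) inferred as BooleanType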
Is it possible to force a schema definition when loading tables into Spark?
It's not Spark that converts TINYINT(1) into a boolean but the Connector/J JDBC driver used under the hood.
So you don't actually need to specify a schema for this. What causes it is the JDBC driver, which treats the TINYINT(1) datatype as the BIT type (because the MySQL server silently converts BIT to TINYINT(1) when creating tables).
You can check all the tips and gotchas of the JDBC driver in MySQL's official Connector/J Configuration Properties guide.
You just need to pass the right parameter to the JDBC driver by appending it to your connection URL:
val newUrl = s"$oldUrl&tinyInt1isBit=false"

val data = spark.read.format("jdbc")
  .option("url", newUrl)
  // your other jdbc options (dbtable, user, password, ...)
  .load()
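For example, a complete read might look like this (endpoint, database, table, and credentials are placeholders); with tinyInt1isBit=false the active column typically comes back as a numeric type instead of a boolean:

val url = "jdbc:mysql://my-rds-host:3306/mydb?tinyInt1isBit=false"  // placeholder endpoint/database

val data = spark.read.format("jdbc")
  .option("url", url)
  .option("dbtable", "my_table")   // placeholder table name
  .option("user", "user")
  .option("password", "password")
  .load()

data.printSchema()
// root
//  |-- active: integer (nullable = true)   <- no longer collapsed to true/false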