 

Adding JDBC driver to Spark on EMR

I'm trying to add a JDBC driver to a Spark cluster that is running on top of Amazon EMR, but I keep getting the following exception:

java.sql.SQLException: No suitable driver found

I tried the following things:

  1. Using addJar to add the driver JAR explicitly from the code.
  2. Using the spark.executor.extraClassPath and spark.driver.extraClassPath parameters (roughly as sketched below).
  3. Using spark.driver.userClassPathFirst=true. With this option I get a different error, caused by a mix of dependencies with Spark; in any case, the option seems too aggressive if all I want is to add a single JAR.
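
For concreteness, here is a minimal sketch of what attempts 1 and 2 look like on the spark-submit side. The JAR path, application class, and application JAR are placeholders for the masked values:

spark-submit \
  --class com.example.MyApp \
  --master yarn-cluster \
  --conf spark.driver.extraClassPath=/home/hadoop/jars/jdbc-driver.jar \
  --conf spark.executor.extraClassPath=/home/hadoop/jars/jdbc-driver.jar \
  my-app.jar /home/hadoop/jars/jdbc-driver.jar

The last argument is picked up as args(0) and passed to sc.addJar in the code below.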

Could you please help me with this? How can I introduce the driver to the Spark cluster easily?

Thanks,

David

Source code of the application

import java.util.Properties

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.sql.SQLContext

// JDBC connection properties (values masked)
val properties = new Properties()
properties.put("ssl", "***")
properties.put("user", "***")
properties.put("password", "***")
properties.put("account", "***")
properties.put("db", "***")
properties.put("schema", "***")
properties.put("driver", "***")

val conf = new SparkConf().setAppName("***")
      .setMaster("yarn-cluster")
      .setJars(JavaSparkContext.jarOfClass(this.getClass()))

val sc = new SparkContext(conf)
// args(0) is the path to the JDBC driver JAR, added explicitly at runtime
sc.addJar(args(0))
val sqlContext = new SQLContext(sc)

// connectStr holds the (masked) JDBC connection URL
var df = sqlContext.read.jdbc(connectStr, "***", properties = properties)
df = df.select( Constants.***,
                Constants.***,
                Constants.***,
                Constants.***,
                Constants.***,
                Constants.***,
                Constants.***,
                Constants.***,
                Constants.***)
// Additional actions on df
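
A quick way to check whether the driver JAR actually made it onto the driver's classpath is to load the driver class by name before the read. This is a sketch; com.mysql.jdbc.Driver stands in for the masked driver class:

// Throws ClassNotFoundException if the JAR is not on the driver classpath.
// "com.mysql.jdbc.Driver" is a placeholder; substitute the real driver class.
Class.forName("com.mysql.jdbc.Driver")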

1 Answer

I had the same problem. What ended up working for me was to use the --driver-class-path parameter with spark-submit.

The main thing is to include the entire default Spark class path in --driver-class-path: the flag overrides the spark.driver.extraClassPath default that EMR sets in spark-defaults.conf, so that default has to be carried along explicitly.

Here are my steps:

  1. I got the default driver class path by reading the value of the spark.driver.extraClassPath property from the Spark History Server under "Environment".
  2. I copied the MySQL JAR file to each node in the EMR cluster (one way to stage it is sketched further down).
  3. I put the MySQL JAR path at the front of the --driver-class-path argument to spark-submit and appended the value of spark.driver.extraClassPath to it.

My driver class path ended up looking like this:

--driver-class-path /home/hadoop/jars/mysql-connector-java-5.1.35.jar:/etc/hadoop/conf:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*
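
Putting it together, the full spark-submit invocation looks roughly like this. The application class and JAR names are placeholders, and the class path is abbreviated to the driver JAR plus the default value shown above:

spark-submit \
  --deploy-mode cluster \
  --class com.example.MySparkJob \
  --driver-class-path /home/hadoop/jars/mysql-connector-java-5.1.35.jar:<default spark.driver.extraClassPath> \
  my-spark-job.jar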

This worked with EMR 4.1, using Java with Spark 1.5.0. I had already added the MySQL JAR as a dependency in the Maven pom.xml.
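
For step 2, one way to get the JAR onto every node is an EMR bootstrap action that pulls it from S3 at cluster startup. This is a minimal sketch; the bucket and key are placeholders:

#!/bin/bash
# Bootstrap action: runs on every node when the cluster starts.
# s3://my-bucket/jars/ is a placeholder; point it at wherever the JAR is stored.
mkdir -p /home/hadoop/jars
aws s3 cp s3://my-bucket/jars/mysql-connector-java-5.1.35.jar /home/hadoop/jars/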

You may also want to look at this answer as it seems like a cleaner solution. I haven't tried it myself.



