 

Adding JDBC driver to Spark on EMR

I'm trying to add a JDBC driver to a Spark cluster that is running on top of Amazon EMR, but I keep getting the following exception:

java.sql.SQLException: No suitable driver found

I tried the following things:

  1. Using addJar to add the driver JAR explicitly from the code.
  2. Using the spark.executor.extraClassPath and spark.driver.extraClassPath parameters (roughly as sketched below).
  3. Using spark.driver.userClassPathFirst=true. With this option I get a different error, caused by a mix of dependencies with Spark; in any case, the option seems too aggressive if all I want is to add a single JAR.
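
For concreteness, here is a minimal sketch of what attempts 1 and 2 look like on the spark-submit side. The JAR path, application class, and application JAR are placeholders for the masked values:

spark-submit \
  --class com.example.MyApp \
  --master yarn-cluster \
  --conf spark.driver.extraClassPath=/home/hadoop/jars/jdbc-driver.jar \
  --conf spark.executor.extraClassPath=/home/hadoop/jars/jdbc-driver.jar \
  my-app.jar /home/hadoop/jars/jdbc-driver.jar

The last argument is picked up as args(0) and passed to sc.addJar in the code below.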

Could you please help me with this? How can I introduce the driver to the Spark cluster easily?

Thanks,

David

Source code of the application

import java.util.Properties

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.sql.SQLContext

// JDBC connection properties (values masked)
val properties = new Properties()
properties.put("ssl", "***")
properties.put("user", "***")
properties.put("password", "***")
properties.put("account", "***")
properties.put("db", "***")
properties.put("schema", "***")
properties.put("driver", "***")

val conf = new SparkConf().setAppName("***")
      .setMaster("yarn-cluster")
      .setJars(JavaSparkContext.jarOfClass(this.getClass()))

val sc = new SparkContext(conf)
// args(0) is the path to the JDBC driver JAR, added explicitly at runtime
sc.addJar(args(0))
val sqlContext = new SQLContext(sc)

// connectStr holds the (masked) JDBC connection URL
var df = sqlContext.read.jdbc(connectStr, "***", properties = properties)
df = df.select( Constants.***,
                Constants.***,
                Constants.***,
                Constants.***,
                Constants.***,
                Constants.***,
                Constants.***,
                Constants.***,
                Constants.***)
// Additional actions on df
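
A quick way to check whether the driver JAR actually made it onto the driver's classpath is to load the driver class by name before the read. This is a sketch; com.mysql.jdbc.Driver stands in for the masked driver class:

// Throws ClassNotFoundException if the JAR is not on the driver classpath.
// "com.mysql.jdbc.Driver" is a placeholder; substitute the real driver class.
Class.forName("com.mysql.jdbc.Driver")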

1 Answer

I had the same problem. What ended up working for me was to use the --driver-class-path parameter with spark-submit.

The main thing is to include the entire default Spark class path in --driver-class-path: the flag overrides the spark.driver.extraClassPath default that EMR sets in spark-defaults.conf, so that default has to be carried along explicitly.

Here are my steps:

  1. I got the default driver class path by reading the value of the spark.driver.extraClassPath property from the Spark History Server under "Environment".
  2. I copied the MySQL JAR file to each node in the EMR cluster (one way to stage it is sketched further down).
  3. I put the MySQL JAR path at the front of the --driver-class-path argument to spark-submit and appended the value of spark.driver.extraClassPath to it.

My driver class path ended up looking like this:

--driver-class-path /home/hadoop/jars/mysql-connector-java-5.1.35.jar:/etc/hadoop/conf:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*
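
Putting it together, the full spark-submit invocation looks roughly like this. The application class and JAR names are placeholders, and the class path is abbreviated to the driver JAR plus the default value shown above:

spark-submit \
  --deploy-mode cluster \
  --class com.example.MySparkJob \
  --driver-class-path /home/hadoop/jars/mysql-connector-java-5.1.35.jar:<default spark.driver.extraClassPath> \
  my-spark-job.jar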

This worked with EMR 4.1, using Java with Spark 1.5.0. I had already added the MySQL JAR as a dependency in the Maven pom.xml.
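
For step 2, one way to get the JAR onto every node is an EMR bootstrap action that pulls it from S3 at cluster startup. This is a minimal sketch; the bucket and key are placeholders:

#!/bin/bash
# Bootstrap action: runs on every node when the cluster starts.
# s3://my-bucket/jars/ is a placeholder; point it at wherever the JAR is stored.
mkdir -p /home/hadoop/jars
aws s3 cp s3://my-bucket/jars/mysql-connector-java-5.1.35.jar /home/hadoop/jars/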

You may also want to look at this answer as it seems like a cleaner solution. I haven't tried it myself.



