
SQL Server through JDBC in PySpark

os.environ.get("PYSPARK_SUBMIT_ARGS", "--master yarn-client --conf spark.yarn.executor.memoryOverhead=6144 \
        --executor-memory 1G –jars  /mssql/jre8/sqljdbc42.jar --driver-class-path  /mssql/jre8/sqljdbc42.jar")

source_df = sqlContext.read.format('jdbc').options(
          url='dbc:sqlserver://xxxx.xxxxx.com',
          database = "mydbname",
          dbtable=mytable,
          user=username,
          password=pwd,
          driver='com.microsoft.jdbc.sqlserver.SQLServerDriver'
          ).load()

I am trying to load a SQL Server table into a DataFrame using the Spark SQL context.

But I am running into the following error:

Py4JJavaError: An error occurred while calling o59.load.
: java.lang.ClassNotFoundException: com.microsoft.sqlserver.jdbc.SQLServerDriver
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

I have the jar file in that location. Is it the correct jar file? Is there a problem with the code?

I am not sure what the problem is.

Scala error

scala> classOf[com.microsoft.sqlserver.jdbc.SQLServerDriver]
<console>:27: error: object sqlserver is not a member of package com.microsoft
              classOf[com.microsoft.sqlserver.jdbc.SQLServerDriver]


scala> classOf[com.microsoft.jdbc.sqlserver.SQLServerDriver]
<console>:27: error: object jdbc is not a member of package com.microsoft
              classOf[com.microsoft.jdbc.sqlserver.SQLServerDriver]

asked Mar 24 '26 10:03 by Tronald Dump


1 Answer

The configuration is similar to a Spark-Oracle configuration. Here is my Spark-SQL Server configuration:

from pyspark.sql import SparkSession
spark = SparkSession\
    .builder\
    .master('local[*]')\
    .appName('Connection-Test')\
    .config('spark.driver.extraClassPath', '/your/jar/folder/sqljdbc42.jar')\
    .config('spark.executor.extraClassPath', '/your/jar/folder/sqljdbc42.jar')\
    .getOrCreate()


sqlsUrl = 'jdbc:sqlserver://your.sql.server.ip:1433;database=YourSQLDB'

qryStr = """ (
    SELECT *
    FROM yourtable
    ) t """

spark.read.format('jdbc')\
    .option('url',sqlsUrl)\
    .option('driver', 'com.microsoft.sqlserver.jdbc.SQLServerDriver')\
    .option('dbtable', qryStr )\
    .option("user", "yourID") \
    .option("password", "yourPasswd") \
    .load().show()

  1. Set the location of the jar file you downloaded, e.g. "/your/jar/folder/sqljdbc42.jar". The jar can be downloaded from https://www.microsoft.com/en-us/download/details.aspx?id=54671 (search for sqljdbc42.jar if the link does not work).
  2. Set the correct JDBC URL, e.g. 'jdbc:sqlserver://your.sql.server.ip:1433;database=YourSQLDB' (change the port number if your server uses a different one). Note that the URL in the question starts with 'dbc:' instead of 'jdbc:'.
  3. Set the correct driver name: .option('driver', 'com.microsoft.sqlserver.jdbc.SQLServerDriver'). The question uses 'com.microsoft.jdbc.sqlserver.SQLServerDriver', which has the 'sqlserver' and 'jdbc' package segments swapped.
  4. Enjoy.
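
If you want to check the jar and the options before starting Spark, something like the following may help. It is only a rough sketch: jar_contains_driver and sqlserver_jdbc_options are hypothetical helper names, not part of PySpark or the Microsoft driver. It assumes a jar is an ordinary zip archive and that the driver class is stored inside it under its package path.

```python
import zipfile

# Driver class name used in the answer above.
DRIVER_CLASS = "com.microsoft.sqlserver.jdbc.SQLServerDriver"


def jar_contains_driver(jar_path, driver_class=DRIVER_CLASS):
    """Return True if the jar (a zip archive) holds the driver's .class file."""
    # com.microsoft.sqlserver.jdbc.SQLServerDriver
    #   -> com/microsoft/sqlserver/jdbc/SQLServerDriver.class
    entry = driver_class.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


def sqlserver_jdbc_options(host, database, table, user, password, port=1433):
    """Build the option dict for spark.read.format('jdbc').options(**opts)."""
    return {
        # The URL must start with 'jdbc:' (not 'dbc:').
        "url": f"jdbc:sqlserver://{host}:{port};database={database}",
        "driver": DRIVER_CLASS,
        "dbtable": table,
        "user": user,
        "password": password,
    }


# Hypothetical usage, assuming `spark` is the session built above:
# opts = sqlserver_jdbc_options("your.sql.server.ip", "YourSQLDB",
#                               "yourtable", "yourID", "yourPasswd")
# spark.read.format("jdbc").options(**opts).load().show()
```

If jar_contains_driver returns False for your jar path, the ClassNotFoundException is probably coming from the jar itself (wrong file or wrong path) rather than from the Spark options.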

answered Mar 26 '26 23:03 by kennyut


