I have a Spark project on Dataproc in GCP, and the driver program is run via spark-submit. When it tries to connect to Azure SQL DB, it throws the exception below:
20:39:15 DOCKER: Exception in thread "main" java.lang.NoClassDefFoundError: com/microsoft/aad/adal4j/AuthenticationException
20:39:15 DOCKER: at com.microsoft.sqlserver.jdbc.SQLServerConnection.getFedAuthToken(SQLServerConnection.java:3609)
20:39:15 DOCKER: at com.microsoft.sqlserver.jdbc.SQLServerConnection.onFedAuthInfo(SQLServerConnection.java:3580)
20:39:15 DOCKER: at com.microsoft.sqlserver.jdbc.SQLServerConnection.processFedAuthInfo(SQLServerConnection.java:3548)
20:39:15 DOCKER: at com.microsoft.sqlserver.jdbc.TDSTokenHandler.onFedAuthInfo(tdsparser.java:261)
20:39:15 DOCKER: at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:103)
20:39:15 DOCKER: at com.microsoft.sqlserver.jdbc.SQLServerConnection.sendLogon(SQLServerConnection.java:4290)
20:39:15 DOCKER: at com.microsoft.sqlserver.jdbc.SQLServerConnection.logon(SQLServerConnection.java:3157)
20:39:15 DOCKER: at com.microsoft.sqlserver.jdbc.SQLServerConnection.access$100(SQLServerConnection.java:82)
20:39:15 DOCKER: at com.microsoft.sqlserver.jdbc.SQLServerConnection$LogonCommand.doExecute(SQLServerConnection.java:3121)
20:39:15 DOCKER: at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7151)
20:39:15 DOCKER: at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:2478)
20:39:15 DOCKER: at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:2026)
20:39:15 DOCKER: at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:1687)
20:39:15 DOCKER: at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:1528)
20:39:15 DOCKER: at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:866)
20:39:15 DOCKER: at com.microsoft.sqlserver.jdbc.SQLServerDataSource.getConnectionInternal(SQLServerDataSource.java:968)
20:39:15 DOCKER: at com.microsoft.sqlserver.jdbc.SQLServerDataSource.getConnection(SQLServerDataSource.java:69)
Below are the versions of the components:
The authentication is via Active Directory. The same thing works locally, but not on Dataproc. I would appreciate any help!
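For reference, the connection is set up roughly along the lines of the sketch below (a simplified example; server, database, and credentials are placeholders), which is what drives the getFedAuthToken call in the trace:

import com.microsoft.sqlserver.jdbc.SQLServerDataSource

// Simplified sketch of the connection setup; all names and credentials are placeholders.
val ds = new SQLServerDataSource()
ds.setServerName("my-server.database.windows.net")
ds.setDatabaseName("my-database")
ds.setUser("user@my-tenant.onmicrosoft.com")
ds.setPassword(sys.env.getOrElse("AZURE_SQL_PASSWORD", ""))
// Active Directory authentication makes the driver request an Azure AD token,
// and that token request is implemented on top of the adal4j library.
ds.setAuthentication("ActiveDirectoryPassword")
val connection = ds.getConnection()  // fails with the NoClassDefFoundError above when adal4j is missing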
It seems like you are using Docker. If so, you need to make sure that adal4j.jar is either included in the driver Docker container or added via the --jars flag in the Spark submit command:
gcloud dataproc jobs submit spark \
  --cluster=$CLUSTER_NAME \
  ... \
  --jars=adal4j.jar
For reference, see how to manage Java dependencies in Spark: https://cloud.google.com/dataproc/docs/guides/manage-spark-dependencies
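Alternatively, if you build a fat jar for the job, you can bundle the library with the rest of your dependencies instead of passing it through --jars. Assuming an sbt build (the versions below are only examples; align them with the mssql-jdbc version you actually use), that could look like:

// build.sbt (sketch): bundle the SQL Server JDBC driver together with adal4j,
// which provides com.microsoft.aad.adal4j.AuthenticationException at runtime.
// Versions are illustrative only.
libraryDependencies ++= Seq(
  "com.microsoft.sqlserver" % "mssql-jdbc" % "6.4.0.jre8",
  "com.microsoft.azure"     % "adal4j"     % "1.6.0"
)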
If you packaged your job code as a fat jar with all of its dependencies, submitted it to your Dataproc cluster appropriately, and are still facing the error, one possible reason is a classpath conflict somewhere involving the SQL Server driver library. As pointed out in my comment, similar behavior (although in a different context) is reported in several GitHub issues, like this one or this other one.
In addition to trying to remove the conflicting library, and although I do not know whether this applies to your use case (probably not, since it is a database driver), perhaps you could try relocating the SQL Server code to a different package and using that package instead.
This approach is described in the GCP Dataproc documentation; for instance, you can use the Maven Shade plugin to relocate the conflicting packages.
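If your project happens to use sbt instead of Maven, a rough equivalent of that relocation with the sbt-assembly plugin (the shaded package prefix below is only illustrative) could be:

// build.sbt (sketch): relocate the SQL Server driver and ADAL classes into a private
// package so they cannot clash with other copies already present on the cluster classpath.
// Requires the sbt-assembly plugin; "repackaged" is an arbitrary example prefix.
assembly / assemblyShadeRules := Seq(
  ShadeRule.rename("com.microsoft.sqlserver.**" -> "repackaged.com.microsoft.sqlserver.@1").inAll,
  ShadeRule.rename("com.microsoft.aad.**"       -> "repackaged.com.microsoft.aad.@1").inAll
)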