 

Setting up Java Version to be used by PySpark in Jupyter Notebook

When trying to run PySpark in a Jupyter Notebook in VSCode, I run into the error below after executing these lines of code:

import pyspark
from pyspark.sql import SparkSession
import findspark  # part of my google work on the problem - didn't help...

spark = SparkSession.builder.appName('Practise').getOrCreate()
An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x34d2e626) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x34d2e626
    at org.apache.spark.storage.StorageUtils$.<init>(StorageUtils.scala:213)
    at org.apache.spark.storage.StorageUtils$.<clinit>(StorageUtils.scala)
    at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:110)
    at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:348)
    at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:287)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:336)
    at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:191)
    at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:460)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:67)
    at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:483)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.base/java.lang.Thread.run(Thread.java:833)

The error message goes on, but I think this is the relevant part. From some googling I found that this probably hints at the wrong Java version being used (if not, I might have to change the headline of this thread). On my Mac I have Java 18 and Java 8 installed. In my Terminal I can switch between versions like this:

JAVA_HOME='/Library/Java/JavaVirtualMachines/jdk-18.0.1.1.jdk/Contents/Home'
JAVA_HOME='/Library/Java/JavaVirtualMachines/jdk1.8.0_202.jdk/Contents/Home'

However, in my notebook this does not seem to work:

!export JAVA_HOME='/Users/_______/Library/Java/JavaVirtualMachines/jdk1.8.0_202.jdk/Contents/Home'
!java -version

Output is still:

java version "18.0.1.1" 2022-04-22
Java(TM) SE Runtime Environment (build 18.0.1.1+2-6)
Java HotSpot(TM) 64-Bit Server VM (build 18.0.1.1+2-6, mixed mode, sharing)

Is there a solution other than completely removing the newer Java version from my Mac (or is the problem actually a completely different one)?

asked Oct 15 '25 by Morgan


1 Answer

If you have the correct version of Java installed but it isn't your operating system's default, you can either prepend its bin directory to your PATH environment variable or set the JAVA_HOME environment variable, from within Python, before creating your Spark context. This works because PySpark only launches the JVM when the first SparkSession is created, and at that point it resolves the java executable from JAVA_HOME (or, failing that, from PATH).

Your two options would look like this:

Set JAVA_HOME

import os

# Point JAVA_HOME at the JDK Spark should use (adjust the path to your installation).
JAVA_HOME = "/Library/Java/JavaVirtualMachines/jdk1.8.0_271.jdk/Contents/Home"
os.environ["JAVA_HOME"] = JAVA_HOME

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('Practise').getOrCreate()
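Note that this has to run before the first SparkSession is created in the notebook's kernel: PySpark launches the JVM only once per Python process, so changing JAVA_HOME afterwards has no effect until you restart the kernel.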

Add Java bin to PATH

import os

JAVA_HOME = "/Library/Java/JavaVirtualMachines/jdk1.8.0_271.jdk/Contents/Home"
os.environ["PATH"] = f"{JAVA_HOME}/bin:{os.environ['PATH']}"  # Note /bin is added to the JAVA_HOME path.

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('Practise').getOrCreate()
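As for why !export JAVA_HOME=... showed no effect in the notebook: each ! command runs in its own short-lived subshell, so the exported variable is gone before the next cell runs and never reaches the kernel's Python process. Setting os.environ modifies the kernel process itself, which is what PySpark inherits. A minimal sketch for double-checking which Java the notebook will actually pick up (assuming JAVA_HOME has been set as above):

import os
import subprocess

# The kernel's own environment, not a subshell's.
print(os.environ.get("JAVA_HOME"))

# Run the java binary from JAVA_HOME; "java -version" prints its banner
# to stderr, hence the redirect into stdout.
result = subprocess.run(
    [os.path.join(os.environ["JAVA_HOME"], "bin", "java"), "-version"],
    stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True,
)
print(result.stdout)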
answered Oct 18 '25 by JGC


