I'm running a Python script on a Spark cluster from Jupyter. I want to change the driver's default stack size. The documentation says I can use spark.driver.extraJavaOptions to pass arbitrary options to the driver JVM, but it carries this note:
Note: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point. Instead, please set this through the --driver-java-options command line option or in your default properties file.
The question is: how do I change a default driver parameter when running from Jupyter?
You can customize the Java options used for the driver by passing spark.driver.extraJavaOptions as a configuration value in the SparkConf, e.g.:
from pyspark import SparkConf, SparkContext

# -Xss4M raises the driver JVM's per-thread stack size to 4 MB.
conf = (SparkConf()
        .setMaster("spark://spark-master:7077")
        .setAppName("MyApp")
        .set("spark.driver.extraJavaOptions", "-Xss4M"))
sc = SparkContext.getOrCreate(conf=conf)
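To confirm the option actually reached the driver, you can read it back from the live context (a quick sanity check, assuming the context above was created successfully):

# Read the setting back from the running SparkContext.
print(sc.getConf().get("spark.driver.extraJavaOptions"))  # expected: -Xss4M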
Note that http://spark.apache.org/docs/latest/configuration.html says the following about spark.driver.extraJavaOptions:
Note: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point. Instead, please set this through the --driver-java-options command line option or in your default properties file.
However, that note is about the JVM-side SparkConf class. When the option is set in PySpark's Python SparkConf, it is passed as a command-line parameter to spark-submit, which then uses it when launching the driver JVM, so the caveat in the Spark docs does not apply.
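If you prefer to follow the docs' recommendation literally, one alternative in a notebook is to pass --driver-java-options through the PYSPARK_SUBMIT_ARGS environment variable before the SparkContext (and hence the driver JVM) is created. This is a sketch; the master URL is a placeholder, and it only works if no SparkContext exists yet in the notebook session:

import os
from pyspark import SparkContext

# Must run before any SparkContext exists: the driver JVM is launched
# lazily when the first context is created, and picks these args up then.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--master spark://spark-master:7077 "
    "--driver-java-options -Xss4M "
    "pyspark-shell"
)
sc = SparkContext.getOrCreate()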