
How to set spark driver maxResultSize when in client mode in pyspark?

I know that when you are in client mode in PySpark, you cannot set configurations in your script, because the JVM is started as soon as the libraries are loaded.

So, the way to set the configurations is to edit the shell script that launches it, spark-env.sh, according to this documentation here.

If I want to change the maximum result size at the driver, I would normally set spark.driver.maxResultSize. What is the equivalent of that in the spark-env.sh file?

Some of the environment variables are easy to map: SPARK_DRIVER_MEMORY is clearly the setting for spark.driver.memory. But what is the environment variable for spark.driver.maxResultSize? Thank you.
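(Aside: for the pyspark shell specifically, one way to pass driver settings in client mode without editing spark-env.sh is the PYSPARK_SUBMIT_ARGS environment variable, which forwards extra spark-submit flags; the 2g value below is just an example.)

```shell
# PYSPARK_SUBMIT_ARGS forwards extra spark-submit flags to the pyspark shell;
# the trailing "pyspark-shell" token is required by the launcher
export PYSPARK_SUBMIT_ARGS="--conf spark.driver.maxResultSize=2g pyspark-shell"
# pyspark   # then launch the shell as usual
```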

makansij asked Oct 16 '25 15:10


1 Answer

The configuration file is conf/spark-defaults.conf.

If conf/spark-defaults.conf does not exist, create it from the template:

cp conf/spark-defaults.conf.template conf/spark-defaults.conf

Then add a configuration line like:

spark.driver.maxResultSize  2g

There are many configurations available; see the Spark Configuration documentation for the full list.
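The full flow can be sketched as below (a throwaway directory is used here for illustration; in practice you would run the cp and echo steps from $SPARK_HOME, and restart the driver afterwards for the setting to take effect):

```shell
# Sketch only: stand in for $SPARK_HOME with a temp directory
tmp=$(mktemp -d) && cd "$tmp"
mkdir conf && touch conf/spark-defaults.conf.template

# Create spark-defaults.conf from the template if it does not exist
cp conf/spark-defaults.conf.template conf/spark-defaults.conf

# Append the driver setting (2g is just an example value)
echo 'spark.driver.maxResultSize  2g' >> conf/spark-defaults.conf

# Confirm the line is present
grep 'spark.driver.maxResultSize' conf/spark-defaults.conf
```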

Rockie Yang answered Oct 18 '25 06:10