I use Spark 1.3.0 on a cluster of 5 worker nodes, each with 36 cores and 58GB of memory. I'd like to configure the Spark standalone cluster to run multiple executors per worker.
I have seen that SPARK-1706 has been merged; however, it is not immediately clear how to actually configure multiple executors.
Here is the latest configuration of the cluster:
spark.executor.cores = "15"
spark.executor.instances = "10"
spark.executor.memory = "10g"
These settings are set on a SparkContext when the Spark application is submitted to the cluster.
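For reference, the same properties can be passed at submit time with --conf flags; a minimal sketch, assuming a standalone master at spark://master-host:7077 and a placeholder application class and JAR:

    # Pass the same executor settings on the command line instead of in code
    spark-submit \
      --master spark://master-host:7077 \
      --conf spark.executor.cores=15 \
      --conf spark.executor.instances=10 \
      --conf spark.executor.memory=10g \
      --class com.example.MyApp \
      my-app.jar

However they are supplied, executor settings have to be in place before the SparkContext is created; changing them on a running context has no effect. Also, as far as I know, spark.executor.instances is only honored by the YARN backend, while in standalone mode the executor count falls out of the total-cores and per-executor-cores settings discussed below.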
You can also pass an option --total-executor-cores <numCores> to control the number of cores that spark-shell uses on the cluster.
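For example, to cap an interactive shell at 30 cores across the whole cluster (the master URL and the core count are only illustrative):

    # Limit this spark-shell session to 30 cores total across all workers
    spark-shell --master spark://master-host:7077 --total-executor-cores 30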
Executors Scheduling: The number of cores assigned to each executor is configurable. When spark.executor.cores is explicitly set, multiple executors from the same application may be launched on the same worker if the worker has enough cores and memory.
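As an illustration against the cluster in the question (5 workers with 36 cores and 58GB each), a submission along these lines should let the standalone scheduler place several executors on each worker; the per-executor numbers, master URL, class and JAR are only placeholders:

    # 6 cores and 9GB per executor: each 36-core / 58GB worker can host
    # up to 6 executors (6 x 6 = 36 cores, 6 x 9GB = 54GB)
    spark-submit \
      --master spark://master-host:7077 \
      --executor-cores 6 \
      --executor-memory 9g \
      --total-executor-cores 180 \
      --class com.example.MyApp \
      my-app.jar

With these numbers the application could get roughly 30 executors across the 5 workers (180 total cores / 6 cores per executor).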
As a more general sizing example (assume a cluster of 10 nodes with 15 usable cores and 64GB of memory each, and 5 cores per executor): number of available executors = total cores / cores per executor = 150/5 = 30. Leaving 1 executor for the ApplicationMaster gives --num-executors = 29. Executors per node = 30/10 = 3, and memory per executor = 64GB/3 ≈ 21GB.
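Expressed as a submit command, that sizing would look roughly like the sketch below. Note that --num-executors (and the ApplicationMaster) only apply when running on YARN; in standalone mode the executor count is driven instead by --total-executor-cores divided by --executor-cores. The class and JAR names are placeholders.

    # YARN example matching the arithmetic above: 29 executors,
    # each with 5 cores and ~21GB of heap
    spark-submit \
      --master yarn-cluster \
      --num-executors 29 \
      --executor-cores 5 \
      --executor-memory 21g \
      --class com.example.MyApp \
      my-app.jar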
In standalone mode, by default, all the resources on the cluster are acquired as you launch an application. You need to specify the number of executors you need using the --executor-cores and --total-executor-cores options.
For example, if there is 1 worker in your cluster (1 worker == 1 machine; it's good practice to have only 1 worker per machine) which has 3 cores and 3GB available in its pool (this is specified in spark-env.sh), then when you submit an application with --executor-cores 1 --total-executor-cores 2 --executor-memory 1g, two executors are launched for the application, each with 1 core and 1GB (see the full command below). Hope this helps!
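For completeness, that example as a single spark-submit invocation (master URL, class and JAR are placeholders):

    # Worker offers 3 cores / 3GB; this requests 2 executors of 1 core / 1GB each
    spark-submit \
      --master spark://master-host:7077 \
      --executor-cores 1 \
      --total-executor-cores 2 \
      --executor-memory 1g \
      --class com.example.MyApp \
      my-app.jar

The remaining core and memory on the worker stay free for other applications.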