I'd like to understand the internals of Spark's FAIR scheduling mode. It does not seem as fair as the official Spark documentation leads one to expect:
Starting in Spark 0.8, it is also possible to configure fair sharing between jobs. Under fair sharing, Spark assigns tasks between jobs in a “round robin” fashion, so that all jobs get a roughly equal share of cluster resources. This means that short jobs submitted while a long job is running can start receiving resources right away and still get good response times, without waiting for the long job to finish. This mode is best for multi-user settings.
In practice, jobs do not seem to be treated equally; they appear to be run in FIFO order.
To give more information on the topic:
I am using Spark on YARN with the Java API. To enable FAIR mode, I use the following code:
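import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

SparkConf conf = new SparkConf();
// Switch the scheduler inside this application from the default FIFO to FAIR.
conf.set("spark.scheduler.mode", "FAIR");
conf.setMaster("yarn-client").setAppName("MySparkApp");
JavaSparkContext sc = new JavaSparkContext(conf);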
Did I miss something?
The allocation file is located in HADOOP_HOME/conf/fair-scheduler.xml.
The FairScheduler is a pluggable scheduler for Hadoop that allows YARN applications to share resources in a large cluster fairly. Fair scheduling is a method of assigning resources to applications such that all applications get, on average, an equal share of resources over time.
It appears that you didn't set up any pools, so all your jobs end up in a single default pool, as described in Configuring Pool Properties:
Specific pools’ properties can also be modified through a configuration file.
and later
A full example is also available in conf/fairscheduler.xml.template. Note that any pools not configured in the XML file will simply get default values for all settings (scheduling mode FIFO, weight 1, and minShare 0).
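As a rough illustration of what such an allocation file could contain, here is a minimal sketch that generates one using only the Java standard library. The element names (allocations, pool, schedulingMode, weight, minShare) follow conf/fairscheduler.xml.template; the class name FairPoolsFile, the pool names "interactive" and "batch", and the weight and minShare values are assumptions made up for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class FairPoolsFile {

    // Builds one <pool> element in the layout used by conf/fairscheduler.xml.template.
    static String pool(String name, String schedulingMode, int weight, int minShare) {
        return "  <pool name=\"" + name + "\">\n"
             + "    <schedulingMode>" + schedulingMode + "</schedulingMode>\n"
             + "    <weight>" + weight + "</weight>\n"
             + "    <minShare>" + minShare + "</minShare>\n"
             + "  </pool>\n";
    }

    // Builds a complete allocation file with two hypothetical pools,
    // both scheduling their own jobs in FAIR mode instead of the FIFO default.
    static String allocations() {
        return "<?xml version=\"1.0\"?>\n"
             + "<allocations>\n"
             + pool("interactive", "FAIR", 2, 1)
             + pool("batch", "FAIR", 1, 0)
             + "</allocations>\n";
    }

    // Writes the allocation file to a temporary location and returns its path.
    static Path writeAllocations() throws IOException {
        Path file = Files.createTempFile("fairscheduler", ".xml");
        Files.write(file, allocations().getBytes(StandardCharsets.UTF_8));
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path file = writeAllocations();
        System.out.println("Wrote " + file);
        // With Spark on the classpath, the file would be registered on the
        // SparkConf before the context is created:
        // conf.set("spark.scheduler.allocation.file", file.toString());
    }
}
```

The spark.scheduler.allocation.file property is what tells Spark where to find the pool definitions. Any pool that a job names but that is missing from the file still falls back to the defaults quoted above.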
It may also be that you didn't set the local property that selects the pool for a given job or jobs, as described in Fair Scheduler Pools:
Without any intervention, newly submitted jobs go into a default pool, but jobs’ pools can be set by adding the spark.scheduler.pool “local property” to the SparkContext in the thread that’s submitting them.
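The pattern described in that quote could be wrapped in a small helper like the sketch below. To keep it runnable without Spark, the helper takes a BiConsumer that stands in for JavaSparkContext.setLocalProperty(key, value); the class name PoolScope, the method runInPool, and the pool name "interactive" are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

class PoolScope {

    // Sets spark.scheduler.pool on the calling thread, runs the job, then
    // clears the property so later jobs from this thread use the default pool.
    // `setLocalProperty` is a stand-in for sc::setLocalProperty.
    static void runInPool(BiConsumer<String, String> setLocalProperty,
                          String pool, Runnable job) {
        setLocalProperty.accept("spark.scheduler.pool", pool);
        try {
            job.run();
        } finally {
            // Passing null clears the pool associated with this thread.
            setLocalProperty.accept("spark.scheduler.pool", null);
        }
    }

    public static void main(String[] args) {
        // Recorder used here instead of a real SparkContext.
        List<String> calls = new ArrayList<>();
        BiConsumer<String, String> recorder = (k, v) -> calls.add(k + "=" + v);
        runInPool(recorder, "interactive", () -> calls.add("job"));
        System.out.println(calls);
    }
}
```

With a real context, the call would look something like PoolScope.runInPool(sc::setLocalProperty, "interactive", () -> rdd.count()), where rdd is whatever dataset the job works on. Because the property is local to the submitting thread, jobs that should run concurrently in different pools need to be submitted from separate threads.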
Finally, it may be that you are using only the single default pool, which itself schedules in FIFO mode. Jobs inside one FIFO pool run in submission order, so the result is no different from plain FIFO scheduling without pools.
Only you can tell which of these applies :)