spark-submit on yarn - multiple jobs

I would like to submit multiple spark-submit jobs to YARN. When I run

spark-submit --class myclass --master yarn --deploy-mode cluster blah blah

as it is now, I have to wait for the job to complete before I can submit more jobs. I see the heartbeat:

16/09/19 16:12:41 INFO yarn.Client: Application report for application_1474313490816_0015 (state: RUNNING)
16/09/19 16:12:42 INFO yarn.Client: Application report for application_1474313490816_0015 (state: RUNNING)

How can I tell YARN to pick up another job, all from the same terminal? Ultimately I want to be able to run from a script where I can send hundreds of jobs in one go.

Thank you.


2 Answers

Every user has a fixed capacity, as specified in the YARN configuration. If you are allocated N executors (usually you will be allocated some fixed number of vcores) and you want to run 100 jobs, you will need to divide that allocation among the jobs:

spark-submit --num-executors N/100 --executor-cores 5

Otherwise, the jobs will loop in the ACCEPTED state, waiting for resources to free up.
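For example, here is a sketch assuming a queue capped at 200 executors shared by 100 concurrent jobs; the class, jar, and memory figure are placeholders:

# Illustrative split: 200 executors shared by 100 jobs leaves
# 200/100 = 2 executors of 5 cores for each job.
spark-submit \
  --class myclass \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 2 \
  --executor-cores 5 \
  --executor-memory 4g \
  myapp.jar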

You can launch multiple jobs in parallel by appending & to the end of every invocation.

for i in $(seq 20); do spark-submit --master yarn --num-executors N/100 --executor-cores 5 blah blah & done
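For hundreds of jobs you probably don't want them all backgrounded at once. A sketch that throttles submissions, assuming bash (the class, jar, and limits are placeholders):

# Submit 100 jobs, but keep at most 10 spark-submit processes in flight.
for i in $(seq 100); do
  spark-submit --master yarn --deploy-mode cluster \
    --num-executors 2 --executor-cores 5 \
    --class myclass myapp.jar "$i" &
  # Pause whenever 10 submissions are already running in the background.
  while [ "$(jobs -rp | wc -l)" -ge 10 ]; do
    sleep 5
  done
done
wait  # block until every submission has returned

Note also that in cluster mode the RUNNING heartbeat you quoted is the client polling the application; setting --conf spark.yarn.submit.waitAppCompletion=false makes spark-submit return as soon as the application is submitted.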



  • Check Spark's dynamic allocation (see the sketch after this list)
  • Check which scheduler YARN is using; if it is FIFO, change it to FAIR
  • How are you planning to allocate resources to N jobs on YARN?
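A sketch of the first two checks; the configuration keys are standard Spark and Hadoop settings, but the executor bounds here are illustrative:

# Dynamic allocation: let each job grow and shrink its executor count
# instead of hard-coding a fixed split (needs YARN's external shuffle service).
spark-submit --master yarn --deploy-mode cluster \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=20 \
  --class myclass myapp.jar

# Scheduler check: in yarn-site.xml, yarn.resourcemanager.scheduler.class
# names the active scheduler; pointing it at
# org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
# lets concurrent applications share resources instead of queueing FIFO.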