 

In simple terms, how does Spark schedule jobs?

Just wondering how Spark schedules jobs? In simple terms, please. I have read many descriptions of how it does it, but they were too complicated to understand.

asked by user2768498

1 Answer

At a high level, when an action is called on an RDD, Spark creates the DAG and submits it to the DAG scheduler.

  • The DAG scheduler divides the operators into stages of tasks. A stage consists of tasks based on partitions of the input data. The DAG scheduler pipelines operators together: for example, many map operators can be scheduled in a single stage (see the sketch after this list). The final output of the DAG scheduler is a set of stages.

  • The stages are passed on to the task scheduler. The task scheduler launches tasks via the cluster manager (Spark Standalone/YARN/Mesos). The task scheduler doesn't know about the dependencies between stages.

  • The workers execute the tasks on the slave nodes.
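
To make the stage split concrete, here is a minimal sketch in Scala (a hypothetical local word-count job; the app name and input data are made up for illustration):

    import org.apache.spark.{SparkConf, SparkContext}

    object StageDemo {
      def main(args: Array[String]): Unit = {
        // Hypothetical local setup; in a real deployment the master would
        // point at a cluster manager (Spark Standalone/YARN/Mesos).
        val conf = new SparkConf().setAppName("stage-demo").setMaster("local[*]")
        val sc   = new SparkContext(conf)

        // Transformations are lazy: these lines only record lineage, nothing runs yet.
        val lines  = sc.parallelize(Seq("a b", "b c", "c a"))
        val pairs  = lines.flatMap(_.split(" ")).map(word => (word, 1)) // narrow ops: pipelined into one stage
        val counts = pairs.reduceByKey(_ + _)                           // shuffle dependency: new stage boundary

        // toDebugString prints the lineage; indentation marks the stage boundaries.
        println(counts.toDebugString)

        // Calling an action is what builds the DAG and submits it to the DAG scheduler.
        counts.collect().foreach(println)

        sc.stop()
      }
    }

Running this, the indentation in the toDebugString output shows two stages: flatMap and map are pipelined into one, while reduceByKey starts a new stage because it requires a shuffle.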

Look at this answer for more information.

answered by Sathish
