Spark - what triggers a Spark job to be re-attempted?

For educational purposes mostly, I was trying to get YARN + Spark to re-attempt my Spark job on purpose (i.e. fail it and watch YARN reschedule it in another app attempt).

Various failures seem to cause a Spark job to be re-run; I know I have seen this numerous times. I'm having trouble simulating it though.

I have tried forcefully stopping the streaming context and calling System.exit(-1), but neither achieved the desired effect.
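Roughly what I was trying looks like the sketch below. This is a minimal Scala example, not my exact code: the spark.yarn.maxAppAttempts value, the dummy queue-based stream, and the 30-second delay are illustrative.

```scala
import scala.collection.mutable
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ReattemptExperiment {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("reattempt-experiment")
      // Allow YARN to run more than one application attempt for this job
      // (also bounded by yarn.resourcemanager.am.max-attempts on the cluster).
      .set("spark.yarn.maxAppAttempts", "4")

    val ssc = new StreamingContext(conf, Seconds(5))

    // Trivial stream so the context has an output operation to run.
    val queue = mutable.Queue(ssc.sparkContext.makeRDD(1 to 10))
    ssc.queueStream(queue).print()

    // After a short delay, stop the context and exit with a non-zero code,
    // hoping YARN treats the attempt as failed and reschedules it.
    new Thread(new Runnable {
      override def run(): Unit = {
        Thread.sleep(30000L)
        ssc.stop(stopSparkContext = true, stopGracefully = false)
        System.exit(-1)
      }
    }).start()

    ssc.start()
    ssc.awaitTermination()
  }
}
```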

asked Dec 02 '25 21:12 by John Humphreys

1 Answer

After a lot of experimenting with this, I have found that Spark + YARN do not play well together with exit codes (at least not with the versions bundled in MapR 5.2.1), though I don't think it's MapR-specific.

Sometimes a Spark program will throw an exception and die, yet it reports SUCCESS to YARN (or YARN records SUCCESS somehow), so there are no re-attempts.

Calling System.exit(-1) does not give any more stable results; the same code can end up reported as SUCCESS or FAILURE on different runs.

Interestingly, getting a reference to the main thread of the driver and killing it does seem to force a re-attempt; but that is very dirty and requires the use of a deprecated method on the Thread class.
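For anyone curious, the rough shape of that hack is sketched below. This is not my exact code: the helper name and the thread name ("main") are assumptions (in yarn-cluster mode the user class may run in a differently named thread), and the deprecated method in question is Thread.stop().

```scala
import scala.collection.JavaConverters._

object DriverKiller {
  // Find the thread running the driver's user code by name and kill it.
  // The default name "main" is an assumption; adjust for your deploy mode.
  def killDriverThread(name: String = "main"): Unit = {
    Thread.getAllStackTraces.keySet.asScala
      .find(_.getName == name)
      .foreach { t =>
        // Thread.stop() is deprecated (and removed in recent JDKs); it is
        // used here only to simulate an abrupt driver death so that YARN
        // records the attempt as failed and schedules a new one.
        t.stop()
      }
  }
}
```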

answered Dec 05 '25 11:12 by John Humphreys


