Spark - what triggers a Spark job to be re-attempted?

For educational purposes mostly, I was trying to get YARN + Spark to re-attempt my Spark job on purpose (i.e. fail it and watch YARN reschedule it in another app attempt).

Various failures seem to cause a Spark job to be re-run; I know I have seen this numerous times. I'm having trouble simulating it though.

I have tried forcefully stopping the streaming context and calling System.exit(-1), but neither achieved the desired effect.
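Roughly what I was trying looks like the sketch below. This is a minimal Scala example, not my exact code: the spark.yarn.maxAppAttempts value, the dummy queue-based stream, and the 30-second delay are illustrative.

```scala
import scala.collection.mutable
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ReattemptExperiment {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("reattempt-experiment")
      // Allow YARN to run more than one application attempt for this job
      // (also bounded by yarn.resourcemanager.am.max-attempts on the cluster).
      .set("spark.yarn.maxAppAttempts", "4")

    val ssc = new StreamingContext(conf, Seconds(5))

    // Trivial stream so the context has an output operation to run.
    val queue = mutable.Queue(ssc.sparkContext.makeRDD(1 to 10))
    ssc.queueStream(queue).print()

    // After a short delay, stop the context and exit with a non-zero code,
    // hoping YARN treats the attempt as failed and reschedules it.
    new Thread(new Runnable {
      override def run(): Unit = {
        Thread.sleep(30000L)
        ssc.stop(stopSparkContext = true, stopGracefully = false)
        System.exit(-1)
      }
    }).start()

    ssc.start()
    ssc.awaitTermination()
  }
}
```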

asked Dec 02 '25 21:12 by John Humphreys

1 Answer

After a lot of experimenting with this, I have found that Spark + YARN do not play well together with exit codes (at least not with the versions bundled in MapR 5.2.1), though I don't think it's MapR-specific.

Sometimes a Spark program will throw an exception and die, yet it reports SUCCESS to YARN (or YARN records SUCCESS somehow), so there are no re-attempts.

Calling System.exit(-1) does not give any more stable results; the same code can end up reported as SUCCESS or FAILURE on different runs.

Interestingly, getting a reference to the main thread of the driver and killing it does seem to force a re-attempt; but that is very dirty and requires the use of a deprecated method on the Thread class.
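For anyone curious, the rough shape of that hack is sketched below. This is not my exact code: the helper name and the thread name ("main") are assumptions (in yarn-cluster mode the user class may run in a differently named thread), and the deprecated method in question is Thread.stop().

```scala
import scala.collection.JavaConverters._

object DriverKiller {
  // Find the thread running the driver's user code by name and kill it.
  // The default name "main" is an assumption; adjust for your deploy mode.
  def killDriverThread(name: String = "main"): Unit = {
    Thread.getAllStackTraces.keySet.asScala
      .find(_.getName == name)
      .foreach { t =>
        // Thread.stop() is deprecated (and removed in recent JDKs); it is
        // used here only to simulate an abrupt driver death so that YARN
        // records the attempt as failed and schedules a new one.
        t.stop()
      }
  }
}
```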

answered Dec 05 '25 11:12 by John Humphreys


