I have a simple Spark app that reads some data, computes some metrics, and then saves the result (the input and output are Cassandra tables). This piece of code runs at regular intervals (i.e., every minute).
I have a Cassandra/Spark cluster (Spark 1.6.1), and after a few minutes the temporary directory on the master node of the Spark cluster fills up, and the master refuses to run any more jobs. I am submitting the job with spark-submit.
What am I missing? How do I make sure that the master node removes the temporary folder?
Spark uses that directory as scratch space and writes temporary map output files there. The location can be changed; take a look at the spark.local.dir setting.
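For example, here is a minimal sketch of pointing the scratch space at a larger disk; the path /mnt/big-disk/spark-tmp and the app name are placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Redirect Spark's scratch space (shuffle/map output spill files)
// to a volume with enough room. The path below is a placeholder.
val conf = new SparkConf()
  .setAppName("metrics-job") // placeholder name
  .set("spark.local.dir", "/mnt/big-disk/spark-tmp")

val sc = new SparkContext(conf)
```

The same setting can be passed on the command line with `spark-submit --conf spark.local.dir=/mnt/big-disk/spark-tmp ...`, and it accepts a comma-separated list of directories if you want to spread the I/O across several disks. Note that in cluster deployments this value may be overridden by the cluster manager's environment (e.g. SPARK_LOCAL_DIRS in standalone mode), so setting it in spark-defaults.conf or spark-env.sh on each node is often the more reliable option.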