 

Spark Master filling temporary directory

Tags:

apache-spark

I have a simple Spark app that reads some data, computes some metrics, and then saves the result (both input and output are Cassandra tables). This piece of code runs at regular intervals (i.e., every minute).

I have a Cassandra/Spark cluster (Spark 1.6.1). After a few minutes, the temporary directory on the master node of the Spark cluster fills up, and the master refuses to run any more jobs. I am submitting the job with spark-submit.

What am I missing? How do I make sure that the master node removes the temporary folder?

asked Jan 18 '26 19:01 by davideanastasia


1 Answer

Spark uses this directory as its scratch space and writes temporary map output files there. The location can be changed: take a look at the spark.local.dir setting.
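A minimal sketch of how that setting might be applied at submit time, assuming a standalone cluster; the mount point, class name, and jar name are illustrative only:

```shell
# Point Spark's scratch space at a larger volume. The same key can instead
# be set once in conf/spark-defaults.conf as:
#   spark.local.dir  /mnt/spark-tmp
spark-submit \
  --conf spark.local.dir=/mnt/spark-tmp \
  --class com.example.MetricsJob \
  metrics-job.jar

# On a standalone cluster, workers can also be told to periodically delete
# the work directories of finished applications (relevant when the same job
# is resubmitted every minute):
#   spark.worker.cleanup.enabled=true
#   spark.worker.cleanup.interval=1800     # seconds between cleanup sweeps
#   spark.worker.cleanup.appDataTtl=3600   # seconds to retain finished app data
```

Note that spark.local.dir only relocates the scratch space; the spark.worker.cleanup.* properties are what actually reclaim the per-application directories over time.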

answered Jan 20 '26 08:01 by tesnik03


