An RDD that has been cached with the rdd.cache() method from the Scala shell is stored in memory, so it consumes part of the RAM available to the Spark process itself.
Given that RAM is limited, if more and more RDDs are cached, when will Spark automatically clean the memory occupied by the RDD cache?
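For context, this is roughly what the caching described above looks like in spark-shell; the dataset and its size are illustrative placeholders:

```scala
// spark-shell sketch: `sc` is the SparkContext the shell provides.
val rdd = sc.parallelize(1 to 1000000)   // placeholder dataset
rdd.cache()      // for RDDs, cache() is persist(StorageLevel.MEMORY_ONLY)
rdd.count()      // caching is lazy; the first action materializes it

// The cached partitions now occupy executor memory and appear under the
// Storage tab of the Spark UI until they are evicted or unpersisted.
```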
Spark will clean cached RDDs and Datasets / DataFrames:
- When it is explicitly asked to, by calling the RDD.unpersist (How to uncache RDD?) / Dataset.unpersist methods or Catalog.clearCache (see the sketch after this list).
- In regular intervals, by the cache cleaner:
  Spark automatically monitors cache usage on each node and drops out old data partitions in a least-recently-used (LRU) fashion. If you would like to manually remove an RDD instead of waiting for it to fall out of the cache, use the RDD.unpersist() method.
- When the corresponding distributed data structure is garbage collected.
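A short spark-shell sketch of the explicit cleanup paths; the variable names are placeholders:

```scala
// Explicitly release a cached RDD.
val rdd = sc.parallelize(1 to 1000).cache()
rdd.count()
rdd.unpersist()            // frees the cached blocks; accepts an optional blocking flag

// The Dataset/DataFrame equivalent.
val df = spark.range(1000).cache()
df.count()
df.unpersist()

// Drop everything cached in the current session at once.
spark.catalog.clearCache()
```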