My Spark processing logic depends upon long-lived, expensive-to-instantiate utility objects to perform data-persistence operations. Not only are these objects probably not Serializable, but distributing their state is probably impractical in any case, since that state includes stateful network connections.
What I would like to do instead is instantiate these objects locally within each executor, or locally within threads spawned by each executor. (Either alternative is acceptable, as long as the instantiation does not take place on each tuple in the RDD.)
Is there a way to write my Spark driver program such that it directs executors to invoke a function to instantiate an object locally (and cache it in the executor's local JVM memory space), rather than instantiating it within the driver program then attempting to serialize and distribute it to the executors?
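For illustration, here is a minimal sketch of the per-executor pattern the question describes, using a lazily initialized Scala singleton object that each executor JVM constructs on first use, so nothing is serialized from the driver. The names `PersistenceClient` and `ConnectionPool` are hypothetical placeholders, not a real API; this is one common way to get that behaviour, under those assumptions.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical expensive, non-serializable utility (placeholder, not a real API).
class PersistenceClient(endpoint: String) {
  def save(record: String): Unit = { /* stateful network write */ }
}

// A Scala `object` is a per-JVM singleton: it is initialized lazily, once,
// the first time a task running on a given executor touches it, and it is
// never serialized from the driver.
object ConnectionPool {
  lazy val client: PersistenceClient = new PersistenceClient("db-host:9042")
}

object Driver {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("per-executor-init").getOrCreate()
    val rdd = spark.sparkContext.parallelize(1 to 1000)

    rdd.foreach { n =>
      // Nothing about the singleton is captured in the closure; the client
      // is constructed at most once per executor JVM, on first access.
      ConnectionPool.client.save(n.toString)
    }

    spark.stop()
  }
}
```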
It is possible to share objects at the partition level: instantiate the object once per partition inside mapPartitions. I've tried the approach from How to make Apache Spark mapPartition work correctly?, repartitioning so that numPartitions is a multiple of the number of executors.
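A minimal sketch of that partition-level approach, assuming the same hypothetical `PersistenceClient` as above: the client is constructed once inside mapPartitions, reused for every tuple in the partition, and closed after the partition has been processed.

```scala
import org.apache.spark.sql.SparkSession

// Minimal stub of the hypothetical client from the sketch above.
class PersistenceClient(endpoint: String) {
  def save(record: String): Unit = { /* stateful network write */ }
  def close(): Unit = { /* tear down the connection */ }
}

object PartitionLevelInit {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("per-partition-init").getOrCreate()

    // Repartition so that numPartitions is a multiple of the executor count,
    // which keeps the number of client instantiations predictable.
    val data = spark.sparkContext.parallelize(1 to 10000).repartition(8)

    val written = data.mapPartitions { iter =>
      // One client per partition, reused for every tuple in the partition.
      val client = new PersistenceClient("db-host:9042")
      // Materialize the partition so the client can be closed before the
      // iterator is handed back (Spark consumes the iterator lazily).
      val results = iter.map { n => client.save(n.toString); n }.toList
      client.close()
      results.iterator
    }

    println(written.count())
    spark.stop()
  }
}
```

Note that materializing the partition with `toList` keeps the whole partition in memory; with very large partitions, an alternative is to leave the mapping lazy and tie the client's cleanup to task completion instead.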