Why Only one SparkContext is allowed per JVM?

This question was asked in an interview: how many SparkContexts are allowed to be created per JVM, and why? I know only one SparkContext is allowed per JVM, but I can't understand why. Would anyone please help me understand the reason behind "one SparkContext per JVM"?

asked Oct 25 '25 by user2017

1 Answer

The answer is simple - it has not been designed to work with multiple contexts. Quoting Reynold Xin:

I don't think we currently support multiple SparkContext objects in the same JVM process. There are numerous assumptions in the code base that uses a shared cache or thread local variables or some global identifiers which prevent us from using multiple SparkContext's.
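You can see that guard directly in a local session. A minimal sketch (the exact exception message varies by Spark version, and the local master and app names here are just placeholders): constructing a second SparkContext fails, and SparkContext.getOrCreate is the supported way to reuse the one that already exists.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object OneContextPerJvm {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("first-context").setMaster("local[*]")
    val sc1 = new SparkContext(conf)

    // A second constructor call in the same JVM fails with an exception
    // along the lines of "Only one SparkContext should be running in this JVM"
    // (wording differs between Spark versions):
    // val sc2 = new SparkContext(conf)

    // The supported pattern is to reuse the existing context instead.
    val sc2 = SparkContext.getOrCreate(conf)
    assert(sc1 eq sc2)  // getOrCreate returns the already-running instance

    sc1.stop()
  }
}
```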

In a broader sense, a single application (with main) per JVM is the standard approach in the Java world (Is there one JVM per Java application?, Why have one JVM per application?). Application servers take a different approach, but they are the exception, not the rule.

From a practical point of view, handling a single data-intensive application is painful enough (tuning GC, dealing with leaking resources, communication overhead). Multiple Spark applications running in a single JVM would be impossible to tune and manage in the long run.

Finally, there would not be much use in having multiple contexts, as each distributed data structure is tightly connected to the context that created it.
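To illustrate that last point, here is a small local sketch (names and master URL are illustrative): an RDD's lineage, scheduler and cached blocks all live inside its owning context, so once that context is stopped the RDD cannot be evaluated, and it could never be handed to a different context.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddBoundToContext {
  def main(args: Array[String]): Unit = {
    val sc = SparkContext.getOrCreate(
      new SparkConf().setAppName("rdd-lifecycle").setMaster("local[*]"))

    val numbers = sc.parallelize(1 to 100)
    println(numbers.sum())  // fine: the owning context is still running

    sc.stop()

    // Any further action on `numbers` now fails (typically with
    // "Cannot call methods on a stopped SparkContext"), because everything
    // the RDD needs to run lived inside `sc`.
    // println(numbers.sum())
  }
}
```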

answered Oct 28 '25 by user9170705


