This question was asked in an interview: how many SparkContexts are allowed to be created per JVM, and why? I know that only one SparkContext is allowed per JVM, but I can't understand why. Would anyone please help me understand the reason behind "one SparkContext per JVM"?
The answer is simple - Spark has not been designed to work with multiple contexts. Quoting Reynold Xin:
I don't think we currently support multiple SparkContext objects in the same JVM process. There are numerous assumptions in the code base that use a shared cache or thread local variables or some global identifiers which prevent us from using multiple SparkContext's.
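You can see this guard in practice. Below is a minimal Scala sketch of what happens if you try to construct a second context (assuming Spark 2.x on a local master; the exact exception message varies between versions, and the old `spark.driver.allowMultipleContexts` escape hatch was never supported and has since been removed):

```scala
import org.apache.spark.{SparkConf, SparkContext, SparkException}

object TwoContexts {
  def main(args: Array[String]): Unit = {
    val sc1 = new SparkContext(
      new SparkConf().setAppName("first").setMaster("local[*]"))

    try {
      // The SparkContext constructor itself checks for another live context.
      new SparkContext(
        new SparkConf().setAppName("second").setMaster("local[*]"))
    } catch {
      case e: SparkException =>
        // e.g. "Only one SparkContext may be running in this JVM (see SPARK-2243) ..."
        println(s"Second context rejected: ${e.getMessage}")
    } finally {
      sc1.stop()
    }
  }
}
```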
In a broader sense - a single application (with a single main), single JVM - is the standard approach in the Java world (Is there one JVM per Java application?, Why have one JVM per application?). Application servers choose a different approach, but that is the exception, not the rule.
From a practical point of view, handling a single data-intensive application is painful enough (tuning GC, dealing with leaking resources, communication overhead). Multiple Spark applications running in a single JVM would be impossible to tune and manage in the long run.
Finally, there would not be much use in having multiple contexts, as each distributed data structure (RDD, broadcast variable, accumulator) is tightly bound to the context that created it and has no meaning outside it; see the sketch below.
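Spark's own API nudges you toward sharing the single context instead of creating another one: `SparkContext.getOrCreate` (available since Spark 1.4) returns the already-running context if one exists. A minimal sketch, assuming a local master for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SharedContext {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("shared").setMaster("local[*]")

    // getOrCreate returns the already-running context rather than a new one -
    // Spark's own idiom for "one context per JVM".
    val sc1 = SparkContext.getOrCreate(conf)
    val sc2 = SparkContext.getOrCreate(conf)
    assert(sc1 eq sc2)  // the very same instance, not a copy

    // An RDD keeps a reference to the context that built it; it only makes
    // sense within that context's scheduler, block manager and caches.
    val rdd = sc1.parallelize(1 to 10)
    println(rdd.sum())

    sc1.stop()
  }
}
```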