According to the API docs:
getActiveSession() Returns the active SparkSession for the current thread, returned by the builder.
getDefaultSession() Returns the default SparkSession that is returned by the builder.
I was (most likely erroneously) using getActiveSession to retrieve the SparkSession or SparkContext in some functions across multiple threads. Sometimes the activeSession was not defined (most likely because the thread had just started up).
Can someone explain the difference between the two, or is the API doc sufficiently self-explanatory?
Also, when would I use getActiveSession if, in 99% of apps, there is only one session and getDefaultSession should return that session?
ActiveSession is per thread, while DefaultSession is global. By default, the DefaultSession is the ActiveSession for the main thread.
SparkSession objects share the same SparkContext, but they may have different state, such as SQL configurations, temporary tables and registered functions.
You are right that in 99% of apps there is only one session; in fact, it is more than 99%.
When would you use ActiveSession?
With the DefaultSession, you must use a different temp view name for each DataFrame, like city_1, city_2. With an ActiveSession (you can create a new session with SparkSession.newSession), you can register all the temp views under the same name, city, and everything is easy.
SparkSession.active helps you fall back to the DefaultSession when no ActiveSession exists.
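Here is a minimal sketch of that pattern, assuming Spark 2.4 or later (for SparkSession.active) and a local master. The names SessionScopes, cityCount and worker are made up for illustration and are not part of Spark. Each worker thread creates its own session with newSession, marks it active for that thread, registers a temp view called city, and then reads it back through SparkSession.active.

```scala
import java.util.concurrent.{Callable, Executors}
import org.apache.spark.sql.SparkSession

object SessionScopes {
  // Hypothetical helper: looks the session up instead of taking it as a parameter.
  // SparkSession.active returns the calling thread's active session and
  // falls back to the default session if the thread has none.
  def cityCount(): Long = SparkSession.active.table("city").count()

  def run(): (Long, Long) = {
    // getOrCreate also registers this session as the global default.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("session-scopes")
      .getOrCreate()

    def worker(rows: Seq[String]): Callable[Long] = new Callable[Long] {
      override def call(): Long = {
        // Same SparkContext as `spark`, but separate temp views and SQL conf.
        val session = spark.newSession()
        // Thread-local: only affects the thread running this task.
        SparkSession.setActiveSession(session)
        try {
          import session.implicits._
          rows.toDF("name").createOrReplaceTempView("city")
          cityCount()
        } finally SparkSession.clearActiveSession()
      }
    }

    val pool = Executors.newFixedThreadPool(2)
    try {
      val first = pool.submit(worker(Seq("Oslo", "Lima")))
      val second = pool.submit(worker(Seq("Rome", "Cairo", "Quito")))
      (first.get(), second.get())
    } finally pool.shutdown()
  }
}
```

Calling SessionScopes.run() should give a different count per thread even though both views are named city, because each view lives in its own session. If your functions only ever need the one shared session, passing the SparkSession in as a parameter (or using getDefaultSession) avoids depending on per-thread state at all.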