How to access global temp view in another pyspark application?

I have a spark shell script which invokes a Python script and creates a global temp view.

This is what I am doing in the first spark shell script:

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Spark SQL Parallel load example") \
    .config("spark.jars", "/u/user/graghav6/sqljdbc4.jar") \
    .config("spark.dynamicAllocation.enabled", "true") \
    .config("spark.shuffle.service.enabled", "true") \
    .config("hive.exec.dynamic.partition", "true") \
    .config("hive.exec.dynamic.partition.mode", "nonstrict") \
    .config("spark.sql.shuffle.partitions", "50") \
    .config("hive.metastore.uris", "thrift://xxxxx:9083") \
    .config("spark.sql.join.preferSortMergeJoin", "true") \
    .config("spark.sql.autoBroadcastJoinThreshold", "-1") \
    .enableHiveSupport() \
    .getOrCreate()

# After doing some transformations, I try to create a global temp view of the dataframe:

df1.createGlobalTempView("df1_global_view")
spark.stop()
exit()

This is my second spark shell script:

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Spark SQL Parallel load example") \
    .config("spark.jars", "/u/user/graghav6/sqljdbc4.jar") \
    .config("spark.dynamicAllocation.enabled", "true") \
    .config("spark.shuffle.service.enabled", "true") \
    .config("hive.exec.dynamic.partition", "true") \
    .config("hive.exec.dynamic.partition.mode", "nonstrict") \
    .config("spark.sql.shuffle.partitions", "50") \
    .config("hive.metastore.uris", "thrift://xxxx:9083") \
    .config("spark.sql.join.preferSortMergeJoin", "true") \
    .config("spark.sql.autoBroadcastJoinThreshold", "-1") \
    .enableHiveSupport() \
    .getOrCreate()

newSparkSession = spark.newSession()
# reading data from the global temp view
data_df_save = newSparkSession.sql("""select * from global_temp.df1_global_view""")
data_df_save.show()

# PySpark's SparkSession has no close() method; stop() ends the application
spark.stop()
exit()

I am getting the below error:

Stdoutput pyspark.sql.utils.AnalysisException: u"Table or view not found: `global_temp`.`df1_global_view`; line 1 pos 15;\n'Project [*]\n+- 'UnresolvedRelation `global_temp`.`df1_global_view`\n"

It looks like I am missing something. How can I share the same global temp view across multiple sessions? Am I closing the Spark session incorrectly in the first spark shell script? I have already found a couple of answers on Stack Overflow but was not able to figure out the cause.

asked Sep 05 '25 by vikrant rana

1 Answer

You're using createGlobalTempView, so it is still a temporary view: it lives only as long as the Spark application that created it, and it is dropped when you call spark.stop().
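As a minimal sketch of both behaviours (the app name and sample data here are made up for illustration):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("global-temp-view-demo").getOrCreate()

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.createGlobalTempView("df1_global_view")

# A second session of the SAME application can see the view...
newSession = spark.newSession()
newSession.sql("select * from global_temp.df1_global_view").show()

# ...but stopping the application drops the view, so a later
# spark-submit of another script will never find it.
spark.stop()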

In other words, it will be available to another SparkSession within the same application, but not to another PySpark application.
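If the dataframe has to be visible from a second application, persist it somewhere both applications can reach instead, for example as a Hive table (your sessions already call enableHiveSupport()). A minimal sketch, assuming both applications point at the same metastore; the database and table names are made up:

# In the first application, instead of createGlobalTempView:
df1.write.mode("overwrite").saveAsTable("my_db.df1_shared")

# In the second application, read it back with spark.table
# instead of querying global_temp:
data_df_save = spark.table("my_db.df1_shared")
data_df_save.show()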

answered Sep 09 '25 by Jacek Laskowski