My goal is to import a custom .py file into my spark application and call some of the functions included inside that file
Here is what I tried:
I have a test file called Test.py which looks as follows:
def func():
    print "Import is working"
Inside my Spark application I do the following (as described in the docs):
sc = SparkContext(conf=conf, pyFiles=['/[AbsolutePathTo]/Test.py'])
I also tried this instead (after the Spark context is created):
sc.addFile("/[AbsolutePathTo]/Test.py")
I even tried the following when submitting my spark application:
./bin/spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.10:1.5.0-M2 --py-files /[AbsolutePath]/Test.py ../Main/Code/app.py
However, I always get a name error:
NameError: name 'func' is not defined
when I am calling func() inside my app.py. (same error with 'Test' if I try to call Test.func())
Finally, al also tried importing the file inside the pyspark shell with the same command as above:
sc.addFile("/[AbsolutePathTo]/Test.py")
Strangely, I do not get an error on the import, but still, I cannot call func() without getting the error. Also, not sure if it matters, but I'm using spark locally on one machine.
I really tried everything I could think of, but still cannot get it to work. Probably I am missing something very simple. Any help would be appreciated.
Spark environment provides a command to execute the application file, be it in Scala or Java(need a Jar format), Python and R programming file. The command is, $ spark-submit --master <url> <SCRIPTNAME>. py .
PySpark can also be used from standalone Python scripts by creating a SparkContext in your script and running the script using bin/pyspark .
Alright, actually my question is rather stupid. After doing:
sc.addFile("/[AbsolutePathTo]/Test.py")
I still have to import the Test.py file like I would import a regular python file with:
import Test
then I can call
Test.func()
and it works. I thought that the "import Test" is not necessary since I add the file to the spark context, but apparently that does not have the same effect. Thanks mark91 for pointing me into the right direction.
UPDATE 28.10.2017:
as asked in the comments, here more details on the app.py
from pyspark import SparkContext
from pyspark.conf import SparkConf
conf = SparkConf()
conf.setMaster("local[4]")
conf.setAppName("Spark Stream")
sc = SparkContext(conf=conf)
sc.addFile("Test.py")
import Test
Test.func()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With