I have a PySpark project with a Python script that runs a Spark Streaming job. It has some external dependencies, which I pull in with the --packages flag.
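For reference, this is roughly the kind of command I run today (the Kafka connector coordinate and script name below are just examples, not my actual dependency):

    spark-submit \
      --master yarn \
      --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.0 \
      my_streaming_job.py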
In Scala, however, we can use Maven to download all the required packages, build a single JAR containing the main Spark program together with its dependencies, and then just use spark-submit to send that one JAR to the cluster (YARN in my case).
Is there anything similar to a JAR for PySpark?
The official Spark documentation doesn't say anything about this. It only mentions using spark-submit <python-file> or adding files with --py-files, which doesn't feel as clean as a single JAR file.
Any suggestion would be helpful! Thanks!
The documentation says you can use a zip or egg file.
"For Python applications, simply pass a .py file in the place of <application-jar> instead of a JAR, and add Python .zip, .egg or .py files to the search path with --py-files."
Source: Spark's "Submitting Applications" documentation.
You might also find the other spark-submit options listed there useful.
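As a minimal sketch of how that could look (the package and file names here are assumptions, not from the question): bundle your own Python modules into a zip and pass it alongside the main script.

    # bundle local Python packages (everything except the entry-point script) into one archive
    zip -r deps.zip mypackage/

    # submit to YARN; deps.zip is added to the Python search path on the driver and executors
    spark-submit \
      --master yarn \
      --py-files deps.zip \
      --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.0 \
      main_job.py

Note that --py-files only distributes pure-Python code: JVM-side connectors still go through --packages or --jars, and third-party libraries with native extensions generally need to be installed on the cluster nodes themselves.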