AWS EMR import pyfile from S3

I'm struggling to understand how to import files as libraries with pyspark.

Let's say that I have the following two files:

HappyBirthday.py

def run():
    print('Happy Birthday!')

sparky.py

from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession
import HappyBirthday
sc = SparkContext(appName="kmeans")

HappyBirthday.run()
sc.stop()

And both of them are stored in the same folder in S3.

How do I make sure that, when I run

spark-submit --deploy-mode cluster s3://<PATH TO FILE>/sparky.py

, HappyBirthday.py is also imported?

Frost asked Dec 18 '25 08:12


1 Answer

If you are trying to run sparky.py and use a function inside HappyBirthday.py, you can try something like this.

spark-submit \
--deploy-mode cluster --master yarn \
--py-files s3://<PATH TO FILE>/HappyBirthday.py \
s3://<PATH TO FILE>/sparky.py

Just remember that S3 does not have a concept of "folders", so you need to provide the exact path of each file or group of files.
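As a rough illustration of passing exact object paths, here is a minimal Python sketch that assembles the same spark-submit arguments. The helper name build_spark_submit and the example-bucket URIs are hypothetical placeholders, not anything from the question. It relies on --py-files accepting a comma-separated list of .py, .zip, or .egg files.

```python
import shlex


def build_spark_submit(script_uri, py_file_uris):
    """Return the spark-submit argument list for a cluster-mode YARN job."""
    cmd = ["spark-submit", "--deploy-mode", "cluster", "--master", "yarn"]
    if py_file_uris:
        # --py-files takes a single comma-separated list of dependencies.
        cmd += ["--py-files", ",".join(py_file_uris)]
    # The main script always comes last.
    cmd.append(script_uri)
    return cmd


# Placeholder bucket and keys (assumptions for illustration only).
args = build_spark_submit(
    "s3://example-bucket/jobs/sparky.py",
    ["s3://example-bucket/jobs/HappyBirthday.py"],
)
print(shlex.join(args))
```

Each dependency is listed by its full object URI; nothing is picked up just because it shares a prefix with the main script.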

If your project has many dependencies, you can bundle them all into a single .zip file, with the necessary __init__.py files in each package directory, and then import anything from the libraries inside it.

For example, I have the sqlparse library as a dependency, with a bunch of Python files inside it. I have a package zip file like the one below.

unzip -l packages.zip
Archive:  packages.zip
        0  05-05-2019 12:44   sqlparse/
     2249  05-05-2019 12:44   sqlparse/__init__.py
     5916  05-05-2019 12:44   sqlparse/cli.py
...
      110  05-05-2019 12:44   sqlparse-0.3.0.dist-info/WHEEL
---------                     -------
   125034                     38 files
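An archive with that kind of layout could be produced in several ways; the following is only a sketch using Python's zipfile module, not necessarily how the archive above was built. The function zip_packages, the deps directory, and the demo_pkg package are hypothetical names. In practice the source directory might be populated first with something like pip install --target deps sqlparse.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_packages(source_dir, zip_path):
    """Zip every file under source_dir, storing paths relative to it."""
    source = Path(source_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                # Package directories must sit at the root of the archive.
                zf.write(path, path.relative_to(source).as_posix())
    return Path(zip_path)


# Demo with a throwaway package (demo_pkg is a made-up stand-in).
with tempfile.TemporaryDirectory() as tmp:
    deps = Path(tmp) / "deps"
    (deps / "demo_pkg").mkdir(parents=True)
    (deps / "demo_pkg" / "__init__.py").write_text("")
    (deps / "demo_pkg" / "cli.py").write_text("def main():\n    return 'ok'\n")

    archive = zip_packages(deps, Path(tmp) / "packages.zip")
    with zipfile.ZipFile(archive) as zf:
        names = sorted(zf.namelist())

print(names)
```

The point of storing relative paths is that the package directory (sqlparse/ in the listing above) ends up at the top level of the zip, which is where the import system looks for it.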

This is uploaded to S3 and then used in the job.

spark-submit --deploy-mode cluster --master yarn --py-files s3://my0-test-bucket/artifacts/packages.zip s3://my-test-script/script/script.py

My script can then contain imports like the ones below.

import pyspark
import sqlparse # Importing the library
from pprint import pprint
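
These imports resolve because --py-files places the listed files on the Python search path, and Python can import packages directly from a zip archive. Before uploading, the archive can be sanity-checked locally along the same lines. This is a sketch under that assumption; demo_pkg and greet are hypothetical names standing in for a real dependency.

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny archive with one package at its root.
    archive = Path(tmp) / "packages.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr(
            "demo_pkg/__init__.py",
            "def greet():\n    return 'Happy Birthday!'\n",
        )

    # Put the zip itself on sys.path, then import from it.
    sys.path.insert(0, str(archive))
    try:
        demo_pkg = importlib.import_module("demo_pkg")
        message = demo_pkg.greet()
    finally:
        sys.path.remove(str(archive))

print(message)
```

If the import fails locally, the usual suspects are a missing __init__.py or a package directory nested one level too deep inside the zip.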

Rajesh Chamarthi answered Dec 20 '25 03:12


