I'm trying to run a Glue job (version 4) to perform simple batch data processing. I'm using additional Python libraries that the Glue environment doesn't provide - translate and langdetect. Additionally, even though the Glue environment does provide the 'nltk' package, when I try to import it I keep receiving errors that its dependencies are not found (e.g. regex._regex, _sqlite3).
I tried a few approaches to achieve my goal:

--extra-py-files, where I specified the path to an S3 bucket to which I uploaded either:
- a .zip file containing the translate and langdetect Python packages, or
- the packages in .whl format (along with their dependencies)

--additional-python-modules, where I specified either:
- the path to an S3 bucket to which I uploaded the packages in .whl format (along with their dependencies), or
- the plain package names, to be installed via pip3
Additionally, I followed a few useful sources to overcome the ModuleNotFoundError issue:
a) https://aws.amazon.com/premiumsupport/knowledge-center/glue-import-error-no-module-named/.
b) https://aws.amazon.com/premiumsupport/knowledge-center/glue-version2-external-python-libraries/
c) https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-libraries.html
Also, I tried playing with Glue versions 4 and 3 but haven't had any luck. It seems like a bug. All permissions to read the S3 bucket are granted to the Glue role. The Python version of the script matches that of the libraries I'm trying to install - Python 3. To give you more clues, I manage the Glue resources via Terraform.
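For reference, the attempts above translate to the job's default_arguments block in my Terraform config, roughly like this (the bucket path is a placeholder, not my real one):

```hcl
default_arguments = {
  # one of the attempts described above; placeholder path
  "--extra-py-files" = "s3://my-bucket/libs/site-packages.zip"
}
```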
What did I do wrong?
The way I have been able to achieve this in AWS Glue 4.0 is by taking the following steps: under the Job details tab, scroll down to Advanced properties and expand that section. Locate the Job parameters area and add a new parameter. For the key, enter: --additional-python-modules. For the value, enter your package names as found on pypi.org. Example: PyMySQL==1.0.3,SQLAlchemy==2.0.19 or, in your case: langdetect==1.0.9,translate==3.6.1
Separate each package with a comma. This process is a lot easier than zipping packages and uploading them to S3.
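Since you manage the Glue resources via Terraform, the same job parameter can be set through default_arguments on the aws_glue_job resource instead of the console (a sketch; the resource name, job name, role, and script path are placeholders):

```hcl
resource "aws_glue_job" "batch_job" {
  name         = "batch-processing-job"            # placeholder name
  role_arn     = aws_iam_role.glue_role.arn        # placeholder role reference
  glue_version = "4.0"

  command {
    name            = "glueetl"
    script_location = "s3://my-bucket/scripts/job.py"  # placeholder path
    python_version  = "3"
  }

  default_arguments = {
    # installed via pip at job start, same as the console job parameter
    "--additional-python-modules" = "langdetect==1.0.9,translate==3.6.1"
  }
}
```

Glue passes everything in default_arguments to the job at run time, so this is equivalent to adding the parameter under Job details in the console.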