
Using Pandas in AWS Glue Python Shell Jobs

The AWS documentation (https://docs.aws.amazon.com/glue/latest/dg/add-job-python.html) mentions that:

The environment for running a Python shell job supports the following libraries:

...

pandas (required to be installed via the python setuptools configuration, setup.py)

But it does not explain how to do the install.

How can I use Pandas in an AWS Glue Python Shell job?

asked Dec 06 '25 14:12 by Hugo


2 Answers

Just to clarify Sandeep's answer, here is what worked for me:

1/ Ignore the AWS doc

2/ Create a setup.py file containing:

from setuptools import setup

setup(
    name="pandasmodule",
    version="0.1",
    packages=[],
    install_requires=['pandas==0.25.1']
)

3/ Run this command in the folder containing the file:

python setup.py bdist_wheel

4/ Upload the resulting .whl file (created in the dist/ subfolder) to S3

5/ Set the "Python lib path" in your Glue ETL Job to the S3 path of the .whl file

You can now use "import pandas as pd" in your Glue ETL Job
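
To check that the wheel was actually picked up, the top of the job script could log which pandas version got imported. This is only a minimal sketch using the standard library; describe_module is a hypothetical helper name of my own, not something provided by Glue or pandas:

```python
import importlib


def describe_module(name):
    """Import a module by name and report its version, or None if it is missing."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # Not every module exposes __version__, so fall back to a placeholder.
    return name, getattr(module, "__version__", "unknown")


# At the top of the job script: log what is actually available.
print("pandas:", describe_module("pandas"))
```

If this prints None, the library path is probably not pointing at the uploaded .whl file.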

answered Dec 09 '25 02:12 by Hugo


  1. Go to https://docs.aws.amazon.com/glue/latest/dg/add-job-python.html#create-python-extra-library and see the section "To create a Python .egg or .whl file", which explains how to create the setup file for a Python shell job.
  2. In the setup.py file, add the line install_requires=['pandas==0.25.1']:

setup(
    name="<module name>",
    version="0.1",
    packages=['<package name if any or ignore>'],
    install_requires=['pandas==0.25.1']
)

I also wrote a small shell script that deploys a Python shell job without the manual steps: it creates the egg file, uploads it to S3, and deploys via CloudFormation, all automatically. You can find the code at https://github.com/fatangare/aws-python-shell-deploy
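
For illustration only, the manual build and upload steps could also be scripted in Python roughly as below. This is not the code from the linked repository, and it skips the CloudFormation part entirely. It builds a wheel rather than an egg, and it assumes the wheel package and boto3 are installed and that AWS credentials are already configured. The function names, the default module name, and the bucket and prefix in the usage line are all hypothetical:

```python
import subprocess
import sys
from pathlib import Path

SETUP_TEMPLATE = '''from setuptools import setup

setup(
    name="{name}",
    version="{version}",
    packages=[],
    install_requires={requires!r},
)
'''


def render_setup_py(name, version, requires):
    """Return the text of a setup.py that only declares dependencies."""
    return SETUP_TEMPLATE.format(name=name, version=version, requires=list(requires))


def build_wheel(project_dir, name="pandasmodule", version="0.1",
                requires=("pandas==0.25.1",)):
    """Write setup.py into project_dir, build a wheel, and return its path."""
    project_dir = Path(project_dir)
    (project_dir / "setup.py").write_text(render_setup_py(name, version, requires))
    # Same command as the manual step; needs the `wheel` package installed.
    subprocess.run([sys.executable, "setup.py", "bdist_wheel"],
                   cwd=project_dir, check=True)
    # bdist_wheel writes its output to the dist/ subfolder.
    return next((project_dir / "dist").glob("*.whl"))


def upload_wheel(wheel_path, bucket, key_prefix=""):
    """Upload the wheel to S3 and return the s3:// path for the job setting."""
    import boto3  # imported lazily so the build step works without it

    key = f"{key_prefix}{Path(wheel_path).name}"
    boto3.client("s3").upload_file(str(wheel_path), bucket, key)
    return f"s3://{bucket}/{key}"
```

Usage might look like upload_wheel(build_wheel("."), "my-bucket", "libs/"), and the returned s3:// path is what goes into the job's Python library path.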

answered Dec 09 '25 04:12 by Sandeep Fatangare


