 

Python 3.6 airflow with an Operator that requires 2.7

I'm currently running an airflow (1.9.0) instance on python 3.6.5. I have a manual workflow that I'd like to move to a DAG. This manual workflow now requires code written in python 2 and 3. Let's simplify my DAG to 3 steps:

  1. Dataflow job that processes data and sets up data for Machine Learning Training
  2. Tensorflow ML training job
  3. Other PythonOperators that I wrote using python 3 code

The dataflow job is written in python 2.7 (required by google) and the tensorflow model code is in python 3. Looking at "MLEngineTrainingOperator" in airflow 1.9.0, there is a python_version parameter whose docs describe it as "The version of Python used in training".
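For reference, this is roughly how I'd expect to pass that parameter — everything here other than `python_version` is a placeholder for my real config, and the exact argument names may differ in 1.9.0:

```python
from airflow.contrib.operators.mlengine_operator import MLEngineTrainingOperator

# Illustrative sketch only: project, bucket, and module names are placeholders.
training = MLEngineTrainingOperator(
    task_id='ml_training',
    project_id='my-gcp-project',
    job_id='training_job',
    package_uris=['gs://my-bucket/trainer-0.1.tar.gz'],
    training_python_module='trainer.task',
    training_args=[],
    region='us-central1',
    python_version='3.5',  # "The version of Python used in training"
)
```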

Questions:

  • Can I dynamically specify a specific python version in a worker environment?
  • Do I have to just install airflow on python 2.7 to make step 1) run?
  • Can I have tensorflow model code in python 3 that just gets packaged up and submitted via MlEngineTraining running on python 2?
  • Do I have to rewrite my 3) operators in python 2?
Andrew Cassidy asked Nov 15 '25

1 Answer

There isn't a way to specify the python version dynamically on a worker. However, if you are using the Celery executor, you can run multiple workers, either on different servers/VMs or in different virtual environments.

You can have one worker running python 3, and one running 2.7, and have each listening to different queues. This can be done three different ways:

  • Add a -q [queue-name] flag when starting the worker
  • Set the AIRFLOW__CELERY__DEFAULT_QUEUE environment variable
  • Update default_queue under [celery] in airflow.cfg

Then in your task definitions, specify a queue parameter, changing the queue depending on which python version the task needs to run under.
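A minimal sketch of that routing, assuming two workers are already listening on hypothetical queues named py2_queue and py3_queue (started with `airflow worker -q py2_queue` in a Python 2.7 env and `airflow worker -q py3_queue` in a Python 3 env):

```python
# Hypothetical DAG snippet: each task carries the queue of the worker
# whose Python version it needs. Queue names and scripts are illustrative.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    'mixed_python_versions',
    start_date=datetime(2018, 1, 1),
    schedule_interval=None,
)

dataflow_job = BashOperator(
    task_id='dataflow_job',
    bash_command='python run_dataflow.py',  # picked up by the Python 2.7 worker
    queue='py2_queue',
    dag=dag,
)

post_processing = BashOperator(
    task_id='post_processing',
    bash_command='python post_process.py',  # picked up by the Python 3 worker
    queue='py3_queue',
    dag=dag,
)

dataflow_job >> post_processing
```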

I'm not familiar with the MLEngineOperator, but you can specify a python_version on the PythonVirtualenvOperator, which runs the callable in a virtualenv of that version. Alternatively, you can use the BashOperator: put the code in a separate file and invoke it with the python command, using the absolute path to the version of python you want to use.
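A sketch of the BashOperator approach, assuming the 2.7 interpreter lives at /usr/bin/python2.7 and the script at /opt/jobs/dataflow_job.py (both paths are placeholders for your environment):

```python
# Hypothetical example: run a Python 2.7 script from a Python 3 Airflow
# instance by invoking the 2.7 interpreter via its absolute path.
from airflow.operators.bash_operator import BashOperator

run_py2_step = BashOperator(
    task_id='run_py2_step',
    bash_command='/usr/bin/python2.7 /opt/jobs/dataflow_job.py',
)
```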

Regardless of how the task is run, you need to ensure the DAG file itself is compatible with the python version of the worker parsing it. i.e. if you are going to start airflow workers under different python versions, the DAG file itself needs to be python 2 & 3 compatible, and the same goes for any additional files the DAG imports — they can't have version incompatibilities either.

cwurtz answered Nov 17 '25


