I'm currently running an Airflow (1.9.0) instance on Python 3.6.5. I have a manual workflow that I'd like to move to a DAG. This workflow requires code written in both Python 2 and Python 3. Let's simplify my DAG to 3 steps:
The Dataflow job is written in Python 2.7 (required by Google) and the TensorFlow model code is in Python 3. Looking at MLEngineTrainingOperator in Airflow 1.9.0, there is a python_version parameter which sets "the version of Python used in training".
Questions:
There isn't a way to specify the Python version dynamically on a worker. However, if you are using the Celery executor, you can run multiple workers, either on different servers/VMs or in different virtual environments.
You can have one worker running Python 3 and one running 2.7, each listening to a different queue. The queue a worker listens to can be set in three different ways:

- with the -q [queue-name] flag when starting the worker
- with the AIRFLOW__CELERY__DEFAULT_QUEUE environment variable
- with default_queue under [celery] in airflow.cfg

Then in your task definitions, specify a queue parameter, changing the queue depending on which Python version the task needs to run under (see the sketch below).
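To illustrate, here is a minimal sketch of that routing; the queue names, file paths, and DAG settings are all hypothetical:

```python
# Hypothetical sketch: route each task to a version-specific Celery queue.
# Assumes two workers were started beforehand, e.g.:
#   (python 2.7 env) airflow worker -q py2_queue
#   (python 3 env)   airflow worker -q py3_queue
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    dag_id='mixed_python_versions',
    start_date=datetime(2018, 1, 1),
    schedule_interval=None,
)

# Picked up only by the worker listening on py2_queue (Python 2.7)
dataflow_job = BashOperator(
    task_id='dataflow_job',
    bash_command='python /path/to/dataflow_job.py',
    queue='py2_queue',
    dag=dag,
)

# Picked up only by the worker listening on py3_queue (Python 3)
train_model = BashOperator(
    task_id='train_model',
    bash_command='python /path/to/train_model.py',
    queue='py3_queue',
    dag=dag,
)

dataflow_job >> train_model
```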
I'm not familiar with the MLEngineOperator, but you can specify a python_version in the PythonVirtualenvOperator (not available in 1.9.0; it was added in Airflow 1.10), which runs the callable in a virtualenv of that version. Alternatively, you can use the BashOperator: write the code to run in a separate file and invoke it with the absolute path to the Python interpreter you want to use.
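As a hedged sketch of that operator-based approach (again, assuming Airflow 1.10+; the callable body and requirements list are hypothetical):

```python
# Sketch assuming Airflow 1.10+: PythonVirtualenvOperator builds a fresh
# virtualenv with the requested interpreter for each task run.
from airflow.operators.python_operator import PythonVirtualenvOperator

def run_dataflow_job():
    # Executes inside the Python 2.7 virtualenv. The function must be
    # self-contained: do imports here, don't reference outer-scope names.
    import sys
    print(sys.version)

dataflow_py2 = PythonVirtualenvOperator(
    task_id='dataflow_py2',
    python_callable=run_dataflow_job,
    python_version='2.7',               # that interpreter must exist on the worker
    requirements=['apache-beam[gcp]'],  # hypothetical dependency list
    dag=dag,  # reuses the dag object from the earlier sketch
)
```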
Regardless of how the task is run, you just need to ensure the DAG file itself is compatible with the Python version of the worker parsing it, i.e. if you are going to start Airflow workers under different Python versions, the DAG file itself needs to be Python 2 & 3 compatible. The additional files the DAG calls out to (e.g. scripts run via the BashOperator), however, can be version-specific.
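In practice, keeping the DAG file parseable under both interpreters mostly means avoiding version-specific syntax at the top level, for example:

```python
# Minimal sketch: a DAG file that parses under both Python 2.7 and 3.x.
from __future__ import absolute_import, division, print_function

# e.g. use print(...) rather than the py2-only print statement,
# and avoid py3-only syntax like f-strings in the DAG file itself.
```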