I am trying to pass a local directory as a volume to Airflow and then on to a DockerOperator in a DAG.
The volumes in my airflow-docker-compose.yaml (in the airflow-common section) look as follows:
volumes:
  - ./dags:/opt/airflow/dags
  - ./logs:/opt/airflow/logs
  - ./plugins:/opt/airflow/plugins
  - ./input:/opt/airflow/input
  - ./output:/opt/airflow/output
In the DAG code, I am trying to pass the mounts parameter as follows:
eod_price = DockerOperator(
    task_id='run_docker',
    image='alpine',
    api_version='auto',
    command='/bin/touch /output/run_docker_touch.txt',
    auto_remove=True,
    mounts=[
        Mount(source='/opt/airflow/output',
              target='/app_base/output',
              type='volume'),
    ],
    mount_tmp_dir=False,
    docker_url='tcp://docker-proxy:2375',
    network_mode='bridge'
)
With this code, I get the error:
Bad Request ("create /opt/airflow/output: "/opt/airflow/output" includes invalid characters for a local volume name, only "[a-zA-Z0-9][a-zA-Z0-9_.-]" are allowed. If you intended to pass a host directory, use absolute path")
When I change the mount from type='volume' to type='bind':
    mounts=[
        Mount(source='/opt/airflow/output',
              target='/app_base/output',
              type='bind'),
    ],
the error changes to:
Bad Request ("invalid mount config for type "bind": bind source path does not exist: /opt/airflow/output")
I opened a bash shell in docker007-airflow-scheduler-1, docker007-airflow-triggerer-1, docker007-airflow-webserver-1 and docker007-airflow-worker-1, and in each of these containers the /opt/airflow/output directory exists. For reference, the docker-compose ps output is:
NAME                            COMMAND                  SERVICE             STATUS              PORTS
docker007-airflow-init-1        "/bin/bash -c 'funct…"   airflow-init        exited (0)
docker007-airflow-scheduler-1   "/usr/bin/dumb-init …"   airflow-scheduler   running (healthy)   8080/tcp
docker007-airflow-triggerer-1   "/usr/bin/dumb-init …"   airflow-triggerer   running (healthy)   8080/tcp
docker007-airflow-webserver-1   "/usr/bin/dumb-init …"   airflow-webserver   running (healthy)   0.0.0.0:8080->8080/tcp
docker007-airflow-worker-1      "/usr/bin/dumb-init …"   airflow-worker      running (healthy)   8080/tcp
docker007-docker-proxy-1        "socat TCP4-LISTEN:2…"   docker-proxy        running             0.0.0.0:2376->2375/tcp
docker007-postgres-1            "docker-entrypoint.s…"   postgres            running (healthy)   5432/tcp
docker007-redis-1               "docker-entrypoint.s…"   redis               running (healthy)   6379/tcp
My end goal is for the output of my dockerized Python app, which writes to /app_base/output (the touch command above is only an example), to be visible in the local output directory.
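OK, finally figured it out. The bind source must be a path on the Docker host, not the path inside the Airflow container. The DockerOperator sends its request (here through docker-proxy) to the host's Docker daemon, and the daemon resolves the source on the host filesystem, so the source has to be the original directory on the machine from which the Airflow stack was started.
The following code worked: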
pre_step = DockerOperator(
    task_id='pre_step_docker',
    image='alpine',
    api_version='auto',
    command='/bin/touch /output/run_docker_touch.txt',
    auto_remove=True,
    mounts=[
        Mount(source='/Users/me/Documents/Python/GitHub/docker-learn/docker007/output',
              target='/output',
              type='bind'),
    ],
    mount_tmp_dir=False,
    docker_url='tcp://docker-proxy:2375',
    network_mode='bridge'
)
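To make the container-path versus host-path distinction concrete, here is a minimal sketch that translates a path as seen inside the Airflow container into the host path the Docker daemon needs. The function name to_host_path and the volume_map dictionary are hypothetical helpers for illustration, not part of Airflow or Docker, and the sketch assumes POSIX-style paths on the host.

```python
from pathlib import PurePosixPath


def to_host_path(container_path, volume_map):
    """Translate a path inside the Airflow container to the matching host path.

    volume_map is a hypothetical dict of {container_dir: host_dir}, mirroring
    entries such as "./output:/opt/airflow/output" in docker-compose.
    """
    path = PurePosixPath(container_path)
    for container_root, host_root in volume_map.items():
        try:
            # relative_to raises ValueError if path is not under container_root
            relative = path.relative_to(container_root)
        except ValueError:
            continue
        return str(PurePosixPath(host_root) / relative)
    raise ValueError(f"{container_path} is not under any mapped directory")


# Example mapping (assumption: the host directory from the working code above)
volume_map = {
    "/opt/airflow/output": "/Users/me/Documents/Python/GitHub/docker-learn/docker007/output",
}
host_source = to_host_path("/opt/airflow/output", volume_map)
```

The resulting host_source is the kind of value that belongs in Mount(source=...), whereas /opt/airflow/output itself only exists inside the Airflow containers.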
With this change, the extra input and output volumes I had added to docker-compose are not required for the DockerOperator mount, so I have commented them out:
volumes:
  - ./dags:/opt/airflow/dags
  - ./logs:/opt/airflow/logs
  - ./plugins:/opt/airflow/plugins
  # - ./input:/opt/airflow/input
  # - ./output:/opt/airflow/output
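However, the hard-coded path was ugly, so I added new variables to the docker-compose file (in the airflow-common --> environment section). Compose substitutes ${PWD} from the shell environment in which docker-compose is run, so the values end up being host paths: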
APP_INPUT_DIR: ${PWD}/input
APP_OUTPUT_DIR: ${PWD}/output
and updated my code as follows:
import os

from airflow.providers.docker.operators.docker import DockerOperator
from docker.types import Mount

pre_step = DockerOperator(
    task_id='pre_step_docker',
    image='alpine',
    api_version='auto',
    command='/bin/touch /output/run_docker_touch.txt',
    auto_remove=True,
    mounts=[
        # APP_OUTPUT_DIR holds the host path set in docker-compose
        Mount(source=os.getenv("APP_OUTPUT_DIR"),
              target='/output',
              type='bind'),
    ],
    mount_tmp_dir=False,
    docker_url='tcp://docker-proxy:2375',
    network_mode='bridge'
)
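One caveat: os.getenv returns None when the variable is missing, which would only surface later as a confusing mount error. A minimal sketch of a guard, assuming a POSIX host; require_host_dir is a hypothetical helper name, not an Airflow or Docker API:

```python
import os
from pathlib import PurePosixPath


def require_host_dir(var_name, environ=None):
    """Return the host directory stored in an environment variable.

    Raises RuntimeError if the variable is unset, empty, or not an absolute
    path, since a bind mount source has to be an absolute path on the host.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(var_name)
    if not value:
        raise RuntimeError(f"{var_name} is not set; check the environment section of docker-compose")
    if not PurePosixPath(value).is_absolute():
        raise RuntimeError(f"{var_name}={value!r} is not an absolute path")
    return value
```

It could then be used as Mount(source=require_host_dir("APP_OUTPUT_DIR"), target='/output', type='bind'), so that a missing variable fails with a clear message instead of a Bad Request from the Docker daemon.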
Hope it helps someone!