 

Airflow + DockerOperator: unable to pass mounts / volumes using the mounts parameter

I am trying to pass a local directory as a volume to Airflow, which in turn should pass it to a DockerOperator task in a DAG.

My airflow-docker-compose.yaml (in the airflow-common section) looks as follows:

  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
    - ./input:/opt/airflow/input
    - ./output:/opt/airflow/output

In the DAG code, I am trying to pass the mounts parameter as follows:

eod_price = DockerOperator(
        task_id='run_docker',
        image='alpine',
        api_version='auto',
        command='/bin/touch /output/run_docker_touch.txt',
        auto_remove=True,
        mounts=[
            Mount(source='/opt/airflow/output',
                  target='/app_base/output',
                  type='volume'),
        ],
        mount_tmp_dir=False,
        docker_url='tcp://docker-proxy:2375',
        network_mode='bridge'
    )

With this code, I get the error:

Bad Request ("create /opt/airflow/output: "/opt/airflow/output" includes invalid characters for a local volume name, only "[a-zA-Z0-9][a-zA-Z0-9_.-]" are allowed. If you intended to pass a host directory, use absolute path")

When I change the mount from type='volume' to type='bind':

mounts=[
            Mount(source='/opt/airflow/output',
                  target='/app_base/output',
                  type='bind'),
        ],

the error changes to:

Bad Request ("invalid mount config for type "bind": bind source path does not exist: /opt/airflow/output")

I opened a bash shell in docker007-airflow-scheduler-1, docker007-airflow-triggerer-1, docker007-airflow-webserver-1 and docker007-airflow-worker-1, and the /opt/airflow/output directory exists in each of these containers. The docker-compose ps output is:

NAME                            COMMAND                  SERVICE             STATUS              PORTS
docker007-airflow-init-1        "/bin/bash -c 'funct…"   airflow-init        exited (0)          
docker007-airflow-scheduler-1   "/usr/bin/dumb-init …"   airflow-scheduler   running (healthy)   8080/tcp
docker007-airflow-triggerer-1   "/usr/bin/dumb-init …"   airflow-triggerer   running (healthy)   8080/tcp
docker007-airflow-webserver-1   "/usr/bin/dumb-init …"   airflow-webserver   running (healthy)   0.0.0.0:8080->8080/tcp
docker007-airflow-worker-1      "/usr/bin/dumb-init …"   airflow-worker      running (healthy)   8080/tcp
docker007-docker-proxy-1        "socat TCP4-LISTEN:2…"   docker-proxy        running             0.0.0.0:2376->2375/tcp
docker007-postgres-1            "docker-entrypoint.s…"   postgres            running (healthy)   5432/tcp
docker007-redis-1               "docker-entrypoint.s…"   redis               running (healthy)   6379/tcp

My end goal is for the output of my dockerized Python app, which writes to /app_base/output, to be visible in the local output directory (the touch command above is only an example).

Asked by Ani, Sep 11 '25 15:09



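1 Answer

OK, I finally figured it out. The bind source must not be the directory inside the Airflow container (/opt/airflow/output) but the original directory on the host from which the Airflow application was started. The DockerOperator sends the mount request to the Docker daemon (here through docker-proxy), which runs outside the Airflow containers and resolves the source path on its own filesystem, so a path that exists only inside the Airflow container "does not exist" as far as the daemon is concerned.

The following code worked: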

pre_step = DockerOperator(
    task_id='pre_step_docker',
    image='alpine',
    api_version='auto',
    command='/bin/touch /output/run_docker_touch.txt',
    auto_remove=True,
    mounts=[
        Mount(source='/Users/me/Documents/Python/GitHub/docker-learn/docker007/output',
              target='/output',
              type='bind'),
    ],
    mount_tmp_dir=False,
    docker_url='tcp://docker-proxy:2375',
    network_mode='bridge'
)

In other words, the DockerOperator task does not need the extra volumes mounted in docker-compose. I have commented out the ones I had added:

  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
#    - ./input:/opt/airflow/input
#    - ./output:/opt/airflow/output
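If you would rather keep writing container-style paths in the DAG, the translation from a container path to the host path can be done in one place. The following is only a minimal sketch: the helper name to_host_path, the COMPOSE_VOLUMES mapping and the /Users/me/project directory are hypothetical and not part of Airflow or Docker; the mapping simply mirrors the two volumes entries shown above.

```python
import posixpath

# Hypothetical mapping that mirrors the compose "volumes" entries:
# container directory -> directory relative to the compose project on the host.
COMPOSE_VOLUMES = {
    "/opt/airflow/input": "input",
    "/opt/airflow/output": "output",
}


def to_host_path(container_path, project_dir, volumes=COMPOSE_VOLUMES):
    """Translate a path inside the Airflow container to the matching host path."""
    container_path = posixpath.normpath(container_path)
    for container_dir, host_rel in volumes.items():
        inside = container_path.startswith(container_dir + "/")
        if container_path == container_dir or inside:
            # Keep whatever follows the mounted directory (may be empty).
            suffix = container_path[len(container_dir):]
            return posixpath.join(project_dir, host_rel) + suffix
    raise ValueError(f"{container_path} is not under a known compose volume")


# project_dir is an assumed example location of the compose project on the host.
print(to_host_path("/opt/airflow/output", "/Users/me/project"))
```

The returned string is what the Docker daemon needs as the bind source, since the daemon only knows host paths.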

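However, the hard-coded host path in the Mount was very ugly to me, so I added new variables to the docker-compose file (in the airflow-common --> environment section). Compose substitutes ${PWD} from the shell in which docker-compose is run, so the values end up being host paths: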

APP_INPUT_DIR: ${PWD}/input
APP_OUTPUT_DIR: ${PWD}/output

and updated my code as follows:

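import os

from airflow.providers.docker.operators.docker import DockerOperator
from docker.types import Mount

pre_step = DockerOperator(
    task_id='pre_step_docker',
    image='alpine',
    api_version='auto',
    command='/bin/touch /output/run_docker_touch.txt',
    auto_remove=True,
    mounts=[
        # APP_OUTPUT_DIR holds the host path set in the compose environment section
        Mount(source=os.getenv("APP_OUTPUT_DIR"),
              target='/output',
              type='bind'),
    ],
    mount_tmp_dir=False,
    docker_url='tcp://docker-proxy:2375',
    network_mode='bridge'
)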
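One caveat with os.getenv is that it returns None when the variable is missing, so a forgotten environment entry only shows up later as a confusing mount error. A small guard can fail earlier with a clearer message. This is a sketch under assumptions: host_dir_from_env is a hypothetical helper name, and the check for a leading "/" assumes a Unix-like host path such as the one in this answer.

```python
import os


def host_dir_from_env(name):
    """Return the host directory stored in environment variable `name`.

    Raises a descriptive error instead of passing None or a relative
    path on to Mount(source=...).
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; define it in the compose environment section"
        )
    # Assumption: the Docker host uses Unix-style absolute paths.
    if not value.startswith("/"):
        raise RuntimeError(f"{name} must be an absolute host path, got {value!r}")
    return value
```

In the DAG, the mount source would then be written as source=host_dir_from_env("APP_OUTPUT_DIR").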

Hope it helps someone!

Answered by Ani, Sep 14 '25 08:09
