 

Run Apache Airflow DAG without Apache Airflow

Tags:

airflow

So here's a stupid idea...

I created many DAGs in Airflow, and they work. However, I would like to package them up somehow so that I can run a single DAG Run without having Airflow installed; i.e. have it self-contained so I don't need the web server, database, etc.

I mostly instantiate new DAG Runs with trigger_dag anyway, and I've noticed that the overhead of running Airflow appears quite high (workers have high loads while doing essentially nothing, and it can sometimes take tens of seconds before dependent tasks are queued, etc.).

I'm not too bothered about all the logging, etc.

asked Aug 31 '25 by yee379

2 Answers

You can create a script that executes Airflow operators, although this loses all the metadata that Airflow provides. You still need to have Airflow installed as a Python package, but you don't need to run the webserver, scheduler, etc. A simple example could look like this:

from dags.my_dag import operator1, operator2, operator3

def main():
    # Execute the pipeline in dependency order:
    # operator1 -> operator2 -> operator3
    # Note: an empty context only works for operators that
    # don't read task-instance information from it.
    operator1.execute(context={})
    operator2.execute(context={})
    operator3.execute(context={})

if __name__ == "__main__":
    main()
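If you want to drop the Airflow dependency entirely, the same idea can be sketched as a tiny standalone runner that executes plain Python callables in dependency order. This is hypothetical illustration code, not part of Airflow; the `run_pipeline` helper and the task names are made up:

```python
from graphlib import TopologicalSorter

def run_pipeline(tasks, deps):
    """Run callables from `tasks` in an order that respects `deps`.

    tasks: dict mapping task name -> zero-argument callable
    deps:  dict mapping task name -> set of upstream task names
    """
    for name in TopologicalSorter(deps).static_order():
        tasks[name]()

# Example pipeline: extract -> transform -> load
results = []
tasks = {
    "extract":   lambda: results.append("extracted"),
    "transform": lambda: results.append("transformed"),
    "load":      lambda: results.append("loaded"),
}
deps = {"transform": {"extract"}, "load": {"transform"}}
run_pipeline(tasks, deps)
```

This keeps the dependency-ordering behavior of a DAG without any scheduler, database, or web server, at the cost of everything else Airflow gives you (retries, scheduling, state tracking).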
answered Sep 05 '25 by Paulius Venclovas

It sounds like your main concern is the resources wasted by idling workers more than the overhead of Airflow itself.

I would suggest running Airflow with the LocalExecutor on a single box. This will give you the benefits of concurrent execution without the hassle of managing workers.
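A minimal sketch of the relevant airflow.cfg settings, assuming a local Postgres instance (the connection string is a placeholder; note that in recent Airflow versions the connection setting lives under the [database] section, while older versions keep it under [core]):

```ini
[core]
executor = LocalExecutor

[database]
# Placeholder connection string; adjust user, password, host, and db name.
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost:5432/airflow
```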

As for the database: there is no way to remove the database component without modifying the Airflow source itself. One alternative is to use the SequentialExecutor with SQLite, but this removes the ability to run concurrent tasks and is not recommended for production.
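For completeness, the SQLite setup mentioned above might look like this (the file path is a placeholder; tasks will run strictly one at a time):

```ini
[core]
executor = SequentialExecutor

[database]
# SQLite needs no server, but supports only sequential task execution.
sql_alchemy_conn = sqlite:////home/user/airflow/airflow.db
```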

answered Sep 05 '25 by andscoop