When using LocalExecutor with a MySQL backend, running airflow scheduler on my CentOS 6 box creates 33 scheduler processes, e.g.:
deploy   55362 13.5  1.8 574224 73272 ?        Sl   18:59   7:42 /usr/local/bin/python2.7 /usr/local/bin/airflow scheduler
deploy   55372  0.0  1.5 567928 60552 ?        Sl   18:59   0:00 /usr/local/bin/python2.7 /usr/local/bin/airflow scheduler
deploy   55373  0.0  1.5 567928 60540 ?        Sl   18:59   0:00 /usr/local/bin/python2.7 /usr/local/bin/airflow scheduler
...
These are distinct from Executor processes and gunicorn master and worker processes.
Running it with the SequentialExecutor (SQLite backend) just kicks off one scheduler process.
Airflow still works (DAGs are getting run), but the sheer number of these processes makes me think something is wrong.
When I run select * from job where state = 'running'; in the database, only 5 SchedulerJob rows get returned. 
Is this normal?
Yes, this is normal. These are scheduler processes. You can control how many there are with the following parameter in airflow.cfg:
# The amount of parallelism as a setting to the executor. This defines
# the max number of task instances that should run simultaneously
# on this airflow installation
parallelism = 32
These are spawned by the main scheduler process, whose PID can be found in the airflow-scheduler.pid file.
So 32 spawned processes plus the 1 main scheduler process gives the 33 processes that you are seeing.
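To make the arithmetic concrete, here is a minimal sketch (Python 3, standard library only) that reads the parallelism value from config text and adds one for the main scheduler. The sample config string and the function name expected_scheduler_processes are made up for illustration; on a real box you would read your own airflow.cfg, and the count only matches if the "parallelism + 1" explanation above holds for your Airflow version.

```python
import configparser

# Hypothetical stand-in for the relevant part of airflow.cfg.
SAMPLE_CFG = """
[core]
executor = LocalExecutor
# The amount of parallelism as a setting to the executor.
parallelism = 32
"""


def expected_scheduler_processes(cfg_text):
    """Return parallelism + 1 (spawned processes plus the main scheduler)."""
    # interpolation=None so that '%' characters elsewhere in the file
    # are not treated as interpolation syntax.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    parallelism = parser.getint("core", "parallelism")
    return parallelism + 1


print(expected_scheduler_processes(SAMPLE_CFG))  # prints 33
```

Lowering parallelism in airflow.cfg and restarting the scheduler should reduce the number of processes shown by ps accordingly.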
Hope this clears up your doubt.
Cheers!