This question is about using multiple remote Celery workers on separate machines. The app works roughly like this:

My app (the producer) will add multiple tasks (say 50) to the queue every 5 minutes (imagine a Python for loop iterating over a list of tasks to be performed asynchronously at each 5-minute interval). I want the Celery workers (which will be remote machines) to pick these tasks up as soon as they are pushed.
My question is: will Celery/RabbitMQ handle task distribution automatically, or does this have to be configured or programmed in the settings? Specifically, I want to ensure that no worker picks up a task that another worker has already taken from the queue (so work is not duplicated), and that tasks are distributed evenly so no worker sits idle while other workers are busy.
I would appreciate it if someone could point me to relevant documentation (I checked the Celery docs but couldn't find this specific information about remote Celery workers in this context).
Yes, this is handled automatically, but you need to be aware of the prefetching feature, which is described here: http://docs.celeryproject.org/en/latest/userguide/optimizing.html#prefetch-limits (read to the end of the page).
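To illustrate the "no duplication" part, here is a small stand-in using only the Python standard library. It is not Celery or RabbitMQ code: the shared `queue.Queue` plays the role of the broker queue, the threads play the role of workers, and the names `run_workers`, `num_tasks` and `num_workers` are made up for this sketch. The point is only that when several consumers pull from one queue, each item is handed to exactly one of them.

```python
import queue
import threading


def run_workers(num_tasks=50, num_workers=4):
    """Drain one shared queue with several competing consumers.

    Returns a dict mapping worker id -> list of task ids it handled.
    """
    tasks = queue.Queue()
    for task_id in range(num_tasks):
        tasks.put(task_id)

    # Each worker appends only to its own list, so no extra locking is needed.
    handled = {worker_id: [] for worker_id in range(num_workers)}

    def worker(worker_id):
        while True:
            try:
                # Taking an item removes it from the queue for everyone else.
                task_id = tasks.get_nowait()
            except queue.Empty:
                return
            handled[worker_id].append(task_id)

    threads = [
        threading.Thread(target=worker, args=(worker_id,))
        for worker_id in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return handled


handled = run_workers()
all_ids = [task_id for ids in handled.values() for task_id in ids]
```

Every task id ends up in exactly one worker's list. How evenly the tasks are spread across workers is a separate matter, and that is where prefetching comes in.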
In short, prefetching works on two levels: the worker level and the process level, since a worker may have multiple processes. To effectively disable prefetching at the worker level (so that each process reserves only one task at a time), set worker_prefetch_multiplier = 1 in the Celery settings; to disable it at the process level, pass the -Ofair option on the worker's command line.
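As a rough sketch, the two settings above could look like this. The module name `celeryconfig.py` and the app name `proj` are assumptions for illustration; a settings module like this can be loaded with `app.config_from_object("celeryconfig")`, but use whatever mechanism your project already has.

```python
# celeryconfig.py (hypothetical settings module name)

# Worker level: each worker process reserves only one task at a time
# instead of prefetching a batch of messages from the broker.
worker_prefetch_multiplier = 1

# Process level: this is a command-line option, not a setting.
# Start each remote worker with something like the following,
# where "proj" is a placeholder for your own app module:
#
#   celery -A proj worker -Ofair
```

With many short tasks, a multiplier of 1 can reduce throughput, so the docs page linked above is worth reading before applying this everywhere.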