Determining when a ThreadPool has finished processing a queue

I am trying to implement a thread pool that processes a task queue, using ThreadPool and Queue. It begins with an initial queue of tasks, and each task may push additional tasks onto the queue. The problem is that I don't know how to block until the queue is empty and the pool has finished processing, while still checking the queue and submitting any newly pushed tasks to the pool. I can't simply call pool.close() and pool.join(), because I need to keep the pool open for new tasks.

For example:

from multiprocessing.pool import ThreadPool
from Queue import Queue
from random import random
import time
import threading

queue = Queue()
pool = ThreadPool()
stdout_lock = threading.Lock()

def foobar_task():
    with stdout_lock: print "task called" 
    if random() > .25:
        with stdout_lock: print "task appended to queue"
        queue.put(foobar_task)
    time.sleep(1)

# set up initial queue
for n in range(5):
    queue.put(foobar_task)

# run the thread pool
while not queue.empty():
    task = queue.get() 
    pool.apply_async(task)

with stdout_lock: print "pool is closed"
pool.close()
pool.join()

This outputs:

pool is closed
task called
task appended to queue
task called
task appended to queue
task called
task appended to queue
task called
task appended to queue
task called
task appended to queue

The while loop exits before any of the foobar_tasks have appended new tasks to the queue, so the appended tasks are never submitted to the thread pool. I can't find any way to determine whether the thread pool still has active worker threads. I tried the following:

while not queue.empty() or any(worker.is_alive() for worker in pool._pool):
    if not queue.empty():
        task = queue.get() 
        pool.apply_async(task)
    else:   
        with stdout_lock: print "waiting for worker threads to complete..."
        time.sleep(1)

But it seems that worker.is_alive() always returns True, so this goes into an infinite loop.
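That is expected: the pool's worker threads sit in a loop blocking on the pool's internal task queue, so they stay alive until the pool itself is closed, whether or not they are doing any work. A quick check (again poking at the private pool._pool attribute):

from multiprocessing.pool import ThreadPool

pool = ThreadPool(4)

# No tasks have been submitted, yet every worker already reports alive,
# because each one is blocked waiting on the pool's internal task queue.
print(all(worker.is_alive() for worker in pool._pool))   # True

pool.close()
pool.join()
print(all(worker.is_alive() for worker in pool._pool))   # False, only after close() and join()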

Is there a better way to do this?

asked Nov 28 '25 by del

1 Answer

  1. Call queue.task_done() after each task is processed.
  2. Then you can call queue.join() to block the main thread until all tasks have been completed.
  3. To terminate the worker threads, put a sentinel (e.g. None) in the queue for each worker, and have foobar_task break out of its while-loop when it receives the sentinel.
  4. I think this is easier to implement with plain threading.Thread objects than with a ThreadPool; for comparison, a ThreadPool-based sketch follows the code below.

import random
import time
import threading
import logging
import Queue

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.DEBUG)

sentinel = None
queue = Queue.Queue()
num_threads = 5

def foobar_task(queue):
    while True:
        n = queue.get()
        logger.info('task called: {n}'.format(n=n))
        if n is sentinel:
            break
        n = random.random()
        if n > .25:
            logger.info("task appended to queue")
            queue.put(n)
        # Mark this item as processed so queue.join() can return once
        # every put() has been matched by a task_done().
        queue.task_done()

# set up initial queue
for i in range(num_threads):
    queue.put(i)

threads = [threading.Thread(target=foobar_task, args=(queue,))
           for n in range(num_threads)]
for t in threads:
    t.start()

# Blocks until every item (including the ones the tasks added themselves)
# has been processed and marked done.
queue.join()

# The sentinels are put after join() returns, so they are not counted;
# they just tell each worker to break out of its loop.
for i in range(num_threads):
    queue.put(sentinel)

for t in threads:
    t.join()
logger.info("threads are closed")
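If you would rather keep the original ThreadPool, the same bookkeeping can be done by counting tasks that are still in flight. This is only a sketch, not part of the answer above, and the outstanding counter and task_finished callback are illustrative names; the submit loop exits only once the queue is empty and nothing is still running:

from multiprocessing.pool import ThreadPool
from Queue import Queue, Empty
from random import random
import threading
import time

queue = Queue()
pool = ThreadPool()
outstanding = [0]            # tasks handed to the pool but not yet finished
lock = threading.Lock()

def foobar_task():
    print "task called"
    if random() > .25:
        print "task appended to queue"
        queue.put(foobar_task)
    time.sleep(1)

def task_finished(result):
    # Runs in the pool's result-handler thread after a task returns.
    # Note: apply_async only fires the callback on success, so a task
    # that raises would leave the counter stuck.
    with lock:
        outstanding[0] -= 1

# set up initial queue
for n in range(5):
    queue.put(foobar_task)

while True:
    try:
        task = queue.get(timeout=0.1)    # poll so the counter can be re-checked
    except Empty:
        with lock:
            # Every submitted task has finished (so any follow-up work it
            # queued is already visible) and the queue is still empty: done.
            if outstanding[0] == 0 and queue.empty():
                break
        continue
    with lock:
        outstanding[0] += 1
    pool.apply_async(task, callback=task_finished)

pool.close()
pool.join()
print "pool is closed"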
answered Dec 01 '25 by unutbu

