Threadpool Deadlock: designing against or detecting

I hope this isn't overly broad; my question is "How do I design a service with multiple thread pools so that it can't deadlock itself?".

I own a web service which fans out to 100s of threads on a single user request, to perform data aggregation with low latency. There are a number of ExecutorServices wrapping fixed-thread pools sprinkled throughout my service, and I need help solving an interesting way this can create deadlock.

I have a thread pool A that holds the threads making network requests, and another thread pool B that holds their "owning" threads: aggregation pieces of business logic, each of which might fan out into a handful of requests. Additionally, threads in B occasionally submit work back to thread pool B, when an aggregation can be done by combining 3 simpler sub-aggregations.

This pattern is the problem. Consider a style of request x submitted to B which causes an additional request x' to be submitted to B, and suppose B is a fixed thread pool of 50 threads. When 50 requests of type x come in at the same time, all threads in B are used to handle them. Each of them submits its x' to B, where it sits in the queue waiting for a thread that never frees up, because every thread in B is blocked waiting on a queued x'. All processing of all requests then sits in deadlock for 60 seconds until a timeout is hit and the x requests all return Exceptions.
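To make the scenario concrete, here is a minimal, scaled-down sketch of it. The class name, the pool size of 4, and the 200 ms timeout are illustrative assumptions standing in for the real 50 threads and 60 seconds; it is not code from the actual service.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

class PoolStarvationDemo {
    /**
     * Fills a fixed pool with "x" tasks that each submit an "x'" task to the
     * same pool and block on it. Returns how many x tasks hit the timeout.
     */
    static int countTimedOutParents(int poolSize, long timeoutMillis) throws Exception {
        ExecutorService poolB = Executors.newFixedThreadPool(poolSize);
        CountDownLatch allParentsRunning = new CountDownLatch(poolSize);
        List<Future<Boolean>> parents = new ArrayList<>();
        try {
            for (int i = 0; i < poolSize; i++) {
                parents.add(poolB.submit(() -> {
                    // Wait until every worker thread is occupied by an x task.
                    allParentsRunning.countDown();
                    allParentsRunning.await();
                    // x' goes into the queue; no thread is free to run it.
                    Future<String> child = poolB.submit(() -> "x'");
                    try {
                        child.get(timeoutMillis, TimeUnit.MILLISECONDS);
                        return false; // child finished in time
                    } catch (TimeoutException e) {
                        child.cancel(true);
                        return true;  // starved: gave up waiting
                    }
                }));
            }
            int timedOut = 0;
            for (Future<Boolean> parent : parents) {
                if (parent.get()) timedOut++;
            }
            return timedOut;
        } finally {
            poolB.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("x tasks that timed out: " + countTimedOutParents(4, 200));
    }
}
```

No x' can start until some x gives up, so at least one x (and usually all of them) times out. Once the first x returns, its thread may pick up a queued x' just before another x's timeout fires, which is why the exact count is not guaranteed.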

Things I've considered/tried:

  • Tweak numbers. Maximum users who can connect is 50, threads in B is 100. Prevents the problem, but seems like a hack that will break when another dev tweaks unrelated numbers in a year and no one can figure out why we lock up once a week under load. I want to solve this in the design.
  • B submits fanned out work to B', a new threadpool. Doesn't work because this fan-out can potentially go multiple steps (do I create B'', B''', ...?)
  • B has no max threads. Possibly acceptable, seems dangerous.
  • Another model (more callbackish?) where threads don't submit and wait for the same unit of work; rather they submit work and submit a "callback" into the "run-after" pool. This way nothing can wait for something in its own pool. Is there precedent, is this a good idea?
  • Collapse all thread pools together and remove the max?
Cory Kendall asked Jan 24 '26 20:01


1 Answer

Your "more callbackish" idea seems like it would mostly be solved for you by the CompletionStage API in Java 8. The lack of a "runAfterAllAsync" method on CompletionStage means you may have to do some external work to get something to happen after your group of 3 subtasks, but this is where I'd start to look. This tutorial has an example that may be of some help.
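As a rough sketch of what that could look like: the example below uses CompletableFuture (the standard CompletionStage implementation) and its static allOf method as one way to do the "after all 3 subtasks" step. The class name, the fetch/aggregate/handleRequest helpers, and the arithmetic are hypothetical placeholders, not anything from your service.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class CallbackAggregation {
    // Hypothetical leaf work standing in for a network request.
    static CompletableFuture<Integer> fetch(int id, Executor pool) {
        return CompletableFuture.supplyAsync(() -> id * 10, pool);
    }

    // Combines 3 sub-results; the combining step is scheduled only after all 3 finish.
    static CompletableFuture<Integer> aggregate(int base, Executor pool) {
        List<CompletableFuture<Integer>> parts = IntStream.range(0, 3)
                .mapToObj(i -> fetch(base + i, pool))
                .collect(Collectors.toList());
        return CompletableFuture.allOf(parts.toArray(new CompletableFuture[0]))
                // join() does not block here: every part is already complete.
                .thenApplyAsync(ignored -> parts.stream().mapToInt(f -> f.join()).sum(), pool);
    }

    // The "x" task runs on the pool, submits more work to the same pool, and
    // returns immediately instead of waiting for that work.
    static CompletableFuture<Integer> handleRequest(int base, Executor pool) {
        CompletableFuture<CompletableFuture<Integer>> started =
                CompletableFuture.supplyAsync(() -> aggregate(base, pool), pool);
        return started.thenCompose(f -> f); // flatten the nested future
    }

    // Sends `requests` requests through a pool of `poolSize` threads and sums the results.
    static int runDemo(int poolSize, int requests) throws Exception {
        ExecutorService poolB = Executors.newFixedThreadPool(poolSize);
        try {
            List<CompletableFuture<Integer>> all = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                all.add(handleRequest(i, poolB));
            }
            int sum = 0;
            for (CompletableFuture<Integer> f : all) {
                // Only the caller's thread waits; it is not a pool thread.
                sum += f.get(5, TimeUnit.SECONDS);
            }
            return sum;
        } finally {
            poolB.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runDemo(2, 50));
    }
}
```

Because no task running on the pool ever waits for another task in the same pool, 50 concurrent requests complete here even with only 2 threads. The trade-off is that the business logic has to be written as chained stages rather than straight-line blocking code.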

Sbodd answered Jan 26 '26 08:01