I'm trying to figure out how I can get the maximum performance from a multithreaded app.
I have a thread pool which I created like this:
ExecutorService executor = Executors.newFixedThreadPool(8); // I have 8 CPU cores.
My question is, should I divide the work into only 8 runnables/callables, which is the same number as the threads in the thread pool, or should I divide it into say 1000000 runnables/callables?
List<Future<Long>> list = new ArrayList<>(); // holds the futures so the results can be summed below
for (int i = 0; i < 1000000; i++)
{
    Callable<Long> worker = new MyCallable(); // Each worker does little work.
    Future<Long> submit = executor.submit(worker);
    list.add(submit);
}
long sum = 0;
for (Future<Long> future : list)
    sum += future.get(); // Much more overhead from the for loops
OR
List<Future<Long>> list = new ArrayList<>(); // holds the futures so the results can be summed below
for (int i = 0; i < 8; i++)
{
    Callable<Long> worker = new MyCallable(); // Each worker does much more work.
    Future<Long> submit = executor.submit(worker);
    list.add(submit);
}
long sum = 0;
for (Future<Long> future : list)
    sum += future.get(); // Negligible overhead from the for loops
Dividing into 1000000 callables seems slower to me, since there is the overhead of instantiating all these callables and collecting results from them in for loops. On the other hand, if I have 8 callables, this overhead is negligible. And since I have only 8 threads, I can't run 1000000 callables at the same time, so there is no performance gain from there.
Am I right or wrong?
BTW I could test these cases but the operation is very trivial and I guess the compiler realizes that and makes some optimizations. So the result might be misleading. I want to know which approach is better for something like an image processing app.
There is no straightforward answer to this question because it depends on a lot of things, such as your code, application logic, the maximum concurrency possible, hardware, etc.
But when considering concurrency, you should keep the following in mind:
Every thread needs its own private stack, so if you create a large number of threads, the threads themselves can consume more memory than the application's actual work.
Threads should perform tasks that are independent and can run in parallel.
Find the code paths that can actually be executed in parallel without any dependencies; otherwise threading will not help much.
What is your hardware configuration?
The maximum number of threads that can execute concurrently equals the total number of CPU cores. If you have few cores and a huge number of threads, the CPU spends more time switching between tasks than running them. This can badly hamper performance.
All in all, your second approach looks good to me, but if possible find more parallelism, and you can extend it up to 20-30.
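As a rough illustration of the middle ground between 8 and 1000000 tasks, the sketch below splits an array into a small multiple of the core count and submits one Callable per chunk. The class name `ChunkedSum`, the method `parallelSum`, and the factor of 4 chunks per core are assumptions chosen for this example, not something taken from the question; the right chunk count depends on the workload.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedSum {
    // Sums the array by splitting it into at most `chunks` contiguous ranges,
    // with one Callable per range (hypothetical helper for illustration).
    static long parallelSum(long[] data, int chunks, ExecutorService executor) throws Exception {
        List<Callable<Long>> tasks = new ArrayList<>();
        int chunkSize = (data.length + chunks - 1) / chunks; // ceiling division
        for (int start = 0; start < data.length; start += chunkSize) {
            final int from = start;
            final int to = Math.min(start + chunkSize, data.length);
            tasks.add(() -> {
                long partial = 0; // accumulate locally, no shared state
                for (int i = from; i < to; i++) {
                    partial += data[i];
                }
                return partial;
            });
        }
        long sum = 0;
        // invokeAll blocks until every task has completed
        for (Future<Long> future : executor.invokeAll(tasks)) {
            sum += future.get();
        }
        return sum;
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService executor = Executors.newFixedThreadPool(cores);
        try {
            long[] data = new long[1_000_000];
            for (int i = 0; i < data.length; i++) {
                data[i] = i;
            }
            // Assumption: a few chunks per core, to smooth out uneven chunk run times
            System.out.println(parallelSum(data, cores * 4, executor));
        } finally {
            executor.shutdown();
        }
    }
}
```

Using a few more chunks than threads keeps the per-task overhead small while letting a thread that finishes early pick up another chunk instead of sitting idle.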
There are two aspects to this question.
First, you have the technical Java stuff. As you already have a few answers about this, I'll summarize the basics:
A thread should do more work than it costs to create and manage it, i.e. having N threads counting to 10 would be much slower, as the overhead of creating and managing the extra threads is higher than the benefit of counting to 10 in parallel.
Threads calling a synchronized increment method would be much slower.
Threads do take up resources, most commonly memory. The more threads you have, the more difficult it becomes to estimate your memory usage, and it might affect GC timing (rare, but I've seen it happen).
Secondly, you have the scheduling theory. You need to consider what your program is doing:
Threads for blocking I/O operations. You don't want your program to wait for the network or HDD if you could be using your CPU for other tasks.
Scheduling waiting time approximation. This is very theoretical because there are many variables not explained, but it can help in designing threaded programs. (Also, in the example above, since you are waiting on the Futures, you most likely don't care about average response times.)
Thread pools. Using multiple pools can cause deadlocks (if dependencies are introduced between the two pools) and make it hard to optimize (contention can be created among the pools, and getting the sizes right might become impossible).
--EDIT--
Finally, if it helps, the way I think about performance is that I have 4 primary resources: CPU, RAM, disk and network. I try to find which one is my bottleneck and use non-saturated resources to optimize. For example, if I have lots of idle CPU and low memory, I might compress my in-memory data. If I have lots of disk I/O and large memory, I cache more data. If network resources (not the actual network connection) are slow, I use many threads to parallelize. Once you saturate a resource type on your critical path and can't use other resources to speed it up, you've reached your maximum performance, and you need to upgrade your H/W to get faster results.
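To make the earlier point about threads calling a synchronized increment method more concrete, here is a minimal sketch that contrasts two ways of counting. In `countShared`, every increment contends for one lock; in `countLocal`, each task accumulates in a local variable and the results are merged once at the end. The class and method names are hypothetical and exist only for this example; both versions return the same total, and how large the speed difference is will depend on the machine and the JVM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class LocalAccumulation {
    private long shared = 0;

    // Every caller has to acquire the same monitor for each single increment.
    private synchronized void increment() { shared++; }

    private synchronized long get() { return shared; }

    // Contended version: all tasks update one shared, lock-protected counter.
    static long countShared(int tasks, int perTask, ExecutorService executor) throws Exception {
        LocalAccumulation counter = new LocalAccumulation();
        List<Callable<Void>> work = new ArrayList<>();
        for (int t = 0; t < tasks; t++) {
            work.add(() -> {
                for (int i = 0; i < perTask; i++) {
                    counter.increment();
                }
                return null;
            });
        }
        for (Future<Void> future : executor.invokeAll(work)) {
            future.get(); // propagate any exception thrown by a task
        }
        return counter.get();
    }

    // Uncontended version: each task counts locally and returns its partial result.
    static long countLocal(int tasks, int perTask, ExecutorService executor) throws Exception {
        List<Callable<Long>> work = new ArrayList<>();
        for (int t = 0; t < tasks; t++) {
            work.add(() -> {
                long local = 0;
                for (int i = 0; i < perTask; i++) {
                    local++;
                }
                return local;
            });
        }
        long total = 0;
        for (Future<Long> future : executor.invokeAll(work)) {
            total += future.get(); // merge the partial results once per task
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(8);
        try {
            System.out.println(countShared(8, 100_000, executor));
            System.out.println(countLocal(8, 100_000, executor));
        } finally {
            executor.shutdown();
        }
    }
}
```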