Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Recursive parallel function not using all cores

I recently implemented a recursive negamax algorithm, which I parallelized using OpenMP.

The interesting part is this:

#pragma omp parallel for
for (int i = 0; i < (int) pos.size(); i++)
{
    int val = -negamax(pos[i].first, -player, depth - 1).first;

    #pragma omp critical
    if (val >= best)
    {
        best = val;
        move = pos[i].second;
    }
}

On my Intel Core i7 (4 physical cores and hyper threading), I observed something very strange: while running the algorithm, it was not using all 8 available threads (logical cores), but only 4.

Can anyone explain why is it so? I understand the reasons the algorithm doesn't scale well, but why doesn't it use all the available cores?

EDIT: I changed thread to core as it better express my question.

like image 850
kaspersky Avatar asked Jun 25 '26 14:06

kaspersky


2 Answers

First, check whether you have enough iteration count, pos.size(). Obviously this should be a sufficient number.


Recursive parallelism is an interesting pattern, but it may not work very well with OpenMP, unless you're using OpenMP 3.0's task, Cilk, or TBB. There are several things that need to be considered:

(1) In order to use a recursive parallelism, you mostly need to explicitly call omp_set_nested(1). AFAIK, most implementations of OpenMP do not recursively spawn parallel for, because it may end up creating thousands physical threads, just exploding your operating system.

Until OpenMP 3.0's task, a OpenMP has a sort of 1-to-1 mapping of logical parallel task to a physical task. So, it won't work well in such recursive parallelism. Try out, but don't be surprised if even thousands threads are created!

(2) If you really want to use recursive parallelism with a traditional OpenMP, you need to implement code that controls the number of active threads:

if (get_total_thread_num() > TOO_MANY_THREADS) {
  // Do not use OpenMP
  ...
} else {
#pragma omp parallel for
  ...
}

(3) You may consider OpenMP 3.0's task. In your code, there could be huge number of concurrent tasks due to a recursion. To be efficiently working on a parallel machine, there must be an efficient mapping algorithm these logical concurrent tasks to physical threads (or logical processor, core). A raw recursive parallelism in OpenMP will create actual physical threads. OpenMP 3.0's task does not.

You may refer to my previous answer related to a recursive parallelism: C OpenMP parallel quickSort.

(4) Intel's Cilk Plus and TBB support full nested and recursive parallelism. In my small test program, the performance was far better than OpenMP 3.0. But, that was 3 years ago. You should check the latest OpenMP's implementation.


I have not a detailed knowledge of negamax and minimax. But, my gut says that using recursive pattern and a lock are unlikely to give a speedup. A simple Google search gives me: http://supertech.csail.mit.edu/papers/dimacs94.pdf

"But negamax is not a efficient serial search algorithm, and thus, it makes little sense to parallelize it."

like image 184
minjang Avatar answered Jun 28 '26 05:06

minjang


Optimal parallelism level has some additional considerations except as much threads as available. For example, operation systems used to schedule all threads of a single process to a single processor to optimize cache performance (unless the programmer changed it explicitly).

I guess OpenMP makes similar considerations when executing such code and you cannot always assume the maximum thread number is executed/

like image 41
MaMazav Avatar answered Jun 28 '26 06:06

MaMazav



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!