
Behavior of #pragma omp parallel with for loops

I don't quite understand the behavior of OpenMP parallel constructs with nested for loops. Consider the following code:

std::size_t idx;
std::size_t idx2;
omp_set_num_threads( 2 );

#pragma omp parallel default(shared) private(idx, idx2)
{

  for(std::size_t idx=0;idx<3;idx++)
  {
    for(std::size_t idx2=0;idx2<4;idx2++)
    {
      LOG("From thread "+std::to_string(omp_get_thread_num())+" idx "+std::to_string(idx)+" idx2 "+std::to_string(idx2));
    }
  }
}

This produces the following output:

From thread 0 idx 0 idx2 0
From thread 1 idx 0 idx2 0
From thread 0 idx 0 idx2 1
From thread 1 idx 0 idx2 1
From thread 0 idx 0 idx2 2
From thread 1 idx 0 idx2 2
From thread 0 idx 0 idx2 3
From thread 1 idx 0 idx2 3
From thread 0 idx 1 idx2 0
From thread 1 idx 1 idx2 0
From thread 0 idx 1 idx2 1
From thread 1 idx 1 idx2 1
From thread 0 idx 1 idx2 2
From thread 1 idx 1 idx2 2
From thread 0 idx 1 idx2 3
From thread 1 idx 1 idx2 3
From thread 0 idx 2 idx2 0
From thread 1 idx 2 idx2 0
From thread 0 idx 2 idx2 1
From thread 1 idx 2 idx2 1
From thread 0 idx 2 idx2 2
From thread 1 idx 2 idx2 2
From thread 0 idx 2 idx2 3
From thread 1 idx 2 idx2 3

What seems to happen above is that each of the 2 threads executes both nested loops in full, which produces the output above (2*3*4=24 log messages in total). That is straightforward.

But now consider the following code, where the inner for loop is marked with #pragma omp for:

std::size_t idx;
std::size_t idx2;    
omp_set_num_threads( 2 );

#pragma omp parallel default(shared) private(idx, idx2)
{

  for(std::size_t idx=0;idx<3;idx++)
  {
    #pragma omp for
    for(std::size_t idx2=0;idx2<4;idx2++)
    {
      LOG("From thread "+std::to_string(omp_get_thread_num())+" idx "+std::to_string(idx)+" idx2 "+std::to_string(idx2));
    }
  }
}

This produces the following 3*4=12 log messages:

From thread 0 idx 0 idx2 0
From thread 1 idx 0 idx2 2
From thread 0 idx 0 idx2 1
From thread 1 idx 0 idx2 3
From thread 0 idx 1 idx2 0
From thread 1 idx 1 idx2 2
From thread 0 idx 1 idx2 1
From thread 1 idx 1 idx2 3
From thread 0 idx 2 idx2 0
From thread 0 idx 2 idx2 1
From thread 1 idx 2 idx2 2
From thread 1 idx 2 idx2 3

I would have expected both threads to again execute the nested for loops in full and produce 24 output messages. Why is the output different in these two cases?

asked Dec 06 '25 03:12 by astrophobia


1 Answer

In the first case #pragma omp parallel runs the entire parallel region once on each thread. This means both threads run both for loops in full, so each thread generates 4*3=12 lines of output, for 2*12=24 lines in total.
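As a rough sketch of that first case, the snippet below replaces the LOG calls with a shared counter, so the replication can be checked without reading interleaved output. The function name count_replicated and the team_size out-parameter are made up for this illustration. The _OPENMP guard is there so the code still builds (and runs serially) if OpenMP is not enabled.

```cpp
#include <cstddef>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: counts how often the innermost loop body runs inside a
// plain parallel region with no worksharing construct.
// team_size receives the number of threads that actually ran the region
// (1 if the code was compiled without OpenMP support).
int count_replicated(int& team_size) {
    int executions = 0;
    team_size = 1;

    #pragma omp parallel num_threads(2) default(shared)
    {
#ifdef _OPENMP
        // One thread records the real team size; num_threads(2) is only a request.
        #pragma omp single
        team_size = omp_get_num_threads();
#endif
        // Every thread in the team executes these loops in full.
        for (std::size_t idx = 0; idx < 3; idx++) {
            for (std::size_t idx2 = 0; idx2 < 4; idx2++) {
                #pragma omp atomic
                executions++;
            }
        }
    }
    return executions;
}
```

Built with OpenMP enabled (for example g++ -fopenmp), the returned count should equal team_size * 12, which is 24 when two threads are granted.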

In the second case, the inner #pragma omp for tells OpenMP to split the iterations of the inner loop on idx2 among the threads of the team. The outer loop on idx is still executed in full by both threads, but instead of both threads running the inner loop over every idx2 from 0 to 3, each iteration of the inner loop is executed exactly once, by one of the threads. That gives 3*4=12 lines in total.

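In the second output we should therefore see every value of idx2 printed exactly once for each value of idx. Which thread prints which value depends on the loop schedule. Without a schedule clause the default is implementation-defined; your output (thread 0 always gets idx2 0 and 1, thread 1 always gets 2 and 3) is consistent with a static schedule.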

e.g. if idx could only be zero the output might look something like:

From thread ? idx 0 idx2 0
From thread ? idx 0 idx2 1
From thread ? idx 0 idx2 2
From thread ? idx 0 idx2 3

where ? means it could be any available thread.
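The "exactly once" claim can be checked with a small sketch along the same lines. It records how many times each (idx, idx2) pair runs instead of logging it. The Hits struct and the function name count_workshared are invented for this example; the loop structure mirrors the second snippet in the question.

```cpp
#include <cstddef>

// Hypothetical record of how many times each (idx, idx2) iteration ran.
struct Hits {
    int count[3][4] = {};
};

Hits count_workshared() {
    Hits hits;

    #pragma omp parallel num_threads(2) default(shared)
    {
        // The outer loop is not workshared: every thread runs all 3 iterations.
        for (std::size_t idx = 0; idx < 3; idx++) {
            // The inner loop's 4 iterations are divided among the team,
            // with an implicit barrier at the end of the loop.
            #pragma omp for
            for (std::size_t idx2 = 0; idx2 < 4; idx2++) {
                #pragma omp atomic
                hits.count[idx][idx2]++;
            }
        }
    }
    return hits;
}
```

Every entry of hits.count should come out as 1, regardless of how many threads the runtime actually provides, which matches the 12 log lines in the second output.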