I have this C++ code.
Loop goes throgh the matrix, finds the min element in each row and subtracts it from each element of corresponding row. Variable myr is a summ of all min elements
Trying to parallel for:
int min = 0;
int myr = 0;
int temp[SIZE][SIZE];
int size = 0;
...//some initialization
omp_set_num_threads(1);
start_time = omp_get_wtime();
#ifdef _OPENMP
#pragma omp parallel for firstprivate(min, size) reduction(+:myr)
#endif
for(int i = 0; i < size; i++){
min = INFINITY;
for(int j = 0; j < size; j++){
if (temp[i][j] < min)
min = temp[i][j];
}
myr+=min;
for(int j = 0; j < size; j++)
temp[i][j]-=min;
}
end_time = omp_get_wtime();
if I set omp_set_num_threads(2); this part of code starts working slower.
My proc has 2 cores
Why code works slower with 2 threads?
There must be some aliasing or something going on. Make things simpler for OpenMP:
int const size0 = size;
#ifdef _OPENMP
#pragma omp parallel for reduction(+:myr)
#endif
for(int i = 0; i < size0; i++){
int min = INFINITY;
int * tmp = temp[i];
for(int j = 0; j < size0; j++){
if (tmp[j] < min)
min = tmp[j];
}
for(int j = 0; j < size0; j++)
tmp[j]-=min;
myr+=min;
}
That is, have most of the variables local and const if you may.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With