In the following example I call pthread_join() for both threads at the end (just before I print the sum). I expect the sum to be 0, but it prints an arbitrary value. I know that if I call pthread_join(id1,NULL) just before creating the second thread it works fine (it does), but I don't understand why it doesn't work when I join both threads at the end.
The sum is printed only after both threads have finished executing completely. By then the first thread must have added 2000000 to the variable sum and the second thread must have subtracted 2000000 from it, so sum SHOULD BE 0.
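#include <iostream>
#include <pthread.h>

long long sum=0;   // shared by both threads, with no synchronization

void* counting_thread(void* arg)
{
    int offset = *(int*) arg;
    for(int i=0;i<2000000;i++)
    {
        sum=sum+offset;   // unsynchronized read-modify-write (the line in question)
    }
    pthread_exit(NULL);
}

int main(void)
{
    pthread_t id1;
    int offset1 = 1;
    pthread_create(&id1,NULL,counting_thread,&offset1);
    pthread_t id2;
    int offset2 = -1;
    pthread_create(&id2,NULL,counting_thread,&offset2);
    // both threads are already running concurrently by the time we join
    pthread_join(id1,NULL);
    pthread_join(id2,NULL);
    std::cout<<sum;
}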
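The problem is that sum=sum+offset; is not thread safe: both threads read, modify and write sum with no synchronization, which is a data race.
This causes some of the additions to be lost.
As you specified C++, std::atomic<long long> sum; would help, but you need to use the += operator rather than the thread-unsafe sum = sum + offset; (on a std::atomic that is a separate load followed by a separate store, so another thread can update sum in between):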
sum += offset;
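A minimal sketch of the atomic variant, keeping the pthread calls from the question. The names atomic_sum, atomic_counting_thread and run_atomic_counters are made up for this illustration and are not from the original code, and the assert is only sketch-level error handling.

```cpp
#include <atomic>
#include <cassert>
#include <pthread.h>

// Shared counter; on std::atomic, += is one indivisible read-modify-write.
std::atomic<long long> atomic_sum{0};

void* atomic_counting_thread(void* arg)
{
    int offset = *static_cast<int*>(arg);
    for (int i = 0; i < 2000000; i++)
    {
        atomic_sum += offset;   // atomic fetch-and-add: no update can be lost
    }
    return nullptr;
}

// Hypothetical helper: starts both threads, joins them, returns the final sum.
long long run_atomic_counters()
{
    atomic_sum = 0;
    pthread_t id1, id2;
    int offset1 = 1, offset2 = -1;
    int rc1 = pthread_create(&id1, nullptr, atomic_counting_thread, &offset1);
    int rc2 = pthread_create(&id2, nullptr, atomic_counting_thread, &offset2);
    assert(rc1 == 0 && rc2 == 0);   // sketch only: real code should handle failure
    (void)rc1; (void)rc2;
    pthread_join(id1, nullptr);
    pthread_join(id2, nullptr);
    return atomic_sum.load();
}
```

Printing run_atomic_counters() from main should give 0, because every one of the 4000000 updates is applied exactly once, in whatever order the threads happen to run.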
A mutex to block updates would also help.
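As a rough sketch of the mutex approach, the version below keeps sum as a plain long long and takes a lock around each update. It uses std::mutex with std::lock_guard as one possible choice (a pthread_mutex_t would work as well); guarded_sum, sum_mutex, locked_counting_thread and run_locked_counters are names invented for this example.

```cpp
#include <cassert>
#include <mutex>
#include <pthread.h>

long long guarded_sum = 0;   // plain variable, only touched while holding sum_mutex
std::mutex sum_mutex;

void* locked_counting_thread(void* arg)
{
    int offset = *static_cast<int*>(arg);
    for (int i = 0; i < 2000000; i++)
    {
        // Only one thread at a time can be between lock and unlock,
        // so the read, add and store below cannot interleave.
        std::lock_guard<std::mutex> lock(sum_mutex);
        guarded_sum = guarded_sum + offset;
    }
    return nullptr;
}

// Hypothetical helper: starts both threads, joins them, returns the final sum.
long long run_locked_counters()
{
    guarded_sum = 0;
    pthread_t id1, id2;
    int offset1 = 1, offset2 = -1;
    int rc1 = pthread_create(&id1, nullptr, locked_counting_thread, &offset1);
    int rc2 = pthread_create(&id2, nullptr, locked_counting_thread, &offset2);
    assert(rc1 == 0 && rc2 == 0);   // sketch only: real code should handle failure
    (void)rc1; (void)rc2;
    pthread_join(id1, nullptr);
    pthread_join(id2, nullptr);
    return guarded_sum;   // safe to read: both threads have been joined
}
```

Here the unsafe-looking guarded_sum = guarded_sum + offset is fine, because the lock makes the whole statement behave as one step from the other thread's point of view. It will usually be slower than the atomic version, since every iteration locks and unlocks.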
Without these changes, the compiler can produce code which reads sum once at the beginning of the function, so that only one thread's changes end up applied, or which uses a stale value of sum for the addition. The compiler can legitimately read the value of sum when the thread starts, add offset to it n times, and store the value at the end. This would mean only one thread's work would survive.
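To illustrate the kind of transformation meant here, the sketch below writes out by hand what an optimizer is allowed to do to the loop. It is not the output of any particular compiler, and plain_sum, count_as_written and count_as_hoisted are hypothetical names for this example.

```cpp
long long plain_sum = 0;   // ordinary (non-atomic) shared variable

// The loop as written in the source.
void count_as_written(int offset)
{
    for (int i = 0; i < 2000000; i++)
    {
        plain_sum = plain_sum + offset;
    }
}

// One rewrite the compiler may make, since single-threaded behaviour is identical:
// read the variable once, work in a register, store once at the end.
void count_as_hoisted(int offset)
{
    long long local = plain_sum;   // single read when the function starts
    for (int i = 0; i < 2000000; i++)
    {
        local += offset;           // all additions happen on the private copy
    }
    plain_sum = local;             // single store; overwrites whatever is there
}
```

In a single thread the two functions give the same result. With two threads running the hoisted form, each starts from the value it read at entry, and whichever thread stores last overwrites the other's entire contribution.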
Consider the following pseudo-assembly, which is roughly what sum=sum+offset; becomes:
read sum
add offset to sum
store sum
step  thread 1             thread 2
1     read sum
2     add offset to sum    read sum
3     store sum            add offset to sum
4     read sum             store sum
5     add offset to sum    read sum
6     store sum            add offset to sum
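In step 3, thread 2 adds its offset to the old value it read in step 2, before thread 1's store. When thread 2 stores its result in step 4, it overwrites the value thread 1 stored in step 3, so thread 1's update is lost.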
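The first four steps of that interleaving can be replayed deterministically in a single thread, using two local variables to stand in for each thread's register. This is only a simulation of the schedule described above, with offsets +1 and -1 as in the question; replay_lost_update, reg1 and reg2 are names made up for the sketch.

```cpp
// Replays steps 1 to 4 of the interleaving in one thread.
long long replay_lost_update()
{
    long long sum = 0;    // stands in for the shared variable
    long long reg1 = 0;   // thread 1's register
    long long reg2 = 0;   // thread 2's register

    reg1 = sum;           // step 1: thread 1 reads sum (0)
    reg1 += 1;            // step 2: thread 1 adds its offset (+1)
    reg2 = sum;           //         thread 2 reads sum (still 0)
    sum = reg1;           // step 3: thread 1 stores 1
    reg2 += -1;           //         thread 2 adds its offset (-1) to the stale 0
    sum = reg2;           // step 4: thread 2 stores -1, overwriting thread 1's 1

    return sum;           // -1, although one +1 and one -1 should cancel to 0
}
```

Each thread performed exactly one update, yet the result is -1 instead of 0: thread 1's increment has vanished. Repeated over millions of iterations with unpredictable timing, this is why the program prints a different value on each run.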
On multi-core systems, each core's view of memory is not guaranteed to be consistent with the other cores' unless the program synchronizes.
That means that even after sum+=offset has been executed on one core, another core/CPU may still see the pre-update value.
This allows the CPUs to run faster, as they do not have to share every write with each other immediately. However, when 2 threads are accessing the same data, this needs to be taken into account.
std::atomic / mutex ensures that the update to sum is indivisible and that other threads see the updated value.