The failure of Dekker-style synchronization is typically explained by instruction reordering. For example, if we write
#include <atomic>
std::atomic_int X;  // static storage duration, so zero-initialized
std::atomic_int Y;
int r1, r2;
static void t1() {
    X.store(1, std::memory_order_relaxed);
    r1 = Y.load(std::memory_order_relaxed);
}
static void t2() {
    Y.store(1, std::memory_order_relaxed);
    r2 = X.load(std::memory_order_relaxed);
}
Then the loads can be reordered with the stores, leading to r1==r2==0.
I was expecting an acquire_release fence to prevent this kind of reordering:
static void t1() {
    X.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acq_rel);
    r1 = Y.load(std::memory_order_relaxed);
}
static void t2() {
    Y.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acq_rel);
    r2 = X.load(std::memory_order_relaxed);
}
The load cannot be moved above the fence and the store cannot be moved below the fence, and so the bad result should be prevented.
However, experiments show r1==r2==0 can still occur. Is there a reordering-based explanation for this? Where's the flaw in my reasoning?
And finally, if the relaxed atomic load reads the value written by the relaxed atomic store, the C++11 standard says that the fences synchronize-with each other, just as I’ve shown. I like C++11’s approach to portable memory fences.
The most important thing to know about acquire and release fences is that they can establish a synchronizes-with relationship, which means that they prohibit memory reordering in a way that allows you to pass information reliably between threads.
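As a rough illustration of that synchronizes-with relationship, here is a minimal sketch of passing a plain value between two threads with a release fence and an acquire fence. The names message_pass_once, payload, and ready are hypothetical and chosen only for this example; it is one possible arrangement, not code from the quoted blog.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: passes one value from a producer thread to a consumer
// thread using fences plus relaxed atomics, and returns what the consumer saw.
int message_pass_once() {
    int payload = 0;             // plain, non-atomic data
    std::atomic<int> ready{0};   // flag used only to signal "payload is written"
    int seen = -1;

    std::thread producer([&] {
        payload = 42;
        // Release fence: the write to payload cannot move below the store to ready.
        std::atomic_thread_fence(std::memory_order_release);
        ready.store(1, std::memory_order_relaxed);
    });

    std::thread consumer([&] {
        // Spin until the relaxed load observes the producer's relaxed store.
        while (ready.load(std::memory_order_relaxed) == 0) {
            std::this_thread::yield();
        }
        // Acquire fence: the read of payload cannot move above the load of ready.
        std::atomic_thread_fence(std::memory_order_acquire);
        seen = payload;
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Because the consumer's load reads the value written after the release fence, the two fences synchronize-with each other, so the read of payload is ordered after the write and the function returns 42. Note that this pattern only needs StoreStore and LoadLoad ordering, which is why acquire and release fences are enough here.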
First things first: Acquire and release fences are considered low-level lock-free operations. If you stick with higher-level, sequentially consistent atomic types, such as volatile variables in Java 5+, or default atomics in C++11, you don’t need acquire and release fences.
On the SPARC-V9 architecture, an acquire fence can be implemented using the membar #LoadLoad | #LoadStore instruction, and a release fence can be implemented as membar #LoadStore | #StoreStore.
As I understand it (mainly from reading Jeff Preshing's blog), an atomic_thread_fence(std::memory_order_acq_rel) prevents every reordering except StoreLoad, i.e., it still allows a Store to be reordered with a subsequent Load. However, this is exactly the reordering that has to be prevented in your example.
More precisely, an atomic_thread_fence(std::memory_order_acquire) prevents the reordering of any previous Load with any subsequent Store and any subsequent Load, i.e., it prevents LoadLoad and LoadStore reorderings across the fence.
An atomic_thread_fence(std::memory_order_release) prevents the reordering of any subsequent Store with any preceding Store and any preceding Load, i.e., it prevents LoadStore and StoreStore reorderings across the fence.
An atomic_thread_fence(std::memory_order_acq_rel) then prevents the union, i.e., it prevents LoadLoad, LoadStore, and StoreStore, which means that only StoreLoad may still happen.
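For comparison, here is a minimal sketch of the same test with the fence strengthened to std::memory_order_seq_cst, which is the fence that also orders a Store before a subsequent Load. The helper name count_both_zero and its trials parameter are hypothetical and exist only for this illustration.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: runs the store-buffering test `trials` times with a
// seq_cst fence between the store and the load, and counts how often both
// threads read 0.
int count_both_zero(int trials) {
    int both_zero = 0;
    for (int i = 0; i < trials; ++i) {
        std::atomic<int> X{0};
        std::atomic<int> Y{0};
        int r1 = -1;
        int r2 = -1;

        std::thread t1([&] {
            X.store(1, std::memory_order_relaxed);
            // Full fence: also forbids StoreLoad reordering across it.
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r1 = Y.load(std::memory_order_relaxed);
        });
        std::thread t2([&] {
            Y.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r2 = X.load(std::memory_order_relaxed);
        });

        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

With seq_cst fences in both threads, the C++ memory model rules out r1 == 0 && r2 == 0, so the count should stay at zero however many trials are run. Replacing the fences with memory_order_acq_rel removes that guarantee, although whether the bad outcome actually shows up depends on the hardware and on timing.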