Why isn't a C++11 acquire_release fence enough for Dekker synchronization?

Tags: synchronization, multithreading, c++11, atomic, memory-fences

The failure of Dekker-style synchronization is typically explained in terms of instruction reordering. For example, if we write

#include <atomic>

std::atomic_int X{0};
std::atomic_int Y{0};
int r1, r2;

static void t1() {
    X.store(1, std::memory_order_relaxed);
    r1 = Y.load(std::memory_order_relaxed);
}
static void t2() {
    Y.store(1, std::memory_order_relaxed);
    r2 = X.load(std::memory_order_relaxed);
}

Then each load can be reordered before the preceding store, which can lead to r1 == r2 == 0.

I expected an acquire-release fence (std::memory_order_acq_rel) to prevent this kind of reordering:

static void t1() {
    X.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acq_rel);
    r1 = Y.load(std::memory_order_relaxed);
}
static void t2() {
    Y.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acq_rel);
    r2 = X.load(std::memory_order_relaxed);
}

The load cannot be moved above the fence and the store cannot be moved below the fence, and so the bad result should be prevented.

However, experiments show that r1 == r2 == 0 can still occur. Is there a reordering-based explanation for this? Where is the flaw in my reasoning?
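The experiment described above can be reproduced with a store-buffering litmus test along the following lines. This is a minimal sketch of my own, not the asker's actual test: the harness names (`round_no`, `finished`, `count_both_zero`) are hypothetical, and the two worker threads are released together each round through an atomic round counter so that their operations have a chance to overlap. How often, if ever, r1 == r2 == 0 shows up depends on the CPU and compiler; on x86, for instance, mainstream compilers typically emit no fence instruction at all for `memory_order_acq_rel`, so the store buffer can still delay each store past the other thread's load.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic_int X{0};
std::atomic_int Y{0};
int r1, r2;  // written by the workers, read by the calling thread after each round

// Hypothetical harness state: the caller publishes a round number, workers report completion.
std::atomic_int round_no{0};
std::atomic_int finished{0};

static void t1() {
    X.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acq_rel);
    r1 = Y.load(std::memory_order_relaxed);
}

static void t2() {
    Y.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acq_rel);
    r2 = X.load(std::memory_order_relaxed);
}

// Runs the litmus test `rounds` times and returns how often r1 == r2 == 0 was observed.
static int count_both_zero(int rounds) {
    assert(rounds > 0);
    round_no.store(0, std::memory_order_relaxed);
    finished.store(0, std::memory_order_relaxed);

    auto worker = [rounds](void (*body)()) {
        for (int i = 1; i <= rounds; ++i) {
            while (round_no.load(std::memory_order_acquire) != i)
                std::this_thread::yield();  // wait for round i to start
            body();
            finished.fetch_add(1, std::memory_order_release);  // publish r1 or r2
        }
    };
    std::thread a(worker, t1);
    std::thread b(worker, t2);

    int both_zero = 0;
    for (int i = 1; i <= rounds; ++i) {
        X.store(0, std::memory_order_relaxed);
        Y.store(0, std::memory_order_relaxed);
        round_no.store(i, std::memory_order_release);  // release both workers
        while (finished.load(std::memory_order_acquire) != 2 * i)
            std::this_thread::yield();  // wait until both workers finished round i
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    a.join();
    b.join();
    return both_zero;
}
```

Calling, e.g., `count_both_zero(100000)` from `main` returns the number of rounds in which both loads returned 0. A result of 0 on a particular machine does not show that the outcome is impossible, only that it was not observed there.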


asked Dec 02 '14 11:12

Jason Ptacek

1 Answer

As I understand it (mainly from reading Jeff Preshing's blog), an atomic_thread_fence(std::memory_order_acq_rel) prevents every kind of reordering except StoreLoad, i.e., it still allows a Store to be reordered with a subsequent Load. However, this is exactly the reordering that has to be prevented in your example.

More precisely, an atomic_thread_fence(std::memory_order_acquire) prevents the reordering of any previous Load with any subsequent Store and any subsequent Load, i.e., it prevents LoadLoad and LoadStore reorderings across the fence.

An atomic_thread_fence(std::memory_order_release) prevents the reordering of any subsequent Store with any preceding Store and any preceding Load, i.e., it prevents LoadStore and StoreStore reorderings across the fence.
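Acquire and release fences are typically used as a pair to hand data from one thread to another, which only needs the orderings described above and never relies on StoreLoad. A minimal sketch of that message-passing pattern; the names `payload`, `seen`, `ready` and `run_message_passing` are hypothetical. Because the consumer's relaxed load reads the value written by the producer's relaxed store, the release fence synchronizes with the acquire fence, so the consumer is guaranteed to read 42.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                // ordinary, non-atomic data being handed over
int seen = -1;                  // what the consumer read (hypothetical, for checking)
std::atomic_bool ready{false};  // hypothetical flag used only for signalling

static void producer() {
    payload = 42;
    // Release fence: no earlier load or store may be reordered after the flag store below.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
}

static void consumer() {
    while (!ready.load(std::memory_order_relaxed))
        std::this_thread::yield();  // spin until the flag is observed
    // Acquire fence: no later load or store may be reordered before the flag load above.
    std::atomic_thread_fence(std::memory_order_acquire);
    seen = payload;  // no data race: the release fence synchronizes with this acquire fence
}

// Runs one producer/consumer pair and returns the value the consumer read.
static int run_message_passing() {
    payload = 0;
    seen = -1;
    ready.store(false, std::memory_order_relaxed);
    std::thread c(consumer);
    std::thread p(producer);
    c.join();
    p.join();
    return seen;
}
```

Note that each thread here performs only one kind of cross-thread access after (or before) its fence, so the missing StoreLoad ordering never comes into play.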

An atomic_thread_fence(std::memory_order_acq_rel) then prevents the union, i.e., it prevents LoadLoad, LoadStore, and StoreStore, which means that only StoreLoad may still happen.
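If the goal is to rule out r1 == r2 == 0, the usual remedy (not part of the original answer) is to use `std::memory_order_seq_cst` for both fences. Informally, a sequentially consistent fence also blocks the StoreLoad reordering; formally, all seq_cst fences are ordered in a single total order, and whichever of the two fences comes first, the relaxed load after the later fence must see the store of 1 made before the earlier fence. A minimal sketch, reusing the globals from the question; the helper `count_both_zero_seq_cst` is hypothetical and simply spawns a fresh pair of threads per round.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic_int X{0};
std::atomic_int Y{0};
int r1, r2;

static void t1() {
    X.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);  // also orders the store before the load
    r1 = Y.load(std::memory_order_relaxed);
}

static void t2() {
    Y.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    r2 = X.load(std::memory_order_relaxed);
}

// Hypothetical helper: runs the pair `rounds` times and counts r1 == r2 == 0.
static int count_both_zero_seq_cst(int rounds) {
    assert(rounds > 0);
    int both_zero = 0;
    for (int i = 0; i < rounds; ++i) {
        X.store(0, std::memory_order_relaxed);
        Y.store(0, std::memory_order_relaxed);
        std::thread a(t1), b(t2);  // thread start and join provide the needed synchronization
        a.join();
        b.join();
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

Here `count_both_zero_seq_cst(2000)` should always return 0. On x86, compilers typically implement this fence with `mfence` or an equivalent locked instruction, which is exactly the cost an acq_rel fence avoids.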


answered Oct 02 '22 01:10

Toby Brull
