Let's take this struct:
struct entry {
    atomic<bool> valid;
    atomic_flag writing;
    char payload[128];
}
Two treads A and B concurrently access this struct this way (let e be an instance of entry):
if (e.valid) {
    // do something with e.payload...
} else {
    while (e.writing.test_and_set(std::memory_order_acquire));
    if (!e.valid) {
       // write e.payload one byte at a time
       // (the payload written by A may be different from the payload written by B)
       e.valid = true;
       e.writing.clear(std::memory_order_release);
    }
}
I guess that this code is correct and does not present issues, but I want to understand why it works.
Quoting the C++ standard (29.3.13):
Implementations should make atomic stores visible to atomic loads within a reasonable amount of time.
Now, bearing this in mind, imagine that both thread A and B enter the else block. Is this interleave possible?
A and B enter the else branch, because valid is false
A sets the writing flagB starts to spin lock on the writing flagA reads the valid flag (which is false) and enters the if blockA writes the payloadA writes true on the valid flag; obviously, if A reads valid again, it would read true
A clears the writing flagB sets the writing flagB reads a stale value of the valid flag (false) and enters the if blockB writes its payloadB writes true on the valid flagB clears the writing flagI hope this is not possible but when it comes to actually answer the question "why it is not possible?", I'm not sure of the answer. Here is my idea.
Quoting from the standard again (29.3.12):
Atomic read-modify-write operations shall always read the last value (in the modification order) written before the write associated with the read-modify-write operation.
atomic_flag::test_and_set() is an atomic read-modify-write operation, as stated in 29.7.5.
Since atomic_flag::test_and_set() always reads a "fresh value", and I'm calling it with the std::memory_order_acquire memory ordering, then I cannot read a stale value of the valid flag, because I must see all the side-effects caused by A before the atomic_flag::clear() call (which uses std::memory_order_release).
Am I correct?
Clarification. My whole reasoning (wrong or correct) relies on 29.3.12. For what I understood so far, if we ignore the atomic_flag, reading stale data from valid is possible even if it's atomic. atomic doesn't seem to mean "always immediately visible" to every thread. The maximum guarantee you can ask for is a consistent order in the values you read, but you can still read stale data before getting the fresh one. Fortunately, atomic_flag::test_and_set() and every exchange operation have this crucial feature: they always read fresh data. So, only if you acquire/release on the writing flag (not only on valid), then you get the expected behavior. Do you see my point (correct or not)?
EDIT: my original question included the following few lines that gained too much attention if compared to the core of the question. I leave them for consistency with the answers that have been already given, but please ignore them if you are reading the question right now.
Is there any point invalidbeing anatomic<bool>andnot a plainbool? Moreover, if it should be anatomic<bool>,what is its 'minimum' memory ordering constraint that will not presentissues?
Inside the else branch valid should be protected by the acquire/release semantics imposed by the operations on waiting. However this does not obliviate the need to make valid an atomic:
You forgot to include the first line (if (e.valid)) in your analysis. If valid was an bool instead of atomic<bool> this access would be completely unprotected. Therefore you could have the situation where a change of valid becomes visible to other threads before the payload is completely written/visible. This means that a thread B could evaluate e.valid to true and enter the do something with e.payload branch while the payload isn't completely written yet. 
Other then that your analysis seems somewhat reasonable but not entirely correct to me. The thing to remember with memory ordering is that acquire and release semantics will pair up. Everything written before a release operation can safely be read after an acquire operation on the same veriable reads the modified value. With that in mind the release semantics on waiting.clear(...) ensure that the write to valid must be visible when the loop on writing.test_and_set(...) exits, since the later reads the change of waiting(the write done inwaiting.clear(...)`) with acquire semantics and doesn't exit before that change is visible.
Regarding §29.3.12: It is relevant to the correctness of your code, but unrelated to the reading a stale valid flag. You can't set the flag before the clear, so acquire-release semantics will ensure correctness there. §29.3.12 protects you from the following scenario:
- Both A and B enter the else branch, because valid is false
- A sets the writing flag
- B sees a stale value for writing and also sets it
- Both A and B read the valid flag (which is false), enter the if block and write the payload creating a race condition
Edit: For the minimal Ordering constraints: acquire for the loads and release for the stores should probably do the job, however depending on your target hardware you might as well stay with sequential consistency. For the difference between those semantics look here.
Section 29.3.12 has nothing to do with why this code is correct or incorrect. The section you want (in the draft version of the standard available online) is Section 1.10: "Multi-threaded executions and data races." Section 1.10 defines a happens-before relation on atomic operations, and on non-atomic operations with respect to atomic operations.
Section 1.10 says that if there are two non-atomic operations where you can not determine the happens-before relationship then you have a data-race. It further declares (Paragraph 21) that any program with a data-race has undefined behavior.
If e.valid is not atomic then you have a data race between the first line of code and the line e.valid=true.  So all of your reasoning about the behavior in the else clause is incorrect (the program has no defined behavior so there is nothing to reason about.)
On the other hand if all of your accesses to e.valid were protected by atomic operations on e.writing (like if the else clause was your whole program) then your reasoning would be correct.  Event 9 in your list could not happen.  But the reason is not Section 29.3.12, it is again Section 1.10, which says that your non-atomic operations will appear to be sequentially consistent if there are no dataraces.
The pattern you are using is called double checked locking. Before C++11 it was impossible to implement double checked locking portably. In C++11 you can make double checked locking work correctly and portably. The way you do it is by declaring valid to be atomic. 
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With