Does standard C++11 guarantee that memory_order_seq_cst prevents StoreLoad reordering around an atomic operation for non-atomic memory accesses?
As is known, C++11 defines six std::memory_order values, which specify how regular, non-atomic memory accesses are to be ordered around an atomic operation - Working Draft, Standard for Programming Language C++ 2016-07-12: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/n4606.pdf
§ 29.3 Order and consistency
§ 29.3 / 1
The enumeration memory_order specifies the detailed regular (non-atomic) memory synchronization order as defined in 1.10 and may provide for operation ordering. Its enumerated values and their meanings are as follows:
It is also known that each of these six memory orders prevents some kinds of reordering (LoadLoad, LoadStore, StoreStore, StoreLoad).

But does memory_order_seq_cst prevent StoreLoad reordering around an atomic operation for regular, non-atomic memory accesses, or only for other atomic operations that also use memory_order_seq_cst?
I.e., to prevent this StoreLoad reordering, should we use std::memory_order_seq_cst for both the STORE and the LOAD, or for only one of them?
std::atomic<int> a, b;
b.store(1, std::memory_order_seq_cst); // Sequential Consistency
a.load(std::memory_order_seq_cst); // Sequential Consistency
Acquire-release semantics are clear: they specify exactly which non-atomic memory accesses may be reordered across atomic operations: http://en.cppreference.com/w/cpp/atomic/memory_order
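As a minimal sketch of the acquire-release guarantee for non-atomic data (the names payload, ready and message_passing_once are hypothetical, chosen only for this illustration): the non-atomic write before the release store is visible to the non-atomic read after the acquire load that observes it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs one producer/consumer exchange and returns
// the value of the non-atomic payload as seen by the consumer.
int message_passing_once() {
    int payload = 0;                 // regular, non-atomic variable
    std::atomic<bool> ready{false};  // atomic flag guarding the payload
    int seen = -1;

    std::thread producer([&] {
        payload = 42;                                  // non-atomic store
        ready.store(true, std::memory_order_release);  // earlier writes can't move below this
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) {  // later reads can't move above this
            std::this_thread::yield();
        }
        seen = payload;                                // non-atomic load, happens after payload = 42
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Calling message_passing_once() should always return 42, because the release store synchronizes with the acquire load that reads true. Note that this pattern needs only StoreStore and LoadLoad/LoadStore ordering, not StoreLoad.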
To prevent StoreLoad-reordering we should use std::memory_order_seq_cst.
Two examples:
std::memory_order_seq_cst for both STORE and LOAD: an MFENCE is emitted, so StoreLoad can't be reordered - GCC 6.1.0 x86_64: https://godbolt.org/g/mVZJs0
std::atomic<int> a, b;
b.store(1, std::memory_order_seq_cst); // can't be executed after LOAD
a.load(std::memory_order_seq_cst); // can't be executed before STORE
std::memory_order_seq_cst for LOAD only: no MFENCE is emitted, so StoreLoad can be reordered - GCC 6.1.0 x86_64: https://godbolt.org/g/2NLy12
std::atomic<int> a, b;
b.store(1, std::memory_order_release); // can be executed after LOAD
a.load(std::memory_order_seq_cst); // can be executed before STORE
Also, if a C/C++ compiler used the alternative mapping of C/C++11 to x86, which flushes the store buffer before the LOAD (MFENCE, MOV (from memory)), then we must use std::memory_order_seq_cst for the LOAD too: http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html This mapping is discussed in another question as approach (3): Does it make any sense instruction LFENCE in processors x86/x86_64?
I.e., we should use std::memory_order_seq_cst for both STORE and LOAD to guarantee that an MFENCE is generated, which prevents StoreLoad reordering.
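The case with seq_cst on both the STORE and the LOAD can be sketched as the classic store-buffering test. This is a minimal illustration, not the code from the linked Godbolt examples; the function name count_both_zero_seq_cst and the variables x, y, r1, r2 are hypothetical. With all four operations seq_cst, the standard's single total order forbids the outcome where both loads read 0.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the store-buffering test `iterations` times and returns how many
// runs ended with both loads observing 0, i.e. the outcome that would
// require StoreLoad reordering in both threads.
int count_both_zero_seq_cst(int iterations) {
    assert(iterations >= 0);
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;  // written by the threads, read after join()

        std::thread t1([&] {
            x.store(1, std::memory_order_seq_cst);   // STORE
            r1 = y.load(std::memory_order_seq_cst);  // LOAD, can't pass the STORE
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_seq_cst);
        });
        t1.join();
        t2.join();

        if (r1 == 0 && r2 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

Since whichever store comes first in the total order must be visible to the other thread's later load, the returned count should be 0 for any number of iterations. If either store is weakened to release, that guarantee no longer holds.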
Is it true that memory_order_seq_cst for an atomic load or store:
specifies acquire-release semantics, i.e. prevents LoadLoad, LoadStore, and StoreStore reordering around an atomic operation for regular, non-atomic memory accesses,
but prevents StoreLoad reordering around an atomic operation only for other atomic operations with the same memory_order_seq_cst?
The default is std::memory_order_seq_cst which establishes a single total ordering over all atomic operations tagged with this tag: all threads see the same order of such atomic operations and no memory_order_seq_cst atomic operations can be reordered.
The problem is that atomic operations on their own don't prevent reordering. We need an additional concept for atomics to do this. In C11, atomic operations take in another parameter called "memory ordering" which helps solve this problem.
No, standard C++11 doesn't guarantee that memory_order_seq_cst prevents StoreLoad reordering of non-atomic accesses around an atomic(seq_cst) operation.
Standard C++11 doesn't even guarantee that memory_order_seq_cst prevents StoreLoad reordering of atomic(non-seq_cst) operations around an atomic(seq_cst) operation.
Working Draft, Standard for Programming Language C++ 2016-07-12: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/n4606.pdf
memory_order_seq_cst operations - C++11 Standard:§ 29.3
3
There shall be a single total order S on all memory_order_seq_cst operations, consistent with the “happens before” order and modification orders for all affected locations, such that each memory_order_seq_cst operation B that loads a value from an atomic object M observes one of the following values: ...
Operations weaker than memory_order_seq_cst are not sequentially consistent and are not part of the single total order, i.e. non-memory_order_seq_cst operations can be reordered with memory_order_seq_cst operations in the allowed directions - C++11 Standard: § 29.3
8 [ Note: memory_order_seq_cst ensures sequential consistency only for a program that is free of data races and uses exclusively memory_order_seq_cst operations. Any use of weaker ordering will invalidate this guarantee unless extreme care is used. In particular, memory_order_seq_cst fences ensure a total order only for the fences themselves. Fences cannot, in general, be used to restore sequential consistency for atomic operations with weaker ordering specifications. — end note ]
C++ compilers also allow such reorderings:
Usually, when the compiler implements seq_cst as a barrier after the store:
STORE-C(relaxed); LOAD-B(seq_cst); can be reordered to LOAD-B(seq_cst); STORE-C(relaxed);
Screenshot of Asm generated by GCC 7.0 x86_64: https://godbolt.org/g/4yyeby
It is also theoretically possible, when the compiler implements seq_cst as a barrier before the load:
STORE-A(seq_cst); LOAD-C(acquire); can be reordered to LOAD-C(acquire); STORE-A(seq_cst);
STORE-A(seq_cst); LOAD-C(relaxed); can be reordered to LOAD-C(relaxed); STORE-A(seq_cst);
On PowerPC the following reordering is also possible:
STORE-A(seq_cst); STORE-C(relaxed); can be reordered to STORE-C(relaxed); STORE-A(seq_cst);
If even atomic variables are allowed to be reordered across atomic(seq_cst), then non-atomic variables can also be reordered across atomic(seq_cst).
Screenshot of Asm generated by GCC 4.8 PowerPC: https://godbolt.org/g/BTQBr8
More details:
STORE-C(release); LOAD-B(seq_cst); can be reordered to LOAD-B(seq_cst); STORE-C(release);
Intel® 64 and IA-32 Architectures Software Developer's Manual, 8.2.3.4: Loads May Be Reordered with Earlier Stores to Different Locations
I.e. x86_64 code:
STORE-A(seq_cst);
STORE-C(release);
LOAD-B(seq_cst);
Can be reordered to:
STORE-A(seq_cst);
LOAD-B(seq_cst);
STORE-C(release);
This can happen because there is no mfence between c.store and b.load:
x86_64 - GCC 7.0: https://godbolt.org/g/dRGTaO
C++ & asm - code:
#include <atomic>
// Atomic load-store
void test() {
std::atomic<int> a, b, c;
a.store(2, std::memory_order_seq_cst); // movl 2,[a]; mfence;
c.store(4, std::memory_order_release); // movl 4,[c];
int tmp = b.load(std::memory_order_seq_cst); // movl [b],[tmp];
}
It can be reordered to:
#include <atomic>
// Atomic load-store
void test() {
std::atomic<int> a, b, c;
a.store(2, std::memory_order_seq_cst); // movl 2,[a]; mfence;
int tmp = b.load(std::memory_order_seq_cst); // movl [b],[tmp];
c.store(4, std::memory_order_release); // movl 4,[c];
}
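If a program does need a weaker store to stay ahead of a later load, one option the standard offers is an explicit std::atomic_thread_fence(std::memory_order_seq_cst) between them. The sketch below is an illustration under that assumption, not part of the Godbolt examples above; count_both_zero_with_fences, x, y, r1 and r2 are hypothetical names. It repeats the store-buffering pattern with relaxed operations and a seq_cst fence in each thread.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Store-buffering test with relaxed accesses separated by seq_cst fences.
// Returns how many runs ended with both loads observing 0.
int count_both_zero_with_fences(int iterations) {
    assert(iterations >= 0);
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;  // written by the threads, read after join()

        std::thread t1([&] {
            x.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);  // orders the store before the load
            r1 = y.load(std::memory_order_relaxed);
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_relaxed);
        });
        t1.join();
        t2.join();

        if (r1 == 0 && r2 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

The two fences are totally ordered, and the load that follows the later fence must observe the store that precedes the earlier one (§ 29.3), so the count should be 0. This holds for this specific pattern only; as the note in § 29.3 says, fences cannot in general restore sequential consistency for weaker operations.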
Also, Sequential Consistency in x86/x86_64 can be implemented in four ways: http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
1. LOAD (without fence) and STORE + MFENCE
2. LOAD (without fence) and LOCK XCHG
3. MFENCE + LOAD and STORE (without fence)
4. LOCK XADD(0) and STORE (without fence)
Mappings 1 and 2, LOAD and (STORE + MFENCE)/(LOCK XCHG), were reviewed above.
Mappings 3 and 4, (MFENCE + LOAD)/(LOCK XADD) and STORE, allow the following reorderings:
STORE-A(seq_cst); LOAD-C(acquire); can be reordered to LOAD-C(acquire); STORE-A(seq_cst);
STORE-A(seq_cst); LOAD-C(relaxed); can be reordered to LOAD-C(relaxed); STORE-A(seq_cst);
PowerPC allows Store-Load reordering (Table 5 - PowerPC, "Stores Reordered After Loads"): http://www.rdrop.com/users/paulmck/scalability/paper/whymb.2010.06.07c.pdf
I.e. PowerPC code:
STORE-A(seq_cst);
STORE-C(relaxed);
LOAD-C(relaxed);
LOAD-B(seq_cst);
Can be reordered to:
LOAD-C(relaxed);
STORE-A(seq_cst);
STORE-C(relaxed);
LOAD-B(seq_cst);
PowerPC - GCC 4.8 : https://godbolt.org/g/xowFD3
C++ & asm - code:
#include <atomic>
// Atomic load-store
void test() {
std::atomic<int> a, b, c; // addr: 20, 24, 28
a.store(2, std::memory_order_seq_cst); // li r9<-2; sync; stw r9->[a];
c.store(4, std::memory_order_relaxed); // li r9<-4; stw r9->[c];
c.load(std::memory_order_relaxed); // lwz r9<-[c];
int tmp = b.load(std::memory_order_seq_cst); // sync; lwz r9<-[b]; ... isync;
}
If a.store is split into two parts, it can be reordered to:
#include <atomic>
// Atomic load-store
void test() {
std::atomic<int> a, b, c; // addr: 20, 24, 28
//a.store(2, std::memory_order_seq_cst); // part-1: li r9<-2; sync;
c.load(std::memory_order_relaxed); // lwz r9<-[c];
a.store(2, std::memory_order_seq_cst); // part-2: stw r9->[a];
c.store(4, std::memory_order_relaxed); // li r9<-4; stw r9->[c];
int tmp = b.load(std::memory_order_seq_cst); // sync; lwz r9<-[b]; ... isync;
}
Here the load from memory, lwz r9<-[c];, is executed earlier than the store to memory, stw r9->[a];.
On PowerPC the following reordering is also possible:
STORE-A(seq_cst); STORE-C(relaxed); can be reordered to STORE-C(relaxed); STORE-A(seq_cst);
Because PowerPC has a weak memory ordering model, it allows Store-Store reordering (Table 5 - PowerPC, "Stores Reordered After Stores"): http://www.rdrop.com/users/paulmck/scalability/paper/whymb.2010.06.07c.pdf
I.e., on PowerPC a store can be reordered with another store, so the previous example can be reordered as follows:
#include <atomic>
// Atomic load-store
void test() {
std::atomic<int> a, b, c; // addr: 20, 24, 28
//a.store(2, std::memory_order_seq_cst); // part-1: li r9<-2; sync;
c.load(std::memory_order_relaxed); // lwz r9<-[c];
c.store(4, std::memory_order_relaxed); // li r9<-4; stw r9->[c];
a.store(2, std::memory_order_seq_cst); // part-2: stw r9->[a];
int tmp = b.load(std::memory_order_seq_cst); // sync; lwz r9<-[b]; ... isync;
}
Here the store to memory, stw r9->[c];, is executed earlier than the store to memory, stw r9->[a];.