I'm clear on the usage of MemoryBarrier, but not on what happens behind the scenes in the runtime. Can anyone give a good explanation of what goes on?
In a really strong memory model, emitting fence instructions would be unnecessary. All memory accesses would execute in order and all stores would be globally visible.
Memory fences are needed because current common architectures do not provide a strong memory model: x86/x64, for example, can reorder a read ahead of an earlier write to a different location. (A more thorough source is "Intel® 64 and IA-32 Architectures Software Developer’s Manual, 8.2.2 Memory Ordering in P6 and More Recent Processor Families".) To pick one example out of many, Dekker's algorithm will fail on x86/x64 without fences.
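The store-then-load pattern at the heart of Dekker's algorithm can be probed with a small experiment. The following is only a sketch: the class `StoreLoadDemo` and its members are made-up names, and it assumes a multi-core machine. Each thread sets its own flag and then reads the other thread's flag; with a full fence between the two accesses, both threads can never read 0 in the same round.

```csharp
using System.Threading;

// Hypothetical demo class; all names here are illustrative.
public static class StoreLoadDemo
{
    private static int x, y;    // shared flags, one per thread
    private static int r1, r2;  // what each thread read from the other's flag

    // Runs the two-thread experiment `iterations` times and returns how often
    // both threads read 0, an outcome that no in-order interleaving of the
    // four memory accesses allows.
    public static int CountBothZero(int iterations, bool useFence)
    {
        int bothZero = 0;
        for (int i = 0; i < iterations; i++)
        {
            x = 0;
            y = 0;
            using (var start = new Barrier(2)) // releases both threads together
            {
                var t1 = new Thread(() =>
                {
                    start.SignalAndWait();
                    x = 1;                                // store own flag
                    if (useFence) Thread.MemoryBarrier(); // full fence
                    r1 = y;                               // load the other flag
                });
                var t2 = new Thread(() =>
                {
                    start.SignalAndWait();
                    y = 1;
                    if (useFence) Thread.MemoryBarrier();
                    r2 = x;
                });
                t1.Start();
                t2.Start();
                t1.Join();
                t2.Join();
            }
            if (r1 == 0 && r2 == 0) bothZero++;
        }
        return bothZero;
    }
}
```

Calling `StoreLoadDemo.CountBothZero(100000, useFence: false)` may return a nonzero count on x86/x64, although how often (if at all) depends on the hardware and timing. With `useFence: true` the count should always be 0.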
Even if the JIT produces machine code in which instructions with memory loads and stores are carefully placed, its efforts are useless if the CPU then reorders these loads and stores - which it can, as long as the illusion of sequential consistency is maintained for the current context/thread.
Risking oversimplification: it may help to visualize the loads and stores resulting from the instruction stream as a thundering herd of wild animals. As they cross a narrow bridge (your CPU), you can never be sure about the order of the animals, since some of them will be slower, some faster, some overtake, some fall behind. If at the start - when you emit the machine code - you partition them into groups by putting infinitely long fences between them, you can at least be sure that group A comes before group B.
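In code, "group A before group B" could look like the sketch below. `Mailbox` and its members are hypothetical names chosen for illustration. The writer puts the payload on one side of the fence and the "ready" flag on the other, and the reader mirrors that order.

```csharp
using System.Threading;

// Hypothetical one-slot mailbox; names are illustrative.
public sealed class Mailbox
{
    private int payload;
    private bool ready;

    public void Publish(int value)
    {
        payload = value;         // group A: write the data
        Thread.MemoryBarrier();  // fence: group A cannot move below this point
        ready = true;            // group B: announce that the data is there
    }

    public bool TryRead(out int value)
    {
        bool isReady = ready;    // group A: read the flag
        Thread.MemoryBarrier();  // fence: the payload read cannot move above the flag read
        if (!isReady)
        {
            value = 0;
            return false;
        }
        value = payload;         // group B: read the data
        return true;
    }
}
```

A consumer thread can poll `TryRead` in a loop while a producer calls `Publish` once. Because every call passes through a fence, the flag read cannot be hoisted out of the polling loop, and a reader that sees `ready == true` also sees the published payload.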
Fences ensure the ordering of reads and writes. Wording is not exact, but:
What the JIT emits for a full fence depends on the (CPU) architecture and the memory ordering guarantees it provides. Since the JIT knows exactly what architecture it runs on, it can issue the proper instruction(s).
On my x64 machine, with .NET 4.0 RC, it happens to be a lock or:
            int a = 0;
00000000  sub         rsp,28h 
            Thread.MemoryBarrier();
00000004  lock or     dword ptr [rsp],0 
            Console.WriteLine(a);
00000009  mov         ecx,1 
0000000e  call        FFFFFFFFEFB45AB0 
00000013  nop 
00000014  add         rsp,28h 
00000018  ret 
Intel® 64 and IA-32 Architectures Software Developer’s Manual Chapter 8.1.2:
"...locked operations serialize all outstanding load and store operations (that is, wait for them to complete)." ..."Locked operations are atomic with respect to all other memory operations and all externally visible events. Only instruction fetch and page table accesses can pass locked instructions. Locked instructions can be used to synchronize data written by one processor and read by another processor."
Memory-ordering instructions address this specific need. MFENCE could have been used as a full barrier in the above case (at least in theory: for one thing, locked operations might be faster; for another, it might result in different behavior). MFENCE and its friends can be found in Chapter 8.2.5, "Strengthening or Weakening the Memory-Ordering Model".
There are some more ways to serialize stores and loads, though they are either impractical or slower than the above methods:
In chapter 8.3 you can find full serializing instructions like CPUID. These serialize instruction flow as well: "Nothing can pass a serializing instruction and a serializing instruction cannot pass any other instruction (read, write, instruction fetch, or I/O)".
If you set up memory as strong uncached (UC), it will give you a strong memory model: no speculative or out-of-order accesses will be allowed and all accesses will appear on the bus, so there is no need to emit an instruction. :) Of course, this will be a tad slower than usual.
...
So it depends on the architecture. If there were a computer with strong ordering guarantees, the JIT would probably emit nothing.
IA64 and other architectures have their own memory models - and thus guarantees of memory ordering (or lack of them) - and their own instructions/ways to deal with memory store/load ordering.
When doing lock-free concurrent programming, one should care about the reordering of program instructions.
Reordering of program instructions can occur at several stages: in the language compiler, in the JIT compiler, and in the CPU.
Memory fences are the only way to ensure a particular order of your program instructions. Basically, a memory fence is a class of instructions which causes the CPU to enforce an ordering constraint. Memory fences can be put into three categories: load fences, store fences, and full fences.
In the .NET Framework there are plenty of ways to emit fences: Interlocked, Monitor, ReaderWriterLockSlim, etc.
Thread.MemoryBarrier emits a full fence at both the JIT compiler and the processor level.
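As an illustration of one of the other options, here is a minimal sketch of a run-once gate built on Interlocked. `OnceGate` and `TryEnter` are hypothetical names. The sketch relies on the commonly described behavior that Interlocked operations are atomic read-modify-write operations that also act as full fences, so no explicit Thread.MemoryBarrier call is needed.

```csharp
using System.Threading;

// Hypothetical run-once gate; names are illustrative.
public sealed class OnceGate
{
    private int entered; // 0 = free, 1 = taken

    // Atomically swaps 0 -> 1 and reports whether this caller made the swap.
    // Exactly one caller gets true, no matter how many threads race here.
    public bool TryEnter()
    {
        return Interlocked.CompareExchange(ref entered, 1, 0) == 0;
    }
}
```

The first call to `TryEnter()` on a fresh gate returns true and every later call returns false, which makes it usable for guarding one-time initialization without taking a lock.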