I am trying to understand the internals of Java volatile: its semantics, and how it translates to the underlying architecture and its instructions. Based on the following blogs and resources:
fences generated for volatile, What gets generated for read/write of volatile, and a Stack Overflow question on fences,
here is what I gather:
What I am struggling to understand is this: Java does not emit LFENCE on x86, i.e. a read of a volatile does not cause an LFENCE. I know that x86 memory ordering prevents loads from being reordered with other loads and with later stores, so the second bullet point is taken care of. However, I would assume that, for the state to be visible to this thread, an LFENCE instruction should be issued to guarantee that all load buffers are drained before the instruction after the fence executes (as per the Intel manual). I understand there is a cache coherence protocol on x86, but shouldn't a volatile read still drain any loads in the buffers?
On x86, the buffers are pinned to the cache line. If the cache line is lost, the value in the buffer isn't used. So there's no need to fence or drain the buffers; the value they contain must be current because another core can't modify the data without first invalidating the cache line.
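A toy Java model of the mechanism described above may help. It is a deliberately simplified sketch, not Intel's actual microarchitecture, and the class `LoadBufferModel` and its methods are hypothetical: a load that executed early remembers which cache line it read, a write from another core must first invalidate that line, and any buffered load of the invalidated line is replayed before it retires. The value that retires is therefore current without any fence.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of a core's load buffer; an illustration, not real hardware.
class LoadBufferModel {
    // A load that has executed but not yet retired; it remembers the cache line it read.
    static final class PendingLoad {
        final int line;
        int value;
        boolean stale;

        PendingLoad(int line, int value) {
            this.line = line;
            this.value = value;
        }
    }

    private final int[] memory;   // one int per "cache line", kept trivially coherent
    private final List<PendingLoad> loadBuffer = new ArrayList<>();

    LoadBufferModel(int lines) {
        memory = new int[lines];
    }

    // Execute a load early; its result waits in the load buffer until retirement.
    PendingLoad executeLoad(int line) {
        PendingLoad load = new PendingLoad(line, memory[line]);
        loadBuffer.add(load);
        return load;
    }

    // Another core writes the line. Coherence invalidates our copy first,
    // which marks every buffered load of that line as stale.
    void remoteWrite(int line, int value) {
        for (PendingLoad load : loadBuffer) {
            if (load.line == line) {
                load.stale = true;
            }
        }
        memory[line] = value;
    }

    // A stale load is replayed before it retires, so the retired value is current.
    int retire(PendingLoad load) {
        if (load.stale) {
            load.value = memory[load.line];
            load.stale = false;
        }
        loadBuffer.remove(load);
        return load.value;
    }

    public static void main(String[] args) {
        LoadBufferModel core = new LoadBufferModel(4);
        PendingLoad early = core.executeLoad(2);   // speculatively reads 0
        core.remoteWrite(2, 5);                    // another core invalidates line 2 and writes 5
        System.out.println(core.retire(early));    // prints 5: the stale value is never used
    }
}
```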
The X86 provides TSO. So, at the hardware level, you get the following barriers for free: [LoadLoad][LoadStore][StoreStore]. The only one missing is [StoreLoad].
A load has acquire semantics
r1=X
[LoadLoad]
[LoadStore]
A store has release semantics
[LoadStore]
[StoreStore]
Y=r2
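Since Java 9, these two access modes can also be requested on their own through `java.lang.invoke.VarHandle` (`getAcquire` and `setRelease`). The message-passing sketch below uses a hypothetical class `AcquireReleaseDemo`: the writer stores `data` and then release-stores the flag, and the reader acquire-loads the flag before reading `data`. On x86, HotSpot is expected to compile both accesses to plain moves, because TSO already provides these orderings. Note that acquire/release alone does not include [StoreLoad].

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical example of acquire/release message passing with VarHandle access modes.
class AcquireReleaseDemo {
    private int data;        // plain field, published through 'ready'
    private boolean ready;   // only accessed through READY with acquire/release modes

    private static final VarHandle READY;

    static {
        try {
            READY = MethodHandles.lookup()
                    .findVarHandle(AcquireReleaseDemo.class, "ready", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Release store: [LoadStore][StoreStore] before it, so 'data = value' cannot sink below it.
    void publish(int value) {
        data = value;
        READY.setRelease(this, true);
    }

    // Acquire load: [LoadLoad][LoadStore] after it, so 'data' cannot be read before the flag.
    int awaitAndRead() {
        while (!(boolean) READY.getAcquire(this)) {
            Thread.onSpinWait();
        }
        return data;
    }

    static int runOnce(int value) {
        AcquireReleaseDemo demo = new AcquireReleaseDemo();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = demo.awaitAndRead());
        reader.start();
        demo.publish(value);
        try {
            reader.join();   // join() makes the reader's write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(runOnce(42));   // prints 42
    }
}
```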
If you do a store followed by a load, you end up with this:
[LoadStore]
[StoreStore]
Y=r2
r1=X
[LoadLoad]
[LoadStore]
The issue is that the store and the load can still be reordered, so the volatile accesses are no longer sequentially consistent, and sequential consistency is mandatory for the Java Memory Model. The only way to prevent this is with a [StoreLoad]:
[LoadStore]
[StoreStore]
Y=r2
[StoreLoad]
r1=X
[LoadLoad]
[LoadStore]
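The reordering above is what the classic store-buffering (Dekker-style) litmus test probes. In the sketch below, the class name `StoreBufferLitmus` is hypothetical; one thread does `Y=1; r1=X` and the other `X=1; r2=Y`. Because both fields are volatile, the Java Memory Model forbids the outcome `r1 == 0 && r2 == 0`, so the count should always be zero. With plain fields that outcome becomes legal, but starting fresh threads per iteration makes it rare to actually observe; purpose-built harnesses such as OpenJDK's jcstress exist for that.

```java
// Hypothetical store-buffering litmus test with volatile fields.
class StoreBufferLitmus {
    static volatile int x, y;   // volatile: a [StoreLoad] is required between each store and load
    static int r1, r2;          // results; read by the main thread only after join()

    static int countBothZero(int iterations) {
        int bothZero = 0;
        for (int i = 0; i < iterations; i++) {
            x = 0;
            y = 0;
            Thread a = new Thread(() -> { y = 1; r1 = x; });   // store Y, then load X
            Thread b = new Thread(() -> { x = 1; r2 = y; });   // store X, then load Y
            a.start();
            b.start();
            join(a);
            join(b);
            if (r1 == 0 && r2 == 0) {
                bothZero++;   // would mean a store was reordered after the following load
            }
        }
        return bothZero;
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("r1 == r2 == 0 observed " + countBothZero(10_000) + " times");
    }
}
```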
The most logical place to add it is the write, since reads are normally more frequent than writes. So the write becomes:
[LoadStore]
[StoreStore]
Y=r2
[StoreLoad]
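The same barrier placement can be spelled out with the explicit fence methods that `java.lang.invoke.VarHandle` has offered since Java 9. The class `ManualVolatile` is hypothetical and purely illustrative: it maps [LoadStore][StoreStore] to `releaseFence()`, [StoreLoad] to `fullFence()` and [LoadLoad][LoadStore] to `acquireFence()`. It is not claimed to be formally equivalent to a volatile field under the JMM; real code should simply declare the field volatile.

```java
import java.lang.invoke.VarHandle;

// Hypothetical class: the barrier placement above, expressed with explicit fences.
class ManualVolatile {
    private int value;   // plain field; ordering comes only from the fences below

    // Volatile-style store:
    //   [LoadStore][StoreStore]  -> releaseFence()
    //   Y = r2                   -> plain store
    //   [StoreLoad]              -> fullFence() (the only one that is not free on x86)
    void write(int v) {
        VarHandle.releaseFence();
        value = v;
        VarHandle.fullFence();
    }

    // Volatile-style load:
    //   r1 = X                   -> plain load
    //   [LoadLoad][LoadStore]    -> acquireFence()
    int read() {
        int v = value;
        VarHandle.acquireFence();
        return v;
    }

    public static void main(String[] args) {
        ManualVolatile cell = new ManualVolatile();
        cell.write(7);
        System.out.println(cell.read());   // prints 7
    }
}
```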
Because the X86 provides TSO, the following fences can be no-ops:
[LoadLoad][LoadStore][StoreStore]
So the only relevant one is the [StoreLoad], and it can be implemented with an MFENCE or with a lock addl $0,(%rsp).
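To check which of the two a particular JVM emits, one option is HotSpot's diagnostic disassembly output, e.g. `java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly VolatileStoreProbe`, which requires the hsdis disassembler plugin to be installed. `VolatileStoreProbe` is a hypothetical minimal probe. After the store to `sink` in the JIT-compiled code you would expect either an `mfence` or a `lock`-prefixed `addl` on a stack slot; the exact form and offset vary by JVM version.

```java
// Hypothetical probe: a hot loop of volatile stores, so the JIT compiles it and the
// [StoreLoad] fence after each store shows up in the disassembly.
class VolatileStoreProbe {
    static volatile int sink;   // every write is a volatile store

    static void store(int v) {
        sink = v;               // look for mfence / lock addl right after this store
    }

    static int run(int iterations) {
        for (int i = 0; i < iterations; i++) {
            store(i);
        }
        return sink;            // volatile load: plain mov on x86, no fence expected
    }

    public static void main(String[] args) {
        System.out.println(run(1_000_000));   // prints 999999
    }
}
```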
The LFENCE and the SFENCE are not relevant for this situation. The LFENCE and SFENCE are for weakly ordered loads and stores (e.g. those of SSE).
What the [StoreLoad] does on the X86 is stop executing loads until the store buffer has been drained. This makes sure the load reads its value from memory/cache only AFTER the store has become globally visible (has left the store buffer and entered the L1d).
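The store-buffer argument can be checked exhaustively on a toy model. This is a sketch under simplifying assumptions, and `TsoStoreBufferModel` is hypothetical: each thread has a FIFO store buffer that drains to one shared memory at arbitrary points, a load reads its own buffer first and then memory, and a fence (standing in for MFENCE or the locked instruction) stalls until its own buffer is empty. Each thread stores 1 to one variable and then loads the other. Enumerating every interleaving shows `0,0` is reachable without the fence and unreachable with it.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy TSO model: per-thread FIFO store buffers over one shared memory.
class TsoStoreBufferModel {
    static final int STORE = 0, FENCE = 1, LOAD = 2;

    // Thread t stores 1 to address t, optionally fences, then loads address 1 - t.
    static int[][] program(int t, boolean fence) {
        return fence
                ? new int[][] {{STORE, t}, {FENCE, -1}, {LOAD, 1 - t}}
                : new int[][] {{STORE, t}, {LOAD, 1 - t}};
    }

    static final class State {
        int[] mem = new int[2];   // shared memory
        int[] pc = new int[2];    // next instruction per thread
        int[] reg = new int[2];   // value loaded by each thread
        List<ArrayDeque<Integer>> buf = List.of(new ArrayDeque<>(), new ArrayDeque<>());

        State copy() {
            State s = new State();
            s.mem = mem.clone();
            s.pc = pc.clone();
            s.reg = reg.clone();
            s.buf = List.of(new ArrayDeque<>(buf.get(0)), new ArrayDeque<>(buf.get(1)));
            return s;
        }
    }

    // Every reachable final outcome "r0,r1" over all interleavings and buffer drains.
    static Set<String> outcomes(boolean fence) {
        int[][][] progs = {program(0, fence), program(1, fence)};
        Set<String> results = new TreeSet<>();
        explore(new State(), progs, results);
        return results;
    }

    private static void explore(State s, int[][][] progs, Set<String> results) {
        boolean finished = true;
        for (int t = 0; t < 2; t++) {
            if (!s.buf.get(t).isEmpty()) {          // step: drain the oldest buffered store
                finished = false;
                State n = s.copy();
                n.mem[n.buf.get(t).poll()] = 1;
                explore(n, progs, results);
            }
            if (s.pc[t] < progs[t].length) {        // step: execute the next instruction
                finished = false;
                int op = progs[t][s.pc[t]][0], addr = progs[t][s.pc[t]][1];
                if (op == FENCE && !s.buf.get(t).isEmpty()) {
                    continue;                        // fence stalls until the buffer is drained
                }
                State n = s.copy();
                if (op == STORE) {
                    n.buf.get(t).add(addr);         // stored value is always 1, so buffer the address
                } else if (op == LOAD) {
                    n.reg[t] = n.buf.get(t).contains(addr) ? 1 : n.mem[addr]; // store forwarding
                }
                n.pc[t]++;
                explore(n, progs, results);
            }
        }
        if (finished) {
            results.add(s.reg[0] + "," + s.reg[1]);
        }
    }

    public static void main(String[] args) {
        System.out.println("no fence:   " + outcomes(false)); // [0,0, 0,1, 1,0, 1,1]
        System.out.println("with fence: " + outcomes(true));  // [0,1, 1,0, 1,1]
    }
}
```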