I read the Intel manual and found that there is a LOCK prefix for instructions, which can prevent processors from writing to the same memory location at the same time. I was quite excited about it, because I guessed it could be used as a hardware mutex. So I wrote a piece of code to give it a try. The result is quite frustrating: LOCK does not support the MOV or LEA instructions. The manual says LOCK only supports ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCHG8B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, and XCHG. What is more, if the LOCK prefix is used with one of these instructions and the source operand is a memory operand, an undefined opcode exception (#UD) may be generated.
I wonder why there are so many limitations. These restrictions make LOCK seem useless: I cannot use it to guarantee that a general write operation is free of dirty data or other problems caused by parallelism.
For example, I wrote ++(*p) in C, where p is a pointer to shared memory. The corresponding assembly is something like:
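```asm
movl    28(%esp), %eax    # load the pointer p from the stack
movl    (%eax), %eax      # read *p
leal    1(%eax), %edx     # compute *p + 1 (LEA used as an add)
movl    28(%esp), %eax    # reload p
movl    %edx, (%eax)      # write the result back to *p (a separate store)
```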
I added "lock" before "movl" and "leal", but the processor complains "Invalid Instruction". :-( I guess the only way to serialize the write operations is to use a software mutex, right?
I certainly would not call lock useless. lock cmpxchg is the standard way to perform compare-and-swap, which is the basic building block of many synchronization algorithms.
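As a rough illustration of compare-and-swap as a building block, here is a minimal C11 sketch that performs the ++(*p) from the question with a CAS retry loop. It assumes a C11 compiler with <stdatomic.h>; the function name cas_increment is hypothetical, and the remark that this becomes LOCK CMPXCHG on x86 describes typical GCC/Clang output, not a guarantee.

```c
#include <stdatomic.h>

/* Atomically add 1 to *p using a compare-and-swap retry loop.
   On x86, GCC and Clang typically implement atomic_compare_exchange_weak
   with a LOCK CMPXCHG instruction. Returns the value *p held before
   this thread's increment. */
static int cas_increment(atomic_int *p)
{
    int expected = atomic_load(p);
    /* If another thread changed *p after we read it, the exchange fails,
       'expected' is refreshed with the current value, and we try again. */
    while (!atomic_compare_exchange_weak(p, &expected, expected + 1)) {
        /* retry with the updated 'expected' */
    }
    return expected;
}
```

The same loop shape works for any update you can compute from the old value, which is why CAS is more general than the single LOCK-able arithmetic instructions.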
Also, see fetch-and-add.
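A fetch-and-add performs the whole increment as one atomic step. The sketch below assumes C11 atomics and POSIX threads; the names counter, worker, and run_fetch_add_demo are hypothetical. Several threads increment a shared counter, and because each increment is atomic, no updates should be lost.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NTHREADS 4
#define ITERS 100000

static atomic_long counter; /* hypothetical shared counter */

/* Each thread performs ITERS atomic increments. On x86, compilers
   typically emit LOCK XADD for atomic_fetch_add (or LOCK ADD when the
   returned old value is unused). */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERS; i++)
        atomic_fetch_add(&counter, 1);
    return NULL;
}

/* Runs the workers and returns the final counter value,
   or -1 if not all threads could be created. */
static long run_fetch_add_demo(void)
{
    pthread_t t[NTHREADS];
    int created = 0;
    atomic_store(&counter, 0);
    for (; created < NTHREADS; created++)
        if (pthread_create(&t[created], NULL, worker, NULL) != 0)
            break;
    for (int i = 0; i < created; i++)
        pthread_join(t[i], NULL);
    return created == NTHREADS ? atomic_load(&counter) : -1;
}
```

run_fetch_add_demo() should return NTHREADS * ITERS (400000 here). Replacing atomic_fetch_add with a plain ++ on a non-atomic long would typically lose some updates.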
The purpose of lock is to make an operation atomic, not to serialize it: no other processor can access the memory location between the instruction's read and its write, so the read-modify-write takes effect as a single, indivisible step.
The x86 processors are known for a hairy design with lots of features, lots of rules, and even more exceptions to all those rules. This is related to the long history of the family.
When compilers or people use LOCK, they always use it within its limitations, often on data introduced specifically to synchronize threads, as opposed to the application data that the algorithms eventually manipulate. One then adapts the thread synchronization protocols to what LOCK can do, rather than vice versa.
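To illustrate that pattern, here is a minimal spinlock sketch, assuming C11 atomics and POSIX threads. The atomic flag is synchronization-only data that a LOCK-capable instruction operates on, while the ordinary counter is application data updated with plain loads and stores while the lock is held. All names are hypothetical, and a production lock would normally add back-off or use an OS mutex instead of pure spinning.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Synchronization data: on x86, atomic_flag_test_and_set is typically
   compiled to XCHG, which is implicitly locked. */
static atomic_flag lock_flag = ATOMIC_FLAG_INIT;
static long shared_value; /* application data, protected by lock_flag */

static void spin_lock(void)
{
    /* Loop until this thread is the one that changed the flag from clear to set. */
    while (atomic_flag_test_and_set_explicit(&lock_flag, memory_order_acquire))
        ; /* busy-wait */
}

static void spin_unlock(void)
{
    atomic_flag_clear_explicit(&lock_flag, memory_order_release);
}

static void *adder(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        spin_lock();
        shared_value++; /* plain, non-atomic ++ is safe inside the lock */
        spin_unlock();
    }
    return NULL;
}

/* Runs two adders and returns the final value, or -1 on thread-creation failure. */
static long run_spinlock_demo(void)
{
    pthread_t a, b;
    shared_value = 0;
    if (pthread_create(&a, NULL, adder, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, adder, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return shared_value;
}
```

run_spinlock_demo() should return 200000.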
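The general type of instruction you seem to be looking for is called a memory barrier (or fence). Indeed, x86 has several "modern" instructions from this family: MFENCE, LFENCE, and SFENCE, which are a full fence, a load fence, and a store fence, respectively. However, their importance in the instruction set is largely limited to SSE code (for example, non-temporal stores), because Intel guarantees that ordinary writes in the traditional part of the instruction set are not reordered with other writes. The main reordering x86 does allow for ordinary accesses is a later load passing an earlier store to a different location, which is where MFENCE (or a LOCKed instruction) is still needed. That strong ordering is pretty much the reason why this aged architecture is quite an easy target for multithreaded programming.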
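A small C11 sketch of how this ordering shows up in portable code, assuming C11 atomics and POSIX threads (all names hypothetical): a producer publishes ordinary data and then sets a flag, using explicit release and acquire fences. On x86, GCC and Clang typically emit no instruction at all for these two fences (only a compiler barrier), because stores are not reordered with stores and loads are not reordered with loads; a memory_order_seq_cst fence, by contrast, usually becomes MFENCE or a locked instruction. Note that fences only order memory accesses; they do not make a read-modify-write such as ++(*p) atomic.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;      /* ordinary data being published */
static atomic_int ready; /* synchronization flag */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;                              /* plain store */
    atomic_thread_fence(memory_order_release); /* keep payload before ready */
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
    return NULL;
}

static int consume(void)
{
    while (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
        ;                                      /* wait for the flag */
    atomic_thread_fence(memory_order_acquire); /* keep ready before payload */
    return payload;
}

/* Returns the value the consumer observed, or -1 on thread-creation failure. */
static int run_fence_demo(void)
{
    pthread_t t;
    payload = 0;
    atomic_store(&ready, 0);
    if (pthread_create(&t, NULL, producer, NULL) != 0)
        return -1;
    int v = consume();
    pthread_join(t, NULL);
    return v;
}
```

Because of the fence pairing, the consumer is guaranteed to see 42 once it sees the flag set.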
See also this answer for more info.
It is useful when, on a multiprocessor machine, two concurrent processes use the same data but must not modify it simultaneously.
When one of the processes modifies the data, it uses lock on the modifying instruction, so that when the second process tries to modify the same data, it has to wait for the first one to finish before it can perform its own modification.
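Applied to the question's ++(*p), "lock on the modifying instruction" means a single instruction with a memory destination, such as LOCK INC, rather than a LOCK prefix on the separate MOV and LEA steps. The sketch below assumes GCC or Clang (extended inline assembly and the __atomic builtins); the names locked_increment, shared_count, bump, and run_locked_demo are hypothetical, and on non-x86 targets it falls back to the builtin.

```c
#include <pthread.h>
#include <stddef.h>

/* Increment *p with one LOCK-prefixed read-modify-write instruction. */
static void locked_increment(int *p)
{
#if defined(__i386__) || defined(__x86_64__)
    /* "+m": *p is a memory operand that is both read and written. */
    __asm__ __volatile__("lock incl %0" : "+m"(*p) : : "memory", "cc");
#else
    /* Fallback for other architectures: GCC/Clang atomic builtin. */
    __atomic_fetch_add(p, 1, __ATOMIC_SEQ_CST);
#endif
}

static int shared_count; /* hypothetical shared data */

static void *bump(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++)
        locked_increment(&shared_count);
    return NULL;
}

/* Runs two threads and returns the final count, or -1 on thread-creation failure. */
static int run_locked_demo(void)
{
    pthread_t a, b;
    shared_count = 0;
    if (pthread_create(&a, NULL, bump, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, bump, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return shared_count;
}
```

run_locked_demo() should return 200000, since neither thread can slip a write in between the other's read and write of shared_count.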
I hope this will help a bit.