I want to know how the mutex (or the other locking implementation) implements the wait functionallity for the lock function. I mean, is that a cpu instruction that queue the mutex.lock calls, is that implemented in OS only or what?
In the tests I did, I think this wait functionallity is done only in OS layer and is made creating some sort of spinning, checking is the lock is available to proceed and if not putting the thread to sleep. Is that right?
Thank you very much.
It's platform dependent. Typically there is a spin lock portion which falls back to blocking in the operating system if a fixed spin limit is reached.
The spinlock is typically implemented by reading a memory address that contains a particular value when the mutex is unlocked. If it is seen as unlocked, an attempt is made to atomically change that value from the unlocked value to the locked value. If that atomic exchange succeeds, the mutex is locked. Typically the number of spins is counted and if a limit is reached, we switch to blocking in the OS.
The block in the OS is typically implemented much the same way except that instead of sleeping, the thread adds itself to a list of things waiting for the lock. When a thread releases the lock, it checks if anything is waiting in the OS and, if so, unblocks it. This causes the OS to schedule that thread. It typically then tries to perform the same atomic exchange that a spinlock would try, blocking again in the OS if it fails.
In pseudo-code:
Lock:
Unlock:
Note that the OS has to implement some mechanism to avoid the race where the request to unblock any threads waiting in the OS happens just before a thread manages to block. The method varies from OS to OS. For example, Linux has something called a "futex", which is essentially a way to implement steps 4, 5, and 6 of the locking pseudo-code atomically.
Caution: If you attempt to implement this algorithm in code, understand that you will likely produce a toy that will not perform nearly as well as a proper implementation. You need deep, platform-specific knowledge to avoid nasty performance-sucking traps you can fall into. For example, it's easy to code a spinlock such that it steals core execution resources from another thread sharing the physical core in CPUs with hyper-threading. And it's easy to code the successful exchange so that the CPU's branch prediction predicts it will fail and you take a horrible branch misprediction penalty when you acquire the lock.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With