I'm playing around with lockless C++ programming, and I came up with a method for safely(?) transferring ownership of a non-thread-safe data structure (std::unordered_map<int, float> in my trivial example) from a producer thread to a consumer thread using only a shared std::atomic<bool> for synchronization. It seems like it ought to be thread-safe, but intuition is famously unreliable in lockless programming, so I'm looking for any reasons why this approach might not be thread-safe in practice.
The design (as shown in the sketch code, below) is this: a consumer thread periodically calls the ConsumerConsumeUpdates() method of my KeyedParameterBridge object, and then iterates over the contents of the returned std::unordered_map. It can do this safely because the returned std::unordered_map (_consumerThreadUpdatesTable) is accessible only to the consumer thread; the producer thread never touches it.
Similarly, the producer thread can call ProducerPutUpdate() as many times as it likes to populate its own private table (_producerThreadUpdatesTable), and then it can call ProducerPushUpdatesToConsumer() to attempt to move the contents of its private table into the shared-access table (_bridgeUpdatesTable). The producer thread also calls ProducerPushUpdatesToConsumer() periodically, just in case a previous call was unable to push the updates successfully.
Access to _bridgeUpdatesTable is strictly controlled by the state of the _consumerOwnsBridgeUpdatesTable variable, which is a std::atomic<bool>. If that variable is true, then only the consumer thread may access _bridgeUpdatesTable; if it is false, then only the producer thread may access _bridgeUpdatesTable.
The reason (I think) this should be thread-safe is that each thread will only relinquish access to _bridgeUpdatesTable to the other thread; it will never unilaterally seize access for itself.
For example, ProducerPushUpdatesToConsumer() sets _consumerOwnsBridgeUpdatesTable = true; when it's done, but it never sets it to false. The consumer thread can therefore rely on the fact that if it sees _consumerOwnsBridgeUpdatesTable.load() == true, that state will remain valid until the consumer itself relinquishes ownership, which guarantees that _bridgeUpdatesTable will never be modified while ConsumerConsumeUpdates() is accessing it.
Similarly, ConsumerConsumeUpdates() never sets _consumerOwnsBridgeUpdatesTable = true;, so the producer thread can rely on the fact that if it sees _consumerOwnsBridgeUpdatesTable.load() == false, that state will remain valid until the producer itself relinquishes ownership, which guarantees that _bridgeUpdatesTable will never be modified while ProducerPushUpdatesToConsumer() is accessing it.
Example code sketch is below. The code seems to work fine, but buggy lockless code often does. Am I missing any gotchas?
#include <atomic>
#include <unordered_map>

// This object lets us safely transport sets of key-value-pair updates from a single
// producer thread to a single consumer thread, without requiring any mutex locking.
class KeyedParameterBridge
{
public:
   KeyedParameterBridge() : _consumerOwnsBridgeUpdatesTable(false)
   {
      // empty
   }

   /** Called by the producer thread; places a pending update into the producer thread's local pending-updates table. */
   void ProducerPutUpdate(int parameterID, float parameterValue) {_producerThreadUpdatesTable[parameterID] = parameterValue;}

   /** Called by the producer thread; attempts to push all of the updates previously placed (via ProducerPutUpdate())
     * into the _bridgeUpdatesTable, where the consumer thread will be able to access them.
     * @returns true if we were able to push some updates in this call, or false if we didn't push any updates
     */
   bool ProducerPushUpdatesToConsumer()
   {
      if (_producerThreadUpdatesTable.size() == 0) return false;  // no updates to push
      if (_consumerOwnsBridgeUpdatesTable.load()) return false;   // can't safely do anything yet; we'll have to wait until the consumer thread consumes _bridgeUpdatesTable's current contents

      _bridgeUpdatesTable = std::move(_producerThreadUpdatesTable);  // O(1) pointer-move from _producerThreadUpdatesTable to _bridgeUpdatesTable
      _consumerOwnsBridgeUpdatesTable = true;  // so the consumer thread will know the _bridgeUpdatesTable is safe for him to access now
      _producerThreadUpdatesTable.clear();     // make sure our producer-side table is in a known-empty state
      return true;
   }

   /** Called periodically by the consumer thread; returns any available updates for the consumer thread to apply to its local data structure. */
   const std::unordered_map<int, float> & ConsumerConsumeUpdates()
   {
      _consumerThreadUpdatesTable.clear();  // make sure our consumer-side table is in a known-empty state
      if (_consumerOwnsBridgeUpdatesTable.load())  // see if we have access
      {
         _consumerThreadUpdatesTable = std::move(_bridgeUpdatesTable);  // O(1) pointer-move from _bridgeUpdatesTable to _consumerThreadUpdatesTable
         _consumerOwnsBridgeUpdatesTable = false;  // so the producer thread will know we already grabbed the update and he can push again now
      }
      return _consumerThreadUpdatesTable;
   }

private:
   std::atomic<bool> _consumerOwnsBridgeUpdatesTable;           // true iff _bridgeUpdatesTable currently contains data for the consumer thread to consume
   std::unordered_map<int, float> _producerThreadUpdatesTable;  // populated by the producer thread, never accessed by the consumer thread
   std::unordered_map<int, float> _bridgeUpdatesTable;          // access to this table is governed by the state of _consumerOwnsBridgeUpdatesTable
   std::unordered_map<int, float> _consumerThreadUpdatesTable;  // read by the consumer thread, never accessed by the producer thread
};
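For context, here is a minimal sketch of how the two threads might drive this object (the thread bodies, loop counts, and sleep interval here are illustrative, not part of the design):

#include <chrono>
#include <thread>

int main()
{
   KeyedParameterBridge bridge;
   std::atomic<bool> producerDone(false);

   std::thread producerThread([&] {
      for (int i = 0; i < 1000; i++)
      {
         bridge.ProducerPutUpdate(i % 10, (float) i);  // stage an update in the producer-private table
         bridge.ProducerPushUpdatesToConsumer();       // push it across, if the bridge table is free
      }
      producerDone = true;
   });

   std::thread consumerThread([&] {
      while (!producerDone)
      {
         for (const auto & kv : bridge.ConsumerConsumeUpdates())
         {
            (void) kv;  // ... apply (kv.first -> kv.second) to the consumer's local data structure ...
         }
         std::this_thread::sleep_for(std::chrono::milliseconds(1));
      }
      // note: a real program would also drain any final batch of updates here
   });

   producerThread.join();
   consumerThread.join();
   return 0;
}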
This looks safe to me. The loads only need to be acquire, the stores only need to be release. The default is seq_cst, which is stronger than you need.
The synchronization is the same as with a spinlock (or other mutex¹): acquire operation at the top of a critical section, release at the bottom. Since it's cooperative, your acquire is just a load instead of an atomic RMW. Your release is giving the lock to the other thread, instead of just unlocking it.
All accesses to non-atomic shared data happen inside the critical sections you've created, so there's a happens-before between producer and consumer.
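Concretely, the weakest orderings that still give you that happens-before would look like this (a sketch of the two methods; I've also moved the producer's clear() above the store, which is safe since that table is producer-private):

bool ProducerPushUpdatesToConsumer()
{
   if (_producerThreadUpdatesTable.empty()) return false;
   if (_consumerOwnsBridgeUpdatesTable.load(std::memory_order_acquire)) return false;  // pairs with the consumer's release-store of false

   _bridgeUpdatesTable = std::move(_producerThreadUpdatesTable);
   _producerThreadUpdatesTable.clear();  // producer-private, so its position relative to the store below doesn't matter
   _consumerOwnsBridgeUpdatesTable.store(true, std::memory_order_release);  // publishes the moved-in contents
   return true;
}

const std::unordered_map<int, float> & ConsumerConsumeUpdates()
{
   _consumerThreadUpdatesTable.clear();
   if (_consumerOwnsBridgeUpdatesTable.load(std::memory_order_acquire))  // pairs with the producer's release-store of true
   {
      _consumerThreadUpdatesTable = std::move(_bridgeUpdatesTable);
      _consumerOwnsBridgeUpdatesTable.store(false, std::memory_order_release);  // hands the (moved-from) bridge table back
   }
   return _consumerThreadUpdatesTable;
}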
This idea looks interesting; it's different from a readers/writers lock in that the reader can unlock right away after making a copy (or actually std::move-ing to claim the data), and you only have 1 reader. It's also different from a SeqLock, where readers are truly read-only: they check a sequence number before and after a memcpy of the payload to detect whether tearing could have happened, which doesn't work with pointer-based data structures.
This is closest to RCU, Read-Copy-Update, where producers copy the existing data structure, modify it, and publish it by updating a pointer. Readers are truly read-only, only needing a consume load, so it's ideal for scalability to many readers and infrequent changes (e.g. Linux kernel config stuff which is usually only changed manually at boot). Reclaiming (deleting/freeing) old copies of the data is the major challenge with RCU, requiring a lot of infrastructure to keep the fast path of readers fast. Your strategy instead just hands ownership to a single reader, leaving the producer to create a new data structure from scratch. Critically, giving the reader ownership, and thus responsibility for deleting the table, solves the deallocation problem.
You could perhaps get away with having a std::atomic<std::unordered_map<int, float>*> (atomic pointer-to-unordered_map) and have the producer do auto old = shared.exchange(newly_allocated_map, std::memory_order_acq_rel);. (If old is non-null, delete it; we published again before the consumer saw that version.) And the consumer can do auto hash_ptr = shared.exchange(nullptr, std::memory_order_acquire);. If it gets nullptr, the producer hasn't produced a new hash table yet. If it's non-null, delete the previously claimed hash table and use this one.
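A minimal sketch of that alternative (the function names here are mine; I'm using raw owning pointers to keep the atomics simple):

#include <atomic>
#include <unordered_map>

std::atomic<std::unordered_map<int, float>*> shared{nullptr};

// Producer: publish a freshly built table, reclaiming any unconsumed predecessor.
void Publish(std::unordered_map<int, float>* newly_allocated_map)
{
   auto* old = shared.exchange(newly_allocated_map, std::memory_order_acq_rel);
   delete old;  // non-null means the consumer never claimed that version (deleting nullptr is a no-op)
}

// Consumer: claim the latest table, or get nullptr if nothing new has been published.
// The caller takes ownership of the returned table and must delete it when done.
std::unordered_map<int, float>* Consume()
{
   return shared.exchange(nullptr, std::memory_order_acquire);
}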
This is not necessarily better: your way avoids any atomic-RMW operations, which is very good on x86 for example. (Every atomic RMW is a full memory barrier on x86, blocking memory-level parallelism across it for whatever code is calling this.)
Perhaps my idea could be modified to have the producer only write when the pointer is null, and the consumer only write when it's non-null, so its non-nullness serves the same purpose as your flag. Then it's pretty much equivalent to your idea: you use a separate flag and std::move to transfer the control block of the std::unordered_map (the pointer and size info), vs. my atomic-pointer idea keeping a pointer to the control block, which is an extra level of indirection to actually use the data. (But only while passing it between producer and consumer; that's probably negligible.)
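That modified version might look like this (again a sketch with hypothetical names; note that neither side needs an atomic RMW any more, just plain loads and stores):

// The pointer's null-ness is the ownership flag: the producer writes only when
// the slot is null, and the consumer writes only when it's non-null.
std::atomic<std::unordered_map<int, float>*> bridge{nullptr};

// Producer: publish only if the consumer has already claimed the previous table.
bool TryPublish(std::unordered_map<int, float>* newTable)
{
   if (bridge.load(std::memory_order_acquire) != nullptr) return false;  // consumer hasn't claimed the last one yet
   bridge.store(newTable, std::memory_order_release);  // hand ownership to the consumer
   return true;
}

// Consumer: claim the published table (and ownership of it), or get nullptr.
std::unordered_map<int, float>* TryConsume()
{
   auto* p = bridge.load(std::memory_order_acquire);
   if (p != nullptr) bridge.store(nullptr, std::memory_order_release);  // mark the slot free for the producer
   return p;
}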
Footnote 1: You could have done this with a std::mutex (and maybe a plain bool flag to track state, if .size() on the map isn't sufficient), but this is more efficient. At least assuming you always have work to do and aren't just spin-waiting, effectively calling try_lock. A mutex can fall back to sleeping, giving up the CPU and having the OS wake the thread when the mutex is unlocked. You could perhaps roll your own version of that with std::atomic's .wait() and .notify_one(), but notify is a system call, so to keep things lightweight in the normal case you'd want a way to skip the notify when the other thread isn't .wait()ing. (Good std::mutex implementations do this.)
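For reference, here's roughly what that hand-rolled blocking version could look like with C++20's std::atomic wait/notify (a sketch without the skip-the-notify optimization, so every push pays for a potential notify):

#include <atomic>

std::atomic<bool> consumerOwnsTable{false};  // stands in for _consumerOwnsBridgeUpdatesTable

// Producer side, after moving its updates into the bridge table:
void WakeConsumer()
{
   consumerOwnsTable.store(true, std::memory_order_release);
   consumerOwnsTable.notify_one();  // wakes the consumer if it's blocked in wait() (may be a syscall)
}

// Consumer side, instead of polling periodically:
void SleepUntilUpdatesAvailable()
{
   consumerOwnsTable.wait(false, std::memory_order_acquire);  // blocks while the flag is still false
   // ... now safe to move from the bridge table, then store(false) as before
   //     (the producer in this design polls, so no notify back is needed)
}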