I have an unordered_map storing large objects:
std::unordered_map<uint64_t, LargeObject> map;
LargeObject is an array of POD/there are no pointer members.
I am receiving a lot of LargeObjects, so I only want to insert them in the map if they don't already exist (there will be a lot of times they do already exist, as I am listening to multiple sources).
Performance matters, as I am receiving millions.
There seem to be several ways of doing this:
map[key] = largeObject;
or:
map.insert(std::make_pair<uint64_t, LargeObject>(key, largeObject);
or:
if(map.count(key) == 0)
{
    map[key] = largeObject;
}
or:
auto iter = map.find(key);
if(iter == map.end())
{
    map[key] = largeObject;
}
There maybe more I haven't included.
Which is the most efficient technique to insert when the map's value is a large object?
This is what try_emplace is for. The API is set up in such a way as to avoid constructing nodes, or even the value, unless it's strictly necessary. It comes form N4279.
In your case, that would be:
auto [it, success] = map.try_emplace(key, largeObject);
Each of the four options in the OP has issues:
map[key] = largeObject doesn't actually do what you're asking for, it would overwrite the existing item. And even if it wasn't there, it requires default construction and copy assignment.
The approaches with count and find both require two lookups.
map.insert(std::make_pair<uint64_t, LargeObject>(key, largeObject)); is a single lookup but requires constructing the large object, and the pair, unconditionally.
Not mentioned in the OP is another option: map.emplace(key, largeObject); This has the issue that it's actually under-specified whether or not the pair is created in the case that the key exists. It does on some implementations. The motivation for try_emplace was to properly specify the API so that the pair definitely does not get created if the key already exists.
If "Performance matters", and the map is large don't use std::unordered_map, since it is known to be quite slow; and you are barking up the wrong tree w.r.t. insertion.
Here's a by-now-old SO question suggesting several alternatives:
Super high performance C/C++ hash map (table, dictionary)
Of course, you will need to choose something that fits your workload best.
Additionally - if your objects are large, it is probably better to avoid a map structure templated over the actual large object - since that usually guarantees you will at least have to copy it when inserting into the map; and you don't want to have to copy large objects. Use some kind of reference type (e.g. a pointer, or some numeric index usable to get the address of the large object), and have your hasher dereference that.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With