Background
Im building a performance minded application and I came across a place where I have to use std::set. And it works like a charm. But then I started reading into the documentation (which you can find here) and the first thing I noticed was that
Search, removal, and insertion operations have logarithmic complexity. Sets are usually implemented as red-black trees.
The search, removal and insertions makes perfect sense to me as they are using some kind of a tree structure (because the documentation does not guarantee that it uses a Red-Black Tree). But the problem is, why should they?
I made an alternate solution to the std::set of my own and which uses a std::vector to store all the entries. Then I performed some basic benchmarks and here are the results,
Iterations: 100000
// Insertion
VectorSet : 211464us
std::set : 1272864us
// Find/ Lookup
VectorSet : 404264us
std::set : 551464us
// Removal
VectorSet : 254321964us
std::set : 834664us
// Traversal (iterating through all the elements (100000 elements; 100000 iterations)
VectorSet : 2464us
std::set : 4374174264us
According to these results, my implementation (VectorSet) outperformed std::set in both insertions and lookups, and traversal was over 1800000 times. But std::set outperformed my implementation VectorSet by a significant margin (which is understandable as we are dealing with vectors).
I can justify why removal is slower in VectorSet but faster on std::set and why std::set takes so long to iterate through the entries. Some things which affect the performance would be (correct me if im wrong),
For the vector being slower in removal,
Question
As what I can see, using a std::vector to store entries rather than a tree structure performs better in 3/4 instances. And even in the place where std::set performed better, it is still a small amount compared to iterating through. And in my opinion, developers use other aspects (lookups, insertions and iterations) more than removals. Even though these numbers are in the range of nanoseconds, the slightest improvement is better.
So my question is, why does std::set use a tree structure when they can use something like a vector to improve their efficiency?
Note: The container will be filled up with an average of 1000 elements and will be iterated repeatedly throughout the application lifetime and will directly affect the application runtime.
The standard set has some guarantees that you can't provide with your implementation:
If these don't matter to you, you're welcome to use a sorted vector and binary search. The standard provides std::sort, std::vector and std::binary_search, so you are good to go. The thing to notice is that each container has a specific use case and there is no "one size fits all" container.
The standard also provides unordered_set, which is a hash table. It is often criticized for being slow and causing cache-misses. Well, if that degrades your performance in a way you identified as a bottle-neck, go ahead and use some other hash-set implementation from other libraries. If you believe that you can do better, go ahead. Many projects build their own containers that are more specialized to that project. Could be faster, use less memory, could give different guarantees about iterator invalidation and/or complexity of operations. They all solve different problems.
Another point is that profiling and benchmarking is hard. Make sure you get it right. Performance comparison is usually done at scale (with varying number of input arguments). Picking a constant and relatively small size won't tell the whole story.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With