Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why doesn't the C++ STL implement more efficient std::set implementation?

Background
Im building a performance minded application and I came across a place where I have to use std::set. And it works like a charm. But then I started reading into the documentation (which you can find here) and the first thing I noticed was that

Search, removal, and insertion operations have logarithmic complexity. Sets are usually implemented as red-black trees.

The search, removal and insertions makes perfect sense to me as they are using some kind of a tree structure (because the documentation does not guarantee that it uses a Red-Black Tree). But the problem is, why should they?

I made an alternate solution to the std::set of my own and which uses a std::vector to store all the entries. Then I performed some basic benchmarks and here are the results,

Iterations: 100000

// Insertion
VectorSet :   211464us
std::set  :  1272864us

// Find/ Lookup
VectorSet : 404264us
std::set  : 551464us

// Removal
VectorSet : 254321964us
std::set  :    834664us

// Traversal (iterating through all the elements (100000 elements; 100000 iterations)
VectorSet :       2464us
std::set  : 4374174264us

According to these results, my implementation (VectorSet) outperformed std::set in both insertions and lookups, and traversal was over 1800000 times. But std::set outperformed my implementation VectorSet by a significant margin (which is understandable as we are dealing with vectors).

I can justify why removal is slower in VectorSet but faster on std::set and why std::set takes so long to iterate through the entries. Some things which affect the performance would be (correct me if im wrong),

  • Cache misses.
  • Pointer dereferences.
  • Better locality.

For the vector being slower in removal,

  • Finding the element.
  • Removal of the element.
  • Possible resize.

Question
As what I can see, using a std::vector to store entries rather than a tree structure performs better in 3/4 instances. And even in the place where std::set performed better, it is still a small amount compared to iterating through. And in my opinion, developers use other aspects (lookups, insertions and iterations) more than removals. Even though these numbers are in the range of nanoseconds, the slightest improvement is better.

So my question is, why does std::set use a tree structure when they can use something like a vector to improve their efficiency?

Note: The container will be filled up with an average of 1000 elements and will be iterated repeatedly throughout the application lifetime and will directly affect the application runtime.

like image 310
D-RAJ Avatar asked Mar 29 '26 01:03

D-RAJ


1 Answers

The standard set has some guarantees that you can't provide with your implementation:

  • inserting/erasing doesn't invalidate other iterators/references/pointers.
  • inserting/erasing elements has (at most) logarithmic complexity, as opposed to linear in your implementation.

If these don't matter to you, you're welcome to use a sorted vector and binary search. The standard provides std::sort, std::vector and std::binary_search, so you are good to go. The thing to notice is that each container has a specific use case and there is no "one size fits all" container.

The standard also provides unordered_set, which is a hash table. It is often criticized for being slow and causing cache-misses. Well, if that degrades your performance in a way you identified as a bottle-neck, go ahead and use some other hash-set implementation from other libraries. If you believe that you can do better, go ahead. Many projects build their own containers that are more specialized to that project. Could be faster, use less memory, could give different guarantees about iterator invalidation and/or complexity of operations. They all solve different problems.


Another point is that profiling and benchmarking is hard. Make sure you get it right. Performance comparison is usually done at scale (with varying number of input arguments). Picking a constant and relatively small size won't tell the whole story.

like image 70
Ayxan Haqverdili Avatar answered Apr 01 '26 10:04

Ayxan Haqverdili