
std::unordered_set::erase complexity

I don't understand why the erase method of std::unordered_set has O(N) complexity in the worst case (where N is the number of elements). The Standard (N4296) says that all three versions of erase have O(a.size()) worst-case complexity (a is the container), and that only iterators pointing to the erased elements are invalidated, not all iterators (i.e. no rehashing takes place). This holds even for the version that takes a single iterator argument and has constant average-case complexity. My guess was that this is because that overload returns an iterator to the next element, which requires finding the first non-empty bucket after the erased element, and that gives O(a.bucket_count()), not O(a.size()). The number of elements is not in direct proportion to the number of buckets. For example:

#include <iostream>
#include <unordered_set>
using namespace std;

int main()
{
    std::unordered_set<int> aSet;
    aSet.insert ({125, 126});
    aSet.rehash (1000);
    cout << "Size: " << aSet.size() << endl;
    cout << "Bucket count: " << aSet.bucket_count() << endl;
}

The output is:

Size: 2
Bucket count: 1031

The size of the container is only 2, while bucket_count is 1031. When erase is called, it has to seek the next non-empty bucket, which may be near the end, so the complexity would be O(a.bucket_count()), not O(a.size()). What is the reason the Standard gives O(a.size()) complexity?

asked Oct 18 '25 by Denis

2 Answers

This holds even for the version that takes a single iterator argument and has constant average-case complexity.

The unordered associative containers have forward iterators - they are allowed to be implemented via singly linked lists.

Erasing a node involves relinking the node before it to the node after it. Finding the node before the one the iterator points to can be worst-case O(N) in a singly-linked-list-based implementation, because you'll basically have to traverse the bucket (which can contain every element in the container in the fully colliding case).
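To make that concrete, here is a minimal sketch (a hypothetical node-and-bucket layout, not how any particular standard library is actually written) of erasing through a forward-only link: the predecessor has to be found by walking the bucket from its head, which is linear in the bucket's size.

// Hypothetical singly linked bucket layout, for illustration only.
struct Node {
    int   value;
    Node* next;
};

struct Bucket {
    Node* head = nullptr;
};

// Erase the node an iterator points to from its bucket.
// Links only go forward, so the predecessor must be found by
// scanning from the bucket head: O(bucket size), which is
// O(a.size()) when every element lands in the same bucket.
void erase_from_bucket(Bucket& b, Node* target) {
    if (b.head == target) {            // target is the first node
        b.head = target->next;
        delete target;
        return;
    }
    Node* prev = b.head;
    while (prev->next != target)       // linear scan for the predecessor
        prev = prev->next;
    prev->next = target->next;         // relink around the erased node
    delete target;
}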

answered Oct 21 '25 by T.C.


The most obvious reason is that a degenerate hash function may yield the same value for all elements. As a result they would all end up in the same bucket. Although it is unlikely, the same could happen even with a reasonably good hash function, especially after the hash values are mapped to buckets. Since there is no reasonable specification for the quality of the hash function, the standard can't mandate a better time complexity.
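As an illustration (a deliberately degenerate hash functor written only for this example), forcing every key into the same bucket makes any per-bucket operation, including erase, traverse all the elements:

#include <cstddef>
#include <iostream>
#include <unordered_set>

// Deliberately bad hash: every key gets the same hash value,
// so every element ends up in the same bucket.
struct SameHash {
    std::size_t operator()(int) const noexcept { return 0; }
};

int main()
{
    std::unordered_set<int, SameHash> s;
    for (int i = 0; i < 1000; ++i)
        s.insert(i);

    // All 1000 elements share one bucket, so erasing (or finding)
    // an element has to walk that bucket: O(s.size()) in this case.
    std::cout << "bucket of 42 holds " << s.bucket_size(s.bucket(42))
              << " elements" << std::endl;
    s.erase(999);
}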

answered Oct 21 '25 by Dietmar Kühl


