One of the classes I defined is used in a set() to filter out equal objects. But it doesn't work as I expect it, so I obviously understand something wrong.
class Foo(object):
def __hash__(self):
return 7
x = set()
x.add(Foo())
assert len(x) == 1
x.add(Foo())
assert len(x) == 1 # AssertionError
I expect the set to consist of only one element, but it has two. Why is that?
Hash collisions are know to occur in sets (hash maps), no hash algorithm is good enough to have a unique hash for every item, or else it would take a long time to calculate. When a collision does occur, python falls back to checking the equality of the values with __eq__ to make sure they are not the same.
class Foo(object):
def __hash__(self):
return 7
def __eq__(self, other):
return True
>>> x = set()
>>> x.add(Foo())
>>> assert len(x) == 1
>>> x.add(Foo())
>>> assert len(x) == 1
>>>
This is a reason why you see frightening runtimes over here but note that you can expect O(1) amortized membership checks in sets, even though they have O(N) worst case (everything is a hash collision). The worst case is very very very unlikely to occur due to Python's smart implementation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With