
What is the default hash of user defined classes?

The docs incorrectly claim that

Objects which are instances of user-defined classes are hashable by default; they all compare unequal (except with themselves), and their hash value is their id()

Although I recall this once being correct, the hash of such objects is apparently no longer equal to their id in current versions of Python (v2.7.10, v3.5.0).

>>> class A:
...     pass
... 
>>> a = A()
>>> hash(a)
-9223372036578022804
>>> id(a)
4428048072

In another part of the docs it's said that the hash is derived from the id. When/why did the implementation change, and how is the number returned by hash "derived from" the id now?

asked Oct 14 '25 15:10 by wim


1 Answer

The relevant function appears to be:

Py_hash_t
_Py_HashPointer(void *p)
{
    Py_hash_t x;
    size_t y = (size_t)p;
    /* bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid
       excessive hash collisions for dicts and sets */
    y = (y >> 4) | (y << (8 * SIZEOF_VOID_P - 4));
    x = (Py_hash_t)y;
    if (x == -1)
        x = -2;
    return x;
}

That code comes from the CPython source, where it is then installed as the tp_hash slot of the base object type. The comment there gives a reason for not using the pointer (which is the same thing as the id) directly. Indeed, the commit that introduced that change to the function states that the reason for the change is:

Issue #5186: Reduce hash collisions for objects with no hash method by rotating the object pointer by 4 bits to the right.

That message refers to issue #5186, which explains in more detail why the change was made.
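
To see how the hash is "derived from" the id, the rotation can be reproduced in pure Python. This is only a sketch of the C function quoted above, not CPython's own code: the name hash_pointer, the bits parameter, and the sample addresses are made up for illustration, and it assumes id() returns the object's address, which holds for CPython but not necessarily for other implementations.

```python
import struct

# Pointer width of the running interpreter (assumed to match SIZEOF_VOID_P).
POINTER_BITS = 8 * struct.calcsize("P")


def hash_pointer(address, bits=POINTER_BITS):
    """Rotate `address` right by 4 bits and reinterpret it as a signed integer."""
    mask = (1 << bits) - 1
    y = ((address >> 4) | (address << (bits - 4))) & mask
    if y >= 1 << (bits - 1):  # emulate the cast from size_t to Py_hash_t
        y -= 1 << bits
    return -2 if y == -1 else y  # -1 is reserved as an error marker


# The id from the question, treated as a 64-bit pointer.
print(hash_pointer(4428048072, bits=64))  # -9223372036578022804

# Hypothetical 16-byte-aligned addresses. A small hash table picks a slot
# from the low bits, so unrotated addresses would all land in one slot.
addresses = [0x1000 + 16 * i for i in range(8)]
print(sorted({a & 7 for a in addresses}))                    # [0]
print(sorted({hash_pointer(a, 64) & 7 for a in addresses}))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

The first print reproduces the hash shown in the question from its id. The last two show why the rotation helps: the low bits of an aligned address are always zero, so without the rotation every object would compete for the same few slots.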

answered Oct 17 '25 06:10 by circular-ruin


