I am trying to find out what the internal load factor is for the Python sets. For dictionary which uses a hash table with a load factor of 0.66 (2/3) is. The number of buckets start at 8 and when the 6th key is inserted the number of buckets increases to 16 The table below shows the shift in buckets.
| bucket | shift | 
|---|---|
| 8 | 5 | 
| 16 | 10 | 
| 32 | 21 | 
| 64 | 42 | 
| 128 | 85 | 
This can be seen with de following Python code where the size of a dictionary and sets is shows with the getsizeof method:
import sys
d = {}
s = set()
for x in range(25):
    d[x] = 1
    s.add(x)
    print(len(d), sys.getsizeof(d), sys.getsizeof(s))
| # of elements | memory used for dict | memory used for sets | 
|---|---|---|
| 1 | 232 | 216 | 
| 2 | 232 | 216 | 
| 3 | 232 | 216 | 
| 4 | 232 | 216 | 
| 5 | 232 | 728 | 
| 6 | 360 | 728 | 
| 7 | 360 | 728 | 
| 8 | 360 | 728 | 
| 9 | 360 | 728 | 
| 10 | 360 | 728 | 
| 11 | 640 | 728 | 
| 12 | 640 | 728 | 
| 13 | 640 | 728 | 
| 14 | 640 | 728 | 
| 15 | 640 | 728 | 
| 16 | 640 | 728 | 
| 17 | 640 | 728 | 
| 18 | 640 | 728 | 
| 19 | 640 | 2264 | 
| 20 | 640 | 2264 | 
| 21 | 640 | 2264 | 
| 22 | 1176 | 2264 | 
| 23 | 1176 | 2264 | 
| 24 | 1176 | 2264 | 
| 25 | 1176 | 2264 | 
The above table shows that the shift in the buckets correct is for the dictionary, but not for the sets. The memory in sets is different.
I am trying to find out what the load factor is for a set. Is that also 2/3? Or am I doing something wrong with the code?
Currently, it's about 3/5. See the source:
if ((size_t)so->fill*5 < mask*3)
    return 0;
return set_table_resize(so, so->used>50000 ? so->used*2 : so->used*4);
fill is the number of occupied table cells (including "deleted entry" markers), and mask is 1 less than the total table capacity.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With