I have a need to use a Cuckoo Filter but I'm not sure how to size it. I found a calculator for Bloom Filters (https://hur.st/bloomfilter/) for which I can calculate in a few ways. I can specify the approximate number of items and the desired false positive rate and it will tell me the size and number of hash functions. I'm looking for something similar for a Cuckoo Filter but I haven't found one or other instructions on how to find those numbers.
I'm looking at a Node or Python implementation. It seems the parameters to define the filter are:
I want to specify the number of elements (eg 100k) and an FPR (eg .1%) to find out the parameters needed.
Based on information in the original paper (https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf), you need to choose bucket size first, which allows you to determine fingerprint size and capacity. Bucket size is based on the desired false positive rate:
"the space-optimal bucket size depends on the target false positive rate ε: when ε > 0.002, having two entries per bucket yields slightly better results than using four entries per bucket; when ε decreases to 0.00001 < ε ≤ 0.002, four entries per bucket minimizes space"1
For your suggested 0.1%, that would mean a bucket size of 4.
The fingerprint size depends on bucket size and false positive rate.
"To retain the target false positive rate ε, the filter ensures 2b/2f ≤ ε, thus the minimal fingerprint size required is approximately: f ≥ log2(1/ε) + log2(2b)"1
With b bucket size, an error rate of 0.1% would require ~10 + 3 = 13 bits for a fingerprint.
Finally, capacity is determined by the number of elements divided by the maximum allowable load, which is determined by bucket size.
"With k = 2 hash functions, the load factor α is 50% when the bucket size b = 1 (i.e., the hash table is directly mapped), but increases to 84%, 95% or 98% respectively using bucket size b = 2, 4 or 8."1
So 100k / 0.95 gives you a capacity of 106k.
I don't know of any one formula to give you these answers, since they depend on each other, but hopefully each of those steps makes sense.
For 100k elements and 0.1% FPR, that's:
1 Bin Fan, Dave G. Andersen , Michael Kaminsky , Michael D. Mitzenmacher, Cuckoo Filter: Practically Better Than Bloom, Proceedings of the 10th ACM International on Conference on emerging Networking Experiments and Technologies, December 02-05, 2014, Sydney, Australia [doi>10.1145/2674005.2674994]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With