
Checking if an 800-million-entry hashmap contains an element

I have a hashmap containing ~800 million entries (strings). The data is serialized in a file, which I have already loaded into a hashmap.

I also have another large list of ~35 million strings. I need to read these strings one by one and format each of them in a particular way; the formatting is a separate method and is very light processing.

Then I need to check whether the formatted result for each string is already present in the hashmap.

What is the most efficient way to do this in Java?

London guy asked Feb 01 '26 05:02



1 Answer

You can try using a Bloom filter, which is

a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positive retrieval results are possible, but false negatives are not; i.e. a query returns either "inside set (may be wrong)" or "definitely not in set".

(Quote from Wikipedia)

Google Guava provides a Bloom filter implementation for Java.
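
To make the idea concrete, here is a minimal sketch of a Bloom filter in plain Java. It is not Guava's implementation; the class name `SimpleBloomFilter`, the FNV-1a hash, the double-hashing scheme, and the sizes used in `main` are all assumptions chosen for illustration. The small `HashSet` stands in for the large hashmap.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

// Illustrative Bloom filter; the name and hashing scheme are assumptions, not Guava's.
public class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    public SimpleBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // 64-bit FNV-1a hash over the UTF-8 bytes of the string.
    private static long fnv1a64(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    // Derive the i-th bit position from the two halves of one 64-bit hash.
    private int index(long hash, int i) {
        int h1 = (int) hash;
        int h2 = (int) (hash >>> 32) | 1; // force an odd step so it is never zero
        return Math.floorMod(h1 + i * h2, numBits);
    }

    public void add(String s) {
        long h = fnv1a64(s);
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(h, i));
        }
    }

    // false means "definitely not in set"; true means "possibly in set".
    public boolean mightContain(String s) {
        long h = fnv1a64(s);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(h, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Stand-in for the keys of the large hashmap (hypothetical sample data).
        Set<String> known = new HashSet<>();
        known.add("alpha");
        known.add("beta");

        SimpleBloomFilter filter = new SimpleBloomFilter(1 << 20, 7);
        for (String s : known) {
            filter.add(s);
        }

        for (String candidate : new String[] {"alpha", "gamma"}) {
            // A positive answer may be wrong, so confirm it against the exact set.
            boolean present = filter.mightContain(candidate) && known.contains(candidate);
            System.out.println(candidate + " -> " + present);
        }
    }
}
```

Because false positives are possible, the sketch confirms every "possibly in set" answer against the exact set; a negative answer needs no second lookup. Note that `java.util.BitSet` is indexed by `int`, so a filter sized for ~800 million entries would need a different backing store or a library implementation. In Guava the corresponding class is `com.google.common.hash.BloomFilter`, with `put` and `mightContain` methods.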

zagyi answered Feb 03 '26 20:02



