I have a HashMap containing ~800 million entries (strings). It was serialized to a file, which I have already loaded back into a HashMap.
Now I have another huge list of about 35 million strings. I need to read these strings one by one and format each of them in a particular way; the formatting is done by a separate method and is very light processing.
Then I need to check whether the formatted result for each string from the list is already present in the HashMap.
What is the most efficient way to do this in Java?
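For reference, here is a minimal sketch of the straightforward approach. It assumes the 800 million keys sit in an in-memory `Map<String, ?>`, and it uses a hypothetical `format` method (trim plus lower-case) as a stand-in for the real formatting step. It makes a single pass over the candidates and does one hash lookup per string.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class MembershipCheck {
    // Hypothetical stand-in for the asker's "very light" formatting method.
    public static String format(String raw) {
        return raw.trim().toLowerCase(Locale.ROOT);
    }

    // One pass over the candidates; each containsKey is an expected O(1) hash probe.
    public static long countPresent(Map<String, ?> known, Iterable<String> candidates) {
        long hits = 0;
        for (String s : candidates) {
            if (known.containsKey(format(s))) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, Integer> known = Map.of("alpha", 1, "beta", 2, "gamma", 3);
        List<String> candidates = List.of("  Alpha", "delta", "GAMMA ");
        System.out.println(countPresent(known, candidates)); // prints 2
    }
}
```

With 35 million candidates you would stream them from the source (for example with `BufferedReader.lines()`) instead of building a `List` first. If the map fits in the heap, this per-string cost is already small, so most of the challenge is the memory needed to hold the 800 million keys.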
You can try using a Bloom filter, which is
a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positive retrieval results are possible, but false negatives are not; i.e. a query returns either "inside set (may be wrong)" or "definitely not in set".
(Quote from Wikipedia)
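To give a sense of scale, the standard Bloom filter sizing formulas (these are not part of the original answer) give the number of bits m and the number of hash functions k for n entries at a target false-positive rate p. Plugging in the question's 800 million entries with an assumed 1% rate:

$$
m = -\frac{n \ln p}{(\ln 2)^2}, \qquad k = \frac{m}{n} \ln 2
$$

$$
n = 8 \times 10^{8},\ p = 0.01:\quad \frac{m}{n} = \frac{\ln 100}{(\ln 2)^2} \approx 9.59,\quad m \approx 7.67 \times 10^{9}\ \text{bits} \approx 0.96\ \text{GB},\quad k \approx 6.64 \to 7
$$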
Google Guava provides a Java implementation (`com.google.common.hash.BloomFilter`).
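Guava's `BloomFilter` is the ready-made option: you build it with `BloomFilter.create` and a `Funnels.stringFunnel(...)`, then query it with `mightContain`. The sketch below shows the same idea using only the standard library, so it runs without that dependency. The class name `BloomPrefilter`, the FNV-1a/fmix64 hash pair and the double-hashing scheme are illustrative choices of mine and are not Guava's internals. The bit array is backed by a `long[]` because `java.util.BitSet` is indexed by `int`. That caps it at about 2.1 billion bits, which is below the roughly 7.7 billion bits an 800-million-entry filter needs at 1%.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BloomPrefilter {
    private final long[] words;   // bit array packed into longs (may exceed Integer.MAX_VALUE bits)
    private final long numBits;   // m
    private final int numHashes;  // k

    public BloomPrefilter(long expectedInsertions, double fpp) {
        long n = Math.max(1, expectedInsertions);
        double ln2 = Math.log(2);
        // Standard sizing: m = -n ln(p) / (ln 2)^2, k = (m / n) ln 2
        long m = (long) Math.ceil(-n * Math.log(fpp) / (ln2 * ln2));
        this.numBits = Math.max(64, m);
        this.words = new long[(int) ((numBits + 63) / 64)];
        this.numHashes = Math.max(1, (int) Math.round((double) numBits / n * ln2));
    }

    // 64-bit FNV-1a over the UTF-8 bytes of the string.
    private static long fnv1a64(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    // MurmurHash3 finalizer, used to derive a second hash from the first.
    private static long fmix64(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    // Double hashing: g_i = h1 + i * h2 (mod m) simulates k independent hashes.
    private long bitIndex(long h1, long h2, int i) {
        return Math.floorMod(h1 + i * h2, numBits);
    }

    public void put(String s) {
        long h1 = fnv1a64(s), h2 = fmix64(h1) | 1L;
        for (int i = 0; i < numHashes; i++) {
            long bit = bitIndex(h1, h2, i);
            words[(int) (bit >>> 6)] |= 1L << (bit & 63);
        }
    }

    public boolean mightContain(String s) {
        long h1 = fnv1a64(s), h2 = fmix64(h1) | 1L;
        for (int i = 0; i < numHashes; i++) {
            long bit = bitIndex(h1, h2, i);
            if ((words[(int) (bit >>> 6)] & (1L << (bit & 63))) == 0) {
                return false; // a cleared bit means "definitely not in set"
            }
        }
        return true; // "inside set (may be wrong)"
    }

    public static void main(String[] args) {
        Map<String, Integer> known = new HashMap<>();
        known.put("alpha", 1);
        known.put("beta", 2);
        BloomPrefilter filter = new BloomPrefilter(known.size(), 0.01);
        for (String key : known.keySet()) {
            filter.put(key);
        }
        for (String candidate : List.of("alpha", "delta")) {
            // The exact lookup runs only when the filter cannot rule the key out.
            boolean present = filter.mightContain(candidate) && known.containsKey(candidate);
            System.out.println(candidate + " -> " + present);
        }
    }
}
```

The filter does not replace the exact lookup, because a `true` from `mightContain` still has to be confirmed against the real data. It mainly pays off when that confirmation is expensive, for example when the 800 million keys live on disk rather than in a heap-resident `HashMap`. If the map already fits in memory, `containsKey` on its own is already a constant-time check.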