I have a gigantic data set which I've to store into a collection and need to find whethere any duplicates in there or not.
The data size could be more than 1 million. I know I can store more element in ArrayList comapre to a Map.
My questions are:
Map faster than searching in sorted ArrayList?HashMap is faster than TreeMap?n elements, which would be more efficient between a TreeMap and a HashMap implementation?1) Yes. Searching an ArrayList is O(n) on average. The performance of key lookups in a Map depends on the specific implementation. You could write an implementation of Map that is O(n) or worse if you really wanted to, but all the implementations in the standard library are faster than O(n).
2) Yes. HashMap is O(1) on average for simple key lookups. TreeMap is O(log(n)).
Class HashMap<K,V>
This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets.
Class TreeMap<K,V>
This implementation provides guaranteed log(n) time cost for the containsKey, get, put and remove operations. Algorithms are adaptations of those in Cormen, Leiserson, and Rivest's Introduction to Algorithms.
3) The space requirements will be O(n) in both cases. I'd guess the TreeMap requires slightly more space, but only by a constant factor.
Map you're using.HashMap has a constant-time average lookup (O(1)), while a TreeMap's average lookup time is based on the depth of the tree (O(log(n))), so a HashMap is faster.It just did some benchmark testing on lookup performance between hashmap and sorted arraylist. The answer is hashmap is much faster as the size increase. I am talking about 10x, 20x, 30x faster. I did some test with 1 million of entries using sorted array list and hashmap and the array list get and add operation took seconds to complete, where as the hashmap get and put only takes around 50ms.
Here are something I found or observed:
For sorted arraylist, you would have to sort it first to be able to use the search efficiently (binarySearch for example). Practically you don't just have static list (meaning the list will change via add or remove). With that in mind you will need to change the add and the get methods to do "binary" operation to make it efficient (like binarySearch). So even with binary operation the add and get method will be slower and slower as the list grows.
Hashmap on the other hand does not show much of change in term of time in the put and get operation. The problem with Hashmap is memory overhead. If you can live with that then go with hashmap.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With