I wanted to get a list of the unique elements from a list containing duplicates, while preserving the order in which the elements first occur.
To achieve this, I could have written an algorithm like:
private <T> ArrayList<T> getUnique(ArrayList<T> list)
{
    // Maintain a HashMap of seen elements and a uniqueList (ArrayList<T>) to be returned.
    // Add the element to the result list and the map only if it isn't already in the map.
    HashMap<T, Boolean> map = new HashMap<>();
    ArrayList<T> uniqueList = new ArrayList<>();
    for (T t : list)
    {
        if (map.get(t) == null)
        {
            // t wasn't present, so add it to the map as well as to the list
            map.put(t, true);
            uniqueList.add(t);
        }
    }
    return uniqueList;
}
This algorithm runs in O(n) time with O(n) extra space (for the HashMap).
Or, more simply, I could have used the following:
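A quick sanity check of this approach (a self-contained sketch; the class name and the Integer element type are my choices, not from the original):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class UniqueDemo {
    // Generic version of the getUnique sketch above: the HashMap tracks
    // already-seen elements, the result list keeps first-occurrence order.
    static <T> ArrayList<T> getUnique(List<T> list) {
        HashMap<T, Boolean> map = new HashMap<>();
        ArrayList<T> uniqueList = new ArrayList<>();
        for (T t : list) {
            if (map.get(t) == null) {
                map.put(t, true);
                uniqueList.add(t);
            }
        }
        return uniqueList;
    }

    public static void main(String[] args) {
        List<Integer> input = List.of(3, 1, 3, 2, 1);
        System.out.println(getUnique(input)); // prints [3, 1, 2]
    }
}
```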
Set<T> set = new LinkedHashSet<>(list);
In Java, the above gives a set of the unique elements of the list, in the same first-occurrence order as the list.
Then convert this set into a list: ArrayList<T> uniqueList = new ArrayList<>(set);
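For example (a small demo of the two-line approach above, using String elements; class name is mine):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class LinkedHashSetDemo {
    public static void main(String[] args) {
        List<String> list = List.of("b", "a", "b", "c", "a");
        // Duplicates are dropped on insertion; iteration order is the
        // order in which elements were first added.
        Set<String> set = new LinkedHashSet<>(list);
        ArrayList<String> uniqueList = new ArrayList<>(set);
        System.out.println(uniqueList); // prints [b, a, c]
    }
}
```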
I am assuming the time complexity here is also O(n). I wanted to know what algorithm Java uses for it.
I see the class is named LinkedHashSet, so I thought it might be using some LinkedList concepts to achieve this. So I looked into the source code and found the following.
In LinkedHashSet.java, the constructor looks like:
143: public LinkedHashSet(Collection<? extends T> c)
144: {
145: super(c);
146: }
here is the source.
In HashSet, I found:
public HashSet(Collection<? extends E> c) {
map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
addAll(c);
}
The addAll method I found in the AbstractCollection class (the grandparent of HashSet); it is defined as:
public boolean addAll(Collection<? extends E> c) {
boolean modified = false;
for (E e : c)
if (add(e))
modified = true;
return modified;
}
This calls add, which is:
public boolean add(E e) {
throw new UnsupportedOperationException();
}
here.
I couldn't understand this. What algorithm do they use for this task?
Based on the source code of LinkedHashSet, HashSet, and LinkedHashMap:
When constructing a LinkedHashSet (which extends HashSet) from another collection (LinkedHashSet.java, line 143),
public LinkedHashSet(Collection<? extends T> c)
{
super(c);
}
this will call (HashSet.java, line 136):
public HashSet(Collection<? extends T> c)
{
this(Math.max(2 * c.size(), HashMap.DEFAULT_CAPACITY));
addAll(c);
}
which in turn calls (HashSet.java, line 122):
public HashSet(int initialCapacity, float loadFactor)
{
map = init(initialCapacity, loadFactor);
}
Since the init method is overridden in LinkedHashSet:
HashMap<T, String> init(int capacity, float load)
{
return new LinkedHashMap<T, String>(capacity, load);
}
The backing map is a LinkedHashMap.
According to the Javadoc of LinkedHashMap:
This class provides all of the optional Map operations, and permits null elements. Like HashMap, it provides constant-time performance for the basic operations (add, contains and remove), assuming the hash function disperses elements properly among the buckets. Performance is likely to be just slightly below that of HashMap, due to the added expense of maintaining the linked list, with one exception: Iteration over the collection-views of a LinkedHashMap requires time proportional to the size of the map, regardless of its capacity. Iteration over a HashMap is likely to be more expensive, requiring time proportional to its capacity.
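A small illustration of the insertion-order guarantee the backing LinkedHashMap provides (the keys and values here are arbitrary; only the key order matters):

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class OrderDemo {
    public static void main(String[] args) {
        Map<String, Integer> linked = new LinkedHashMap<>();
        linked.put("banana", 1);
        linked.put("apple", 2);
        linked.put("cherry", 3);
        // LinkedHashMap iterates in insertion order:
        System.out.println(linked.keySet()); // prints [banana, apple, cherry]

        // A plain HashMap makes no ordering guarantee; its iteration
        // order depends on the hash table layout, not insertion order.
        Map<String, Integer> plain = new HashMap<>(linked);
        System.out.println(plain.keySet()); // order unspecified
    }
}
```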
And the add method of HashSet is
public boolean add(E e) {
return map.put(e, PRESENT)==null;
}
Hence the average time complexity of the construction is O(n). For the details of the algorithm, you can read the code of LinkedHashMap.
Further reading: How is the internal implementation of LinkedHashMap different from HashMap implementation?, HashSet vs LinkedHashSet