I have the following code that attempts to populate a Map from a List in a parallel fashion by going through the Java Stream API:
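import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class NameId {...}
public class TestStream
{
    public static void main(String[] args)
    {
        List<NameId> niList = new ArrayList<>();
        niList.add(new NameId("Alice", "123456"));
        niList.add(new NameId("Bob", "223456"));
        niList.add(new NameId("Carl", "323456"));
        // Request a parallel stream over the list.
        Stream<NameId> niStream = niList.parallelStream();
        // Collect name -> id pairs into a map.
        Map<String, String> niMap = niStream.collect(Collectors.toMap(NameId::getName, NameId::getId));
    }
}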
How do I know if the map is populated using multiple threads, i.e. in parallel? Do I need to call Collectors.toConcurrentMap instead of Collectors.toMap? Is this a reasonable way to parallelize the population of a map? How do I know what the concrete map is backing the new niMap (e.g. is it HashMap)?
Any stream in Java can easily be transformed from sequential to parallel. We can achieve this by calling the parallel method on a sequential stream or by creating the stream with the parallelStream method of a collection.
To create a parallel stream from another stream, use the parallel() method. To create a parallel stream from a Collection use the parallelStream() method.
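As a rough illustration of the two routes just described, the sketch below asks each stream whether it considers itself parallel via isParallel(). The class and method names are made up for this example. Note that the Javadoc only promises a "possibly parallel" stream from parallelStream(), so the result for other collection types is an assumption to verify.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical demo class; not part of the original question.
class ParallelCreationDemo {
    // True if the stream obtained from the collection reports itself as parallel.
    static boolean fromCollection(List<Integer> numbers) {
        return numbers.parallelStream().isParallel();
    }

    // True if the stream starts out sequential and becomes parallel after parallel().
    static boolean fromSequential(List<Integer> numbers) {
        Stream<Integer> sequential = numbers.stream();
        boolean wasSequential = !sequential.isParallel();
        return wasSequential && sequential.parallel().isParallel();
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4);
        System.out.println("parallelStream() is parallel: " + fromCollection(numbers));
        System.out.println("stream().parallel() is parallel: " + fromSequential(numbers));
    }
}
```

isParallel() only reports how the pipeline is configured; it does not show which threads end up doing the work.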
A sequential stream is processed by a single thread: its elements pass through the stream operations one after another on the thread that invokes the terminal operation. A parallel stream may be split across several threads (by default the calling thread plus workers from the common ForkJoinPool), which can run on multiple CPU cores.
From the Javadoc:
The returned Collector is not concurrent. For parallel stream pipelines, the combiner function operates by merging the keys from one map into another, which can be an expensive operation. If it is not required that results are inserted into the Map in encounter order, using toConcurrentMap(Function, Function) may offer better parallel performance.
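So toMap on a parallel stream still runs in parallel, but each thread fills its own map and those maps are then merged. toConcurrentMap instead lets all threads insert into one shared ConcurrentMap, which avoids the merge step.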
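For comparison, here is a minimal sketch of the toConcurrentMap variant. The NameId class below is a stand-in I wrote for the example, since the question does not show its body, and the class and method names are likewise hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;

// Hypothetical demo class; not part of the original question.
class ConcurrentMapDemo {
    // Assumed shape of the question's NameId class (its real body is elided there).
    static final class NameId {
        private final String name;
        private final String id;

        NameId(String name, String id) {
            this.name = name;
            this.id = id;
        }

        String getName() { return name; }
        String getId() { return id; }
    }

    // All worker threads insert into one shared ConcurrentMap; no per-thread maps are merged.
    // Like toMap, this throws IllegalStateException if two elements map to the same key.
    static ConcurrentMap<String, String> idsByName(List<NameId> items) {
        return items.parallelStream()
                .collect(Collectors.toConcurrentMap(NameId::getName, NameId::getId));
    }

    public static void main(String[] args) {
        List<NameId> niList = Arrays.asList(
                new NameId("Alice", "123456"),
                new NameId("Bob", "223456"),
                new NameId("Carl", "323456"));
        System.out.println(idsByName(niList));
    }
}
```

The trade-off, as the Javadoc quoted above hints, is that insertion no longer follows encounter order.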
The backing map is, by default, a HashMap. In the Java 8 implementation, the two-argument toMap just calls the overload that takes a Supplier<M> and passes HashMap::new. (source: the JDK source code)
How do I know if the map is populated using multiple threads, i.e. in parallel?
It is hard to tell. If your code runs surprisingly slowly, it could be because it is trying to use multiple threads.
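One rough way to get a hint is to record which threads touch the elements. The sketch below does that with peek and a concurrent set; the class and method names are invented for the example, and the output varies from run to run and machine to machine. With only a handful of elements it may well report a single thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Hypothetical demo class; not part of the original question.
class ThreadObservationDemo {
    // Runs a parallel toMap collection and returns the names of the threads that processed elements.
    // The input must not contain duplicates, because toMap rejects duplicate keys.
    static Set<String> threadsUsed(List<String> items) {
        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        items.parallelStream()
                .peek(item -> threadNames.add(Thread.currentThread().getName()))
                .collect(Collectors.toMap(item -> item, String::length));
        return threadNames;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Alice", "Bob", "Carl", "Dora", "Eve", "Finn");
        // Typically shows "main" and possibly some ForkJoinPool.commonPool-worker threads.
        System.out.println(threadsUsed(names));
    }
}
```

Seeing more than one thread name shows that the work was split, not that the split made anything faster.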
Do I need to call Collectors.toConcurrentMap instead of Collectors.toMap?
This would make the parallel version more efficient or, put another way, a little less inefficient.
Is this a reasonable way to parallelize the population of a map?
You can do it as you suggest. However, note that the cost of starting a new thread is far greater than the cost of everything you are doing here, so adding even one thread will slow it down a lot.
How do I know what the concrete map is backing the new niMap (e.g. is it HashMap)?
The documentation says you can't know for sure. The last time I checked, both toMap and groupingBy were using HashMap, but you can't assume it is any particular Map.
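If the concrete type matters, one option is to stop guessing and pass a map factory yourself through the four-argument toMap overload. The sketch below picks TreeMap purely as an example. The class and method names are hypothetical, and keeping the first value on a duplicate key is just one possible merge policy.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical demo class; not part of the original question.
class ExplicitMapTypeDemo {
    // Collects word -> length into a TreeMap chosen by the caller instead of the collector's default.
    static TreeMap<String, Integer> lengthsByWord(List<String> words) {
        return words.parallelStream().collect(Collectors.toMap(
                word -> word,               // key mapper
                String::length,             // value mapper
                (first, second) -> first,   // merge policy for duplicate keys (an assumption here)
                TreeMap::new));             // map factory fixes the concrete type
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Carl", "Alice", "Bob");

        // Whatever the default collector happens to return; an implementation detail.
        Map<String, Integer> defaultMap = words.parallelStream()
                .collect(Collectors.toMap(word -> word, String::length));
        System.out.println(defaultMap.getClass());

        // Guaranteed to be a TreeMap because we supplied the factory.
        System.out.println(lengthsByWord(words).getClass());
    }
}
```

Calling getClass() on the default result tells you what one JDK version returns today, while the factory overload is the only form the API actually lets you rely on.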