In general, is there a performance difference between these two pieces of code?
List<Integer> list1 = someStream1.sorted().collect(toList());
// vs.
List<Integer> list2 = someStream2.collect(toList());
list2.sort(Comparator.naturalOrder())
Variant 2 is obviously yucky and should be avoided, but I'm curious if there are any performance optimizations built into the mainstream (heh, mainstream) implementations of Stream that would result in a performance difference between these two.
I imagine that because the stream has strictly more information about the situation, it would have a better opportunity to optimize. E.g. I imagine if this had a findFirst() call tacked on, it would elide the sort, in favor of a min operation.
Both options should result in the same final result. But runtime characteristics could be different. What if the initial stream is a parallel one? Then option 1 would do a sort in parallel, whereas option 2 wouldn't do a "sequential" sort. The result should be the same, but the overall runtime resp. CPU load could be very much different then.
I would definitely prefer option 1 over 2: why create a list first, to then later sort it?!
Imagine for example you later want to collect into an immutable list. Then all code that follows your second pattern would break. Whereas code written using pattern 1 wouldn't be affected at all!
Of course, in the example here that shouldn't lead to issues, but what if the sort() happens in a slightly different place?!
Conceptually, streams are usually looked at as "transient" data that's being processed/manipulated, and collecting the stream conveys the notion you're done manipulating it.
While the second snippet should work, the first one would be the more idiomatic way of doing things.
In the first case the sorting happens in the call to collect. If the stream is already sorted this will be a no-op (the data will just pass through as-is). Might not make a big difference, but calling Collections.sort on an already sorted collection is  still O(n).
Also the first case benefits from parallel execution, as at least OpenJDK uses Arrays.parallelSort.
Apart from that the first line is cleaner, better to understand and less error prone when refactoring.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With