 

Does Thread.sleep have no effect in Stream processing? [duplicate]

The following program is from OCP Study Guide by Jeanne Boyarsky and Scott Selikoff:

import java.util.*;

class WhaleDataCalculator {
    public int processRecord(int input) {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
        }
        return input + 1;
    }

    public void processAllData(List<Integer> data) {
        data.stream().map(a -> processRecord(a)).count();
    }

    public static void main(String[] args) {
        WhaleDataCalculator calculator = new WhaleDataCalculator();
        // Define the data
        List<Integer> data = new ArrayList<Integer>();
        for (int i = 0; i < 4000; i++)
            data.add(i);
        // Process the data
        long start = System.currentTimeMillis();
        calculator.processAllData(data);
        double time = (System.currentTimeMillis() - start) / 1000.0;
        // Report results
        System.out.println("\nTasks completed in: " + time + " seconds");
    }
}

The authors claim

Given that there are 4,000 records, and each record takes 10 milliseconds to process, by using a serial stream(), the results will take approximately 40 seconds to complete this task.

However, when I run this on my system, it takes between 0.006 and 0.009 seconds on every run.

Where is the discrepancy?

Asked by Arvind Kumar Avinash, Oct 22 '25 19:10


2 Answers

That's because of the call to count(), which later Java versions (9 and newer) can answer without running the pipeline at all.

Since you're only interested in the number of elements, count() tries to get the size directly from the source and skips the intermediate operations. This is possible because you are only doing a map and not, for example, a filter, so the number of elements cannot change.
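A minimal sketch of the effect, assuming an OpenJDK 9+ runtime (the specification only says the pipeline *may* be skipped). The class and method names are hypothetical; an AtomicInteger counts how often the mapping function actually runs, once with only map() and once with a filter() that keeps every element but hides the size from count().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class: counts how many times the mapper is invoked.
class CountShortcutDemo {

    static List<Integer> data(int n) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    // Only map() sits between the sized source and count(): the size is known
    // up front, so on OpenJDK 9+ the mapper is typically never called.
    static int mapperCallsWithMapOnly(List<Integer> data) {
        AtomicInteger calls = new AtomicInteger();
        long count = data.stream()
                .map(a -> { calls.incrementAndGet(); return a + 1; })
                .count();
        if (count != data.size()) throw new IllegalStateException("unexpected count");
        return calls.get();
    }

    // filter() could drop elements, so the size is no longer known and count()
    // must run the pipeline; the mapper is called once per element.
    static int mapperCallsWithFilter(List<Integer> data) {
        AtomicInteger calls = new AtomicInteger();
        long count = data.stream()
                .filter(a -> true)
                .map(a -> { calls.incrementAndGet(); return a + 1; })
                .count();
        if (count != data.size()) throw new IllegalStateException("unexpected count");
        return calls.get();
    }

    public static void main(String[] args) {
        List<Integer> data = data(4000);
        System.out.println("map only:     " + mapperCallsWithMapOnly(data));
        System.out.println("filter + map: " + mapperCallsWithFilter(data));
    }
}
```

On OpenJDK 9 and later this is expected to report 0 mapper calls for the first pipeline and 4000 for the second; on Java 8 both would report 4000.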

If you add peek(System.out::println) to the pipeline, you'll see no output either.

If you call forEach instead of count, running the code will probably take about 40 seconds (4,000 records × 10 ms).
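A hedged sketch comparing the two terminal operations on the question's workload. It reuses the question's processRecord() logic but, as an assumption to keep it fast, processes only 50 records, so forEach() should need roughly 500 ms while count() returns almost immediately. The class and helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical comparison of count() and forEach() on a scaled-down workload.
class ForEachVsCount {

    static int processRecord(int input) {
        try {
            Thread.sleep(10); // simulated 10 ms of work per record
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
        }
        return input + 1;
    }

    static List<Integer> data(int n) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < n; i++) list.add(i);
        return list;
    }

    // Elapsed milliseconds when the terminal operation is count().
    static long millisWithCount(List<Integer> data) {
        long start = System.nanoTime();
        data.stream().map(a -> processRecord(a)).count();
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Elapsed milliseconds when the terminal operation is forEach():
    // every element is pulled through map(), so processRecord() runs each time.
    static long millisWithForEach(List<Integer> data) {
        long start = System.nanoTime();
        data.stream().map(a -> processRecord(a)).forEach(r -> { });
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        List<Integer> data = data(50);
        System.out.println("count():   " + millisWithCount(data) + " ms");
        System.out.println("forEach(): " + millisWithForEach(data) + " ms");
    }
}
```

System.nanoTime() is used here instead of System.currentTimeMillis() because it is intended for measuring elapsed time.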

Answered by MC Emperor, Oct 25 '25 10:10


Since Java 9, count() has been optimized: if, while the stages of the pipeline are being chained, it turns out that no operation can change the number of elements and the stream source can report how many elements it contains, then count() does not execute the pipeline. Instead it asks the source "how many of these do you have?" and immediately returns that value.

So when processAllData() runs, a Stream is constructed and the method returns right after that, because none of the elements is actually processed and processRecord() is never called.

Here's a quote from the Stream.count() documentation:

API Note:

An implementation may choose to not execute the stream pipeline (either sequentially or in parallel) if it is capable of computing the count directly from the stream source. In such cases no source elements will be traversed and no intermediate operations will be evaluated. Behavioral parameters with side-effects, which are strongly discouraged except for harmless cases such as debugging, may be affected. For example, consider the following stream:

 List<String> l = Arrays.asList("A", "B", "C", "D");
 long count = l.stream().peek(System.out::println).count();

The number of elements covered by the stream source, a List, is known and the intermediate operation, peek, does not inject into or remove elements from the stream (as may be the case for flatMap or filter operations). Thus the count is the size of the List and there is no need to execute the pipeline and, as a side-effect, print out the list elements.
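A small sketch of the documentation's example, with peek() recording elements into a list instead of printing them so the effect can be checked. It assumes an OpenJDK 9+ runtime, and it assumes (as is the case in current OpenJDK versions) that collect(Collectors.counting()) is an ordinary reduction without the size shortcut, so it does traverse the source. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo based on the API note's example.
class PeekCountDemo {

    // count() on a sized source with only peek() in between: on JDK 9+ the
    // pipeline is typically skipped, so peek() records nothing.
    static List<String> seenWithCount(List<String> l) {
        List<String> seen = new ArrayList<>();
        l.stream().peek(seen::add).count();
        return seen;
    }

    // collect(Collectors.counting()) reduces element by element,
    // so every element flows through peek().
    static List<String> seenWithCollectCounting(List<String> l) {
        List<String> seen = new ArrayList<>();
        l.stream().peek(seen::add).collect(Collectors.counting());
        return seen;
    }

    public static void main(String[] args) {
        List<String> l = Arrays.asList("A", "B", "C", "D");
        System.out.println("count():             " + seenWithCount(l));
        System.out.println("collect(counting()): " + seenWithCollectCounting(l));
    }
}
```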

And by the way, apart from the trick behind this test, this case doesn't require the Stream API at all. Since the value returned by count() is ignored and all that is needed is to fire a side effect on each element of the list, Iterable.forEach() can be used instead:

public void processAllData(List<Integer> data) {
    data.forEach(a -> processRecord(a));
}

Answered by Alexander Ivanchenko, Oct 25 '25 08:10


