Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How Stream is more efficient?

I am trying to digest Stream package and seems like it's very difficult for me to understand.

I was reading Stream package documentation and at a point I tried to implement it to learn by doing. This is the text I have read:

Intermediate operations return a new stream. They are always lazy; executing an intermediate operation such as filter() does not actually perform any filtering, but instead creates a new stream that, when traversed, contains the elements of the initial stream that match the given predicate. Traversal of the pipeline source does not begin until the terminal operation of the pipeline is executed.

I understand this much that they provide a new Stream, so my first question is, Is creating a stream without traversing a heavy operation?

Now, since intermediate operations are lazy and terminal operations are eager and also streams are meant to be efficient than old programming standards of if-else and more readable.

Processing streams lazily allows for significant efficiencies; in a pipeline such as the filter-map-sum example above, filtering, mapping, and summing can be fused into a single pass on the data, with minimal intermediate state. Laziness also allows avoiding examining all the data when it is not necessary; for operations such as "find the first string longer than 1000 characters", it is only necessary to examine just enough strings to find one that has the desired characteristics without examining all of the strings available from the source. (This behavior becomes even more important when the input stream is infinite and not merely large.)

To demonstrate this, I started implemented a small program to understand the concept. Here is the program:

        List<String> stringList = new ArrayList<>();
        for (int i = 0; i < 10000; i++) {
            stringList.add("String" + i);
        }
        long   start  = System.currentTimeMillis();
        Stream stream = stringList.stream().filter(s -> s.contains("99"));
        long   midEnd = System.currentTimeMillis();
        System.out.println("Time is millis before applying terminal operation: " + (midEnd - start));
        System.out.println(stream.findFirst().get());
        long end = System.currentTimeMillis();
        System.out.println("Whole time in millis: " + (end - start));
        System.out.println("Time in millis for Terminal operation: " + (end - midEnd));

        start = System.currentTimeMillis();
        for (String ss1 : stringList) {
            if (ss1.contains("99")) {
                System.out.println(ss1);
                break;
            }
        }
        end = System.currentTimeMillis();
        System.out.println("Time in millis with old standard: " + (end - start));

I have executed this program many times and each time it has proved me that, creating a new stream from intermediate operations is the heavy task to do. Terminal operations do take very little time as compared to intermediate operations.

And overall, old if-else pattern is way more efficient than streams. So, again more questions here:

  1. Did I misunderstand something?
  2. If I understand correct, why and when to use streams?
  3. If I am doing or understanding anything wrong, can you please help clarify my concepts Package java.util.stream?

Actual Numbers:

Try 1:

Time is millis before applying terminal operation: 73
String99
Whole time in millis: 76
Time in millis for Terminal operation: 3
String99
Time in millis with old standard: 0

Try 2:

Time is millis before applying terminal operation: 56
String99
Whole time in millis: 59
Time in millis for Terminal operation: 3
String99
Time in millis with old standard: 0

Try 3:

Time is millis before applying terminal operation: 69
String99
Whole time in millis: 72
Time in millis for Terminal operation: 3
String99
Time in millis with old standard: 0

These are my machine details if this help:

Memory: 11.6 GiB
Processor: Intel® Core™ i7-3632QM CPU @ 2.20GHz × 8 
OS-Type: 64-bit
like image 215
java8.being Avatar asked Dec 12 '25 08:12

java8.being


1 Answers

One of the rationales for the Stream api is that it eliminates the inherent assumption of the for loop, that all iteration happens in the same way. When you use an iterator-based for loop, you are hard-coding the iteration logic to always iterate sequentially. Consider the question, "what if I wanted to change the implementation of the 'for' loop with something more efficient?"

The Stream api addresses that--it abstracts the notion of iteration and allows other ways of processing multiple data points to be considered -- iterate serially vs. in parallel, add optimizations if it is known that the data is unordered, etc.

Consider your example--although you can't change the implementation of the for loop, you can change the implementation of the Stream to suit different situations. For example, if you have more cpu-intensive operations to do on each task, you might choose a parallel Stream. Here's an example with 10 ms delays simulate more complex processing, done in parallel, with very different results:

    List<String> stringList = new ArrayList<>();
    for (int i = 0; i < 10000; i++) {
        stringList.add("String" + i);
    }
    long   start  = System.currentTimeMillis();
    Stream stream = stringList.parallelStream().filter(s -> {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        return s.contains("99" );});
    long   midEnd = System.currentTimeMillis();
    System.out.println("Time is millis before applying terminal operation: " + (midEnd - start));
    System.out.println(stream.findAny().get());
    long end = System.currentTimeMillis();
    System.out.println("Whole time in millis: " + (end - start));
    System.out.println("Time in millis for Terminal operation: " + (end - midEnd));

    start = System.currentTimeMillis();
    for (String ss1 : stringList) {
        try {
            Thread.sleep(20);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        if (ss1.contains("99")) {
            System.out.println(ss1);
            break;
        }
    }
    end = System.currentTimeMillis();
    System.out.println("Time in millis with old standard: " + (end - start));

I kept the same benchmark logic everyone is complaining about, to make it easier for you to compare.

As you can see, there are situations where for loops will always be more efficient than using a Stream, but Streams offer significant advantages in certain situations as well. It would be unwise to extrapolate from one isolated test that one approach is always better than the other--that is an axiom for life as well.

like image 146
Hank D Avatar answered Dec 14 '25 21:12

Hank D