Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Should IEnumerable returned from OrderBy be evaluated if used multiple times?

I'm looking at some code which calls the extension method OrderBy. The resulting IEnumerable is potentially used multiple times. I've heard that with LINQ, it is better to evaluate expressions if they may be used more than once, because if you don't the LINQ query will be executed more than once. Is this the case here also? (Initially, looking at the code, I didn't realise it was LINQ, but I see from the MSDN documentation that OrderBy is in the LINQ namespace.)

To make it concrete, the code looks like this, except the item being enumerated is more complicated than an int and there may be many more orders of magnitude more of them than there are in this simple example.

IEnumerable<int> Multiply(IEnumerable<int> list, int howMany, int k)
{
    return list.Take(howMany).Select(i => i * k);
}

void Main()
{
    int[] unsorted = { 1, 7, 3, 9, 4 };
    IEnumerable<int> sorted = unsorted.OrderBy(i=>i); // Add .ToList() ?
    for(int k=1; k<=3; ++k) {
        IEnumerable<int> multiplied = Multiply(sorted, k, k);
        Console.WriteLine(String.Join(", ", multiplied));
    }
}

This code has the same output regardless of whether I use .ToList() or not.

1
2, 6
3, 9, 12

It seems slightly surprising that this code could be sorting over and over again. But if it is, and I ought to have the .ToList(), the output is the same, so how, in general, should I know that .ToList() is required? Is it simply seeing the magic words

deferred execution

in the documentation?


To address @Matt Burland's suggestion that I should test the performance for myself I changed the program to the following (using doubles to avoid overflow problems).

using System;
using System.Collections.Generic;
using System.Linq;
using System.Diagnostics;

namespace OrderByPerformanceTest
{
    class Program
    {
        static IEnumerable<double> Multiply(IEnumerable<double> list, int howMany, double k)
        {
            return list.Take(howMany).Select(i => i * k);
        }

        static void Main(string[] args)
        {
            int n = 1000;
            IEnumerable<double> unsorted = Enumerable.Range(0, n).Select(i => (double)(n-i));
            //Console.WriteLine(String.Join(", ", unsorted));
            IEnumerable<double> sorted1 = unsorted.OrderBy(i => i); // Add .ToList() ?
            //Console.WriteLine(String.Join(", ", sorted1));
            var sw = new Stopwatch();
            sw.Start();
            double sum = 0;
            for (int k = 1; k <= n; ++k)
            {
                IEnumerable<double> multiplied = Multiply(sorted1, k, k);
                sum += multiplied.Sum();
                //Console.WriteLine(String.Join(", ", multiplied));
            }
            sw.Stop();
            Console.WriteLine("Time {0}ms, sum {1}", sw.ElapsedMilliseconds, sum);
        }
    }
}

Result:

  • without ToList, 115ms
  • with ToList, 10ms

(sum is the same in both cases)

like image 684
TooTone Avatar asked Jan 27 '26 05:01

TooTone


2 Answers

When you use Linq expression, the result of the expression is not calculated at the definition of the expression but when you iterate on it.

If you iterate multiple times the result will be calculated (and can be different if the base list use in linq expression are changed).

If you use ToList() and keep the result of the method, the result will be calculated now and only one time, and when you iterate multiple times on the result of the ToList() method you are sure to have the same output.

like image 130
Seuleuzeuh Avatar answered Jan 29 '26 19:01

Seuleuzeuh


"Deferred execution" does not mean the resultset will be reused for the second call. It just means the first element can be returned before the whole resultset is evaluated.

So, if you iterate over IEnumerable (returned from OrderBy) for the second time, it will sort the collection again (and then use deferred execution to give elements of the sorted collection one-by-one). The sort happens when you start iterating, so it doesn't really matter whether you consume all the elements. This applies to LINQ to Objects - SQL may behave differently.

So yes, since memorizing the "materialized" resultset (from ToArray or ToList) is typically cheaper than sorting it over-and-over again, you should probably do it. In your case, I'm guessing Multiply iterates over sorted, and is called multiple times, so it's no surprise you observe large performance advantage when avoiding multiple sorts.


NOTE: When it comes to LINQ to SQL, relational databases have the concept of "fetching" - they are able to return first row before finishing the whole query, even for sorting under right conditions (presence of the right index). In that case, it may actually be cheaper to fetch few starting rows several times, than materialize the whole big resultset.

like image 44
Branko Dimitrijevic Avatar answered Jan 29 '26 20:01

Branko Dimitrijevic



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!