I initially had a method that contained a LINQ query returning int[], which then got used later in a fashion similar to:
int[] result = something.Where(s => previousarray.Contains(s.field));
This turned out to be horribly slow, until the first array was retrieved as the native IQueryable<int>. It now runs very quickly, but I'm wondering how I'd deal with the situation if I was provided an int[] from elsewhere which then had to be used as above.
Is there a way to speed up the query in such cases? Converting to a List doesn't seem to help.
In LINQ-SQL, a Contains will be converted to a SELECT ... WHERE field IN(...) and should be relatively fast. In LINQ-Objects however, it will call ICollection<T>.Contains if the source is an ICollection<T>.
When a LINQ-SQL result is treated as an IEnumerable instead of an IQueryable, you lose the linq provider - i.e., any further operations will be done in memory and not in the database.
As for why its much slower in memory:
Array.Contains() is an O(n) operation so
something.Where(s => previousarray.Contains(s.field));
is O(p * s) where p is the size of previousarray and s is the size of something.
HashSet<T>.Contains() on the other hand is an O(1) operation. If you first create a hashset, you will see a big improvement on the .Contains operation as it will be O(s) instead of O(p * s).
Example:
var previousSet = new HashSet<int>(previousarray);
var result = something.Where(s => previousSet.Contains(s.field));
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With