I've noticed that the position of .Take() has no effect on the resulting SQL. For example say I have this query:
IQueryable<Item> query =
db.Items.Where(i => i.CategoryID == categoryID && i.Deleted == false)
.OrderByDescending(i => i.ItemID).Skip(0);
Then do the following two queries:
IQueryable<ItemViewModel> query1 =
query.Take(20).Select(i => new ItemViewModel { ItemID = i.ItemID });
IQueryable<ItemViewModel> query2 =
query.Select(i => new ItemViewModel { ItemID = i.ItemID }).Take(20);
Entity Framework will generate the following identical SQL for both queries:
SELECT TOP (20)
[Project1].[ItemID] AS [ItemID],
FROM ( SELECT [Project1].[ItemID] AS [ItemID],
row_number() OVER (ORDER BY [Project1].[ItemID] DESC) AS [row_number]
FROM ( SELECT
[Filter1].[ItemID] AS [ItemID],
FROM (SELECT [Extent1].[ItemID] AS [ItemID],
[Extent1].[CategoryID] AS [CategoryID]
FROM [dbo].[Items] AS [Extent1]
WHERE 0 = [Extent1].[Deleted] ) AS [Filter1]
WHERE ([Filter1].[CategoryID] = @p__linq__0) AND (@p__linq__0 IS NOT NULL)
) AS [Project1]
) AS [Project1]
WHERE [Project1].[row_number] > 0
ORDER BY [Project1].[ItemID] DESC
Why does it do this? Shouldn't it instead generate the following for query1 with the TOP 20 on the subquery?
SELECT
[Project1].[ItemID] AS [ItemID]
FROM ( SELECT [Project1].[ItemID] AS [ItemID]
FROM ( SELECT TOP (20)
[Extent1].[ItemID] AS [ItemID],
row_number() OVER (ORDER BY [Extent1].[ItemID] DESC) AS [row_number]
FROM [dbo].[Items] AS [Extent1]
WHERE ([Extent1].[CategoryID] = @p__linq__0) AND (@p__linq__0 IS NOT NULL)
AND (0 = [Extent1].[Deleted]) AND [Extent1].[row_number] > 0
ORDER BY [Extent1].[ItemID] DESC
) AS [Project1]
) AS [Project1]
I noticed that moving TOP 20 inside will increase the time from 7 seconds to an immediate response in my actual query since it doesn't do the projection before TOP 20.
Edit: Unfortunately I can't seem to find a way to force entity framework to do this since query.Take(20).Select(i => new { ... }).ToString() == query.Select(i => new { .. }).Take(20).ToString(). Maybe this is a bug in EF?
In the specific case that you mentioned, the two LINQ queries that you provided are functionally equivalent, in that the result set will always be the same, regardless of the ordering of Take and Select.
As for performance concerns, both queries will be optimized by the database platform. I would be very surprised to see any significant difference between the two. I would not expect the second query to, for example, perform the projection on a whole bunch of items it knows won't get past TOP. As a rule LINQ query providers tend to not focus on optimization, simply because that step tends to happen at the database level when the SQL is translated into actual executable code. Databases these days have spend a considerable amount of effort optimizing the compilation of SQL code, so the query provider simply has no need to duplicate those efforts.
However, when you have say filtering or ordering, it changes what the query actually returns.
query.Take(10).Where(someFilter);
doesn't (necessarily) return the same thing as:
query.Where(someFilter).Take(10);
The first takes 10 items and the returns however many of those items passes the filter.
The second query returns up to 10 items that all pass the filter.
In the case of the two SQL queries you showed, they are functionally different because one is ordering every single item in the while table and then grabbing 20 items, the other is grabbing the first 20 items and then ordering them, which is much faster of an operation.
In that case it's important for the semantic ordering of Take to be properly translated into the SQL. In cases where the query provider can provably demonstrate that the order of two given operations isn't important, it's not a problem for it to re-order them however it wants.
The answer is actually pretty simple. You're basically comparing these two queries:
select top 20 ItemId
from Items
order by ItemId desc
to:
select ItemId
from (
select top 20 ItemId
from Items
) p
order by ItemId descending
The latter will be picking twenty rows at random rather than the 20 rows with the highest ItemIds. That is why the query that you expected but didn't get is faster: it's incorrect and much simpler to run by the database.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With