I am using Entity Framework in a C# application and I am using Lazy Loading. We noticed one query has an extremely high impact on our CPU, which merely calculates a sum. When debugging the query generated by Entity Framework it creates a INNER JOIN (SELECT ...
which is not performant. When I manually change the query to a proper JOIN the query time goes from 1.3 sec to 0.03 sec.
Let me illustrate it with a simplified version of my code.
public decimal GetPortfolioValue(Guid portfolioId)
{
var value = DbContext.Portfolios
.Where( x => x.Id.Equals(portfolioId) )
.SelectMany( p => p.Items
.Where( i => i.Status == ItemStatusConstants.Subscribed
&& _activeStatuses.Contains( i.Category.Status ) )
)
.Select( i => i.Amount )
.DefaultIfEmpty(0)
.Sum();
return value;
}
This generates a query that selects the sum but does an inner join on a SELECT of two tables joined together. I created a pastebin here for the generated query not to pollute this question but a shortened version would be:
SELECT ...
FROM `portfolios` AS `Extent1`
INNER JOIN (SELECT
`Extent2`.*,
`Extent3`.*
FROM `items` AS `Extent2`
INNER JOIN `categories` AS `Extent3` ON `Extent3`.`id` =
`Extent2`.`category_id`) AS `Join1`
ON `Extent1`.`id` = `Join1`.`portfolio_id`
AND ((`Join1`.`status` = @gp1)
AND (`Join1`.`STATUS1` IN (@gp2, @gp3, @gp4, @gp5, @gp6)))
WHERE ...
The query I'd expect it to generate (and which takes 0.03 sec instead of 1.3 sec) would be something like
SELECT ...
FROM `portfolios` AS `Extent1`
INNER JOIN `items` AS `Extent2` ON `Extent2`.`portfolio_id` = `Extent1`.`id`
INNER JOIN `categories` AS `Extent3` ON `Extent3`.`id` = `Extent2`.`category_id`
AND ((`Extent2`.`status` = @gp1)
AND (`Extent3`.`status` IN (@gp2, @gp3, @gp4, @gp5, @gp6)))
WHERE ...
I suspect it's due to the .SelectMany
but I don't see how I should rewrite the LINQ query to make it more efficient. As for the Entities, the linking properties are virtual and have a foreign key configured:
public class Portfolio
{
public Guid Id { get; set; }
public virtual ICollection<Item> Items { get; set; }
}
public class Item
{
public Guid Id { get; set; }
public Guid PortfolioId { get; set; }
public Guid CategoryId { get; set; }
public decimal Amount { get; set; }
public string Status { get; set; }
public virtual Portfolio Portfolio { get; set; }
public virtual Category Category { get; set; }
}
public class Category
{
public Guid Id { get; set; }
public string Status { get; set; }
public virtual ICollection<Item> Items { get; set; }
}
Any help would be greatly appreciated!
As you dont need anything from Portfolio, just filter by PortfolioId, you could query directly PortfolioItems. Assuming your DbContext has a DbSet with all items in all portfolios, could be something like this:
var value = DbContext.PortfolioItems
.Where(i => i.PortfolioId == portfolioId && i.Status == ItemStatusConstants.Subscribed && _activeStatuses.Contains(i.Category.Status))
.Sum(i=>i.Amount);
I believe you don't need the DefaultIfEmpty nor the select if you use directly the appropiate Queryable.Sum overload.
EDITED: Tried two different LINQ queries without exposing a DbSet.
The first query is basically the same as yours:
var value2 = dbContext.Portfolios
.Where(p => p.Id == portfolioId)
.SelectMany(p => p.Items)
.Where(i => i.Status == "A" && _activeStatuses.Contains(i.Category.Status))
.Select(i=>i.Amount)
.DefaultIfEmpty()
.Sum();
Profiled the query in SQL Server (don't have MySql at hand) and produces an ugly sentence (parameters replaced and quotes unescaped for testing):
SELECT [GroupBy1].[a1] AS [C1]
FROM (SELECT Sum([Join2].[a1_0]) AS [A1]
FROM (SELECT CASE
WHEN ( [Project1].[c1] IS NULL ) THEN Cast(
0 AS DECIMAL(18))
ELSE [Project1].[amount]
END AS [A1_0]
FROM (SELECT 1 AS X) AS [SingleRowTable1]
LEFT OUTER JOIN
(SELECT [Extent1].[amount] AS [Amount],
Cast(1 AS TINYINT) AS [C1]
FROM [dbo].[items] AS [Extent1]
INNER JOIN [dbo].[categories] AS
[Extent2]
ON [Extent1].[categoryid] =
[Extent2].[id]
WHERE ( N'A' = [Extent1].[status] )
AND ( [Extent1].[portfolioid] =
'E2CC0CC2-066F-45C9-9D48-543D92C4C92E' )
AND ( [Extent2].[status] IN ( N'A', N'B', N'C' )
)
AND ( [Extent2].[status] IS NOT NULL )) AS
[Project1]
ON 1 = 1) AS [Join2]) AS [GroupBy1]
If we remove the "Select" and "DefaultIfEmpty" methods, and rewrite the query as this:
var value = dbContext.Portfolios
.Where(p => p.Id == portfolioId)
.SelectMany(p => p.Items)
.Where(i => i.Status == "A" && _activeStatuses.Contains(i.Category.Status))
.Sum(i => i.Amount);
The generated sentence is much cleaner:
SELECT [GroupBy1].[a1] AS [C1]
FROM (SELECT Sum([Extent1].[amount]) AS [A1]
FROM [dbo].[items] AS [Extent1]
INNER JOIN [dbo].[categories] AS [Extent2]
ON [Extent1].[categoryid] = [Extent2].[id]
WHERE ( N'A' = [Extent1].[status] )
AND ( [Extent1].[portfolioid] =
'E2CC0CC2-066F-45C9-9D48-543D92C4C92E' )
AND ( [Extent2].[status] IN ( N'A', N'B', N'C' ) )
AND ( [Extent2].[status] IS NOT NULL )) AS [GroupBy1]
Conclussion: We cannot rely on LINQ provider to create optimized queries. The linq query must be analyzed and optimized even before thinking in SQL sentence generated.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With