Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sqlite ORDER BY group's count is slow

When I add an ORDER BY statement to my query it becomes very slow.

Here's my query without ORDER BY:

SELECT ClientIpAddress, Agentstring, Count(ClientIpAddress) AS Count FROM LogEntries
WHERE SiteIisId = 3 AND DateTime >= '13-09-2012 00:00:00'
GROUP BY ClientIpAddress, Agentstring
LIMIT 5

ET: 1ms

And now with ORDER BY:

SELECT ClientIpAddress, Agentstring, Count(ClientIpAddress) AS Count FROM LogEntries
WHERE SiteIisId = 3 AND DateTime >= '13-09-2012 00:00:00'
GROUP BY ClientIpAddress, Agentstring
ORDER BY Count DESC
LIMIT 5

ET: 294 ms

The table I'm querying contains 1.380.855 rows.

Here's the index I'm using:

CREATE INDEX "LogEntries_MostActiveClients" ON "LogEntries" ("ClientIpAddress" ASC, "Agentstring" ASC, "SiteIisId" ASC, "DateTime" DESC)

Using EXPLAIN QUERY PLAN Sqlite tells me it's scanning the table using my index and is using TEMB B-TREEfor my Order By.

How can I overcome this problem? Obviously I can't index the Count, so what to do?

Thanks a million!

like image 727
jhovgaard Avatar asked Mar 15 '26 21:03

jhovgaard


1 Answers

When you are stepping through a result set, SQLite tries to compute as many values as possible on the fly.

So, in your first query, SQLite never needs to group all the address/agent values in the table; as soon as it has read the records of the first five ClientIpAddress/Agentstring combinations through some index, it can stop.

In your second query, this is not possible: all address/agent groups have to be completely computed before they can be sorted and the first five ones chosen.

The records in the temporary result to be sorted are already in the cache, and are smaller than the data in the original table, so I'd guess that most of the time is not spent sorting, but grouping.

If the sorting were the problem, and if you had an estimate of how large the five largest counts would be, you could try adding a HAVING "Count" >= some_limit clause to reduce the number of records to be sorted.

There's nothing you can do to avoid the grouping. All you can try is to get small improvements with generic optimizations, such as:

  • increase SQLite's page cache to the size of your working set; and
  • create a covering index to avoid having to do lookups in the table itself (you already have that).

Another way would be to precompute the values for this query: have a separate table with your Count, and update it whenever you add a log entry. This will make these updates slower, and you'll have to determine what granularity to use for timestamps.

like image 108
CL. Avatar answered Mar 19 '26 17:03

CL.



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!