Planning ahead, with SQL_CALC_FOUND_ROWS being deprecated and the new DataTables 1.10.18 in use, I have a question. For server-side processing you need to feed back both the total record count and the filtered count. From what I gather, I have to run three separate queries for every request, which seems excessive, but I can't think of another way to get that information.
// Grab the page of data
// for "data"
$query1 = 'SELECT .. FROM .. WHERE .. LIMIT ..';
// Grab the filtered total: the same WHERE, without the LIMIT
// for "recordsFiltered"
$query2 = 'SELECT COUNT(*) FROM .. WHERE ..';
// Grab the total record count, without the WHERE
// for "recordsTotal"
$query3 = 'SELECT COUNT(*) FROM ..';
With complex queries and fairly large datasets (100k to 2 million records), plus the fact that this fires on every keystroke in the search box (every letter as the user types), every column sort, and every page change, the number of queries and the execution time seem pretty crazy.
Am I missing anything, or is this just what's required to use DataTables: firing off three database queries on every request? Thanks.
I've used DataTables extensively (e.g. browsing 10 million records in a billing DB), and you're right: three queries are needed to achieve the result you want without any optimization. If querying more than 10 million rows is absolutely necessary, you'll want to look at DB sharding, because past about a million records things like DB I/O start playing an important role.
However, with some tricks you can achieve an acceptable UX on large datasets of up to a million records that responds pretty much instantly. The strategy is to shape the DB and the queries so that you never have to consider the entire record set on every single request. A user is always willing to wait a few seconds if there is animated feedback (which DataTables provides) and the result is what they expect, especially when a search consistently returns a handful of records out of millions. Less is always more, and that is the goal here.
Here are some things I've tried that work well:
If you have that much data, a COUNT is going to return an exact number on every request. Does it need to be exact? Does the end user need to see exactly 2,000,001 records reported, or is "2 million+" acceptable? If so, cache the total and update it on a less frequent schedule. You only need that number for the exact grand total, and DataTables won't actually use it unless the user paginates to the very end; you can configure DataTables NOT to report the exact total, so just give them an estimate. You will still need the filtered count for pagination, though, but now you're down to 2 queries per request instead of 3.
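For example, here's a minimal sketch of caching the grand total with APCu ($pdo is an existing PDO connection; the 'sales' table name and the 10-minute TTL are just illustrative):

// Cache the expensive COUNT(*) and refresh it at most every 10 minutes.
function cachedRecordsTotal(PDO $pdo): int
{
    $total = apcu_fetch('records_total', $hit);
    if (!$hit) {
        $total = (int) $pdo->query('SELECT COUNT(*) FROM sales')->fetchColumn();
        apcu_store('records_total', $total, 600); // TTL in seconds
    }
    return $total;
}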
Such huge datasets tend to be read-only and merely appended to over time, like sales records. Can you add a custom DB index (e.g. on a date column, to filter by year) and make sure your WHERE clause applies that check first, in addition to whatever the DataTables search request actually was? Note that DataTables lets you add custom params to each AJAX request client-side, so you can offer external selectors (e.g. a year combobox defaulting to the current year) that narrow the results early in the query expression, where your indexes are useful.
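As a sketch, assuming a sales table with a year index and a custom "year" param sent client-side via DataTables' ajax.data option (the table, column, and param names here are made up for illustration):

// Hypothetical supporting index, created once:
//   CREATE INDEX idx_sales_year ON sales (sale_year);

// Apply the indexed year filter first, then the user's search.
$year   = isset($_POST['year']) ? (int) $_POST['year'] : (int) date('Y');
$search = ($_POST['search']['value'] ?? '') . '%';
$offset = max(0, (int) ($_POST['start'] ?? 0));
$length = (int) ($_POST['length'] ?? 50);

$stmt = $pdo->prepare(
    "SELECT id, customer_name, amount FROM sales
     WHERE sale_year = :year AND customer_name LIKE :search
     LIMIT $offset, $length"
);
$stmt->execute([':year' => $year, ':search' => $search]);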
Use dedicated search fields for only some columns and apply them individually, instead of one generic input field that runs search criteria against every column to the tune of: where field_name LIKE 'search input%'. DataTables supports filters per column, but I've also seen server-side implementations with one search input for all columns, which can work on large datasets too. In any case, not every column needs to be searchable.
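A sketch of per-column filtering, built from the columns[i][search][value] params DataTables already sends with each server-side request (the column names are placeholders):

// Whitelist of columns that may be searched; everything else is ignored.
$searchable = ['customer_name', 'invoice_no'];

$where  = [];
$params = [];
foreach ($_POST['columns'] ?? [] as $col) {
    $name  = $col['data'] ?? '';
    $value = $col['search']['value'] ?? '';
    if ($value !== '' && in_array($name, $searchable, true)) {
        $where[] = "$name LIKE :$name";
        $params[":$name"] = $value . '%'; // prefix match keeps indexes usable
    }
}
$whereSql = $where ? 'WHERE ' . implode(' AND ', $where) : '';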
You'll never return more than a few hundred meaningful results to the end user at a time, so cap the LIMIT clause at, say, 1,000 rows, and ask the user for better search criteria if the filtered count exceeds that. DataTables can request a lot, but your server doesn't have to honor it.
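Clamping the requested page size server-side only takes a couple of lines (DataTables sends length = -1 for "show all", which you should never honor on a big table):

// Hard server-side cap, regardless of what the client asked for.
$length = (int) ($_POST['length'] ?? 100);
if ($length < 1 || $length > 1000) {
    $length = 1000;
}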
Indexes can improve read performance, but the tradeoff is extra disk space and slower writes. Set up specific indexes for the queries your users actually run most often.
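In MySQL you can sanity-check that a common query actually uses a new index before paying for the disk space; a quick sketch (table, column, and index names as in the example above are illustrative):

// Look for the index name under the "key" column of the plan.
$plan = $pdo->query(
    "EXPLAIN SELECT id FROM sales
     WHERE sale_year = 2018 AND customer_name LIKE 'acme%'"
)->fetchAll(PDO::FETCH_ASSOC);
print_r($plan);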
Consider "debouncing" the search input, by adding a javascript timeout that only submits the search request after the user has stopped typing after a period of time (e.g. 1 second).
I've moved BLOB and static text data out of the database onto disk files or a NoSQL alternative like Mongo, keeping the primary keys matching across both stores. Any relational DB you use will be noticeably faster to work with afterwards, including for backups, which I'm sure you take.
Your mileage may vary, but as you can tell, there's a lot you can do to improve performance; just don't try to query absolutely everything on every single request. Nobody cares about exact counts on huge datasets, and anyone who does actually wants a report, which is a different use case that doesn't involve DataTables.