We are using MySQL 5.5.42.
We have a table publications containing about 150 million rows (about 140 GB on an SSD).
The table has many columns, of which two are of particular interest:
id is the primary key of the table and is of type bigint
cluster_id is a nullable column of type bigint
Both columns have their own (separate) index.
We run queries of the form:
SELECT * FROM publications
WHERE id >= 14032924480302800156 AND cluster_id IS NULL
ORDER BY id
LIMIT 0, 200;
Here is the problem: the larger the id value (14032924480302800156 in the example above), the slower the request.
In other words, requests for low id values are fast (< 0.1 s), but the higher the id value, the slower the request (up to minutes).
Everything is fine if we use another (indexed) column in the WHERE clause. For instance:
SELECT * FROM publications
WHERE inserted_at >= '2014-06-20 19:30:25' AND cluster_id IS NULL
ORDER BY inserted_at
LIMIT 0, 200;
where inserted_at is of type timestamp.
Edit:
Output of EXPLAIN when using id >= 14032924480302800156:
id | select_type | table        | type | possible_keys      | key        | key_len | ref   | rows     | Extra
---+-------------+--------------+------+--------------------+------------+---------+-------+----------+------------
1  | SIMPLE      | publications | ref  | PRIMARY,cluster_id | cluster_id | 9       | const | 71647796 | Using where
Output of EXPLAIN when using inserted_at >= '2014-06-20 19:30:25':
id | select_type | table        | type | possible_keys          | key        | key_len | ref   | rows     | Extra
---+-------------+--------------+------+------------------------+------------+---------+-------+----------+------------
1  | SIMPLE      | publications | ref  | inserted_at,cluster_id | cluster_id | 9       | const | 71647796 | Using where
Some guesswork is involved here: MySQL appears to be using the indexes in the wrong order. The PRIMARY index seems to be treated completely differently from the others.
In a query with a condition on the primary key, both the PRIMARY index and the index on cluster_id can be used. For some reason, MySQL ignores the PRIMARY index and looks at the index on cluster_id first, where your condition is that the value must be NULL. That leaves a huge, potentially unordered (NULLs everywhere!) set of rows that must then be filtered by id.
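With the second query, however, it's different: the PRIMARY index cannot be used at all, so MySQL apparently finds a better plan on its own, using the index on inserted_at first without any hints (although the EXPLAIN output in the question lists cluster_id as the chosen key for both queries, so this too is a guess).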
What it should actually do in the first query is use the PRIMARY index first (you can tell it to do so). I am not a MySQL user; all of this guesswork is backed only by my own understanding of the internal data structures. I don't know whether MySQL can then apply the index on cluster_id on top of those results, but creating a composite index and comparing performance with and without it may give clues as to whether it is used.
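Both suggestions can be tried roughly as follows. This is a minimal sketch, not a verified fix: it has not been run against this table, the index name idx_cluster_id_id is made up for illustration, and building a new index on a table of this size may take a long time.

```sql
-- 1) Ask the optimizer to use the primary key for the original query,
--    then compare the plan with the EXPLAIN output from the question.
EXPLAIN
SELECT * FROM publications FORCE INDEX (PRIMARY)
WHERE id >= 14032924480302800156 AND cluster_id IS NULL
ORDER BY id
LIMIT 0, 200;

-- 2) Hypothetical composite index: rows with the same cluster_id
--    (including NULL) are stored in id order inside the index.
--    The name idx_cluster_id_id is an assumption, not from the question.
ALTER TABLE publications
  ADD INDEX idx_cluster_id_id (cluster_id, id);

-- Check whether the optimizer picks the new index without a hint.
EXPLAIN
SELECT * FROM publications
WHERE id >= 14032924480302800156 AND cluster_id IS NULL
ORDER BY id
LIMIT 0, 200;
```

If the key column of the second EXPLAIN shows the new index and the rows estimate drops, that suggests the composite index is being used. Timing the real query with and without the FORCE INDEX hint is the more reliable comparison.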