I have a database that I'm taking care of containing pulse measurements.
The schema is like this:
id - monitorid - starttime - stoptime - pulses
Every monitor gives information every 10 minutes.
Currently that adds up to about 13 000 000 rows.
The starttime and stoptime columns are varchar(10), holding unix timestamps. Probably not the most efficient choice for my case.
Almost all queries against this table are 'WHERE starttime > $certaintime AND monitorid = $monid'. All these queries are currently extremely slow.
I have an index on monitorid. I haven't yet put any on starttime and stoptime, since I figured that would hardly give me any better cardinality, as each 10 minute slot is a new value. I'm not sure of this reasoning though.
So, my question: how would one optimize this for the range-like queries that it is mostly confronted with? Index starttime? Rebuild the table with dates instead of timestamps?
Any advice is most welcome!
Cheers,
Dieter
Create a compound BTREE index on the monitorid and starttime columns.
This index can give the best results for queries which use a WHERE starttime > X AND monitorid = Y clause.
CREATE INDEX name ON tablename (monitorid, starttime)
monitorid must be the leading column in this index, otherwise the optimizer cannot use the whole index for this query.
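As a minimal sketch of the above, assuming MySQL and a table called pulse_measurements (the question does not name the table, so both that name and the index name idx_monitor_start are hypothetical), the index could be created and checked with EXPLAIN like this. The timestamp literal is quoted because starttime is a varchar(10): comparing a string column with a numeric constant forces a conversion that can prevent MySQL from using the index. Note that string comparison only orders timestamps correctly while they all have the same number of digits.

```sql
-- Hypothetical table and index names; adjust to the real schema.
-- Column order matters: the equality column first, the range column second.
CREATE INDEX idx_monitor_start
    ON pulse_measurements (monitorid, starttime);

-- Ask the optimizer how it would run the typical query.
-- The "key" column of the output should show idx_monitor_start if the index is picked.
EXPLAIN
SELECT id, starttime, stoptime, pulses
FROM pulse_measurements
WHERE monitorid = 42                 -- example monitor id (made up)
  AND starttime > '1500000000';      -- quoted: starttime is varchar(10)
```

On a table of about 13 000 000 rows, building the index will take some time, so it is probably best done outside peak hours.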
Read the chapter "8.2.1.3.2 The Range Access Method for Multiple-Part Indexes" for details: https://dev.mysql.com/doc/refman/5.7/en/range-optimization.html
They write that:
For a BTREE index, an interval might be usable for conditions combined with AND, where each condition compares a key part with a constant value using =, <=>, IS NULL, >, <, >=, <=, !=, <>, BETWEEN, or LIKE 'pattern' (where 'pattern' does not start with a wildcard). An interval can be used as long as it is possible to determine a single key tuple containing all rows that match the condition (or two intervals if <> or != is used).
The optimizer attempts to use additional key parts to determine the interval as long as the comparison operator is =, <=>, or IS NULL. If the operator is >, <, >=, <=, !=, <>, BETWEEN, or LIKE, the optimizer uses it but considers no more key parts. For the following expression, the optimizer uses = from the first comparison. It also uses >= from the second comparison but considers no further key parts and does not use the third comparison for interval construction:
key_part1 = 'foo' AND key_part2 >= 10 AND key_part3 > 10
(emphasis mine)
The above means that in your specific case, if an index on (monitorid, starttime) is created, the optimizer can use both parts of the index, because monitorid = $monid is used in the WHERE clause. With the reverse column order (starttime, monitorid), the second part of the index is not usable, because the range condition starttime > $certaintime applies to the first key part and the optimizer considers no further key parts after a range comparison.
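If you want to see the difference between the two column orders on your own data, something along these lines may help. It is only a sketch: it assumes MySQL, a table named pulse_measurements, and an existing (monitorid, starttime) index named idx_monitor_start, all of which are hypothetical names not taken from the question. FORCE INDEX is used so that each EXPLAIN reports the plan for one specific index.

```sql
-- Temporary index with the reversed column order, for comparison only.
CREATE INDEX idx_start_monitor
    ON pulse_measurements (starttime, monitorid);

-- Plan when the (monitorid, starttime) index is used.
EXPLAIN
SELECT pulses
FROM pulse_measurements FORCE INDEX (idx_monitor_start)
WHERE monitorid = 42 AND starttime > '1500000000';

-- Plan when the (starttime, monitorid) index is used.
EXPLAIN
SELECT pulses
FROM pulse_measurements FORCE INDEX (idx_start_monitor)
WHERE monitorid = 42 AND starttime > '1500000000';

-- Remove the comparison index again; it only costs space and write time.
DROP INDEX idx_start_monitor ON pulse_measurements;
```

One would expect the "rows" estimate in the second plan to be considerably larger, since the range on starttime then spans the entries of every monitor rather than a single one, but the actual numbers depend on your data.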