I have the following scenario: In a MySQL database, I have 2 MyISAM tables, one with 4.2 million rows, and another with 320 million rows. The following is the schema for the tables:
Table1 (4.2M rows)
F1 INTEGER UNSIGNED NOT NULL PRIMARY KEY
f2 varchar(40)
f3 varchar(40)
f4 varchar(40)
f5 varchar(40)
f6 smallint(6)
f7 smallint(6)
f8 varchar(40)
f9 varchar(40)
f10 smallint(6)
f11 varchar(10)
f12 tinyint(4)
f13 smallint(6)
f14 text
Table2 (320M rows)
F1 INTEGER UNSIGNED NOT NULL PRIMARY KEY
f2 INTEGER UNSIGNED NOT NULL
Table2 is in a different database but I am using a stored procedure which queries the two tables. The relation between the two tables is that for Table1.F1 there may be up to approx. 100 rows in Table2.F1 (foreign key) which match, and the value for Table2.f2 will be returned for these matched keys. I have an index IX1(f2(15),f3(10)) on Table1 and an index IX2(F1,f2) and IX3(f2) in Table 2
The queries I am running are the following:
SELECT g.F1
FROM DB1.Table1 g 
INNER JOIN DB2.Table2 gp ON g.F1 = gp.F1 
WHERE (gp.f2 = 452677825) AND
(g.f2 = 'A string value') LIMIT 0,56
This query is sometimes very fast (<1s) but changing the string value that g.F2 is compared to leads to queries which take even over 11 and sometimes even 30 seconds. I cannot understand why this is so. The following is the output of the EXPLAIN on the SELECT that is executed.
1, 'SIMPLE', 'g', 'ref', 'PRIMARY,IX1', 'IX1', '17', 'const', 901, 'Using where'
1, 'SIMPLE', 'gp', 'ref', 'IX3,IX2', 'IX2', '8', 'DB1.g.F1,const', 1, 'Using index'
which seems to be quite a good execution plan. The number of rows in the top row of the explain goes to 2000 at most, but I do not see why this should take any longer than a fraction of a second to return results. I also ran profiler on the query and noticed that the queries are spending 99.9% of the time on the "Sending data" stage. Can anyone please explain why this is so, and what can be done to optimise the query?
Thanks in advance, Tim
I'm not an expert in this area, but here are a few thoughts:
Query speed taking longer when g.F2 changes is because of caching.  MySQL will save the results for each query (until the cache is full), but new queries are run on an empty cache, so they take longer.  You shouldn't optimize based on this.  (See How to measure accurately)
I can't tell from your information whether the g or gp table has greater specificity (seems like gp?) in the where clause, but you may want to try a subquery instead.  (See How to force the inner query to execute first)
Regarding profiling, it's possible you're hitting a physical threshold like exceeding ram allocation (using swap is disastrous for performance) that would not be obvious from explain, or whether explain is just wrong in this case.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With