Below are three MySQL Queries. The first returns everything I'm looking for in one query, while the second two return the same aggregate results, but in two queries.
I'm wondering why the single query takes 2-100s of times longer to run, when it seems like process time should be equivalent between all three queries. Is there a way to optimize the single query to run as fast as the individual queries? Adding more OR's inside the WHERE statement of the single query does not increase process time, but I have cases where I need to do many more ORs and eventually the single query would be as fast as running ten individual queries.
The single query also seems to get cached after it's run, and can take minutes to run the first time, while the single queries always complete within the same timeframe.
Would a multi-column index make a big difference here?
Of note is that the table does not have an ID field as a primary index. Is this causing this undesired behavior?
It's hard to run tests here as the table has a hundred million rows and adding columns and indexes takes close to a day.
SINGLE QUERY (4.2s)
SELECT name_id
FROM staging_company_search
WHERE
    (name_word_0 = 'the' AND name_word_1 = 'glazier')
    OR (name_word_0 = 'bridgewaters' AND name_word_1 = '');
EQUIVALENT AGGREGATE QUERIES (0.8s each)
SELECT name_id
FROM staging_company_search
WHERE name_word_0 = 'the' AND name_word_1 = 'glazier';
SELECT name_id
FROM staging_company_search
WHERE name_word_0 = 'bridgewaters' AND name_word_1 = '';
EXPLAIN ON THESE QUERIES
id     select_type    table                     type    possible_keys              key          key_len  ref    rows     extra
1      SIMPLE         staging_company_search    range   name_word_0,name_word_1    name_word_0  102      NULL   2197605  Using index condition; Using where
1      SIMPLE         staging_company_search    ref     name_word_0,name_word_1    name_word_1  102      const  128      Using index condition; Using where
1      SIMPLE         staging_company_search    ref     name_word_0,name_word_1    name_word_0  102      const  33       Using index condition; Using where
DATABASE SCHEMA
CREATE TABLE `staging_company_search` (
  `name_id` int(11) unsigned NOT NULL DEFAULT '0',
  `name_word_0` varchar(100) NOT NULL,
  `name_word_1` varchar(100) NOT NULL,
  KEY `name_id` (`name_id`),
  KEY `name_word_0` (`name_word_0`),
  KEY `name_word_1` (`name_word_1`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
MySQL can run more than 50,000 simple queries per second on commodity server hardware and over 2,000 queries per second from a single correspondent on a Gigabit network, so running multiple queries isn't necessarily such a bad thing.
Optimizing Queries with EXPLAINIt displays information from a built-in MySQL optimizer regarding the statement execution plan and the number of rows scanned in each table. Thus we can determine the cost of the query. The query below shows how EXPLAIN works with the SELECT statement.
Indexes are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows. The larger the table, the more this costs.
In MySQL, EXPLAIN can be used in front of a query beginning with SELECT , INSERT , DELETE , REPLACE , and UPDATE . For a simple query, it would look like the following: EXPLAIN SELECT * FROM foo WHERE foo. bar = 'infrastructure as a service' OR foo.
This is because mysql uses only one index for a simple query.
If there is a choice between multiple indexes, MySQL normally uses the index that finds the smallest number of rows (the most selective index).
However mysql can and will use two indexes since version 5.0 through Index merge optimization. Unfortunately it doesn't always happen and even when it does happen the results aren't all that great.
The first explain output shows that index merge optimization isn't being used in your query with the OR clause. It's only using the index on name_word_0
name_word_0 is great for WHERE name_word_0 = 'the' AND name_word_1 = 'glazier'; but as the output for the third explain shows, it's not at all suited for WHERE name_word_0 = 'bridgewaters' AND name_word_1 = '';
Therefore the combined query is really slow. YOu can overcome that by creating a composite index that spans name_word_0 and name_word_1. I noticed that your key len is really long. YOu can create a partial index and probably speed things up even further.
CREATE INDEX word01 ON staging_company_search (name_word_0(20), name_word_1(20))
Your use of an OR clause in the first query defeats the use of your index. MySQL is a little bit dumb that way. It's doing a full table scan: looking at every row. You're much better off using AND-only queries UNIONed together.
To make your AND-only queries even faster, create a compound index on (name_word_0, name_word_1, name_id). Your queries can be completely satisfied by a random access to that index, and should run subsecond with a two megarow table.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With