Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimize 5 table SQL query (stores => items => words)

Tables

stores (100,000 rows): id (pk), name, lat, lng, ...

store_items (9,000,000 rows): store_id (fk), item_id (fk)

items (200,000 rows): id(pk), name, ...

item_words (1,000,000 rows): item_id(fk), word_id(fk)

words (50,000 rows): id(pk), word VARCHAR(255)

Note: all ids are integers.

========

Indexes

CREATE UNIQUE INDEX storeitems_storeid_itemid_i ON store_items(store_id,item_id);

CREATE UNIQUE INDEX itemwords_wordid_itemid_i ON item_words(word_id,item_id);

CREATE UNIQUE INDEX words_word_i ON words(word);

Note: I prefer multi column indexes (storeitems_storeid_itemid_i and itemwords_wordid_itemid_i) because: http://www.mysqlperformanceblog.com/2008/08/22/multiple-column-index-vs-multiple-indexes/

QUERY

select s.name, s.lat, s.lng, i.name
from words w, item_words iw, items i, store_items si, stores s
where iw.word_id=w.id
and i.id=iw.item_id
and si.item_id=i.id
and s.id=si.store_id
and w.word='MILK';

Problem: elapsed time is 20-120 sec (depending on the word)!!!

explain $QUERY$
+----+-------------+-------+--------+-------------------------------------------------------+-----------------------------+---------+-----------------------------+------+-------------+
| id | select_type | table | type   | possible_keys                                         | key                         | key_len | ref                         | rows | Extra       |
+----+-------------+-------+--------+-------------------------------------------------------+-----------------------------+---------+-----------------------------+------+-------------+
|  1 | SIMPLE      | w     | const  | PRIMARY,words_word_i                                  | words_word_i                | 257     | const                       |    1 | Using index |
|  1 | SIMPLE      | iw    | ref    | itemwords_wordid_itemid_i,itemwords_itemid_fk         | itemwords_wordid_itemid_i   | 4       | const                       |    1 | Using index |
|  1 | SIMPLE      | i     | eq_ref | PRIMARY                                               | PRIMARY                     | 4       | iw.item_id                  |    1 |             |
|  1 | SIMPLE      | si    | ref    | storeitems_storeid_itemid_i,storeitems_itemid_fk      | storeitems_itemid_fk        | 4       | iw.item_id                  |   16 | Using index |
|  1 | SIMPLE      | s     | eq_ref | PRIMARY                                               | PRIMARY                     | 4       | si.store_id                 |    1 |             |

I want elapsed time to be less than 5 secs!!! Any ideas???

==============

What I tried

I tried to see when increase in the execution time happens by adding tables to the query.

1 table

select * from words where word='MILK';

Elapsed time: 0.4 sec

2 tables

select count(*)
from words w, item_words iw
where iw.word_id=w.id
and w.word='MILK';

Elapsed time: 0.5-2 sec (depending on word)

3 tables

select count(*)
from words w, item_words iw, items i
where iw.word_id=w.id
and i.id=iw.item_id
and w.word='MILK';

Elapsed time: 0.5-2 sec (depending on word)

4 tables

select count(*)
from words w, item_words iw, items i, store_items si
where iw.word_id=w.id
and i.id=iw.item_id
and si.item_id=i.id
and w.word='MILK';

Elapsed time: 20-120 sec (depending on word)

I guess the problem with the indexes or with the design of query/database. But there must be a way to make it work fast. Google does it somehow and their tables are much bigger!

like image 900
Ivan Avatar asked Jul 06 '11 06:07

Ivan


2 Answers

a) You're actually writing queries to do FTS inside mysql -> use real FTS like lucene instead.

b) Clearly, adding the 9M row join is the performance issue

c) How about limiting that join (maybe it's being done in full with the current query plan) like this :

SELECT
    s.name, s.lat, s.lng, i.name
FROM
    (SELECT * FROM words WHERE word='MILK') w
INNER JOIN
    item_words iw
ON
    iw.word_id=w.id
INNER JOIN
    items i
ON
    i.id=iw.item_id
INNER JOIN
    store_items si
ON
    si.item_id=i.id
INNER JOIN
    stores s
ON
    s.id=si.store_id;

The logic behind this is that instead of joining full tables and then limiting the results, you start by limiting the tables on which you will join, this (if the join order happens to be the one I wrote) will greatly reduce your working set and inner queries running time.

d) Google does NOT use mysql for FTS

like image 131
Morg. Avatar answered Sep 26 '22 02:09

Morg.


Consider de-normalising the structure - the first candidate is the 1 million record item_words table - bring the words directly into the table. Creating a list of unique words might be more easily achieved through a view (depends on how often you need this data compared to, for example, your need to extract a list of shops with products associated with a keyword). Secondly - create indexed views (not an option in MySQL, but certainly an option on other commercial databases).

like image 37
Pieter Wessels Avatar answered Sep 25 '22 02:09

Pieter Wessels