A colleague asked me to explain how indexes (indices?) boost up performance; I tried to do so, but got confused myself.
I used the model below for explanation (an error/diagnostics logging database). It consists of three tables:
System and TraceTypes tablesI used MySQL for the demo, however I don't recall the table types I used. I think it was InnoDB.
 System                                TraceTypes
-----------------------------         ------------------------------------------
| ID          | Name        |         | ID    | Code   | Description           |
-----------------------------         ------------------------------------------
| 1           | billing     |         | 1     | Info   | Informational mesage  |
| 2           | hr          |         | 2     | Warning| Warning only          |
-----------------------------         | 3     | Error  | Failure               |
           |                          ------------------------------------------
           |                ------------|
 Traces    |                |            
 --------------------------------------------------
 | ID | System_ID | TraceTypes_ID | Message       |
 --------------------------------------------------
 | 1  |  1        |  1            | Job starting  |
 | 2  |  1        |  3            | System.nullr..|
 --------------------------------------------------
First, i added some records to all of the tables and demonstrated that the query below executes in 0.005 seconds:
select count(*) from Traces 
  inner join System on Traces.System_ID = System.ID
  inner join TraceTypes on Traces.TraceTypes_ID = TraceTypes.ID
where 
  System.Name='billing' and TraceTypes.Code = 'Info'
Then I generated more data (no indexes yet)
Now the previous query took 8-10 seconds.
I created indexes on Traces.System_ID column and Traces.TraceTypes_ID column. Now this query executed in milliseconds: 
select count(*) from Traces where System_id=1 and TraceTypes_ID=1;
This was also fast:
select count(*) from Traces 
  inner join System on Traces.System_ID = System.ID
where System.Name='billing' and TraceTypes_ID=1;
but the previous query which joined all the three tables still took 8-10 seconds to complete.
Only when I created a compound index (both System_ID and TraceTypes_ID columns included in index), the speed went down to milliseconds.
The basic statement I was taught earlier is "all the columns you use for join-ing, must be indexed".
However, in my scenario I had indexes on both System_ID and TraceTypes_ID, however MySQL didn't use them. The question is - why? My bets is - the item count ratio 100:10,000,000:50 makes the single-column indexes too large to be used. But is it true?
First, the correct, and the easiest, way to analyze a slow SQL statement is to do EXPLAIN. Find out how the optimizer chose its plan and ponder on why and how to improve that. I'd suggest to study the EXPLAIN results with only 2 separate indexes to see how mysql execute your statement.
I'm not very familiar with MySQL, but it seems that there's restriction of MySQL 4 of using only one index per table involved in a query. There seems to be improvements on this since MySQL 5 (index merge), but I'm not sure whether it applies to your case. Again, EXPLAIN should tell you the truth.
Even with using 2 indexes per table allowed (MySQL 5), using 2 separate indexes is generally slower than compound index. Using 2 separate indexes requires index merge step, compared to the single pass of using a compound index.
Multi Column indexes vs Index Merge might be helpful, which uses MySQL 5.4.2.
It's not the size of the indexes so much as the selectivity that determines whether the optimizer will use them.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With