I just stumbled upon a few lines of code in a system I just started working with that I don't really get. The system has a large table that saves lots of entities with unique IDs and removes them once they're not longer needed but it never reuses them. So the table looks like this
------------------------
| id |info1|info2|info3|
------------------------
| 1 | foo1| foo2| foo3|
------------------------
| 17 | bar1| bar2| bar3|
------------------------
| 26 | bam1| bam2| bam3|
------------------------
| 328| baz1| baz2| baz3|
------------------------
etc.
In one place in the codebase there is a while loop whose purpose it is to loop through all entities in the DB and do things to them and right now this is solved like this
int lastId = fetchMaxId()
int id = 0
while (id = fetchNextId()){
doStuffWith(id)
}
where fetchMaxId is straight forward
int fetchMaxId(){
return sqlQuery("SELECT MAX(id) FROM Table")
}
but fetchNextId confuses me. It is implemented as
int fetchNextId(currentId, maxId){
return sqlQuery("
SELECT id FROM Table where id > :currentId and id <= :maxId LIMIT 1
")
}
This system has been in production for several years so it obviously works but when I tried searching for a solution to why this works I only found people saying the same thing that I already thought i knew. The order in which a MySQL DB returns the result is not easily determined and should not be relied upon so if you wan't a particular order use a ORDER BY clause. But are there some times you can safely ignore the ORDER BY? This code has worked for 12 years and continued to work through several DB updates. Are we just lucky or am I missing something here? Before I saw this code I would have said that if you called
fetchNextId(1, 328)
you could end up with either 17 or 26 as the answer.
Some clues to why this works may be that the id column is the primary key of the Table in question and it's set to auto increment but I can't find any documentation that would explain why
fetchNextId(1, 328)
should always returns 17 when called on the table-snippet given above.
The short answer is yes, the primary key has an order, all indexes have an order, and a primary key is simply a unique index.
As you have rightly said, you should not rely on data being returned in the order the data is stored in, the optimiser is free to return it in any order it likes, and this will be dependent on the query plan. I will however attempt to explain why your query has worked for 12 years.
Your clustered index is just your table data, and your clustering key defines the order that it is stored in. The data is stored on the leaf, and the clustering key helps the root (and intermediate notes) act as pointers to quickly get to the right leaf to retrieve the data. A nonclustered index is a very similar structure, but the lowest level simply contains a pointer to the correct position on the leaf of the clustered index.
In MySQL the primary key and the clustered index are synonymous, so the primary key is ordered, however they are fundamentally two different things. In other DBMS you can define both a primary key and a clustered index, when you do this your primary key becomes a unique nonclustered index with a pointer back to the clustered index.
In it's simplest terms you can imagine a table with an ID column that is the primary key, and another column (A), your B-Tree structure for your clustered index would be something like:
Root Node
+---+
| 1 |
+---+
Intermediate Nodes
+---+ +---+ +---+
| 1 | | 4 | | 7 |
+---+ +---+ +---+
Leaf
+-----------+ +-----------+ +-----------+
ID -> | 1 | 2 | 3 | | 4 | 5 | 6 | | 7 | 8 | 9 |
A -> | A | B | C | | D | E | F | | G | H | I |
+-----------+ +-----------+ +-----------+
In reality the leaf pages will be much bigger, but this is just a demo. Each page also has a pointer to the next page and the previous page for ease of traversing the tree. So when you do a query like:
SELECT ID, A
FROM T
WHERE ID > 5
LIMIT 1;
you are scanning a unique index so it is very likely this will be a sequential scan. Very likely is not guaranteed though.
MySQL will scan the Root node, if there is a potential match it will move on to the intermediate nodes, if the clause had been something like WHERE ID < 0
then MySQL would know that there were no results without going any further than the root node.
Once it moves on to the intermediate node it can identify that it needs to start on the second page (between 4 and 7) to start searching for an ID > 5
. So it will sequentially scan the leaf starting on the second leaf page, having already identified the LIMIT 1
it will stop once it finds a match (in this case 6) and return this data from the leaf. In such a simple example this behaviour appears to be reliable and logical. I have tried to force exceptions by choosing an ID value I know is at the end of a leaf page to see if the leaf will be scanned in the reverse order, but as yet have been unable to produce this behaviour, this does not however mean it won't happen, or that future releases of MySQL won't do this in the scenarios I have tested.
In short, just add an order by, or use MIN(ID) and be done with it. I wouldn't lose too much sleep trying to delve into the inner workings of the query optimiser to see what kind of fragmentation, or data ranges would be required to observe different ordering of the clustered index within the query plan.
The answer to your question is yes. If you look at MySQL documentation you will see that whenever a table has a primary key it has an associated index.
When looking at the documentation for indexes you will see that they will mention primary keys as a type of index.
So in case of your particular scenario:
SELECT id FROM Table where id > :currentId and id <= :maxId LIMIT 1
The query will stop executing as soon as it has found a value because of the LIMIT 1
.
Without the LIMIT 1
it would have returned 17, 24 and 328.
However will all that said I don't think you will run into any order problems when the primary key is auto incrementing but whenever there is a scenario were the primary key is a unique employee no. instead of an auto incrementing field I would not trust the order of the result because the documentation also notes that MySQL reads sequentially, so the possibility is there that a primary key could fall out of the WHERE
clause conditions and be skipped.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With