Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

runtime of using indexing in mongodb

Based on mongodb documentation

The ensureIndex() function only creates the index if it does not exist.

Once a collection is indexed on a key, random access on query expressions which match the specified key are fast. Without the index, MongoDB has to go through each document checking the value of specified key in the query:

db.things.find({j:2});  // fast - uses index
db.things.find({x:3});  // slow - has to check all because 'x' isn't 

Does that mean the 1st line of code runtime is big_theta = 1, and 2nd line of code is big_theta = n ?

like image 426
bouncingHippo Avatar asked Jan 28 '26 15:01

bouncingHippo


1 Answers

MongoDB uses B-tree for indexing, as can be seen in the source code for index.cpp. This means that lookups can be expressed as O(log N) where N is the number of documents, but it is also O(D) if D is the depth of the tree (assuming the tree is somewhat balanced). D is usually very small because each node will have many children.

The number of children in a node in MongoDB is about 8192 (btree.h) so an index with a few billion documents may fit in a tree with only 3 levels! You easily realize that the logarithm is not log_2 (as in binary trees) but instead log_8192, which grows extremely slowly.

Because of this, b-trees are usually regarded as constant-time lookup, O(1), in practice.

Another good reason for keeping many children in each node is that the index is stored on disk. You want to try to utilize all the space in a disk block for one node to improve cache performance and reduce disk seeks. B-trees have very good disk performance because you only need to visit very few nodes to find what you are looking for.

like image 190
Emil Vikström Avatar answered Jan 30 '26 11:01

Emil Vikström



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!