
Does order of indexes matter in MongoDB?

Tags:

mongodb

Is tip #25 in Tips and Tricks for MongoDB Developers correct?

It says that this query:

collection.find({"x" : criteria, "y" : criteria, "z" : criteria})

can be optimized with

collection.ensureIndex({"y" : 1, "z" : 1, "x" : 1})

I think this is wrong: for the index to be used, x should come first. I thought the order of the fields in an index matters.

So where did I go wrong?

user4951 asked Oct 17 '25 19:10


2 Answers

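For matching documents, the order of the fields in the index only matters if the query doesn't include all of the fields in the index. This query references all three fields, so the order of the fields in the index doesn't decide whether the index can be used. (The order still matters for sorting, and it can affect how efficiently range criteria are handled.)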

See more details in the docs on compound indexes.

The order of the fields in the find query object is not relevant.

JohnnyHK answered Oct 20 '25 19:10


  • For beginners who want to understand this better.
  • The MongoDB docs say: "The index contains references to documents sorted first by the values of the item field and, within each value of the item field, sorted by values of the stock field." What does this mean?
  • Let's create a compound index on fields a, b, c, and d in ascending order (1):
Model.createIndex({ a: 1, b: 1, c: 1, d: 1 });
  • I visualize it as:

    • at level 1, a list of references sorted in the specified order (1) by the value of the first index field (a).
    • at level 2, each reference at level 1 holds another set of references, sorted in the specified order (1) by the value of the second field in the chain (b).
    • at level 3, each reference at level 2 holds another set of references, sorted in the specified order (1) by the value of the third field in the chain (c).
    • at level 4, each reference at level 3 holds another set of references, sorted in the specified order (1) by the value of the fourth field in the chain (d).
  • This chain forms a tree structure, which is why it is stored in a B-tree data structure.

  • I would love to call this storage system a compound-index-chain in this context.
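A minimal sketch of that ordering in plain JavaScript (not MongoDB code): it sorts a few in-memory documents the way the entries of a compound index are ordered. The helper name compareByIndex and the sample documents are made up for illustration, and only two fields are used to keep it short.

```javascript
// Build a comparator from an index key pattern such as { a: 1, b: 1 }.
// Earlier fields decide first; later fields only break ties.
function compareByIndex(indexSpec) {
  const fields = Object.entries(indexSpec);
  return (left, right) => {
    for (const [field, direction] of fields) {
      if (left[field] < right[field]) return -direction;
      if (left[field] > right[field]) return direction;
    }
    return 0;
  };
}

// Hypothetical sample documents.
const docs = [
  { a: 2, b: 1 },
  { a: 1, b: 2 },
  { a: 2, b: 0 },
  { a: 1, b: 1 },
];

// Sorted first by a, and within each value of a, by b.
const ordered = [...docs].sort(compareByIndex({ a: 1, b: 1 }));
console.log(ordered);
```

With these documents, every entry with a = 1 comes before every entry with a = 2, and b is only ordered inside each group of equal a. That is the "level 1, level 2" picture above.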

  • We normally build indexes to support two kinds of operations: 1. query operations such as find(), and 2. non-query operations such as sort().

  • Now you have created a compound index on { a: 1, b: 1, c: 1, d: 1 }. But creating the index is not enough. It becomes inefficient, and sometimes useless, if you don't structure your database operations (find and sort) in a way that uses it.

  • Let's dig deeper into which kinds of queries are supported by which parts of the index.

  • 1. find():

    • Besides the full index, the following prefixes of the compound index also support indexed find() queries on their fields:
    {a:1},
    {a:1, b:1},
    {a:1, b:1, c:1}
    // Index prefixes are the beginning subsets of indexed fields
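If it helps, the prefixes can be listed mechanically. This is a small illustrative helper in plain JavaScript; the name indexPrefixes is made up and is not a MongoDB API.

```javascript
// List the proper prefixes of an index key pattern.
// The full key pattern is the index itself, so it is left out.
function indexPrefixes(indexSpec) {
  const entries = Object.entries(indexSpec);
  return entries
    .slice(0, -1)
    .map((_, i) => Object.fromEntries(entries.slice(0, i + 1)));
}

const prefixes = indexPrefixes({ a: 1, b: 1, c: 1, d: 1 });
console.log(prefixes);
```

For the index above this yields {a:1}, {a:1, b:1} and {a:1, b:1, c:1}, in that order.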
    
    • @JohnnyHK already said "The order of the fields in the find query object is not relevant."

      • The fields could be in ANY ORDER, like {b:1, a:1} instead of {a:1, b:1}. The index will still be utilized as long as the find() operates on the fields of the compound index or of one of its prefixes.
      • However, two queries that use the same index will not necessarily perform the same. What makes the difference is not the order of the fields in find() but how selective the predicates are.
      • Meaning, a query such as find({a: 'red', b: 'tshirt'}) whose values match only a few documents (HIGH SELECTIVITY) does less work than a query whose values match a large share of the collection (LOW SELECTIVITY), even though both queries use the one index {a:1, b:1}.
      • However, even the LESS SELECTIVE query will generally perform better than it would with no index at all.
      • I think @Sushil tried to touch on this topic.
        • In case you are still wondering, query selectivity refers to how well the query predicate excludes or filters out documents in a collection. Query selectivity can determine whether or not queries can use indexes effectively or even use indexes at all.
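To get a feel for selectivity, here is a toy count in plain JavaScript over made-up data. The products array, its sku and color fields, and the matchedCount helper are all assumptions for illustration; nothing here talks to MongoDB.

```javascript
// Hypothetical sample data: 1000 products, unique sku, only two colors.
const products = Array.from({ length: 1000 }, (_, i) => ({
  sku: i,
  color: i % 2 === 0 ? "red" : "blue",
}));

// Number of documents that a predicate lets through.
function matchedCount(docs, predicate) {
  return docs.filter(predicate).length;
}

// Highly selective: a unique value filters out almost everything.
const bySku = matchedCount(products, (doc) => doc.sku === 42);
// Poorly selective: half of the collection still matches.
const byColor = matchedCount(products, (doc) => doc.color === "red");

console.log(bySku, byColor);
```

The predicate on sku leaves 1 document out of 1000 while the predicate on color leaves 500, so an index led by a field like sku narrows the work far more than one led by a field like color.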
    • Now let's come to the subsets of a compound index that are not prefixes.

    • Note: find() behaves differently on {a:1, c:1}, which is a subset but NOT a prefix of the compound index {a:1, b:1, c:1, d:1}.

    • In this case, the find() operation will not be able to utilize our compound index efficiently.

    • What happens is that only the a:1 part of the index can support the find query. The c:1 part cannot be used to narrow the scan, because the compound-index-chain is broken by the absence of the b:1 field in between.

    • So if a find() query operates on the a and c fields together, the plan is still an IXSCAN (not a COLLSCAN), but only the condition on a narrows the range of index entries; the condition on c has to be checked against every entry that matches a. Meaning the query will be slower than with a separate compound index on {a:1, c:1} but faster than not having any index at all.

    • In conclusion: index fields are parsed in order; if a query omits a particular index prefix, it is unable to make use of any index fields that follow that prefix.
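That rule can be sketched as a tiny function in plain JavaScript. This is a simplified model of the prefix rule only, not MongoDB's query planner, and the name usablePrefix is made up.

```javascript
// Walk the index fields in order and stop at the first one
// the query does not mention. Only the fields collected so far
// can narrow the index scan.
function usablePrefix(indexSpec, query) {
  const used = [];
  for (const field of Object.keys(indexSpec)) {
    if (!(field in query)) break;
    used.push(field);
  }
  return used;
}

const index = { a: 1, b: 1, c: 1, d: 1 };

console.log(usablePrefix(index, { c: 5, a: 4 })); // [ 'a' ]      (b is missing, so c is cut off)
console.log(usablePrefix(index, { b: 3, a: 4 })); // [ 'a', 'b' ] (order inside the query is irrelevant)
console.log(usablePrefix(index, { c: 5 }));       // []           (a is missing, nothing is usable)
```

Applied to the original question, the index {y:1, z:1, x:1} with a query on x, y and z gives all three fields, which is why that index works there.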

  • 2. Sort():

    • For a non-query operation (i.e. sort), the sort keys must be listed in the same order as in the index, and the directions of all keys must be either the same as, or all the opposite of, the directions specified for each field when the compound index was created.

    • Let's see how our compound index { a: 1, b: 1, c: 1, d: 1 }, with ascending direction, behaves with the sort() operation.

    • Let's look at the direction of the indexed fields in sorting.

      • As we know, a single-field index on {a:1} can support a sort on {a:1} (same direction) and on {a:-1} (reverse direction).
      • Compound indexes follow the same rules while sorting.
      {a:1, b:1, c:1, d:1} // same direction as our compound index
      {a:-1, b:-1, c:-1, d:-1} // reverse direction of our compound index
      
      // These fields match neither the same direction nor the reverse direction; the directions are MIXED.
      // The index cannot be used for sorting with these fields and directions
      {a:-1, b:1, c:1, d:1}
      
      • Another example: a compound index on {a:1, b:-1} can support indexed sorting on {a:1, b:-1} (same direction) and on {a:-1, b:1} (reverse direction), BUT NOT on {a:-1, b:-1} or {a:1, b:1}.
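The direction rule can be written down as a short check in plain JavaScript. It is only a sketch of the rule as stated here: it looks at sorts whose keys are the leading fields of the index, and the function name sortDirection and its return values are made up.

```javascript
// Decide how a sort lines up with an index key pattern:
// "forward" (same directions), "backward" (all reversed), or "none".
function sortDirection(indexSpec, sortSpec) {
  const indexFields = Object.keys(indexSpec);
  const sortFields = Object.keys(sortSpec);
  // The sort keys must be the leading index fields, in index order.
  if (sortFields.length === 0) return "none";
  if (sortFields.some((field, i) => field !== indexFields[i])) return "none";
  if (sortFields.every((f) => sortSpec[f] === indexSpec[f])) return "forward";
  if (sortFields.every((f) => sortSpec[f] === -indexSpec[f])) return "backward";
  return "none"; // mixed directions
}

console.log(sortDirection({ a: 1, b: -1 }, { a: 1, b: -1 }));  // forward
console.log(sortDirection({ a: 1, b: -1 }, { a: -1, b: 1 }));  // backward
console.log(sortDirection({ a: 1, b: -1 }, { a: -1, b: -1 })); // none
```

"forward" and "backward" correspond to walking the index entries from the start or from the end; a mixed pattern matches neither walk, so the sort has to be done outside the index.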
    • Now let's look at the order of the indexed fields in sorting.

    • OPTIMUM SORTING:

      • When a sort operation uses the compound index or a prefix of the compound index, the result set does not have to be sorted in memory (RAM). Such a sort is satisfied solely by the order of the entries in the index, which gives optimum sorting performance.
      • For instance:
      // compound index
      {a:1, b:1, c:1, d:1}
      
      // prefix of the compound index
      {a:1},
      {a:1, b:1},
      {a:1, b:1, c:1}
      
      
    • Compound-Index-Chain-Break:

    • When a sort operation is only partially covered by the compound index, the matched result set may have to be sorted in memory.

    Model.find({ a: 2 }).sort({ c: 1 }); // uses the index for finding, but not for sorting on field c
    Model.find({ a: { $gt: 2 } }).sort({ c: 1 }); // uses the index for finding, but not for sorting
    // because the compound-index-chain is broken by the absence of field b from the prefix {a:1, b:1, c:1} of our compound index {a:1, b:1, c:1, d:1}
    
    • Sort on Non-prefix Subset:

    • The sort keys do not have to start at the first index field. When they start later, every index field that precedes the sort keys MUST appear in the query predicate (i.e. find()) with an equality condition (a plain value or $eq). Range operators such as $gt, $gte, $lt and $lte are not equality conditions.

    • When the sort keys themselves form a prefix of the index, the sort can use the index whatever conditions the query puts on those fields.

    Model.find({ c: 5 }).sort({ c: 1 }); // cannot use the index efficiently, and may not use it at all: c alone is not a prefix, and a and b have no equality conditions
    Model.find({ b: 3, a: 4 }).sort({ c: 1 }); // uses the index for both finding and sorting: equality on a and b, then sort on c, i.e. the prefix {a:1, b:1, c:1}
    
    Model.find({ a: 4 }).sort({ a: 1, b: 1 }); // uses the index for both finding and sorting: the sort keys are the prefix {a:1, b:1}
    Model.find({ a: { $gt: 4 } }).sort({ a: 1, b: 1 }); // uses the index for both finding and sorting: the sort keys are the prefix {a:1, b:1}, so a range condition on a is fine
    
    Model.find({ a: 5, b: 3 }).sort({ b: 1 }); // uses the index for both finding and sorting: equality on a precedes the sort key b
    Model.find({ a: 5, b: { $lt: 3 } }).sort({ b: 1 }); // uses the index for both finding and sorting: equality on a, range and sort on b
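For anyone who wants to check such cases by hand, below is a rough model of the sort rule in plain JavaScript, following the rule in the MongoDB documentation on using indexes to sort query results: index fields before the first sort key need equality conditions, and the sort keys must follow the index order with consistent directions. It is a simplification and not the real query planner. The names isEquality and indexCanSort are made up, and isEquality only recognises primitive values and { $eq: value }.

```javascript
// True when a condition is an equality match: a plain primitive or { $eq: value }.
// Simplification: embedded documents, arrays and dates are not handled.
function isEquality(condition) {
  if (condition === null || typeof condition !== "object") return true;
  const operators = Object.keys(condition);
  return operators.length === 1 && operators[0] === "$eq";
}

function indexCanSort(indexSpec, query, sortSpec) {
  const indexFields = Object.keys(indexSpec);
  const sortFields = Object.keys(sortSpec);
  const start = indexFields.indexOf(sortFields[0]);
  if (start === -1) return false;
  // Every index field before the first sort key needs an equality condition.
  const precedingOk = indexFields
    .slice(0, start)
    .every((field) => field in query && isEquality(query[field]));
  // Sort keys must follow the index order without gaps.
  const contiguous = sortFields.every((f, i) => f === indexFields[start + i]);
  // Directions must all match the index or all be reversed.
  const same = sortFields.every((f) => sortSpec[f] === indexSpec[f]);
  const reversed = sortFields.every((f) => sortSpec[f] === -indexSpec[f]);
  return precedingOk && contiguous && (same || reversed);
}

const idx = { a: 1, b: 1, c: 1, d: 1 };
console.log(indexCanSort(idx, { a: 2 }, { c: 1 }));                 // false
console.log(indexCanSort(idx, { b: 3, a: 4 }, { c: 1 }));           // true
console.log(indexCanSort(idx, { a: { $gt: 4 } }, { a: 1, b: 1 }));  // true
console.log(indexCanSort(idx, { a: 5, b: { $lt: 3 } }, { b: 1 }));  // true
```

On a real deployment, the explain() output of the query is the authoritative check: a SORT stage in the winning plan means the sort was done in memory rather than by the index.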
    
  • Hope this helps somebody

Firoj Siddiki answered Oct 20 '25 20:10


