 

MongoDB poor write performance on large collections with 50.000.000 documents plus

Tags:

mongodb

I have a MongoDB instance that stores product data for 204.639.403 items. The data has already been split up, by the item's country, into four logical databases running on the same physical machine in the same MongoDB process.

Here is a list with the number of documents per logical database:

  • CoUk: 56.719.977
  • De: 61.216.165
  • Fr: 52.280.460
  • It: 34.422.801

My problem is that the database write performance is getting worse; writes to the largest of the four databases (De) in particular have become really bad. According to iotop, the mongod process uses 99% of the I/O time with less than 3MB of writes and 1.5MB of reads per second. This leads to long database locks: 100%+ lock has become normal according to mongostat, even when all processes writing to and reading from the other country databases have been stopped. The current slave reaches a load of up to 6 while the replica set master has a load of 2-3 at the same time, so this causes replication lag as well.

Each database has the same data and index structure; I am using the largest database (De) for the examples below.

This is a random item taken from the database, just as an example. The structure is optimized to gather all important data with a single read:

{
    "_id" : ObjectId("533b675dba0e381ecf4daa86"),
    "ProductId" : "XGW1-E002F-DW",
    "Title" : "Sample item",
    "OfferNew" : {
        "Count" : 7,
        "LowestPrice" : 2631,
        "OfferCondition" : "NEW"
    },
    "Country" : "de",
    "ImageUrl" : "http://….jpg",
    "OfferHistoryNew" : [ 
        … 
        {
            "Date" : ISODate("2014-06-01T23:22:10.940+02:00"),
            "Value" : {
                "Count" : 10,
                "LowestPrice" : 2171,
                "OfferCondition" : "NEW"
            }
        }
    ],
    "Processed" : ISODate("2014-06-09T23:22:10.940+02:00"),
    "Eans" : [ 
        "9781241461959"
    ],
    "OfferUsed" : {
        "Count" : 1,
        "LowestPrice" : 5660,
        "OfferCondition" : "USED"
    },
    "Categories" : [ 
        NumberLong(186606), 
        NumberLong(541686), 
        NumberLong(288100), 
        NumberLong(143), 
        NumberLong(15777241)
    ]
}

Typical queries range from simple ones, like a lookup by ProductId or an EAN only, to refinements by category sorted by rank A, or refinements by category with a rank A range (1 up to 10.000, for example) sorted by rank B… .
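
A rough sketch of these query shapes in the mongo shell, assuming the collection is named Item (as in the stats below) and that the rank fields are called RankA and RankB (they do not appear in the sample document, but the index names suggest them):

// point lookups
db.Item.find({ ProductId: "XGW1-E002F-DW" })
db.Item.find({ Eans: "9781241461959" })

// category refinement sorted by rank A
db.Item.find({ Categories: NumberLong(186606) }).sort({ RankA: 1 })

// category refinement with a rank A range, sorted by rank B
db.Item.find({ Categories: NumberLong(186606), RankA: { $gte: 1, $lte: 10000 } }).sort({ RankB: -1 })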

These are the stats from the largest DB:

{
    "ns" : "De.Item",
    "count" : 61216165,
    "size" : 43915150656,
    "avgObjSize" : 717,
    "storageSize" : 45795192544,
    "numExtents" : 42,
    "nindexes" : 6,
    "lastExtentSize" : 2146426864,
    "paddingFactor" : 1,
    "systemFlags" : 0,
    "userFlags" : 1,
    "totalIndexSize" : 41356824320,
    "indexSizes" : {
        "_id_" : 2544027808,
        "RankA_1" : 1718096464,
        "Categories_1_RankA_1_RankB_-1" : 16383534832,
        "Eans_1" : 2846073776,
        "Categories_1_RankA_-1" : 15115290064,
        "ProductId_1" : 2749801376
    },
    "ok" : 1
}

It is worth mentioning that the index size is nearly half of the storage size.

Each country DB has to handle 3-5 million updates/inserts per day; my target is to perform the write operations in less than five hours during the night.
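
For context, a back-of-the-envelope figure (assuming the upper end of 5 million writes spread evenly over the five-hour window): 5.000.000 / (5 × 3.600 s) ≈ 278 writes per second, sustained on top of the read traffic.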

Currently it is a replica set with two servers, each with 32GB RAM and a RAID1 of 2TB HDDs. Simple optimizations like the deadline I/O scheduler and noatime have already been applied.

I have worked out some optimization strategies:

  • Reducing the number of indexes (see the first sketch after this list):
    • The _id field could hold the ProductId instead of the default ObjectId, which would save 6-7% of the total index size per DB.
    • Trying to remove the Categories_1_RankA_-1 index; maybe the Categories_1_RankA_1_RankB_-1 index could handle those queries, too. Does sorting still perform well when only a prefix of the compound index is used? Another option would be storing the data matching the Categories_1_RankA_1_RankB_-1 index in a separate collection which refers to the main collection.
  • Reducing the amount of raw data by using smaller keys: instead of 'Categories', 'Eans', 'OfferHistoryNew'… I could use 'a', 'b', 'c'… (see the second sketch after this list). This should be easy since I use http://mongojack.org/, but I don't know how worthwhile it will be.
  • Replacing the RAID1 with a RAID0; this could be easily tested by taking the slave down, reinstalling it and re-adding it to the replica set… .
  • Testing stronger hardware: SSDs and more memory, which should handle the reads and writes faster.
  • Using MongoDB's sharding capabilities:
    • I read that each shard has to hold the whole database index?
    • I have concerns that the query structure might not fit a sharded environment well. Using the product id as shard key does not seem to fit all query types, and sharding by category is complicated, too; a single item can be listed in multiple main and sub categories… . My concerns could be wrong, I have never used sharding in a production environment.
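
A sketch of the index-reduction checks in the mongo shell (exploratory only, ideally run against a secondary or a copy first):

// check whether the compound index can serve the "category sorted by rank A" query
db.Item.find({ Categories: NumberLong(186606) })
       .sort({ RankA: 1 })
       .hint("Categories_1_RankA_1_RankB_-1")
       .explain()                         // look at nscanned and scanAndOrder in the output

// if the plan is acceptable, the narrower index becomes redundant
db.Item.dropIndex("Categories_1_RankA_-1")

// using the ProductId as _id drops one more index; new documents would be written
// like this (existing data would need a migration)
db.Item.insert({ _id: "XGW1-E002F-DW", Title: "Sample item", Country: "de" /* , ... */ })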

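For the smaller-keys idea, the shell's Object.bsonsize() gives a quick estimate of the saving on one representative document before touching the mongojack mappings (the short field names below are purely illustrative):

var doc = db.Item.findOne({ ProductId: "XGW1-E002F-DW" })
Object.bsonsize(doc)        // current size, the collection average is 717 bytes

var shortDoc = { _id: doc._id, p: doc.ProductId, t: doc.Title, c: doc.Categories, e: doc.Eans /* ... */ }
Object.bsonsize(shortDoc)   // compare to estimate the per-document saving
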
But there are probably other optimization strategies I have not thought of yet which I would like to hear about!
Which optimization strategy sounds most promising, or is a mixture of several optimizations needed here?


Guido Krömer


1 Answer

Most likely you are running into issues due to record growth; see http://docs.mongodb.org/manual/core/write-performance/#document-growth.

Mongo prefers records of fixed (or at least bounded) size. Increasing the record size beyond the pre-allocated storage will cause the document to be moved to another location on disk, multiplying your I/O with each write. Consider pre-allocating "enough" space for your average document on insert, if your document sizes are relatively homogeneous. Otherwise, consider splitting rapidly growing nested arrays into a separate collection, thereby replacing updates with inserts. Also check your fragmentation and consider compacting your databases from time to time, so that you have a higher density of documents per block, which will cut down on hard page faults.
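
A minimal sketch of the "split the growing array out" idea, assuming a new OfferHistory collection keyed by ProductId (the collection and field names are illustrative, not part of the original schema):

// one index to read a product's history back in order
db.OfferHistory.ensureIndex({ ProductId: 1, Date: -1 })

// instead of $push-ing into Item.OfferHistoryNew (which grows the parent document
// and can force a move on disk), each history entry becomes its own fixed-size insert
db.OfferHistory.insert({
    ProductId: "XGW1-E002F-DW",
    Date: ISODate("2014-06-01T21:22:10.940Z"),
    Value: { Count: 10, LowestPrice: 2171, OfferCondition: "NEW" }
})

// reading the history back is a single indexed query
db.OfferHistory.find({ ProductId: "XGW1-E002F-DW" }).sort({ Date: -1 })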


qSlug