We have a replica set of 1 primary, 1 secondary, and 1 arbiter. We delete collections often, so I am looking for a fast way to reclaim disk space used by deleted collections with no downtime, current database size is close to 3TB. I've been researching various ways of doing this, 2 common approaches are:
repairDatabase(): which needs free space equal the size of used space to be able to run, it will take the primary offline, then start initial Sync on the secondary,which is very lengthy process, during which only one node is available for read only from secondary during repairDatabase, and read/write during initial Sync.
run initial Sync on a new node, then claim as primary and retire the old one. Repeat the process for secondary. With this option, both primary and secondary are available, but very lengthy process and take almost 1 week to run initial Sync twice.
is there a better solution to reclaim disk space on a regular basis and relatively faster than the above solutions.
Note that every single collection is subject to deletion.
Thanks
there's no easy way to achieve this, unless you design your DB structure to keep different collections in different databases, which in turn will mean storing them in different paths in your HDD as long as you have the directoryPerDB set to true in your mongo.conf. This is a workaround and depending on your app it might be unpractical.
While it's true that dropping a collection won't free the hdd space, it's also true that the used space it's not lost. It will be eventually reused for new collections.
That being said, unless you are really short on space, don't reclain that space. The CPU and I/O cost of doing that regularly is far more expensive than the storage capacity cost in every provider I know of.
I'd take a look at using MongoDB's sharding functionality to address some of your issues. To quote from the documentation:
Sharding is a method for storing data across multiple machines. MongoDB uses sharding to support deployments with very large data sets and high throughput operations.
While sharding is frequently used to balance thru put for a large collection across more servers, to avoid hot spots and spread the overall load, it's also useful for managing storage for large collections. In your specific case I'd investigate the use of shard tags to pin a collection to a specific shard.
Again, to quote the documentation, shard tags are useful to
isolate a specific subset of data on a specific set of shards.
For example, let's say you split your production environment into a couple of shards, shard1 and shard2. You could, using shard tags and the sharding tools, pin the collections that you frequently delete onto shard2. In this use case shard1 contains all your normal collections. When you then choose to reclaim disk storage via your second option, you'd perform this only on the shard that has the deleted collections - that way you avoid have to recreate more static data. It should run faster that way (how much faster is a function of how much data is in the deleted collections shard at any given time).
It also has the secondary benefit that as each shard (actually replica set within each shard) requires smaller servers as they only contain a subset of the overall data.
The specifics of the best way to do this will be driven by your exact use case - number and size of collections, insert, update, query and deletion frequency, etc. I described a simple 2 shard case but you can do this with many more shards. You can also have some shards running on higher performance hardware for collections that have more transaction volume.
I can't really do sharding justice within the limited space here other than to point you in the right direction to investigate it. MongoDB has a lot of good information within their documentation and their 2 online DBA courses (Which are free) get into this in some detail.
Some useful links:
http://docs.mongodb.org/manual/core/sharding-introduction/
http://docs.mongodb.org/manual/core/tag-aware-sharding/
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With