Way to iterate through entire mongo database that is too large to load? [duplicate]

I must do some data processing for one of my company's clients. They have a database of about 4.7GB of data. I need to add a field to each document, calculated from two properties of the document and an external reference.

My problem is, I cannot simply call collection.find() because Node.js runs out of memory. What is the best way to iterate through an entire collection that is too large to load with a single call to find?

asked by awimley

1 Answer

Yes, there is a way. Mongo is designed to handle large datasets.

You are probably running out of memory, not because of db.collection.find(), but because you are trying to dump it all at once with something like db.collection.find().toArray().

The correct way to operate over result sets that are bigger than memory is to use cursors. Here's how you'd do it in the mongo console:

var outsidevars = {
    "z": 5
};

var manipulator = function (document, outsidevars) {
    var newfield = document.x + document.y + outsidevars.z;
    document.newField = newfield;
    return document;
};

var cursor = db.collection.find();

while (cursor.hasNext()) {
    // load only one document from the result set into memory
    var thisdoc = cursor.next();
    var newdoc = manipulator(thisdoc, outsidevars);
    db.collection.update({"_id": thisdoc["_id"]}, newdoc);
}
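For the Node.js case in the question, the same idea applies: iterate the cursor instead of materializing the whole result set. Below is a minimal sketch using the official mongodb driver; the connection string, the database and collection names, and the x/y field names are placeholder assumptions carried over from the shell example.

const { MongoClient } = require("mongodb");

const outsidevars = { z: 5 }; // stand-in for the external reference

async function run() {
    const client = new MongoClient("mongodb://localhost:27017");
    try {
        await client.connect();
        const collection = client.db("mydb").collection("docs");

        // find() returns a cursor; iterating it streams documents from the
        // server in batches instead of loading the whole result set at once
        const cursor = collection.find();
        for await (const doc of cursor) {
            const newField = doc.x + doc.y + outsidevars.z;
            await collection.updateOne(
                { _id: doc._id },
                { $set: { newField: newField } }
            );
        }
    } finally {
        await client.close();
    }
}

run().catch(console.error);

Awaiting one updateOne per document is simple but slow over ~4.7GB of data; if throughput matters, you could accumulate the updates and flush them in batches with collection.bulkWrite().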
answered by code_monk
