I need to do some data processing for one of my company's clients. They have a database of about 4.7 GB of data, and I need to add a field to each document, calculated from two properties of the Mongo document and an external reference.
My problem is that I cannot call collection.find() because Node.js runs out of memory. What is the best way to iterate through an entire collection that is too large to load with a single call to find()?
Yes, there is a way; Mongo is designed to handle large datasets. You are probably running out of memory not because of db.collection.find() itself, but because you are trying to dump the whole result set at once with something like db.collection.find().toArray().
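For contrast, this is the pattern that exhausts memory, since toArray() materializes every document in the collection as one giant in-memory array:

    // anti-pattern: pulls all ~4.7 GB of documents into memory at once
    var everything = db.collection.find().toArray();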
The correct way to operate over result sets that are bigger than memory is to use cursors. Here's how you'd do it in the mongo console:
var outsidevars = {
    "z": 5
};

// compute the new field from two document properties plus the external reference
var manipulator = function (doc, outsidevars) {
    var newfield = doc.x + doc.y + outsidevars.z;
    doc.newField = newfield;
    return doc;
};

var cursor = db.collection.find();
while (cursor.hasNext()) {
    // load only one document from the result set into memory
    var thisdoc = cursor.next();
    var newdoc = manipulator(thisdoc, outsidevars);
    db.collection.update({ "_id": thisdoc._id }, newdoc);
}
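Since you are in Node.js, here is a minimal sketch of the same cursor-based loop using the official mongodb driver (v4+). The connection string, database name, and collection name are placeholders for illustration; adjust them to your setup:

    const { MongoClient } = require('mongodb');

    async function run() {
        // placeholder URI, database, and collection names
        const client = new MongoClient('mongodb://localhost:27017');
        await client.connect();
        const collection = client.db('mydb').collection('mycollection');

        const outsidevars = { z: 5 };

        // the cursor streams documents in batches, so memory stays
        // bounded no matter how large the collection is
        const cursor = collection.find();
        for await (const doc of cursor) {
            const newField = doc.x + doc.y + outsidevars.z;
            await collection.updateOne(
                { _id: doc._id },
                { $set: { newField: newField } }
            );
        }

        await client.close();
    }

    run().catch(console.error);

Note that this version uses updateOne with $set, which only writes the new field instead of replacing the whole document, which is a bit safer while the migration runs.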