Sort huge file with low RAM on node.js

We have a 500 GB file of integer rows. How can we sort it with only 512 MB of RAM using Node.js? I'm thinking of something like this:

  1. Split the main file into 256 MB chunks.
  2. Sort each chunk.
  3. Take the first row of each chunk, sort those, and push the smallest to the final file.
  4. Repeat step 3 for every row in the chunks.

Any ideas?
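The merge in steps 3-4 can be sketched as a k-way merge. In this sketch the sorted chunks are in-memory arrays for illustration; in a real external sort each would be a line reader streaming from a sorted temp file:

```javascript
// k-way merge of already-sorted chunks into one sorted output.
function mergeSortedChunks(chunks) {
  // cursors[i] is the index of the next unread element in chunks[i]
  const cursors = chunks.map(() => 0);
  const out = [];
  for (;;) {
    let min = -1;
    // Find the chunk whose current head element is smallest
    for (let i = 0; i < chunks.length; i++) {
      if (cursors[i] < chunks[i].length &&
          (min === -1 || chunks[i][cursors[i]] < chunks[min][cursors[min]])) {
        min = i;
      }
    }
    if (min === -1) break; // all chunks exhausted
    out.push(chunks[min][cursors[min]++]);
  }
  return out;
}
```

At any moment only one element per chunk needs to be in memory, which is what keeps the merge phase within the RAM budget.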

UPDATE: Thanks to user some-random-it-boy. This solution is based on a child process running the native sort utility. I think it should work:

var fs = require('fs'),
    spawn = require('child_process').spawn,
    // -n is needed so integer rows sort numerically ("9" before "10")
    sort = spawn('sort', ['-n', 'in.txt']);

var writer = fs.createWriteStream('out.txt');

sort.stdout.on('data', function (data) {
  writer.write(data);
});

sort.on('exit', function (code) {
  if (code) console.error('sort exited with code', code); // non-zero means an error
  writer.end();
});
Dan Mishin asked Sep 09 '25 14:09

1 Answer

I hate to give a non-JS solution to a JS question. But since you are in the Node environment anyway, why not delegate this task to a program that was designed for exactly that?

Using the built-in child_process module, call the sort (docs here) command with whatever parameters you need.
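For reference, this is roughly the invocation the child process would run. The -S and -T flags are GNU coreutils options (the exact values here are illustrative): -n sorts numerically, -S caps the in-memory buffer, and -T points the temp files at a disk with enough free space:

```shell
# Numeric sort with ~500 MB of RAM; temp chunk files go to /tmp
sort -n -S 500M -T /tmp -o out.txt in.txt
```

sort transparently does the same split/sort/merge the question describes whenever the input exceeds its buffer size.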

Quoting from this answer:

According to the algorithm used by sort, it will use memory according to what is available: half of the biggest number between TotalMem/8 and AvailableMem. So, for example, if you have 4 GB of available mem (out of 8 GB), sort will use 2GB of RAM. It should also create many 2 GB files in /bigdisk and finally merge-sort them.

Which is essentially what you suggested doing, already implemented in C and running without any interpreter in between. I doubt you can get faster than that within your constraints :)

Some random IT boy answered Sep 12 '25 04:09