
Node.js event loop not making sense to me

I'm new to Node.js. I've been working my way through "Node.js the Right Way" by Jim R. Wilson and I'm running into a contradiction in the book (and in Node.js itself?) that I haven't been able to reconcile to my satisfaction with any amount of googling.

It's stated repeatedly in the book, and in other resources I've looked at online, that Node.js runs a callback in response to some event line by line until it completes, and only then does the event loop proceed to wait for, or invoke, the next callback. And because Node.js is single-threaded (and, short of explicitly doing anything with the cluster module, also runs as a single process), my understanding is that there is only ever, at most, one chunk of JavaScript code executing at a time.

Am I understanding that correctly? Here's the contradiction (in my mind). How is Node.js so highly concurrent if this is the case?

Here is an example straight from the book that illustrates my confusion. It is intended to walk a directory of many thousands of XML files and extract the relevant bits of each into a JSON document.

First the parser:

'use strict';
const
  fs = require('fs'),
  cheerio = require('cheerio');

module.exports = function(filename, callback) {
  fs.readFile(filename, function(err, data){
    if (err) { return callback(err); }
    let
      $ = cheerio.load(data.toString()),
      collect = function(index, elem) {
        return $(elem).text();
      };

    callback(null, {
      _id: $('pgterms\\:ebook').attr('rdf:about').replace('ebooks/', ''), 
      title: $('dcterms\\:title').text(), 
      authors: $('pgterms\\:agent pgterms\\:name').map(collect), 
      subjects: $('[rdf\\:resource$="/LCSH"] ~ rdf\\:value').map(collect) 
    });
  });
};

And the bit that walks the directory structure:

'use strict';
const
  file = require('file'),
  rdfParser = require('./lib/rdf-parser.js');

console.log('beginning directory walk');

file.walk(__dirname + '/cache', function(err, dirPath, dirs, files){
  files.forEach(function(path){
    rdfParser(path, function(err, doc) {
      if (err) {
        throw err;
      } else {
        console.log(doc);
      }
    });
  });
});

If you run this code, you will get an error resulting from the fact that the program exhausts all available file descriptors. This would seem to indicate that the program has opened thousands of files concurrently.

My question is... how can this possibly be, unless the event model and/or concurrency model behave differently than how they have been explained?

I'm sure someone out there knows this and can shed light on it, but for the moment, color me very confused!

asked Oct 14 '25 by Kent Rancourt


2 Answers

Am I understanding that correctly?

Yes.

How is Node.js so highly concurrent if this is the case?

It's not the JavaScript execution itself that is concurrent; the I/O (and other heavy tasks) is. When you call an asynchronous function, it starts the task (for example, reading a file) and returns immediately to "run the next line of the script", as you put it. The task, however, continues in the background (concurrently); once it finishes, the callback assigned to it is put onto the event loop queue, which will then call it with the data that is now available.

For details on this "in the background" processing, and how node actually manages to run all these asynchronous tasks in parallel, have a look at the question Nodejs Event Loop.

answered Oct 16 '25 by Bergi


This is a pretty simple description, and skips a lot of things.

files.forEach is not asynchronous. The code therefore goes through the list of files in the directory, calling fs.readFile on each one, and then returns to the event loop.

The loop then has a load of file-open events to process, which in turn queue up file-read events. Then the loop can start going through them and calling the callbacks to fs.readFile with the data that's been read. These can only be called one at a time: as you say, there's only one thread executing JavaScript at any one time.

However, before any of these callbacks are called, you've already opened every file in that original list, leading to file handle exhaustion if there were too many.

answered Oct 16 '25 by OrangeDog


