As you can see in the sample code below, I'm using Puppeteer with a cluster of workers in Node to handle multiple requests for screenshots of websites, each given by URL:
```javascript
const cluster = require('cluster');
const express = require('express');
const bodyParser = require('body-parser');
const puppeteer = require('puppeteer');

async function getScreenshot(domain) {
    let screenshot;
    const browser = await puppeteer.launch({ args: ['--no-sandbox', '--disable-setuid-sandbox', '--disable-dev-shm-usage'] });
    const page = await browser.newPage();

    try {
        // First attempt: 60 s navigation timeout
        await page.goto('http://' + domain + '/', { timeout: 60000, waitUntil: 'networkidle2' });
        screenshot = await page.screenshot({ type: 'png', encoding: 'base64' });
    } catch (error) {
        try {
            // Retry once with a 120 s navigation timeout
            await page.goto('http://' + domain + '/', { timeout: 120000, waitUntil: 'networkidle2' });
            screenshot = await page.screenshot({ type: 'png', encoding: 'base64' });
        } catch (error) {
            console.error('Connecting to: ' + domain + ' failed due to: ' + error);
        }
    }

    await page.close();
    await browser.close();

    return screenshot;
}

if (cluster.isMaster) {
    // Fork one worker per CPU core
    const numOfWorkers = require('os').cpus().length;
    for (let worker = 0; worker < numOfWorkers; worker++) {
        cluster.fork();
    }

    // Replace every worker that exits
    cluster.on('exit', function (worker, code, signal) {
        console.debug('Worker ' + worker.process.pid + ' died with code: ' + code + ', and signal: ' + signal);
        cluster.fork();
    });

    // A worker reports that it finished a request; the master then kills it
    cluster.on('message', function (handler, msg) {
        console.debug('Worker: ' + handler.process.pid + ' has finished working on ' + msg.domain + '. Exiting...');
        if (cluster.workers[handler.id]) {
            cluster.workers[handler.id].kill('SIGTERM');
        }
    });
} else {
    const app = express();
    app.use(bodyParser.json());
    app.listen(80, function () {
        console.debug('Worker ' + process.pid + ' is listening to incoming messages');
    });

    app.post('/screenshot', (req, res) => {
        const domain = req.body.domain;

        getScreenshot(domain)
            .then((screenshot) => {
                try {
                    process.send({ domain: domain });
                } catch (error) {
                    console.error('Error while exiting worker ' + process.pid + ' due to: ' + error);
                }

                res.status(200).json({ screenshot: screenshot });
            })
            .catch((error) => {
                try {
                    process.send({ domain: domain });
                } catch (error) {
                    console.error('Error while exiting worker ' + process.pid + ' due to: ' + error);
                }

                // Error objects serialize to {} in JSON, so send the message text
                res.status(500).json({ error: String(error) });
            });
    });
}
```

Some explanation:
My problem is that some legitimate domains get errors that I can't explain:
Error: Protocol error (Page.navigate): Target closed.
Error: Protocol error (Runtime.callFunctionOn): Session closed. Most likely the page has been closed.

I read in a GitHub issue (which I can't find now) that this can happen when the page redirects and adds 'www' to the start of the domain, but I'm hoping that's not true... Is there something I'm missing?
When you launch a browser via puppeteer.launch, it starts a browser and connects to it. From then on, every function you execute on the opened browser (like page.goto) is sent to the browser via the Chrome DevTools Protocol. In this context, a target means a tab.
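Because every call is a round trip over that protocol, it can help to log when a tab or the whole browser goes away, so that a later "Target closed" error can be matched to its cause. A minimal sketch, relying on Puppeteer's `Page` emitting `'close'` and `Browser` emitting `'disconnected'`; `watchLifecycle` and its `label`/`log` parameters are hypothetical names used only for illustration:

```javascript
/**
 * Log when a tab (target) closes or the whole browser disconnects.
 * `browser` and `page` are assumed to be Puppeteer objects, or anything with
 * the same `.on(event, handler)` interface. `label` identifies the job
 * (for example the domain) in the log lines.
 */
function watchLifecycle(browser, page, label, log = console.debug) {
    // Emitted when the tab is closed, including by our own page.close().
    page.on('close', () => log(`[${label}] page closed`));
    // Emitted when the connection to the browser is lost:
    // browser.close(), a crash, or the browser process being killed.
    browser.on('disconnected', () => log(`[${label}] browser disconnected`));
}
```

In the question's getScreenshot, calling `watchLifecycle(browser, page, domain)` right after `browser.newPage()` would show whether the tab was closed or the browser disconnected before the failing call.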
The Target closed exception is thrown when you try to run a function but the target (tab) has already been closed.
The error message was recently changed to give more meaningful information. It now gives the following message:
Error: Protocol error (Target.activateTarget): Session closed. Most likely the page has been closed.
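Since the wording differs between versions, code that wants to treat this situation specially (for example, to retry) could match both forms. A minimal sketch; `isTargetClosedError` is a hypothetical helper, and matching on the message text is an assumption rather than a stable Puppeteer API:

```javascript
/**
 * Return true if `error` looks like one of Puppeteer's "the tab/session is
 * gone" protocol errors. The message wording is version-dependent, so both
 * the older "Target closed" and the newer "Session closed" forms are matched.
 */
function isTargetClosedError(error) {
    // Accept Error objects as well as plain strings or other thrown values.
    const message = error && typeof error.message === 'string' ? error.message : String(error);
    return /Target closed|Session closed/i.test(message);
}
```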
There are multiple reasons why this could happen.
You used a resource that was already closed
Most likely, you are seeing this message because you closed the tab/browser and are still trying to use the resource. To give a simple example:
```javascript
const browser = await puppeteer.launch();
const page = await browser.newPage();

await browser.close();
await page.goto('http://www.google.com');
```

In this case the browser is closed first, and the page.goto call that follows produces the error message. Most of the time it will not be that obvious. Maybe an error handler already closed the page during a cleanup task while your script is still crawling.
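One way to rule out the cleanup case is to close the browser in a single `finally` block that can only run after all page work has settled. A minimal sketch; `withPage` is a hypothetical helper, and `launch` is assumed to be something like `() => puppeteer.launch(options)`, passed in so the helper does not hard-code launch options:

```javascript
/**
 * Launch a browser, open one page, run `work(page)`, and close the browser
 * only after `work` has resolved or rejected.
 */
async function withPage(launch, work) {
    const browser = await launch();
    try {
        const page = await browser.newPage();
        // Awaiting here is the important part: the finally block cannot run
        // until every page call made inside `work` has completed or thrown.
        return await work(page);
    } finally {
        // Closing the browser also closes all of its pages.
        await browser.close();
    }
}
```

With this shape, getScreenshot would pass its page.goto and page.screenshot calls as `work` and would no longer call page.close() or browser.close() on its own.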
The browser crashed or was unable to initialize
I also experience this every few hundred requests. There is an issue about this on the puppeteer repository as well. It seems to happen when you are using a lot of memory or CPU. Maybe you are spawning a lot of browsers? In these cases the browser might crash or disconnect.
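If too many browsers running at once are the cause, one mitigation is to cap how many run at the same time in each process. A minimal sketch of a hand-rolled semaphore; `createLimiter` is a hypothetical name, and the limit you pass is an assumption to tune for your machine's memory and CPU:

```javascript
/**
 * Allow at most `max` async jobs (e.g. browser sessions) to run at once.
 * Further jobs wait in FIFO order until a running job finishes.
 */
function createLimiter(max) {
    let active = 0;
    const waiting = []; // resolvers of jobs waiting for a free slot

    async function acquire() {
        if (active < max) {
            active++;
            return;
        }
        // Wait until release() hands this job a slot; `active` is unchanged.
        await new Promise((resolve) => waiting.push(resolve));
    }

    function release() {
        const next = waiting.shift();
        if (next) {
            next(); // pass the slot directly to the next waiting job
        } else {
            active--;
        }
    }

    // Run `job` once a slot is free; the slot is released even if it throws.
    return async function run(job) {
        await acquire();
        try {
            return await job();
        } finally {
            release();
        }
    };
}
```

For example, creating `const limit = createLimiter(2);` once per worker and calling `limit(() => getScreenshot(domain))` in the request handler would keep at most two browsers per worker, with later requests queued.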
I found no "silver bullet" solution to this problem. But you might want to check out the library puppeteer-cluster (disclaimer: I'm the author), which handles these kinds of errors and lets you retry the URL when the error happens. It can also manage a pool of browser instances, which would simplify your code.
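The retry idea can also be hand-rolled without a library. A minimal sketch, not puppeteer-cluster's API: `retryOnTargetClosed` is a hypothetical helper that retries only when the error message mentions a closed target or session (an assumption about the message wording), and `attempt` is assumed to launch a fresh browser on every call, since a crashed browser cannot be reused:

```javascript
/**
 * Call `attempt()` up to `retries + 1` times. Only errors that look like a
 * closed target/session are retried; any other error is rethrown at once.
 */
async function retryOnTargetClosed(attempt, { retries = 2, delayMs = 1000 } = {}) {
    for (let i = 0; ; i++) {
        try {
            return await attempt();
        } catch (error) {
            const message = error && typeof error.message === 'string' ? error.message : String(error);
            const retriable = /Target closed|Session closed/i.test(message);
            if (!retriable || i >= retries) {
                throw error;
            }
            console.warn(`Attempt ${i + 1} failed (${message}); retrying in ${delayMs} ms`);
            // Short pause so a crashed browser process has time to exit.
            await new Promise((resolve) => setTimeout(resolve, delayMs));
        }
    }
}
```

Other errors (for example a domain that does not resolve) are rethrown immediately rather than retried with a new browser.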
I was just experiencing the same issue every time I tried running my puppeteer script*. The above did not resolve this issue for me.
I got it to work by removing and reinstalling the puppeteer package:
```shell
npm remove puppeteer
npm i puppeteer
```

*I only experienced this issue when setting the headless option to `false`.