I've got a simple node.js script to capture screenshots of a few web pages. It appears I'm getting tripped up somewhere along the line with my use of async/await, but I can't figure out where. I'm currently using puppeteer v1.11.0.
    const puppeteer = require('puppeteer');

    //a list of sites to screenshot
    const papers = {
        nytimes: "https://www.nytimes.com/",
        wapo: "https://www.washingtonpost.com/"
    };

    //launch puppeteer, do everything in .then() handler
    puppeteer.launch({devtools:false}).then(function(browser){

        //create a load_page function that returns a promise which resolves when screenshot is taken
        async function load_page(paper){
            const url = papers[paper];
            return new Promise(async function(resolve, reject){
                const page = await browser.newPage();
                await page.setViewport({width:1024, height: 768});

                //screenshot on first console message
                page.once("console", async console_msg => {
                    await page.pdf({
                        path: paper + '.pdf',
                        printBackground:true,
                        width:'1024px',
                        height:'768px',
                        margin: {top:"0px", right:"0px", bottom:"0px", left:"0px"}
                    });
                    //close page
                    await page.close();
                    //resolve promise
                    resolve();
                });

                //go to page
                await page.goto(url, {"waitUntil":["load", "networkidle0"]});
            })
        }

        //step through the list of papers, calling the above load_page()
        async function stepThru(){
            for(var p in papers){
                if(papers.hasOwnProperty(p)){
                    //wait to load page and screenshot before loading next page
                    await load_page(p);
                }
            }
            //close browser after loop has finished (and all promises resolved)
            await browser.close();
        }

        //kick it off
        stepThru();

        //getting this error message:
        //UnhandledPromiseRejectionWarning: Error: Navigation failed because browser has disconnected!
    });
The "Navigation failed because browser has disconnected" error usually means that the Node script that launched Puppeteer ended without waiting for the Puppeteer actions to complete. So, as you suspected, it is a problem with how the promises are awaited.
About your script, I made some changes to make it work:
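First, you were not awaiting the end of the asynchronous stepThru function, so change

    stepThru();

to

    await stepThru();

and, because await is only valid inside an async function, change

    puppeteer.launch({devtools:false}).then(function(browser){

to

    puppeteer.launch({devtools:false}).then(async function(browser){

(I added async).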
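To see what awaiting stepThru changes, here is a minimal sketch that replaces Puppeteer with timers. The `delay` and `run` helpers are hypothetical stand-ins, not part of the Puppeteer API; the point is only the order in which the handler and the work finish.

```javascript
// Stand-in for slow browser work (hypothetical helper, not a Puppeteer API).
function delay(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

// Runs a fake "launch().then(handler)" chain and records the order of events.
async function run(awaitWork) {
  const events = [];

  async function stepThru() {
    await delay(20);              // pretend to load pages and write PDFs
    events.push('work finished');
  }

  await Promise.resolve().then(async function () {
    if (awaitWork) {
      await stepThru();           // the handler's promise settles after the work
    } else {
      stepThru();                 // fire and forget: the handler returns at once
    }
  });
  events.push('handler done');

  await delay(40);                // give the un-awaited work time to finish
  return events;
}

run(false).then(order => console.log('without await:', order));
run(true).then(order => console.log('with await:', order));
```

Without the await, the handler is reported as done before the work has finished. With the await, anything chained after the handler (including a .catch) sees the real outcome of stepThru.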
I also simplified how the goto and page.once promises are managed. The PDF promise is now:

    new Promise(async function(resolve, reject){
        //screenshot on first console message
        page.once("console", async () => {
            await page.pdf({
                path: paper + '.pdf',
                printBackground:true,
                width:'1024px',
                height:'768px',
                margin: {
                    top:"0px",
                    right:"0px",
                    bottom:"0px",
                    left:"0px"
                }
            });
            resolve();
        });
    })

It has a single responsibility: creating the PDF.
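The same pattern, wrapping a one-shot event in a promise, can be tried without a browser. The sketch below uses Node's EventEmitter as a stand-in for a Puppeteer page; `onceHandled` and `FakePage` are hypothetical names invented for the illustration. Unlike the snippet above, it also forwards handler errors through reject, on the assumption that you would rather see a failed PDF as a rejection than have the promise stay pending.

```javascript
const { EventEmitter } = require('events');

// Wraps the first occurrence of `eventName` in a promise.
// `handler` may be async; the promise settles only after it finishes.
function onceHandled(emitter, eventName, handler) {
  return new Promise((resolve, reject) => {
    emitter.once(eventName, async (...args) => {
      try {
        resolve(await handler(...args));
      } catch (err) {
        reject(err);              // surface handler failures to the caller
      }
    });
  });
}

// Stand-in for a Puppeteer page: navigation emits "console" midway through.
class FakePage extends EventEmitter {
  async goto(url) {
    await new Promise(r => setTimeout(r, 10));
    this.emit('console', 'hello from ' + url);
    await new Promise(r => setTimeout(r, 10));
    return 'loaded ' + url;
  }
}

async function demo() {
  const page = new FakePage();
  // Register the listener first so the event cannot be missed.
  const handled = onceHandled(page, 'console', async msg => 'saw: ' + msg);
  const navResult = await page.goto('https://example.com/');
  return { navResult, handled: await handled };
}

demo().then(result => console.log(result));
```

In a real script the handler would be the page.pdf call, and the returned promise would be awaited together with the navigation.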
I then combined the page.goto and PDF promises with a Promise.all:

    await Promise.all([
        page.goto(url, {"waitUntil":["load", "networkidle2"]}),
        new Promise(async function(resolve, reject){
            // ... pdf creation as above
        })
    ]);

and moved page.close after the Promise.all:

    await Promise.all([
        // page.goto
        // PDF creation
    ]);
    await page.close();
    resolve();

And now it works. Here is the full working script:
    const puppeteer = require('puppeteer');

    //a list of sites to screenshot
    const papers = {
        nytimes: "https://www.nytimes.com/",
        wapo: "https://www.washingtonpost.com/"
    };

    //launch puppeteer, do everything in .then() handler
    puppeteer.launch({devtools:false}).then(async function(browser){

        //create a load_page function that returns a promise which resolves when screenshot is taken
        async function load_page(paper){
            const url = papers[paper];
            return new Promise(async function(resolve, reject){
                const page = await browser.newPage();
                await page.setViewport({width:1024, height: 768});

                await Promise.all([
                    page.goto(url, {"waitUntil":["load", "networkidle2"]}),
                    new Promise(async function(resolve, reject){
                        //screenshot on first console message
                        page.once("console", async () => {
                            await page.pdf({
                                path: paper + '.pdf',
                                printBackground:true,
                                width:'1024px',
                                height:'768px',
                                margin: {top:"0px", right:"0px", bottom:"0px", left:"0px"}
                            });
                            resolve();
                        });
                    })
                ]);

                await page.close();
                resolve();
            })
        }

        //step through the list of papers, calling the above load_page()
        async function stepThru(){
            for(var p in papers){
                if(papers.hasOwnProperty(p)){
                    //wait to load page and screenshot before loading next page
                    await load_page(p);
                }
            }
            await browser.close();
        }

        await stepThru();
    });

Please note that:
I changed networkidle0 to networkidle2 because the nytimes.com website takes a very long time to reach a state with 0 network requests (because of the ads etc.). You can of course wait for networkidle0, but that is up to you and out of the scope of your question (increase the page.goto timeout in that case).
The www.washingtonpost.com site fails with a TOO_MANY_REDIRECTS error, so I changed it to washingtonpost.com, but I think you should investigate that further. To test the script, I ran it several times against the nytimes site and other websites. Again, this is out of the scope of your question.
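If you do want to keep networkidle0, a rough sketch of raising the navigation timeout could look like the following. page.goto accepts a timeout option in milliseconds (the default is 30 seconds); the `gotoNetworkIdle0` helper name and the 120000 ms value are assumptions chosen for illustration, and the stub page only exists so the snippet runs without launching a browser.

```javascript
// Navigates with the stricter "networkidle0" condition and a longer timeout.
// `page` is expected to be a Puppeteer Page; `timeoutMs` is an assumed value.
async function gotoNetworkIdle0(page, url, timeoutMs = 120000) {
  return page.goto(url, {
    waitUntil: ['load', 'networkidle0'],
    timeout: timeoutMs,           // milliseconds; Puppeteer's default is 30000
  });
}

// Stub page (hypothetical) that echoes its arguments instead of navigating.
const stubPage = { goto: async (url, options) => ({ url, options }) };

gotoNetworkIdle0(stubPage, 'https://www.nytimes.com/')
  .then(result => console.log(result.options));
```

With a real page object you would pass the page created by browser.newPage() instead of the stub.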
Let me know if you need some more help 😉