Hello. I am doing some web automation and I want to run Puppeteer multithreaded, meaning I want to open the same page tens of times in parallel. From what I have read, worker threads seem to be the best solution, but I did not understand how to use them properly. Here is a sample of what I did:
const { Worker, isMainThread } = require('worker_threads');
const puppeteer = require('puppeteer');

const scrapt = async () => {
  /* -------------------------------------------------------------------------- */
  /* Launching puppeteer */
  /* -------------------------------------------------------------------------- */
  try {
    const browser = await puppeteer.launch({ headless: true });
    const page = await browser.newPage();
    await page.setUserAgent(
      `Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36`
    );
    // Measure how long the navigation takes, in milliseconds.
    const Browser_b = new Date();
    await page.goto('https://www.supremenewyork.com/');
    const browser_e = new Date();
    console.log(browser_e - Browser_b);
  } catch (e) {
    console.log(e);
  }
};

const ex = [1, 2, 3, 4];
if (isMainThread) {
  // This re-loads the current file inside a Worker instance.
  // Only one Worker is created, so there is only one extra thread.
  new Worker(__filename);
} else {
  // All four scrapt() calls are started inside that single worker.
  for (const val of ex) {
    scrapt();
  }
}
This script opens 4 browsers, but if I open more, the PC lags a lot. I think it is only using one thread instead of all of them? Thank you in advance, and sorry if this is a basic question.
Have you tried using cluster? It is a good way to do multiprocessing and, in my opinion, easier to use than worker_threads. Here is an example, taken from the Node.js documentation:
const cluster = require('cluster');
const http = require('http');
const numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  console.log(`Master ${process.pid} is running`);

  // Fork workers.
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  cluster.on('exit', (worker, code, signal) => {
    console.log(`worker ${worker.process.pid} died`);
  });
} else {
  // Workers can share any TCP connection
  // In this case it is an HTTP server
  http.createServer((req, res) => {
    res.writeHead(200);
    res.end('hello world\n');
  }).listen(8000);

  console.log(`Worker ${process.pid} started`);
}
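The example above serves HTTP, so here is a rough sketch of how the same fork pattern might be adapted to a list of scraping jobs. The names `splitJobs`, `runInCluster`, the `JOBS` environment variable and the `scrape` callback are all hypothetical and not part of the original answer. Each forked process would still launch its own browser, so this spreads the Node.js work across cores but does not reduce the memory each browser needs.

```javascript
const cluster = require('cluster');
const os = require('os');

// Hypothetical helper: deal `jobs` out round-robin into `n` buckets.
function splitJobs(jobs, n) {
  const buckets = Array.from({ length: n }, () => []);
  jobs.forEach((job, i) => buckets[i % n].push(job));
  return buckets;
}

// Hypothetical entry point. `scrape` is assumed to be an async function
// that handles one URL (for example, the Puppeteer code from the question).
function runInCluster(urls, scrape, workerCount = os.cpus().length) {
  // `isMaster` matches the example above; newer Node.js versions also call it `isPrimary`.
  if (cluster.isMaster) {
    for (const bucket of splitJobs(urls, workerCount)) {
      if (bucket.length === 0) continue;
      // Pass each worker its share of the URLs through an environment variable.
      cluster.fork({ JOBS: JSON.stringify(bucket) });
    }
    cluster.on('exit', (worker) => {
      console.log(`worker ${worker.process.pid} finished`);
    });
  } else {
    const jobs = JSON.parse(process.env.JOBS || '[]');
    (async () => {
      // Run this worker's jobs one after another, then exit.
      for (const url of jobs) {
        try {
          await scrape(url);
        } catch (e) {
          console.log(e);
        }
      }
      process.exit(0);
    })();
  }
}
```

Calling `runInCluster(urls, scrapt)` at the bottom of the script would fork one process per CPU core by default; passing a smaller `workerCount` limits how many browsers are open at once.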
The popular npm package "puppeteer-cluster" can handle this:
https://www.npmjs.com/package/puppeteer-cluster
As Taha Daboussi suggests, a cluster is better than a regular worker: the cluster workers stay online, so less time and fewer resources are spent starting and stopping Puppeteer.
You work within the "page" object and do your operations there; everything outside of it is already managed for you.
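A minimal sketch of what that could look like, assuming `puppeteer` and `puppeteer-cluster` are installed from npm. The helper `repeatUrl`, the placeholder URL and the `maxConcurrency` value of 4 are illustrative assumptions, not recommendations from the package.

```javascript
// Hypothetical helper: build the job list, here the same URL repeated `times` times.
function repeatUrl(url, times) {
  return Array.from({ length: times }, () => url);
}

async function main() {
  // Loaded lazily so the helper above can be used without the package installed.
  const { Cluster } = require('puppeteer-cluster');

  const cluster = await Cluster.launch({
    // One shared browser, with a separate incognito context per worker.
    concurrency: Cluster.CONCURRENCY_CONTEXT,
    // At most 4 pages are open at the same time (assumed value, tune per machine).
    maxConcurrency: 4,
  });

  // The task runs once per queued item; the cluster creates and cleans up `page`.
  await cluster.task(async ({ page, data: url }) => {
    const start = Date.now();
    await page.goto(url);
    console.log(`${url} loaded in ${Date.now() - start} ms`);
  });

  // Placeholder URL; replace it with the page you want to open.
  for (const url of repeatUrl('https://example.com/', 10)) {
    cluster.queue(url);
  }

  await cluster.idle();  // wait until every queued job has finished
  await cluster.close(); // shut the browser down
}
```

Run it with `main().catch(console.error);`. Because the queue is drained by a fixed number of workers, ten jobs never mean ten browsers opening at once, which is likely what made the machine lag in the question.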