 

Web Crawler with JavaScript support in Perl?

I want to write a Perl application that crawls some websites and collects images and links from their pages. Because most of the pages use JavaScript to generate their HTML content, I essentially need a client browser with JavaScript support, so that I can parse the final HTML as generated and/or modified by JavaScript. What are my options?

If possible, please publish some implementation code or link to some example(s).

Ωmega asked Dec 06 '25 08:12

2 Answers

There are several options.

  • Win32::IE::Mechanize on Windows
  • Mozilla::Mechanize
  • WWW::Mechanize::Firefox
  • WWW::Selenium
  • Wight
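As a concrete illustration of the third option, here is a minimal sketch using WWW::Mechanize::Firefox, which drives a real Firefox instance (via the MozRepl extension) so the HTML you read back is the DOM after JavaScript has run. This assumes Firefox with MozRepl is running locally; the URL is a placeholder, and the regex for image URLs is a deliberately naive illustration rather than a robust extractor.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumes Firefox is running locally with the MozRepl extension enabled.
use WWW::Mechanize::Firefox;

my $mech = WWW::Mechanize::Firefox->new();
$mech->get('http://example.com/');   # placeholder URL

# ->content returns the page as Firefox sees it *after* JavaScript
# has run, unlike a plain HTTP fetch of the raw page source.
my $html = $mech->content;

# Collect links through the WWW::Mechanize-compatible API.
for my $link ( $mech->find_all_links() ) {
    print $link->url_abs(), "\n";
}

# Pull image URLs out of the rendered DOM; a simple (naive) regex
# over the final HTML is used here just for brevity.
while ( $html =~ /<img[^>]+src=["']([^"']+)["']/gi ) {
    print "image: $1\n";
}
```

The other modules in the list follow the same pattern: the Perl side is a thin remote control, and the browser (IE, Firefox, or a Selenium-driven browser) does the actual JavaScript execution and rendering.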
Quentin answered Dec 08 '25 20:12


Options that spring to mind:

  • You could have Perl use Selenium and have a full-blown browser do the work for you.

  • You can download and compile V8 or another open source JavaScript engine and have Perl call an external program to evaluate the JavaScript.

  • I don't think Perl's LWP module supports JavaScript, but you might want to check that if you haven't done so already.
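The second option above can be sketched as follows: pipe a snippet of JavaScript to an external interpreter and read back what it prints. This example assumes the `node` binary (Node.js, built on V8) is on your PATH; any standalone JS engine with a command-line evaluator would work the same way.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch of the "external engine" approach: evaluate JavaScript by
# shelling out to `node` (assumed to be on PATH) and capturing stdout.
my $js = q{
    var x = 2 + 3;
    console.log(x);
};

# List form of open avoids shell quoting issues with the JS source.
open my $fh, '-|', 'node', '-e', $js
    or die "Cannot run node: $!";
my $result = <$fh>;
close $fh;

chomp $result;
print "JavaScript evaluated to: $result\n";   # prints: JavaScript evaluated to: 5
```

Note the limitation: a bare JS engine has no DOM, so `document`, `window`, and friends are undefined. This works for self-contained scripts scraped from a page, but for pages whose JavaScript manipulates the DOM you are back to needing a real (or headless) browser, as in the Selenium option.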

Trott answered Dec 08 '25 21:12


