
Headless Browser and scraping - solutions [closed]

People also ask

What is headless browser scraping?

A headless browser is a web browser with no graphical user interface. Instead of responding to clicks and keystrokes, it follows instructions that developers write in a programming language. Headless browsers are mostly used to run automated quality assurance tests or to scrape websites.

What is happening when the browser is running in headless mode?

In headless mode the browser runs without a graphical user interface. It still provides automated control of a web page in an environment similar to popular web browsers, but it is driven through a command-line interface or over network communication.
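To make the "command-line interface" part concrete, here is a minimal sketch that asks a locally installed Chrome or Chromium to render a page without a window and print the resulting DOM. The binary names in `CANDIDATE_BINARIES` and the helper names `build_dump_dom_command` and `dump_dom` are assumptions made for this illustration; `--headless`, `--disable-gpu` and `--dump-dom` are Chrome command-line switches.

```python
import shutil
import subprocess

# Assumed binary names; adjust them for your system.
CANDIDATE_BINARIES = ("chromium", "chromium-browser", "google-chrome")


def build_dump_dom_command(binary, url):
    """Return the argv list that makes Chrome/Chromium render `url`
    without a window and print the resulting DOM to stdout."""
    return [binary, "--headless", "--disable-gpu", "--dump-dom", url]


def dump_dom(url, timeout=30):
    """Run the first candidate binary found on PATH and return the
    rendered HTML, or None if no candidate binary is installed."""
    for name in CANDIDATE_BINARIES:
        path = shutil.which(name)
        if path:
            result = subprocess.run(
                build_dump_dom_command(path, url),
                capture_output=True,
                text=True,
                timeout=timeout,
                check=True,
            )
            return result.stdout
    return None
```

Calling `dump_dom("https://example.com")` should return the page's HTML after scripts have run, provided one of the assumed binaries is on your PATH. That rendered HTML is what you would then hand to a parser.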

Is Selenium a headless web browser?

Selenium itself is not a browser; it is a framework for automating browsers. It supports headless browser testing through HtmlUnitDriver, which is based on the Java framework HtmlUnit and is one of the lightest and fastest headless browsers.


If Ruby is your thing, you may also try:

  • https://github.com/chriskite/anemone (development stopped)
  • https://github.com/sparklemotion/mechanize
  • https://github.com/postmodern/spidr
  • https://github.com/stewartmckee/cobweb
  • http://watirwebdriver.com/ (Selenium)

Also, the Nokogiri gem can be used for scraping:

  • http://nokogiri.org/

There is also a dedicated book from Packt Publishing on how to use Nokogiri for scraping.
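For pages that do not need JavaScript, parser-based scraping (the approach a library like Nokogiri takes) needs no browser at all. The sketch below shows the idea with Python's standard-library `html.parser` rather than Nokogiri itself. The `LinkExtractor` class, the `extract_links` helper and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the absolute URL of every <a href="..."> in a document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Parse `html` and return all link targets as absolute URLs."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample input, standing in for a downloaded page.
SAMPLE_HTML = """
<ul>
  <li><a href="/questions/1">First</a></li>
  <li><a href="https://example.org/about">About</a></li>
  <li><a name="no-href">Anchor without a link</a></li>
</ul>
"""
```

`extract_links(SAMPLE_HTML, "https://example.com/")` returns the two links as absolute URLs and skips the anchor that has no `href`. A headless browser only becomes necessary when the links are generated by JavaScript after the page loads.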


http://triflejs.org/ is like PhantomJS, but based on IE.


Dalek.js is a kind of JS-based Selenium. It is aimed at automated frontend tests, but it can also take screenshots. It has WebDrivers for all the major browsers. Unfortunately, those drivers seem to need improvement (not to say "buggy", at least the Firefox one).