Selenium Headless vs Non-Headless

Question

I am currently working on a web scraping project using Selenium in Python.

My code works as intended when the web driver runs in non-headless mode, but not in headless mode. For instance, if I try to extract text from a website, non-headless mode returns the text, while headless mode returns None. (I have included some code below for reference.)

First, I constructed the webdriver with the following code (toggling the headless setting between True and False switches between headless and non-headless mode):

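from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

def getHeadlessDriver(headless=False):
    opts = webdriver.ChromeOptions()
    if headless:
        opts.add_argument("--headless=new")  # replaces opts.headless, removed in newer Selenium 4
    service = Service(ChromeDriverManager().install())
    return webdriver.Chrome(service=service, options=opts)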

Then, I used Selenium's XPath lookup (find_elements_by_xpath, or find_elements(By.XPATH, ...) in Selenium 4) to extract text data from a website. Sample code is provided below:

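from selenium.webdriver.common.by import By

driver = getHeadlessDriver()
# (the driver.get(...) call that opens the target page is omitted here)
feedbacks = driver.find_elements(
    By.XPATH,
    "//div[contains(@class, 'LiveFeedbackSectionViewController__LiveFeedbackStatusItem-sc-1ahetk9-4 cUJPkM')]")
for feedback in feedbacks:
    print(feedback.text)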
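One way to narrow this down, offered as an assumption rather than a confirmed diagnosis: Selenium's WebElement.text returns only the text the browser treats as rendered, whereas get_attribute("textContent") reads the DOM whether or not the element is displayed. The hypothetical helper element_texts below compares the two; an empty visible value next to non-empty DOM text suggests the element was found but is not displayed in the headless window.

```python
from typing import Iterable, List, Tuple


def element_texts(elements: Iterable) -> List[Tuple[str, str]]:
    """Return (visible_text, dom_text) pairs for Selenium-style elements.

    Hypothetical diagnostic helper: works with anything that has a .text
    attribute and a get_attribute(name) method, such as a WebElement.
    """
    pairs = []
    for element in elements:
        # .text: only the text the browser considers rendered/visible
        visible_text = element.text or ""
        # textContent: raw DOM text, independent of visibility
        dom_text = element.get_attribute("textContent") or ""
        pairs.append((visible_text, dom_text.strip()))
    return pairs
```

For example: for visible, dom in element_texts(feedbacks): print(repr(visible), repr(dom)).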

I did some googling to find an explanation for why headless mode does not work, but I am still not sure. From my understanding, headless mode "acts the same", just without a graphical user interface.

Could there be a problem with my code? Or does headless mode differ in other ways besides not having a graphical user interface?

Thank you.

Zhor · Accepted Answer

If the website you are trying to scrape has dynamic elements rendered by JavaScript, you can run the browser in normal (non-headless) mode on a virtual display with Xvfb instead of using headless mode.

sudo apt-get install -y xvfb

"Xvfb or X virtual framebuffer is a display server implementing the X11 display server protocol. In contrast to other display servers, Xvfb performs all graphical operations in virtual memory without showing any screen output."

In Python, you can use either of two wrappers for Xvfb.

1- xvfbwrapper

pip install xvfbwrapper

Then add to your Python file:

from xvfbwrapper import Xvfb

display = Xvfb()
display.start()

2- pyvirtualdisplay

pip install PyVirtualDisplay

And then in your code:

from pyvirtualdisplay import Display

display = Display(visible=0, size=(1024, 768))
display.start()
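A fuller sketch of the same idea, assuming Xvfb, PyVirtualDisplay, Chrome and Selenium 4.6+ (which can locate chromedriver by itself) are installed: the virtual display is used as a context manager so it is stopped even if scraping fails, and Chrome runs in normal, non-headless mode inside it. The function name scrape_on_virtual_display and its url/xpath parameters are hypothetical.

```python
def scrape_on_virtual_display(url: str, xpath: str) -> list:
    """Open url in non-headless Chrome on an Xvfb display; return element texts."""
    # Imports live inside the function so this module can still be imported
    # on machines that do not have PyVirtualDisplay or Selenium installed.
    from pyvirtualdisplay import Display
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    # Entering the context starts Xvfb; leaving it stops the display again.
    with Display(visible=False, size=(1024, 768)):
        # No --headless flag: Chrome draws into the virtual display instead.
        driver = webdriver.Chrome(options=webdriver.ChromeOptions())
        try:
            driver.get(url)
            return [el.text for el in driver.find_elements(By.XPATH, xpath)]
        finally:
            driver.quit()
```

xvfbwrapper's Xvfb can be used the same way, as with Xvfb(): followed by the driver code.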

Jules · Answer

I can usually work around this problem with time.sleep(10); however, there is one particular website that I can't scrape with either time.sleep(10) or driver.implicitly_wait(10).
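The difference between a fixed time.sleep(10) and a wait is that a wait polls and returns as soon as the elements appear. The helper below is a minimal, hand-rolled sketch of that polling idea (Selenium's own implicitly_wait and WebDriverWait provide the same behaviour); wait_for_elements and its parameters are hypothetical names.

```python
import time
from typing import Callable, List


def wait_for_elements(find: Callable[[], List], timeout: float = 10.0,
                      poll_interval: float = 0.5) -> List:
    """Call find() until it returns a non-empty list or timeout seconds pass.

    Returns the found elements, or an empty list if the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        found = find()
        if found:
            return found  # stop waiting as soon as something shows up
        if time.monotonic() >= deadline:
            return []
        time.sleep(poll_interval)
```

Usage would look like wait_for_elements(lambda: driver.find_elements(By.XPATH, xpath), timeout=10). If driver.implicitly_wait is also set, each find_elements call can itself block for up to that timeout, so it is simpler to rely on one waiting mechanism.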

I suspect the website checks the browser's user agent.

To work around this, I set a regular user agent on the headless browser, and it worked.

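from selenium import webdriver

browser_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36 Edg/95.0.1020.30'
options_edge = webdriver.EdgeOptions()
options_edge.add_argument(f'user-agent={browser_agent}')  # override the default headless user agent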

You can look up your own browser's user agent on sites such as https://whatmyuseragent.com/ (not affiliated).
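Instead of copying a user agent from a website, you can also read the one the browser actually sends with driver.execute_script("return navigator.userAgent"). Headless Chrome and Edge typically include "HeadlessChrome" in that string, which is one thing a site can check. The two helpers below are a hypothetical sketch: one swaps that token for "Chrome", the other builds the command-line switches to pass to add_argument.

```python
from typing import List


def non_headless_user_agent(user_agent: str) -> str:
    """Replace the 'HeadlessChrome' token that headless Chromium browsers
    typically report with the regular 'Chrome' token."""
    return user_agent.replace("HeadlessChrome", "Chrome")


def headless_arguments(user_agent: str) -> List[str]:
    """Switches for a headless Chromium browser (Chrome or Edge) that reports
    the given user agent instead of its default one.

    Assumes a recent browser; older versions use plain --headless.
    """
    return ["--headless=new", f"--user-agent={user_agent}"]
```

For example, call options_edge.add_argument(arg) for each arg in headless_arguments(browser_agent) before creating the driver.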

Tags: python, selenium, selenium-webdriver, selenium-chromedriver, headless-browser
