
Selenium webdriver returns empty list from find_elements_by_X

My goal is to get a list of the names of all the new items that have been posted on https://www.prusaprinters.org/prints during the full 24 hours of a given day.

Through a bit of reading I've learned that I should be using Selenium because the site I'm scraping is dynamic (loads more objects as the user scrolls).

Trouble is, I can't seem to get anything but an empty list from webdriver.find_elements_by_ with any of the suffixes listed at https://selenium-python.readthedocs.io/locating-elements.html.

On the site, I see "class = name" and "class = clamp-two-lines" when I inspect the element I want to get the title of (see screenshot), but I can't seem to return a list of all the elements on the page with that name class or the clamp-two-lines class.

[Screenshot: prusaprinters inspect element]

Here's the code I have so far (the lines commented out are failed attempts):

from timeit import default_timer as timer
start_time = timer()
print("Script Started")

import bs4, selenium, smtplib, time
from bs4 import BeautifulSoup 
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver = webdriver.Chrome(r'D:\PortableApps\Python Peripherals\chromedriver.exe')

url = 'https://www.prusaprinters.org/prints'
driver.get(url)
# foo = driver.find_elements_by_name('name')
# foo = driver.find_elements_by_xpath('name')
# foo = driver.find_elements_by_class_name('name')
# foo = driver.find_elements_by_tag_name('name')
# foo = [i.get_attribute('href') for i in driver.find_elements_by_css_selector('[id*=name]')]
# foo = [i.get_attribute('href') for i in driver.find_elements_by_css_selector('[class*=name]')]
# foo = [i.get_attribute('href') for i in driver.find_elements_by_css_selector('[id*=clamp-two-lines]')]
# foo = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, '//*[@id="printListOuter"]//ul[@class="clamp-two-lines"]/li')))
print(foo)
driver.quit()

print("Time to run: " + str(round(timer() - start_time,4)) + "s")

My research:

  1. Selenium only returns an empty list
  2. Selenium find_elements_by_css_selector returns an empty list
  3. Web Scraping Python (BeautifulSoup,Requests)
  4. Get HTML Source of WebElement in Selenium WebDriver using Python
  5. How to get Inspect Element code in Selenium WebDriver
  6. Web Scraping Python (BeautifulSoup,Requests)
  7. https://chrisalbon.com/python/web_scraping/monitor_a_website/
  8. https://www.codementor.io/@gergelykovcs/how-and-why-i-built-a-simple-web-scrapig-script-to-notify-us-about-our-favourite-food-fcrhuhn45
  9. https://www.tutorialspoint.com/python_web_scraping/python_web_scraping_dynamic_websites.htm
TempleGuard527 asked Sep 18 '25 19:09



2 Answers

To get the text, wait until the elements are visible. The CSS selector for the titles is #printListOuter h3:

titles = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, '#printListOuter h3')))

for title in titles:
    print(title.text)
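
For context: find_elements returns immediately with whatever matches at that moment, and gives back an empty list (no exception) if nothing does. On a page that presumably fills in its list with JavaScript after the initial load, that is a likely reason the attempts in the question came back empty. WebDriverWait(...).until(...) instead calls the condition repeatedly (every 0.5 s by default) until it returns something truthy or the timeout expires. Below is a simplified stand-in for that loop, not Selenium's actual implementation; poll_until and its parameters are names made up for this sketch:

```python
import time


def poll_until(condition, timeout=10.0, interval=0.5):
    """Call condition() until it returns a truthy value or the timeout expires.

    Simplified illustration of what an explicit wait does; hypothetical helper,
    not part of Selenium.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # A non-empty list of elements is truthy, so it is returned as-is.
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition was not met within %.1f s" % timeout)
        time.sleep(interval)
```

Selenium raises its own TimeoutException rather than the built-in TimeoutError used here.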

Shorter version:

wait = WebDriverWait(driver, 10)
titles = [title.text for title in wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, '#printListOuter h3')))]
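
Since the question mentions that the page loads more items as the user scrolls, the wait above only covers the first batch. Here is a rough sketch of one way to keep going, assuming that scrolling to the bottom triggers the next batch and that titles are distinct enough to deduplicate on (both are assumptions, not verified against the site). collect_titles, read_titles and scroll_down are names invented for this example; the selector is the one given above, and the browser calls use the Selenium 4 style:

```python
import time


def collect_titles(read_titles, scroll_down, max_rounds=20, pause=1.0):
    """Scroll until a round yields no new titles; return titles in first-seen order.

    read_titles: callable returning the title strings currently on the page.
    scroll_down: callable that scrolls so the page can load more items.
    """
    seen = []
    for _ in range(max_rounds):
        added = False
        for title in read_titles():
            if title and title not in seen:
                seen.append(title)
                added = True
        if not added:
            break  # nothing new appeared since the last scroll
        scroll_down()
        time.sleep(pause)  # crude pause to let the next batch load
    return seen


def main():
    # Selenium is imported here so collect_titles can be tried without a browser.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Assumes chromedriver is discoverable (on PATH or found by Selenium itself).
    driver = webdriver.Chrome()
    try:
        driver.get('https://www.prusaprinters.org/prints')
        wait = WebDriverWait(driver, 10)
        locator = (By.CSS_SELECTOR, '#printListOuter h3')

        def read_titles():
            elements = wait.until(EC.visibility_of_all_elements_located(locator))
            return [element.text for element in elements]

        def scroll_down():
            driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')

        for title in collect_titles(read_titles, scroll_down):
            print(title)
    finally:
        driver.quit()
```

Calling main() runs it against the live site. Deduplicating on the title text drops distinct items that share a name, and max_rounds caps how far back the scrolling goes, so both would need adjusting to cover a full day of posts.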
Sers answered Sep 21 '25 12:09



This is the XPath for the item names:

.//div[@class='print-list-item']/div/a/h3/span
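
Here is a sketch of how this XPath could be used. Note that @class='print-list-item' is an exact string comparison, so it does not match an element whose class attribute carries additional classes. The markup below is a made-up, simplified fragment that only shows what the expression selects (the real page structure may differ), and names_from_markup and names_from_driver are hypothetical helper names. Recent Selenium 4 releases no longer provide the find_elements_by_* helpers, so the browser version goes through By.XPATH:

```python
import xml.etree.ElementTree as ET

ITEM_NAME_XPATH = ".//div[@class='print-list-item']/div/a/h3/span"

# Hypothetical, simplified markup; not copied from the real site.
SAMPLE = """
<div id="printListOuter">
  <div class="print-list-item"><div><a href="/prints/1"><h3><span>Benchy</span></h3></a></div></div>
  <div class="print-list-item"><div><a href="/prints/2"><h3><span>Phone stand</span></h3></a></div></div>
  <div class="print-list-item featured"><div><a href="/prints/3"><h3><span>Not matched</span></h3></a></div></div>
</div>
"""


def names_from_markup(markup):
    """Apply the XPath to a well-formed fragment using the standard library."""
    root = ET.fromstring(markup.strip())
    return [span.text for span in root.findall(ITEM_NAME_XPATH)]


def names_from_driver(driver, timeout=10):
    """Apply the same XPath in a live browser session, waiting for visibility."""
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    spans = WebDriverWait(driver, timeout).until(
        EC.visibility_of_all_elements_located((By.XPATH, ITEM_NAME_XPATH))
    )
    return [span.text for span in spans]
```

On the sample fragment, names_from_markup(SAMPLE) returns only the first two names; the third item is skipped because its class attribute is "print-list-item featured" rather than exactly "print-list-item".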
Pratik answered Sep 21 '25 14:09
