I am trying to scrape a few elements and return the text displayed on the webpage. I believe I can find the elements fine through CSS selectors and XPaths, but I cannot return the desired text. Here is my program:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as wait
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options
import time
import threading
import pandas as pd

threadLocal = threading.local()

def instantiate_chrome():
    driver = getattr(threadLocal, 'driver', None)
    if driver is None:
        options = webdriver.ChromeOptions()
        options.add_argument('log-level=3')
        options.add_argument('--ignore-certificate-errors')
        options.add_argument('--ignore-ssl-errors')
        driver = webdriver.Chrome(executable_path = r'path/to/chrome', options = options)
        setattr(threadLocal, 'driver', driver)
    return driver

def search_stock(driver, stock):
    search_url = r'https://www.forbes.com/search/?q=' + stock
    driver.get(search_url)
    time.sleep(2)
    driver.find_element_by_xpath(r'/html/body/div[1]/main/div[1]/div[1]/div[4]/div/div[1]/div/div[1]/a[1]').click()

def get_q_score(stock, driver):
    df = pd.DataFrame(columns = ['stock','overall_score','quality', 'momentum','growth','technicals'])
    time.sleep(3)
    overall_score = driver.find_element_by_css_selector(r'.q-factor-total .q-score-bar__grade-label').text
    quality_score = driver.find_element_by_xpath(r'/html/body/div[1]/main/div/div[1]/div[4]/div[2]/div[2]/div[1]/div[2]/div[1]').text
    return print('overall score is '+ overall_score, ' quality score is ' + quality_score)

def main(stock):
    driver = instantiate_chrome()
    print('attempting to get q score for ' + stock)
    search_stock(driver, stock)
    print('found webpage for ' + stock)
    get_q_score(stock, driver)

main('AAPL')
I believe the issue is that I am attempting to scrape the text via Selenium's .text property, but there is no text to scrape. Any thoughts?
You were on the right path, except that the text you mentioned isn't actually text in the DOM. These labels are rendered by a CSS property called content, which can only be used with the pseudo-elements :before and :after. You can read up on how the content property works if you are interested.
The labels are rendered much like icons; organizations sometimes do this to keep sensitive values from being scraped. However, there is a (somewhat hard) way to get around this. Using Selenium and JavaScript, you can individually target the CSS content property, which holds the values you are after.
Having looked into it for an hour, this is the simplest Pythonic way of getting the values you want:
overall_score = driver.execute_script("return [...document.querySelectorAll('.q-score-bar__grade-label')].map(div => window.getComputedStyle(div,':before').content)") #key line in the problem
The code simply runs a JavaScript snippet that selects the elements by their class and then maps each div element to the value of the content property on its :before pseudo-element.
This returns a list:
['"TOP BUY"', '"B"', '"B"', '"B"', '"A"']
The values correspond, in order, to:
Q-Factor Score/Quality/Momentum/Growth/Technicals
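If you want to keep that one-liner out of the way, it could be wrapped in a small helper. This is only a sketch: fetch_raw_grades and FakeDriver are names I made up for illustration, and FakeDriver just replays the list shown above so the snippet runs without a browser. With a real Selenium driver you would pass that in instead.

```python
# JavaScript from the one-liner above: read the :before `content`
# of every element carrying the grade-label class.
GRADE_LABEL_JS = (
    "return [...document.querySelectorAll('.q-score-bar__grade-label')]"
    ".map(div => window.getComputedStyle(div, ':before').content)"
)


def fetch_raw_grades(driver):
    """Return the raw `content` strings (still quoted) in page order."""
    # Assumption: `driver` exposes Selenium's execute_script(script) method.
    return list(driver.execute_script(GRADE_LABEL_JS))


class FakeDriver:
    """Hypothetical stand-in for a WebDriver so the sketch runs offline."""

    def execute_script(self, script):
        # Replays the sample result from the answer; a real driver would
        # evaluate `script` in the loaded page instead.
        return ['"TOP BUY"', '"B"', '"B"', '"B"', '"A"']


if __name__ == "__main__":
    print(fetch_raw_grades(FakeDriver()))
```

Note that each value still carries literal double quotes, because that is how the browser reports a string-valued content property.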
To access the values of the list, you can use a for loop and indexing to select the value you want.
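As a rough sketch of that last step, the loop below strips the literal double quotes and pairs each value with its label. LABELS and clean_grades are illustrative names of my own, and the code assumes the page returns exactly the five values in the order listed above.

```python
# Order given above for the five returned values.
LABELS = ["Q-Factor Score", "Quality", "Momentum", "Growth", "Technicals"]


def clean_grades(raw_values):
    """Strip the surrounding double quotes and key each value by its label."""
    scores = {}
    for index, label in enumerate(LABELS):
        # Indexing assumes raw_values has at least len(LABELS) entries.
        scores[label] = raw_values[index].strip('"')
    return scores


# Sample list copied from the output shown above.
raw = ['"TOP BUY"', '"B"', '"B"', '"B"', '"A"']
scores = clean_grades(raw)

print('overall score is ' + scores["Q-Factor Score"])
print('quality score is ' + scores["Quality"])
```

A dict keyed by label is used here so you do not have to remember which position holds which score; plain indexing such as raw[0].strip('"') works just as well for a single value.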