I am trying to use Selenium to retrieve data from a website that uses JavaScript to load its information.
You can see the link here: Animal population
The page shows some selectable fields. For my purpose, I am trying to retrieve the population of bees in the United Kingdom for the year 2011.
Once the selectable fields are submitted, the page loads a table with the corresponding data. I only want the Population and Density numbers for The Whole Country.
My code so far only selects the year, country and species fields and, after the table is returned, locates the 'Whole Country' field (feel free to advise me on how to improve my existing code too).
I haven't been able to retrieve the population and density fields for the whole country. I have tried XPath with 'following-sibling', but it raises an exception when locating the elements.
I also don't want to rely on the position of the rows/cells, since I will also try to get this information for the following years and the table fields will change position.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.get('https://www.oie.int/wahis_2/public/wahid.php/Countryinformation/Animalpopulation')
select = Select(driver.find_element_by_id('country6'))
select.select_by_value('GBR')
select = Select(driver.find_element_by_id('year'))
select.select_by_value('2011')
try:
    element = WebDriverWait(driver, 40).until(EC.presence_of_element_located((By.CLASS_NAME, "TableContent ")))
    print element
    select = Select(driver.find_element_by_id('selected_species'))
    select.select_by_value('1')
except:
    print "Not found"
country_td = driver.find_element(By.XPATH, '//td/b[text()="The Whole Country"]')
#population_td = driver.find_element(By.XPATH, '//td/b[text()="The Whole Country"]/following-sibling::text()')
print country_td.text
Thank you for the help.
You need to go one level up, to the <td>, in order to get the data using following-sibling:
population = driver.find_element(By.XPATH, '//td[b[text()="The Whole Country"]]/following-sibling::td[1]')
density = driver.find_element(By.XPATH, '//td[b[text()="The Whole Country"]]/following-sibling::td[2]')
Or, using country_td (note the leading dot, which keeps the XPath relative to that element):
population = country_td.find_element(By.XPATH, './../following-sibling::td[1]')
density = country_td.find_element(By.XPATH, './../following-sibling::td[2]')
In your example, following-sibling looks for the next sibling of the <b> element. What you want is a <td> element, which you can reach by going through the parent element.
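To make the sibling relationship concrete, here is a minimal sketch that runs without a browser. It uses the standard library's xml.etree.ElementTree on a made-up fragment; the markup and the numbers are placeholders, not the real page, and cells_after_label is a hypothetical helper name. ElementTree has no following-sibling axis, so the step "up to the <td>, then across" is done by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup with placeholder numbers, not the real page.
FRAGMENT = """
<table>
  <tr>
    <td><b>The Whole Country</b></td>
    <td><b>100</b></td>
    <td><b>0.5</b></td>
  </tr>
</table>
"""


def cells_after_label(table, label):
    """Return the text of every <td> that follows the <td> holding <b>label</b>."""
    for row in table.iter("tr"):
        cells = row.findall("td")
        for i, cell in enumerate(cells):
            if (cell.findtext("b") or "").strip() == label:
                # The siblings live at the <td> level, one step above <b>.
                return ["".join(c.itertext()).strip() for c in cells[i + 1:]]
    return []


table = ET.fromstring(FRAGMENT)
label_td = table.find(".//td")
# The <b> is the only child of its <td>, so it has no element siblings.
b_siblings = len(label_td) - 1
following = cells_after_label(table, "The Whole Country")
print(b_siblings, following)
```

The <b> element has nothing next to it, which is why the original following-sibling lookup found nothing, while the enclosing <td> has the two cells of interest right after it.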
The xpath for population
//b[text()="The Whole Country"]/../../td[4]/b
Or
//td/b[text()="The Whole Country"]/../following-sibling::td[1]/b
The xpath for density
//b[text()="The Whole Country"]/../../td[5]/b
Or
//td/b[text()="The Whole Country"]/../following-sibling::td[2]/b
Both kinds of XPath work. Using .. takes the XPath to the parent element, which is what you need here; from there you can either move to the sibling or locate the element with td[X].
In this example you can also omit the trailing /b from each XPath.
Note: this is really nasty; best practice is to always use unambiguous attributes to find an element. However, this isn't always possible, as seen in this example.
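If the column order changes between years, one way to avoid td[X] altogether is to look the cells up by their header text. The following is only a sketch on a made-up table using the standard library: it assumes the table has a header row whose cells are named "Population" and "Density", which may not match the real page, and row_as_dict is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: column names, column order and numbers are placeholders.
TABLE = """
<table>
  <tr><th>Species</th><th>Region</th><th>Density</th><th>Population</th></tr>
  <tr><td>Bees</td><td><b>The Whole Country</b></td><td><b>0.5</b></td><td><b>100</b></td></tr>
</table>
"""


def cell_text(cell):
    """Return the full text of a cell, including text inside nested tags."""
    return "".join(cell.itertext()).strip()


def row_as_dict(table, label):
    """Map header names to cell text for the first data row containing `label`."""
    rows = list(table.iter("tr"))
    if not rows:
        return None
    headers = [cell_text(c) for c in rows[0]]
    for row in rows[1:]:
        values = [cell_text(c) for c in row]
        if label in values:
            return dict(zip(headers, values))
    return None


record = row_as_dict(ET.fromstring(TABLE), "The Whole Country")
print(record["Population"], record["Density"])
```

With Selenium the same idea would apply to the text of the header and row cells read from the located table element; the lookup by name keeps working even if the columns are reordered.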
Also, you should select Bees first and then wait for the table to be present. The table is reloaded between selecting year/country and selecting Bees, so waiting before the species is selected could lead to inconsistent data.
select = Select(driver.find_element_by_id('selected_species'))
select.select_by_value('1')
element = WebDriverWait(driver, 40).until(EC.presence_of_element_located((By.CLASS_NAME, "TableContent ")))
print element
PS: There is a Chrome extension called XPath Helper which you can use to test your XPaths on the website you are visiting.