
How do I return a web page's table cell value without ID, using column and row names (not index)

Most questions about scraping a web page's table data with Python and Selenium involve a table with an ID or class, and some index technique based on counting rows and columns. The XPath technique is usually not explained either.

Say I have a table without an element ID or class; let's use the one at the URL in the code below as an example.

I want to return the value 'Johnson', without counting row or column numbers.

Here's my attempt (edited)...

import contextlib
import selenium.webdriver as webdriver

url = 'http://www.w3schools.com/html/html_tables.asp'

# closing() closes the browser window when the block exits
with contextlib.closing(webdriver.Firefox()) as driver:
    driver.get(url)
    columnref = 3  # position of the 'Last Name' column
    rowref = 4     # position of the row labelled '3' (the header is row 1)
    xpathstr = '//tr[position()=' + str(rowref) + ']//td[position()=' + str(columnref) + ']'
    data = driver.find_element_by_xpath(xpathstr).text
print(data)

I have gotten some good help here already, but I am still using an index. I need to generate 'columnref' and 'rowref' by looking up their values: 'Last Name' and '3', respectively.

square_eyes asked Dec 07 '25 05:12


1 Answer

Use the CSS selector tbody > tr:nth-child(4) > td:nth-child(3) to reach the cell you want; you can build a CSS selector for any other cell in the same way. See below:

>>> driver.find_element_by_css_selector("tbody > tr:nth-child(4) > td:nth-child(3)")
<selenium.webdriver.remote.webelement.WebElement object at 0x10fdd4510>
>>> driver.find_element_by_css_selector("tbody > tr:nth-child(4) > td:nth-child(3)").text
u'Johnson'
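
As a minimal sketch of building that selector for any cell, the helper below formats the two positions into the same pattern. The name cell_css_selector is hypothetical (it is not part of Selenium), and it assumes 1-based positions that count the header row as row 1, as in the example above.

```python
def cell_css_selector(row_index, column_index):
    """Return a CSS selector for the cell at the given 1-based positions.

    Hypothetical helper: row_index counts every <tr> in the <tbody>,
    including the header row.
    """
    return "tbody > tr:nth-child(%d) > td:nth-child(%d)" % (row_index, column_index)


# Row 4, column 3 gives the selector used above.
selector = cell_css_selector(4, 3)
print(selector)
```

The resulting string can be passed to driver.find_element_by_css_selector exactly as in the session above.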

Alternatively, you can use the XPath position() function to locate the cell by its position. See below:

>>> driver.find_element_by_xpath("//tr[position()=4]//td[position()= 3]").text
u'Johnson'
>>> driver.find_element_by_xpath("//tr[position()=5]//td[position()= 3]").text
u'Smith'

If you want to get the text by column name and row label (the value in the first column), you can write a function that looks up the index of the column and of the row, then returns the cell text, as below:

def get_text_column_row(table_css, header, row):
    # Locate the table, its header cells, and the first cell of each data row
    table = driver.find_element_by_css_selector(table_css)
    table_headers = table.find_elements_by_css_selector('tbody > tr:nth-child(1) > th')
    table_rows = table.find_elements_by_css_selector('tbody > tr > td:nth-child(1)')

    index_of_column = None
    index_of_row = None

    # XPath positions are 1-based, hence i + 1
    for i, cell in enumerate(table_headers):
        if cell.text == header:
            index_of_column = i + 1
            break

    # The header row contains only <th> cells, so the first <td> match
    # sits in the second <tr>, hence i + 2
    for i, cell in enumerate(table_rows):
        if cell.text == row:
            index_of_row = i + 2
            break

    if index_of_column is None or index_of_row is None:
        raise ValueError('No cell for column %r and row %r' % (header, row))

    # The leading dot keeps the search inside this table
    xpath = './/tr[position() = %d]/td[position() = %d]' % (index_of_row, index_of_column)
    return table.find_element_by_xpath(xpath).text

Use it like this:

>>> get_text_column_row('#main > table:nth-child(6)', 'Points', '2')
u'80'
>>> get_text_column_row('#main > table:nth-child(6)', 'Last Name', '2')
u'Doe'
>>> get_text_column_row('#main > table:nth-child(6)', 'Last Name', '3')
u'Johnson'
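
The lookup by column name and row label can also be folded into a single XPath expression, so no Python-side counting is needed. This is only a sketch under stated assumptions: the helper name cell_xpath is hypothetical, the header cells are assumed to be th elements, the row label is assumed to be the first td of its row, and neither string may contain a double quote. If the header text is not found, the expression falls back to the first cell of the row rather than failing, so check the header name first.

```python
def cell_xpath(header, row):
    """Build one XPath that finds a cell by column header text and row label.

    Hypothetical helper. Assumes <th> header cells, the row label in the
    first <td> of its row, and no double quotes in either argument.
    """
    # Column position = number of <th> cells before the matching header, plus one
    column = ('count(ancestor::table[1]//th[normalize-space(.)="%s"]'
              '/preceding-sibling::th) + 1' % header)
    # Select the row whose first <td> holds the label, then the cell at that position
    return '//tr[td[1][normalize-space(.)="%s"]]/td[%s]' % (row, column)


xpath = cell_xpath('Last Name', '3')
print(xpath)
```

The string would then be passed to driver.find_element_by_xpath(xpath).text, as in the examples above.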
Mesut GUNES answered Dec 08 '25 22:12


