Extract attribute's value with XPath in Python

I have the HTML:

<table>
<tbody>
<tr>
<td align="left" valign="top" style="padding: 0 10px 0 60px;">
<img src="/files/39.jpg" width="64" height="64">
</td>
<td align="left" valign="middle"><h1>30 Rock</h1></td>
</tr>
</tbody>
</table>

Using Python and lxml, I need to extract the value of the src attribute of the <img> element. Here's what I've tried:

import lxml.html
import urllib

# make HTTP request to site
page = urllib.urlopen("http://my.url.com")
# read the downloaded page
doc = lxml.html.document_fromstring(page.read())

txt1 = doc.xpath('/html/body/table[2]/tbody/tr/td[1]/img')

When I print txt1, I only get an empty list: []. How can I correct this?

Eugene Shmorgun asked Jun 27 '26 10:06


1 Answer

Use this XPath:

//img/@src

This selects the src attributes of all img elements in the entire document. In lxml, passing it to doc.xpath() returns a list of strings, one per matching attribute.
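
As a minimal sketch of how this expression behaves in lxml, the snippet below parses the table from the question out of an inline string instead of fetching it over HTTP. The names HTML, srcs and first_src are illustrative only and are not part of the original post.

```python
import lxml.html

# The markup from the question, inlined so the example runs without a network call.
HTML = """
<table>
<tbody>
<tr>
<td align="left" valign="top" style="padding: 0 10px 0 60px;">
<img src="/files/39.jpg" width="64" height="64">
</td>
<td align="left" valign="middle"><h1>30 Rock</h1></td>
</tr>
</tbody>
</table>
"""

doc = lxml.html.document_fromstring(HTML)

# An XPath ending in /@src yields the attribute values as strings,
# not the <img> elements themselves.
srcs = doc.xpath('//img/@src')
print(srcs)

# xpath() always returns a list, so guard against the no-match case.
first_src = srcs[0] if srcs else None
print(first_src)
```

If the page contains other images, the expression can be narrowed, for example to '//td/img/@src', so that only images sitting directly inside table cells match.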

Kirill Polishchuk answered Jun 28 '26 23:06


