The question may sound easy, but I am having difficulty solving it. I have a table like the following:
<table><tbody>
<tr>
<td>2003</td>
<td><span class="positive">1.19</span> </td>
<td><span class="negative">-0.48</span> </td>
</tr>
My code is as follows:
from lxml import etree
for elem in tree.xpath('//*[@id="printcontent"]/div[8]/div/table/tbody/tr'):
    for c in elem.xpath("//td"):
        if(c.getchildren()): # for the <span> thing
            text = c.xpath("//span/text()")
        else:
            text = c.text
But I am unable to iterate over the "td" elements. I have been trying all day, to no avail. I want to get 2003, 1.19, and -0.48.
Kindly help!
It looks like you have HTML, not XML. Therefore, use lxml.html, not lxml.etree
to parse the data. If data.html looks like this:
<table><tbody>
<tr>
<td>2003</td>
<td><span class="positive">1.19</span> </td>
<td><span class="negative">-0.48</span> </td>
</tr>
then
import lxml.html as LH
tree = LH.parse('data.html')
print([td.text_content() for td in tree.xpath('//td')])
yields
['2003', '1.19 ', '-0.48 ']
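Note the trailing spaces in the last two values: they come from the whitespace after each closing span tag. If you want the bare values, one option is to strip each string. Here is a minimal sketch; it assumes the sample markup is passed as an inline string (with the closing tbody and table tags added) rather than read from data.html, so it can be run on its own:

```python
import lxml.html as LH

# Inline copy of the sample table (an assumption, so the sketch
# runs without a data.html file on disk).
html = """
<table><tbody>
<tr>
<td>2003</td>
<td><span class="positive">1.19</span> </td>
<td><span class="negative">-0.48</span> </td>
</tr>
</tbody></table>
"""

tree = LH.fromstring(html)

# text_content() gathers the text of the cell and its children;
# strip() removes the whitespace left after each </span>.
values = [td.text_content().strip() for td in tree.xpath('//td')]
print(values)
```

The values are still strings at this point; convert them with int() or float() if you need numbers.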
If
for elem in tree.xpath('//*[@id="printcontent"]/div[8]/div/table/tbody/tr'):
is not returning any elements, then you need to show us enough of the HTML to help us debug why this XPath is not working.
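Separately, the inner loop in the question is probably not doing what you expect. In XPath, an expression that starts with // is evaluated from the document root, even when you call xpath() on a single element, so elem.xpath("//td") returns every td in the document rather than only the cells of that row. Prefixing the path with a dot makes it relative to the current element. A rough sketch of the row-by-row version, assuming a simplified wrapper div and a made-up second row (2004) added only to show the per-row grouping:

```python
import lxml.html as LH

# Simplified stand-in for the real page: the wrapper div and the
# second row are assumptions added for illustration only.
html = """
<div id="printcontent">
<table><tbody>
<tr>
<td>2003</td>
<td><span class="positive">1.19</span> </td>
<td><span class="negative">-0.48</span> </td>
</tr>
<tr>
<td>2004</td>
<td><span class="positive">0.75</span> </td>
<td><span class="negative">-0.10</span> </td>
</tr>
</tbody></table>
</div>
"""

tree = LH.fromstring(html)

rows = []
for tr in tree.xpath('//div[@id="printcontent"]//tr'):
    # "./td" selects only the cells of this row;
    # "//td" would select every cell in the whole document.
    cells = [td.text_content().strip() for td in tr.xpath('./td')]
    rows.append(cells)

print(rows)
```

Using text_content() also removes the need for the getchildren() branch, since it returns the text whether or not the cell wraps it in a span.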