Scraping dynamic html fields with lxml

Question

I have been trying to scrape a dynamic field of an HTML page using lxml. The code is pretty simple and is below:

from lxml import html
import requests
page = requests.get('http://www.airmilescalculator.com/distance/blr-to-cdg/')
tree = html.fromstring(page.content)
miles = tree.xpath('//input[@class="distanceinput2"]/text()')
print(miles)

The result I get is just an empty list, []. I expected the list to contain a number. However, I am able to scrape static fields of the same page.

Thanks in advance for the help.

Kenly · Accepted Answer

You can't select text nodes from input fields because an input element has no text node.

<input type="text" class="distanceinput2" .. />

To get the value of an input field, use:

miles = [node.value for node in tree.xpath('//input[@class="distanceinput2"]')]

This returns the value attribute of each matching input.
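As a minimal sketch of the difference, the snippet below parses a small inline HTML string instead of the real page. The markup and the value 4868 are placeholders made up for illustration, not a copy of what the site serves.

```python
from lxml import html

# Hypothetical markup standing in for the real page; the value is a placeholder.
SAMPLE = (
    '<html><body><form>'
    '<input type="text" class="distanceinput2" value="4868" />'
    '</form></body></html>'
)

tree = html.fromstring(SAMPLE)

# An <input> is an empty element, so it has no child text node to select.
text_nodes = tree.xpath('//input[@class="distanceinput2"]/text()')

# The data lives in the value attribute, which XPath can select directly.
values = tree.xpath('//input[@class="distanceinput2"]/@value')

# lxml.html also exposes that attribute as a .value property on input elements.
via_property = [node.value for node in tree.xpath('//input[@class="distanceinput2"]')]

print(text_nodes)    # []
print(values)        # ['4868']
print(via_property)  # ['4868']
```

If the value attribute is missing or empty in the HTML that requests downloads, both attribute lookups come back without the number, which is a hint that the field is filled in later by the page itself.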

On this page, however, the desired values are computed, so we need to visit the page and simulate a click to get them. The splinter package is made for that.

from pyvirtualdisplay import Display
from splinter import Browser

url = 'http://www.airmilescalculator.com/distance/blr-to-cdg/'

# Start a virtual display so the browser window stays hidden.
display = Display(visible=0)
display.start()

browser = Browser()
browser.visit(url)

# Click the element that triggers the calculation.
browser.find_by_id('haemulti')[0].click()

# Read the computed values from the result fields.
print(browser.find_by_id('totaldistancemiles')[0].value)
print(browser.find_by_id('totaldistancekm')[0].value)
print(browser.find_by_id('nauticalmiles')[0].value)

browser.quit()
display.stop()

pyvirtualdisplay is used to hide the browser.
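One possible refinement, sketched here under assumptions: the script above leaves the browser and the virtual display running if a lookup raises. The helper names read_computed_fields and scrape_distances are hypothetical and not part of splinter. They only use the same visit, find_by_id, click, value, and quit calls as the script above, and wrap them in try/finally so cleanup always runs.

```python
def read_computed_fields(browser, url, trigger_id, field_ids):
    """Visit url, click the trigger element, and return {field_id: value}.

    `browser` is any object with splinter-style visit() and find_by_id().
    """
    browser.visit(url)
    browser.find_by_id(trigger_id)[0].click()
    return {fid: browser.find_by_id(fid)[0].value for fid in field_ids}


def scrape_distances(url):
    """Run the lookup in a hidden browser and always clean up afterwards."""
    # Imported here so the helper above can be used without these packages.
    from pyvirtualdisplay import Display
    from splinter import Browser

    display = Display(visible=0)
    display.start()
    try:
        browser = Browser()
        try:
            # Element ids are the ones used in the answer's script.
            return read_computed_fields(
                browser,
                url,
                'haemulti',
                ['totaldistancemiles', 'totaldistancekm', 'nauticalmiles'],
            )
        finally:
            browser.quit()
    finally:
        display.stop()
```

Calling scrape_distances('http://www.airmilescalculator.com/distance/blr-to-cdg/') would then return a dictionary keyed by field id, provided the page still uses those ids. Passing the browser in as an argument also lets the lookup logic be exercised with a stand-in object.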

OUTPUT:

$ python test.py
4868
7834
4230

Tags: python, html, web-scraping, lxml, lxml.html

Tauseef Hussain
