
Scraping dynamic html fields with lxml

I have been trying to scrape a dynamic field of an HTML page using lxml. The code is pretty simple:

from lxml import html
import requests
page = requests.get('http://www.airmilescalculator.com/distance/blr-to-cdg/')
tree = html.fromstring(page.content)
miles = tree.xpath('//input[@class="distanceinput2"]/text()')
print(miles)

The result I get is just an empty list, []. I expected a list containing a number. However, I am able to scrape static fields of the same page.

Thanks in advance for the help.

Tauseef Hussain asked Jan 28 '26 02:01


1 Answer

You can't select text nodes from input fields because an input element has no text node. Its content is stored in the value attribute.

<input type="text" class="distanceinput2" .. />

To get the value of an input field, use:

miles = [node.value for node in tree.xpath('//input[@class="distanceinput2"]')]

This reads the value attribute of each matching input.
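
To make the difference concrete, here is a minimal sketch run against an inline HTML snippet rather than the live page. The SAMPLE markup and its value of 4868 are assumptions made up for illustration; the real page may be structured differently.

```python
from lxml import html

# Hypothetical markup standing in for the real page (an assumption).
SAMPLE = '<form><input type="text" class="distanceinput2" value="4868"/></form>'

tree = html.fromstring(SAMPLE)

# text() finds nothing: an <input> element has no text node.
text_nodes = tree.xpath('//input[@class="distanceinput2"]/text()')

# Two equivalent ways to read the value attribute instead.
values_attr = tree.xpath('//input[@class="distanceinput2"]/@value')
values_prop = [node.value for node in tree.xpath('//input[@class="distanceinput2"]')]

print(text_nodes)
print(values_attr)
print(values_prop)
```

Both approaches only see what is in the HTML that was downloaded. If the attribute is filled in later by a script, it will still be missing or empty here.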

In this case, though, the desired values are computed dynamically, so we need to visit the page in a browser and simulate a click to get them. The splinter package is made for that.

from pyvirtualdisplay import Display
from splinter import Browser

# Start a virtual display so the browser window stays hidden.
display = Display(visible=0)
display.start()

url = 'http://www.airmilescalculator.com/distance/blr-to-cdg/'

browser = Browser()
browser.visit(url)
# Click the element that triggers the distance calculation.
browser.find_by_id('haemulti')[0].click()

print(browser.find_by_id('totaldistancemiles')[0].value)
print(browser.find_by_id('totaldistancekm')[0].value)
print(browser.find_by_id('nauticalmiles')[0].value)

# Clean up the browser and the virtual display.
browser.quit()
display.stop()

pyvirtualdisplay is used to hide the browser.

OUTPUT:

$ python test.py
4868
7834
4230
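
As a quick sanity check on that output, the three numbers agree with each other under the standard conversion factors (1 mile = 1.609344 km, 1 nautical mile = 1.852 km). This is only a sketch: it assumes the three printed values are miles, kilometres and nautical miles in that order, as the element ids suggest, and it says nothing about how the site itself rounds.

```python
# Standard conversion factors (exact by definition).
KM_PER_MILE = 1.609344
KM_PER_NAUTICAL_MILE = 1.852

miles = 4868  # first value printed by the script above

km = miles * KM_PER_MILE
nautical_miles = km / KM_PER_NAUTICAL_MILE

print(int(round(km)))              # 7834
print(int(round(nautical_miles)))  # 4230
```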

Kenly answered Jan 30 '26 14:01