I have been trying to scrape a dynamic field of an HTML page using lxml.
The code is pretty simple:
from lxml import html
import requests
page = requests.get('http://www.airmilescalculator.com/distance/blr-to-cdg/')
tree = html.fromstring(page.content)
miles = tree.xpath('//input[@class="distanceinput2"]/text()')
print(miles)
The result I get is just an empty list, [].
I expected the list to contain a number.
However, I am able to scrape static fields of the same page.
Thanks in advance for the help.
You can't select text nodes from input fields, because an input element has no text node.
<input type="text" class="distanceinput2" .. />
To get the value from an input field, use:
miles = [node.value for node in tree.xpath('//input[@class="distanceinput2"]')]
This returns the value attribute of each matching input.
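The difference between an input's text and its value attribute can be checked offline. Below is a minimal sketch that uses the standard library's xml.etree.ElementTree as a stand-in for lxml, so it runs without extra packages. The sample markup and the number in it are made up for illustration and are not taken from the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup; the real page's structure may differ.
SAMPLE = (
    '<form>'
    '<input type="text" class="distanceinput2" value="4868" />'
    '<input type="text" class="distanceinput2" />'
    '</form>'
)

root = ET.fromstring(SAMPLE)
inputs = root.findall(".//input[@class='distanceinput2']")

# An <input> is an empty element, so it has no text content.
texts = [node.text for node in inputs]

# The data lives in the value attribute. It is None when the attribute is
# absent, which is how a field filled in later by a script looks in raw HTML.
values = [node.get('value') for node in inputs]

print(texts)
print(values)
```

With lxml, the same attribute can be selected directly in the XPath expression, for example tree.xpath('//input[@class="distanceinput2"]/@value'). If that comes back empty for the real page, the field is most likely filled in only after the page runs in a browser.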
The desired values are computed, so we need to visit the page in a browser and simulate a click to get them. The splinter package is made for that.
from pyvirtualdisplay import Display
from splinter import Browser

url = 'http://www.airmilescalculator.com/distance/blr-to-cdg/'

# Start a virtual display so the browser window stays hidden.
display = Display(visible=0)
display.start()

browser = Browser()
browser.visit(url)
# Click the element that triggers the calculation.
browser.find_by_id('haemulti')[0].click()
print(browser.find_by_id('totaldistancemiles')[0].value)
print(browser.find_by_id('totaldistancekm')[0].value)
print(browser.find_by_id('nauticalmiles')[0].value)

browser.quit()
display.stop()
pyvirtualdisplay is used to hide the browser.
OUTPUT:
$ python test.py
4868
7834
4230