 

Setting timeouts to parse webpages using python lxml

Tags: python, lxml

I am using the Python lxml library to parse HTML pages:

import lxml.html

# this might run indefinitely
page = lxml.html.parse('http://stackoverflow.com/')

Is there any way to set a timeout for parsing?

asked Feb 03 '26 09:02 by saurabh


1 Answer

It looks like lxml uses urllib.urlopen as the opener, but the easiest way to do this would be to change the default timeout of the socket module:

import socket

timeout = 10  # seconds
# Applies to every socket object created after this call
socket.setdefaulttimeout(timeout)
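Because `socket.setdefaulttimeout` changes a default shared by the whole process, one way to limit its reach is to set it only around the call that fetches the page and restore the previous value afterwards. The following is a minimal sketch of that idea as a context manager; the name `default_socket_timeout` is hypothetical and not part of the `socket` module or lxml.

```python
import contextlib
import socket


@contextlib.contextmanager
def default_socket_timeout(seconds):
    """Temporarily change socket's process-wide default timeout.

    Hypothetical helper: the new default applies to sockets created
    while the ``with`` block runs; the previous value is restored on exit.
    """
    previous = socket.getdefaulttimeout()  # None means "no timeout"
    socket.setdefaulttimeout(seconds)
    try:
        yield
    finally:
        # Runs even if the body raised, e.g. because a read timed out
        socket.setdefaulttimeout(previous)


# Demonstration without network access
before = socket.getdefaulttimeout()
with default_socket_timeout(10):
    print(socket.getdefaulttimeout() == 10)  # → True
print(socket.getdefaulttimeout() == before)  # → True
```

With the question's code this would be `with default_socket_timeout(10):` wrapped around the `lxml.html.parse(...)` call. The default is still global while the block runs, so other threads that create sockets during that time pick it up as well.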

Of course, this is a quick-and-dirty solution.
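A less global alternative, sketched here assuming Python 3, is to fetch the page yourself and pass the timeout to that single request: `urllib.request.urlopen` takes a `timeout` argument in seconds, and `lxml.html.parse` accepts a file-like object as well as a filename or URL, so the response can be handed straight to lxml. This also does not rely on lxml opening the URL through Python's `socket` module. Note that the timeout bounds each blocking connect or read, not the total download or parse time. The function name `parse_url_with_timeout` is hypothetical.

```python
import urllib.request

import lxml.html


def parse_url_with_timeout(url, timeout=10):
    """Fetch ``url`` with a per-request timeout and parse it as HTML.

    Hypothetical helper. If connecting or reading blocks for longer than
    ``timeout`` seconds, an exception is raised (typically
    urllib.error.URLError or socket.timeout) instead of hanging.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Parse while the response is still open; base_url records where
        # the document came from so relative links can be resolved later.
        return lxml.html.parse(response, base_url=response.geturl())
```

For example, `page = parse_url_with_timeout('http://stackoverflow.com/', timeout=10)` returns an ElementTree, just like the `lxml.html.parse` call in the question.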

answered Feb 04 '26 21:02 by jathanism


