I'd like to tell urllib2.urlopen (or a custom opener) to use the nameserver at 127.0.0.1 (or ::1) to resolve addresses. I don't want to change my /etc/resolv.conf, however.
One possible solution is to use a tool like dnspython to query addresses and httplib to build a custom URL opener. I'd prefer telling urlopen to use a custom nameserver, though. Any suggestions?
Looks like name resolution is ultimately handled by socket.create_connection.
    -> urllib2.urlopen
       -> httplib.HTTPConnection
          -> socket.create_connection
However, once the "Host:" header has been set, you can resolve the host yourself and pass the IP address down to the opener.
I'd suggest that you subclass httplib.HTTPConnection and override the connect method so that it resolves self.host before passing it to socket.create_connection.
Then subclass HTTPHandler (and HTTPSHandler) to replace the http_open (or https_open) method with one that passes your HTTPConnection subclass, instead of httplib's own, to do_open.
Like this:
    import urllib2
    import httplib
    import socket
    import ssl  # needed by MyHTTPSConnection.connect

    def MyResolver(host):
        # Map selected hostnames to an IP address; pass everything else through.
        if host == 'news.bbc.co.uk':
            return '66.102.9.104'  # Google IP
        else:
            return host

    class MyHTTPConnection(httplib.HTTPConnection):
        def connect(self):
            # Connect to the resolved address; self.host still feeds the Host: header.
            self.sock = socket.create_connection((MyResolver(self.host), self.port), self.timeout)

    class MyHTTPSConnection(httplib.HTTPSConnection):
        def connect(self):
            sock = socket.create_connection((MyResolver(self.host), self.port), self.timeout)
            self.sock = ssl.wrap_socket(sock, self.key_file, self.cert_file)

    class MyHTTPHandler(urllib2.HTTPHandler):
        def http_open(self, req):
            return self.do_open(MyHTTPConnection, req)

    class MyHTTPSHandler(urllib2.HTTPSHandler):
        def https_open(self, req):
            return self.do_open(MyHTTPSConnection, req)

    opener = urllib2.build_opener(MyHTTPHandler, MyHTTPSHandler)
    urllib2.install_opener(opener)

    f = urllib2.urlopen('http://news.bbc.co.uk')
    data = f.read()

    from lxml import etree
    doc = etree.HTML(data)

    >>> print doc.xpath('//title/text()')
    ['Google']
Obviously there are certificate issues if you use HTTPS, and you'll need to fill out MyResolver...
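One way to fill out MyResolver is to have dnspython query the local nameserver directly. This is only a sketch under some assumptions: the third-party dnspython package is installed, a nameserver is answering on 127.0.0.1, and taking the first A record is good enough. The names make_resolver and is_ip_literal are made up for this example.

```python
import socket

def is_ip_literal(host):
    """Return True if host is already an IPv4 or IPv6 address literal."""
    for family in (socket.AF_INET, socket.AF_INET6):
        try:
            socket.inet_pton(family, host)
            return True
        except (socket.error, ValueError):
            pass
    return False

def make_resolver(nameserver='127.0.0.1'):
    """Build a MyResolver-style function that queries only `nameserver`."""
    import dns.resolver  # third-party: dnspython (assumed to be installed)

    # configure=False keeps dnspython from reading /etc/resolv.conf.
    resolver = dns.resolver.Resolver(configure=False)
    resolver.nameservers = [nameserver]
    # dnspython 2.x calls the lookup method resolve(); 1.x calls it query().
    lookup = getattr(resolver, 'resolve', None) or resolver.query

    def my_resolver(host):
        if is_ip_literal(host):
            return host  # nothing to look up
        answer = lookup(host, 'A')
        return answer[0].address  # first A record only

    return my_resolver
```

With that in place, MyResolver = make_resolver('127.0.0.1') could stand in for the hard-coded version above. Failed lookups surface as dnspython exceptions (dns.resolver.NXDOMAIN, for example), which this sketch does not catch.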
Another (dirty) way is monkey-patching socket.getaddrinfo.
For example, this code adds an (unbounded) cache for DNS lookups:
    import socket

    # Keep a reference to the original function so the wrapper can call it.
    prv_getaddrinfo = socket.getaddrinfo
    dns_cache = {}  # or a weakref.WeakValueDictionary()

    def new_getaddrinfo(*args):
        # The positional arguments double as the cache key.
        try:
            return dns_cache[args]
        except KeyError:
            res = prv_getaddrinfo(*args)
            dns_cache[args] = res
            return res

    socket.getaddrinfo = new_getaddrinfo
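The same patching trick can redirect lookups instead of caching them, which is closer to what the question asks for. A minimal sketch, assuming a static table of overrides is enough; HOST_OVERRIDES and patched_getaddrinfo are hypothetical names, and the 127.0.0.1 target is just a placeholder address:

```python
import socket

_original_getaddrinfo = socket.getaddrinfo

# Hypothetical override table: hostname -> IP address to use instead.
HOST_OVERRIDES = {'news.bbc.co.uk': '127.0.0.1'}

def patched_getaddrinfo(host, *args, **kwargs):
    # Swap in the override (if any), then defer to the real implementation.
    # An IP literal is returned as-is by getaddrinfo, without a DNS query.
    return _original_getaddrinfo(HOST_OVERRIDES.get(host, host), *args, **kwargs)

socket.getaddrinfo = patched_getaddrinfo

# The final element of each result tuple is the socket address.
info = socket.getaddrinfo('news.bbc.co.uk', 80, socket.AF_INET, socket.SOCK_STREAM)
print(info[0][4])
```

Because the patch is process-wide, it affects every library that resolves names through socket.getaddrinfo, not just urllib2. Assigning _original_getaddrinfo back to socket.getaddrinfo undoes it.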