I'm getting some curious behaviour when parsing a URL. I was expecting to receive an invalid URL exception, but instead the hostname of the following URL comes back as the part inside the '[]' brackets:
from urllib.parse import urlparse
print(urlparse('http://myurl.com[notmyurl.com]').hostname)
Output:
>>> notmyurl.com
Is this expected behaviour?
This is expected behavior. Running your code through a debugger and stepping through parse.py in urllib, we see the following:
@property
def _hostinfo(self):
    netloc = self.netloc
    _, _, hostinfo = netloc.rpartition('@')
    _, have_open_br, bracketed = hostinfo.partition('[')
    if have_open_br:
        hostname, _, port = bracketed.partition(']')
        _, _, port = port.partition(':')
    else:
        hostname, _, port = hostinfo.partition(':')
    if not port:
        port = None
    return hostname, port
So you can see that the _hostinfo property checks for an opening bracket in the netloc and, if it finds one, returns the value from inside the brackets, discarding whatever came before the '['. Stepping through your code in the PyCharm debugger shows the value assigned to each variable and the point where the text outside the brackets is stripped out, leaving only notmyurl.com as the hostname.
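To make the steps easier to follow without a debugger, here is a minimal sketch that copies the partition logic into a standalone function. The name split_hostinfo is hypothetical and is not part of urllib; it only mirrors the code quoted above. urlparse itself may behave differently depending on your Python version, so the sketch deliberately avoids calling it.

```python
def split_hostinfo(netloc):
    """Hypothetical standalone copy of the partition steps in _hostinfo."""
    # Drop any "user:password@" prefix.
    _, _, hostinfo = netloc.rpartition('@')
    # Split on the first '['; whatever precedes it is thrown away.
    _, have_open_br, bracketed = hostinfo.partition('[')
    if have_open_br:
        # The hostname is whatever sits between '[' and ']'.
        hostname, _, port = bracketed.partition(']')
        _, _, port = port.partition(':')
    else:
        hostname, _, port = hostinfo.partition(':')
    if not port:
        port = None
    return hostname, port


print(split_hostinfo('myurl.com[notmyurl.com]'))  # ('notmyurl.com', None)
print(split_hostinfo('myurl.com:8080'))           # ('myurl.com', '8080')
print(split_hostinfo('[::1]:8080'))               # ('::1', '8080')
```

The first call shows why 'myurl.com' disappears: it lands in the part of the partition result that is assigned to `_` and never used. The brackets exist to support IPv6 literals such as '[::1]', which is the case shown in the last call.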
