I'm trying to filter the user/password out of a URL.
(I could split it manually at the last '@' sign, but I'd rather use a parser.)
Python gives a deprecation warning for splituser() and points to urlparse(), but urlparse()
doesn't seem to handle user/password.
Should I just trust the last '@' sign, or is there a newer replacement for splituser()?
Python 3.8.2 (default, Jul 16 2020, 14:00:26)
[GCC 9.3.0] on linux
>>> url="http://usr:pswd@www.site.com/path&var=val"
>>> import urllib.parse
>>> urllib.parse.splituser(url)
<stdin>:1: DeprecationWarning: urllib.parse.splituser() is deprecated as of 3.8, use urllib.parse.urlparse() instead
('http://usr:pswd', 'www.site.com/path&var=val')
>>> urllib.parse.urlparse(url)
ParseResult(scheme='http', netloc='usr:pswd@www.site.com', path='/path&var=val', params='', query='', fragment='')
# neither with allow_fragments:
>>> urllib.parse.urlparse(url,allow_fragments=True)
ParseResult(scheme='http', netloc='usr:pswd@www.site.com', path='/path&var=val', params='', query='', fragment='')
(Edit: the repr() output is partial & misleading; see my answer.)
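It's all there, clear and accessible.
What went wrong: the repr() here is misleading. It shows only the six fields of the named tuple (scheme, netloc, path, params, query, fragment). The username, password, hostname and port are derived from netloc and exposed as properties, so they never appear in the repr.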
The result is available with explicit property get:
>>> url = 'http://usr:pswd@www.sharat.uk:8082/nativ/page?vari=valu'
>>> p = urllib.parse.urlparse(url)
>>> p.port
8082
>>> p.hostname
'www.sharat.uk'
>>> p.password
'pswd'
>>> p.username
'usr'
>>> p.path
'/nativ/page'
>>> p.query
'vari=valu'
>>> p.scheme
'http'
Or as a one-liner (I just needed the domain):
>>> urllib.parse.urlparse('http://usr:pswd@www.shahart.uk:8082/nativ/page?vari=valu').hostname
'www.shahart.uk'
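For the original goal (removing the user/password and keeping the rest of the URL), here is a minimal sketch. The helper name strip_credentials is my own, not part of urllib. It splits the netloc at the last '@', which as far as I can tell is the same rule urllib uses internally to derive username and hostname, and it rebuilds from netloc rather than hostname because hostname is lower-cased and drops the port.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_credentials(url):
    """Return url without the 'user:password@' part of its netloc.

    strip_credentials is a hypothetical helper name, not a urllib function.
    """
    parts = urlsplit(url)
    # Everything after the last '@' in the netloc is host[:port].
    # If there is no '@', rpartition returns the whole netloc unchanged.
    host_and_port = parts.netloc.rpartition('@')[2]
    # SplitResult is a named tuple, so _replace swaps a single field.
    return urlunsplit(parts._replace(netloc=host_and_port))


print(strip_credentials('http://usr:pswd@www.sharat.uk:8082/nativ/page?vari=valu'))
# http://www.sharat.uk:8082/nativ/page?vari=valu
```

Because the '@' split is applied only to the netloc that the parser has already isolated, an '@' appearing later in the path or query is left alone.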