According to the documentation, urllib.unquote_plus should replace plus signs with spaces, but when I tried the code below in IDLE for Python 2.7, it did not.
>>> import urllib
>>> s = 'http://stackoverflow.com/questions/?q1=xx%2Bxx%2Bxx'
>>> urllib.unquote_plus(s)
'http://stackoverflow.com/questions/?q1=xx+xx+xx'
I also tried doing something like urllib.unquote_plus(s).decode('utf-8').
Is there a proper way to decode the URL component?
%2B is the escape code for a literal +; it is being unescaped entirely correctly.
Don't confuse this with a plain + in a URL query string, which is the escape for a space:
>>> s = 'http://stackoverflow.com/questions/?q1=xx+xx+xx'
>>> urllib.unquote_plus(s)
'http://stackoverflow.com/questions/?q1=xx xx xx'
unquote_plus() turns only literal + characters into spaces ('+' -> ' '). A %2B escape is decoded to a literal plus ('%2B' -> '+') and is not converted any further.
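The difference between the two forms can be checked side by side. This is a minimal sketch, not the only way to do it: the try/except import is an assumption added so the same snippet runs on Python 2.7 (where the functions live in urllib) and on Python 3 (where they moved to urllib.parse), and the sample strings are made up for illustration.

```python
try:
    from urllib import unquote, unquote_plus        # Python 2.7
except ImportError:
    from urllib.parse import unquote, unquote_plus  # Python 3

# A bare '+' is the form-encoded spelling of a space.
print(unquote_plus('xx+xx'))

# '%2B' is an escaped literal plus, so it comes back as '+', not as a space.
print(unquote_plus('xx%2Bxx'))

# Plain unquote() decodes % escapes only and leaves '+' untouched.
print(unquote('xx+xx'))

# Both forms in one value: the escaped plus survives, the bare plus becomes a space.
print(unquote_plus('a%2Bb+c'))
```

If the value really is supposed to contain spaces, the fix belongs on the encoding side: whatever produced the URL should emit + or %20 rather than %2B.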
If your input uses %2B where you expected spaces, the values were perhaps quoted twice, and you would need to unquote them twice. In that case you'd see the % escapes encoded too:
>>> urllib.quote_plus('Hello world!')
'Hello+world%21'
>>> urllib.quote_plus(urllib.quote_plus('Hello world!'))
'Hello%2Bworld%2521'
where %25 is the quoted % character.
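The round trip for a doubly quoted value might look like the sketch below. It assumes you already know the input was quoted twice; the variable names are hypothetical, and the try/except import is only there so the snippet runs on both Python 2.7 and Python 3.

```python
try:
    from urllib import quote_plus, unquote_plus        # Python 2.7
except ImportError:
    from urllib.parse import quote_plus, unquote_plus  # Python 3

original = 'Hello world!'

# Quoting twice escapes the '+' and '%' produced by the first pass.
doubled = quote_plus(quote_plus(original))

# The first unquote undoes the outer layer: '%2B' -> '+', '%25' -> '%'.
once = unquote_plus(doubled)

# The second unquote undoes the inner layer: '+' -> ' ', '%21' -> '!'.
twice = unquote_plus(once)

print(doubled)
print(once)
print(twice)
```

Unquoting twice is only safe when the double quoting is certain. Applied to a value that was quoted once, the second pass would turn a legitimate literal plus into a space.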
Those aren't spaces, those are actual pluses. A space is %20, which in that part of the URL is indeed equivalent to +, but %2B means a literal plus.