 

Is there an idempotent version of urllib.parse.quote?

Is there a version of urllib.parse.quote that is idempotent? This function should satisfy:

urllib.parse.quote(x) == urllib.parse.quote(urllib.parse.quote(x))

for a wide enough set of x strings.

If I test the function on the comma, for example:

x = urllib.parse.quote(",")
y = urllib.parse.quote(x)

then I get x = '%2C' but y = '%252C', so it is not idempotent for the comma.

If no such function exists already, could you describe an implementation? I was thinking of using:

my_unquote = lambda x: urllib.parse.quote(urllib.parse.unquote(x)) but I am not sure whether this is even correct.

The question arose from handling URLs that had been partially encoded.

Juan Carlos Ramirez asked Oct 27 '25 08:10



1 Answer

URL-encoding is an inherently non-idempotent operation, because the % sign is both a piece of input that needs to be encoded and a component of the output encoding (see the table here). This means that any string in which at least one character was encoded contains % characters, and a later encoding pass would re-encode each of them as %25. Only strings made up entirely of characters that never need encoding survive repeated passes unchanged.
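As a quick illustration of that point, the following minimal sketch runs the standard-library quote twice on the comma from the question. The variable names are only for illustration.

```python
from urllib.parse import quote

once = quote(",")     # the comma is not in quote's default safe set
twice = quote(once)   # the "%" produced by the first pass is itself encoded

print(once)    # %2C
print(twice)   # %252C

# A string made only of characters that never need encoding is unchanged,
# so repeated encoding is harmless for it.
plain = "abc-123"
print(quote(plain) == quote(quote(plain)))   # True
```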

Put another way, it's not possible to know whether a given string has already been URL-encoded or not simply by examining the string itself. This makes writing an idempotent encoding function difficult, maybe impossible.
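To make the ambiguity concrete, here is a small sketch of the decode-then-encode idea from the question. The helper name requote is hypothetical. Under the assumption that inputs are ordinary text, the helper appears to be stable under repetition, but only because it treats every valid-looking escape as already encoded, so a raw string that happens to contain "%2C" becomes indistinguishable from an encoded comma.

```python
from urllib.parse import quote, unquote

def requote(s):
    """Hypothetical helper: decode any percent-escapes, then encode once."""
    return quote(unquote(s))

raw_comma = ","
literal_text = "%2C"   # raw text that merely looks like an escape

print(requote(raw_comma))      # %2C
print(requote(literal_text))   # %2C, the literal "%" was not preserved

# Repeating the helper does not change the result for this input.
print(requote(requote("a b,c")) == requote("a b,c"))   # True

# Decoding can also change meaning: an escaped slash becomes a real
# path separator, because "/" is in quote's default safe set.
print(requote("a%2Fb"))   # a/b
```

Whether this trade-off is acceptable depends on whether your inputs can ever contain a literal % or an escaped reserved character such as %2F.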

Depending on your use case, there are likely domain-specific workarounds you can use to simulate idempotence. For example, if you knew that the path portion of a given URL had been encoded but the scheme had not, you could encode the scheme only.
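A component-wise workaround of this kind could be sketched as follows. This assumes, purely for illustration, that the path is known to be encoded already and the query string is known to be raw. The function name encode_query_only, the example URL, and the choice of safe characters are all assumptions, not something from the question.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def encode_query_only(url):
    """Hypothetical helper: percent-encode the query, leave the rest as-is.

    Assumes the path is already encoded and the query is not.
    """
    parts = urlsplit(url)
    # Keep "=" and "&" so the key=value&key=value structure survives.
    encoded_query = quote(parts.query, safe="=&")
    return urlunsplit(parts._replace(query=encoded_query))

url = "https://example.com/a%20b/c?tag=x,y&note=hello world"
print(encode_query_only(url))
# https://example.com/a%20b/c?tag=x%2Cy&note=hello%20world
```

Note that this is still not idempotent: calling it a second time would double-encode the query. It only works because outside knowledge tells you which part needs exactly one pass.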

Emmett Butler answered Oct 29 '25 01:10
