
Why does java.net.URLEncoder give different results for the same string?

Tags:

java

url

encode

On the webapp server, when I encode "médicaux_Jérôme.txt" using java.net.URLEncoder, it gives the following string:

me%CC%81dicaux_Je%CC%81ro%CC%82me.txt

On my backend server, encoding the same string gives the following:

m%C3%A9dicaux_J%C3%A9r%C3%B4me.txt

Can someone help me understand why the output differs for the same input? Also, how can I get a standardized output each time I encode the same string?

dev2d asked Nov 23 '25 01:11


1 Answer

If you don't specify an encoding, the outcome depends on the platform's default encoding.

See the java.net.URLEncoder javadocs:

encode(String s)

Deprecated

The resulting string may vary depending on the platform's default encoding. Instead, use the encode(String,String) method to specify the encoding.

So, use the suggested method and specify the encoding:

String urlEncodedString = URLEncoder.encode(stringToBeUrlEncoded, "UTF-8");
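As a minimal runnable sketch of the call above (the class and method names here are made up for illustration, not part of any API), note that encode(String,String) declares the checked UnsupportedEncodingException, so it has to be caught or declared:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class ExplicitCharsetEncode {
    // Hypothetical helper: always encodes as UTF-8, so the result does not
    // depend on the platform's default encoding.
    static String encodeUtf8(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // \u00E9 is the precomposed "é", \u00F4 the precomposed "ô".
        System.out.println(encodeUtf8("m\u00E9dicaux_J\u00E9r\u00F4me.txt"));
        // prints m%C3%A9dicaux_J%C3%A9r%C3%B4me.txt
    }
}
```

On Java 10 and later there is also an overload taking a Charset, URLEncoder.encode(value, StandardCharsets.UTF_8), which avoids the checked exception.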

About the different representations of the same string, even when "UTF-8" is specified:

The two URL-encoded strings you gave in the question, although encoded differently, represent the same text, so there is nothing inherently wrong there. Pasting both into a decode tool shows that they display identically, even though the decoded strings are made of different code points.

This happens because Unicode offers more than one way to represent the same accented character (precomposed, or as a base letter followed by a combining accent), and each representation URL-encodes to different bytes. That is precisely what happens in your case.

In your case, specifically, the first string represents é as e + ´ (Latin small letter e, U+0065, followed by the combining acute accent, U+0301), which encodes to e%CC%81. The second represents é as a single code point (Latin small letter e with acute, U+00E9), which encodes to %C3%A9 (two % groups because it takes two bytes in UTF-8).
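To see this in code, here is a small sketch (class and method names are only illustrative) that builds both representations with explicit escapes and encodes each one. It assumes the two servers really do receive these two code point sequences:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class ComposedVsDecomposed {
    // Precomposed form: é = U+00E9, ô = U+00F4.
    static final String COMPOSED = "m\u00E9dicaux_J\u00E9r\u00F4me.txt";
    // Decomposed form: e + U+0301 (combining acute), o + U+0302 (combining circumflex).
    static final String DECOMPOSED = "me\u0301dicaux_Je\u0301ro\u0302me.txt";

    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // The strings look the same on screen but are not equal as Java strings.
        System.out.println(COMPOSED.equals(DECOMPOSED)); // prints false
        System.out.println(encode(COMPOSED));   // prints m%C3%A9dicaux_J%C3%A9r%C3%B4me.txt
        System.out.println(encode(DECOMPOSED)); // prints me%CC%81dicaux_Je%CC%81ro%CC%82me.txt
    }
}
```

Both outputs match the strings from the question, which suggests that URLEncoder is behaving the same on both servers and only its input differs.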

Again, there is no problem with either representation. They correspond to two Unicode normalization forms: the first is decomposed (NFD) and the second is composed (NFC). Mac OS X is known to produce file names with the combining accent; in the end, it depends on whatever produced the string. URLEncoder itself does not change the form, so in your case the two servers are most likely receiving the string in different forms, for example because the file name was user generated on a different OS (or tool).
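For the second part of the question (getting the same output every time), one option is to normalize the string to a single form before encoding it. The sketch below uses java.text.Normalizer from the JDK; choosing NFC is an assumption on my part (NFD would work equally well as long as you use it consistently), and the class and method names are only illustrative:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.text.Normalizer;

class NormalizedEncode {
    // Hypothetical helper: normalize to NFC first, then URL-encode as UTF-8.
    static String normalizeAndEncode(String value) {
        String nfc = Normalizer.normalize(value, Normalizer.Form.NFC);
        try {
            return URLEncoder.encode(nfc, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        String composed = "m\u00E9dicaux_J\u00E9r\u00F4me.txt";
        String decomposed = "me\u0301dicaux_Je\u0301ro\u0302me.txt";
        // Both lines print m%C3%A9dicaux_J%C3%A9r%C3%B4me.txt
        System.out.println(normalizeAndEncode(composed));
        System.out.println(normalizeAndEncode(decomposed));
    }
}
```

If you also compare or look up decoded names, apply the same normalization after decoding so both sides agree on one form.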

acdcjunior answered Nov 24 '25 13:11


