
Java String.toUpperCase()

Just the other day I ran into a strange bug. I had to build a string of characters, and the host system I was communicating with used char 254 as a delimiter. I built my string, upper-cased it with String.toUpperCase(), and sent it to the host. On the host I was receiving char 222 as the delimiter! After scratching my head and looking into it more deeply, it seemed odd that

hex : FE, binary: 11111110

was turning into

hex: DE, binary: 11011110

I tried passing Locale.getDefault() and Locale.ENGLISH to toUpperCase(), to no avail.

Could it be that the implementation of String.toUpperCase has a mask for ALL chars except specific hard coded ones?

For now I'm using the following to get around the problem:

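public static String toUpperCase(String input) {
    char[] chars = input.toCharArray();
    for (int i = 0; i < chars.length; ++i) {
        // Only touch ASCII 'a'..'z' (97..122); every other char, including 254, is left as-is.
        if (chars[i] >= 'a' && chars[i] <= 'z') {
            chars[i] &= 223; // clear bit 0x20 to get the uppercase ASCII letter
        }
    }
    return new String(chars);
}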

My question is: am I missing something? Is there a better way that I am not aware of? Thanks a bunch!

Arash Sharif asked Feb 26 '26 20:02


2 Answers

The Unicode character 254 is the lower case thorn, þ, a letter used in Icelandic that stands roughly for the "th" sound. Its upper case version is the character 222, upper case thorn Þ. What did you expect would happen?
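The mapping is easy to confirm in isolation. The following is only a minimal sketch: the class name ThornCaseDemo and the helper upper are made up for illustration, and Locale.ROOT is used just to keep the result independent of the JVM's default locale.

```java
import java.util.Locale;

class ThornCaseDemo {
    // Hypothetical helper: upper-cases with a fixed locale so the
    // result does not depend on the machine's default locale.
    static String upper(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        char delimiter = (char) 254;                 // U+00FE, lowercase thorn
        String message = "abc" + delimiter + "def";
        String result = upper(message);

        System.out.println((int) message.charAt(3));          // prints 254
        System.out.println((int) result.charAt(3));           // prints 222
        System.out.println(Character.isLowerCase(delimiter)); // prints true
    }
}
```

So toUpperCase() is not masking bits; it treats char 254 as a lowercase letter and returns its uppercase counterpart, the same way it turns 'a' into 'A'.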

Joni answered Mar 01 '26 09:03


Java strings are UTF-16 in general. The first 256 values of the char primitive type in Java are exactly the same as the Latin-1 (ISO-8859-1) character set. On a Latin-1 chart you can see that value 254 is the lowercase Icelandic thorn, so capitalizing it converts it to value 222, the uppercase Icelandic thorn.
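To see which other values in the upper half of Latin-1 behave the same way, a scan like the one below can help. It is a rough sketch, not anything from the JDK itself: the class name Latin1CaseScan and the helper fromLatin1 are invented here, and the scan only covers the single-char Character.toUpperCase mapping.

```java
import java.nio.charset.StandardCharsets;

class Latin1CaseScan {
    // Hypothetical helper: decodes one byte as ISO-8859-1 (Latin-1)
    // and returns the resulting char.
    static char fromLatin1(int byteValue) {
        byte[] one = { (byte) byteValue };
        return new String(one, StandardCharsets.ISO_8859_1).charAt(0);
    }

    public static void main(String[] args) {
        // Print every value from 128 to 255 whose uppercase form differs,
        // i.e. every value that is unsafe as a delimiter if the whole
        // string is later upper-cased.
        for (int value = 128; value < 256; value++) {
            char c = fromLatin1(value);
            char upper = Character.toUpperCase(c);
            if (upper != c) {
                System.out.println(value + " -> " + (int) upper);
            }
        }
    }
}
```

Any value that shows up in the output changes under upper-casing, and 254 is one of them.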

The moral is: don't use values which have case as delimiters in a String.
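If the host dictates char 254 and the delimiter cannot be changed, one alternative is to upper-case each field before the delimiter is added, so the delimiter never passes through toUpperCase(). The sketch below assumes the fields themselves never contain char 254; the class DelimitedUpper and its join method are hypothetical names, not part of any library.

```java
import java.util.Locale;

class DelimitedUpper {
    // Delimiter required by the host system in the question.
    static final char DELIMITER = (char) 254;

    // Upper-cases each field on its own, then joins the fields with the
    // delimiter. The delimiter is appended untouched.
    static String join(String... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(DELIMITER);
            }
            sb.append(fields[i].toUpperCase(Locale.ROOT));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String message = join("abc", "def");
        System.out.println((int) message.charAt(3)); // prints 254
    }
}
```

Unlike an ASCII-only bit mask, this keeps the full Unicode case mapping for the field contents while leaving the delimiter alone.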

Cory Kendall answered Mar 01 '26 09:03


