
strlen function gives the wrong length when the string contains non-English characters

I have a program whose input field may contain non-English characters. Because we use strlen, the computed length is not what we expect when the string contains such a character. For the input nova the result is 4, but for the input ñova the result is 5, whereas it should be 4.

  1. strlen("nova") = 4
  2. strlen("ñova") = 5

In the second case, I would expect the output to be 4.

Bover asked Dec 28 '25 06:12



1 Answer

Remember that strlen returns the number of char elements (bytes) in the string before the terminating null, which is not necessarily the same as the number of visible glyphs when it's printed.

The result will depend on your system's character encoding. With ISO-8859-1, "ñova" is the same as {241, 111, 118, 97, 0} (length 4), but if you use UTF-8, for example, then ñ is a multi-byte character and the string is represented as {195, 177, 111, 118, 97, 0} (length 5).
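To make the byte-versus-character difference concrete, here is a minimal sketch that counts code points by skipping UTF-8 continuation bytes (those of the form 10xxxxxx). The function name utf8_codepoints is made up for this illustration, and the code assumes the input is already valid UTF-8; it does no validation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts code points in a string that is ASSUMED
 * to be valid UTF-8. Every code point starts with exactly one byte that
 * is not a continuation byte (10xxxxxx), so we count only those bytes. */
size_t utf8_codepoints(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

With the UTF-8 bytes written out explicitly, strlen("\xC3\xB1ova") is 5 while utf8_codepoints("\xC3\xB1ova") is 4, which matches the two byte sequences listed above.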

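If you want to count the number of codepoints, then you probably want to be using mbrlen() instead of strlen(). Note that mbrlen() reports the byte length of one multibyte character in the current locale's encoding, so you call it repeatedly and count the calls. If you want to count the number of "user" characters, taking account of combining accents and the like, then you really need a character-handling library such as ICU.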
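As a rough sketch of the mbrlen() approach, the function below walks the string one multibyte character at a time. The name mb_char_count is hypothetical, and the result depends on the active LC_CTYPE locale: the caller is assumed to have called setlocale(LC_ALL, "") (from <locale.h>) with a UTF-8 locale in the environment, otherwise non-ASCII bytes may be rejected or counted one by one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: counts multibyte characters in s according to the
 * current LC_CTYPE locale. Returns (size_t)-1 if s contains a sequence
 * that is invalid or incomplete in that locale. */
size_t mb_char_count(const char *s)
{
    assert(s != NULL);

    mbstate_t state;
    memset(&state, 0, sizeof state);   /* start in the initial shift state */

    size_t remaining = strlen(s);      /* bytes left to examine */
    size_t count = 0;

    while (remaining > 0) {
        size_t len = mbrlen(s, remaining, &state);
        if (len == (size_t)-1 || len == (size_t)-2)
            return (size_t)-1;         /* invalid or truncated character */
        if (len == 0)
            break;                     /* null character; not expected here */
        s += len;
        remaining -= len;
        ++count;
    }
    return count;
}
```

For plain ASCII input such as "nova" this returns 4 in any locale. Under a UTF-8 locale, "ñova" should also give 4, though that depends on the locale actually being available on the system.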

Toby Speight answered Dec 30 '25 21:12



