strlen gives the wrong length when the string contains non-English characters

Question

I have a program that accepts input which may contain non-English characters. Because we use strlen to calculate the length of the string, we do not get the expected length when the string contains a non-English character. For the input nova the output is 4, but for the input ñova the output is 5, whereas it should be 4.

strlen("nova") = 4
strlen("ñova") = 5

In the second case, I would expect the output to be 4.

Toby Speight · Accepted Answer

Remember that strlen returns the number of char elements in the string (bytes, not counting the terminating null), which is not necessarily the same as the number of visible glyphs when it's printed.

The result depends on your system's character encoding. With ISO-8859-1, "ñova" is stored as { 241, 111, 118, 97, 0 } (length 4), but with UTF-8, for example, ñ is a multi-byte character and the string is stored as { 195, 177, 111, 118, 97, 0 } (length 5).
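To see the difference directly, here is a minimal sketch that spells out both encodings of "ñova" with explicit byte escapes and prints what strlen sees. The names nova_latin1, nova_utf8 and dump_bytes are made up for this illustration.

```c
#include <stdio.h>
#include <string.h>

/* "ñova" written out byte by byte, so the result does not depend on
   the encoding of this source file. */
static const char nova_latin1[] = "\xF1" "ova";      /* ISO-8859-1 */
static const char nova_utf8[]   = "\xC3\xB1" "ova";  /* UTF-8 */

/* Hypothetical helper: print each byte of s as an unsigned value and
   return the number of bytes printed (the same value strlen reports). */
size_t dump_bytes(const char *s)
{
    size_t n = strlen(s);
    for (size_t i = 0; i < n; ++i)
        printf("%u ", (unsigned)(unsigned char)s[i]);
    printf("(strlen = %zu)\n", n);
    return n;
}
```

Calling dump_bytes(nova_latin1) prints 241 111 118 97 (strlen = 4), while dump_bytes(nova_utf8) prints 195 177 111 118 97 (strlen = 5).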

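If you want to count the number of code points, then you probably want to be using mbrlen() instead of strlen(): it reports how many bytes make up the next multibyte character in the current locale's encoding, so calling it repeatedly lets you count characters rather than bytes. If you want to count the number of "user" characters, taking account of combining accents and the like, then you really need a character-handling library such as ICU.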
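As a rough sketch of the mbrlen() approach, the function below walks the string one multibyte character at a time and counts the steps. The name count_mb_chars is hypothetical, and the code assumes the program has already selected a suitable locale with setlocale(), since mbrlen() decodes according to the current LC_CTYPE setting.

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the multibyte characters in s according to
   the current LC_CTYPE locale. Returns (size_t)-1 if s contains an
   invalid or incomplete multibyte sequence. */
size_t count_mb_chars(const char *s)
{
    mbstate_t state;
    memset(&state, 0, sizeof state);    /* start in the initial state */

    size_t remaining = strlen(s);       /* bytes still to be examined */
    size_t count = 0;

    while (remaining > 0) {
        /* n = number of bytes that form the next character */
        size_t n = mbrlen(s, remaining, &state);
        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;          /* invalid or truncated sequence */
        if (n == 0)
            break;                      /* null character: end of string */
        s += n;
        remaining -= n;
        ++count;
    }
    return count;
}
```

A program would typically call setlocale(LC_CTYPE, "") once at startup so that the user's environment decides the encoding. Under a UTF-8 locale, count_mb_chars should then report 4 for the UTF-8 bytes of "ñova", where strlen reports 5. Note that this counts code points, not "user" characters: a letter followed by a combining accent still counts as two.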

Tags: c, encoding, string-length, strlen, non-english

Bover
