
Some UTF-8 characters do not show up on browser

Some UTF-8 characters do not display correctly in the browser. For example, the byte sequence C2 96, which the table linked below lists as the UTF-8 encoding of a hyphen, is displayed as a box containing 00 96 rather than as '-'. What causes this behavior, and how do we correct it?

http://stuffofinterest.com/misc/utf8.php?s=128 (Refer this URL for the codes)

I found that this can be handled with HTML entities. Is there any way to display the character without converting it to an HTML entity?

Krishna asked Dec 18 '22 06:12



2 Answers

The character you're talking about is an en dash, not a hyphen. Its Unicode code point is U+2013, and its UTF-8 encoding is E2 80 93, not C2 96. The table you linked to is incorrect. The first two columns have nothing to do with UCS-2 or Unicode; they actually contain the windows-1252 encodings for the characters in question. The columns labeled "UTF-8 Hex" and "UTF-8 Native" are just plain wrong, at least for the rows labeled 128 to 159. Entities such as &ndash; and &#8211; represent an en dash, but the UTF-8 sequence C2 96 represents U+0096, a non-displayable control character.
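The byte values above are easy to check for yourself. Here is a minimal Python sketch (the variable names are only illustrative) that encodes the en dash and decodes the two byte sequences in question:

```python
# The en dash and its real UTF-8 encoding.
en_dash = "\u2013"
en_dash_utf8 = en_dash.encode("utf-8")

# C2 96 is valid UTF-8, but it decodes to the control character U+0096.
c2_96_decoded = b"\xc2\x96".decode("utf-8")

# The single byte 0x96 means "en dash" only in windows-1252.
cp1252_decoded = b"\x96".decode("cp1252")

print(" ".join(f"{b:02X}" for b in en_dash_utf8))  # bytes of the en dash in UTF-8
print(f"U+{ord(c2_96_decoded):04X}")               # code point behind C2 96
print(f"U+{ord(cp1252_decoded):04X}")              # code point behind cp1252 byte 96
```

The first line printed is the three-byte sequence for U+2013, while C2 96 comes back as U+0096, which is why the browser shows a box labeled 00 96.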

You shouldn't need to encode those characters manually anyway. Just tell your text editor (or whatever you use to create the content) to save the file as UTF-8, and make sure the page is served with a matching charset declaration.
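If the content already contains these stray characters, one possible repair is sketched below. It assumes the text was originally windows-1252 and that each byte in the range 80 to 9F was carried over as the control character with the same number; that may not match how your data was produced. The function name repair_c1_controls is hypothetical, not part of any library.

```python
def repair_c1_controls(text: str) -> str:
    """Map C1 controls (U+0080-U+009F) to their windows-1252 characters.

    Assumption: each such control character is really a windows-1252 byte
    that was copied through without being converted.
    """
    out = []
    for ch in text:
        if "\x80" <= ch <= "\x9f":
            try:
                ch = bytes([ord(ch)]).decode("cp1252")
            except UnicodeDecodeError:
                pass  # byte is undefined in windows-1252; leave it unchanged
        out.append(ch)
    return "".join(out)


# C2 96 decodes to U+0096; the repair turns it into a real en dash.
broken = b"pages 10\xc2\x9620".decode("utf-8")
fixed = repair_c1_controls(broken)
fixed_bytes = fixed.encode("utf-8")  # now contains E2 80 93
```

Once repaired and saved as UTF-8, the en dash needs no HTML entity.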

Alan Moore answered Apr 09 '23 03:04



I suspect this is because the characters between U+0080 and U+009F inclusive are control characters. I'm still slightly surprised that they show differently when encoded directly in the HTML than using entities, but basically you shouldn't be using them to start with. U+0096 isn't really "hyphen", it's "start of guarded area".

See the U+0080-U+00FF code chart for more information. Basically, try to avoid control characters...
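To see how Unicode classifies these characters without opening the code chart, something like the following should work. It uses Python's standard unicodedata module; the helper describe is just an illustrative name.

```python
import unicodedata


def describe(ch: str) -> tuple:
    """Return (code point label, general category, character name)."""
    # Control characters have no formal name, so supply a fallback.
    name = unicodedata.name(ch, "<no name>")
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), name)


# Compare the control character with the en dash and the ASCII hyphen.
for ch in ("\u0096", "\u2013", "-"):
    print(*describe(ch))
```

Category Cc marks a control character, while the en dash and the ASCII hyphen-minus both fall under Pd (dash punctuation).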

Jon Skeet answered Apr 09 '23 04:04
