I have inherited a database which contains strings such as:
\u5353\u8d8a\u4e9a\u9a6c\u900a: \u7f51\u4e0a\u8d2d\u7269: \u5728\u7ebf\u9500\u552e\u56fe\u4e66\uff0cDVD\uff0cCD\uff0c\u6570\u7801\uff0c\u73a9\u5177\uff0c\u5bb6\u5c45\uff0c\u5316\u5986
The question is, how do I get this to be displayed properly in an HTML page?
I'm using PHP5 to process the strings.
1) I downloaded and installed a Unicode font named CODE2000
2) I wrote this:
<?php header('Content-Type: text/html;charset=utf-8'); ?>
<html>
<head></head>
<body style="font-family: CODE2000">
<?php
// I had to remove some strings like ': ', 'DVD', 'CD' to make it in \uXXXX format
$s = '\u5353\u8d8a\u4e9a\u9a6c\u900a\u7f51\u4e0a\u8d2d\u7269\u5728\u7ebf\u9500\u552e\u56fe\u4e66\uff0c\uff0c\uff0c\u6570\u7801\uff0c\u73a9\u5177\uff0c\u5bb6\u5c45\uff0c\u5316\u5986';
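// explode() yields an empty first element because $s starts with '\u'; skip it
$chars = explode('\\u', $s);
foreach ($chars as $char) {
    if ($char === '') {
        continue;
    }
    // \uXXXX escapes are big-endian UTF-16 code units; plain 'utf-16' without a BOM
    // leaves the byte order up to the iconv implementation
    $c = iconv('UTF-16BE', 'UTF-8', hex2str($char));
    print $c;
}
// Convert a string of hex digits into the raw bytes they represent
function hex2str($hex) {
    $r = '';
    for ($i = 0; $i < strlen($hex) - 1; $i += 2) {
        $r .= chr(hexdec($hex[$i] . $hex[$i + 1]));
    }
    return $r;
}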
?>
</body>
</html>
3) It produced these characters: http://img267.imageshack.us/img267/9759/49139858.png which could be correct. E.g. the 1st character (5353) and the 2nd one (8d8a) both appear to match the expected glyphs for those code points. Of course I cannot be 100% sure, but it seems to fit. Maybe you can take it from here.
That was a good exercise :)
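As an alternative to stripping the non-escaped parts (': ', 'DVD', 'CD') by hand, the \uXXXX sequences can be decoded in place with a regular expression. This is only a minimal sketch, assuming PHP 5.3 or later (for the closure) and the iconv extension; decode_unicode_escapes is a hypothetical helper name, not something from the code above.

```php
<?php
// Hypothetical helper: decode literal \uXXXX escapes inside $text to UTF-8,
// leaving everything else (': ', 'DVD', 'CD', ...) untouched.
function decode_unicode_escapes($text) {
    return preg_replace_callback(
        // Match a whole run of escapes so surrogate pairs stay together
        '/(?:\\\\u[0-9a-fA-F]{4})+/',
        function ($m) {
            // Drop the '\u' markers, leaving only the hex digits
            $hex = str_replace('\\u', '', $m[0]);
            // The hex digits are big-endian UTF-16 code units
            return iconv('UTF-16BE', 'UTF-8', pack('H*', $hex));
        },
        $text
    );
}

$decoded = decode_unicode_escapes('\u5353\u8d8a: \u56fe\u4e66\uff0cDVD\uff0cCD');
echo $decoded, "\n";
```

Matching runs of escapes rather than single ones is a design choice: a character outside the Basic Multilingual Plane is written as two \uXXXX units, and converting them one at a time would fail.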
PHP 5 is woefully unaware of Unicode (its strings are plain byte sequences), so you have to do everything yourself:
Let the browser know which encoding you are using. There are several ways of doing this:
1) Set a charset value in the Content-Type header. Something like header('Content-Type: text/html; charset=utf-8');
2) Use a <meta http-equiv> version of the above header.
3) Set the encoding in the XML declaration: <?xml version="1.0" encoding="utf-8"?>
Option 1 takes precedence over option 2. I'm not sure where option 3 fits in.
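The first two ways of declaring the encoding can be combined in one page. The following is a rough sketch, not a definitive template; render_utf8_page is a hypothetical helper name, and it assumes the text passed in is already UTF-8.

```php
<?php
// Hypothetical helper: build a minimal HTML page around text that is already UTF-8.
function render_utf8_page($text) {
    // Escape with an explicit charset so multibyte sequences are left intact
    $safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    return "<html>\n<head>\n"
        . "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">\n"
        . "</head>\n<body>" . $safe . "</body>\n</html>\n";
}

// The HTTP header has to be sent before any output
header('Content-Type: text/html; charset=utf-8');
echo render_utf8_page("\xE5\x8D\x93\xE8\xB6\x8A"); // UTF-8 bytes for U+5353 U+8D8A
```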
If you need to do any string processing prior to displaying the data, make sure you use the multibyte (mb_*) string functions. If you have Unicode data coming from other sources in other encodings, you'll need to use mb_convert_encoding.
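To illustrate why the byte-oriented functions are not enough, here is a small sketch comparing strlen with its mb_* counterpart and converting text from another encoding. It assumes the mbstring extension is loaded; the sample strings are made up for illustration.

```php
<?php
$text = "\xE5\x8D\x93\xE8\xB6\x8A"; // UTF-8 bytes for U+5353 U+8D8A (two characters)

$byteCount = strlen($text);                   // counts bytes
$charCount = mb_strlen($text, 'UTF-8');       // counts characters
$first     = mb_substr($text, 0, 1, 'UTF-8'); // first character, all three bytes of it

// Text arriving in another encoding, here ISO-8859-1 ("cafe" with an accented e as one byte)
$latin1 = "caf\xE9";
$utf8   = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

printf("%d bytes, %d characters\n", $byteCount, $charCount);
```

Passing the encoding explicitly to each mb_* call avoids depending on the mbstring.internal_encoding setting of the server.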