
How to get \uXXXX to display correctly, using PHP5

I have inherited a database which contains strings such as:

\u5353\u8d8a\u4e9a\u9a6c\u900a: \u7f51\u4e0a\u8d2d\u7269: \u5728\u7ebf\u9500\u552e\u56fe\u4e66\uff0cDVD\uff0cCD\uff0c\u6570\u7801\uff0c\u73a9\u5177\uff0c\u5bb6\u5c45\uff0c\u5316\u5986

The question is, how do I get this to be displayed properly in an HTML page?

I'm using PHP5 to process the strings.


2 Answers

1) I downloaded and installed a Unicode font named CODE2000.

2) I wrote this:

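<?php header('Content-Type: text/html;charset=utf-8'); ?>
<html>
<head></head>
<body style="font-family: CODE2000">
<?php
// I had to remove some strings like ': ', 'DVD', 'CD' so that the input is purely in \uXXXX format
$s = '\u5353\u8d8a\u4e9a\u9a6c\u900a\u7f51\u4e0a\u8d2d\u7269\u5728\u7ebf\u9500\u552e\u56fe\u4e66\uff0c\uff0c\uff0c\u6570\u7801\uff0c\u73a9\u5177\uff0c\u5bb6\u5c45\uff0c\u5316\u5986';
// Split on the literal "\u" prefix; each remaining chunk is four hex digits.
$chars = explode('\\u', $s);
foreach ($chars as $char) {
  if ($char === '') continue; // explode() yields an empty first chunk
  // The escapes are big-endian UTF-16 code units. Name the byte order explicitly:
  // plain 'utf-16' without a BOM may be decoded differently depending on the iconv implementation.
  $c = iconv('UTF-16BE', 'UTF-8', hex2str($char));
  print $c;
}

// Convert a string of hex digits into the raw bytes it encodes.
function hex2str($hex) {
  $r = '';
  for ($i = 0; $i < strlen($hex) - 1; $i += 2) {
    $r .= chr(hexdec($hex[$i] . $hex[$i + 1]));
  }
  return $r;
}
?>
</body>
</html>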

3) It produced these characters: http://img267.imageshack.us/img267/9759/49139858.png which could be correct. For example, the 1st character (5353) and the 2nd one (8d8a) both match what I found when I looked them up. Of course I cannot be 100% sure, but it seems to fit. Maybe you can take it from here.
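
One way to avoid stripping the ': ', 'DVD' and 'CD' parts by hand might be to decode the escapes in place and leave everything else alone. The sketch below is not part of the original answer: decode_u_escapes is a hypothetical helper name, and it assumes PHP 5.3+ (for the closure) with the mbstring extension loaded. It converts each run of consecutive \uXXXX escapes as one UTF-16BE string, so surrogate pairs should survive as well.

```php
<?php
// Hypothetical helper (an assumption, not from the original answer):
// replaces every run of \uXXXX escapes with the UTF-8 text it encodes.
function decode_u_escapes($text)
{
    return preg_replace_callback(
        // One or more consecutive escapes: a literal backslash, "u", four hex digits.
        '/(?:\\\\u[0-9a-fA-F]{4})+/',
        function ($m) {
            // Drop the "\u" prefixes so only the hex digits remain.
            $hex = str_replace('\\u', '', $m[0]);
            // pack('H*') turns the hex digits into raw bytes (big-endian UTF-16),
            // which mb_convert_encoding() then converts to UTF-8.
            return mb_convert_encoding(pack('H*', $hex), 'UTF-8', 'UTF-16BE');
        },
        $text
    );
}

// Plain ASCII between the escapes is passed through unchanged.
$raw = '\u5353\u8d8a: DVD\uff0cCD';
echo decode_u_escapes($raw), "\n";
```

Because the input here is single-quoted, PHP keeps the backslashes literally, which mirrors how the strings are stored in the database.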

That was a good exercise :)

answered Dec 12 '25 04:12 by daremon


PHP 5 is woefully unaware of Unicode (its strings are plain byte sequences), so you have to do everything yourself:

  • Make sure that your database is using a Unicode-capable encoding for its connections. In MySQL, for example, the relevant directive is default-character-set. UTF-8 is a reasonable choice.
  • Let the browser know which encoding you are using. There are several ways of doing this:

    1. Set a charset value in the Content-Type header. Something like header('Content-Type: text/html;charset=utf-8');

    2. Use a <meta http-equiv> version of the above header.

    3. Set the encoding in the XML declaration: <?xml version="1.0" encoding="utf-8"?>

Option 1 takes precedence over option 2. I'm not sure where option 3 fits in.
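
As a rough illustration of options 1 and 2 together, here is a minimal sketch. render_utf8_page is a hypothetical helper name, and the sketch assumes the text passed in is already UTF-8. The header is only sent if no output has gone out yet, and the meta tag acts as a fallback.

```php
<?php
// Hypothetical helper (an assumption for illustration): wraps UTF-8 text
// in a minimal HTML page that declares its encoding in two places.
function render_utf8_page($utf8Text)
{
    // Option 1: the HTTP header. It can only be set before any output is sent.
    if (!headers_sent()) {
        header('Content-Type: text/html; charset=utf-8');
    }

    // Option 2: the <meta http-equiv> equivalent inside the document.
    return "<html>\n<head>\n"
        . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . "\n"
        . "</head>\n<body>"
        // Tell htmlspecialchars() the encoding too, so multibyte text is escaped safely.
        . htmlspecialchars($utf8Text, ENT_QUOTES, 'UTF-8')
        . "</body>\n</html>\n";
}

// "\xE5\x8D\x93\xE8\xB6\x8A" is the UTF-8 byte sequence for U+5353 U+8D8A.
echo render_utf8_page("\xE5\x8D\x93\xE8\xB6\x8A");
```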

If you need to do any string processing prior to displaying the data, make sure you use the multibyte (mb_*) string functions. If you have Unicode data coming from other sources in other encodings, you'll need to use mb_convert_encoding.
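
To show why the mb_* functions matter, here is a small sketch that assumes the mbstring extension is loaded. The sample strings are made up for illustration. The byte-oriented strlen() counts bytes, while mb_strlen() and mb_substr() work on characters once they are told the encoding.

```php
<?php
// Two Chinese characters (U+5353 U+8D8A), six bytes in UTF-8.
$s = "\xE5\x8D\x93\xE8\xB6\x8A";

$byteLen = strlen($s);                   // counts bytes: 6
$charLen = mb_strlen($s, 'UTF-8');       // counts characters: 2
$first   = mb_substr($s, 0, 1, 'UTF-8'); // first character, three bytes long

// Data arriving in another encoding (here ISO-8859-1) is converted to UTF-8
// before it is mixed with the rest of the page.
$latin1 = "caf\xE9"; // "café" in ISO-8859-1
$utf8   = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

echo $byteLen, ' ', $charLen, ' ', bin2hex($first), ' ', bin2hex($utf8), "\n";
```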

answered Dec 12 '25 04:12 by oggy


