I have the following code:
header('Content-type: text/html; charset=utf-8');
$str = 'áá áá';
echo $str."\n";
echo mb_convert_case($str, MB_CASE_TITLE)."\n";
echo bin2hex($str)."\n";
echo bin2hex(mb_convert_case($str, MB_CASE_TITLE))."\n";
Using PHP 5.2.2, I get the following output:
áá áá
áá áá
c3a1c3a120c3a1c3a1
c3a1c3a120c3a1c3a1
Using PHP 5.4.3, I get this:
áá áá
á� á�
c3a1c3a120c3a1c3a1
c3a1e3a120c3a1e3a1
My expected output in both cases would have been:
áá áá
Áá Áá
c3a1c3a120c3a1c3a1
c381c3a120c381c3a1
So I have two questions:
Either pass the $encoding argument to every mb_* function call, or set the internal encoding once with:
mb_internal_encoding("UTF-8");
so that PHP knows which encoding you're working with. Otherwise the encoding comes from php.ini (the mbstring.internal_encoding setting), or, in these PHP versions, defaults to ISO-8859-1 if it isn't set there either.
So your 5.4 installation is defaulting to ISO-8859-1 and is therefore lowercasing the lead byte of the second UTF-8 sequence in each word: 0xC3 is 'Ã' in ISO-8859-1, and its lowercase form 'ã' is 0xE3, which is exactly the c3a1e3a1 you see. That breaks the sequence. The same happens for me in 5.2, so maybe something else is different about your 5.2 installation. Perhaps mbstring.internal_encoding in the ini is set to an encoding that has no letters at those byte positions?
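To make the two options concrete, here is a minimal sketch. It assumes the mbstring extension is loaded; the variable names ($explicit, $viaDefault) are only illustrative, and the input is written as escaped bytes so the result doesn't depend on how the file itself is saved.

```php
<?php
// Sketch only: assumes the mbstring extension is available.
// 'áá áá' spelled out as UTF-8 bytes, independent of the source file's encoding.
$str = "\xC3\xA1\xC3\xA1 \xC3\xA1\xC3\xA1";

// Option 1: pass the encoding explicitly on every call.
$explicit = mb_convert_case($str, MB_CASE_TITLE, 'UTF-8');

// Option 2: set the internal encoding once; later mb_* calls that omit
// the $encoding argument will use it.
mb_internal_encoding('UTF-8');
$viaDefault = mb_convert_case($str, MB_CASE_TITLE);

echo bin2hex($explicit), "\n";   // c381c3a120c381c3a1
echo bin2hex($viaDefault), "\n"; // c381c3a120c381c3a1
```

Both calls should give the bytes you expected (Áá Áá). Passing the encoding explicitly is the more defensive choice if the code may run on servers whose php.ini you don't control.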