I'm having a problem splitting a word that contains UTF-8 Polish characters.
I've been checking the PHP docs for str_split, but there's no parameter to set the charset.
I have the word "mała", and I have to split it into letters, wrap each letter in a span, and return the resulting HTML string.
Result of str_split('mała'):
array:5 [
0 => "m"
1 => "a"
2 => b"Å"
3 => b"‚"
4 => "a"
]
json_last_error_msg() returns "Malformed UTF-8 characters, possibly incorrectly encoded", so, as I thought, the problem is related to the Polish letters, but I can't find a way to set a charset for str_split.
Here's prepared array to be JSON encoded:
array:2 [
"pieces" => array:6 [
0 => "<span class="dropable">m</span><span class="dropable">a</span><span class="dropable">m</span><span class="dropable">a</span>"
1 => "<span class="dropable">m</span><span class="dropable">a</span><span class="dropable">s</span><span class="dropable">a</span>"
2 => "<span class="dropable">m</span><span class="dropable">a</span><span class="dropable">p</span><span class="dropable">a</span>"
3 => b"<span class="dropable">m</span><span class="dropable">a</span><span class="dropable">Å</span><span class="dropable">‚</span><span class="dropable">a</span>"
4 => "<span class="dropable">m</span><span class="dropable">a</span><span class="dropable">c</span><span class="dropable">a</span>"
5 => "<span class="dropable">m</span><span class="dropable">a</span><span class="dropable">t</span><span class="dropable">a</span>"
]
"engine" => "Wstepne"
]
Index 3 has a weird "b" in front of the string and contains these malformed characters.
The code that generates these strings is:
$htmlString = '';
foreach (str_split($piece) as $key => $letter) {
    $htmlString .= '<span class="dropable">' . $letter . '</span>';
}
return $htmlString;
I tried calling utf8_encode on $letter. That fixed the problem with the b in front of the string, but it still creates two spans for one letter:
3 => "<span class="dropable">m</span><span class="dropable">a</span><span class="dropable">Å</span><span class="dropable">‚</span><span class="dropable">a</span>"
Any more ideas?
Thanks for help
str_split works at the byte level, not the character level (despite its name). So in fact you're splitting mała along its bytes and not along its characters. That's why you're getting an array of five items instead of four. Indexes 2 and 3 together form the UTF-8 encoding of ł.
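To see the byte/character mismatch directly, here is a minimal sketch. It assumes the script file itself is saved as UTF-8, in which case ł (U+0142) is stored as the two bytes C5 82; the variable name $word is only for illustration.

```php
<?php
// Assumption: this source file is saved as UTF-8.
$word = 'mała';

echo strlen($word), "\n";              // counts bytes: 5
echo mb_strlen($word, 'UTF-8'), "\n";  // counts characters: 4
echo bin2hex($word), "\n";             // 6d61c58261 (c5 82 is the letter ł)
```

The two stray items in the str_split output are those C5 and 82 bytes, each displayed on its own as if it were a single-byte character.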
You need to use either the mbstring or the iconv extension to split your string manually.
$str = 'mała';
$len = mb_strlen($str, 'UTF-8');
$result = [];
for ($i = 0; $i < $len; $i++) {
    $result[] = mb_substr($str, $i, 1, 'UTF-8');
}
var_dump($result);
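Applied to the span-wrapping code from the question, the same idea could look like the sketch below. It is one possible approach, not the only one: it splits with preg_split and the /u modifier instead of the mb_substr loop, and on PHP 7.4 or later mb_str_split($piece, 1, 'UTF-8') should give the same array. The function name wrapLetters and the htmlspecialchars call are additions for illustration, not part of the original code.

```php
<?php
// Hypothetical helper; the name wrapLetters() is not from the question.
function wrapLetters(string $piece): string
{
    // The /u modifier makes PCRE treat the subject as UTF-8, so the empty
    // pattern splits between code points instead of between bytes.
    $letters = preg_split('//u', $piece, -1, PREG_SPLIT_NO_EMPTY);
    if ($letters === false) {
        // preg_split() returns false when the input is not valid UTF-8.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    $htmlString = '';
    foreach ($letters as $letter) {
        // Escaping is an added precaution in case a piece contains < or &.
        $htmlString .= '<span class="dropable">'
            . htmlspecialchars($letter, ENT_QUOTES, 'UTF-8')
            . '</span>';
    }
    return $htmlString;
}

$html = wrapLetters('mała');
echo $html, "\n";

// Each span now holds a complete character, so json_encode() accepts it.
echo json_encode(['pieces' => [$html]], JSON_UNESCAPED_UNICODE), "\n";
```

Note that this splits by Unicode code point. That is enough for precomposed Polish letters such as ł, but a letter written as a base character plus a combining mark would still end up in two spans.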