 

PHP str_split and UTF-8 Polish characters [duplicate]

Tags:

php

utf-8

I'm having a problem splitting a word that contains UTF-8 Polish characters.

I've checked the PHP docs for str_split, but there's no parameter for setting the charset.

I have the word "mała", and I need to split it into letters, wrap each letter in a span, and return the result as an HTML string.

Result of str_split('mała'):

array:5 [
  0 => "m"
  1 => "a"
  2 => b"Å"
  3 => b"‚"
  4 => "a"
]

json_last_error_msg() returns the error "Malformed UTF-8 characters, possibly incorrectly encoded", so, as I suspected, the problem is related to the Polish letters, but I can't find a way to set the charset for str_split.
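A minimal reproduction of that error, assuming the PHP file is saved as UTF-8 (so the literal 'mała' is five bytes). It shows that json_encode() rejects the lone halves of "ł" that str_split() produces:

```php
<?php
// Assumes this file is saved as UTF-8, so 'mała' is the bytes m, a, 0xC5, 0x82, a.
$pieces = str_split('mała');      // splits on bytes: "ł" becomes "\xC5" and "\x82"
$json = json_encode($pieces);     // false: a lone "\xC5" or "\x82" is not valid UTF-8

var_dump($json);                  // bool(false)
echo json_last_error_msg(), "\n"; // Malformed UTF-8 characters, possibly incorrectly encoded
```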

Here's the prepared array to be JSON-encoded:

array:2 [
  "pieces" => array:6 [
    0 => "<span class="dropable">m</span><span class="dropable">a</span><span class="dropable">m</span><span class="dropable">a</span>"
    1 => "<span class="dropable">m</span><span class="dropable">a</span><span class="dropable">s</span><span class="dropable">a</span>"
    2 => "<span class="dropable">m</span><span class="dropable">a</span><span class="dropable">p</span><span class="dropable">a</span>"
    3 => b"<span class="dropable">m</span><span class="dropable">a</span><span class="dropable">Å</span><span class="dropable">‚</span><span class="dropable">a</span>"
    4 => "<span class="dropable">m</span><span class="dropable">a</span><span class="dropable">c</span><span class="dropable">a</span>"
    5 => "<span class="dropable">m</span><span class="dropable">a</span><span class="dropable">t</span><span class="dropable">a</span>"
  ]
  "engine" => "Wstepne"
]

Index 3 has a strange "b" in front of the string, plus these malformed characters.

The code that generates these strings is:

$htmlString = '';
foreach(str_split($piece) as $key => $letter){ 
    $htmlString .= '<span class="dropable">'.$letter.'</span>';
}
return $htmlString;
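The "b" prefix appears to be how Symfony's VarDumper (which Laravel's dd() uses) marks strings that are not valid UTF-8. A minimal sketch, assuming a UTF-8 source file and the mbstring extension, that runs the same byte-based loop and checks each result with mb_check_encoding(), which should flag the same entry the dumper does:

```php
<?php
// Builds the HTML with the byte-based str_split() loop from the question, then
// checks whether each result is valid UTF-8. Assumes the file is saved as UTF-8.
$results = [];
foreach (['mama', 'mała'] as $word) {
    $html = '';
    foreach (str_split($word) as $letter) {
        $html .= '<span class="dropable">' . $letter . '</span>';
    }
    // mb_check_encoding() returns false if any byte sequence is not valid UTF-8
    $results[$word] = mb_check_encoding($html, 'UTF-8');
    echo $word, ': ', $results[$word] ? 'valid UTF-8' : 'invalid UTF-8', "\n";
}
```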

I tried applying utf8_encode to $letter. That removed the "b" in front of the string, but it still creates two spans for ł:

3 => "<span class="dropable">m</span><span class="dropable">a</span><span class="dropable">Å</span><span class="dropable">‚</span><span class="dropable">a</span>"
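utf8_encode() assumes its input is ISO-8859-1 and re-encodes every byte as a separate character, so it turns each broken half into valid but wrong UTF-8 rather than rejoining them. A sketch of that effect, using mb_convert_encoding() from ISO-8859-1 in place of utf8_encode() (which is deprecated as of PHP 8.2), and assuming a UTF-8 source file:

```php
<?php
// Splits "ł" into its two UTF-8 bytes, then re-encodes each byte as if it were
// ISO-8859-1, which is the conversion utf8_encode() performs.
$bytes = str_split('ł');   // ["\xC5", "\x82"] when the file is saved as UTF-8

$converted = [];
foreach ($bytes as $byte) {
    $converted[] = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');
}

// Each piece is now valid UTF-8, but it is a different character:
// 0xC5 -> "Å" (U+00C5), 0x82 -> the control character U+0082.
foreach ($converted as $c) {
    echo bin2hex($c), ' ', mb_check_encoding($c, 'UTF-8') ? 'valid' : 'invalid', "\n";
}
```

Because the two bytes are still converted one at a time, the result is still two spans.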

Any more ideas?

Thanks for your help.

asked Sep 07 '25 12:09 by Michał Staniewski



1 Answer

str_split works at the byte level, not the character level (despite its name). So you're actually splitting mała along its bytes rather than its characters, which is why you get an array of five items instead of four. Indexes 2 and 3 together form the two-byte UTF-8 encoding of ł (0xC5 0x82).
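A quick way to see the byte/character difference, assuming the file is saved as UTF-8: strlen() counts bytes, mb_strlen() counts characters, and bin2hex() shows the two bytes that make up ł.

```php
<?php
// Compares byte length with character length for the word from the question.
$str = 'mała';

echo strlen($str), "\n";              // 5 bytes: "ł" takes two bytes in UTF-8
echo mb_strlen($str, 'UTF-8'), "\n";  // 4 characters
echo bin2hex('ł'), "\n";              // c582, the two bytes str_split() separates
```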

You need to use either the mbstring or the iconv extension to split your string manually.

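$str = 'mała';
// Count characters, not bytes (strlen() would return 5 here)
$len = mb_strlen($str, 'UTF-8');
$result = [];
for ($i = 0; $i < $len; $i++) {
    // Take one character starting at character position $i
    $result[] = mb_substr($str, $i, 1, 'UTF-8');
}
var_dump($result); // 4 elements: "m", "a", "ł", "a"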
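Applied to the original task, a sketch of the span-wrapping loop that splits by character instead of by byte. It swaps in preg_split() with the "u" (UTF-8) modifier as the splitting technique; on PHP 7.4+ mb_str_split($piece) is an alternative. The helper name wrapLetters is hypothetical, and the htmlspecialchars() call is an added precaution, not part of the question's code:

```php
<?php
// Hypothetical helper: wraps each character (not byte) of $piece in a span.
function wrapLetters(string $piece): string
{
    $html = '';
    // '//u' matches between every UTF-8 character; NO_EMPTY drops the empty ends
    foreach (preg_split('//u', $piece, -1, PREG_SPLIT_NO_EMPTY) as $letter) {
        // Escape each character before embedding it in HTML
        $html .= '<span class="dropable">' . htmlspecialchars($letter, ENT_QUOTES, 'UTF-8') . '</span>';
    }
    return $html;
}

$pieces = array_map('wrapLetters', ['mama', 'mała']);
$json = json_encode(['pieces' => $pieces, 'engine' => 'Wstepne']);

echo $json, "\n";
```

Since every piece is now valid UTF-8, json_encode() succeeds instead of returning false.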
answered Sep 10 '25 04:09 by Stefan Gehrig
