
Cut an Arabic string

Tags:

php

I have a string in Arabic like:

على احمد يوسف

Now I need to cut this string and output it like:

...على احمد يو

I tried this function:

function short_name($str, $limit) {
    if ($limit < 3) {
        $limit = 3;
    }

    if (strlen($str) > $limit) {
        if (preg_match('/\p{Arabic}/u', $str)) {
            return substr($str, 0, $limit - 3) . '...';
        }
        else {
            return '...'.substr($str, 0, $limit - 3);
        }
    }
    else {
        return $str;
    }
}

The problem is that sometimes it displays a symbol like this at the end of the string:

...�على احمد يو

Why does this happen?

asked Sep 03 '25 13:09 by Thirty 5Seconds


1 Answer

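The symbol displayed after the cut is the replacement character that appears when substr() cuts in the middle of a multibyte character and leaves an invalid byte sequence. substr() counts bytes, not characters, and each Arabic letter takes two bytes in UTF-8.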
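As a rough illustration of that byte cut, the sketch below truncates the string from the question at two byte offsets and checks whether the result is still valid UTF-8. It assumes the script file is saved as UTF-8 and that the mbstring extension is loaded; the variable names are only for illustration.

```php
<?php
// The string from the question: 11 Arabic letters (2 bytes each) + 2 spaces = 24 bytes.
$str = 'على احمد يوسف';

// Byte offset 20 falls on a character boundary, so the result is valid UTF-8.
$clean = substr($str, 0, 20);

// Byte offset 21 falls inside a two-byte letter, leaving a dangling lead byte.
$cut = substr($str, 0, 21);

var_dump(mb_check_encoding($clean, 'UTF-8')); // bool(true)
var_dump(mb_check_encoding($cut, 'UTF-8'));   // bool(false)
```

This would also explain why the symbol only shows up "sometimes": it depends on whether $limit - 3 happens to land on a character boundary.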

You need to use the Multibyte String functions, such as mb_strlen() and mb_substr(), to handle Arabic strings.

You also need to make sure the internal encoding for those functions is set to UTF-8. You can set this globally at the top of your script:

mb_internal_encoding('UTF-8');

Which leads to this:

  • strlen('على احمد يوسف') returns 24, the size in octets
  • mb_strlen('على احمد يوسف') returns 13, the size in characters

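Note that mb_strlen('على احمد يوسف') would also return 24 if the internal encoding were a single-byte encoding such as ISO-8859-1, as it was by default on older PHP versions. Since PHP 5.6 the default follows default_charset, which is UTF-8 unless it has been changed.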
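Putting this together, one possible rewrite of the function from the question is sketched below. The name short_name_mb is hypothetical, and the sketch makes two assumptions that are not in the original: the encoding is passed explicitly to each call instead of relying on mb_internal_encoding(), and the ellipsis is always appended rather than branching on the script.

```php
<?php
// Hypothetical multibyte-safe variant of short_name(); requires the mbstring extension.
function short_name_mb(string $str, int $limit): string
{
    // Keep room for the three-character ellipsis, as in the original function.
    $limit = max($limit, 3);

    // Compare lengths in characters, not bytes.
    if (mb_strlen($str, 'UTF-8') <= $limit) {
        return $str;
    }

    // Cut on a character boundary, then append the ellipsis.
    return mb_substr($str, 0, $limit - 3, 'UTF-8') . '...';
}

$short = short_name_mb('على احمد يوسف', 10);
var_dump(mb_strlen($short, 'UTF-8'));         // int(10)
var_dump(mb_check_encoding($short, 'UTF-8')); // bool(true)
```

Because the cut is counted in characters, the result can never end in half a letter, whatever limit is passed.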

answered Sep 05 '25 16:09 by spenibus