
Matching Latin Letters, including Serbo-Croatian characters

I am trying to write a regex (regular expression) in PHP to match all Latin letters, including those specific to Serbo-Croatian, such as "ćčđšž".

Here is my code:

public function alpha_space( $str ) 
{
    return ( ! preg_match( "/^([-a-z0-9_ ])+$/i", $str ) ) ? FALSE : TRUE;
}

How should I modify this snippet so that the regex also matches Serbo-Croatian letters?

Thank you for any thoughts you wish to offer.

Admir Husić asked Jan 31 '26 21:01


2 Answers

These characters belong to the Unicode Latin Extended-A block, which runs from U+0100 to U+017F. With the u modifier, which makes PCRE treat both the pattern and the subject as UTF-8, you can match a range of code points using \x{...} escapes:

$test = "ćčđšž";
$start = "100";
$finish = "17f";
$pattern = "/^[\x{{$start}}-\x{{$finish}}]*$/u";
$result = preg_match($pattern, $test);
var_dump($result);

So extending this to your original pattern would look something like this:

$pattern = "/^[-a-z0-9_ \x{100}-\x{17f}]+$/ui";
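
As a minimal sketch of how that pattern could be dropped back into the validator from the question: the standalone function form and the sample inputs below are assumptions for illustration, not part of the original class.

```php
<?php
// Hypothetical standalone version of the asker's alpha_space() method.
function alpha_space(string $str): bool
{
    // Single quotes keep \x{...} away from PHP's own string escapes;
    // the u modifier makes PCRE treat pattern and subject as UTF-8.
    $pattern = '/^[-a-z0-9_ \x{100}-\x{17f}]+$/ui';

    // preg_match() returns 1 on a match, 0 on no match, and false on error
    // (for example, when $str is not valid UTF-8).
    return preg_match($pattern, $str) === 1;
}

var_dump(alpha_space('Šećer i žito'));   // bool(true)
var_dump(alpha_space('Đorđe_Čolić-01')); // bool(true)
var_dump(alpha_space('hello!'));         // bool(false)
```

Comparing against 1 rather than negating the result also treats a preg_match() error as a failed validation.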

miken32 answered Feb 03 '26 09:02


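You can use \p{L} to match any Unicode letter. The pattern needs the u modifier so that PCRE treats the pattern and the subject as UTF-8; without it, multibyte characters such as "ć" are read one byte at a time and the match fails.

This changes your regex to: ^([-\p{L}0-9_ ])+$ (with the u modifier)

public function alpha_space($str) 
{
    return ( ! preg_match("/^([-\p{L}0-9_ ])+$/ui", $str)) ? FALSE : TRUE;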
}
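
Note that \p{L} accepts letters from every script, not only Latin. If the input should be limited to Latin letters, the script property \p{Latin} is one alternative. The sketch below compares the two; the function names and sample strings are hypothetical and chosen only for illustration.

```php
<?php
// Hypothetical helper: any Unicode letter, plus digits, underscore, hyphen, space.
function alpha_space_any_letter(string $str): bool
{
    return preg_match('/^[-\p{L}0-9_ ]+$/u', $str) === 1;
}

// Hypothetical helper: same extras, but letters limited to the Latin script.
function alpha_space_latin(string $str): bool
{
    return preg_match('/^[-\p{Latin}0-9_ ]+$/u', $str) === 1;
}

var_dump(alpha_space_any_letter('ćčđšž'));   // bool(true)
var_dump(alpha_space_latin('ćčđšž'));        // bool(true)
var_dump(alpha_space_any_letter('Београд')); // bool(true), Cyrillic letters count as \p{L}
var_dump(alpha_space_latin('Београд'));      // bool(false)
```

Which variant fits depends on whether Cyrillic input should pass the check, which the question does not specify.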

Daniel answered Feb 03 '26 09:02


