I am trying to write a regex (regular expression) in PHP that matches all Latin letters, including those specific to Serbo-Croatian, such as "ćčđšž".
Here is my code:
public function alpha_space( $str )
{
    return ( ! preg_match( "/^([-a-z0-9_ ])+$/i", $str ) ) ? FALSE : TRUE;
}
How should I modify this snippet so that the regex applies as well to Serbo-Croatian letters?
Thank you for any thoughts you wish to offer.
These letters all belong to the Unicode Latin Extended-A block, which spans U+0100 to U+017F. With the u (UTF-8) modifier, you can match a range of code points using \x{...} escapes:
$test = "ćčđšž";
$start = "100";
$finish = "17f";
$pattern = "/^[\x{{$start}}-\x{{$finish}}]*$/u";
$result = preg_match($pattern, $test);
var_dump($result);
So extending this to your original pattern would look something like this:
$pattern = "/^[-a-z0-9_ \x{100}-\x{17f}]+$/ui";
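As a rough sketch of how that extended pattern behaves, assuming the input is valid UTF-8. The function name is_alpha_space is hypothetical and stands in for the alpha_space method from the question:

```php
<?php
// Hypothetical helper: accepts ASCII letters, digits, "_", "-" and space,
// plus any character in Latin Extended-A (U+0100 to U+017F).
function is_alpha_space(string $str): bool
{
    // Single quotes keep \x{...} away from PHP's own string escapes.
    // preg_match() returns 1 on a match, 0 on no match, and false on error
    // (for example, when $str is not valid UTF-8 under the u modifier).
    return preg_match('/^[-a-z0-9_ \x{100}-\x{17f}]+$/ui', $str) === 1;
}

var_dump(is_alpha_space('Đorđe Šišić')); // bool(true)
var_dump(is_alpha_space('čaša_1'));      // bool(true)
var_dump(is_alpha_space('hello!'));      // bool(false)
```

Comparing against 1 explicitly means an invalid-UTF-8 subject is rejected rather than silently treated as a non-match of a different kind.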
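You can use the regex \p{L} to match any Unicode letter. In PHP this needs the u modifier, so that the pattern and the subject are treated as UTF-8 rather than as single bytes.
This changes your regex to: ^([-\p{L}0-9_ ])+$
public function alpha_space($str)
{
    // The u modifier is required for \p{L} to match multi-byte letters such as "ć".
    return ( ! preg_match("/^([-\p{L}0-9_ ])+$/ui", $str)) ? FALSE : TRUE;
}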
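Note that \p{L} accepts letters from every script, not only Latin. If only Latin-script characters should pass, the \p{Latin} script property is one option. The following is a minimal sketch comparing the two, assuming UTF-8 input; both function names are hypothetical:

```php
<?php
// Hypothetical helper: any Unicode letter, plus digits, "_", "-" and space.
function is_any_letter_space(string $str): bool
{
    return preg_match('/^[-\p{L}0-9_ ]+$/u', $str) === 1;
}

// Hypothetical helper: only characters of the Latin script,
// plus digits, "_", "-" and space.
function is_latin_letter_space(string $str): bool
{
    return preg_match('/^[-\p{Latin}0-9_ ]+$/u', $str) === 1;
}

var_dump(is_any_letter_space('Ђорђе'));   // bool(true): Cyrillic letters are letters
var_dump(is_latin_letter_space('Ђорђе')); // bool(false): not Latin script
var_dump(is_latin_letter_space('ćčđšž')); // bool(true)
```

Which variant fits depends on whether Cyrillic input should be accepted as well.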