As I understand it, \p{L} matches all Unicode letters, while \p{Alpha} is similar but only covers Latin (ASCII) letters. At work I have both a Latin 'A' and a Cyrillic 'A', and \p{Alpha} in our old Java code doesn't match Cyrillic characters as letters. In my tests, \p{L} solves the problem. Can you give me some advice on this situation and tell me what I should use in Java code? This page, http://www.regular-expressions.info/posixbrackets.html, uses \p{Alpha} for Java code.
Actually, \p{Alpha} is a POSIX character class that matches only ASCII letters by default; it matches non-ASCII letters only when combined with the UNICODE_CHARACTER_CLASS flag (or the inline (?U) flag). \p{L}, on the other hand, always matches any Unicode letter, regardless of flags. Note that you can also write \p{L} as \pL or \p{IsL}.
See more reference details:
Both \p{L} and \p{IsL} denote the category of Unicode letters.
POSIX character classes (US-ASCII only)
\p{Lower}  A lower-case alphabetic character: [a-z]
\p{Upper}  An upper-case alphabetic character: [A-Z]
\p{Alpha}  An alphabetic character: [\p{Lower}\p{Upper}]
Have a look at the following demo:
String l = "Abc";
String c = "Абв";
System.out.println(l.matches("\\p{Alpha}+")); // => true
System.out.println(c.matches("\\p{Alpha}+")); // => false
System.out.println(c.matches("(?U)\\p{Alpha}+")); // => true
System.out.println(l.matches("\\p{L}+")); // => true
System.out.println(c.matches("\\p{L}+")); // => true
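If you would rather not embed (?U) in the pattern string, the same behaviour can be had by passing Pattern.UNICODE_CHARACTER_CLASS to Pattern.compile. Below is a minimal, self-contained sketch of that, plus a script check for telling a Latin 'A' from a Cyrillic one. The class name LetterClassDemo and the helper isCyrillic are made up for illustration, and the strings are written with \u escapes only so the result does not depend on the source file encoding.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative, not from the question.
class LetterClassDemo {
    // POSIX class with default semantics: ASCII letters only.
    static final Pattern ASCII_ALPHA = Pattern.compile("\\p{Alpha}+");

    // Same class with Unicode semantics, enabled by flag instead of inline (?U).
    static final Pattern UNICODE_ALPHA =
            Pattern.compile("\\p{Alpha}+", Pattern.UNICODE_CHARACTER_CLASS);

    // Unicode general category "Letter"; needs no flag.
    static final Pattern ANY_LETTER = Pattern.compile("\\p{L}+");

    // Script property: true only for characters in the Cyrillic script.
    static final Pattern CYRILLIC = Pattern.compile("\\p{IsCyrillic}+");

    // Helper (assumed name): does the whole string consist of Cyrillic characters?
    static boolean isCyrillic(String s) {
        return CYRILLIC.matcher(s).matches();
    }

    public static void main(String[] args) {
        String latin = "Abc";
        String cyrillic = "\u0410\u0431\u0432"; // Cyrillic A, be, ve

        System.out.println(ASCII_ALPHA.matcher(latin).matches());      // true
        System.out.println(ASCII_ALPHA.matcher(cyrillic).matches());   // false
        System.out.println(UNICODE_ALPHA.matcher(cyrillic).matches()); // true
        System.out.println(ANY_LETTER.matcher(cyrillic).matches());    // true

        // Look-alike characters: Latin 'A' (U+0041) vs Cyrillic 'A' (U+0410).
        System.out.println(isCyrillic("A"));      // false
        System.out.println(isCyrillic("\u0410")); // true
    }
}
```

One thing to keep in mind: with the Unicode flag, \p{Alpha} follows the Unicode Alphabetic property, which is somewhat broader than the letter category (it also covers letter-like numbers, for example), so if you strictly want letters, \p{L} is probably the safer choice.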