
Regex to match word surrounded by non-alphanumeric characters

Tags:

java

string

regex

I want to match a word, and find its index, when it is surrounded by spaces or special characters. For example:

To find: test
this is input test : True
this is#input_ : True
this isinput : False
thisisinputtest: False
this @test is right: True.

How do I match this and find the index? My current regex fails: (?i)[^a-zA-Z0-9]test[^a-zA-Z0-9]

Maxsteel asked Oct 25 '25 05:10


1 Answer

I think you need to use lookarounds in your case:

(?<!\p{Alnum})test(?!\p{Alnum})

The negative lookbehind (?<!\p{Alnum}) will fail the match if there is an alphanumeric char immediately to the left of test, and the negative lookahead (?!\p{Alnum}) will fail the match if there is an alphanumeric char right after test. Neither lookaround consumes any characters, so test is also matched at the very start or end of the string.

Java demo:

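import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "this is#test_ :";
// test must not be directly preceded or followed by an ASCII letter or digit
Pattern ptrn = Pattern.compile("(?<!\\p{Alnum})test(?!\\p{Alnum})");
Matcher matcher = ptrn.matcher(str);
while (matcher.find()) {
    System.out.println(matcher.start()); // prints 8
}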
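
The demo above can be wrapped into a reusable helper. This is only a sketch: the class name WordIndexFinder and the method findWordStarts are made up for illustration, and the use of Pattern.quote and Pattern.CASE_INSENSITIVE is an assumption based on the (?i) flag in the question, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not from the original answer.
class WordIndexFinder {
    // Returns the start index of every occurrence of `word` that is not
    // directly preceded or followed by an ASCII letter or digit.
    static List<Integer> findWordStarts(String text, String word) {
        // Pattern.quote keeps any regex metacharacters in `word` literal.
        Pattern ptrn = Pattern.compile(
                "(?<!\\p{Alnum})" + Pattern.quote(word) + "(?!\\p{Alnum})",
                Pattern.CASE_INSENSITIVE);
        Matcher matcher = ptrn.matcher(text);
        List<Integer> starts = new ArrayList<>();
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(findWordStarts("this is#test_ :", "test"));     // [8]
        System.out.println(findWordStarts("this @Test is right", "test")); // [6]
        System.out.println(findWordStarts("thisisinputtest", "test"));     // []
    }
}
```

An empty list means the word only occurs glued to other letters or digits, or not at all.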

Alternative way: match and capture the search word, and print the start position of the 1st capturing group:

Pattern ptrn = Pattern.compile("\\P{Alnum}(test)\\P{Alnum}");
...
System.out.println(matcher.start(1));

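NOTE that in this scenario, \P{Alnum} is a consuming pattern: it must match an actual character on each side of test. As a result, test is not matched at the very start or end of the string, and when two occurrences are separated by a single non-alphanumeric character, the second one is missed because that character was already consumed by the previous match.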
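
To make the difference concrete, here is a small comparison of the two patterns on the same input. It is a sketch only: the class name ConsumingVsLookaround, the helper starts, and the sample string are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison class, not from the original answer.
class ConsumingVsLookaround {
    // Collects the start index of capture group `group` for every match
    // (group 0 is the whole match).
    static List<Integer> starts(String regex, int group, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        List<Integer> result = new ArrayList<>();
        while (matcher.find()) {
            result.add(matcher.start(group));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "test test test";
        // Consuming version: only the middle occurrence has a character on both sides.
        System.out.println(starts("\\P{Alnum}(test)\\P{Alnum}", 1, text));        // [5]
        // Lookaround version: all three occurrences are found.
        System.out.println(starts("(?<!\\p{Alnum})test(?!\\p{Alnum})", 0, text)); // [0, 5, 10]
    }
}
```

The first and last occurrences are missed by the consuming pattern because there is no character before index 0 and none after the final test.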

Wiktor Stribiżew answered Oct 26 '25 20:10