Regex negation - word parsing

I am trying to parse a phrase and exclude common words.

For instance, in the phrase "as the world turns", I want to exclude the common words "as" and "the" and return only "world" and "turns".

(\w+(?!the|as))

Doesn't work. Feedback appreciated.

Peter asked Feb 03 '26 14:02


1 Answer

The lookahead should come first:

(\b(?!(the|as)\b)\w+\b)

I have also added word boundaries to ensure that it only matches whole words. Without them, the pattern would correctly refuse to match the complete word "as", but it would still match the letter "s" of that word.
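As a quick check of the pattern above, here is a minimal sketch in Python. The choice of Python's re module is an assumption, since the question does not name a language, and the helper name content_words is made up for illustration. The second pattern drops the word boundaries to show the partial-match problem.

```python
import re

# Whole-word negative lookahead: skip "the" and "as", keep every other word.
WITH_BOUNDARIES = re.compile(r"\b(?!(?:the|as)\b)\w+\b")

# Same lookahead without word boundaries, kept only for comparison.
WITHOUT_BOUNDARIES = re.compile(r"(?!the|as)\w+")


def content_words(phrase):
    """Return the words in phrase that are not "the" or "as" (hypothetical helper)."""
    return WITH_BOUNDARIES.findall(phrase)


print(content_words("as the world turns"))
# → ['world', 'turns']

print(WITHOUT_BOUNDARIES.findall("as the world turns"))
# → ['s', 'he', 'world', 'turns']
```

Because the lookahead ends in \b, longer words that merely start with an excluded word (for example "theory") are still matched in full.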

You might also want to consider what \w matches and whether that meets your needs. If you are looking for words in English, you are probably interested in letters but not digits, and you may wish to include some punctuation characters that are excluded by \w, such as apostrophes. You could try something like this instead (Rubular):

/(\b(?!(?:the|as)\b)[a-z'-]+\b)/i
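To see how the letters-only character class behaves, here is a small Python sketch of the same pattern. Python is an assumption (the pattern above is written as a /.../i literal), so re.IGNORECASE stands in for the /i flag, and the sample sentence is invented for illustration.

```python
import re

# Letters, apostrophes and hyphens only; re.IGNORECASE mirrors the /i flag.
WORD = re.compile(r"\b(?!(?:the|as)\b)[a-z'-]+\b", re.IGNORECASE)

sample = "As the world's well-known clock turns 24"
print(WORD.findall(sample))
# → ["world's", 'well-known', 'clock', 'turns']
```

The capitalized "As" is excluded thanks to the case-insensitive flag, "world's" and "well-known" stay in one piece, and the number 24 is skipped because digits are not in the character class.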

To match words more accurately in a human language you could consider using a natural language parsing library instead of regular expressions.

Mark Byers answered Feb 05 '26 09:02


