
Don't know how to use lookarounds properly to achieve my Regex match

I'm writing a Perl script, and part of it needs to match all occurrences of a certain pattern in a string. A regular expression seems like the natural tool, but I just can't get it right for this particular string.

A hypothetical example of the type of text the regex might be applied to would be:

1cat;2dog;!3monkey;!4horse;

As you can see, several data entries (1cat, 2dog, etc.) are present in the line, each terminated by a semicolon, so the line ends with a semicolon but does not begin with one. I want to match every entry that has not been negated with a leading !. In the above example, 1cat and 2dog would be matched and returned in list context, while 3monkey and 4horse would not.

What I have tried to do so far is use negative lookbehinds to notice only the entries without a !. Something like this:

m/(?<!\!)(\w+)\;/g

However, this doesn't work, because for every !'ed entry the regex simply skips the first character after the ! and matches the rest of the entry, up to the semicolon. In the example, 1cat and 2dog are captured, but so are monkey and horse.
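A minimal sketch reproducing this on the sample line ($line and @matches are illustrative names, not taken from any real script):

```perl
use strict;
use warnings;

# Sample line from the example above; variable names are illustrative.
my $line = '1cat;2dog;!3monkey;!4horse;';

# The lookbehind only rejects a match that starts directly after the "!".
# The engine then retries one character later, where the preceding
# character is a digit, so \w+ still matches the rest of the entry.
my @matches = $line =~ m/(?<!\!)(\w+)\;/g;

print join(',', @matches), "\n";   # prints 1cat,2dog,monkey,horse
```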

I feel like this is easily doable, but I'm new to regular expressions and I can't think of anything else.

DDP asked Jan 25 '26 01:01


1 Answer

Throw a word boundary (\b) in there and you should be good:

(?<!!)\b(\w+);

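As you could tell, your negative lookbehind was working, but the regex could still start a match one character further along (horse from !4horse), where the preceding character is no longer a !. A word boundary is a zero-width assertion: like the anchors ^ and $, it checks a position without consuming any characters. It holds at the positions described by (^\w|\w\W|\W\w|\w$), in other words anywhere a word character ([a-zA-Z0-9_]) is next to the beginning/end of the string or a non-word character. With \b in place, a match can no longer start at the h in !4horse, because 4 and h are both word characters. It can only start at the 4, and there the lookbehind sees the ! and rejects it.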
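As a quick check, here is a minimal sketch applying the pattern to the sample line from the question ($line and @kept are illustrative names):

```perl
use strict;
use warnings;

# Sample line from the question; variable names are illustrative.
my $line = '1cat;2dog;!3monkey;!4horse;';

# \b forces each match to begin at the start of a run of word characters,
# so the only candidate start inside "!3monkey" is the "3", and there the
# lookbehind sees the "!" and rejects the match.
my @kept = $line =~ m/(?<!!)\b(\w+);/g;

print join(',', @kept), "\n";   # prints 1cat,2dog
```

This assumes every entry consists only of word characters. An entry such as a-b would be reported as just b, since \w+ cannot cross the hyphen.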

Sam answered Jan 26 '26 16:01


