Python regex to match 2 distinct delimiters

I'm trying to craft a regular expression that will match something like this:

[[uid::page name|page alias]]

for example:

[[nw::Home|Home page]]

The uid and page alias are both optional.

I want the delimiters :: and | to appear at most once each, and only in the order shown. However, a single : should be allowed anywhere after the uid. Herein lies the problem.

The following regex works pretty well, except that it matches strings where :: appears twice, or in the wrong place:

import re

regex = r'\[\[([\w]+::)?([^|\t\n\r\f\v]+)(\|[^|\t\n\r\f\v]+)?\]\]'
re.match(regex, '[[Home]]') # matches, good
re.match(regex, '[[Home|Home page]]') # matches, good
re.match(regex, '[[nw::Home]]') # matches, good
re.match(regex, '[[nw::Home|Home page]]') # matches, good
re.match(regex, '[[nw|Home|Home page]]') # doesn't match, good
re.match(regex, '[[nw|Home::Home page]]') # matches, bad
re.match(regex, '[[nw::Home::Home page]]') # matches, bad

I have read all about negative lookahead and lookbehind expressions but I can't figure out how to apply them in this case. Any suggestions would be appreciated.

Edit: I would also like to know how to prevent the delimiters from being included in the match results as shown here:

('nw::', 'Home', '|Home page')

Asked by nw. on Dec 03 '25 at 06:12


1 Answer

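If I understand your needs correctly, you could use this (named groups are written in Python's (?P<name>...) syntax, since the re module does not accept (?<name>...)):

\[\[(?:(?P<uid>\w+)::)?(?!.*::)(?P<page>[^|\t\n\r\f\v]+)(?:\|(?P<alias>[^|\t\n\r\f\v]+))?\]\]
                       ^^^^^^^^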

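I added a negative lookahead, (?!.*::), right after the optional uid group. Once the uid (if any) has been consumed, the match fails if :: appears anywhere later in the string, while a single : is still accepted. The :: and | delimiters sit inside non-capturing groups, (?:...), so they are not included in the captured results.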
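As a rough check of the lookahead, the sketch below runs the test strings from the question against the pattern, written with Python's (?P<name>...) named-group syntax (the re module rejects (?<name>...)). The names WIKI_LINK, samples and results are only illustrative, and the last sample, with a single : in the page name, is an extra case added here.

```python
import re

# The answer's pattern with Python-style named groups.
# (?!.*::) rejects any further "::" after the optional uid.
WIKI_LINK = re.compile(
    r'\[\[(?:(?P<uid>\w+)::)?(?!.*::)'
    r'(?P<page>[^|\t\n\r\f\v]+)'
    r'(?:\|(?P<alias>[^|\t\n\r\f\v]+))?\]\]'
)

samples = [
    '[[Home]]',
    '[[Home|Home page]]',
    '[[nw::Home]]',
    '[[nw::Home|Home page]]',
    '[[nw|Home|Home page]]',
    '[[nw|Home::Home page]]',
    '[[nw::Home::Home page]]',
    '[[nw::Home: intro|Home page]]',  # extra case: a single ":" after the uid
]

# Map each sample to True (matches) or False (does not match).
results = {s: WIKI_LINK.match(s) is not None for s in samples}

for s, ok in results.items():
    print(f'{s!r}: {"match" if ok else "no match"}')

# Named groups can be read back as a dict; absent optional parts are None.
parts = WIKI_LINK.match('[[nw::Home|Home page]]').groupdict()
print(parts)
```

The first four strings and the last one should match; the three strings with a misplaced or repeated delimiter should not.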

I have given names to the capturing groups, but if you don't want them, here is the version without named groups:

\[\[(?:(\w+)::)?(?!.*::)([^|\t\n\r\f\v]+)(?:\|([^|\t\n\r\f\v]+))?\]\]
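To see what the unnamed version captures, here is a minimal sketch using re.match and groups(). The helper parse_link and the variable names are hypothetical, introduced only for this illustration.

```python
import re

# Unnamed-group version of the pattern from the answer.
PATTERN = r'\[\[(?:(\w+)::)?(?!.*::)([^|\t\n\r\f\v]+)(?:\|([^|\t\n\r\f\v]+))?\]\]'


def parse_link(text):
    """Return (uid, page, alias), or None if text is not a valid link."""
    m = re.match(PATTERN, text)
    return m.groups() if m else None


full = parse_link('[[nw::Home|Home page]]')
bare = parse_link('[[Home]]')
bad = parse_link('[[nw::Home::Home page]]')

print(full)  # ('nw', 'Home', 'Home page')
print(bare)  # (None, 'Home', None)
print(bad)   # None
```

Because :: and | are matched inside non-capturing groups, the tuple holds only the uid, page name and alias, with None for any optional part that is absent.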
Answered by Jerry on Dec 05 '25 at 20:12