regular expression match issue in Python

Question

For an input string, I want to match text that starts with {(P) and ends with (P)}, but I only want to capture the part in the middle. Is it possible to do this with a single regular expression?

For example, given the input string below, I want to retrieve the hello world part. I am using Python 2.7.

python {(P)hello world(P)} java

Psidom · Accepted Answer

You can try {\(P\)(.*)\(P\)}. The literal parentheses of the tags are escaped as \( and \), while the unescaped pair of parentheses forms a capturing group that captures everything between {(P) and (P)}:

import re
re.findall(r'{\(P\)(.*)\(P\)}', "python {(P)hello world(P)} java")

# ['hello world']
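One thing to watch with the greedy .* above: if a line contains more than one tagged segment, it runs from the first {(P) to the last (P)}. The sketch below (runs on Python 2.7 and 3; the two-segment input string is a made-up example) compares it with a lazy .*? version. The { and } are escaped here only for readability, since Python's re already treats them as literals in this position.

```python
import re

# \( and \) match the literal parentheses of the tags;
# the unescaped (...) is the capturing group.
GREEDY = re.compile(r'\{\(P\)(.*)\(P\)\}')   # .*  : as much as possible
LAZY = re.compile(r'\{\(P\)(.*?)\(P\)\}')    # .*? : as little as possible

# Hypothetical input with two tagged segments on one line.
text = "python {(P)hello world(P)} java {(P)foo(P)} c"

print(GREEDY.findall(text))  # ['hello world(P)} java {(P)foo']
print(LAZY.findall(text))    # ['hello world', 'foo']
```

With a single tagged segment per line both patterns return the same result; the lazy form only matters when several segments can appear together.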

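.* also matches non-ASCII characters. In Python 2 a plain str literal holds bytes (here UTF-8), so the repr of the captured value shows the escaped bytes, while print shows the original text. For example: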

import re
str1 = "python {(P)£1,073,142.68(P)} java"
str2 = re.findall(r'{\(P\)(.*)\(P\)}', str1)[0]

str2
# '\xc2\xa31,073,142.68'

print str2
# £1,073,142.68
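The byte-versus-character distinction matters if you care about lengths or slicing of the captured value. A minimal sketch, assuming you decode to a unicode string before matching (the \u00a3 escape stands in for the £ from the example above; runs on Python 2.7 and 3):

```python
import re

PATTERN = re.compile(r'\{\(P\)(.*?)\(P\)\}')

# As a unicode string, the pound sign is a single code point (U+00A3).
text = u"python {(P)\u00a31,073,142.68(P)} java"
value = PATTERN.findall(text)[0]
print(len(value))  # 13

# The same text encoded as UTF-8 bytes: the pound sign takes two bytes,
# so the captured value is one unit longer. A bytes pattern is needed here.
raw = text.encode("utf-8")
value_bytes = re.findall(br'\{\(P\)(.*?)\(P\)\}', raw)[0]
print(len(value_bytes))  # 14

# Decoding the captured bytes gives back the unicode result.
print(value_bytes.decode("utf-8") == value)  # True
```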

Steven Doggart · Answer

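You can use positive look-arounds to ensure that text only matches if it is preceded by the start tag and followed by the end tag. Because look-arounds are zero-width, the tags themselves are not part of the match, so no capturing group is needed. For instance, you could use this pattern: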

(?<={\(P\)).*?(?=\(P\)})


(?<={\(P\)) - Look-behind assertion: the match must be preceded by {(P).
.*? - Matches the text between the start and end tags. The ? makes the star lazy (i.e. non-greedy), so it matches as few characters as possible.
(?=\(P\)}) - Look-ahead assertion: the match must be followed by (P)}.
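A minimal sketch of the look-around approach (runs on Python 2.7 and 3; the second input string is a made-up example). Because nothing outside the look-arounds is consumed, findall returns the whole matched text rather than a group:

```python
import re

# Look-behind for "{(P)" (fixed width, as Python requires), lazy body,
# look-ahead for "(P)}". The tags are checked but not included in the match.
LOOKAROUND = re.compile(r'(?<=\{\(P\)).*?(?=\(P\)\})')

print(LOOKAROUND.findall("python {(P)hello world(P)} java"))  # ['hello world']

# search() works too; group() with no argument is the whole match.
print(LOOKAROUND.search("python {(P)hello world(P)} java").group())  # hello world

# Hypothetical input with two segments: the lazy .*? keeps them separate.
print(LOOKAROUND.findall("{(P)a(P)} and {(P)b(P)}"))  # ['a', 'b']
```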

For what it's worth, a lazy quantifier is somewhat less efficient, because after each character it consumes, the engine has to try the look-ahead again. If you know there will be no ( characters in the text between the tags, it is better to use a negated character class:

(?<={\(P\))[^(]*(?=\(P\)})
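To make the trade-off concrete, a small sketch (runs on Python 2.7 and 3; the input containing f(x) is a hypothetical example): the negated class gives the same result on the plain input, but finds nothing once the text between the tags contains a (, while the lazy pattern still works.

```python
import re

LAZY = re.compile(r'(?<=\{\(P\)).*?(?=\(P\)\})')
NEGATED = re.compile(r'(?<=\{\(P\))[^(]*(?=\(P\)\})')

plain = "python {(P)hello world(P)} java"
with_paren = "python {(P)f(x) = 1(P)} java"  # hypothetical: body contains "("

print(NEGATED.findall(plain))       # ['hello world']
print(LAZY.findall(with_paren))     # ['f(x) = 1']
# [^(]* stops at the first "(", where the look-ahead for "(P)}" fails.
print(NEGATED.findall(with_paren))  # []
```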

Tags: python, regex, python-2.7

Lin Ma

