Regular expression match issue in Python

For an input string, I want to match text that starts with {(P) and ends with (P)}, but I only want the part in the middle. Is it possible to do this with a single regular expression?

For example, given the input string below, I want to retrieve the hello world part. I'm using Python 2.7.

python {(P)hello world(P)} java
Lin Ma asked Feb 28 '26 16:02


2 Answers

You can try {\(P\)(.*)\(P\)}. The parentheses in the middle of the pattern form a capture group around everything between {(P) and (P)}, so re.findall returns only that captured part:

import re
re.findall(r'{\(P\)(.*)\(P\)}', "python {(P)hello world(P)} java")

# ['hello world']
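
One caveat with the greedy (.*): if a line contains more than one {(P)...(P)} pair, the match runs from the first opening tag to the last closing tag. A minimal sketch comparing it with the lazy (.*?) version (the sample string is made up for illustration; run under Python 3, though the re calls behave the same on 2.7):

```python
import re

text = "a {(P)hello(P)} b {(P)world(P)} c"

# Greedy: .* grabs as much as it can, then backtracks to the LAST (P)}.
greedy = re.findall(r'{\(P\)(.*)\(P\)}', text)

# Lazy: .*? stops at the FIRST (P)} it can reach, giving one result per pair.
lazy = re.findall(r'{\(P\)(.*?)\(P\)}', text)

print(greedy)  # ['hello(P)} b {(P)world']
print(lazy)    # ['hello', 'world']
```

If you only ever expect one pair per string, the greedy form is fine; otherwise the lazy form is usually what you want.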

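. matches any character except a newline, so .* also captures non-ASCII text. For example, in Python 2.7 a plain string literal is a UTF-8 byte string, so the pound sign appears as the two bytes \xc2\xa3 in the repr but prints normally: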

import re
str1 = "python {(P)£1,073,142.68(P)} java"
str2 = re.findall(r'{\(P\)(.*)\(P\)}', str1)[0]

str2
# '\xc2\xa31,073,142.68'

print str2
# £1,073,142.68
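
The session above is Python 2.7. In Python 3, str is already Unicode text, so the captured value holds the £ character itself rather than UTF-8 bytes. Separately, because . does not match a newline by default, content that spans lines needs the re.DOTALL flag. A short sketch assuming Python 3 (the multi-line sample string is invented for illustration):

```python
import re

pattern = r'{\(P\)(.*)\(P\)}'

# Python 3 str literals are Unicode text, so '£' is captured as one character.
price = re.findall(pattern, "python {(P)£1,073,142.68(P)} java")[0]
print(price == "\u00a31,073,142.68")  # True ('£' is code point U+00A3)
print(len(price))                      # 13 characters, not 14 bytes

# '.' does not match a newline by default, so a pair split across lines is missed.
multiline = "python {(P)hello\nworld(P)} java"
print(re.findall(pattern, multiline))             # []
print(re.findall(pattern, multiline, re.DOTALL))  # ['hello\nworld']
```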
Psidom answered Mar 02 '26 04:03


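You can use positive lookarounds so that the pattern only matches text that is preceded by the start tag and followed by the end tag. Because lookarounds are zero-width, the tags are checked but are not part of the match itself. For instance, you could use this pattern: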

(?<={\(P\)).*?(?=\(P\)})

  • (?<={\(P\)) - Look-behind expression stating that a match must be preceded by {(P).
  • .*? - Matches the text between the start and end tags. The ? makes the * lazy (non-greedy), so it matches as few characters as possible.
  • (?=\(P\)}) - Look-ahead expression stating that a match must be followed by (P)}.
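
Because the whole match is just the middle text, no capture group is needed. A minimal sketch of using the pattern from Python (run under Python 3; note that Python's re module requires look-behinds to be fixed-width, which {\(P\) is):

```python
import re

pattern = r'(?<={\(P\)).*?(?=\(P\)})'
text = "python {(P)hello world(P)} java"

# With no capture groups, findall returns the full matches.
print(re.findall(pattern, text))  # ['hello world']

# search() returns a match object (or None); group() is the matched text.
match = re.search(pattern, text)
print(match.group() if match else None)  # hello world
```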

For what it's worth, lazy quantifiers can be less efficient, because the engine has to try the rest of the pattern after each character it consumes. If you know there will be no ( characters in the match, a negated character class may be a better choice:

(?<={\(P\))[^(]*(?=\(P\)})
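
The two patterns are not fully interchangeable: [^(]* cannot cross a ( inside the content, so it finds nothing when the content contains one, and unlike . it does match newlines. A sketch comparing them on made-up inputs (Python 3):

```python
import re

lazy = r'(?<={\(P\)).*?(?=\(P\)})'
negated = r'(?<={\(P\))[^(]*(?=\(P\)})'

plain = "python {(P)hello world(P)} java"
with_paren = "python {(P)f(x)(P)} java"
with_newline = "python {(P)hello\nworld(P)} java"

# Each line prints: lazy result, negated-class result.
for text in (plain, with_paren, with_newline):
    print(re.findall(lazy, text), re.findall(negated, text))
# ['hello world'] ['hello world']
# ['f(x)'] []
# [] ['hello\nworld']
```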
Steven Doggart answered Mar 02 '26 04:03


