
Regex match only if multiple patterns found (python)

Tags: python, regex

I'm trying to extract data from sentences such as:

"monthly payment of 525 and 5000 drive off"

using Python's regex search function, re.search().

My regex pattern for the down payment is as follows:

match1 = r"(?P<down_payment>\d+)\s*(|\$|dollars*|money)*\s*" + \
         r"(down|drive(\s|-)*off|due\s*at\s*signing|drive\s*-*\s*off)*"

My problem is that it matches the wrong numerical value as the down payment: it picks up both 525 and 5000.

How can I improve my regex string such that it only matches an element if another element is successfully matched as well?

In this case, for example, both 5000 and drive-off matched, so we can extract 5000 as down_payment. But 525 is not followed by any of the down payment keywords, so it should not be considered at all.

Clearer explanation here

NutellaAddict asked Jun 03 '26 05:06



1 Answer

The point is that you want to match a sequence of patterns. To make sure the trailing patterns are taken into account, they cannot all be optional. As written, \s*, (|\$|dollars*|money)*, \s* and (down|drive(\s|-)*off|due\s*at\s*signing|drive\s*-*\s*off)* can each match an empty string, so \d+ on its own is enough for the whole pattern to match.
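To see this concretely, here is a small sketch that runs the pattern from the question against the sample sentence. The variable names (sentence, original, first, amounts) are only illustrative, and the pattern is written with raw strings, which should not change what it matches.

```python
import re

sentence = "monthly payment of 525 and 5000 drive off"

# The pattern from the question: everything after \d+ is optional.
original = (
    r"(?P<down_payment>\d+)\s*(|\$|dollars*|money)*\s*"
    r"(down|drive(\s|-)*off|due\s*at\s*signing|drive\s*-*\s*off)*"
)

# re.search() stops at the first number, because the keyword group
# is allowed to match nothing.
first = re.search(original, sentence)
print(first.group("down_payment"))

# Iterating over all matches shows that every number qualifies.
amounts = [m.group("down_payment") for m in re.finditer(original, sentence)]
print(amounts)
```

Both numbers are reported because nothing in the pattern forces a keyword such as "drive off" to follow the digits.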

I suggest removing the final * quantifier to match exactly one occurrence of the pattern:

(?P<down_payment>\d+)\s*(?:\$|dollars*|money)?\s*(down|drive[\s-]*off|due\s*at\s*signing|drive\s*-*\s*off)

See the regex demo

Also note that I contracted the (\s|-) group into a character class, [\s-], since you only alternate single-character patterns. I also turned (|\$|dollars*|money)* into a non-capturing optional group, (?:\$|dollars*|money)?, that matches just 1 or 0 occurrences of $, dollar(s) or money.
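As a rough usage sketch, the suggested pattern can be compiled once and wrapped in a helper. The names DOWN_PAYMENT and extract_down_payment are hypothetical, introduced here only for illustration, and the extra test sentences are made up.

```python
import re

# Suggested pattern: the trailing keyword group is now mandatory.
DOWN_PAYMENT = re.compile(
    r"(?P<down_payment>\d+)\s*(?:\$|dollars*|money)?\s*"
    r"(down|drive[\s-]*off|due\s*at\s*signing|drive\s*-*\s*off)"
)

def extract_down_payment(text):
    """Return the down payment amount as a string, or None if no
    number is followed by one of the down payment keywords."""
    m = DOWN_PAYMENT.search(text)
    return m.group("down_payment") if m else None

# 525 is skipped because no keyword follows it.
print(extract_down_payment("monthly payment of 525 and 5000 drive off"))
print(extract_down_payment("2000 dollars down"))
print(extract_down_payment("monthly payment of 525"))
```

Because the keyword group must match, a number with no keyword after it makes the search move on to the next candidate, or return None if there is none.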

Wiktor Stribiżew answered Jun 06 '26 07:06



