Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python Regex to match the FIRST repetition of a digit

Tags:

python

regex

Examples:

  1. For 0123123123, 1 should be matched since the 2nd 1 appears before the repetition of any other digit.
  2. For 01234554321, 5 should be matched since the 2nd 5 appears before the repetition of any other digit.

Some regexes that I have tried:

  1. The below works for the 1st but not the 2nd example. It matches 1 instead because 1 is the first digit that appears in the string which is subsequently repeated.
import re
m = re.search(r"(\d).*?\1", string)
print(m.group(1))
  1. The below works for the 2nd but not the 1st example. It matches 3 instead - in particular the 2nd and 3rd occurrence of the digit. I do not know why it behaves that way.
import re
m = re.search(r"(\d)(?!(\d).*?\2).*?\1", string)
print(m.group(1))
like image 917
plsssineedheeeelp Avatar asked Nov 25 '25 04:11

plsssineedheeeelp


2 Answers

Regex may not be the right tool for the task. While the regex in @CasimiretHippolyte's answer works, it is rather inefficient having to scan the entire rest of the string for each character in the string until a matching character is found, costing an average time complexity of O(n ^ 2).

A more efficient approach with a linear time complexity would be to use a set to keep track of characters that have been encountered, and return the first character already added to the set:

def first_repeating_digit(string):
    seen = set()
    for digit in filter(str.isdigit, string):
        if digit in seen:
            return digit
        seen.add(digit)
    raise ValueError('No repeating digit found.')

so that:

for s in '0123123123', '01234554321':
    print(s, first_repeating_digit(s))

outputs:

0123123123 1 
01234554321 5

Demo here

Benchmark test result:

blhsing 0123123123 1.2911038296297193
blhsing 01234554321 1.3835312821902335
CasimiretHippolyte 0123123123 3.6279739402234554
CasimiretHippolyte 01234554321 4.1985282939858735
like image 56
blhsing Avatar answered Nov 26 '25 17:11

blhsing


One idea: capture the end of the string and add it in the negative lookahead (group 2 here):

(\d)(?=.*?\1(.*))(?!.*?(\d).*?\3.+?\2$)

This way you can control where the subpattern .*?(\d).*?\3 in the negative lookahead ends. If .+?\2$ succeeds, that means there's an other digit that is repeated before the one in group 1.

I anchored the pattern for the regex101 demo with ^.*?, but you don't need to do that with the re.search method.


Other way: reverse the string and find the last repeated digit:

re.search(r'^.*(\d).*?\1', string[::-1]).group(1)
like image 25
Casimir et Hippolyte Avatar answered Nov 26 '25 17:11

Casimir et Hippolyte