
Python: using re.sub to replace multiple substrings multiple times

Tags:

python

replace

I am trying to correct a text that has some very typical scanning errors (l mistaken for I and vice versa). Basically, I would like the replacement string in re.sub to depend on the number of times the 'I' is detected, something like this:

re.sub("(\w+)(I+)(\w*)", "\g<1>l+\g<3>", "I am stiII here.")

What's the best way to achieve this?

Anne L. asked Mar 18 '26 00:03


2 Answers

Pass a function instead of a replacement string, as described in the docs for re.sub. The function is called with each match object and returns the replacement text, so it can identify the mistake and build the best substitution for it.

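import re

def replacement(match):
    # group(2) is the run of misread capital Is; emit one "l" per "I"
    if "I" in match.group(2):
        return match.group(1) + "l" * len(match.group(2)) + match.group(3)
    # Add additional cases here and as ORs in your regex
    return match.group(0)  # no case applied: keep the matched text as it was

# \w+? is lazy so that a run of three or more Is stays together in group 2
print(re.sub(r"(\w+?)(II+)(\w*)", replacement, "I am stiII here."))
# prints: I am still here.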

(Note that I modified your regex so that the repeated Is are captured in one group.)
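As a variation on the same idea, here is a minimal sketch that matches only the run of Is and sizes the replacement from the match length. The helper name fix_scan_errors and the pattern are assumptions made for illustration, not part of the answer above; the lookbehind is there so that a capital I at the start of a word is left alone.

```python
import re

# Hypothetical pattern: two or more capital Is that follow a word character.
RUN_OF_I = re.compile(r"(?<=\w)I{2,}")

def fix_scan_errors(text):
    """Replace each in-word run of 'I' with the same number of 'l'."""
    # The lambda receives the match object; m.group() is the whole run of Is.
    return RUN_OF_I.sub(lambda m: "l" * len(m.group()), text)

print(fix_scan_errors("I am stiII here."))   # I am still here.
print(fix_scan_errors("I wiII caII you."))   # I will call you.
```

Because the whole run is a single match, no capture groups are needed and the surrounding letters are never rewritten.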

DNS answered Mar 19 '26 12:03


You can use lookarounds to replace only those Is that are followed or preceded by another I:

import re

print(re.sub(r"(?<=I)I|I(?=I)", "l", "I am stiII here."))
# prints: I am still here.
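To see how the lookaround pattern behaves on other inputs, here is a small sketch that wraps it in a function. The name replace_doubled_i and the second test string are assumptions chosen for illustration. The lookarounds are checked against the original text, so every I in a run is replaced, whatever the run's length.

```python
import re

# An 'I' with another 'I' immediately before it or immediately after it.
ADJACENT_I = re.compile(r"(?<=I)I|I(?=I)")

def replace_doubled_i(text):
    """Replace every 'I' that touches another 'I' with 'l'."""
    return ADJACENT_I.sub("l", text)

print(replace_doubled_i("I am stiII here."))  # I am still here.
print(replace_doubled_i("Henry VIII"))        # Henry Vlll
```

The second call shows a limitation worth keeping in mind: the pattern has no notion of word position or case, so legitimate runs of capital I such as Roman numerals are rewritten too.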
georg answered Mar 19 '26 12:03