Python: using re.sub to replace multiple substrings multiple times

Question

I am trying to correct a text that has some very typical scanning errors (l mistaken for I and vice versa). Basically, I would like the replacement string in re.sub to depend on the number of times the 'I' is detected, something like this:

re.sub("(\w+)(I+)(\w*)", "\g<1>l+\g<3>", "I am stiII here.")

What's the best way to achieve this?

DNS · Accepted Answer

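Pass a function as the repl argument instead of a replacement string, as described in the docs. re.sub calls the function with each match object and uses the string it returns as the replacement, so your function can identify the mistake and build the best substitution based on that.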

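import re

def replacement(match):
    # group(2) is the run of misread "I"s: emit one "l" per "I".
    if "I" in match.group(2):
        return match.group(1) + "l" * len(match.group(2)) + match.group(3)
    # Add additional cases here and as ORs in your regex
    return match.group(0)  # no case applied: keep the matched text

print(re.sub(r"(\w+)(II+)(\w*)", replacement, "I am stiII here."))
# I am still here.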

(Note that I modified your regex so that the repeated Is appear in one group.)
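
A minimal sketch of the same callable-replacement idea, assuming the only error to fix is a run of two or more capital I's that follows at least one other word character. The names RUN_OF_I and fix_scanned_l and the pattern itself are illustrative, not part of the answer above. The first group is made lazy so that the whole run of I's lands in group 2 even when it is longer than two, and the trailing group is dropped so that a second run in the same word is matched separately.

```python
import re

# Hypothetical pattern: a lazy word prefix, then a run of 2+ capital I's.
RUN_OF_I = re.compile(r"(\w+?)(II+)")

def fix_scanned_l(text):
    """Replace each run of 2+ 'I' that follows a word character with as many 'l'."""
    def repl(match):
        # One "l" per misread "I"; the prefix is passed through unchanged.
        return match.group(1) + "l" * len(match.group(2))
    return RUN_OF_I.sub(repl, text)

print(fix_scanned_l("I am stiII here."))
```

The standalone "I" at the start of the sentence is left alone because the pattern requires a word character before the run.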

georg · Answer

You can use lookarounds to replace only those Is that are followed or preceded by another I:

import re
print(re.sub(r"(?<=I)I|I(?=I)", "l", "I am stiII here."))
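
To see which I's the lookaround pattern touches, here is a small check on a few assumed sample strings. The names ADJACENT_I and fix_adjacent_i and the samples are illustrative only. Lookarounds are evaluated against the original string, so every I in a run of two or more is replaced, while an isolated I is left as is.

```python
import re

# An "I" preceded by another "I", or an "I" followed by another "I".
ADJACENT_I = re.compile(r"(?<=I)I|I(?=I)")

def fix_adjacent_i(text):
    """Replace every 'I' that has an 'I' neighbour with 'l'."""
    return ADJACENT_I.sub("l", text)

for sample in ["I am stiII here.", "I wiII teII", "stiIII"]:
    print(sample, "->", fix_adjacent_i(sample))
```

Unlike a pattern that requires a word prefix, this one also rewrites a run at the start of a word, so a Roman numeral such as "II" would become "ll".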

Tags: python, replace

Anne L.
