 

Replacing multiple regex patterns together

I have a long string in which I want to replace dozens of regex patterns, so I created a dictionary like this:

replacements = { r'\spunt(?!\s*komma)' : r".",
                 r'punt komma' : r",",
                 r'(?<!punt )komma' : r",",
                 "paragraaf" : "\n\n" }

The dictionary above is only a small selection.

How can I apply this to a document of strings? Here is an example string:

text = "a punt komma is in this case not a komma and thats it punt"

I tried something like this:

import re

def multiple_replace(mapping, text):
    # Create a regular expression from the dictionary keys
    regex = re.compile("(%s)" % "|".join(map(re.escape, mapping.keys())))

    # For each match, look up the corresponding value in the dictionary
    return regex.sub(lambda mo: mapping[mo.group(0)], text)

if __name__ == "__main__":

    text = "Larry Wall is the creator of Perl"

    mapping = {
        "Larry Wall": "Guido van Rossum",
        "creator": "Benevolent Dictator for Life",
        "Perl": "Python",
    }

    print(multiple_replace(mapping, text))

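But this only works for literal string replacements, not for regex patterns like the ones in my dictionary: re.escape turns every key into a literal, and the lookup uses the matched text itself as the dictionary key, which never equals a pattern such as r'\spunt(?!\s*komma)'.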
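One way to keep the single-pass idea of the function above while supporting patterns is to wrap each key in its own named group instead of escaping it, then use `Match.lastgroup` to find out which pattern matched. This is a minimal sketch, not the asker's code: the helper name `multiple_regex_replace` and the group names `g0`, `g1`, ... are hypothetical, and it assumes no pattern uses numbered backreferences such as `\1`, since the wrapping groups shift the group numbering.

```python
import re


def multiple_regex_replace(replacements, text):
    """Apply every pattern -> replacement pair in one left-to-right pass.

    Hypothetical helper: each pattern is wrapped in a named group g0, g1, ...
    so the alternative that matched can be identified from the match object.
    """
    values = list(replacements.values())
    # Unlike the literal-string version, the keys are NOT passed through re.escape.
    combined = re.compile(
        "|".join(f"(?P<g{i}>{pattern})" for i, pattern in enumerate(replacements))
    )

    def lookup(match):
        # lastgroup is the name of the wrapping group that matched, e.g. "g2".
        return values[int(match.lastgroup[1:])]

    return combined.sub(lookup, text)


replacements = {
    r"\spunt(?!\s*komma)": ".",
    r"punt komma": ",",
    r"(?<!punt )komma": ",",
    "paragraaf": "\n\n",
}

text = "a punt komma is in this case not a komma and thats it punt"
print(multiple_regex_replace(replacements, text))  # a , is in this case not a , and thats it.
```

Because a function is passed to `sub`, the values are inserted literally (no backslash or group-reference processing). The semantics also differ slightly from the sequential loop in the answer: each position is tried against the alternatives in dictionary order, and replaced text is never re-scanned by a later pattern.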

Geveze asked Sep 05 '25 03:09



1 Answer

Iterate over your dictionary and apply one substitution per key/value pair:

import re

replacements = { r'\spunt(?!\s*komma)' : r".",
                 r'punt komma' : r",",
                 r'(?<!punt )komma' : r",",
                 "paragraaf" : "\n\n" }

text = "a punt komma is in this case not a komma and thats it punt"
print(text)

for key, value in replacements.items():
    text = re.sub(key, value, text)

print(text)

This outputs:

a punt komma is in this case not a komma and thats it punt
a , is in this case not a , and thats it.
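To apply the same loop to a whole document (assumed here to be a list of strings), the patterns can be compiled once up front and reused for every string. A minimal sketch; `compile_replacements` and `apply_replacements` are hypothetical names, and the second sample sentence is made up for illustration. It keeps the answer's sequential semantics: each pattern runs over the output of the previous one, in dictionary insertion order (guaranteed since Python 3.7).

```python
import re

replacements = {
    r"\spunt(?!\s*komma)": ".",
    r"punt komma": ",",
    r"(?<!punt )komma": ",",
    "paragraaf": "\n\n",
}


def compile_replacements(replacements):
    # Compile each pattern once; the list keeps the dict's insertion order.
    return [(re.compile(pattern), value) for pattern, value in replacements.items()]


def apply_replacements(compiled, text):
    # Sequential, like the answer's loop: each substitution sees the
    # result of the previous one.
    for regex, value in compiled:
        text = regex.sub(value, text)
    return text


document = [
    "a punt komma is in this case not a komma and thats it punt",
    "eerste zin punt paragraaf tweede zin punt",
]

compiled = compile_replacements(replacements)
cleaned = [apply_replacements(compiled, line) for line in document]
print(cleaned)
```

Note that when the value is a string, `re.sub` still interprets backslash escapes and group references such as `\1` in it, so a value containing a literal backslash would need escaping.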

Note that you should probably put word boundaries (\b) around each key's regex term to avoid matching unintended substrings.
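To illustrate the word-boundary point with the first pattern from the dictionary: the Dutch word "puntje" happens to start with "punt", so the pattern without `\b` rewrites part of it. This is only a sketch; where exactly `\b` belongs depends on each pattern (here it goes after `punt`, since the leading `\s` already separates it from the previous word).

```python
import re

sample = "een puntje"

# Without \b, " punt" also matches the start of the word "puntje".
without_boundary = re.sub(r"\spunt(?!\s*komma)", ".", sample)
# With \b after "punt", the match requires a word boundary at that point.
with_boundary = re.sub(r"\spunt\b(?!\s*komma)", ".", sample)

print(without_boundary)  # een.je
print(with_boundary)     # een puntje
```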

Tim Biegeleisen answered Sep 07 '25 20:09
