I have a long string in which I want to replace dozens of regular expressions, so I created a dictionary like this:
replacements = {
    r'\spunt(?!\s*komma)': r".",
    r'punt komma': r",",
    r'(?<!punt )komma': r",",
    "paragraaf": "\n\n",
}
The dictionary above is only a small selection.
How can I apply this to a document of strings? An example string:
text = "a punt komma is in this case not a komma and thats it punt"
I tried something like this:
import re

def multiple_replace(dict, text):
    # Create a regular expression from the dictionary keys
    regex = re.compile("(%s)" % "|".join(map(re.escape, dict.keys())))
    # For each match, look-up corresponding value in dictionary
    return regex.sub(lambda mo: dict[mo.string[mo.start():mo.end()]], text)

if __name__ == "__main__":
    text = "Larry Wall is the creator of Perl"
    dict = {
        "Larry Wall" : "Guido van Rossum",
        "creator" : "Benevolent Dictator for Life",
        "Perl" : "Python",
    }
    print(multiple_replace(dict, text))
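But this only works for literal string replacement, not for regex patterns like the ones in my dictionary: re.escape turns every key into a literal string, and the lookup of the matched text in the dictionary fails when the key is a pattern.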
Iterate over your dictionary and make a substitution with re.sub for each key, value pair:
import re

replacements = {
    r'\spunt(?!\s*komma)': r".",
    r'punt komma': r",",
    r'(?<!punt )komma': r",",
    "paragraaf": "\n\n",
}
text = "a punt komma is in this case not a komma and thats it punt"
print(text)
# Apply the patterns one by one; each substitution works on the result of the previous one.
for key, value in replacements.items():
    text = re.sub(key, value, text)
print(text)
This outputs:
a punt komma is in this case not a komma and thats it punt
a , is in this case not a , and thats it.
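The loop makes one pass over the text per pattern. If you would rather keep the single-pass shape of the function from the question, a rough sketch could look like the one below. The name multiple_replace_regex is made up for this example, and it assumes that none of the keys contain capturing groups of their own, so that the number of the group that matched identifies the key.

```python
import re

def multiple_replace_regex(replacements, text):
    """Apply every regex key in `replacements` in one pass over `text`."""
    values = list(replacements.values())
    # Wrap each key in its own capturing group and join the groups with "|".
    # Assumption: the keys have no capturing groups of their own,
    # so group i always belongs to the i-th key.
    combined = re.compile("|".join("(%s)" % key for key in replacements))
    # mo.lastindex is the number of the group that matched.
    return combined.sub(lambda mo: values[mo.lastindex - 1], text)

replacements = {
    r'\spunt(?!\s*komma)': ".",
    r'punt komma': ",",
    r'(?<!punt )komma': ",",
    "paragraaf": "\n\n",
}
text = "a punt komma is in this case not a komma and thats it punt"
print(multiple_replace_regex(replacements, text))
```

For the example string this gives the same result as the loop. The two approaches are not equivalent in general: in a single pass every pattern is matched against the original text, so the output of one replacement is never picked up by another pattern.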
Note that you should probably put word boundaries (\b) around each key regex term, to avoid unintentionally matching a substring of a longer word.
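As an illustration of the word-boundary advice, here is a sketch of the same dictionary with \b added wherever a key starts or ends on a word character. The helper name apply_with_boundaries and the test word "kommagetal" are only made up for this example.

```python
import re

# Same keys as in the answer, with \b added at the word edges.
bounded_replacements = {
    r'\spunt\b(?!\s*komma)': ".",
    r'\bpunt komma\b': ",",
    r'(?<!punt )\bkomma\b': ",",
    r'\bparagraaf\b': "\n\n",
}

def apply_with_boundaries(text):
    # Apply each pattern in turn, as in the loop from the answer.
    for pattern, value in bounded_replacements.items():
        text = re.sub(pattern, value, text)
    return text

# Without \b, "komma" inside a longer word is replaced as well.
print(re.sub(r'(?<!punt )komma', ",", "een kommagetal"))
# With \b, the longer word is left alone.
print(apply_with_boundaries("een kommagetal"))
# The example string from the question is still handled as before.
print(apply_with_boundaries("a punt komma is in this case not a komma and thats it punt"))
```

The \b after \spunt is the one that matters most in practice, because without it the first key would also match the start of any word that merely begins with "punt".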