For machine learning purposes, I need to "clean" some text that I am extracting, so I tried this:
texto = "sdf sdf s _ sfsf sdfs _________ sfsdf"
texto = texto.replace(r"_{2,}"," ")
print(texto)
But the result was not what I expected:
sdf sdf s _ sfsf sdfs _________ sfsdf
I would like:
sdf sdf s _ sfsf sdfs sfsdf
You could use a regular expression via the re module:
import re
texto = "sdf sdf s _ sfsf sdfs _________ sfsdf"
rx = re.compile(r'_{2,}')
texto = rx.sub('', texto)
Which yields (note the two spaces left where the underscores were):
sdf sdf s _ sfsf sdfs  sfsdf
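For context, str.replace matches its first argument literally and never interprets it as a regular expression, which is why the original attempt returned the text unchanged. A minimal sketch contrasting the two calls (the names literal_result and regex_result are only illustrative):

```python
import re

texto = "sdf sdf s _ sfsf sdfs _________ sfsdf"

# str.replace searches for the literal characters "_{2,}", which never
# occur in the text, so the string comes back unchanged.
literal_result = texto.replace(r"_{2,}", " ")

# re.sub interprets the same pattern as "two or more underscores".
regex_result = re.sub(r"_{2,}", "", texto)

print(literal_result == texto)  # the literal replace did nothing
print(regex_result)
```

The single underscore survives in both cases, because the quantifier {2,} requires at least two consecutive underscores.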
If you want to remove the whitespace that follows the underscores as well, change the expression to
rx = re.compile(r'_{2,}\s*')
Then the output would be
sdf sdf s _ sfsf sdfs sfsdf
#                   ^^^
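Since the goal is cleaning text for machine learning, the compiled pattern can be reused across many strings. A small sketch, assuming a hypothetical helper named strip_underscore_runs and made-up sample strings:

```python
import re

# Two or more underscores, plus any whitespace that follows them.
UNDERSCORE_RUN = re.compile(r"_{2,}\s*")


def strip_underscore_runs(text: str) -> str:
    """Remove runs of 2+ underscores (hypothetical helper, not part of re)."""
    return UNDERSCORE_RUN.sub("", text)


# Illustrative inputs only; the first one is the string from the question.
samples = [
    "sdf sdf s _ sfsf sdfs _________ sfsdf",
    "a __ b",
    "no_change _ here",
]
cleaned = [strip_underscore_runs(s) for s in samples]
print(cleaned)
```

Compiling once at module level avoids repeating the pattern at every call site; isolated single underscores are left alone because the pattern requires at least two in a row.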