Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to replace two or more underscore using python?

For machine learning porpuoses, I need to "clean" some text that I am extracting, so I've tried this:

texto = "sdf sdf s _ sfsf sdfs _________ sfsdf"
texto = texto.replace(r"_{2,}"," ")
print(texto)

But the result was not the expected:

sdf sdf s _ sfsf sdfs _________ sfsdf

I would like:

sdf sdf s _ sfsf sdfs  sfsdf
like image 683
celsowm Avatar asked Oct 24 '25 14:10

celsowm


1 Answers

You could use

import re
texto = "sdf sdf s _ sfsf sdfs _________ sfsdf"
rx = re.compile(r'_{2,}')

texto = rx.sub('', texto)

Which yields

sdf sdf s _ sfsf sdfs  sfsdf

If you want to replace the trailing space(s) as well, change the expression to

rx = re.compile(r'_{2,}\s*')

Then the output would be

sdf sdf s _ sfsf sdfs sfsdf
#                   ^^^
like image 118
Jan Avatar answered Oct 26 '25 03:10

Jan



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!