insert space between regex match

Tags: python, regex

I want to un-join typos in my string by locating them with a regex and inserting a space character inside each match.

I tried the solution from a similar question (Insert space between characters regex), which uses '\1 \2' as the replacement string in re.sub, but it did not work for me.

import re

corpus = ''' 
This is my corpus1a.I am looking to convert it into a 2corpus 2b.
'''

clean = re.compile('\.[^(\d,\s)]')
corpus = re.sub(clean,' ', corpus)

clean2 = re.compile('\d+[^(\d,\s,\.)]')
corpus = re.sub(clean2,'\1 \2', corpus)

EXPECTED OUTPUT:

This is my corpus 1 a. I am looking to convert it into a 2 corpus 2 b.

sheth7 asked May 16 '26 20:05


1 Answer

You need to put the capture group parentheses around the patterns that match each string that you want to copy to the result.
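
As a rough illustration of why the attempt in the question fails, here is a minimal sketch. The sample string `text` is made up for the example. Two things go wrong at once: in a non-raw string literal, '\1' is an octal escape for a control character rather than a backreference, and a pattern without parentheses has no groups for \1 and \2 to refer to.

```python
import re

text = "2corpus"  # hypothetical sample, not taken from the question

# In a non-raw string literal, '\1' and '\2' are octal escapes (control
# characters), so re.sub never sees a backreference at all.
assert '\1 \2' == '\x01 \x02'

# With a raw replacement string but no capture groups in the pattern,
# re.sub rejects the template because groups 1 and 2 do not exist.
try:
    re.sub(r'\d[^\d,\s]', r'\1 \2', text)
    no_group_error = None
except re.error as exc:
    no_group_error = exc

# With both parts wrapped in parentheses, \1 and \2 refer to real groups.
fixed = re.sub(r'(\d)([^\d,\s])', r'\1 \2', text)
print(fixed)  # 2 corpus
```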

There's also no need to use + after \d. You only need to match the last digit of the number.

clean = re.compile(r'(\d)([^\d,\s])')
corpus = re.sub(clean, r'\1 \2', corpus)
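
Applied to the corpus from the question, the pattern above only separates a digit from a following non-digit character. The sketch below shows that step first, then adds two extra substitutions that are my own assumption (they are not part of this answer) to cover the remaining joins in the expected output: a letter followed by a digit, and a period followed by a letter. The names step1, step2 and step3 are illustrative only.

```python
import re

corpus = "This is my corpus1a.I am looking to convert it into a 2corpus 2b."

# Step from the answer: a digit followed by a character that is not a
# digit, comma, or whitespace.
step1 = re.sub(r'(\d)([^\d,\s])', r'\1 \2', corpus)
print(step1)
# This is my corpus1 a.I am looking to convert it into a 2 corpus 2 b.

# Assumed extra rule (not in the answer): an ASCII letter followed by a digit.
step2 = re.sub(r'([A-Za-z])(\d)', r'\1 \2', step1)

# Assumed extra rule (not in the answer): a period followed by an ASCII letter.
step3 = re.sub(r'(\.)([A-Za-z])', r'\1 \2', step2)
print(step3)
# This is my corpus 1 a. I am looking to convert it into a 2 corpus 2 b.
```

Note that the character class in the first pattern does not exclude '.', so a decimal such as 3.14 would also be split into "3 .14". If the text contains decimals, adding \. to the class is one way to avoid that.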


Barmar answered May 18 '26 10:05


