
How to replace the whitespace around certain characters?

Tags:

regex

I am working with some free text that needs data cleaning, and I have a question (one of many; I am sure I will ask more later):

I need to replace the following combinations:

[ ; ] (space before and after the punctuation)

[;] (no space before and after the punctuation)

[ ;] (only space before the punctuation)

to

[; ] (only space after the punctuation)

...where the punctuation can be one of [;:,.]. How can I do this with a regex?

lokheart asked Dec 05 '25 08:12



1 Answer

A possible expression would be:

\s?([;:,.])\s?

Depending on the programming language or tool you are using, the backreference is written as $1, \\1 or \1. The replacement would then be e.g. "$1 " (note the space after the 1).

Explanation:

\s?      - match at most one whitespace character
 (...)   - capture group, storing the matched characters in a reference
  [...]  - character class, matching one of the characters inside

References: character class, capture group, quantifier
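
As a rough illustration of the expression above, here is a minimal Python sketch. Python is only an assumption here, since the question does not name a language, and the helper name normalize_punctuation is hypothetical. In Python's re module the backreference in the replacement is written \1.

```python
import re

# Pattern from the answer: optional whitespace, one of ; : , . (captured),
# then optional whitespace again.
PATTERN = re.compile(r"\s?([;:,.])\s?")


def normalize_punctuation(text: str) -> str:
    """Hypothetical helper: put a single space after each matched punctuation mark."""
    # r"\1 " is the captured punctuation followed by one space.
    return PATTERN.sub(r"\1 ", text)


# The three combinations from the question, plus the already-correct form.
samples = ["a ; b", "a;b", "a ;b", "a; b"]
results = [normalize_punctuation(s) for s in samples]
print(results)  # every sample becomes "a; b"
```

Note that this also adds a space after punctuation at the very end of the text and inside numbers such as 3.14, so the pattern may need tightening depending on the data.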

But again, the expression can differ depending on the tool or language you are using. For example, a similar substitution for sed would look like:

s/ *\([;:,.]\) */\1 /g

but this would also collapse any run of spaces around the punctuation into the single space after it (there is probably a better way, but I'm not so familiar with sed).
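
For comparison, the same "any number of spaces" behaviour can be sketched in Python by using " *" instead of "\s?". This is only an illustration of the difference, not a translation of a tested sed script, and the function name collapse_around_punctuation is hypothetical.

```python
import re


def collapse_around_punctuation(text: str) -> str:
    """Hypothetical helper: remove every space around ; : , . and leave one space after it."""
    # " *" matches zero or more literal spaces, so whole runs are consumed.
    return re.sub(r" *([;:,.]) *", r"\1 ", text)


print(collapse_around_punctuation("a   ;   b"))
```

With the "\s?" version, only one whitespace character on each side of the punctuation is consumed, so longer runs of spaces would be left partly in place.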

Felix Kling answered Dec 07 '25 21:12
