I am working on some free text that needs data cleaning, and I have a question (one of many; I am sure I will ask the others later):
I need to replace the following combinations:
[ ; ] (space before and after the punctuation)
[;] (no space before and after the punctuation)
[ ;] (only space before the punctuation)
to
[; ] (only space after the punctuation)
...where the punctuation can be one of [;:,.]. How can I do this with a regex?
A possible expression would be:
\s?([;:,.])\s?
Depending on the programming language or tool you are using, the backreference is written as $1, \\1 or \1, so the replacement would be e.g. "$1 " (note the space after the 1).
Explanation:
\s? - match at most one whitespace character
(...) - capture group, storing the matched characters in a reference
[...] - character class, matching one of the characters inside
References: character class, capture group, quantifier
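As an illustration, here is a minimal sketch of the expression in Python, where the backreference is written \1. The function name normalize_punctuation and the sample string are made up for this example and are not part of the question.

```python
import re

# Optional single whitespace, one punctuation mark (captured), optional single whitespace.
PUNCT_SPACING = re.compile(r"\s?([;:,.])\s?")

def normalize_punctuation(text: str) -> str:
    """Rewrite ' ; ', ';' and ' ;' as '; ' (likewise for : , .)."""
    # \1 re-inserts the captured punctuation mark, followed by exactly one space.
    return PUNCT_SPACING.sub(r"\1 ", text)

print(normalize_punctuation("a ; b;c ;d"))  # a; b; c; d
```

Be aware that this pattern also matches punctuation at the very end of the text (adding a trailing space) and the dot inside numbers such as 3.14, so real data may need a more restrictive expression.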
But again: the expression can differ depending on the tool/language you are using. E.g. a similar substitution for sed would look like:
s/ *\([;:,.]\) */\1 /g
Note that this also collapses any run of spaces around the punctuation into a single space after it (there is probably a better way, but I'm not so familiar with sed).
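If collapsing runs of spaces is actually what you want, the same idea can be sketched in Python by using " *" instead of "\s?". The helper name collapse_punctuation_spacing is hypothetical and only serves the example.

```python
import re

def collapse_punctuation_spacing(text: str) -> str:
    """Replace any run of spaces around ; : , . with a single space after the mark."""
    # " *" matches zero or more spaces, so "a   ;   b" becomes "a; b".
    return re.sub(r" *([;:,.]) *", r"\1 ", text)

print(collapse_punctuation_spacing("a   ;   b ,c"))  # a; b, c
```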