
R regex remove apostrophes NOT between letters

Tags: regex, r

I'm able to remove all punctuation from a string while keeping apostrophes, but I'm now stuck on how to remove any apostrophes that are not between two letters.

str1 <- "I don't know 'how' to remove these ' things"

Should look like this:

"I don't know how to remove these things"
pheeper asked Dec 08 '25 07:12



1 Answer

You may use a regex approach:

str1 <- "I don't know 'how' to remove these ' things"
gsub("\\s*'\\B|\\B'\\s*", "", str1)


The regex matches:

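  • \\s*'\\B - zero or more whitespace characters, a ', and a non-word boundary (the ' is not followed by a word character)
  • | - or
  • \\B'\\s* - a non-word boundary (the ' is not preceded by a word character), a ', and zero or more whitespace characters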
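As a quick check of the explanation above, here is a minimal sketch that wraps the pattern in a small function. The name strip_stray_apostrophes is only an illustrative assumption; the pattern itself is unchanged.

```r
# Hypothetical helper; the pattern is the one from the answer above.
strip_stray_apostrophes <- function(x) {
  gsub("\\s*'\\B|\\B'\\s*", "", x)
}

str1 <- "I don't know 'how' to remove these ' things"
result <- strip_stray_apostrophes(str1)
print(result)
# [1] "I don't know how to remove these things"

# gsub() is vectorised, so several strings can be cleaned in one call.
print(strip_stray_apostrophes(c("it's 'quoted'", "rock 'n' roll")))
```

Note that \B is defined in terms of word characters (letters, digits and underscore), so an apostrophe between a digit and a letter, as in 90's, is kept as well.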

If you do not mind the extra whitespace left behind when a standalone ' is removed, you can use a PCRE regex instead:

\b'\b(*SKIP)(*F)|'


Explanation:

  • \b'\b - match a ' that sits between two word characters
  • (*SKIP)(*F) - discard that match and resume searching after it
  • | - or match...
  • ' - an apostrophe in any other context.

In R:

gsub("\\b'\\b(*SKIP)(*F)|'", "", str1, perl=TRUE)
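To make the whitespace caveat concrete, the sketch below runs the (*SKIP)(*F) pattern and then squeezes the leftover spaces. The second step is an assumption added for illustration, not part of the original suggestion.

```r
str1 <- "I don't know 'how' to remove these ' things"

# Pattern from the answer: drop every ' that is not between word characters.
no_stray <- gsub("\\b'\\b(*SKIP)(*F)|'", "", str1, perl = TRUE)
print(no_stray)
# [1] "I don't know how to remove these  things"   (note the double space)

# Assumed follow-up step: collapse whitespace runs and trim the ends.
cleaned <- trimws(gsub("\\s+", " ", no_stray))
print(cleaned)
# [1] "I don't know how to remove these things"
```

Keeping the cleanup as a separate step leaves the apostrophe pattern simple, at the cost of also normalising any whitespace that was doubled in the input for other reasons.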

To handle apostrophes between Unicode letters, prepend the (*UTF)(*UCP) verbs to the pattern and pass perl=TRUE:

gsub("(*UTF)(*UCP)\\s*'\\B|\\B'\\s*", "", str1, perl=TRUE)
      ^^^^^^^^^^^^                              ^^^^^^^^^     

Or

gsub("(*UTF)(*UCP)\\b'\\b(*SKIP)(*F)|'", "", str1, perl=TRUE) 
      ^^^^^^^^^^^^                                 
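A small sketch of why the Unicode verbs matter, using a made-up French sample string (an assumption for illustration). The accented letters are written as \u escapes so the script itself stays ASCII.

```r
# Hypothetical sample: "l'été 'chaud'" with the accented letters escaped.
str2 <- "l'\u00e9t\u00e9 'chaud'"

# With (*UCP), \b and \B treat the accented letter as a word character,
# so the apostrophe in "l'été" counts as being between letters and is kept.
with_ucp_a <- gsub("(*UTF)(*UCP)\\s*'\\B|\\B'\\s*", "", str2, perl = TRUE)
with_ucp_b <- gsub("(*UTF)(*UCP)\\b'\\b(*SKIP)(*F)|'", "", str2, perl = TRUE)
print(with_ucp_a)
print(with_ucp_b)

# For comparison: the same pattern without the verbs, where \w is ASCII-only
# by default and the result may therefore differ.
print(gsub("\\b'\\b(*SKIP)(*F)|'", "", str2, perl = TRUE))
```

Both Unicode-aware calls should keep the apostrophe after the l and strip only the quotes around chaud.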


Wiktor Stribiżew answered Dec 09 '25 19:12
