I have asked related questions HERE and HERE. I tried to generalize these answers but have failed.
Basically, I have a string that I want to split into words, numbers, and any sort of punctuation, but I want to keep the apostrophes inside the words. Here is what I've tried, and I'm close (I think):
x <- "Raptors don't like robots! I'd pay $500.00 to rid them."
strsplit(x, "(\\s+)|(?=[[:punct:]])", perl = TRUE)
## [[1]]
##  [1] "Raptors" "don"     "'"       "t"       "like"    "robots"  "!"      
##  [8] ""        "I"       "'"       "d"       "pay"     "$"       "500"    
## [15] "."       "00"      "to"      "rid"     "them"    "."      
Here's what I'm after:
## [[1]]
##  [1] "Raptors" "don't"   "like"    "robots"  "!"       ""        "I'd"    
##  [8] "pay"     "$"       "500"     "."       "00"      "to"      "rid"    
## [15] "them"    "."      
While I want a base R solution, I would also like to see other solutions (I'm sure someone has a stringr solution), which would make the question more useful to others.
Note: R has a specific regex system. You will want to be familiar with R to answer this question.
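You could add a negative lookahead, (?!'), in front of the punctuation lookahead, so that a split position is rejected whenever the next character is an apostrophe: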
strsplit(x, "(\\s+)|(?!')(?=[[:punct:]])", perl = TRUE)
#  [1] "Raptors" "don't"   "like"    "robots"  "!"       ""        "I'd"     "pay"     "$"       "500"     "."       "00"      "to"      "rid"     "them"    "."
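Another way to approach this in base R is to match the tokens instead of splitting on the gaps between them. The following is only a sketch, not part of the answer above: the variable names `token_pattern` and `tokens` are made up for illustration, and the pattern assumes a word is a run of letters or digits that may contain internal apostrophes. Because nothing is split, this version does not produce the empty string that appears in the strsplit output.

```r
x <- "Raptors don't like robots! I'd pay $500.00 to rid them."

# Assumed token pattern (an illustration, not from the original answer):
#   [[:alnum:]]+(?:'[[:alnum:]]+)*  a word or number, optionally with
#                                   apostrophe-joined parts such as don't
#   [[:punct:]]                     otherwise, a single punctuation character
token_pattern <- "[[:alnum:]]+(?:'[[:alnum:]]+)*|[[:punct:]]"

# gregexpr() finds every match; regmatches() extracts the matched text
tokens <- regmatches(x, gregexpr(token_pattern, x, perl = TRUE))[[1]]
print(tokens)
```

If you port this to stringr, check the result carefully: stringr uses the ICU regex engine, whose [[:punct:]] class does not necessarily cover the same characters (for example "$") as the PCRE class used here.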