I'd like to split a text string in R, but with some special handling. For instance, if the string contains a dot . or a !, I want my function to return them as separate elements of the split list. Below is an example of what I want to get.
mytext="Caracas. Montevideo! Chicago."
split = c("Caracas", ".", "Montevideo", "!", "Chicago", ".")
My current approach is to first use the built-in R function gsub to replace "." with " ." (and "!" with " !"), and then to split the result on spaces with strsplit.
mytext=gsub("\\."," .",mytext)
mytext=gsub("\\!"," !",mytext)
unlist(strsplit(mytext,split=' '))
So, my question is: is there another way of implementing this, either by configuring the parameters of the strsplit function or through another approach you consider more efficient?
Any help or suggestion is appreciated.
Look-ahead is what you're looking for here:
strsplit(mytext, split = "(?=(\\.|!))", perl = TRUE)
#[[1]]
#[1] "Caracas" "." " Montevideo" "!" " Chicago" "."
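Note that the result above keeps the space that followed each punctuation mark, so " Montevideo" and " Chicago" start with a blank. Here is a minimal sketch of two ways to get rid of it, assuming the same mytext as in the question; the variable names tokens_trim and tokens_match are just illustrative.

```r
mytext <- "Caracas. Montevideo! Chicago."

# Option 1: split with the look-ahead, then strip surrounding whitespace
# from every piece with trimws() (available in base R since 3.2.0).
parts <- strsplit(mytext, split = "(?=[.!])", perl = TRUE)[[1]]
tokens_trim <- trimws(parts)

# Option 2: instead of splitting, extract the tokens directly.
# A token is either a run of characters that are not ".", "!" or a space,
# or a single "." or "!".
tokens_match <- regmatches(mytext, gregexpr("[^.! ]+|[.!]", mytext))[[1]]

print(tokens_trim)
print(tokens_match)
```

Both options should give "Caracas" "." "Montevideo" "!" "Chicago" "." for this input. They differ on multi-word names: option 1 would keep something like "New York" as one element, while option 2 splits on spaces, as the gsub/strsplit approach in the question does.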