
Particular string split in R

Tags: string, r

I'd like to split a text string in R, with one twist: if the string contains a dot (.) or an exclamation mark (!), I want those characters to appear as separate elements of the resulting list. Below is an example of what I want to get.

  mytext="Caracas. Montevideo! Chicago."  
  split = c("Caracas", ".", "Montevideo", "!", "Chicago", ".")

My current approach is to first use the built-in R function gsub to replace "." with " ." and "!" with " !", and then split the result on spaces with strsplit.

  mytext=gsub("\\."," .",mytext)
  mytext=gsub("\\!"," !",mytext)
  unlist(strsplit(mytext,split=' '))

So, my question is: is there another way of implementing this, either by configuring the parameters of the strsplit function or by some other approach you consider more efficient?

Any help or suggestion is appreciated.

nhern121 asked Jun 26 '26 14:06



1 Answer

Look-ahead is what you're looking for here:

strsplit(mytext, split = "(?=(\\.|!))", perl = TRUE)
#[[1]]
#[1] "Caracas"     "."           " Montevideo" "!"           " Chicago"    "." 
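
Note that the look-ahead split keeps the space in front of " Montevideo" and " Chicago". If the exact output from the question is wanted, one possible follow-up (a sketch, not part of the answer above) is to trim the tokens afterwards with trimws, which assumes R 3.2.0 or later. The variable names tokens and tokens2 are only illustrative. The second variant extracts the tokens with regmatches instead of splitting; like the gsub approach in the question, it would also break multi-word names such as "New York" at the space.

```r
mytext <- "Caracas. Montevideo! Chicago."

# Split before each "." or "!" using a zero-width look-ahead (needs perl = TRUE).
tokens <- strsplit(mytext, split = "(?=[.!])", perl = TRUE)[[1]]

# Strip the leading spaces left on some tokens and drop any empty strings.
tokens <- trimws(tokens)
tokens <- tokens[nzchar(tokens)]
# tokens is now: "Caracas", ".", "Montevideo", "!", "Chicago", "."

# Alternative: extract the tokens directly instead of splitting.
# A token is either a run of characters that are not ".", "!" or a space,
# or a single "." or "!".
tokens2 <- regmatches(mytext, gregexpr("[^.! ]+|[.!]", mytext))[[1]]
# tokens2 holds the same six elements as tokens
```

Both variants return a plain character vector, so the unlist() call from the question is not needed.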
eddi answered Jun 28 '26 03:06



