What is the simplest way to remove text on both the left and right side of a given character or substring in R?
Here is an example character vector:
a = c("C:\\final docs with data/Gakenke_New_Sanitation.xlsx", "C:\\final docs with data/Gatsibo_New_Sanitation.xlsx", "C:\\final docs with data/Rutsiro_New_Sanitation.xlsx")
The expected output is: Gakenke, Gatsibo and Rutsiro.
I know I can break this task down and handle it with mutate() (dplyr and stringr loaded) as follows:
tibble(a = a) %>% mutate(a = str_remove(a, "C.+/"), a = str_remove(a, "_.+"))
My question is: which single pattern can I pass to that mutate() call to end up with the intended result, Gakenke, Gatsibo and Rutsiro?
Any help is much appreciated. Thank you!
You can use
a = c("C:\\final docs with data/Gakenke_New_Sanitation.xlsx", "C:\\final docs with data/Gatsibo_New_Sanitation.xlsx", "C:\\final docs with data/Rutsiro_New_Sanitation.xlsx")
library(stringr)
str_remove_all(a, "^.*/|_.*")
## => [1] "Gakenke" "Gatsibo" "Rutsiro"
stringr::str_remove_all() removes every match of the pattern. ^.*/|_.* matches from the start of the string up to the last /, and from the first _ after it to the end of the string (this assumes the strings contain no line break characters, since . does not match them).
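Since the question asks for a single pattern inside mutate(), here is a minimal sketch of plugging the same regex into that call. It assumes the paths are stored in a column named a of a tibble (the vector is wrapped with tibble() for that purpose). It also shows base R's gsub(), which accepts the same pattern and replaces every match without any packages.

```r
library(dplyr)
library(stringr)

a <- c("C:\\final docs with data/Gakenke_New_Sanitation.xlsx",
       "C:\\final docs with data/Gatsibo_New_Sanitation.xlsx",
       "C:\\final docs with data/Rutsiro_New_Sanitation.xlsx")

# Assumption: the paths live in a column `a` of a data frame,
# because mutate() operates on data frames, not bare vectors
df <- tibble(a = a) %>%
  # One pattern: drop everything up to the last "/" and from the first "_" on
  mutate(a = str_remove_all(a, "^.*/|_.*"))

df$a
#> [1] "Gakenke" "Gatsibo" "Rutsiro"

# Base R equivalent: gsub() also replaces every match of the pattern
base_result <- gsub("^.*/|_.*", "", a)
base_result
#> [1] "Gakenke" "Gatsibo" "Rutsiro"
```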
A possible solution, based on stringr::str_extract() and lookarounds (note that it relies on the literal data/ prefix and _New suffix being present):
library(tidyverse)
a %>%
  str_extract("(?<=data/).*(?=_New)")
#> [1] "Gakenke" "Gatsibo" "Rutsiro"
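For comparison, a capture group can do the same job without lookarounds. This sketch swaps in stringr::str_match(), which returns a matrix with the full match in column 1 and each capture group in the following columns, plus the base R sub() equivalent. The pattern assumes the wanted name is the run of non-underscore characters right after the last /, followed by an _.

```r
library(stringr)

a <- c("C:\\final docs with data/Gakenke_New_Sanitation.xlsx",
       "C:\\final docs with data/Gatsibo_New_Sanitation.xlsx",
       "C:\\final docs with data/Rutsiro_New_Sanitation.xlsx")

# Greedy ".*/" skips to the last "/"; group 1 captures everything up to the next "_"
m <- str_match(a, ".*/([^_]+)_")
names_matched <- m[, 2]
names_matched
#> [1] "Gakenke" "Gatsibo" "Rutsiro"

# Base R: replace the whole string with the captured group
names_sub <- sub(".*/([^_]+)_.*", "\\1", a)
names_sub
#> [1] "Gakenke" "Gatsibo" "Rutsiro"
```

Unlike the lookaround version, this does not depend on the data/ folder name or the _New suffix, only on the / and _ delimiters.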