I have a lot of strings like this:
2019/01/01/07/556662_cba3a4fc-cb8f-4150-859f-5f21a38373d0
I want to extract the substring that lays right after the last "/"
and ends with "_"
:
556662
I have found out how to extract: /01/01/07/556662
by using the following regex: (\/)(.*?)(?=\_)
Please advise how can I capture the right group.
You may use
x <- "2019/01/01/07/556662_cba3a4fc-cb8f-4150-859f-5f21a38373d0"
regmatches(x, regexpr(".*/\\K[^_]+", x, perl=TRUE))
## [1] "556662"
See the regex and R demo.
Here, the regex matches and outputs the first substring that matches
.*/
- any 0+ chars as many as possible up to the last /
\K
- omits this part from the match [^_]+
- puts 1 or more chars other than _
into the match value.Or, a sub
solution:
sub(".*/([^_]+).*", "\\1", x)
See the regex demo.
Here, it is similar to the previous one, but the 1 or more chars other than _
are captured into Group 1 (\1
in the replacement pattern) and the trailing .*
make sure the whole input is matched (and consumed, ready to be replaced).
Alternative non-base R solutions
If you can afford or prefer to work with stringi
, you may use
library(stringi)
stri_match_last_regex("2019/01/01/07/556662_cba3a4fc-cb8f-4150-859f-5f21a38373d0", ".*/([^_]+)")[,2]
## [1] "556662"
This will match a string up to the last /
and will capture into Group 1 (that you access in Column 2 using [,2]
) 1 or more chars other than _
.
Or
stri_extract_last_regex("2019/01/01/07/556662_cba3a4fc-cb8f-4150-859f-5f21a38373d0", "(?<=/)[^_/]+")
## => [1] "556662"
This will extract the last match of a string that consists of 1 or more chars other than _
and /
after a /
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With