R Regex capture group?

Tags: string, regex, r

I have a lot of strings like this:

2019/01/01/07/556662_cba3a4fc-cb8f-4150-859f-5f21a38373d0

I want to extract the substring that lies right after the last "/" and ends right before the "_":

556662

I have found out how to extract: /01/01/07/556662

by using the following regex: (\/)(.*?)(?=\_)

Please advise how I can capture the right group.
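To see why the question's pattern returns too much, it helps to run it in R. A regex engine reports the leftmost match, and the lazy .*? only controls how far that match extends, not where it starts, so the match begins at the first /. The sketch below uses the question's pattern with the unnecessary escapes on / and _ dropped; perl = TRUE is needed because base R's default engine does not support the (?=...) lookahead.

```r
x <- "2019/01/01/07/556662_cba3a4fc-cb8f-4150-859f-5f21a38373d0"

# The question's pattern without the redundant escapes on "/" and "_".
# perl = TRUE enables PCRE, which supports the (?=_) lookahead.
op_pattern <- "(/)(.*?)(?=_)"

# regexpr() reports the leftmost match: it starts at the FIRST "/",
# and the lazy .*? then grows until the lookahead sees "_".
op_match <- regmatches(x, regexpr(op_pattern, x, perl = TRUE))
print(op_match)
## [1] "/01/01/07/556662"
```

Moving the starting point to the last / therefore needs something that consumes everything before it, such as a greedy .*/ in front of the part you want.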

asked Sep 01 '25 at 16:09 by SteveS


1 Answer

You may use

x <- "2019/01/01/07/556662_cba3a4fc-cb8f-4150-859f-5f21a38373d0"
regmatches(x, regexpr(".*/\\K[^_]+", x, perl=TRUE))
## [1] "556662"

See the regex and R demo.

Here, the regex matches and returns the first substring that matches:

  • .*/ - any 0 or more chars, as many as possible, up to the last /
  • \K - the match reset operator: discards the text matched so far from the match value
  • [^_]+ - puts 1 or more chars other than _ into the match value
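Since the question mentions a lot of strings, note that regexpr() and regmatches() are vectorised. A minimal sketch, assuming a character vector named paths with hypothetical sample values: regmatches() silently drops elements that have no match, so if the result must stay aligned with the input, one option is to fill an NA-initialised vector.

```r
# Hypothetical sample input: the question's string plus two made-up ones;
# the last one contains no "/" and therefore cannot match.
paths <- c(
  "2019/01/01/07/556662_cba3a4fc-cb8f-4150-859f-5f21a38373d0",
  "2019/01/02/11/123_abc",
  "no-slash-here_xyz"
)

# One match start (or -1 for no match) per element.
m <- regexpr(".*/\\K[^_]+", paths, perl = TRUE)

# regmatches() returns only the matched substrings, so ids is
# c("556662", "123"), which is shorter than paths.
ids <- regmatches(paths, m)

# To keep one result per input, start from NA and fill the matched slots.
aligned <- rep(NA_character_, length(paths))
aligned[as.vector(m) != -1] <- ids
# aligned is c("556662", "123", NA)
```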

Or, a sub solution:

sub(".*/([^_]+).*", "\\1", x)

See the regex demo.

Here, the pattern is similar to the previous one, but the 1 or more chars other than _ are captured into Group 1 (\1 in the replacement pattern), and the trailing .* makes sure the whole input is matched (and thus consumed and replaced).
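Because the question asks about capture groups specifically, base R's regexec() may also be useful: it reports where each group lies, not only the whole match. This sketch reuses the .*/([^_]+) idea; passing perl = TRUE is a choice made here so the greedy .* backtracks to the last / in the familiar PCRE way.

```r
x <- "2019/01/01/07/556662_cba3a4fc-cb8f-4150-859f-5f21a38373d0"

# regexec() records where the whole match and each capture group lie.
# regmatches() turns that into a list with one character vector per input:
# element [1] is the whole match, element [2] is Group 1.
parts <- regmatches(x, regexec(".*/([^_]+)", x, perl = TRUE))

# Pull out Group 1; an input with no match would give character(0),
# and indexing [2] on that yields NA.
group1 <- vapply(parts, function(p) p[2], character(1))
print(group1)
## [1] "556662"
```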

Alternative non-base R solutions

If you can afford or prefer to work with stringi, you may use

library(stringi)
stri_match_last_regex("2019/01/01/07/556662_cba3a4fc-cb8f-4150-859f-5f21a38373d0", ".*/([^_]+)")[,2]
## [1] "556662"

This matches the string up to the last / and captures 1 or more chars other than _ into Group 1, which you access in column 2 of the result matrix using [,2].

Or

stri_extract_last_regex("2019/01/01/07/556662_cba3a4fc-cb8f-4150-859f-5f21a38373d0", "(?<=/)[^_/]+")
## [1] "556662"

This extracts the last run of 1 or more chars other than _ and / that is immediately preceded by a /.
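If stringi is not available, the same lookbehind idea can be sketched in base R: gregexpr() collects every run of chars other than _ and / that follows a /, and tail() keeps the last one. This is a swapped-in base R approach, not part of stringi.

```r
x <- "2019/01/01/07/556662_cba3a4fc-cb8f-4150-859f-5f21a38373d0"

# gregexpr() finds all non-overlapping matches; the (?<=/) lookbehind
# requires perl = TRUE. [[1]] takes the matches for the single input string.
all_runs <- regmatches(x, gregexpr("(?<=/)[^_/]+", x, perl = TRUE))[[1]]
# all_runs is c("01", "01", "07", "556662")

# Keep only the last run, mirroring stri_extract_last_regex().
last_run <- tail(all_runs, 1)
print(last_run)
## [1] "556662"
```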

answered Sep 04 '25 at 04:09 by Wiktor Stribiżew