R regular expression: isolate parenthesized suffix

Question

I'm using regular expressions in R. I am trying to pick out parenthesized content at the end of some strings in a character vector. I can find the parenthesized content when it exists, but I'm failing to exclude the non-parenthesized content in inputs that don't have parens.

Example:

> x <- c("DECIMAL", "DECIMAL(14,5)", "RAND(1)")
> gsub("(.*?)(\\(.*\\))", "\\2", x)
[1] "DECIMAL" "(14,5)"  "(1)"

The last 2 elements in the output are correct, the first one is not. I want

c("", "(14,5)", "(1)")

The input can have anything, literally any word or number characters, before the parenthesized content.

Wiktor Stribiżew · Accepted Answer

You can use

sub("^.*?(\\(.*\\))?$", "\\1", x, perl=TRUE)

See the regex demo. Details:

^ - start of string
.*? - any zero or more chars other than line break chars (since it is a PCRE regex, see perl=TRUE) as few as possible
(\(.*\))? - an optional Group 1 (written as (\\(.*\\))? inside the R string literal): a (, then any zero or more chars other than line break chars, as many as possible, and then a )
$ - end of string.
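
To illustrate why the group is optional, here is a minimal sketch comparing the pattern with and without the trailing ? on Group 1. The variable names no_opt and with_opt are only for illustration. When a pattern does not match at all, sub() returns the input unchanged, which is why "DECIMAL" survived in the question's attempt; making the group optional lets the whole string match and be replaced by an empty Group 1.

```r
x <- c("DECIMAL", "DECIMAL(14,5)", "RAND(1)")

# Group 1 is mandatory here: "DECIMAL" has no parens, so the pattern
# does not match and sub() returns that element unchanged.
no_opt <- sub("^.*?(\\(.*\\))$", "\\1", x, perl = TRUE)

# Group 1 is optional here: every string matches in full, and the
# replacement "\\1" is empty when the group did not participate.
with_opt <- sub("^.*?(\\(.*\\))?$", "\\1", x, perl = TRUE)

print(no_opt)
print(with_opt)
```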

See the R demo:

x <- c("DECIMAL", "DECIMAL(14,5)", "RAND(1)")
sub("^.*?(\\(.*\\))?$", "\\1", x, perl=TRUE)
## => [1] ""       "(14,5)" "(1)"

NOTE: perl=TRUE is very important in this case because the two parts of the regex have quantifiers of different greediness (a lazy .*? followed by a greedy .*).
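
If you would rather not depend on mixed greediness, one possible alternative (a different technique from the sub() call above, not part of the original answer) is to extract the match with regexpr() and regmatches() and fill in empty strings where nothing matched. This is a sketch that assumes the parenthesized part, when present, runs to the end of the string; pat, hit and out are illustrative names.

```r
x <- c("DECIMAL", "DECIMAL(14,5)", "RAND(1)")

# A "(" ... ")" run anchored at the end of the string; only one
# quantifier is used, so the default regex engine is sufficient.
pat <- "\\(.*\\)$"

m   <- regexpr(pat, x)   # position of the first match, -1 when there is none
hit <- grepl(pat, x)     # TRUE for the elements that contain a match

# Start with empty strings, then fill in the matched text.
# regmatches() returns one value per matching element only.
out <- character(length(x))
out[hit] <- regmatches(x, m)

print(out)
```

Unlike sub(), this never returns the untouched input for a non-matching element, because the result vector starts out as empty strings.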

Tags:

regex

r

pauljohn32
