
Remove everything after the first number in a string using R regex

Tags: regex, r

How can I remove everything after the first number in a string?

x <- c("Hubert 208 apt 1", "Mass Av 300, block 3")

Following this question, I managed to remove everything up to and including the first number:

gsub( "^\\D*\\d+", "", x )
[1] " apt 1"    ", block 3"

But the desired output looks like this:

[1] "Hubert 208"     "Mass Av 300"
NBK asked Oct 23 '25 14:10



1 Answer

The OP's current code needs only a minor change: capture the matching pattern as a group ((...)), add .* after it so the rest of the string is matched as well, and replace the whole match with the backreference to the group (\\1).

sub("^(\\D*\\d+).*", "\\1", x)
#[1] "Hubert 208"  "Mass Av 300"

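Here, the OP's pattern ("^\\D*\\d+") matches zero or more characters that are not digits (\\D*) from the start (^) of the string, followed by one or more digits (\\d+). This part is captured as a group with parentheses ((...)). The trailing .* matches everything after the first number, so replacing the whole match with \\1 keeps only the captured part.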
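
As a rough check of what the group captures, the matched prefix can also be extracted directly with regexpr() and regmatches() from base R. This is only a sketch using the x from the question; the names prefix and result are made up for illustration. Note that regmatches() drops elements that have no match, whereas sub() returns them unchanged.

```r
x <- c("Hubert 208 apt 1", "Mass Av 300, block 3")

# Text matched by ^\D*\d+ : leading non-digits plus the first run of digits
prefix <- regmatches(x, regexpr("^\\D*\\d+", x))
prefix
#[1] "Hubert 208"  "Mass Av 300"

# With .* appended the match spans the whole string; \\1 keeps only the group
result <- sub("^(\\D*\\d+).*", "\\1", x)
identical(prefix, result)
#[1] TRUE
```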

Also, instead of gsub (global substitution), sub is enough, as we need to match only a single instance (from the beginning of the string).
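
A small sketch of that point, assuming a third, made-up element with no digits is added to x: because the pattern is anchored with ^ and consumes the rest of the string, there is at most one match per element, so sub and gsub give the same result here, and an element without any digit is left unchanged.

```r
# Third element is a hypothetical input containing no digits
x <- c("Hubert 208 apt 1", "Mass Av 300, block 3", "no digits here")

with_sub  <- sub("^(\\D*\\d+).*", "\\1", x)
with_gsub <- gsub("^(\\D*\\d+).*", "\\1", x)

# Only one anchored match is possible per string, so both agree
identical(with_sub, with_gsub)
#[1] TRUE

# No digit means no match, so the string is returned as is
with_sub[3] == "no digits here"
#[1] TRUE
```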

akrun answered Oct 25 '25 05:10
