Substitute based on regex [duplicate]

Question

I'm relatively new to R and need help applying a regex-based substitution. I have a data frame in which one column contains a sequence of digits (my values of interest) followed by a string of arbitrary characters. Example:

4623(randomcharacters)

I need to remove everything after the initial digits so I can keep working with the values. My idea was to use gsub with a positive lookbehind to remove the non-digit characters. My code is:

sub_function <- function() {
  gsub("?<=[[:digit:]].", " ", fixed = T)
}


data_frame$`x` <- data_known$`x` %>% 
  sapply(sub_function)

But I then get the error:

Error in FUN(X[[i]], ...) : unused argument (X[[i]])

Any help would be greatly appreciated!
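The error comes from `sub_function` being defined without any parameter: `sapply` calls it with one argument per element, and the function has nowhere to put it. In addition, `gsub` is never given the string to work on, the lookbehind is missing its parentheses, and `fixed = T` makes the pattern a literal string rather than a regex. A minimal sketch of the lookbehind idea that does run, assuming PCRE via `perl = TRUE` (R's default regex engine does not support lookbehind) and assuming every value starts with its digits:

```r
# Sketch of a corrected sub_function (assumes perl = TRUE for lookbehind support)
sub_function <- function(x) {
  # (?<=[[:digit:]]) : the position right after a digit
  # [^[:digit:]].*   : the first non-digit at that position, plus everything after it
  sub("(?<=[[:digit:]])[^[:digit:]].*", "", x, perl = TRUE)
}

# sub() is vectorised, so a whole column can be passed without sapply()
sub_function(c("4623(randomcharacters)", "17abc9"))
```

For these two inputs this returns `"4623"` and `"17"`. Because `sub()` replaces only the leftmost match, any digits inside the trailing characters are removed along with the rest.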

Rui Barradas · Accepted Answer

Here is a base R function.
It uses sub, not gsub, since there will be only one substitution. And there's no need for a lookbehind: the metacharacter ^ marks the beginning of the string, followed by an optional minus sign and then at least one digit. That part is captured and returned through the backreference "\\1"; everything else is discarded.

sub_function <- function(x){
  # keep an optional minus sign and the leading digits, drop the rest
  sub("(^-?[[:digit:]]+).*", "\\1", x)
}

data <- data.frame(x = c("4623(randomcharacters)", "-4623(randomcharacters)"))

sub_function(data$x)
#[1] "4623"  "-4623"
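One detail worth knowing: `sub()` returns a string unchanged when the pattern does not match, so an entry without leading digits passes through as-is. A sketch of a variant that returns `NA` for such entries instead (the name `leading_int` is hypothetical):

```r
# Hypothetical variant of sub_function: NA when there are no leading digits
leading_int <- function(x) {
  pattern <- "^(-?[[:digit:]]+).*"
  # keep the captured digits where the pattern matches, NA elsewhere
  ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)
}

leading_int(c("4623(randomcharacters)", "-4623(randomcharacters)", "(no digits)"))
```

This gives `"4623"`, `"-4623"` and `NA`, whereas `sub_function` would return `"(no digits)"` unchanged for the third entry.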

Edit

With this simple modification the function returns a numeric vector.

sub_function <- function(x){
  y <- sub("(^-?[[:digit:]]+).*", "\\1", x)
  as.numeric(y)  # convert the extracted strings to numbers
}
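A short usage sketch that stores the numeric result in a new column of the example data frame (the column name `value` is only illustrative):

```r
sub_function <- function(x){
  y <- sub("(^-?[[:digit:]]+).*", "\\1", x)
  as.numeric(y)
}

data <- data.frame(x = c("4623(randomcharacters)", "-4623(randomcharacters)"))
# Add the extracted values as a numeric column
data$value <- sub_function(data$x)
str(data)
```

Entries without leading digits would become `NA` here, and `as.numeric()` would emit a coercion warning for them.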

Rory S · Answer

There are a few ways to accomplish this, but I like using functions from {tidyverse}:

library(tidyverse)

# Create some dummy data
df <- tibble(targetcol = c("4658(randomcharacters)", "5847(randomcharacters)", "4958(randomcharacters)"))

df <- mutate(df, just_digits = str_extract(targetcol, pattern = "^[[:digit:]]+"))

Output (contents of df):

  targetcol              just_digits
  <chr>                  <chr>      
1 4658(randomcharacters) 4658       
2 5847(randomcharacters) 5847       
3 4958(randomcharacters) 4958
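If you would rather avoid the tidyverse dependency, base R can mimic `str_extract()`, including its `NA` for entries with no match, using `regexpr()` and `regmatches()`. A minimal sketch (the helper name `extract_leading_digits` is hypothetical):

```r
# Hypothetical base R helper mirroring str_extract(x, "^[[:digit:]]+")
extract_leading_digits <- function(x) {
  m <- regexpr("^[[:digit:]]+", x)        # match start position, or -1 if no match
  out <- rep(NA_character_, length(x))    # default to NA, like str_extract()
  out[m != -1] <- regmatches(x, m)        # regmatches() returns only the successful matches
  out
}

extract_leading_digits(c("4658(randomcharacters)", "5847(randomcharacters)", "(no digits)"))
```

This yields `"4658"`, `"5847"` and `NA`. Like the `str_extract()` pattern above, it does not capture a leading minus sign.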

Tags: string, regex, r, regex-lookarounds

Nuramon