I have a simple dataframe:
df <- data.frame(test = c("test_A_1_1.txt", "test_A_2_1.txt", "test_A_3_1.txt"), value = c(0.51, 0.52, 0.56))
test value
1 test_A_1_1.txt 0.51
2 test_A_2_1.txt 0.52
3 test_A_3_1.txt 0.56
Expected output
I would like to copy the numbers at the end of the string in column 1 and place them in columns three and four, respectively, like this:
test value new new
1 test_A_1_1.txt 0.51 1 1
2 test_A_2_1.txt 0.52 2 1
3 test_A_3_1.txt 0.56 3 1
Attempt
Using the following code, I am able to extract the numbers from the string:
library(stringr)
as.numeric(str_extract_all("test_A_3_1.txt", "[0-9]+")[[1]])[1] # Extracts the first number
as.numeric(str_extract_all("test_A_3_1.txt", "[0-9]+")[[1]])[2] # Extracts the second number
I would like to apply this code on all the values of the first column:
library(tidyverse)
df %>% mutate(new = as.numeric(str_extract_all(df$test, "[0-9]+")[[1]])[1])
However, this leads to a column new containing only the number 1.
What am I doing wrong?
We can use parse_number from readr, which returns the first number found in each string:
library(dplyr)
library(purrr)
library(stringr)
df %>%
mutate(new = readr::parse_number(as.character(test)))
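Regarding the OP's issue: str_extract_all returns a list with one element per row, and [[1]] selects only the first list element, i.e. the matches from the first string. The trailing [1] then takes the first number of that element, and mutate recycles this single value to every row. Since we only need the first run of one or more digits ([0-9]+, or equivalently \\d+) in each string, it is better to use str_extract: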
df %>%
mutate(new = as.numeric(str_extract(test, "[0-9]+")))
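To make the difference concrete, here is a small sketch that compares the two calls on the strings from the question. The variable names (test, matches, first_only, per_row) are only for illustration and are not part of the original code.

```r
library(stringr)

test <- c("test_A_1_1.txt", "test_A_2_1.txt", "test_A_3_1.txt")

# str_extract_all() returns a list with one character vector per input string
matches <- str_extract_all(test, "[0-9]+")

# [[1]] keeps only the matches of the first string, and [1] keeps the first of those,
# so the result is a single value that mutate() would recycle to every row
first_only <- as.numeric(matches[[1]])[1]

# str_extract() is vectorised: it returns the first match of each string
per_row <- as.numeric(str_extract(test, "[0-9]+"))
```

Here first_only has length 1, while per_row has one value per row, which is what a new column needs.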
If we do need the output of str_extract_all, unlist the list into a vector and then apply as.numeric to that vector. This only works when every string contains exactly one number, because the result must have one value per row:
df %>%
mutate(new = as.numeric(unlist(str_extract_all(test, "[0-9]+"))))
If there are multiple instances per string, convert each list element to numeric with map and keep the result as a list column:
df %>%
mutate(new = map(str_extract_all(test, "[0-9]+"), as.numeric))
NOTE: The str_extract based solution was first posted here.
In base R, we can use regexpr together with regmatches:
df$new <- as.numeric(regmatches(df$test, regexpr("\\d+", df$test)))
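If both numbers are needed without any packages, one possible sketch swaps regexpr for gregexpr, which finds every match instead of only the first. The column names new1 and new2 are chosen here for illustration, and stringsAsFactors = FALSE is set as an assumption so that test is a character column on older R versions as well.

```r
df <- data.frame(test = c("test_A_1_1.txt", "test_A_2_1.txt", "test_A_3_1.txt"),
                 value = c(0.51, 0.52, 0.56),
                 stringsAsFactors = FALSE)

# gregexpr() locates every run of digits; regmatches() then returns
# a list with one character vector of matches per row
nums <- regmatches(df$test, gregexpr("[0-9]+", df$test))

# Pick the first and second match of each row; a row with fewer
# matches would get NA in the corresponding column
df$new1 <- as.numeric(vapply(nums, function(x) x[1], character(1)))
df$new2 <- as.numeric(vapply(nums, function(x) x[2], character(1)))
```

vapply is used instead of sapply so that the result is guaranteed to be a character vector with one element per row.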
With the updated example, where each string contains two numbers, the first one can be extracted with str_extract as before. The last one can be extracted with a regex lookahead that matches digits only when they are followed by .txt (stri_extract_last from stringi can be used as well):
df %>%
mutate(new1 = as.numeric(str_extract(test, "\\d+")),
new2 = as.numeric(str_extract(test, "\\d+(?=\\.txt)")))
# test value new1 new2
#1 test_A_1_1.txt 0.51 1 1
#2 test_A_2_1.txt 0.52 2 1
#3 test_A_3_1.txt 0.56 3 1
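As an alternative sketch, both numbers can be captured in a single pass with str_match and two capture groups. This assumes every file name ends in _<number>_<number>.txt; the names m, new1 and new2 are illustrative only.

```r
library(stringr)

test <- c("test_A_1_1.txt", "test_A_2_1.txt", "test_A_3_1.txt")

# str_match() returns a character matrix: column 1 is the full match,
# columns 2 and 3 hold the two capture groups
m <- str_match(test, "_([0-9]+)_([0-9]+)\\.txt$")

new1 <- as.numeric(m[, 2])
new2 <- as.numeric(m[, 3])
```

Strings that do not follow the assumed pattern yield NA in both columns, which makes malformed names easy to spot.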
Slightly modifying your existing code:
df %>%
mutate(new = as.integer(str_extract(test, "[0-9]+")))
Or simply
df$new <- as.integer(str_extract(df$test, "[0-9]+"))