R/dplyr: How to only keep integers in a data frame?

Tags: string, integer, r

I have a data frame that has years in it (data type chr):

Years:
5 yrs
10 yrs
20 yrs
4 yrs

I want to keep only the integers to get a data frame like this (data type num):

Years:
5
10
20
4

How do I do this in R?

questionmark asked Oct 22 '25 07:10

2 Answers

You need to extract the numbers and convert them to type numeric:

df$Years <- as.numeric(sub(" yrs", "", df$Years))
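
Since the question title mentions dplyr, the same idea can be sketched inside a `mutate()` call. This is only an illustration: it assumes the column is named `Years`, as in the question, and that every value ends in the literal suffix " yrs".

```r
library(dplyr)

# Sample data mirroring the question; the column name `Years` is an assumption.
df <- data.frame(Years = c("5 yrs", "10 yrs", "20 yrs", "4 yrs"),
                 stringsAsFactors = FALSE)

# Drop the literal " yrs" suffix, then convert the remaining text to numeric.
df <- df %>%
  mutate(Years = as.numeric(sub(" yrs", "", Years, fixed = TRUE)))

str(df)
```

Any value that does not follow the "<number> yrs" pattern would come back as NA with a coercion warning, so this is best suited to clean input.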
Daniel O answered Oct 23 '25 21:10

Per your additional requirements, here is a more general-purpose solution, though it has limits too. The nice thing about the more complicated Years3 solution is that it deals more gracefully with unexpected but quite plausible inputs.

library(dplyr)
library(stringr)
library(purrr)

Years <- c("5 yrs",
           "10 yrs",
           "20 yrs",
           "4 yrs",
           "4-5 yrs",
           "75 to 100 YEARS old",
           ">1 yearsmispelled or whatever")
df <- data.frame(Years)

# first run of digits only, so the -5 in 4-5 is lost;
# ">1 ..." becomes NA because sub() leaves the leading ">" in place
df$Years1 <- as.numeric(sub("(\\d{1,4}).*", "\\1", df$Years))
#> Warning: NAs introduced by coercion

# first run of digits via str_extract; also loses the -5 in 4-5
df$Years2 <- str_extract(df$Years, "[0-9]+")

# averaging a range takes more work: extract every number, then take the mean
df$Years3 <- str_extract_all(df$Years, "[0-9]+") %>%
  purrr::map( ~ ifelse(length(.x) == 1,
                as.numeric(.x),
                mean(unlist(as.numeric(.x)))))

df
#>                           Years Years1 Years2 Years3
#> 1                         5 yrs      5      5      5
#> 2                        10 yrs     10     10     10
#> 3                        20 yrs     20     20     20
#> 4                         4 yrs      4      4      4
#> 5                       4-5 yrs      4      4    4.5
#> 6           75 to 100 YEARS old     75     75   87.5
#> 7 >1 yearsmispelled or whatever     NA      1      1
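
One thing to watch in the output above: `str_extract()` returns character values, so Years2 is not yet numeric, and `purrr::map()` returns a list, so Years3 is a list column. If a plain numeric vector is wanted, the averaging step could be sketched in base R as below. This swaps `regmatches()`/`gregexpr()` in for `str_extract_all()`, and the extra "unknown" entry is a hypothetical input added to show what happens when no digits are found.

```r
# Sample input; "unknown" is a hypothetical value with no digits at all.
Years <- c("5 yrs", "4-5 yrs", "75 to 100 YEARS old",
           ">1 yearsmispelled or whatever", "unknown")

# Pull out every run of digits per element (one character vector per input).
digit_runs <- regmatches(Years, gregexpr("[0-9]+", Years))

# Average the numbers found in each element; no digits at all gives NA.
years_num <- vapply(
  digit_runs,
  function(x) if (length(x) == 0) NA_real_ else mean(as.numeric(x)),
  numeric(1)
)

years_num
```

Because `vapply()` is told to expect `numeric(1)`, the result is an ordinary double vector that can be assigned straight to a data frame column. Note that qualifiers such as ">" are silently dropped, so ">1" is treated as 1.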
Chuck P answered Oct 23 '25 19:10