
Perform mutate function only if variable exists

Tags: r, dplyr, tidyverse

I have a function that applies specific functions to multiple columns in a data frame. Each of these functions is unique and can only be applied to its own column.

convert_columns <- function(df) {
    df %>% mutate(
        a = convert_a(a),
        b = convert_b(b),
        c = convert_c(c),
        d = convert_d(d),
        e = convert_e(e)
        )
}

However, users may input a data frame that has only a subset of those columns (for example, only a, b, and c). I would like the function to mutate columns a, b, and c if they exist in the input data frame and ignore columns d and e.

I have tried

convert_columns <- function(df) {
    df %>% mutate(across(any_of(),
        a = convert_a(a),
        b = convert_b(b),
        c = convert_c(c),
        d = convert_d(d),
        e = convert_e(e)
        ))
}

and

convert_columns <- function(df) {
    df %>% mutate(across(any_of(
        a = convert_a(a),
        b = convert_b(b),
        c = convert_c(c),
        d = convert_d(d),
        e = convert_e(e)
        )))
}

These do not work. Is there a simple way in the tidyverse syntax to accomplish what I am trying to do? In my actual use case, I have ~150 columns I will be mutating.

Dylan Russell asked Sep 07 '25 04:09


2 Answers

Since each function is unique to its column, and you want the remaining columns to be converted even if one of them fails, I can't come up with a better solution than wrapping each column's call in tryCatch.

library(dplyr)

convert_columns <- function(df) {
  df %>%
    mutate(
      a = tryCatch(convert_a(a), error = function(z) return(NA)),
      b = tryCatch(convert_b(b), error = function(z) return(NA)),
      c = tryCatch(convert_c(c), error = function(z) return(NA)),
      #...
      #...
    )
}

This can be tested with the following mtcars example.

This works:

mtcars %>%
  mutate(a = n_distinct(cyl), 
         b = mean(mpg), 
         c = sd(am))

Now if we remove one of the columns, the above fails:

mtcars %>%
  select(-am) %>%
  mutate(a = n_distinct(cyl), 
         b = mean(mpg), 
         c = sd(am))

Error: Problem with mutate() input c.
x cannot coerce type 'closure' to vector of type 'double'
ℹ Input c is sd(am).

Now using tryCatch, the missing column is filled with NA instead of raising an error:

mtcars %>%
  select(-am) %>%
  mutate(a = tryCatch(n_distinct(cyl), error = function(e) return(NA)), 
         b = tryCatch(mean(mpg), error = function(e) return(NA)), 
         c = tryCatch(sd(am), error = function(e) return(NA)))

#   mpg cyl disp  hp drat  wt qsec vs gear carb a  b  c
#1   21   6  160 110  3.9 2.6   16  0    4    4 3 20 NA
#2   21   6  160 110  3.9 2.9   17  0    4    4 3 20 NA
#3   23   4  108  93  3.9 2.3   19  1    4    1 3 20 NA
#4   21   6  258 110  3.1 3.2   19  1    3    1 3 20 NA
#....
Ronak Shah answered Sep 09 '25 02:09
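
With ~150 columns, repeating the full tryCatch call on every line gets noisy. The sketch below wraps it in a small helper. The helper name or_na and the three converters are hypothetical stand-ins, not part of the original answer or of dplyr. It relies on R's lazy evaluation: the expression is only evaluated inside tryCatch, so the error is still caught.

```r
library(dplyr)

# Hypothetical converters, standing in for convert_a(), convert_b(), ...
convert_a <- function(x) x * 2
convert_b <- function(x) toupper(x)
convert_d <- function(x) x + 1

# Hypothetical helper: evaluate an expression, falling back to NA on any error.
# The argument is a promise, so it is forced inside tryCatch().
or_na <- function(expr) tryCatch(expr, error = function(e) NA)

convert_columns <- function(df) {
  df %>%
    mutate(
      a = or_na(convert_a(a)),
      b = or_na(convert_b(b)),
      d = or_na(convert_d(d))  # column d is absent in the example input below
    )
}

df <- data.frame(a = c(1, 2), b = c("x", "y"), stringsAsFactors = FALSE)
res <- convert_columns(df)
```

As with the plain tryCatch version, a missing column is not skipped: it is created and filled with NA. Any other error raised inside a converter is silenced in the same way, which may or may not be what you want.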


You can use switch() to pick a specific function based on the column name. For instance, here, columns a, b, and c are added to, subtracted from, or multiplied by themselves, depending on the column name. We have to use dplyr::cur_column() to get the column name within across() (deparse(substitute()) just returns "col").

Thus, with the method below, you can supply a single function to across() but still apply a specific function to each column, while getting the benefits of any_of().

library(dplyr)

ex <- function(x) {
  arg <- cur_column()
  fn <- switch(arg,
               a = `+`,
               b = `-`,
               c = `*`)
  fn(x, x)
}

df <- data.frame(a = c(1,2),
                 b = c(3,4))

mutate(df, across(any_of(c("a", "b", "c")), ex))
#>   a b
#> 1 2 0
#> 2 4 0
caldwellst answered Sep 09 '25 01:09
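
Building on the across() / any_of() / cur_column() idea above, one possible variation for many columns is to replace switch() with a named list of functions and look up the converter by column name. This is a sketch, not the answerer's code: the converters list and its three functions are hypothetical placeholders for convert_a(), convert_b(), and so on.

```r
library(dplyr)

# Hypothetical converters, keyed by the name of the column each one handles.
converters <- list(
  a = function(x) x * 2,
  b = function(x) toupper(x),
  d = function(x) x + 1
)

convert_columns <- function(df) {
  df %>%
    mutate(across(
      # any_of() silently ignores names that are not columns of df
      any_of(names(converters)),
      # cur_column() returns the name of the column currently being processed
      function(x) converters[[cur_column()]](x)
    ))
}

# Only a and b have converters and exist; d is missing and z has no converter.
df <- data.frame(a = c(1, 2), b = c("x", "y"), z = c(TRUE, FALSE),
                 stringsAsFactors = FALSE)
res <- convert_columns(df)
```

Here the missing column d is simply skipped rather than added as NA, and column z passes through untouched. Adding a column then only means adding one entry to the list.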