I have a function that applies specific functions to multiple columns in a data frame. Each of these functions is unique and can only be applied to its own column.
convert_columns <- function(df) {
  df %>% mutate(
    a = convert_a(a),
    b = convert_b(b),
    c = convert_c(c),
    d = convert_d(d),
    e = convert_e(e)
  )
}
However, users may input a data frame that has only a subset of those columns (for example, only a, b, and c). I would like the function to mutate columns a, b, and c if those columns exist in the input data frame and ignore columns d and e.
I have tried
convert_columns <- function(df) {
df %>% mutate(across(any of(),
a = convert_a(a),
b = convert_b(b),
c = convert_c(c),
d = convert_d(d),
e = convert_e(e)
))
}
and
convert_columns <- function(df) {
df %>% mutate(across(any of(
a = convert_a(a),
b = convert_b(b),
c = convert_c(c),
d = convert_d(d),
e = convert_e(e)
)))
}
These do not work. Is there a simple way in the tidyverse syntax to accomplish what I am trying to do? In my actual use case, I have ~150 columns I will be mutating.
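Since the functions are unique to each variable and you want the remaining columns returned even if one of them fails, I can't really come up with a better solution than to use tryCatch on individual columns. Note that a column that is missing from the input ends up filled with NA rather than being skipped.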
library(dplyr)
convert_columns <- function(df) {
  df %>%
    mutate(
      a = tryCatch(convert_a(a), error = function(z) return(NA)),
      b = tryCatch(convert_b(b), error = function(z) return(NA)),
      c = tryCatch(convert_c(c), error = function(z) return(NA)),
      #...
      #...
    )
}
This can be tested using the following mtcars example.
This works:
mtcars %>%
mutate(a = n_distinct(cyl),
b = mean(mpg),
c = sd(am))
Now if we remove one of the columns, the above fails:
mtcars %>%
select(-am) %>%
mutate(a = n_distinct(cyl),
b = mean(mpg),
c = sd(am))
Error: Problem with `mutate()` input `c`.
x cannot coerce type 'closure' to vector of type 'double'
ℹ Input `c` is `sd(am)`.
Now using tryCatch, the missing column yields NA instead of an error:
mtcars %>%
select(-am) %>%
mutate(a = tryCatch(n_distinct(cyl), error = function(e) return(NA)),
b = tryCatch(mean(mpg), error = function(e) return(NA)),
c = tryCatch(sd(am), error = function(e) return(NA)))
#   mpg cyl disp  hp drat  wt qsec vs gear carb a  b  c
#1   21   6  160 110  3.9 2.6   16  0    4    4 3 20 NA
#2   21   6  160 110  3.9 2.9   17  0    4    4 3 20 NA
#3   23   4  108  93  3.9 2.3   19  1    4    1 3 20 NA
#4   21   6  258 110  3.1 3.2   19  1    3    1 3 20 NA
#....
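The repeated tryCatch(..., error = function(e) return(NA)) calls could be factored into a small wrapper. The following is only a sketch: na_on_error is a hypothetical helper name (not part of dplyr), and the toy data frame and arithmetic stand in for the real convert_*() functions, which are not shown in the question. It relies on R's lazy evaluation, so the expression is only evaluated inside tryCatch.

```r
library(dplyr)

# Hypothetical helper (not from the original answer): evaluate `expr`
# lazily inside tryCatch() and fall back to NA if it raises an error.
na_on_error <- function(expr) {
  tryCatch(expr, error = function(e) NA)
}

# Toy input that has columns a and b but no column c.
df <- data.frame(a = c(1, 2), b = c(3, 4))

out <- df %>%
  mutate(
    a = na_on_error(a * 10),
    b = na_on_error(b + 1),
    c = na_on_error(c * 2)  # no column `c` in df, so this errors and NA is used
  )
```

As with the version above, a missing column is filled with NA rather than left out of the result.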
You can use switch() to pick a specific function based on the column name. For instance, here each of the columns a, b, and c is added to, subtracted from, or multiplied by itself, depending on its name. We have to use dplyr::cur_column() to get the column name within across() (deparse(substitute()) just returns "col").
Thus, with the method below, you can supply a single function to across() but apply a specific function to each column, while getting the benefits of any_of():
library(dplyr)
ex <- function(x) {
  arg <- cur_column()
  fn <- switch(arg,
               a = `+`,
               b = `-`,
               c = `*`)
  fn(x, x)
}
df <- data.frame(a = c(1, 2),
                 b = c(3, 4))
mutate(df, across(any_of(c("a", "b", "c")), ex))
#>   a b
#> 1 2 0
#> 2 4 0
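With ~150 columns, a long switch() may become unwieldy. One possible variant of the same idea is to keep the converters in a named list and look them up with cur_column(). This is a sketch under assumptions: the list name converters is hypothetical, and the bodies of convert_a(), convert_b(), and convert_c() are placeholders, since the real ones are not shown in the question.

```r
library(dplyr)

# Stand-in converters; the real convert_*() functions are not shown in the
# question, so these bodies are assumptions for illustration only.
convert_a <- function(x) x * 2
convert_b <- function(x) x + 100
convert_c <- function(x) as.character(x)

# Hypothetical lookup table: column name -> converter for that column.
converters <- list(a = convert_a, b = convert_b, c = convert_c)

convert_columns <- function(df) {
  df %>%
    mutate(across(
      any_of(names(converters)),         # only the listed columns that exist in df
      ~ converters[[cur_column()]](.x)   # apply the converter named after the column
    ))
}

# Input with a and b, no c, and an unrelated column z.
df <- data.frame(a = c(1, 2), b = c(3, 4), z = c("x", "y"))
res <- convert_columns(df)
```

Here a and b are converted, the absent column c is skipped without an error or an NA column, and z is left untouched.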