I would like to create a variable x
that is the product of variable e
and one of the variables a, b, c
, or d
. The resulting variable should take the value a * e
, but if a
is NA
, then it should take the value b * e
and if a & b
are missing, then it should take the value c * e
and so on.
For example: If my data frame looks like this:
df <- data.frame(a = c(1, 2, NA, NA, 5), b = c(NA, 1, NA, 4, 6), c = c(NA, 3, 3, 3, 7), d = c(1, 1, 1, 1, 1), e = c(1, 2, 3, 4, NA))
I would like to get to having the following result:
df$x <- c(1, 4, 9, 16, NA)
which I am trying to achieve the following way:
df <- df %>% mutate(x = case_when(!is.na(a) ~ a * e, is.na(a) ~ b * e, is.na(a) & is.na(b) ~ c * e, is.na(a) & is.na(b) & is.na(c) ~ d * e))
This, unfortunately is not working yet, since R is somehow not understanding is.na(a) & is.na(b)
as both values missing at the same time.
You can use coalesce()
from dplyr
to find the first non-missing element.
library(dplyr)
df %>%
mutate(x = e * coalesce(a, b, c, d))
# a b c d e x
# 1 1 NA NA 1 1 1
# 2 2 1 3 1 2 4
# 3 NA NA 3 1 3 9
# 4 NA 4 3 1 4 16
# 5 5 6 7 1 NA NA
If you have lots of columns to be multiplied, you can automate it by using tidy-selection in across()
. (Edit: Thank @akrun for the improvement)
df %>%
mutate(x = e * do.call(coalesce, across(a:d)))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With