How can I use different variables to calculate a new variable, depending on which variable has missing values?

Question

I would like to create a variable x that is the product of variable e and one of the variables a, b, c, or d. The resulting variable should take the value a * e, but if a is NA, then it should take the value b * e and if a & b are missing, then it should take the value c * e and so on.

For example: If my data frame looks like this:

df <- data.frame(a = c(1, 2, NA, NA, 5), b = c(NA, 1, NA, 4, 6), c = c(NA, 3, 3, 3, 7), d = c(1, 1, 1, 1, 1), e = c(1, 2, 3, 4, NA))

I would like to get to having the following result:

df$x <- c(1, 4, 9, 16, NA)

which I am trying to achieve the following way:

df <- df %>% mutate(x = case_when(!is.na(a) ~ a * e, is.na(a) ~ b * e, is.na(a) & is.na(b) ~ c * e, is.na(a) & is.na(b) & is.na(c) ~ d * e))

This, unfortunately is not working yet, since R is somehow not understanding is.na(a) & is.na(b) as both values missing at the same time.

Darren Tsai · Accepted Answer

You can use coalesce() from dplyr to find the first non-missing element.

library(dplyr)

df %>%
  mutate(x = e * coalesce(a, b, c, d))

#    a  b  c d  e  x
# 1  1 NA NA 1  1  1
# 2  2  1  3 1  2  4
# 3 NA NA  3 1  3  9
# 4 NA  4  3 1  4 16
# 5  5  6  7 1 NA NA

If you have lots of columns to be multiplied, you can automate it by using tidy-selection in across(). (Edit: Thank @akrun for the improvement)

df %>%
  mutate(x = e * do.call(coalesce, across(a:d)))

How can I use different variables to calculate a new variable, depending on which variable has missing values?

Tags:

dataframe

r

conditional-statements

na

Anton

1 Answers

Darren Tsai

Recent Activity

Donate For Us

How can I use different variables to calculate a new variable, depending on which variable has missing values?

Tags:

dataframe

r

conditional-statements

na

Anton

1 Answers

Darren Tsai

Related questions

Recent Activity

Donate For Us