I want to flag rows in a dataset where all entries in a subset of columns are NA. The caveat to this is that some rows are strings and some are numbers.
Doing this with columns that are all numbers works fine. Let's say we want an indicator variable that tells us whether columns a to c in the following dataset are all NA
df <- data.frame(a = c(1,4,5,NA),
b = c(NA,4,2,NA),
c = c(3,7,8,NA),
d = c(2,1,1,1))
df
# output
a b c d
1 1 NA 3 2
2 4 4 7 1
3 5 2 8 1
4 NA NA NA 1
Using dplyr we can create a variable indicating what we need
df %>%
rowwise() %>%
mutate(allNA = case_when(all(is.na(c_across(cols = a:c))) ~ "allNA",
TRUE ~ "notAllNA")) %>%
ungroup()
# output
# A tibble: 4 × 5
a b c d allNA
<dbl> <dbl> <dbl> <dbl> <chr>
1 1 NA 3 2 notAllNA
2 4 4 7 1 notAllNA
3 5 2 8 1 notAllNA
4 NA NA NA 1 allNA
But if one of the variables is a string instead of a number
df <- data.frame(a = c(1,4,5,NA),
b = c(NA,"ava","dillion",NA),
c = c(3,7,8,NA),
d = c(2,1,1,1))
df
# output
a b c d
1 1 <NA> 3 2
2 4 ava 7 1
3 5 dillion 8 1
4 NA <NA> NA 1
The same code throws the following error
Error in `mutate()`:
ℹ In argument: `allNA = case_when(...)`.
ℹ In row 1.
Caused by error in `case_when()`:
! Failed to evaluate the left-hand side of formula 1.
Caused by error in `vec_c()`:
! Can't combine `a` <double> and `b` <character>.
Run `rlang::last_trace()` to see where the error occurred.
Any help appreciated.
One option to achieve your desired result which allows for columns of different types would be to use rowSums with across:
library(dplyr, warn.conflicts = FALSE)
df %>%
mutate(allNA = if_else(
rowSums(across(a:c, ~ !is.na(.x))) == 0, "allNA", "notAllNA"
))
#> a b c d allNA
#> 1 1 <NA> 3 2 notAllNA
#> 2 4 ava 7 1 notAllNA
#> 3 5 dillion 8 1 notAllNA
#> 4 NA <NA> NA 1 allNA
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With