Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Applying conditional to a subset of rows where rows are different formats in dplyr

Tags:

r

dplyr

I want to flag rows in a dataset where all entries in a subset of columns are NA. The caveat to this is that some rows are strings and some are numbers.

Doing this with columns that are all numbers works fine. Let's say we want an indicator variable that tells us whether columns a to c in the following dataset are all NA

df <- data.frame(a = c(1,4,5,NA),
                 b = c(NA,4,2,NA),
                 c = c(3,7,8,NA),
                 d = c(2,1,1,1))

df

# output
   a  b  c d
1  1 NA  3 2
2  4  4  7 1
3  5  2  8 1
4 NA NA NA 1

Using dplyr we can create a variable indicating what we need

df %>%
  rowwise() %>%
    mutate(allNA = case_when(all(is.na(c_across(cols = a:c))) ~ "allNA",
                             TRUE ~ "notAllNA")) %>%
      ungroup()

# output
# A tibble: 4 × 5
      a     b     c     d allNA   
  <dbl> <dbl> <dbl> <dbl> <chr>   
1     1    NA     3     2 notAllNA
2     4     4     7     1 notAllNA
3     5     2     8     1 notAllNA
4    NA    NA    NA     1 allNA  

But if one of the variables is a string instead of a number

df <- data.frame(a = c(1,4,5,NA),
                 b = c(NA,"ava","dillion",NA),
                 c = c(3,7,8,NA),
                 d = c(2,1,1,1))

df

# output
   a       b  c d
1  1    <NA>  3 2
2  4     ava  7 1
3  5 dillion  8 1
4 NA    <NA> NA 1

The same code throws the following error

Error in `mutate()`:
ℹ In argument: `allNA = case_when(...)`.
ℹ In row 1.
Caused by error in `case_when()`:
! Failed to evaluate the left-hand side of formula 1.
Caused by error in `vec_c()`:
! Can't combine `a` <double> and `b` <character>.
Run `rlang::last_trace()` to see where the error occurred.

Any help appreciated.

like image 317
llewmills Avatar asked Oct 21 '25 06:10

llewmills


1 Answers

One option to achieve your desired result which allows for columns of different types would be to use rowSums with across:

library(dplyr, warn.conflicts = FALSE)

df %>%
  mutate(allNA = if_else(
    rowSums(across(a:c, ~ !is.na(.x))) == 0, "allNA", "notAllNA"
  ))
#>    a       b  c d    allNA
#> 1  1    <NA>  3 2 notAllNA
#> 2  4     ava  7 1 notAllNA
#> 3  5 dillion  8 1 notAllNA
#> 4 NA    <NA> NA 1    allNA
like image 189
stefan Avatar answered Oct 22 '25 20:10

stefan



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!