
How can I filter by NAs in R with dplyr?

Tags:

r

na

dplyr

I'm trying to filter by NAs (that is, keep only the rows with NA in the specified columns) using dplyr and its filter() function. The code below returns just the column labels with no data. Am I writing the code correctly? Also, if it's possible (or easier) to do this without dplyr, that would be interesting to know as well. Thanks.

filter(tata4, CompleteSolution == "NA", KeptInformed == "NA")
Stephertless asked Sep 07 '25 05:09


1 Answer

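You could use complete.cases(), which returns FALSE for every row that has an NA in any of the given columns. Negating it keeps exactly those rows: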

dplyr::filter(df, !complete.cases(col1, col2))

Which gives:

#  col1 col2 col3
#1   NA    5    5
#2   NA    6    6
#3    5   NA    7
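
As for why the original call returns no rows: comparing a column with the string "NA" does not test for missing values. For a real NA the comparison itself evaluates to NA, and filter() drops rows whose condition is NA. Here is a minimal sketch using is.na() instead, assuming the sample data from the Data section at the end of this answer (the variable names string_cmp, either_na and both_na are only illustrative). Note that conditions separated by commas in filter() are combined with AND, so the two forms below select different rows.

```r
library(dplyr)

# Same sample data as in the Data section of this answer
df <- data.frame(
  col1 = c(1, 2, 3, 4, NA, NA, 5),
  col2 = c(1, 2, 3, 4, 5, 6, NA),
  col3 = c(1, 2, 3, 4, 5, 6, 7)
)

# Comparing with the string "NA" never matches a missing value:
# NA == "NA" is NA, and filter() drops rows where the condition is NA.
string_cmp <- filter(df, col1 == "NA")

# Keep rows where col1 OR col2 is missing
# (same rows as !complete.cases(col1, col2))
either_na <- filter(df, is.na(col1) | is.na(col2))

# Comma-separated conditions mean AND: keep rows where BOTH are missing
both_na <- filter(df, is.na(col1), is.na(col2))

print(nrow(string_cmp))
print(either_na)
print(nrow(both_na))
```

With this data, no row has NA in both columns, so the AND form returns an empty data frame while the OR form returns the three rows shown above.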

Benchmark

large_df <- df[rep(seq_len(nrow(df)), 10e5), ]

The results so far:

library(dplyr)
library(microbenchmark)
mbm <- microbenchmark(
  akrun1 = large_df[rowSums(is.na(large_df[1:2]))!=0, ],
  akrun2 = large_df[Reduce(`|`, lapply(large_df[1:2], is.na)), ],
  steven = filter(large_df, !complete.cases(col1, col2)),
  times = 10)


#Unit: milliseconds
#   expr      min       lq      mean    median        uq       max neval cld
# akrun1 814.0226 924.0837 1248.9911 1208.7924 1434.2415 2057.1338    10   c
# akrun2 499.3404 671.9900  736.2418  687.9194  861.4477 1068.1232    10  b 
# steven 112.9394 113.0604  214.1688  198.4542  299.7585  355.1795    10 a 
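
For the part of the question about doing this without dplyr, the two base R expressions from the benchmark can be applied to the small sample data directly. The sketch below is only an illustration: it assumes the same df as in the Data section, selects the columns by name through a helper vector cols (an illustrative name), and adds a base R complete.cases() variant for comparison.

```r
# Same sample data as in the Data section of this answer
df <- data.frame(
  col1 = c(1, 2, 3, 4, NA, NA, 5),
  col2 = c(1, 2, 3, 4, 5, 6, NA),
  col3 = c(1, 2, 3, 4, 5, 6, 7)
)

# Columns to check for missing values (illustrative helper)
cols <- c("col1", "col2")

# Count the NAs per row across the chosen columns; keep rows with at least one
by_rowsums <- df[rowSums(is.na(df[cols])) != 0, ]

# Build one logical vector per column and combine them with OR
by_reduce <- df[Reduce(`|`, lapply(df[cols], is.na)), ]

# complete.cases() also works in base R when given a data frame
by_complete <- df[!complete.cases(df[cols]), ]

print(by_complete)
```

All three select the same rows. Unlike dplyr::filter(), base subsetting keeps the original row names, so the result is labelled 5, 6 and 7 rather than 1, 2 and 3.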

Data

df <- structure(list(col1 = c(1, 2, 3, 4, NA, NA, 5), col2 = c(1, 2, 
3, 4, 5, 6, NA), col3 = c(1, 2, 3, 4, 5, 6, 7)), .Names = c("col1", 
"col2", "col3"), row.names = c(NA, -7L), class = "data.frame")
Steven Beaupré answered Sep 10 '25 07:09