I'm trying to filter by NAs (keeping only the rows with NA in the specified column) using dplyr and its filter function. The code below just returns the column labels with no data. Am I writing the code correctly? Also, if it's possible (or easier) to do this without dplyr, that would be interesting to know as well. Thanks.
filter(tata4, CompleteSolution == "NA", KeptInformed == "NA")
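Comparing with == "NA" tests for the literal string "NA", not for a missing value, so no rows match. You could use complete.cases() instead, which returns FALSE for every row that has a missing value in any of the given columns: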
dplyr::filter(df, !complete.cases(col1, col2))
Which gives:
#  col1 col2 col3
#1   NA    5    5
#2   NA    6    6
#3    5   NA    7
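Since the question also asks about doing this without dplyr, here is a minimal base R sketch using is.na() and logical indexing. The data frame is retyped here so the snippet runs on its own, and the result names (either_na, both_na, col1_na) are made up for illustration. Note that commas inside filter() combine conditions with AND, so the question's call would only keep rows where both columns are missing.

```r
# Sample data with the same values as the answer's `df`
df <- data.frame(
  col1 = c(1, 2, 3, 4, NA, NA, 5),
  col2 = c(1, 2, 3, 4, 5, 6, NA),
  col3 = c(1, 2, 3, 4, 5, 6, 7)
)

# Rows where col1 OR col2 is missing
# (the same rows the complete.cases() call keeps)
either_na <- df[is.na(df$col1) | is.na(df$col2), ]

# Rows where col1 AND col2 are both missing
# (what comma-separated conditions in filter() would mean)
both_na <- df[is.na(df$col1) & is.na(df$col2), ]

# Rows where one specific column is missing
col1_na <- df[is.na(df$col1), ]
```

Unlike filter(), base R subsetting keeps the original row names (5, 6, 7 here). The same conditions should also work inside dplyr, e.g. filter(df, is.na(col1) | is.na(col2)).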
Benchmark
large_df <- df[rep(seq_len(nrow(df)), 10e5), ]
The results so far:
library(dplyr)  # needed for filter() in the `steven` expression
library(microbenchmark)
mbm <- microbenchmark(
akrun1 = large_df[rowSums(is.na(large_df[1:2]))!=0, ],
akrun2 = large_df[Reduce(`|`, lapply(large_df[1:2], is.na)), ],
steven = filter(large_df, !complete.cases(col1, col2)),
times = 10)
#Unit: milliseconds
#   expr      min       lq      mean    median        uq       max neval cld
# akrun1 814.0226 924.0837 1248.9911 1208.7924 1434.2415 2057.1338    10   c
# akrun2 499.3404 671.9900  736.2418  687.9194  861.4477 1068.1232    10  b
# steven 112.9394 113.0604  214.1688  198.4542  299.7585  355.1795    10 a
Data
df <- structure(list(col1 = c(1, 2, 3, 4, NA, NA, 5), col2 = c(1, 2,
3, 4, 5, 6, NA), col3 = c(1, 2, 3, 4, 5, 6, 7)), .Names = c("col1",
"col2", "col3"), row.names = c(NA, -7L), class = "data.frame")