I have a data.frame with complete and incomplete rows, something like :
dat <- data.frame(
"one" = c(1, 1, 1, 3, NA),
"two" = c(2, 2, NA, 4, 4),
"three" = c(1, 1, NA, 5, 5),
"four" = c(2, NA, 2, 6, 6)
)
I want to tag the rows that contain NA's as TRUE where their values equal values in the same column, on a row with fewer NA's, so
dat$redundant <- c(FALSE, TRUE, TRUE, FALSE, TRUE)
Essentially I'm looking for duplicated(dat, MARGIN = 1) but with NA counting as a match to any present values. Is this possible?
We can use
library(dplyr)
library(tidyr)
dat %>%
fill(everything()) %>%
duplicated
#[1] FALSE TRUE TRUE FALSE TRUE
You can sweep the difference per row, build the rowSums ignoring NA, ignor the upper tri and diag. From this where the rowSums are >0 should indicate the duplicated rows ignoring NA.
tt <- sapply(seq_len(nrow(dat)),
function(i) rowSums(sweep(dat, 2, unlist(dat[i,])), TRUE)==0)
tt[upper.tri(tt, TRUE)] <- FALSE
rowSums(tt) > 0
#[1] FALSE TRUE TRUE FALSE TRUE
Adding line 2 to dat and try it again:
dat2 <- rbind(dat, dat[2,])
tt <- sapply(seq_len(nrow(dat2)),
function(i) rowSums(sweep(dat2, 2, unlist(dat2[i,])), TRUE)==0)
tt[upper.tri(tt, TRUE)] <- FALSE
dat2$redundant <- rowSums(tt) > 0
dat2
# one two three four redundant
#1 1 2 1 2 FALSE
#2 1 2 1 NA TRUE
#3 1 NA NA 2 TRUE
#4 3 4 5 6 FALSE
#5 NA 4 5 6 TRUE
#21 1 2 1 NA TRUE
And with the data from @Peter-Ellis (Thanks!):
dat <- data.frame("one" = c(1, 1, 1, 3, NA, NA), "two" = c(2, 2, NA, 4, 4, 9),
"three" = c(1, 1, NA, 5, 5, 5), "four" = c(2, NA, 2, 6, 6,6) )
tt <- sapply(seq_len(nrow(dat)),
function(i) rowSums(sweep(dat, 2, unlist(dat[i,])), TRUE)==0)
tt[upper.tri(tt, TRUE)] <- FALSE
rowSums(tt) > 0
#[1] FALSE TRUE TRUE FALSE TRUE FALSE
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With