flagging redundant rows with NA

Question

I have a data.frame with complete and incomplete rows, something like :

dat <- data.frame(
  "one" = c(1, 1, 1, 3, NA),
  "two" = c(2, 2, NA, 4, 4),
  "three" = c(1, 1, NA, 5, 5),
  "four"  = c(2, NA, 2, 6, 6)
)

I want to tag the rows that contain NA's as TRUE where their values equal values in the same column, on a row with fewer NA's, so

dat$redundant <- c(FALSE, TRUE, TRUE, FALSE, TRUE)

Essentially I'm looking for duplicated(dat, MARGIN = 1) but with NA counting as a match to any present values. Is this possible?

akrun · Accepted Answer

We can use

library(dplyr)
library(tidyr)
dat %>% 
     fill(everything()) %>% 
     duplicated
 #[1] FALSE  TRUE  TRUE FALSE  TRUE

GKi · Answer

You can sweep the difference per row, build the rowSums ignoring NA, ignor the upper tri and diag. From this where the rowSums are >0 should indicate the duplicated rows ignoring NA.

tt <- sapply(seq_len(nrow(dat)),
 function(i) rowSums(sweep(dat, 2, unlist(dat[i,])), TRUE)==0)
tt[upper.tri(tt, TRUE)] <- FALSE
rowSums(tt) > 0
#[1] FALSE  TRUE  TRUE FALSE  TRUE

Adding line 2 to dat and try it again:

dat2 <- rbind(dat, dat[2,])
tt <- sapply(seq_len(nrow(dat2)),
 function(i) rowSums(sweep(dat2, 2, unlist(dat2[i,])), TRUE)==0)
tt[upper.tri(tt, TRUE)] <- FALSE
dat2$redundant <- rowSums(tt) > 0
dat2
#   one two three four redundant
#1    1   2     1    2     FALSE
#2    1   2     1   NA      TRUE
#3    1  NA    NA    2      TRUE
#4    3   4     5    6     FALSE
#5   NA   4     5    6      TRUE
#21   1   2     1   NA      TRUE

And with the data from @Peter-Ellis (Thanks!):

dat <- data.frame("one" = c(1, 1, 1, 3, NA, NA), "two" = c(2, 2, NA, 4, 4, 9),   
  "three" = c(1, 1, NA, 5, 5, 5), "four"  = c(2, NA, 2, 6, 6,6) )
tt <- sapply(seq_len(nrow(dat)),
 function(i) rowSums(sweep(dat, 2, unlist(dat[i,])), TRUE)==0)
tt[upper.tri(tt, TRUE)] <- FALSE
rowSums(tt) > 0
#[1] FALSE  TRUE  TRUE FALSE  TRUE FALSE

flagging redundant rows with NA

Tags:

r

obrl_soil

2 Answers

akrun

GKi

Recent Activity

Donate For Us

flagging redundant rows with NA

Tags:

r

obrl_soil

2 Answers

akrun

GKi

Related questions

Recent Activity

Donate For Us