Removing duplicates if there is NA in one of the duplicates in R

Tags: r, duplicates

I am trying to remove duplicates from a dataset (caused by merging). In some cases one of the duplicated rows contains a value and the other does not; in other cases both rows are NA. I want to keep the rows with data, and if there are only NAs, it does not matter which one I keep. How do I do that? I am stuck.

I tried the solutions from the question below without success (I also don't usually work with data.table, so I don't understand what is what):

R data.table remove rows where one column is duplicated if another column is NA

Some minimum example data:

df <- data.frame(ID = c("A", "A", "B", "B", "C", "D", "E", "G", "H", "J", "J"),
                 value = c(NA, 1L, NA, NA, 1L, 1L, 1L, 1L, 1L, NA, 1L))

ID value
A    NA
A     1
B    NA
B    NA
C     1
D     1
E     1
G     1
H     1
J    NA
J     1

and I want this:

ID value
A     1
B    NA
C     1
D     1
E     1
G     1
H     1
J     1
H.Stevens asked Sep 07 '25 11:09


1 Answer

One possibility using dplyr could be:

library(dplyr)

df %>%
 group_by(ID) %>%
 # first non-NA row per ID; falls back to row 1 when all values are NA
 slice(which.max(!is.na(value)))

  ID    value
  <chr> <int>
1 A         1
2 B        NA
3 C         1
4 D         1
5 E         1
6 G         1
7 H         1
8 J         1
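
For readers who want to see why this works, or who prefer to avoid a package dependency, here is a minimal base R sketch. It is not part of the answer above, only an illustration under the assumption that the data looks like the example in the question. The names `ordered` and `result` are made up for this sketch. The first two lines show the behaviour the dplyr answer relies on: `which.max()` on a logical vector returns the position of the first `TRUE`, and position 1 when everything is `FALSE`.

```r
# which.max() on a logical vector: position of the first TRUE,
# or 1 when all elements are FALSE (i.e. the group holds only NAs)
pos_mixed  <- which.max(!is.na(c(NA, 1L)))
pos_all_na <- which.max(!is.na(c(NA, NA)))

# Example data from the question
df <- data.frame(
  ID = c("A", "A", "B", "B", "C", "D", "E", "G", "H", "J", "J"),
  value = c(NA, 1L, NA, NA, 1L, 1L, 1L, 1L, 1L, NA, 1L)
)

# Sort so that, within each ID, rows with a non-NA value come first:
# is.na() is FALSE for real values, and FALSE sorts before TRUE
ordered <- df[order(df$ID, is.na(df$value)), ]

# Keep only the first row per ID; duplicated() flags every later repeat
result <- ordered[!duplicated(ordered$ID), ]
rownames(result) <- NULL
result
```

Note that this sketch also sorts the result by ID, which the dplyr version does as a side effect of grouping. If several rows of one ID hold different non-NA values, both approaches simply keep the first one they encounter.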
tmfmnk answered Sep 10 '25 00:09