I am trying to remove duplicates from a dataset (caused by merging). However, one row contains a value and one does not, in some cases both rows are NA. I want to keep the ones with data, and if there are on NAs, then it does not matter which I keep. How do I do that? I am stuck.
I unsuccessfully tried the solutions from here (also not usually working with data.table, so I dont understand whats what)
R data.table remove rows where one column is duplicated if another column is NA
Some minimum example data:
df <- data.frame(ID = c("A", "A", "B", "B", "C", "D", "E", "G", "H", "J", "J"),
value = c(NA, 1L, NA, NA, 1L, 1L, 1L, 1L, 1L, NA, 1L))
ID value
A NA
A 1
B NA
B NA
C 1
D 1
E 1
G 1
H 1
J NA
J 1
and I want this:
ID value
A 1
B NA
C 1
D 1
E 1
G 1
H 1
J 1
One possibility using dplyr
could be:
df %>%
group_by(ID) %>%
slice(which.max(!is.na(value)))
ID value
<chr> <int>
1 A 1
2 B NA
3 C 1
4 D 1
5 E 1
6 G 1
7 H 1
8 J 1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With