I have the following dataset, where ID has duplicates and the other columns are categorical, with values ranging from 0 to 2. I'd like to keep one row per ID, taking the non-zero value of each other column whenever one is available. The data is as follows:
ID X Y R Z
1 0 2 0 1
1 0 2 0 0
2 1 0 0 1
3 1 1 0 1
3 1 1 1 1
4 0 0 1 0
4 0 1 1 0
My desired output is:
ID X Y R Z
1 0 2 0 1
2 1 0 0 1
3 1 1 1 1
4 0 1 1 0
I am using dplyr with group_by.
Thank you!
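We can use an if/else condition inside summarise after the group_by: for each column, return 0 if all values in the group are 0, otherwise return the unique non-zero value.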
library(dplyr)

df1 %>%
  group_by(ID) %>%
  # per column: 0 if every value in the group is 0,
  # otherwise the distinct non-zero value(s)
  summarise(across(everything(),
                   ~ if (all(. == 0)) 0 else unique(.[. != 0])),
            .groups = 'drop')
Output:
# A tibble: 4 x 5
# ID X Y R Z
# <int> <dbl> <dbl> <dbl> <dbl>
#1 1 0 2 0 1
#2 2 1 0 0 1
#3 3 1 1 1 1
#4 4 0 1 1 0
Data:
df1 <- structure(list(
    ID = c(1L, 1L, 2L, 3L, 3L, 4L, 4L),
    X = c(0L, 0L, 1L, 1L, 1L, 0L, 0L),
    Y = c(2L, 2L, 0L, 1L, 1L, 0L, 1L),
    R = c(0L, 0L, 0L, 0L, 1L, 1L, 1L),
    Z = c(1L, 0L, 1L, 1L, 1L, 0L, 0L)),
  class = "data.frame", row.names = c(NA, -7L))
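One caveat with the unique() approach: if a group ever holds two different non-zero values in the same column (say 1 and 2), unique() returns more than one value and that ID no longer collapses to a single row. The example data never hits this case, so the following is only a minimal sketch, assuming it is acceptable to keep the first non-zero value and that the columns contain no NA. The helper first_nonzero is a hypothetical name introduced here, not a dplyr function.

```r
library(dplyr)

# Same example data as in the question
df1 <- data.frame(
  ID = c(1L, 1L, 2L, 3L, 3L, 4L, 4L),
  X  = c(0L, 0L, 1L, 1L, 1L, 0L, 0L),
  Y  = c(2L, 2L, 0L, 1L, 1L, 0L, 1L),
  R  = c(0L, 0L, 0L, 0L, 1L, 1L, 1L),
  Z  = c(1L, 0L, 1L, 1L, 1L, 0L, 0L)
)

# Hypothetical helper: first non-zero value of x, or 0 if there is none.
# Always returns exactly one value, so each group yields exactly one row.
first_nonzero <- function(x) {
  nz <- x[x != 0]
  if (length(nz) == 0) 0L else nz[1]
}

result <- df1 %>%
  group_by(ID) %>%
  summarise(across(everything(), first_nonzero), .groups = "drop")

print(result)
```

On this data the result has one row per ID with the same values as the desired output. If another rule suits the data better when a group has several non-zero values (for example the largest one), only the body of first_nonzero needs to change.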