Selecting distinct values of a column with specific values of other columns

I have the following dataset, where ID has duplicates and the other columns are categorical, with values ranging from 0 to 2. I'd like to keep one row per unique ID, taking the non-zero value of each other column where one is available. The data is as follows:

 ID      X     Y     R      Z 
  1      0     2     0      1
  1      0     2     0      0
  2      1     0     0      1
  3      1     1     0      1
  3      1     1     1      1
  4      0     0     1      0
  4      0     1     1      0

My desired output is:

 ID      X     Y     R      Z 
  1      0     2     0      1
  2      1     0     0      1
  3      1     1     1      1
  4      0     1     1      0

I am using dplyr with group_by.

Thank you!

Alex asked Sep 01 '25 20:09

1 Answer

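After group_by(ID), we can use an if/else condition inside summarise: if every value of a column within a group is 0, return 0; otherwise return the unique non-zero value.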

library(dplyr)

df1 %>%
   group_by(ID) %>%
   # per ID and column: 0 if all values are 0, else the distinct non-zero value
   summarise(across(everything(),
       ~ if (all(. == 0)) 0 else unique(.[. != 0])), .groups = 'drop')

-output

# A tibble: 4 x 5
#     ID     X     Y     R     Z
#  <int> <dbl> <dbl> <dbl> <dbl>
#1     1     0     2     0     1
#2     2     1     0     0     1
#3     3     1     1     1     1
#4     4     0     1     1     0

data

df1 <- structure(list(ID = c(1L, 1L, 2L, 3L, 3L, 4L, 4L), X = c(0L, 
0L, 1L, 1L, 1L, 0L, 0L), Y = c(2L, 2L, 0L, 1L, 1L, 0L, 1L), R = c(0L, 
0L, 0L, 0L, 1L, 1L, 1L), Z = c(1L, 0L, 1L, 1L, 1L, 0L, 0L)),
class = "data.frame", row.names = c(NA, 
-7L))
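
As a rough alternative sketch (not part of the answer above): the codes here are non-negative, so if each ID has at most one distinct non-zero value per column, as in the sample data, the group-wise maximum picks that value and falls back to 0 when none exists. The version below uses base R's aggregate() so it runs without extra packages; the name res is just an illustrative choice, and the "at most one distinct non-zero value" condition is an assumption about the data, not something the question guarantees.

```r
# Sample data from the question
df1 <- data.frame(
  ID = c(1L, 1L, 2L, 3L, 3L, 4L, 4L),
  X  = c(0L, 0L, 1L, 1L, 1L, 0L, 0L),
  Y  = c(2L, 2L, 0L, 1L, 1L, 0L, 1L),
  R  = c(0L, 0L, 0L, 0L, 1L, 1L, 1L),
  Z  = c(1L, 0L, 1L, 1L, 1L, 0L, 0L)
)

# Assumption: at most one distinct non-zero value per ID and column,
# so max() returns that value, or 0 when every value in the group is 0.
res <- aggregate(. ~ ID, data = df1, FUN = max)

print(res)
```

On the sample data this reproduces the four rows of the desired output. If a group could contain two different non-zero values (say 1 and 2), max() would silently keep the larger one, so the if/else version is the safer choice when that case needs to be detected.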
akrun answered Sep 03 '25 20:09