 

R: select one row from duplicated rows after comparing multiple conditions

Tags:

r

I have these duplicated records from a large dataset. Now I need to choose one row from each set of duplicated rows.

ID <- c("6820","6820","17413","17413","38553","38553","52760","52760","717841","717841","717841","747187","747187","747187")
date <- c("2014-06-12","2015-06-11","2014-05-01","2014-05-01","2014-06-12","2015-06-11","2014-10-24","2014-10-24","2014-05-01","2014-05-01","2014-12-02","2014-03-01","2014-05-12","2014-05-12")
type <- c("ST","ST","MC","MC","LC","LC","YA","YA","YA","YA","MC","LC","LC","MC")
level <-c("firsttime","new","new","active","active","active","firsttime","new","active","new","active","new","active","active")
data <- data.frame(ID,date,type,level)

The data frame looks like this: [image not available]

For each ID, I want to apply these rules:

1. If the dates are different, keep all of the rows in df.right.
2. If the dates are the same, compare type and keep the row whose type comes first in the order LC > MC > YA > ST (e.g. choose MC over YA), putting it into df.right.
3. If the type is also the same, compare level and keep the row whose level comes first in the order active > new > firsttime (e.g. choose new over firsttime), putting it into df.right.
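These rules amount to ranking the rows within each ID/date pair and keeping the top-ranked one. A minimal base-R sketch of that idea (not from the original post; it rebuilds the example data from the question) uses `match()` to turn each category into a priority rank, `order()` to sort by those ranks, and `duplicated()` to keep only the first row of each ID/date pair:

```r
# Example data from the question
ID <- c("6820","6820","17413","17413","38553","38553","52760","52760","717841","717841","717841","747187","747187","747187")
date <- c("2014-06-12","2015-06-11","2014-05-01","2014-05-01","2014-06-12","2015-06-11","2014-10-24","2014-10-24","2014-05-01","2014-05-01","2014-12-02","2014-03-01","2014-05-12","2014-05-12")
type <- c("ST","ST","MC","MC","LC","LC","YA","YA","YA","YA","MC","LC","LC","MC")
level <- c("firsttime","new","new","active","active","active","firsttime","new","active","new","active","new","active","active")
data <- data.frame(ID, date, type, level)

# Priority ranks: a lower number means a preferred value
# (LC > MC > YA > ST and active > new > firsttime)
type_rank  <- match(data$type,  c("LC", "MC", "YA", "ST"))
level_rank <- match(data$level, c("active", "new", "firsttime"))

# Sort so the preferred row comes first within each ID/date pair
sorted <- data[order(data$ID, data$date, type_rank, level_rank), ]

# Rows with different dates form separate pairs, so all of them are kept;
# within a pair, only the first (best-ranked) row survives
df.right <- sorted[!duplicated(sorted[, c("ID", "date")]), ]
df.right
```

Because `duplicated()` flags every repeat after the first occurrence, exactly one row is kept per ID/date pair.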

I tried using foreach, but this only covers the first step, and it does not work for IDs that have three duplicated rows.

library(foreach)
foreach(i = unique(data$ID), .combine = 'rbind') %do% {
  data[data$ID == i, "date"][1] == data[data$ID == i, "date"][2]  # compares only the first two dates; the result is not used
  b <- data[data$ID == i, ]}

The result should look like this: [image not available]. Does anybody know how to do this? I'd really appreciate it. Thank you!

Leah Liu asked Dec 07 '25 06:12


1 Answer

The dplyr package is well suited to this sort of thing.

Using factors, you can specify the order of your categories. Then, for each unique ID/date pair, you can keep the row with the first (highest-priority) type and, among those, the row with the first level.
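The key point is that `sort()` on a factor follows the order of its `levels`, not alphabetical order. A small sketch (not part of the original answer) using the question's own example of choosing new over firsttime:

```r
# Level priority from the question: active > new > firsttime
lv <- c("firsttime", "new")

# Default factor levels are alphabetical, so "firsttime" sorts before "new"
default_pick <- as.character(sort(factor(lv))[1])

# Explicit levels encode the priority order, so "new" sorts first
priority_pick <- as.character(sort(factor(lv, levels = c("active", "new", "firsttime")))[1])

default_pick   # "firsttime"
priority_pick  # "new"
```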

library(dplyr)

ID <- c("6820","6820","17413","17413","38553","38553","52760","52760","717841","717841","717841","747187","747187","747187")
date <- c("2014-06-12","2015-06-11","2014-05-01","2014-05-01","2014-06-12","2015-06-11","2014-10-24","2014-10-24","2014-05-01","2014-05-01","2014-12-02","2014-03-01","2014-05-12","2014-05-12")
type <- c("ST","ST","MC","MC","LC","LC","YA","YA","YA","YA","MC","LC","LC","MC")
level <-c("firsttime","new","new","active","active","active","firsttime","new","active","new","active","new","active","active")

type <- factor(type, levels=c("LC", "MC", "YA", "ST"))

level <- factor(level, levels=c("active", "new", "firsttime"))

data <- data.frame(ID,date,type,level)

df.right <- data %>%
  group_by(ID, date) %>%
  filter(type == sort(type)[1]) %>%
  filter(level == sort(level)[1])
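One caveat: `filter()` keeps every row that matches, so if two rows in the same ID/date group had identical type and level, both would survive. A variant sketch (not from the original answer) that always keeps exactly one row per group replaces the two `filter()` calls with `arrange(.by_group = TRUE)` followed by `slice(1)`, relying on the same factor levels to define the sort order:

```r
library(dplyr)

ID <- c("6820","6820","17413","17413","38553","38553","52760","52760","717841","717841","717841","747187","747187","747187")
date <- c("2014-06-12","2015-06-11","2014-05-01","2014-05-01","2014-06-12","2015-06-11","2014-10-24","2014-10-24","2014-05-01","2014-05-01","2014-12-02","2014-03-01","2014-05-12","2014-05-12")
# Factor levels encode the priority order (first level = preferred)
type <- factor(c("ST","ST","MC","MC","LC","LC","YA","YA","YA","YA","MC","LC","LC","MC"),
               levels = c("LC", "MC", "YA", "ST"))
level <- factor(c("firsttime","new","new","active","active","active","firsttime","new","active","new","active","new","active","active"),
                levels = c("active", "new", "firsttime"))
data <- data.frame(ID, date, type, level)

df.right <- data %>%
  group_by(ID, date) %>%
  # Within each ID/date group, sort by type priority, then by level priority
  arrange(type, level, .by_group = TRUE) %>%
  # Keep only the top-ranked row, even if several rows tie on type and level
  slice(1) %>%
  ungroup()
```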
C_Z_ answered Dec 08 '25 18:12