R select one row from duplicated rows after compare multi conditions

Question

I pulled these duplicated records out of a large dataset. Now I need to choose one row from these duplicated rows.

ID <- c("6820","6820","17413","17413","38553","38553","52760","52760","717841","717841","717841","747187","747187","747187")
date <- c("2014-06-12","2015-06-11","2014-05-01","2014-05-01","2014-06-12","2015-06-11","2014-10-24","2014-10-24","2014-05-01","2014-05-01","2014-12-02","2014-03-01","2014-05-12","2014-05-12")
type <- c("ST","ST","MC","MC","LC","LC","YA","YA","YA","YA","MC","LC","LC","MC")
level <-c("firsttime","new","new","active","active","active","firsttime","new","active","new","active","new","active","active")
data <- data.frame(ID,date,type,level)

The data frame looks like this: (image of the printed data frame, not available)

I want to compare the rows as follows. For each ID, if the dates are different, keep all of the rows in df.right. If the dates are the same, compare type and choose in the order LC > MC > YA > ST (e.g. choose MC over YA), then put the chosen row into df.right. If the type is also the same, compare level and choose in the order active > new > firsttime (e.g. choose new over firsttime), then put the chosen row into df.right.
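The rules above amount to a sort within each ID/date pair: first by type priority, then by level priority, keeping the first row. A minimal base-R sketch of that idea, assuming the priority lists are exactly the ones stated in the question (the names type_rank, level_rank, ord and keep are hypothetical):

```r
# Sample data from the question
ID <- c("6820","6820","17413","17413","38553","38553","52760","52760","717841","717841","717841","747187","747187","747187")
date <- c("2014-06-12","2015-06-11","2014-05-01","2014-05-01","2014-06-12","2015-06-11","2014-10-24","2014-10-24","2014-05-01","2014-05-01","2014-12-02","2014-03-01","2014-05-12","2014-05-12")
type <- c("ST","ST","MC","MC","LC","LC","YA","YA","YA","YA","MC","LC","LC","MC")
level <- c("firsttime","new","new","active","active","active","firsttime","new","active","new","active","new","active","active")
data <- data.frame(ID, date, type, level)

# Priority lists from the question: an earlier position means "preferred"
type_rank  <- match(data$type,  c("LC", "MC", "YA", "ST"))
level_rank <- match(data$level, c("active", "new", "firsttime"))

# Row indices sorted so the preferred row leads each ID/date pair
ord <- order(data$ID, data$date, type_rank, level_rank)

# Keep the first row of each ID/date pair in that ordering;
# a pair with a unique date only has one row, so it is always kept
keep <- ord[!duplicated(data[ord, c("ID", "date")])]

# Restore the original row order
df.right <- data[sort(keep), ]
df.right
```

For the sample data this keeps original rows 1, 2, 4, 5, 6, 8, 9, 11, 12 and 13. Unlike a filter on "equals the best value", duplicated() always leaves exactly one row per ID/date pair, even if two rows tie on both type and level.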

I tried using foreach, but this only covers the first step, and it does not work for IDs that have three duplicated rows.

library(foreach)
foreach(i = unique(data$ID), .combine = "rbind") %do% {
  same_date <- data[data$ID == i, "date"][1] == data[data$ID == i, "date"][2]  # compares only the first two rows of each ID
  b <- data[data$ID == i, ]}
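Comparing elements [1] and [2] can only look at two rows per ID, so for ID 717841 (two rows on 2014-05-01 and one on 2014-12-02) the third row is never considered. A sketch that works for any group size, assuming base R's ave() is used to count rows per ID/date pair (the column name n_same_date is hypothetical):

```r
# ID and date columns from the question
ID <- c("6820","6820","17413","17413","38553","38553","52760","52760","717841","717841","717841","747187","747187","747187")
date <- c("2014-06-12","2015-06-11","2014-05-01","2014-05-01","2014-06-12","2015-06-11","2014-10-24","2014-10-24","2014-05-01","2014-05-01","2014-12-02","2014-03-01","2014-05-12","2014-05-12")
data <- data.frame(ID, date)

# For every row, count how many rows share its ID and date.
# A count of 1 means the date is unique for that ID, so the row is kept as-is;
# a larger count marks rows that still need the type/level comparison.
data$n_same_date <- ave(seq_along(data$ID), data$ID, data$date, FUN = length)
data[data$ID == "717841", ]
```

For ID 717841 the counts are 2, 2 and 1, so only the two 2014-05-01 rows go on to the type/level comparison.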

The result should look like this: (image of the expected result, not available). Does anybody know how to do this? I'd really appreciate it. Thank you.

C_Z_ · Accepted Answer

The dplyr package is good for this sort of thing.

Using factors, you can specify how you want your categories ordered. Then, for each unique ID/date pair, you can keep the rows with the first (highest-priority) type and, among those, the first level.
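The trick relies on sort() ordering a factor by its declared levels rather than alphabetically. A small illustration with a hypothetical four-value vector (the name raw is only for this example):

```r
# Hypothetical example values
raw <- c("YA", "MC", "ST", "LC")

# A plain character vector sorts alphabetically, which puts ST before YA
sort(raw)

# A factor sorts by its levels, matching the priority LC > MC > YA > ST
type <- factor(raw, levels = c("LC", "MC", "YA", "ST"))
sort(type)
sort(type)[1]
```

This is why sort(type)[1] inside a group picks the highest-priority type present in that group.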

library(dplyr)

ID <- c("6820","6820","17413","17413","38553","38553","52760","52760","717841","717841","717841","747187","747187","747187")
date <- c("2014-06-12","2015-06-11","2014-05-01","2014-05-01","2014-06-12","2015-06-11","2014-10-24","2014-10-24","2014-05-01","2014-05-01","2014-12-02","2014-03-01","2014-05-12","2014-05-12")
type <- c("ST","ST","MC","MC","LC","LC","YA","YA","YA","YA","MC","LC","LC","MC")
level <-c("firsttime","new","new","active","active","active","firsttime","new","active","new","active","new","active","active")

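# Factor levels encode the priority: the first level is the preferred value
type <- factor(type, levels=c("LC", "MC", "YA", "ST"))

level <- factor(level, levels=c("active", "new", "firsttime"))

data <- data.frame(ID,date,type,level)

df.right <- data %>%
  group_by(ID, date) %>%
  # sort() orders a factor by its levels, so sort(type)[1] is the best type in the group
  filter(type == sort(type)[1]) %>%
  # among the remaining rows, keep the best level
  filter(level == sort(level)[1]) %>%
  ungroup()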
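The filter() approach keeps every row that ties for the best value, so two rows with the same ID, date, type and level would both survive. If exactly one row per ID/date pair is required, one alternative (a different technique from the answer above, sketched here under the same factor-level assumption) is to sort and then keep the first row of each pair:

```r
library(dplyr)

# Sample data from the question, with priorities stored as factor levels
ID <- c("6820","6820","17413","17413","38553","38553","52760","52760","717841","717841","717841","747187","747187","747187")
date <- c("2014-06-12","2015-06-11","2014-05-01","2014-05-01","2014-06-12","2015-06-11","2014-10-24","2014-10-24","2014-05-01","2014-05-01","2014-12-02","2014-03-01","2014-05-12","2014-05-12")
type <- factor(c("ST","ST","MC","MC","LC","LC","YA","YA","YA","YA","MC","LC","LC","MC"), levels = c("LC", "MC", "YA", "ST"))
level <- factor(c("firsttime","new","new","active","active","active","firsttime","new","active","new","active","new","active","active"), levels = c("active", "new", "firsttime"))
data <- data.frame(ID, date, type, level)

# arrange() sorts factors by their levels, so the preferred row of each
# ID/date pair comes first; distinct() then keeps only that first row
df.right <- data %>%
  arrange(ID, date, type, level) %>%
  distinct(ID, date, .keep_all = TRUE)
df.right
```

The rows come out sorted by ID and date rather than in their original order.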

Tags: r

Leah Liu