Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Filter grouped data with filter function and if else condition in R

Consider the data frame

id <-c(1,1,1,2,2,3,3,3,3)
x1 <-c("no","yes","yes","no","no","no","no","no","yes")
x2 <-c(9,3,7,4,5,2,5,6,11)
df <- data.frame(id, x1, x2)

For each id I wish to keep the first row where x1="yes" and where x1 is completely "no" for a particular id I want to retain the last row for that id. I wrote the code

df %>% group_by(id) %>% 
       filter(if(x1 == "yes") x1 == "yes" else row_number() %in% c(n()))

but it runs into an error. The expect is

id  x1 x2
 1 yes  3
 2  no  5
 3 yes 11

I was wondering if this code could be improved to get this issue resolved? I also appreciate alternative approaches (e.g., data.table approach) of handling this. Thank you!

like image 232
T Richard Avatar asked Dec 22 '25 22:12

T Richard


1 Answers

your code fails because filter produces a vector for each group instead of a single true/false.
so x1 == "yes" produces a vector like c(FALSE, TRUE, TRUE) while your if statement expects a singular TRUE or FALSE.

I recommend using slice for this use case as such:

df %>%
  group_by(id) %>%
  slice(if (any(x1 == "yes")) which.max(x1 == "yes") else n())

any(x1 == "yes") makes it so that it checks the entire group and returns a single TRUE or FALSE
and using slice() picks rows by their position (eg: 1st row, 5th row)

so which.max(x1 == "yes") finds the position of the first "yes" and selects it.
and if none exist, it gives the position of the last "no"

like image 110
Simon Avatar answered Dec 24 '25 12:12

Simon



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!