Consider the data frame
id <-c(1,1,1,2,2,3,3,3,3)
x1 <-c("no","yes","yes","no","no","no","no","no","yes")
x2 <-c(9,3,7,4,5,2,5,6,11)
df <- data.frame(id, x1, x2)
For each id I wish to keep the first row where x1="yes" and where x1 is completely "no" for a particular id I want to retain the last row for that id. I wrote the code
df %>% group_by(id) %>%
filter(if(x1 == "yes") x1 == "yes" else row_number() %in% c(n()))
but it runs into an error. The expect is
id x1 x2
1 yes 3
2 no 5
3 yes 11
I was wondering if this code could be improved to get this issue resolved? I also appreciate alternative approaches (e.g., data.table approach) of handling this. Thank you!
your code fails because filter produces a vector for each group instead of a single true/false.
so x1 == "yes" produces a vector like c(FALSE, TRUE, TRUE) while your if statement expects a singular TRUE or FALSE.
I recommend using slice for this use case as such:
df %>%
group_by(id) %>%
slice(if (any(x1 == "yes")) which.max(x1 == "yes") else n())
any(x1 == "yes") makes it so that it checks the entire group and returns a single TRUE or FALSE
and using slice() picks rows by their position (eg: 1st row, 5th row)
so which.max(x1 == "yes") finds the position of the first "yes" and selects it.
and if none exist, it gives the position of the last "no"
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With