I have a dataset in the form of a diary, i.e. I have multiple entries for the same ID. Apart from that, I have a categorical variable (Yes/No) that indicates whether the event occurred or not.
ID <- c(1,1,1,2,2,2,2,3,3,3,3,3,3)
event <- c("No", "No", "No", "Yes", "No", "No", "Yes", "Yes", "Yes", "No", "No", "Yes", "Yes")
df <- data.frame(ID, event)
ID event
1 No
1 No
1 No
2 Yes
2 No
2 No
2 Yes
3 Yes
3 Yes
3 No
3 No
3 Yes
3 Yes
I now want to delete, within each ID, all entries that come before the first "No", so that every ID starts with a "No". Any "Yes" that comes after the first "No" should be kept. So the desired output is:
ID event
1 No
1 No
1 No
2 No
2 No
2 Yes
3 No
3 No
3 Yes
3 Yes
Does anybody know how to achieve this? Thanks in advance for your time!
We can find the position of the first "No" within each ID using which.max and select all the rows from there to the last row of the group.
library(dplyr)
df %>% group_by(ID) %>% slice(which.max(event == 'No') : n())
# Alternative: take the first index returned by which()
#df %>% group_by(ID) %>% slice(which(event == 'No')[1] : n())
# ID event
# <dbl> <chr>
# 1 1 No
# 2 1 No
# 3 1 No
# 4 2 No
# 5 2 No
# 6 2 Yes
# 7 3 No
# 8 3 No
# 9 3 Yes
#10 3 Yes
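One case the example data does not cover is an ID that has no "No" at all. The following is a small base R sketch of how the two index expressions above behave in that situation; the vector `all_yes` is a hypothetical group made up for illustration and is not part of the question's data.

```r
# Hypothetical group in which every entry is "Yes" (not in the original data).
all_yes <- c("Yes", "Yes", "Yes")

# which.max() on an all-FALSE vector returns 1,
# so slice(1:n()) would keep every row of such a group.
first_by_which_max <- which.max(all_yes == "No")

# which(...)[1] returns NA when nothing matches,
# so NA:n() would raise an error inside slice().
first_by_which <- which(all_yes == "No")[1]

# match() also returns NA here, which makes the "no match" case explicit.
first_by_match <- match("No", all_yes)
```

If groups without any "No" should be dropped rather than kept, that has to be handled separately, for example by filtering on any(event == 'No') first.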
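Another option is to filter on a running count of "No" values per ID, which keeps every row from the first "No" onward: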
library(dplyr)
df %>%
  group_by(ID) %>%
  filter(cumsum(event == 'No') >= 1)
Output:
# A tibble: 10 x 2
# Groups: ID [3]
ID event
<int> <fct>
1 1 No
2 1 No
3 1 No
4 2 No
5 2 No
6 2 Yes
7 3 No
8 3 No
9 3 Yes
10 3 Yes
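The same running-count idea can also be sketched without dplyr. This is a minimal base R version, assuming the data frame from the question; the names `no_count` and `result` are introduced here for illustration only.

```r
# Data from the question; stringsAsFactors = FALSE keeps event as character
# regardless of the R version.
ID <- c(1,1,1,2,2,2,2,3,3,3,3,3,3)
event <- c("No", "No", "No", "Yes", "No", "No", "Yes",
           "Yes", "Yes", "No", "No", "Yes", "Yes")
df <- data.frame(ID, event, stringsAsFactors = FALSE)

# Running count of "No" entries within each ID.
no_count <- ave(as.integer(df$event == "No"), df$ID, FUN = cumsum)

# Keep rows once at least one "No" has been seen for that ID.
result <- df[no_count >= 1, ]
```

As with the filter() version, an ID that never has a "No" ends up with no rows in `result`.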