I want to filter x rows per id, but the x differs for each id.
example dataset:
df <- data.frame(id = c('P1', 'P1', 'P1', 'P1', 'P2', 'P2', 'P2',
'P2', 'P3', 'P3'),
points = c(56, 94, 17, 57, 55, 15, 37, 44, 55, 32))
The data and below code is adapted from here.
df %>%
group_by(id) %>%
filter(row_number() %in% c(1, 2))
This filters the first two rows for each id. So far so good.
But I want to filter different number of rows for each id based on the values stored in a vector like below
nrowtofilter <- c(3, 2, 1)
Thus, I want to filter 3 rows for P1, 2 for P2, and 1 for P3.
But when I do
df %>%
group_by(id) %>%
filter(row_number() %in% nrowtofilter)
I extract the first 3 rows of each ID.
How can I filter ids based on nrowtofilter?
A different approach with cur_group_id, which doesn't require breaking the dataset into a list of dataframes:
library(dplyr)
df %>%
group_by(id) %>%
filter(row_number() <= nrowtofilter[cur_group_id()])
#> # A tibble: 6 x 2
#> # Groups: id [3]
#> id points
#> <chr> <dbl>
#> 1 P1 56
#> 2 P1 94
#> 3 P1 17
#> 4 P2 55
#> 5 P2 15
#> 6 P3 55
First create a lookup table:
nrowtofilter <- setNames(c(3, 2, 1), c('P1', 'P2', 'P3'))
# P1 P2 P3
# 3 2 1
then group_modify():
library(dplyr)
df %>%
group_by(id) %>%
group_modify(~ slice_head(.x, n = nrowtofilter[.y$id])) %>%
ungroup()
# # A tibble: 6 × 2
# id points
# <chr> <dbl>
# 1 P1 56
# 2 P1 94
# 3 P1 17
# 4 P2 55
# 5 P2 15
# 6 P3 55
where .x refers to the subset of rows for the given group, and .y a one-row tibble with one column per grouping variable that identifies the group.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With