Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Filter different number of rows per group

I want to filter x rows per id, but the x differs for each id.

example dataset:

df <- data.frame(id = c('P1', 'P1', 'P1', 'P1', 'P2', 'P2', 'P2', 
   'P2', 'P3', 'P3'),
           points = c(56, 94, 17, 57, 55, 15, 37, 44, 55, 32))

The data and below code is adapted from here.

df %>%
  group_by(id) %>%
  filter(row_number() %in% c(1, 2))

This filters the first two rows for each id. So far so good.

But I want to filter different number of rows for each id based on the values stored in a vector like below

nrowtofilter <- c(3, 2, 1)

Thus, I want to filter 3 rows for P1, 2 for P2, and 1 for P3.

But when I do

df %>%
  group_by(id) %>%
  filter(row_number() %in% nrowtofilter)

I extract the first 3 rows of each ID.

How can I filter ids based on nrowtofilter?

like image 668
August Nilsson Avatar asked Oct 30 '25 05:10

August Nilsson


2 Answers

A different approach with cur_group_id, which doesn't require breaking the dataset into a list of dataframes:

library(dplyr)

df %>% 
  group_by(id) %>% 
  filter(row_number() <= nrowtofilter[cur_group_id()])

#> # A tibble: 6 x 2
#> # Groups:   id [3]
#>   id    points
#>   <chr>  <dbl>
#> 1 P1        56
#> 2 P1        94
#> 3 P1        17
#> 4 P2        55
#> 5 P2        15
#> 6 P3        55
like image 150
M-- Avatar answered Oct 31 '25 19:10

M--


First create a lookup table:

nrowtofilter <- setNames(c(3, 2, 1), c('P1', 'P2', 'P3'))
# P1 P2 P3 
#  3  2  1

then group_modify():

library(dplyr)

df %>%
  group_by(id) %>%
  group_modify(~ slice_head(.x, n = nrowtofilter[.y$id])) %>%
  ungroup()

# # A tibble: 6 × 2
#   id    points
#   <chr>  <dbl>
# 1 P1        56
# 2 P1        94
# 3 P1        17
# 4 P2        55
# 5 P2        15
# 6 P3        55

where .x refers to the subset of rows for the given group, and .y a one-row tibble with one column per grouping variable that identifies the group.

like image 45
Darren Tsai Avatar answered Oct 31 '25 20:10

Darren Tsai