Tidyverse equivalent of which()

Tags: r, dplyr

Let's say I have the following (dummy) data frame:

df <- data.frame(conference = c('East', 'East', 'East', 'West', 'West', 'East'),
                 team = c('A', 'A', 'A', 'B', 'B', 'C'),
                 points = c(11, 8, 10, 6, 6, 5),
                 rebounds = c(7, 7, 6, 9, 12, 8))

and I want to do some math with the points and rebounds columns. In base R, I could do something like:

a_val <- sum(df$points[which(df$team == "A")]) /
  sum(df$rebounds[which(df$team == "A")])
b_val <- sum(df$points[which(df$team == "B" & df$rebounds >= 7)]) /
  sum(df$rebounds[which(df$team == "B" & df$rebounds >= 7)])

What is the tidyverse equivalent of which() for making these kinds of operations more efficient?

asked Oct 25 '25 at 05:10 by grace.cutler



2 Answers

The easiest route is to group the data frame first and then do the calculations with summarise(); this gives both ratios for every team at once:

library(tidyverse)

df <- data.frame(conference = c('East', 'East', 'East', 'West', 'West', 'East'),
                 team = c('A', 'A', 'A', 'B', 'B', 'C'),
                 points = c(11, 8, 10, 6, 6, 5),
                 rebounds = c(7, 7, 6, 9, 12, 8))

df |> 
  group_by(team) |> 
  summarise(a_val = sum(points)/sum(rebounds),
            b_val = sum(points[rebounds>=7])/sum(rebounds[rebounds>=7]))  # same condition on both sums
#> # A tibble: 3 × 3
#>   team  a_val b_val
#>   <chr> <dbl> <dbl>
#> 1 A     1.45  1.36
#> 2 B     0.571 0.571
#> 3 C     0.625 0.625
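If you only need one number per condition, as in the question, another option is to subset rows with filter() (dplyr's row-subsetting counterpart to which()-based indexing) and then collapse with summarise(); pull() extracts the single value as a plain vector. This is a minimal sketch; the column name ratio is just an illustrative choice, not something from the question.

```r
library(dplyr)

df <- data.frame(conference = c('East', 'East', 'East', 'West', 'West', 'East'),
                 team = c('A', 'A', 'A', 'B', 'B', 'C'),
                 points = c(11, 8, 10, 6, 6, 5),
                 rebounds = c(7, 7, 6, 9, 12, 8))

# filter() keeps rows where all conditions are TRUE,
# playing the role of df[which(...), ] in base R
a_val <- df |>
  filter(team == "A") |>
  summarise(ratio = sum(points) / sum(rebounds)) |>  # "ratio" is an illustrative name
  pull(ratio)

# Several conditions passed to filter() are combined with &
b_val <- df |>
  filter(team == "B", rebounds >= 7) |>
  summarise(ratio = sum(points) / sum(rebounds)) |>
  pull(ratio)

a_val  # 29 / 20 = 1.45
b_val  # 12 / 21, about 0.571
```

Because the condition is written once in filter(), both sums are guaranteed to use the same rows.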
answered Oct 26 '25 at 20:10 by Andy Baxter



We don't actually need which() in the question's example. Without it we are left with plain logical indexing, which works the same way on data frames and tibbles and gives the same answer, e.g.

library(tibble)

tib <- as_tibble(df)
a_val <- sum(tib$points[tib$team == "A"]) / sum(tib$rebounds[tib$team == "A"])
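One difference to keep in mind when dropping which() is that it only matters if the condition can be NA: logical indexing keeps an NA element wherever the condition is NA, while which() returns only the positions where the condition is TRUE. dplyr's filter() behaves like which() here and drops rows whose condition is NA. A small sketch with a made-up vector x (not from the question's data):

```r
library(dplyr)

x <- c(1, NA, 3)  # illustrative vector containing a missing value

# Logical indexing: the NA in (x > 1) produces an NA in the result
x[x > 1]          # NA 3

# which() keeps only positions where the condition is TRUE
x[which(x > 1)]   # 3

# filter() also drops rows where the condition evaluates to NA
filter(data.frame(x = x), x > 1)  # a single row with x = 3
```

With the question's data there are no missing values, so both forms give the same result.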
answered Oct 26 '25 at 21:10 by G. Grothendieck
