Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find the ratio between two groups

Essentially I have

Var1 Group Value
a    0     1 
b    0     2
c    0     3
a    1     2
b    1     10
c    1     9

I want to calculate a table with rows Var1 and a single column Div where for any value of Var1, the value of Div will be the division of the Value values for group 1 over group 2. Thus, the outcome should be

Var1 Div
a    .5
b    .2
c    .3333

I have achieved this by creating two tables (filter the table using Group, then divide the Value column of the two tables, and then create a new table). Is there a method or function in the tidyverse that achieves this using pipes?

like image 743
William M. Avatar asked Nov 05 '25 02:11

William M.


1 Answers

library(dplyr)
quux %>%
  group_by(Var1) %>%
  filter(all(0:1 %in% Group)) %>%
  summarize(Div = Value[Group==0]/Value[Group==1])
# # A tibble: 3 × 2
#   Var1    Div
#   <chr> <dbl>
# 1 a     0.5  
# 2 b     0.2  
# 3 c     0.333

The use of filter(all(0:1 %in% Group)) is done group-wise and ensures that we have both in order to do the ratio. (Depending on your confidence in the data, a slightly more complex test would be to guarantee that there is exactly one of each of 0:1, perhaps using

quux %>%
  group_by(Var1) %>%
  filter(identical(sort(intersect(Group, 0:1)), 0:1)) %>%
  summarize(...)

which guarantees both and no others are in Group. (Safeguards on data integrity.)

If you need to retain those groups, however, OR you think it's remotely possible that one might have more than one of each Group within a Var1-group, then you can do something like:

quux %>%
  group_by(Var1) %>%
  summarize(Div = Value[Group==0][1]/Value[Group==1][1])
# # A tibble: 3 × 2
#   Var1    Div
#   <chr> <dbl>
# 1 a     0.5  
# 2 b     0.2  
# 3 c     0.333

The addition of [1] guarantees that we'll get just one value; if that Group is found, then it's the respective Value; if that Group is not found, however, Value[Group==0][1] will return NA, which is safe for both the numerator and denominator.


Data

quux <- structure(list(Var1 = c("a", "b", "c", "a", "b", "c"), Group = c(0L, 0L, 0L, 1L, 1L, 1L), Value = c(1L, 2L, 3L, 2L, 10L, 9L)), class = "data.frame", row.names = c(NA, -6L))
like image 83
r2evans Avatar answered Nov 07 '25 14:11

r2evans