
Group by two factors with dplyr

Tags: r, group-by, dplyr

I am struggling a bit with dplyr syntax in R. I would like to group by two factors and obtain the sum of another variable for each combination of their levels.

Here is a reproducible example:

df <- data.frame(c("A", "A", "A", "B", "C", "C","C"),
                 c("1", "1", "3", "2", "3", "2","2"),
                 c(12, 45, 78, 32, 5, 7, 8))

colnames(df) <- c("factor1","factor2","values")

And here is my attempt so far:

test <- df %>%
  group_by(factor1, factor2) %>%
  summarise(sum(values))

# A tibble: 5 x 3
# Groups:   factor1 [3]
factor1 factor2 `sum(values)`
<fct>   <fct>           <dbl>
1 A       1                  57
2 A       3                  78
3 B       2                  32
4 C       2                  15
5 C       3                   5

But that's not what I am looking for. I would like one row per level of factor1 and one column per level of factor2, with missing combinations filled in as 0, like this:

        1   2   3 
A       57  0   78           
B       0   32  0             
C       0   15  5    

Any suggestions?

ePoQ asked Dec 05 '25 05:12


2 Answers

Using pivot_wider -

tidyr::pivot_wider(df, names_from = factor2, values_from = values,
                   values_fn = sum, values_fill = 0)

#  factor1   `1`   `3`   `2`
#  <chr>   <dbl> <dbl> <dbl>
#1 A          57    78     0
#2 B           0     0    32
#3 C           0     5    15
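
The columns in the output above come out in order of first appearance (1, 3, 2), not in the 1, 2, 3 order shown in the question. If that order matters, one option is the names_sort argument of pivot_wider. This is a minimal sketch, assuming tidyr 1.1.0 or later; the object name wide is only illustrative, and the data frame is rebuilt with named columns so the snippet runs on its own.

```r
library(tidyr)

# Same data as in the question, built with named columns
df <- data.frame(factor1 = c("A", "A", "A", "B", "C", "C", "C"),
                 factor2 = c("1", "1", "3", "2", "3", "2", "2"),
                 values  = c(12, 45, 78, 32, 5, 7, 8))

# names_sort = TRUE orders the new columns by name
# instead of by first appearance in factor2
wide <- pivot_wider(df, names_from = factor2, values_from = values,
                    values_fn = sum, values_fill = 0, names_sort = TRUE)
wide
```

Because the names are sorted as character strings, levels such as "10" would sort before "2"; for numeric-looking levels with more than one digit, reordering the columns afterwards may be safer.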

Or in data.table -

library(data.table)
dcast(setDT(df), factor1 ~ factor2, value.var = 'values', fun.aggregate = sum)

Ronak Shah answered Dec 07 '25 20:12


You need to "reshape" or "pivot" the data. Since you're already using dplyr, you can use tidyr::pivot_wider. (Alternatively, reshape2::dcast will work similarly, though frankly I believe pivot_wider is more full-featured.)

library(dplyr)
test <- df %>%
  group_by(factor1, factor2) %>%
  summarise(z = sum(values))
# name id_cols explicitly; newer tidyr versions warn when it is passed by position
tidyr::pivot_wider(test, id_cols = factor1, names_from = "factor2",
                   values_from = "z", values_fill = 0)
# # A tibble: 3 x 4
# # Groups:   factor1 [3]
#   factor1   `1`   `3`   `2`
#   <chr>   <dbl> <dbl> <dbl>
# 1 A          57    78     0
# 2 B           0     0    32
# 3 C           0     5    15
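
Note the "# Groups: factor1 [3]" line in the output above: the result is still grouped by factor1, because summarise only drops the last grouping variable. If a plain, ungrouped table is wanted, one way is to drop the grouping at the summarise step. This is a sketch assuming dplyr 1.0.0 or later (for the .groups argument); the name wide is only illustrative.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, built with named columns
df <- data.frame(factor1 = c("A", "A", "A", "B", "C", "C", "C"),
                 factor2 = c("1", "1", "3", "2", "3", "2", "2"),
                 values  = c(12, 45, 78, 32, 5, 7, 8))

wide <- df %>%
  group_by(factor1, factor2) %>%
  # .groups = "drop" returns an ungrouped tibble after summarising
  summarise(z = sum(values), .groups = "drop") %>%
  pivot_wider(id_cols = factor1, names_from = factor2,
              values_from = z, values_fill = 0)
wide
```

Calling ungroup() on the result would have the same effect on the grouping.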

r2evans answered Dec 07 '25 19:12


