I am struggling a bit with the dplyr structure in R. I would like to group by two different factors in order to obtain the sum of another variable for each combination.
Here is a reproducible example:
df <- data.frame(factor1 = c("A", "A", "A", "B", "C", "C", "C"),
                 factor2 = c("1", "1", "3", "2", "3", "2", "2"),
                 values  = c(12, 45, 78, 32, 5, 7, 8))
Here is my attempt so far:
library(dplyr)
test <- df %>%
  group_by(factor1, factor2) %>%
  summarise(sum(values))
# A tibble: 5 x 3
# Groups: factor1 [3]
factor1 factor2 `sum(values)`
<fct> <fct> <dbl>
1 A 1 57
2 A 3 78
3 B 2 32
4 C 2 15
5 C 3 5
But this is not what I am looking for. I would like one row per level of factor1 and one column per level of factor2, with the combinations that do not occur filled in as 0, like this:
1 2 3
A 57 0 78
B 0 32 0
C 0 15 5
Any suggestions?
Using tidyr::pivot_wider, which can do the summing itself through values_fn, so no group_by/summarise step is needed:
tidyr::pivot_wider(df, names_from = factor2, values_from = values,
                   values_fn = sum, values_fill = 0)
# factor1 `1` `3` `2`
# <chr> <dbl> <dbl> <dbl>
#1 A 57 78 0
#2 B 0 0 32
#3 C 0 5 15
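pivot_wider orders the new columns by first appearance of each factor2 value, which is why they come out as 1, 3, 2 above. A minimal sketch, assuming tidyr 1.1.0 or later (where the names_sort argument exists), that sorts the new columns so the layout matches the requested 1, 2, 3 order:

```r
library(tidyr)

# Same data as the question; stringsAsFactors = FALSE keeps the columns
# as character regardless of the R version's default.
df <- data.frame(factor1 = c("A", "A", "A", "B", "C", "C", "C"),
                 factor2 = c("1", "1", "3", "2", "3", "2", "2"),
                 values  = c(12, 45, 78, 32, 5, 7, 8),
                 stringsAsFactors = FALSE)

wide <- pivot_wider(df,
                    names_from  = factor2,  # one new column per factor2 value
                    values_from = values,
                    values_fn   = sum,      # sum duplicated factor1/factor2 pairs
                    values_fill = 0,        # missing combinations become 0
                    names_sort  = TRUE)     # order new columns as 1, 2, 3
print(wide)
```

Rows still follow the order in which factor1 values first appear (A, B, C here).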
Or with data.table, where dcast sums duplicates through fun.aggregate:
library(data.table)
dcast(setDT(df),factor1~factor2, value.var = 'values', fun.aggregate = sum)
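Note that setDT(df) converts df to a data.table in place, so df itself is changed. A minimal sketch, assuming you would rather leave df as a plain data.frame, that works on a copy made with as.data.table() and spells out fill = 0 instead of relying on sum() of an empty vector:

```r
library(data.table)

df <- data.frame(factor1 = c("A", "A", "A", "B", "C", "C", "C"),
                 factor2 = c("1", "1", "3", "2", "3", "2", "2"),
                 values  = c(12, 45, 78, 32, 5, 7, 8),
                 stringsAsFactors = FALSE)

# as.data.table() returns a copy, unlike setDT(), which modifies df by reference
dt <- as.data.table(df)

# factor1 ~ factor2: rows from factor1, one column per factor2 value;
# duplicates are summed and empty cells are filled with 0
wide <- dcast(dt, factor1 ~ factor2, value.var = "values",
              fun.aggregate = sum, fill = 0)
print(wide)
```

dcast sorts both the row keys and the new column names, so the columns appear as 1, 2, 3 without any extra argument.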
You need to "reshape" or "pivot" the data from long to wide. Since you're already using dplyr, you can use tidyr::pivot_wider on the summarised result. (Alternatively, reshape2::dcast will work similarly, though frankly I believe pivot_wider is more feature-full.)
library(dplyr)
test <- df %>%
  group_by(factor1, factor2) %>%
  summarise(z = sum(values))
# id_cols is named explicitly: recent tidyr versions deprecate passing it by position
tidyr::pivot_wider(test, id_cols = factor1, names_from = "factor2", values_from = "z",
                   values_fill = 0)
# # A tibble: 3 x 4
# # Groups: factor1 [3]
# factor1 `1` `3` `2`
# <chr> <dbl> <dbl> <dbl>
# 1 A 57 78 0
# 2 B 0 0 32
# 3 C 0 5 15
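As a swapped-in alternative not used in the answers above, base R can build the same cross-tabulation without any packages: stats::xtabs() sums the left-hand-side variable over every combination of the right-hand-side factors and reports combinations that never occur as 0. A minimal sketch; as.data.frame.matrix() is used here only to turn the resulting table into an ordinary data frame with factor1 levels as row names:

```r
df <- data.frame(factor1 = c("A", "A", "A", "B", "C", "C", "C"),
                 factor2 = c("1", "1", "3", "2", "3", "2", "2"),
                 values  = c(12, 45, 78, 32, 5, 7, 8),
                 stringsAsFactors = FALSE)

# values ~ factor1 + factor2: sum `values` for each factor1 x factor2 cell
tab <- xtabs(values ~ factor1 + factor2, data = df)
print(tab)

# Convert the 2-D table to a data frame: rows A, B, C and columns 1, 2, 3
wide <- as.data.frame.matrix(tab)
print(wide)
```

Here factor1 ends up as row names rather than as a column; add it back with cbind(factor1 = rownames(wide), wide) if you need it as data.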