I would like to be able to aggregate by multiple columns and get in the results not only the finest aggregations (one value from each grouping column) but also the higher-level aggregations (one value from one grouping column, no restriction on other grouping columns, etc.). I believe in Oracle and Hive this is possible via "grouping sets" (also "cube" and "rollup" in Hive).
This code achieves what I'm looking for:
rbind.fill(ddply(mtcars, .(cyl), summarize, agg=mean(mpg)),
ddply(mtcars, .(cyl, am), summarize, agg=mean(mpg)))[, c(1,3,2)]
cyl am agg
1 4 NA 26.66364
2 6 NA 19.74286
3 8 NA 15.10000
4 4 0 22.90000
5 4 1 28.07500
6 6 0 19.12500
7 6 1 20.56667
8 8 0 15.05000
9 8 1 15.40000
But this is ugly in several ways. For my actual application the definition of the aggregations to perform is long and I would really rather not repeat it. Is there an elegant way to do this?
As of v1.11.0, data.table has functions for rollup, cube and groupingsets. jangorecki provides an answer to this question in issue #1377:
library(data.table)
rollup(as.data.table(mtcars), j=.(agg=mean(mpg)), by=c("cyl", "am"))
I think this will do what you want:
library(plyr)
grp.cols <- c("vs", "am", "gear", "carb", "cyl")
do.call(
rbind.fill,
lapply(1:length(grp.cols), function(x) ddply(mtcars, grp.cols[1:x], summarize, agg=mean(mpg)))
)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With