Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

multi-level aggregations (like "grouping sets") via ddply or other R function?

Tags:

r

plyr

I would like to be able to aggregate by multiple columns and get in the results not only the finest aggregations (one value from each grouping column) but also the higher-level aggregations (one value from one grouping column, no restriction on other grouping columns, etc.). I believe in Oracle and Hive this is possible via "grouping sets" (also "cube" and "rollup" in Hive).

This code achieves what I'm looking for:

rbind.fill(ddply(mtcars, .(cyl), summarize, agg=mean(mpg)),
           ddply(mtcars, .(cyl, am), summarize, agg=mean(mpg)))[, c(1,3,2)]

  cyl am      agg
1   4 NA 26.66364
2   6 NA 19.74286
3   8 NA 15.10000
4   4  0 22.90000
5   4  1 28.07500
6   6  0 19.12500
7   6  1 20.56667
8   8  0 15.05000
9   8  1 15.40000

But this is ugly in several ways. For my actual application the definition of the aggregations to perform is long and I would really rather not repeat it. Is there an elegant way to do this?

like image 757
Aaron Schumacher Avatar asked Oct 27 '25 04:10

Aaron Schumacher


2 Answers

As of v1.11.0, data.table has functions for rollup, cube and groupingsets. jangorecki provides an answer to this question in issue #1377:

library(data.table)
rollup(as.data.table(mtcars), j=.(agg=mean(mpg)), by=c("cyl", "am"))
like image 142
dnlbrky Avatar answered Oct 28 '25 20:10

dnlbrky


I think this will do what you want:

library(plyr)
grp.cols <- c("vs", "am", "gear", "carb", "cyl")
do.call(
  rbind.fill,
  lapply(1:length(grp.cols), function(x) ddply(mtcars, grp.cols[1:x], summarize, agg=mean(mpg)))
)
like image 29
BrodieG Avatar answered Oct 28 '25 20:10

BrodieG



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!