
Summary / Aggregate in R without dropping levels

I would like to summarise or aggregate a table without dropping empty levels. Does anyone have ideas on how to do this?

As an example, here is a data frame:

df1 <- data.frame(Method = c(rep("A", 3), rep("B", 2), rep("C", 4)),
                  Type = c("Fast", "Fast", "Medium", "Fast", "Slow", "Fast", "Medium", "Slow", "Slow"),
                  Measure = c(1, 1, 2, 1, 3, 1, 1, 2, 2))

Here are two approaches, one using base R and one using the doBy package:

# base R
aggregate(Measure ~ Method + Type, data = df1, FUN = length)

# doBy package
library(doBy)
summaryBy(Measure ~ Method + Type, data = df1, FUN = length)

Both give the same result, sorted differently. The issue is that I would like all combinations of Method and Type to appear, with missing measures inserted as NA. In other words, all levels of both factors must be maintained.

df1$Type
df1$Method

Maybe plyr has something, but I don't know how that works.

rmf asked Oct 22 '25 15:10



2 Answers

Have a look at tapply:

with(df1, tapply(Measure, list(Method, Type), FUN = length))

#   Fast Medium Slow
# A    2      1   NA
# B    1     NA    1
# C    1      1    2
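
If a long data frame is easier to work with than a matrix, the tapply result can probably be reshaped in base R as well. This is only a sketch, assuming df1 is the data frame from the question; the names counts and long are illustrative and not part of the original answer.

```r
# Rebuild the example data from the question
df1 <- data.frame(
  Method  = c(rep("A", 3), rep("B", 2), rep("C", 4)),
  Type    = c("Fast", "Fast", "Medium", "Fast", "Slow",
              "Fast", "Medium", "Slow", "Slow"),
  Measure = c(1, 1, 2, 1, 3, 1, 1, 2, 2)
)

# Method-by-Type matrix of counts; empty combinations come back as NA.
# Naming the list elements carries the names through to the dimnames.
counts <- with(df1, tapply(Measure, list(Method = Method, Type = Type),
                           FUN = length))

# as.table() keeps the dimnames, and as.data.frame() then produces
# one row per cell, including the NA cells.
long <- as.data.frame(as.table(counts), responseName = "Measure")
long
```

Naming the list elements passed to tapply is what gives the resulting columns the names Method and Type instead of Var1 and Var2.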
Sven Hohenstein answered Oct 24 '25 04:10



Update for 2021

I think this can be accomplished now with stats::aggregate() using drop = FALSE. No extra packages needed. The result is a regular ole data frame where empty combinations of levels get NA.

aggregate(Measure ~ Method + Type, data = df1, FUN = length, drop = FALSE)

  Method   Type Measure
1      A   Fast       2
2      B   Fast       1
3      C   Fast       1
4      A Medium       1
5      B Medium      NA
6      C Medium       1
7      A   Slow      NA
8      B   Slow       1
9      C   Slow       2
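
Since FUN = length only counts rows, a plain cross-tabulation might also do the job. Note that this is a different technique from aggregate(): it reports empty combinations as 0 rather than NA. A minimal sketch, assuming df1 is the data frame from the question; counts_long is an illustrative name.

```r
# Rebuild the example data from the question
df1 <- data.frame(
  Method  = c(rep("A", 3), rep("B", 2), rep("C", 4)),
  Type    = c("Fast", "Fast", "Medium", "Fast", "Slow",
              "Fast", "Medium", "Slow", "Slow"),
  Measure = c(1, 1, 2, 1, 3, 1, 1, 2, 2)
)

# table() cross-tabulates the two columns, so every Method/Type
# combination gets a cell; combinations with no rows have count 0.
counts_long <- as.data.frame(table(df1[c("Method", "Type")]),
                             responseName = "Measure")
counts_long
```

If NA is preferred over 0, the zeros can be replaced afterwards, for example with counts_long$Measure[counts_long$Measure == 0] <- NA.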
Skaqqs answered Oct 24 '25 04:10
