
R - aggregate(), using na.action to not drop data

Tags: r, aggregate, na, sum

I feel like I have done this before and have no idea why I cannot figure it out all of a sudden. I am simply trying to aggregate data using the aggregate() function without dropping any rows where the grouping terms are NA. I also do not want to have to convert the NAs to anything else, such as character strings. Given the following:

 FOO_BAR <- data.frame(Foo=c(rep("omg", 6), rep(NA, 6), rep("omg", 6), rep(NA, 6)), 
                       Bar=c(rep("This", 6), rep("is", 6), rep("so", 6), rep("annoying", 6)), 
                       Doh=rnorm(24))

I would like to use the following:

aggregate(data=FOO_BAR, Doh ~ ., FUN=sum, na.action=na.pass, na.rm=FALSE)

To yield something like this:

Foo Bar Doh
omg This ###
NA is ###
omg so ###
NA annoying ###

I have tried na.action=na.pass, and na.action=NULL. I have tried playing around with the class of the variable "Foo". I would like to solve it using aggregate(), and not another method such as summarize(). Any help appreciated.

asked Feb 03 '26 21:02 by myt


1 Answer

Your problem is not with aggregate, but with the default behavior of factor, which excludes NA:

FOO_BAR$Foo <- factor(FOO_BAR$Foo, exclude = NULL)
aggregate(data=FOO_BAR, Doh ~ ., FUN=sum)

# OR

FOO_BAR$Foo <- addNA(FOO_BAR$Foo)
aggregate(data=FOO_BAR, Doh ~ ., FUN=sum)

aggregate() coerces the grouping variables to factors, and ?factor shows that the default is exclude = NA, so NA does not become a level of its own. If FOO_BAR$Bar also contained NA, you would need to do the same thing to that column to include it.
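To see the effect in isolation, here is a minimal sketch on a toy data frame. The names df, g and v are made up for illustration and are not part of the question's data.

```r
# Toy data (hypothetical): two groups, one of which is NA
df <- data.frame(g = c("omg", NA, "omg", NA), v = c(1, 2, 3, 4))

# Default factor() leaves NA out of the levels; exclude = NULL keeps it
levels(factor(df$g))                  # "omg"
levels(factor(df$g, exclude = NULL))  # "omg" NA

# With a plain character column, the NA group is missing from the result
res_default <- aggregate(v ~ g, data = df, FUN = sum)
nrow(res_default)                     # 1

# addNA() turns the column into a factor with NA as an explicit level
df$g <- addNA(df$g)
is.na(df$g)                           # all FALSE: NA is now a level, not a missing value

# The NA group is now kept alongside "omg"
res_kept <- aggregate(v ~ g, data = df, FUN = sum)
nrow(res_kept)                        # 2
```

Because is.na() no longer reports those entries as missing, any later code that relies on is.na(df$g) will need adjusting, so it may be worth converting only for the aggregation step.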

Output

You can tell that Bar has been converted to a factor. By default, factor levels are sorted alphabetically, which is why annoying comes first and This last, so unfortunately Bar does not read as you intended :)

   Foo      Bar       Doh
1 <NA> annoying -1.520229
2 <NA>       is -1.690467
3  omg       so  2.588006
4  omg     This -4.424476

Of course we can fix this behavior, since we would not want your message to get lost. Rather than manually setting the levels argument of factor, we can use forcats::fct_inorder:

FOO_BAR$Bar <- forcats::fct_inorder(FOO_BAR$Bar)
aggregate(data=FOO_BAR, Doh ~ ., FUN=sum)
   Foo      Bar       Doh
1  omg     This -4.424476
2 <NA>       is -1.690467
3  omg       so  2.588006
4 <NA> annoying -1.520229
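For comparison, the manual route mentioned above (setting levels by hand) can be sketched in base R without forcats. This is only an illustration: bar is a standalone copy of the question's Bar column, and bar_inorder is a made-up name.

```r
# Standalone copy of the Bar column from the question
bar <- c(rep("This", 6), rep("is", 6), rep("so", 6), rep("annoying", 6))

# Default: levels are sorted (the exact order depends on the locale)
levels(factor(bar))

# Levels in order of first appearance, using unique() to supply them
bar_inorder <- factor(bar, levels = unique(bar))
levels(bar_inorder)   # "This" "is" "so" "annoying"
```

Assigning the result back with FOO_BAR$Bar <- factor(FOO_BAR$Bar, levels = unique(FOO_BAR$Bar)) should give the same row order as the fct_inorder version when there are no NAs in Bar.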

Alternatively, if you are open to dplyr after all (the .by argument needs dplyr 1.1.0 or later), you can simply do:

library(dplyr)

FOO_BAR |> 
  summarize(Doh = sum(Doh), 
            .by = c(Foo, Bar))

answered Feb 05 '26 13:02 by LMc