I feel like I have done this before and have no idea why I can't figure it out all of a sudden. I am simply trying to aggregate data using the aggregate() function without dropping any rows where the grouping terms are NA. I also do not want to have to worry about converting NAs to anything like character strings. Given the following:
```r
FOO_BAR <- data.frame(Foo=c(rep("omg", 6), rep(NA, 6), rep("omg", 6), rep(NA, 6)),
                      Bar=c(rep("This", 6), rep("is", 6), rep("so", 6), rep("annoying", 6)),
                      Doh=rnorm(24))
```
I would like to use the following:
```r
aggregate(data=FOO_BAR, Doh ~ ., FUN=sum, na.action=na.pass, na.rm=FALSE)
```
To yield something like this:
| Foo | Bar | Doh |
|---|---|---|
| omg | This | ### |
| NA | is | ### |
| omg | so | ### |
| NA | annoying | ### |
I have tried na.action=na.pass and na.action=NULL, and I have tried playing around with the class of the variable "Foo". I would like to solve this with aggregate() rather than another function such as summarize(). Any help is appreciated.
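For reference, here is a minimal reproduction of the problem. The set.seed(1) call is an addition, used only so the data are repeatable. Both the default call and the na.pass attempt return only the two omg groups, because ?aggregate documents that rows with missing values in any of the grouping variables are omitted from the result.

```r
set.seed(1)  # arbitrary seed, added only so the example is repeatable
FOO_BAR <- data.frame(Foo = c(rep("omg", 6), rep(NA, 6), rep("omg", 6), rep(NA, 6)),
                      Bar = c(rep("This", 6), rep("is", 6), rep("so", 6), rep("annoying", 6)),
                      Doh = rnorm(24))

# Default na.action (na.omit): rows with NA in Foo are dropped before grouping
default_res <- aggregate(Doh ~ ., data = FOO_BAR, FUN = sum)

# na.pass keeps the NA rows in the model frame, but aggregate() still
# omits rows whose grouping values are NA
pass_res <- aggregate(Doh ~ ., data = FOO_BAR, FUN = sum, na.action = na.pass)

print(default_res)
print(pass_res)
```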
Your problem is not with aggregate, but with the default behavior of factor, which excludes NA from the levels:
```r
FOO_BAR$Foo <- factor(FOO_BAR$Foo, exclude = NULL)
aggregate(data=FOO_BAR, Doh ~ ., FUN=sum)
# OR
FOO_BAR$Foo <- addNA(FOO_BAR$Foo)
aggregate(data=FOO_BAR, Doh ~ ., FUN=sum)
```
aggregate() coerces your grouping variables to factors, and ?factor shows that the default is exclude = NA. If FOO_BAR$Bar also contained NA, you would need to do the same thing to it.
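If more than one grouping column can contain NA, a minimal sketch is to apply addNA() to every character or factor column before calling aggregate(). The helper add_na_levels() is a hypothetical name used only for this example, and it assumes that every character or factor column is meant to be a grouping variable; set.seed(1) is added only to make the data repeatable.

```r
# Hypothetical helper (not part of base R): give every character or factor
# column an explicit NA level so aggregate() keeps those rows as a group.
# Assumes all character/factor columns are grouping variables.
add_na_levels <- function(df) {
  is_group <- vapply(df, function(col) is.character(col) || is.factor(col), logical(1))
  # addNA() converts a character vector to a factor and appends NA as a level
  df[is_group] <- lapply(df[is_group], addNA)
  df
}

set.seed(1)  # arbitrary seed, added only so the example is repeatable
FOO_BAR <- data.frame(Foo = c(rep("omg", 6), rep(NA, 6), rep("omg", 6), rep(NA, 6)),
                      Bar = c(rep("This", 6), rep("is", 6), rep("so", 6), rep("annoying", 6)),
                      Doh = rnorm(24))

res <- aggregate(Doh ~ ., data = add_na_levels(FOO_BAR), FUN = sum)
print(res)
```

Adding an unused NA level to a column such as Bar is harmless here, because aggregate() (with its default drop = TRUE) only returns the group combinations that actually occur in the data.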
Output
You can tell that Bar has been converted to a factor: by default, factor levels are sorted alphabetically, which is why annoying comes first and This comes last. Unfortunately, that means Bar no longer reads as you intended :)
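A quick base R check of where that ordering comes from: when no levels argument is given, factor() uses the sorted unique values rather than the order of first appearance. Sorting follows the current locale, so the exact printed order can differ between systems (for example, a C locale puts "This" before the lowercase words).

```r
bar <- c(rep("This", 6), rep("is", 6), rep("so", 6), rep("annoying", 6))

# Default factor levels: sorted unique values (locale-dependent collation)
default_levels <- levels(factor(bar))

print(default_levels)
print(unique(bar))  # order of first appearance, for comparison
```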
```
   Foo      Bar       Doh
1 <NA> annoying -1.520229
2 <NA>       is -1.690467
3  omg       so  2.588006
4  omg     This -4.424476
```
Of course, we can fix this; we would not want your message to get lost. Rather than manually setting the levels argument of factor(), we can use forcats::fct_inorder():
```r
FOO_BAR$Bar <- forcats::fct_inorder(FOO_BAR$Bar)
aggregate(data=FOO_BAR, Doh ~ ., FUN=sum)
```
```
   Foo      Bar       Doh
1  omg     This -4.424476
2 <NA>       is -1.690467
3  omg       so  2.588006
4 <NA> annoying -1.520229
```
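If you would rather not depend on forcats, a base R sketch of the same idea is to pass unique() as the levels argument of factor(). This is a swapped-in alternative to fct_inorder(), not what the answer above uses; set.seed(1) is added only for repeatability.

```r
set.seed(1)  # arbitrary seed, added only so the example is repeatable
FOO_BAR <- data.frame(Foo = c(rep("omg", 6), rep(NA, 6), rep("omg", 6), rep(NA, 6)),
                      Bar = c(rep("This", 6), rep("is", 6), rep("so", 6), rep("annoying", 6)),
                      Doh = rnorm(24))

# Keep NA as its own group in Foo
FOO_BAR$Foo <- addNA(FOO_BAR$Foo)
# Base R counterpart of forcats::fct_inorder(): levels in order of first appearance
FOO_BAR$Bar <- factor(FOO_BAR$Bar, levels = unique(FOO_BAR$Bar))

res <- aggregate(Doh ~ ., data = FOO_BAR, FUN = sum)
print(res)
```

unique() returns values in the order they first appear, which is the same ordering fct_inorder() uses, so the result reads This, is, so, annoying.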
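Alternatively, if you can use dplyr (version 1.1.0 or later, for the .by argument), you can simply do the following, which keeps NA as its own group: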
```r
library(dplyr)
FOO_BAR |>
  summarize(Doh = sum(Doh),
            .by = c(Foo, Bar))
```