I'm trying to draw a boxplot with ggplot2, and I want the middle line to show the mean instead of the median.
I know people have asked similar questions before, but I'm asking because the solution didn't work for me. Specifically, I followed the first solution in this accepted answer.
This is what I did with the mpg test data:
library(ggplot2)
library(tidyverse)
mpg %>%
  ggplot(aes(x = class, y = cty, middle = mean(cty))) +
  geom_boxplot()
It has no effect.
graph plotting mean:
graph plotting with default median:
Can anyone help to point out what I did wrong? Thanks.
Messing around with another dataset, mtcars, shows the same thing: defining middle doesn't change the plot, even though that dataset has larger differences between mean and median. Another option is stat_summary, although I can't get the points function to work quite right, and I had to tweak it to avoid an "arguments imply differing number of rows: 1, 0" error.
# Return the five values the boxplot geom expects, with the mean as the middle
BoxMeanQuant <- function(x) {
  v <- c(min(x), quantile(x, 0.25), mean(x), quantile(x, 0.75), max(x))
  names(v) <- c("ymin", "lower", "middle", "upper", "ymax")
  v
}
mpg %>%
  ggplot(aes(x = class, y = cty)) +
  stat_summary(fun.data = BoxMeanQuant, geom = "boxplot")
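To see what the helper actually hands to the boxplot geom, here is a small check on a made-up vector (the values in x are purely illustrative, not from mpg). It only uses base R, and the point is that "middle" comes out as the mean rather than the median.

```r
# Same helper as above: whiskers at min/max, box at the quartiles,
# and the mean (not the median) as the middle line
BoxMeanQuant <- function(x) {
  v <- c(min(x), quantile(x, 0.25), mean(x), quantile(x, 0.75), max(x))
  names(v) <- c("ymin", "lower", "middle", "upper", "ymax")
  v
}

# Illustrative, skewed toy data so that mean and median differ
x <- c(1, 2, 3, 4, 10)

res <- BoxMeanQuant(x)
print(res)        # named vector with ymin, lower, middle, upper, ymax
print(median(x))  # for comparison with res[["middle"]]
```

Because stat_summary calls the function once per group, each class in the plot gets its own set of these five values.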
Compare this with the normal geom_boxplot, which does not use the defined middle.
mpg %>%
  ggplot(aes(x = class, y = cty)) +
  geom_boxplot(aes(middle = mean(cty)))
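As far as I can tell, this happens because geom_boxplot's default stat (stat_boxplot) computes middle from y itself, so a middle supplied through aes() is not what gets drawn. A rough way to check is to pull the computed values out of a plain boxplot with layer_data() and compare them with the per-class medians and means. The variable names below (p, computed, class_median, class_mean) are my own, and this is only a sketch of the check.

```r
library(ggplot2)

# Plain boxplot, no middle aesthetic supplied
p <- ggplot(mpg, aes(x = class, y = cty)) +
  geom_boxplot()

# Values the stat computed for each box, one row per class
computed <- layer_data(p)

# Per-class median and mean, in the same (alphabetical) class order
class_median <- as.numeric(tapply(mpg$cty, mpg$class, median))
class_mean   <- as.numeric(tapply(mpg$cty, mpg$class, mean))

# TRUE if the drawn middle line is the median
print(isTRUE(all.equal(computed$middle, class_median)))
# TRUE only if the drawn middle line happened to equal the mean
print(isTRUE(all.equal(computed$middle, class_mean)))
```

If the first comparison holds and the second does not, the middle line is the median, which would explain why the stat has to be replaced (stat_summary or stat = "identity") to get the mean.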
This is what I used to plot the outliers as points, but they differ from the geom_boxplot default (points more than 1.5 × IQR beyond the hinges), because here the whiskers sit at the 10th and 90th percentiles. You can adjust as necessary. Without the if-else, it would throw an error.
# Whiskers at the 10th/90th percentiles, box at the quartiles, mean as the middle
BoxMeanQuant <- function(x) {
  v <- c(quantile(x, 0.1), quantile(x, 0.25), mean(x), quantile(x, 0.75), quantile(x, 0.9))
  names(v) <- c("ymin", "lower", "middle", "upper", "ymax")
  v
}
# Values outside the whiskers; small groups return NA so no points are drawn
outliers <- function(x) {
  if (length(x) > 5) {
    subset(x, x < quantile(x, 0.1) | quantile(x, 0.9) < x)
  } else {
    return(NA)
  }
}
# Note: fun.y is deprecated since ggplot2 3.3.0 in favour of fun
ggplot(data = mpg, aes(x = class, y = cty)) +
  stat_summary(fun.data = BoxMeanQuant, geom = "boxplot") +
  stat_summary(fun.y = outliers, geom = "point")
In the end I had to create a summary data frame to do this. It is not what I was originally looking for, but it works.
df <- mpg %>%
  group_by(class) %>%
  summarize(
    ymin = min(cty),
    ymax = max(cty),
    lower = quantile(cty, 0.25),
    upper = quantile(cty, 0.75),
    middle = mean(cty)
  )
df %>%
  ggplot(aes(class)) +
  geom_boxplot(
    aes(ymin = ymin, ymax = ymax, lower = lower, upper = upper, middle = middle),
    stat = "identity"
  )