ggplot geom_boxplot middle aes no effect

Tags:

r

ggplot2

boxplot

I'm trying to plot a boxplot with ggplot2, and I want the middle line to show the mean instead of the median.

I know people have asked similar questions before, but I'm asking because the solution didn't work for me. Specifically, I followed the first solution in this accepted answer.

This is what I did with the mpg test data:

library(ggplot2)
library(tidyverse)

mpg %>%
  ggplot(aes(x = class, y = cty, middle = mean(cty))) +
  geom_boxplot()

It has no effect: the plot is the same as the default one, which uses the median.

[Image: graph plotting the mean]

[Image: graph plotting the default median]

Can anyone point out what I did wrong? Thanks.

asked Oct 21 '25 02:10 by EJAg


2 Answers

Experimenting with another dataset, mtcars, shows the same thing: defining middle doesn't change the plot, even though mtcars has larger differences between mean and median. Another option is stat_summary, although I couldn't get the outlier-points function to work quite right and had to tweak it to avoid an "arguments imply differing number of rows: 1, 0" error.

# Box statistics with the mean (not the median) as the middle line
BoxMeanQuant <- function(x) {
  v <- c(min(x), quantile(x, 0.25), mean(x), quantile(x, 0.75), max(x))
  names(v) <- c("ymin", "lower", "middle", "upper", "ymax")
  v
}

mpg %>%
  ggplot(aes(x = class, y = cty)) +
  stat_summary(fun.data = BoxMeanQuant, geom = "boxplot")

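Compare that with the normal geom_boxplot, which ignores the defined middle: by default geom_boxplot() uses stat_boxplot(), which computes its own middle (the median) and overrides the mapped value.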

mpg %>% 
  ggplot(aes(x = class, y = cty)) +
  geom_boxplot(aes(middle = mean(cty)))
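One way to confirm what each version actually draws is to inspect the computed layer with layer_data(). The sketch below is a rough check assuming a recent ggplot2 and its bundled mpg data; box_mean and middles_in_x_order are hypothetical helper names, and box_mean is just a variant of BoxMeanQuant with unnamed quantiles. It also illustrates a second issue in the question's code: mean(cty) inside aes() is evaluated once over the whole data set rather than per class, so even if the mapped middle were kept, every box would get the same value.

```r
library(ggplot2)

# Hypothetical helper: the middles ggplot2 actually draws, one per class,
# ordered by group (the alphabetical order of the x axis)
middles_in_x_order <- function(p) {
  ld <- layer_data(p)
  ld$middle[order(ld$group)]
}

# Question's approach: stat_boxplot() computes its own middle (the median),
# which replaces the mapped middle = mean(cty)
p_default <- ggplot(mpg, aes(x = class, y = cty)) +
  geom_boxplot(aes(middle = mean(cty)))

# stat_summary() approach: the summary function supplies the middle per class
box_mean <- function(x) {
  c(ymin = min(x), lower = unname(quantile(x, 0.25)), middle = mean(x),
    upper = unname(quantile(x, 0.75)), ymax = max(x))
}
p_mean <- ggplot(mpg, aes(x = class, y = cty)) +
  stat_summary(fun.data = box_mean, geom = "boxplot")

# Reference values computed directly from the data, in the same class order
medians <- as.vector(tapply(mpg$cty, mpg$class, median))
means <- as.vector(tapply(mpg$cty, mpg$class, mean))

# mean(cty) inside aes() is this single overall value, not a per-class mean
mean(mpg$cty)

# Compare the drawn middles with the per-class medians and means
all.equal(middles_in_x_order(p_default), medians)
all.equal(middles_in_x_order(p_mean), means)
```

If the explanation above holds, the default layer's middles match the per-class medians and the stat_summary layer's middles match the per-class means.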

This is what I used to plot the outliers as points, but they differ from geom_boxplot's default outliers, since here the cutoffs are the 10th and 90th percentiles. Adjust as necessary. Without the if-else, it throws an error.

# Whiskers at the 10th and 90th percentiles; the middle is the mean
BoxMeanQuant <- function(x) {
  v <- c(quantile(x, 0.1), quantile(x, 0.25), mean(x), quantile(x, 0.75), quantile(x, 0.9))
  names(v) <- c("ymin", "lower", "middle", "upper", "ymax")
  v
}

# Values beyond the whiskers, drawn as points;
# groups of 5 or fewer values return NA instead
outliers <- function(x) {
  if (length(x) > 5) {
    subset(x, x < quantile(x, 0.1) | quantile(x, 0.9) < x)
  } else {
    return(NA)
  }
}

# fun = replaces the deprecated fun.y = in ggplot2 3.3.0 and later
ggplot(data = mpg, aes(x = class, y = cty)) +
  stat_summary(fun.data = BoxMeanQuant, geom = "boxplot") +
  stat_summary(fun = outliers, geom = "point")
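To make the points match geom_boxplot()'s default outliers instead, one option is to reuse its rule: by default stat_boxplot() flags values more than 1.5 times the IQR beyond the hinges and extends the whiskers to the most extreme remaining values. The sketch below applies that rule while keeping the mean as the middle. box_mean_tukey and tukey_outliers are hypothetical names, and fun = assumes ggplot2 3.3.0 or later (older versions use fun.y =).

```r
library(ggplot2)

# Hypothetical helper: box statistics with the mean as the middle and
# whiskers following the 1.5 * IQR rule that stat_boxplot() uses by default
box_mean_tukey <- function(x, coef = 1.5) {
  q <- unname(quantile(x, c(0.25, 0.75)))
  iqr <- q[2] - q[1]
  inside <- x >= q[1] - coef * iqr & x <= q[2] + coef * iqr
  data.frame(
    ymin = min(c(q[1], x[inside])),
    lower = q[1],
    middle = mean(x),
    upper = q[2],
    ymax = max(c(q[2], x[inside]))
  )
}

# Hypothetical helper: the values outside the whiskers, drawn as points
tukey_outliers <- function(x, coef = 1.5) {
  q <- unname(quantile(x, c(0.25, 0.75)))
  iqr <- q[2] - q[1]
  out <- x[x < q[1] - coef * iqr | x > q[2] + coef * iqr]
  # Return NA instead of an empty vector for groups without outliers;
  # some ggplot2 versions fail on an empty result with
  # "arguments imply differing number of rows: 1, 0"
  if (length(out) == 0) NA_real_ else out
}

ggplot(mpg, aes(x = class, y = cty)) +
  stat_summary(fun.data = box_mean_tukey, geom = "boxplot") +
  # na.rm = TRUE drops the NA placeholders without a warning
  stat_summary(fun = tukey_outliers, geom = "point", na.rm = TRUE)
```

The coef argument plays the same role as geom_boxplot()'s coef, so changing it in both helpers keeps the whiskers and points consistent with each other.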

answered Oct 22 '25 16:10 by Anonymous coward


In the end I had to create a summary data frame to do this. It's not what I was originally looking for, but it works.

# One row per class; the mean replaces the median as the middle
df <- mpg %>%
  group_by(class) %>%
  summarize(ymin = min(cty), ymax = max(cty),
            lower = quantile(cty, 0.25), upper = quantile(cty, 0.75),
            middle = mean(cty))

# stat = "identity" makes geom_boxplot draw the supplied values as-is
df %>%
  ggplot(aes(class)) +
  geom_boxplot(aes(ymin = ymin, ymax = ymax, lower = lower, upper = upper,
                   middle = middle), stat = "identity")

answered Oct 22 '25 18:10 by EJAg