I asked a related question a little while ago, and the solution given there seems to work only some of the time. Here is an example using the mpg data set.
My goal is to place a vertical line at the median of my data in each facet using stat_summary. When I apply the solution from the linked question to the displ column, it works as desired. But when I apply it to the cty column, multiple lines are drawn. Why is this?
Shown below is a reprex of my problem.
library(tidyverse)
mpg %>%
  ggplot(aes(x = displ, group = cyl)) +
  geom_histogram() +
  facet_grid(~cyl) +
  stat_summary(aes(xintercept = stat(x), y = 0), fun = median, geom = 'vline')
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

mpg %>%
  ggplot(aes(x = cty, group = cyl)) +
  geom_histogram() +
  facet_grid(~cyl) +
  stat_summary(aes(xintercept = stat(x), y = 0), fun = median, geom = 'vline')
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Created on 2020-04-01 by the reprex package (v0.3.0)
Demetri, here is the R code that will give you what you need:
library(tidyverse)
g <- mpg %>%
  ggplot(aes(x = cty)) +
  geom_histogram() +
  # fun replaces fun.y, which is deprecated as of ggplot2 3.3.0
  stat_summary(aes(x = 0, xintercept = stat(y), y = cty),
               fun = median, geom = "vline", colour = "red") +
  facet_grid(~ cyl)
g
The stat_summary() function computes a summary (in this case, the median) of the variable mapped to its y aesthetic, at each value of x. In contrast, geom_histogram() bins the variable mapped to its x aesthetic. So you have to be careful with what you map to y in stat_summary(): in the code above, cty is mapped to y and x is fixed at 0, so a single median is computed per panel and then passed on as xintercept.
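To make the summarising behaviour concrete, here is a minimal sketch on a made-up data frame. The names `toy`, `p_toy` and `toy_summary` are hypothetical and not part of the question, and the sketch assumes ggplot2 3.3.0 or later (for the `fun` and `orientation` arguments). It shows stat_summary() collapsing the y values at each distinct x into a single value:

```r
library(ggplot2)

# Hypothetical toy data: two distinct x values, two y values at each
toy <- data.frame(x = c(1, 1, 2, 2), y = c(1, 3, 5, 9))

p_toy <- ggplot(toy, aes(x = x, y = y)) +
  # orientation = "x" states explicitly that y is summarised at each x
  stat_summary(fun = mean, geom = "point", orientation = "x")

# Inspect what the stat computed: one row per distinct x
toy_summary <- layer_data(p_toy)
toy_summary[, c("x", "y")]
```

Fixing x at a single value, as the answer does with x = 0, is what reduces the result to one summary per panel.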
Note that you don't need to use group = cyl in your ggplot() call if you are using facet_grid() or facet_wrap() to produce multiple graphical panels. Grouping and faceting are totally different plotting operations: grouping will show different data groups in the same panel; faceting will show different data groups in different panels.
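As a rough illustration of that difference, the sketch below builds one grouped plot and one faceted plot from the same data. The object names `p_group` and `p_facet`, the choice of geom_freqpoly() for the grouped version, and binwidth = 1 are my own assumptions, not something taken from the question:

```r
library(ggplot2)

# Grouping: a single panel with one frequency polygon per cyl level
p_group <- ggplot(mpg, aes(x = cty, group = cyl, colour = factor(cyl))) +
  geom_freqpoly(binwidth = 1)

# Faceting: one panel per cyl level, no group aesthetic needed
p_facet <- ggplot(mpg, aes(x = cty)) +
  geom_histogram(binwidth = 1) +
  facet_grid(~ cyl)

p_group
p_facet
```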
Addendum 1
To check that the summary statistics were computed correctly for each panel, the command below will come in handy:
ggplot_build(g)$data
Scroll to the bottom of the output produced by this command to find the xintercept values used by R - these should be the medians plotted in the various panels. Alternatively, extract these values directly with:
ggplot_build(g)$data[[2]]
The xintercept values can be compared with independently computed median values of cty for each cyl level to ensure agreement.
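One way to get those independent medians is plain base R. This is only a sketch, and `cty_medians` is a name I made up for the result; the values it returns can be compared by eye with the xintercept column of ggplot_build(g)$data[[2]]:

```r
library(ggplot2)  # only needed here for the mpg data set

# Median of cty within each cyl level, computed independently of the plot
cty_medians <- tapply(mpg$cty, mpg$cyl, median)
cty_medians
```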
Addendum 2
The default choice of binwidth for geom_histogram() needs some attention. You can try something like this to allow variable binwidth choice across your different panels:
theme_set(theme_bw())
g <- mpg %>%
  ggplot(aes(x = cty)) +
  # Freedman-Diaconis rule: 2 * IQR / n^(1/3)
  geom_histogram(binwidth = function(x) 2 * IQR(x) / (length(x)^(1/3)),
                 fill = "lightblue3", colour = "white") +
  # fun replaces fun.y, which is deprecated as of ggplot2 3.3.0
  stat_summary(aes(x = 0, xintercept = stat(y), y = cty),
               fun = median, geom = "vline", colour = "red2") +
  facet_wrap(~ cyl, scales = "free_x")
g
See this link for other possibilities of binwidth choice: https://github.com/tidyverse/ggplot2/issues/2312.
We can pre-compute the median using group_by and mutate, which I often find more reliable and easier to understand, and then just use geom_vline. I can't answer on the stat_summary side, but I'm interested to know the answer.
mpg %>%
  group_by(cyl) %>%
  mutate(cty_med = median(cty)) %>%
  ggplot(aes(x = cty)) +
  geom_histogram() +
  facet_grid(~cyl) +
  geom_vline(aes(xintercept = cty_med))

If you want to generalize this, you can just create a wrapper function that does your calculation and faceting together.
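# Histogram of `var`, faceted by `fct`, with a vertical line at each facet's median
f <- function(df, fct, var) {
  df %>%
    group_by({{ fct }}) %>%
    mutate(med = median({{ var }})) %>%  # per-facet median, repeated on every row
    ggplot(aes(x = {{ var }})) +
    geom_histogram() +
    facet_grid(cols = vars({{ fct }})) +
    geom_vline(aes(xintercept = med))
}
f(mpg, cyl, cty)
f(mpg, cyl, displ)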
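A possible variation on the pre-computed approach, offered as a sketch rather than as part of the original answer: with mutate, geom_vline draws one line for every row of the data, all stacked on top of each other within a facet. Summarising to one row per facet and passing that table through the data argument draws exactly one line per facet. The names `cty_medians` and `p_med`, the fixed binwidth and the red colour are my own choices, and the sketch assumes dplyr 1.0 or later:

```r
library(dplyr)
library(ggplot2)

# One row per cyl level, holding that level's median cty
cty_medians <- mpg %>%
  group_by(cyl) %>%
  summarise(cty_med = median(cty))

p_med <- ggplot(mpg, aes(x = cty)) +
  geom_histogram(binwidth = 1) +
  facet_grid(~ cyl) +
  # The summary table has a cyl column, so each line lands in its own facet
  geom_vline(data = cty_medians, aes(xintercept = cty_med), colour = "red")

p_med
```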