ggplot: rescale axis (log) and cut axis

I want to plot a very simple boxplot like this in R:

[Image: desired graph]

It is a generalized linear model with a log link (Gamma distribution; jh_conc is a hormone concentration) relating a continuous dependent variable (jh_conc) to a categorical grouping variable (group: type of bee).

The script I have so far is:

> jh=read.csv("data_jh_titer.csv",header=T)
> jh
           group     jh_conc
1         Queens  6.38542714
2         Queens 11.22512563
3         Queens  7.74472362
4         Queens 11.56834171
5         Queens  3.74020100
6  Virgin Queens  0.06080402
7  Virgin Queens  0.12663317
8  Virgin Queens  0.08090452
9  Virgin Queens  0.04422111
10 Virgin Queens  0.14673367
11       Workers  0.03417085
12       Workers  0.02449749
13       Workers  0.02927136
14       Workers  0.01648241
15       Workers  0.02150754

library(ggplot2)
fit1 <- glm(jh_conc ~ group, family = Gamma(link = log), data = jh)

ggplot(jh, aes(group, jh_conc)) +
      geom_boxplot(aes(fill = group)) +
      coord_trans(y = "log")

The resulting plot looks like this:

[Image: resulting plot]

My question is: what (geom) extensions can I use to split the y-axis and rescale the two parts differently? Also, how do I add the black circles (averages, calculated on the log scale and then back-transformed to the original scale) and the horizontal lines marking significance levels from post hoc tests performed on log-transformed data (**: p < 0.01, ***: p < 0.001)?

asked Sep 06 '25 02:09 by H.F.S C.


1 Answer

ggplot2 deliberately does not support broken (discontinuous) numeric axes, mainly because they visually distort the differences being represented and are widely considered misleading.

You can, however, use scale_y_log10() together with annotation_logticks() to condense data spanning a wide range of values, or to show heteroskedastic data more clearly. You can also use annotate() to build the p-value stars and bars.
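As a hedged sketch of where the star labels could come from: the question only says the post hoc tests were run on log-transformed data, so using pairwise.t.test() with Holm-adjusted p-values here is an assumption, not necessarily the test the asker used. stars_for() and the pairs table are hypothetical names; the helper maps p-values to the ** / *** labels from the question using strict "p < cutoff" thresholds.

```r
# Example data from the question (jh_conc by group)
jh <- data.frame(
  group = rep(c("Queens", "Virgin Queens", "Workers"), each = 5),
  jh_conc = c(6.38542714, 11.22512563, 7.74472362, 11.56834171, 3.74020100,
              0.06080402, 0.12663317, 0.08090452, 0.04422111, 0.14673367,
              0.03417085, 0.02449749, 0.02927136, 0.01648241, 0.02150754)
)

# Assumption: pairwise t-tests on log(jh_conc), Holm-adjusted, stand in
# for whatever post hoc test was actually used
pw <- pairwise.t.test(log(jh$jh_conc), jh$group, p.adjust.method = "holm")

# Hypothetical helper: p < 0.001 -> "***", p < 0.01 -> "**", p < 0.05 -> "*"
stars_for <- function(p) {
  as.character(cut(p, breaks = c(0, 0.001, 0.01, 0.05, 1),
                   labels = c("***", "**", "*", "ns"),
                   right = FALSE, include.lowest = TRUE))
}

# pw$p.value is lower-triangular: rows are later levels, columns earlier ones
p_mat <- pw$p.value
pairs <- data.frame(
  group1 = c("Queens", "Queens", "Virgin Queens"),
  group2 = c("Virgin Queens", "Workers", "Workers")
)
pairs$p <- unname(mapply(function(a, b) p_mat[b, a], pairs$group1, pairs$group2))
pairs$label <- stars_for(pairs$p)
print(pairs)
```

Each row of pairs gives the two groups a bar should span and the label to pass to annotate("text", ...).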

Also, you can easily pull information out of a fitted model through its named components; here we need the coefficients (fit2$coef, or equivalently coef(fit2)):

library(ggplot2)

# fit a zero-intercept version so each coefficient is one group's log-mean
fit2 <- glm(jh_conc ~ 0 + group, family = Gamma(link = log), data = jh)
# extract the group means and use exp() to back-transform from the log scale
means <- data.frame(group = gsub("group", "", names(coef(fit2))),
                    means = exp(coef(fit2)))

ggplot(jh, aes(group, jh_conc)) +
    geom_boxplot(aes(fill = group)) +
    # plot the circles from the model extraction (means)
    geom_point(data = means, aes(y = means), size = 4, shape = 21, color = "black", fill = NA) +
    # use a log10 y scale instead of coord_trans, with log ticks on the left axis
    scale_y_log10() + annotation_logticks(sides = "l") +
    # use annotate("segment") to draw the horizontal lines (y is in data units)
    annotate("segment", x = 1, xend = 2, y = 15, yend = 15) +
    # use annotate("text") to add the p-value stars
    annotate("text", x = 1.5, y = 15.5, label = "**", size = 4) +
    annotate("segment", x = 1, xend = 3, y = 20, yend = 20) +
    annotate("text", x = 2, y = 20.5, label = "***", size = 4) +
    annotate("segment", x = 2, xend = 3, y = .2, yend = .2) +
    annotate("text", x = 2.5, y = .25, label = "**", size = 4)

[Image: resulting plot]
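One caveat worth checking: because fit2 has one free parameter per group, its maximum-likelihood fitted mean for each group is that group's arithmetic mean, so exp(coef(fit2)) back-transforms to arithmetic means. Averages "calculated on a log scale and then back-transformed", as the question describes them, are geometric means, exp(mean(log(y))). The base-R sketch below (glm_means, arith_means and geo_means are illustrative names) computes both from the question's data so you can choose which one the circles should show.

```r
# Example data from the question
jh <- data.frame(
  group = rep(c("Queens", "Virgin Queens", "Workers"), each = 5),
  jh_conc = c(6.38542714, 11.22512563, 7.74472362, 11.56834171, 3.74020100,
              0.06080402, 0.12663317, 0.08090452, 0.04422111, 0.14673367,
              0.03417085, 0.02449749, 0.02927136, 0.01648241, 0.02150754)
)

# Zero-intercept Gamma GLM with log link, as in the answer
fit2 <- glm(jh_conc ~ 0 + group, family = Gamma(link = log), data = jh)

# Back-transformed coefficients: these match the arithmetic group means
glm_means <- exp(unname(coef(fit2)))
arith_means <- as.vector(tapply(jh$jh_conc, jh$group, mean))

# Mean of the log values, back-transformed: the geometric group means
geo_means <- as.vector(tapply(jh$jh_conc, jh$group, function(y) exp(mean(log(y)))))

print(data.frame(group = levels(factor(jh$group)), glm_means, arith_means, geo_means))
```

If geometric means are wanted, a data frame built from geo_means can be passed to geom_point() in place of means.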

answered Sep 08 '25 01:09 by Nate