How to create a boxplot with customized quantiles in R?

I am working with some data and want to make a boxplot showing the minimum, the 2.5%, 25%, 50%, 70%, 75%, and 97.5% quantiles, and the maximum. The boxplot should also have a legend with a differently colored line for each quantile. Is there any way to do this? Thanks for any help.

set.seed(123)
Mydata <- sample(x = 100:300, size = 500, replace = TRUE)
Mydata <- c(Mydata, 1, 500)
boxplot(Mydata)

PS. I have tried the code provided by @thelatemail, but I get a totally different figure in RStudio. Any solution to this? Thanks.

Yang Yang asked Oct 17 '25 22:10


1 Answer

The plot you describe cannot be produced easily with the standard boxplot framework, because a boxplot draws only five summary values.

Underlying boxplots in R is the boxplot.stats() function. Let's run it on your data:

boxplot.stats(Mydata)

$stats
[1]   1 152 204 253 300

$n
[1] 502

$conf
[1] 196.8776 211.1224

$out
[1] 500

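You can see that $stats returns, in order: the lower whisker, the lower hinge (approximately the 25% quantile), the median, the upper hinge (approximately the 75% quantile), and the upper whisker. The upper whisker is 300 rather than 500 because 500 is treated as an outlier and reported in $out. Compare with quantile(), whose 100% value is the maximum: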

quantile(Mydata)

  0%  25%  50%  75% 100% 
   1  152  204  253  500
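To see where the whisker values come from, here is a minimal sketch that reproduces them by hand. It assumes the default coef = 1.5 of boxplot.stats(); the variable names (five, lower_fence, upper_fence, and so on) are made up for illustration.

```r
# Recreate the data from the question
set.seed(123)
Mydata <- c(sample(x = 100:300, size = 500, replace = TRUE), 1, 500)

# fivenum() gives min, lower hinge, median, upper hinge, max
five <- fivenum(Mydata)
iqr <- five[4] - five[2]

# Fences sit 1.5 * IQR beyond the hinges (coef = 1.5 is the default)
lower_fence <- five[2] - 1.5 * iqr
upper_fence <- five[4] + 1.5 * iqr

# Whiskers are the most extreme observations still inside the fences;
# anything outside the fences is reported as an outlier
inside <- Mydata >= lower_fence & Mydata <= upper_fence
whiskers <- range(Mydata[inside])
outliers <- Mydata[!inside]

whiskers
outliers
```

With the data above, this should agree with the first and last entries of $stats and with $out, which is why 500 shows up as a separate point rather than as the end of the whisker.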

If you use geom_boxplot() from ggplot2 with stat = "identity", you can redefine the values used to draw the box. However, you are still limited to five values: they are called ymin, lower, middle, upper and ymax.

So for example if you wanted the 2.5% quantile as lower and the 97.5% quantile as upper, you could try:

library(ggplot2)
library(magrittr)  # provides the %>% pipe

# Supply the five box values directly; stat = "identity" draws them as given
data.frame(x = 1,
           y0 = min(Mydata),
           y025 = quantile(Mydata, 0.025),
           y50 = median(Mydata),
           y975 = quantile(Mydata, 0.975),
           y100 = max(Mydata)) %>%
  ggplot(aes(x)) +
  geom_boxplot(aes(ymin = y0, 
                   lower = y025, 
                   middle = y50, 
                   upper = y975, 
                   ymax = y100),
               stat = "identity")


However, you would want to make it clear (using labels perhaps) that this is not a "standard" boxplot.
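One way to add such labels is with labs(). The following is only a sketch: the data frame name box_df and the title and caption wording are assumptions chosen for illustration, not part of the original approach.

```r
library(ggplot2)

# Recreate the data from the question
set.seed(123)
Mydata <- c(sample(x = 100:300, size = 500, replace = TRUE), 1, 500)

# One row holding the five values that define the custom box
box_df <- data.frame(
  x = "Mydata",
  y0 = min(Mydata),
  y025 = quantile(Mydata, 0.025),
  y50 = median(Mydata),
  y975 = quantile(Mydata, 0.975),
  y100 = max(Mydata)
)

# The caption states what the box and whiskers represent in this plot
p <- ggplot(box_df, aes(x)) +
  geom_boxplot(aes(ymin = y0, lower = y025, middle = y50,
                   upper = y975, ymax = y100),
               stat = "identity") +
  labs(x = NULL, y = "Value",
       title = "Custom-quantile boxplot",
       caption = "Box: 2.5% to 97.5% quantiles. Whiskers: minimum and maximum.")

p
```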

Another ggplot2 idea is to use geom_jitter to plot the data points, then add lines for the desired quantiles using geom_hline. Something like this:

library(tibble)
library(ggplot2)

Mydataq <- quantile(Mydata, probs = c(0.025, 0.25, 0.5, 0.7, 0.75, 0.975)) %>%
  as.data.frame() %>% 
  setNames("value") %>% 
  rownames_to_column(var = "quantile")

Mydataq %>% 
  ggplot() + 
  geom_hline(aes(yintercept = value, color = quantile)) + 
  geom_jitter(data = tibble(x = "Mydata", y = Mydata), 
              aes(x = x, y = y))
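If you would rather stay in base graphics, a rough alternative is to draw an ordinary boxplot and overlay one colored line per quantile with abline(), then add a legend(). This is a sketch, not a standard plot type: the rainbow() palette, the legend position, and the variable names probs, q and cols are arbitrary choices.

```r
# Recreate the data from the question
set.seed(123)
Mydata <- c(sample(x = 100:300, size = 500, replace = TRUE), 1, 500)

# Quantiles requested in the question: min, 2.5, 25, 50, 70, 75, 97.5, max
probs <- c(0, 0.025, 0.25, 0.5, 0.7, 0.75, 0.975, 1)
q <- quantile(Mydata, probs = probs)

# One color per quantile line (the palette choice is arbitrary)
cols <- rainbow(length(q))

# range = 0 extends the whiskers to the data extremes instead of
# drawing the minimum and maximum as outlier points
boxplot(Mydata, range = 0, border = "grey40")

# Overlay a horizontal line at each quantile, then label them in a legend
abline(h = q, col = cols, lwd = 2)
legend("topright", legend = names(q), col = cols, lwd = 2,
       title = "Quantile", bg = "white")
```

The legend labels come from names(q), which quantile() sets to "0%", "2.5%", and so on.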


neilfws answered Oct 20 '25 13:10
