Boxplots by base R and ggplot2 do not match

Tags: r, ggplot2, boxplot

I have a simple dataset. When I draw a boxplot of it with base R and with ggplot2, the two plots do not match. The base R boxplot is the one that is consistent with the output of summary().

library(tidyverse)
library(ggplotify)
library(patchwork)

df <- read.csv("test_boxplot_data.csv")

summary(df)

p1 <- as.ggplot(~boxplot(df$y, outline=FALSE))
p2 <- ggplot(df, aes(y=y)) + geom_boxplot(outlier.shape = NA) + ylim(0,100)

p1 + p2 + plot_layout(ncol = 2)


Generated plot kept here.

Any idea what is happening? It is also surprising that ggplot2 emits the warning "Removed 845 rows containing non-finite values (stat_boxplot)" even though the data contain no NA values.

Soumitra asked Oct 25 '25 10:10


1 Answer

The warning "Removed 845 rows containing non-finite values (stat_boxplot)" is the clue. The data contain 845 points > 100, and ylim(0,100) drops those points before the boxplot statistics are calculated, so the ggplot2 box is computed from a truncated dataset.

From the first line of help for ylim():
"This is a shortcut for supplying the limits argument to the individual scales. By default, any values outside the limits specified are replaced with NA. Be warned that this will remove data outside the limits and this can produce unintended results. For changing x or y axis limits without dropping data observations, see coord_cartesian()."
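The effect described in that help text can be reproduced without the original CSV. The sketch below uses made-up data (the vector `y` and its distribution are assumptions, not the asker's data) and mimics what ylim(0,100) does: values outside the limits become NA, and the five-number summary is then computed on whatever remains. It uses base R's boxplot.stats(), so the hinges may differ slightly from the quartiles ggplot2 computes.

```r
# Synthetic stand-in for the question's data (assumption: the real CSV is not available).
set.seed(1)
y <- c(runif(500, 0, 100), runif(100, 100, 1000))

# Mimic ylim(0, 100): values outside the limits are replaced with NA ...
y_censored <- ifelse(y < 0 | y > 100, NA, y)

# ... and the boxplot statistics are computed on what is left.
stats_full     <- boxplot.stats(y)$stats           # what base R boxplot() draws
stats_censored <- boxplot.stats(y_censored)$stats  # roughly what ggplot2 draws after ylim()

# Number of values that would be reported as removed
sum(is.na(y_censored))

# Whisker, hinge and median values side by side
rbind(full = stats_full, censored = stats_censored)
```

The median and upper hinge of the censored data are lower than those of the full data, which is the same kind of mismatch seen between the two plots in the question.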

This should provide the desired graph:

ggplot(df, aes(y=y)) + geom_boxplot(outlier.shape = NA) + 
       coord_cartesian(ylim=c(0,100))
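One way to check that the two approaches really compute different statistics is to inspect the values ggplot2 calculates for the layer, using layer_data(). This is a minimal sketch on synthetic data; `df_demo`, `p_ylim` and `p_coord` are hypothetical names introduced here, not part of the question.

```r
library(ggplot2)

# Synthetic stand-in for the question's data (assumption: the real CSV is not available).
set.seed(1)
df_demo <- data.frame(y = c(runif(500, 0, 100), runif(100, 100, 1000)))

# ylim() sets scale limits: out-of-range values are dropped before the statistics are computed
p_ylim <- ggplot(df_demo, aes(y = y)) + geom_boxplot(outlier.shape = NA) + ylim(0, 100)

# coord_cartesian() only zooms the view: statistics are computed on all the data
p_coord <- ggplot(df_demo, aes(y = y)) + geom_boxplot(outlier.shape = NA) +
  coord_cartesian(ylim = c(0, 100))

# layer_data() returns the values computed for the boxplot layer
# (the first call also triggers the "Removed ... rows" warning)
cols <- c("ymin", "lower", "middle", "upper", "ymax")
layer_data(p_ylim)[, cols]
layer_data(p_coord)[, cols]
```

With coord_cartesian(), the `middle` column should agree with median(df_demo$y); with ylim() it reflects only the values inside the limits.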


Dave2e answered Oct 26 '25 22:10


