
Compare boxplot with Wilcoxon test

I am comparing two groups of lengths (from different individuals) with boxplots using the ggplot2 package in R. I want to compare the two distributions, but so far the only way I have found to use a Wilcoxon test is stat_compare_means from the "ggpubr" package. Is this the right way to compare the distributions? Can I compare the distributions rather than the means specifically? As you can see, I am a newbie in the stats world. Thank you!

FKM asked May 07 '26 15:05


1 Answer

Base R has a built-in function to do a Wilcoxon test: wilcox.test. You can feed it two numeric vectors or a formula relating a numeric variable to a factor variable (with two levels).

# vector input
setosa_SL <- iris$Sepal.Length[which(iris$Species == "setosa")]
versicolor_SL <- iris$Sepal.Length[which(iris$Species == "versicolor")]
wilcox.test(setosa_SL, versicolor_SL)

    Wilcoxon rank sum test with continuity correction

data:  setosa_SL and versicolor_SL
W = 168.5, p-value = 8.346e-14
alternative hypothesis: true location shift is not equal to 0 

# formula input
wilcox.test(Sepal.Length ~ Species, data = iris[which(iris$Species != "virginica"),])

    Wilcoxon rank sum test with continuity correction

data:  Sepal.Length by Species
W = 168.5, p-value = 8.346e-14
alternative hypothesis: true location shift is not equal to 0
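If you need the numbers rather than the printed summary, the result can be stored and queried. The following is a minimal sketch using the same iris subset; the names two_species and res are only illustrative, and droplevels() is used so that the unused "virginica" level is removed from the factor.

```r
# Keep only two species and drop the unused factor level
two_species <- droplevels(subset(iris, Species != "virginica"))

# wilcox.test() returns a list of class "htest"
res <- wilcox.test(Sepal.Length ~ Species, data = two_species)

res$statistic  # the W statistic
res$p.value    # the p-value as a plain number

# Example use: report the result at the 5% level
if (res$p.value < 0.05) {
  message("Location shift detected (p = ", signif(res$p.value, 3), ")")
}
```

Because the data contain ties, R warns that it cannot compute an exact p-value and falls back to the normal approximation, which is what the printed output above shows as well.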

However, iris$Species has three levels. What if we wanted to do all three?

The base stats package also has pairwise.wilcox.test.

pairwise.wilcox.test(iris$Sepal.Length, iris$Species)

    Pairwise comparisons using Wilcoxon rank sum test with continuity correction 

data:  iris$Sepal.Length and iris$Species 

           setosa  versicolor
versicolor 1.7e-13 -         
virginica  < 2e-16 5.9e-07  

P value adjustment method: holm 
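The output above uses the Holm correction, which is the default. As a small sketch, assuming you would rather use a different correction, the p.adjust.method argument accepts any method known to p.adjust, and the adjusted p-values are available as a matrix. The name pw is only illustrative.

```r
# Same pairwise comparison, but with a Bonferroni correction
pw <- pairwise.wilcox.test(iris$Sepal.Length, iris$Species,
                           p.adjust.method = "bonferroni")

pw$p.value          # lower-triangular matrix of adjusted p-values (NA elsewhere)
pw$p.adjust.method  # the correction that was applied
```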

Now, I suspect you want to graph this. You need pairwise_wilcox_test and add_xy_position from the rstatix package and stat_pvalue_manual from the ggpubr package. The pairwise_wilcox_test function is more convenient here than the base R pairwise.wilcox.test since it returns a tibble rather than a list of class pairwise.htest.

library(rstatix)
library(ggpubr)

iris %>% pairwise_wilcox_test(Sepal.Length ~ Species)

# A tibble: 3 x 9
  .y.          group1     group2        n1    n2 statistic        p    p.adj p.adj.signif
* <chr>        <chr>      <chr>      <int> <int>     <dbl>    <dbl>    <dbl> <chr>       
1 Sepal.Length setosa     versicolor    50    50     168.  8.35e-14 1.67e-13 ****        
2 Sepal.Length setosa     virginica     50    50      38.5 6.40e-17 1.92e-16 ****        
3 Sepal.Length versicolor virginica     50    50     526   5.87e- 7 5.87e- 7 ****    

The function add_xy_position adds x and y coordinate information to make this data more suitable for plotting, and stat_pvalue_manual adds a layer containing the p-value information.

ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot() +
  stat_pvalue_manual(iris %>% 
                       pairwise_wilcox_test(Sepal.Length ~ Species) %>% 
                       add_xy_position())

[Image: boxplots of Sepal.Length by Species annotated with the pairwise Wilcoxon test results]
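Since the question involves only two groups, the stat_compare_means approach mentioned there should also work. Here is a minimal sketch, assuming ggpubr is installed and using two iris species as a stand-in for your two groups; two_species and p are illustrative names.

```r
library(ggpubr)  # also attaches ggplot2

# Two-group stand-in for the question's data
two_species <- droplevels(subset(iris, Species != "virginica"))

# Boxplot with a Wilcoxon rank sum p-value added as a label
p <- ggplot(two_species, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot() +
  stat_compare_means(method = "wilcox.test")
p
```

With two groups there is nothing to correct for multiple comparisons, so this should report the same test as the wilcox.test call above.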

Ben Norris answered May 10 '26 04:05


