I have code that creates a boxplot with ggplot2 in R, and I want to label the outliers with the year and Battle.
Here is the code that creates my boxplot:

```r
library(ggplot2)

ggplot(seabattle, aes(x = PortugesOutcome, y = RatioPort2Dutch)) +
  geom_boxplot(outlier.size = 2, outlier.colour = "green") +
  # mark each group's mean with a pink diamond
  stat_summary(fun = "mean", geom = "point", shape = 23, size = 3, fill = "pink") +
  # axis titles belong in labs(), not in extra arguments to ggplot()
  labs(x = "Outcome", y = "Ratio of Portuguese to Dutch/British ships") +
  ggtitle("Portuguese Sea Battles")
```

Can anyone help? I know the plot itself is correct; I just want to label the outliers.
We can identify and label these outliers with the ggbetweenstats() function from the ggstatsplot package. To label outliers, we set the outlier.tagging argument to TRUE and specify which variable to use for each outlier's label with the outlier.label argument.
Base R's boxplot() does not label outliers, but this is easy to program, because boxplot.stats() returns the outlier values in its $out component. You can also add a density (barcode) plot to the boxplot.
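A minimal base-R sketch of that idea follows. It assumes mtcars as a stand-in for the question's seabattle data, with the car names (row names) playing the role of the year/Battle labels; out_rows is a hypothetical helper name. boxplot() decides its outliers with boxplot.stats(), so running boxplot.stats() on each group recovers which rows those points are.

```r
# Stand-in data: mtcars instead of seabattle, car names instead of year/Battle.
d <- mtcars
d$label <- rownames(mtcars)
d$grp <- factor(d$cyl)  # boxes are drawn in level order: 4, 6, 8

# Draw the boxplot; the returned object's $out holds the outlier values.
b <- boxplot(drat ~ grp, data = d, xlab = "cyl", ylab = "drat")

# For each group, keep the row indices whose drat value boxplot.stats()
# reports as an outlier (the same rule boxplot() used above).
out_rows <- unlist(lapply(split(seq_len(nrow(d)), d$grp), function(idx) {
  idx[d$drat[idx] %in% boxplot.stats(d$drat[idx])$out]
}), use.names = FALSE)

# Write each outlier's label just to the right (pos = 4) of its point.
text(x = as.integer(d$grp[out_rows]), y = d$drat[out_rows],
     labels = d$label[out_rows], pos = 4, cex = 0.7)
```

For the question's data you would swap in the outcome column as the group and a label built from the year and battle columns.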
To highlight outliers in a boxplot, we can draw it with the Boxplot() function from the car package, which labels outlying points; its id argument controls which points are labeled and what labels are used.
In a box plot, an outlier is a data point that lies outside the whiskers. With the usual rule, that means a point more than 1.5 times the interquartile range (IQR) below the lower quartile or above the upper quartile: below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR.
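As a worked example of that rule, the sketch below computes the fences by hand for one group, the 8-cylinder cars in mtcars (chosen only for illustration). It uses quantile(), which is also what ggplot2's stat_boxplot uses for the box limits.

```r
# drat values for the 14 eight-cylinder cars
x <- mtcars$drat[mtcars$cyl == 8]

q1  <- unname(quantile(x, 0.25))  # lower quartile (Q1)
q3  <- unname(quantile(x, 0.75))  # upper quartile (Q3)
iqr <- q3 - q1                    # same value as IQR(x)

lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Anything strictly outside the fences is an outlier
outliers <- x[x < lower_fence | x > upper_fence]
print(c(q1 = q1, q3 = q3, lower = lower_fence, upper = upper_fence))
print(sort(outliers))
```

Here Q1 = 3.07 and Q3 = 3.225, so the fences are 2.8375 and 3.4575, and four values are flagged: 2.76, 3.54, 3.73 and 4.22. boxplot.stats() computes the quartiles with fivenum() instead, so on small groups its fences can differ slightly (2.83 and 3.47 for this group), although the same four points are flagged here.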
The following is a reproducible solution that uses dplyr and the built-in mtcars dataset. 
Walking through the code: first, define a function, is_outlier(), that returns TRUE or FALSE for each value passed to it, depending on whether that value is an outlier. Then check the data and plot it. group_by() the grouping variable (cyl in this example; PortugesOutcome in yours), so the quartiles are computed within each group, and use mutate() to add an outlier column that holds the drat value when drat is an outlier (drat corresponds to RatioPort2Dutch in your example) and NA otherwise, so that non-outliers get no label. Finally, plot the boxplot and add the labels with geom_text(), mapping the label aesthetic to the new outlier column; a negative hjust shifts each label slightly to the right so that it sits next to, rather than on top of, its outlier point.

```r
library(dplyr)
library(ggplot2)

# TRUE for values more than 1.5 * IQR below Q1 or above Q3
is_outlier <- function(x) {
  return(x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x))
}

mtcars %>%
  group_by(cyl) %>%
  # keep drat as the label for outliers; NA means "no label"
  mutate(outlier = ifelse(is_outlier(drat), drat, as.numeric(NA))) %>%
  ggplot(., aes(x = factor(cyl), y = drat)) +
    geom_boxplot() +
    geom_text(aes(label = outlier), na.rm = TRUE, hjust = -0.3)
```
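The answer above labels each outlier with its own value, while the question asks for the year and Battle. A minimal sketch of the same approach with a text label is below; it again uses mtcars, with the car name standing in for the label. For the seabattle data the label would presumably be something like paste(Year, Battle), where Year and Battle are assumed column names not confirmed by the question, and outlier_label is a hypothetical column name.

```r
library(dplyr)
library(ggplot2)

# Same 1.5 * IQR rule as is_outlier() above, repeated so this block runs alone.
is_outlier <- function(x) {
  x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x)
}

cars <- mtcars %>%
  mutate(name = rownames(mtcars)) %>%  # stand-in for paste(Year, Battle)
  group_by(cyl) %>%
  # keep the name only for outliers; NA labels are dropped by na.rm = TRUE
  mutate(outlier_label = ifelse(is_outlier(drat), name, NA_character_)) %>%
  ungroup()

p <- ggplot(cars, aes(x = factor(cyl), y = drat)) +
  geom_boxplot() +
  # hjust < 0 nudges each label to the right of its point
  geom_text(aes(label = outlier_label), na.rm = TRUE, hjust = -0.1)
print(p)
```

Because the label is a separate column, any combination of columns can be shown without changing how the outliers are detected.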
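You can also do this entirely within ggplot2 with a stat_summary() call; this labels each outlier with its value, rounded to one decimal place.

```r
library(ggplot2)

ggplot(mtcars, aes(x = factor(cyl), y = drat, fill = factor(cyl))) +
  geom_boxplot() +
  stat_summary(
    # label each summary point with its y value
    aes(label = round(after_stat(y), 1)),
    geom = "text",
    # return the group's outliers, or NA when the group has none
    fun = function(y) { o <- boxplot.stats(y)$out; if (length(o) == 0) NA else o },
    hjust = -1
  )
```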
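To see what that summary function hands to geom_text, you can apply it to each group yourself. The base-R sketch below does this with tapply(); outlier_fun is a hypothetical name for the anonymous function used in the stat_summary() call.

```r
# The function passed as `fun` above, given a name so it can be reused
outlier_fun <- function(y) {
  o <- boxplot.stats(y)$out
  if (length(o) == 0) NA else o  # NA when the group has no outliers
}

# Apply it to drat within each cyl group (result is a list, one entry per group)
per_group <- tapply(mtcars$drat, mtcars$cyl, outlier_fun)
print(per_group)
```

The 4-cylinder group yields 4.93, the 8-cylinder group yields 2.76, 3.73, 4.22 and 3.54, and the 6-cylinder group has no outliers, so it yields NA. That NA is why the length check is there: the row gets no label, and ggplot2 drops it when drawing (possibly with a warning about removed rows).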