
Python boxplot showing means and confidence intervals

How can I create a boxplot like the one below in Python? I want to depict means and confidence bounds only (rather than whiskers at multiples of the IQR, as in matplotlib's boxplot).

example

I don't have any version constraints, and if your answer has some package dependency that's OK too. Thanks!

America asked Sep 18 '25 00:09


2 Answers

Use errorbar instead. Here is a minimal example:

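import matplotlib.pyplot as plt

x = [2, 4, 3]               # means
y = [1, 3, 5]               # vertical position of each group
errors = [0.5, 0.25, 0.75]  # half-widths of the confidence intervals

plt.figure()
plt.errorbar(x, y, xerr=errors, fmt='o', color='k')
# label the groups; the unlabeled ticks at 0 and 6 leave room above and below
plt.yticks((0, 1, 3, 5, 6), ('', 'x3', 'x2', 'x1', ''))
plt.show()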


Note that boxplot is not the right tool here: its conf_intervals parameter only controls the placement of the notches on the boxes (and we don't want boxes anyway, let alone notched boxes), and its whiskers can only be placed at a multiple of the IQR or at given percentiles, not at arbitrary confidence bounds.
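If the confidence bounds are not symmetric around the mean, errorbar also accepts a 2 x N array for xerr, with the distances to the lower bounds in the first row and the distances to the upper bounds in the second. Here is a minimal sketch; the labels, means and bounds are made-up values for illustration only:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical summary values, for illustration only
labels = ['x1', 'x2', 'x3']
means = np.array([2.0, 4.0, 3.0])
lower = np.array([1.6, 3.7, 2.0])  # lower confidence bounds
upper = np.array([2.5, 4.2, 3.5])  # upper confidence bounds

# errorbar expects distances from the point, not absolute bounds:
# row 0 = distance down to the lower bound, row 1 = distance up to the upper bound
xerr = np.vstack([means - lower, upper - means])

positions = np.arange(len(labels))[::-1]  # put the first label at the top
fig, ax = plt.subplots()
ax.errorbar(means, positions, xerr=xerr, fmt='o', color='k', capsize=4)
ax.set_yticks(positions)
ax.set_yticklabels(labels)
ax.margins(y=0.3)  # some room above and below the points
```

Call plt.show() or fig.savefig(...) afterwards to display or save the figure. The capsize argument is optional; it only adds end caps to the bars.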

America answered Sep 19 '25 13:09


Building on America's answer, here is a way to automate this kind of graph a little.

The code below generates 20 arrays of 20 values each from a normal distribution with mean 0.25 and standard deviation 0.1. The margin of error of each confidence interval is computed as W = t * s / sqrt(n), where t is the critical value from the t distribution with n - 1 degrees of freedom (see scipy.stats.t), s is the sample standard deviation, and n is the number of values in the array.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# 20 samples of 20 values each, drawn from a normal distribution
list_samples = [np.random.normal(loc=0.25, scale=0.1, size=20) for _ in range(20)]

def W_array(array, conf=0.95):
    """Return the margin of error W of the confidence interval for the mean."""
    t = stats.t(df=len(array) - 1).ppf((1 + conf) / 2)
    return t * np.std(array, ddof=1) / np.sqrt(len(array))

W_list = [W_array(sample) for sample in list_samples]     # margin of error per sample
mean_list = [np.mean(sample) for sample in list_samples]  # sample means to plot

plt.errorbar(x=mean_list, y=range(len(list_samples)), xerr=W_list, fmt='o', color='k')
plt.axvline(.25, ls='--')  # true mean: on average, about 95% of the
                           # 95% CIs should contain it
plt.yticks([])
plt.show()

resulting figure
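As a sanity check on the formula, the margin of error W can be compared with the interval that scipy.stats.t.interval returns for the same sample. This is only a sketch: the fixed seed and the variable names are arbitrary choices, not part of the code above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice
sample = rng.normal(loc=0.25, scale=0.1, size=20)

conf = 0.95
n = len(sample)
mean = np.mean(sample)
sem = np.std(sample, ddof=1) / np.sqrt(n)  # standard error of the mean

# Margin of error from the formula W = t * s / sqrt(n)
W = stats.t.ppf((1 + conf) / 2, df=n - 1) * sem

# The same interval computed directly by SciPy
low, high = stats.t.interval(conf, df=n - 1, loc=mean, scale=sem)

# Both comparisons should hold: mean - W is the lower bound, mean + W the upper
print(np.isclose(mean - W, low), np.isclose(mean + W, high))
```

If both comparisons hold, xerr=W in errorbar draws exactly the t-based confidence interval around each mean.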

Sébastien Wieckowski answered Sep 19 '25 14:09