To quickly visualize the differences between measurements, I want to use gnuplot to draw two (later multiple) boxplots combined in a single plot.
Basically, I want to visualize the five-number summary (minimum, first quartile, median, third quartile, maximum) of each measurement.
Each column in my 'datafile' represents samples of a measurement.
My data is in this form:
A B C D
1.008 1.008 . .
0.909 0.909 . .
0.975 0.975
2.647 2.647
6.530 1.901
1.819 0.909
1.819 0.909
2.695 0.909
0.529 0.529
0.964 0.964
2.728 0.909
1.819 0.909
4.133 1.108
11.275 6.133
5.920 5.920
. .
and I would like the result to look like the gnuplot boxplot demo.
However, I cannot get the demo to work for my data: it seems to use a third column to shift one boxplot to the right, and I do not really understand how that works.
For clarification, in R I would do something like this:
par(mfrow=c(1,3))
b1 <- boxplot(datafile$A)
b2 <- boxplot(datafile$B)
b3 <- boxplot(datafile$C)
I'm also wondering how I can plot the boxplots on the same scale. I'm worried that the few really high values might stretch the max. whiskers of the boxplot so much that the box itself becomes too tiny for me to see differences between the medians of the two boxes.
Edit:
The suggested solution worked until I tried to plot the rest of my data as well. With the full data set, the plots become so crowded that it is impossible to see anything.
Below is an example with only the first 1000 entries of the rest of my data.

How can I include the outliers in the boxes? (I do not want to discard them.)
In the demo, a fixed number is used to set the x position of each boxplot:
plot 'data.txt' using (0):1 with boxplot
plots the data from the first column as a boxplot placed at the x-value 0. For two boxplots it is accordingly:
set style data boxplot
plot 'data.txt' using (0):1, '' using (1):2
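To make the role of the two `using` entries more concrete, here is a minimal sketch of the same two-boxplot call with labelled positions. The box width, the tic labels and the x range are assumptions chosen for illustration; 'data.txt' is the file name used above.

```gnuplot
# Sketch: the parenthesised first entry of 'using' is a constant x position,
# the second entry is the data column whose values are summarised.
set style data boxplot
set boxwidth 0.5                 # box width in x-axis units (assumed value)
set xtics ('A' 0, 'B' 1)         # label the two positions with the column names
set xrange [-0.5:1.5]            # leave some margin around the boxes (assumed)
unset key
plot 'data.txt' using (0):1, '' using (1):2
```

Changing (1) to, say, (2) only moves the second boxplot further to the right; it does not change which column is read.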
Gnuplot cannot automatically determine the number of columns, but you can automate this to some extent as follows:
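file = 'data.txt'
header = system('head -n 1 '.file)   # read the header line (needs a Unix-like 'head')
N = words(header)                    # number of columns = number of header words
set xtics ('' 1)                     # start an explicit tic list instead of automatic tics
set for [i=1:N] xtics add (word(header, i) i)   # label position i with the i-th column name
set style data boxplot
unset key
plot for [i=1:N] file using (i):i    # boxplot of column i at x position i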
If I duplicate the two columns you showed, and label them with A B C D, I get the following plot with gnuplot 4.6.3:

As you can see, outliers aren't taken into account. To hide the outliers, use set style boxplot nooutliers.
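Regarding the follow-up about including the outliers: by default gnuplot extends the whiskers only to the most distant point within 1.5 times the interquartile range and draws everything beyond that as separate points. The sketch below lists a few options that may help. It is untested against your full data set, and the fraction, y range and x positions are assumptions you would need to adapt.

```gnuplot
# Sketch only: 'data.txt' and the numeric values below are assumptions.
set style data boxplot
unset key

# Option 1: let the whiskers span all points, so no point is drawn as an outlier.
set style boxplot fraction 1.0

# Option 2 (instead of option 1): keep the default whiskers, hide the outlier
# points, and choose a y range that keeps the boxes readable.
# set style boxplot nooutliers
# set yrange [0:8]

# Optional: a logarithmic y axis keeps small boxes visible next to large values
# (only works for strictly positive data).
# set logscale y

plot 'data.txt' using (1):1, '' using (2):2
```

Option 1 keeps every value inside the whiskers but can stretch them a lot; combining it with the logarithmic axis is one way to keep the medians comparable.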