Histogram (counts): Change Scale of y axis [closed]

Tags: r, scale, histogram

Since I want to compare several distributions, I am creating histograms of the same variable for different years. However, the scale of the y axis changes, because the highest frequency is different every year. I want to create histograms in which all y axes display the same range, even when no count reaches the top of that range.

More precisely, in one year the peak of the distribution is 30 counts, in another year it is 35. On the graphs, 30 in one plot looks the same as 35 in the other because the scale of the y axis changes.

I have tried ylim=(35), but that only leads to the error "invalid value for ylim".

Thanks!

Liviliv asked Sep 17 '25 12:09


1 Answer

Type ?hist into your console to see the documentation. You'll see ylim is for "the range of ... y values". There is an example given showing how ylim is used, hist(x, freq = FALSE, ylim = c(0, 0.2)). There you can see that you need to give ylim a vector containing the lower limit and upper limit.

With a histogram you almost always want the lower limit to be zero (failure to do so is generally regarded as a statistical sin). So, as pointed out in the comments above, you could set ylim=c(0,35).
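As a rough sketch of that fix applied to the situation in the question, the block below uses made-up data for two years (the names year1, year2 and brks are hypothetical, chosen so the peaks are 30 and 35 counts). It first shows that a single number is rejected as ylim, then draws both histograms on the same 0 to 35 scale.

```r
# Hypothetical data: two "years" whose tallest bins hold 30 and 35 counts
year1 <- rep(1:5, times = c(10, 20, 30, 20, 10))
year2 <- rep(1:5, times = c(5, 25, 35, 25, 5))
brks  <- seq(0.5, 5.5, by = 1)   # one bin per integer value

# (35) is just the number 35, a vector of length 1, not a range
length((35))       # 1
length(c(0, 35))   # 2

# A length-one ylim is rejected; capture the error message instead of stopping
msg <- tryCatch(
  { hist(year1, breaks = brks, ylim = 35); "no error" },
  error = function(e) conditionMessage(e)
)
print(msg)

# Same y range for both plots, so 30 and 35 are visibly different heights
par(mfrow = c(1, 2))
hist(year1, breaks = brks, ylim = c(0, 35), main = "Year 1")
hist(year2, breaks = brks, ylim = c(0, 35), main = "Year 2")
```

The upper limit of 35 is hard-coded here only because the question states it; with real data you would normally compute it.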

Here is a minimal example:

# Set the frequencies with which each x and y value will appear
yfreq <- c(1:10, 10:1)                  # frequencies go up to 10 and down again
xfreq <- c(1:7, rep(7, times=6), 7:1)   # frequencies go up to 7, plateau, and go down again

# Repeat each value 1, 2, ... as many times as its frequency says
xdata <- rep(seq_along(xfreq), times=xfreq)
ydata <- rep(seq_along(yfreq), times=yfreq)

par(mfrow=c(2,2))
hist(ydata, breaks=((0:max(ydata)+1)-0.5), ylim=c(0,10),
     main="Hist of y with ylim set")
hist(xdata, breaks=((0:max(xdata)+1)-0.5), ylim=c(0,10),
     main="Hist of x with ylim set")
hist(ydata, breaks=((0:max(ydata)+1)-0.5),
     main="Hist of y without ylim set")
hist(xdata, breaks=((0:max(xdata)+1)-0.5),
     main="Hist of x without ylim set")

[Figure: Histograms in R with and without setting ylim]

So setting ylim appropriately makes the side-by-side comparison of histograms work better.

In practice it's convenient to do this automatically by finding the highest peak across your datasets and using that as the upper limit in ylim. How you do that depends on whether you are constructing a histogram of frequencies (which is what R does by default if your breaks are equidistant) or of densities. One way is to create histogram objects without plotting them, then extract either their counts or their density as appropriate.

#Make histogram object but don't draw it
yhist <- hist(ydata, breaks=((0:max(ydata)+1)-0.5), plot=FALSE)
xhist <- hist(xdata, breaks=((0:max(xdata)+1)-0.5), plot=FALSE)

#Find highest count, use it to set ylim of histograms of counts
highestCount <- max(xhist$counts, yhist$counts)
hist(ydata, breaks=((0:max(ydata)+1)-0.5), ylim=c(0,highestCount),
     main="Hist of y with automatic ylim")
hist(xdata, breaks=((0:max(xdata)+1)-0.5), ylim=c(0,highestCount),
     main="Hist of x with automatic ylim")

#Same but for densities
highestDensity <- max(xhist$density, yhist$density)
hist(ydata, breaks=((0:max(ydata)+1)-0.5), 
     freq=FALSE, ylim=c(0,highestDensity),
     main="Hist of y with automatic ylim")
hist(xdata, breaks=((0:max(xdata)+1)-0.5),
     freq=FALSE, ylim=c(0,highestDensity),
     main="Hist of x with automatic ylim")

[Figure: Side by side histograms in R with automatic y limits on frequency or density]
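Since the question mentions several years rather than just two, the same idea can be wrapped in a small function. This is only a sketch: plot_hists_same_ylim is a hypothetical helper name, not part of R, and it assumes every dataset fits inside one shared set of breaks. It builds the histogram objects without drawing them, takes the largest count (or density) across all of them, and then plots each object with that common upper limit.

```r
# Hypothetical helper: one histogram per list element, all on a shared y axis
plot_hists_same_ylim <- function(data_list, breaks, freq = TRUE) {
  # Build the histogram objects without drawing them
  hists <- lapply(data_list, hist, breaks = breaks, plot = FALSE)
  # Use counts or densities, matching what will be drawn
  heights <- lapply(hists, function(h) if (freq) h$counts else h$density)
  top <- max(unlist(heights))
  # Draw each stored histogram with the common y range
  for (nm in names(hists)) {
    plot(hists[[nm]], freq = freq, ylim = c(0, top),
         main = paste("Hist of", nm), xlab = nm)
  }
  invisible(top)   # return the shared upper limit for inspection
}

# Same example data as above
yfreq <- c(1:10, 10:1)
xfreq <- c(1:7, rep(7, times = 6), 7:1)
datasets <- list(
  y = rep(seq_along(yfreq), times = yfreq),
  x = rep(seq_along(xfreq), times = xfreq)
)
brks <- seq(0.5, 20.5, by = 1)   # both datasets take values 1 to 20

par(mfrow = c(2, 2))
topCount   <- plot_hists_same_ylim(datasets, brks)                # counts
topDensity <- plot_hists_same_ylim(datasets, brks, freq = FALSE)  # densities
```

Passing a named list keeps the plot titles meaningful, and adding a third or fourth year is just another list element.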

Silverfish answered Sep 19 '25 03:09