Say I have this data frame df:
df <- structure(list(max.diff = c(6.02, 7.56, 7.79, 7.43, 7.21, 7.65,
8.1, 7.35, 7.57, 9.09, 6.21, 8.2, 6.82, 7.18, 7.78, 8.27, 6.85,
6.72, 6.67, 6.99, 7.32, 6.59, 6.86, 6.02, 8.5, 7.25, 5.18, 8.85,
5.44, 6.44, 7.85, 6.25, 9.06, 8.19, 5.08, 6.26, 8.92, 6.83, 6.5,
7.55, 7.31, 5.83, 5.55, 4.29, 8.29, 8.72, 9.5)), class = "data.frame", row.names = c(NA,
-47L), .Names = "max.diff")
I want to plot this as a density plot using ggplot2:
library(ggplot2)
p <- ggplot(df, aes(x = max.diff))
p <- p + geom_histogram(stat = "density")
print(p)
which gives a plot.
Now, a naive question: why doesn't this give the same result?
p <- ggplot(df, aes(x = max.diff))
p <- p + geom_histogram(aes(y = ..density..))
print(p)
Is this because of the chosen binwidth or number of bins or some other parameter? So far, I haven't been able to tweak those parameters to make the two plots the same. Or am I plotting something quite different?
The second example rescales the histogram counts so that the bar areas sum to 1, but is otherwise the same as the standard ggplot2 histogram. You can adjust the number of bars with the bins or the binwidth arguments.
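To make the rescaling concrete, here is a minimal sketch in base R, using hist() rather than ggplot2. The break points (30 bins of width 0.2 between 4 and 10) are an assumption chosen for illustration; ggplot2 will generally place its bins differently. The point is only that density = count / (n * binwidth), so the bar areas sum to 1.

```r
# The max.diff values from the question
x <- c(6.02, 7.56, 7.79, 7.43, 7.21, 7.65, 8.1, 7.35, 7.57, 9.09,
       6.21, 8.2, 6.82, 7.18, 7.78, 8.27, 6.85, 6.72, 6.67, 6.99,
       7.32, 6.59, 6.86, 6.02, 8.5, 7.25, 5.18, 8.85, 5.44, 6.44,
       7.85, 6.25, 9.06, 8.19, 5.08, 6.26, 8.92, 6.83, 6.5, 7.55,
       7.31, 5.83, 5.55, 4.29, 8.29, 8.72, 9.5)

# Illustrative breaks only (an assumption, not ggplot2's own bin placement)
breaks <- seq(4, 10, by = 0.2)
h <- hist(x, breaks = breaks, plot = FALSE)

# Rescale counts by hand: divide by (number of observations * bin width)
binwidth <- diff(h$breaks)
dens <- h$counts / (sum(h$counts) * binwidth)

all.equal(dens, h$density)  # the hand-computed values match hist()'s density
sum(dens * binwidth)        # total bar area
```

Changing the number of bins changes the bar heights, but the total area stays at 1.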
The first example calculates a kernel density estimate and plots the output (the estimated density at each x-value) as a histogram. You can change the amount of smoothing of the density estimate with the adjust argument, and the number of points at which the density is calculated with the n argument.
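The effect of n and adjust can be seen by calling density() directly. This is a rough sketch in base R, not what ggplot2 does internally step for step; in particular, the range over which the density is evaluated may differ from the one ggplot2 uses.

```r
# The max.diff values from the question
x <- c(6.02, 7.56, 7.79, 7.43, 7.21, 7.65, 8.1, 7.35, 7.57, 9.09,
       6.21, 8.2, 6.82, 7.18, 7.78, 8.27, 6.85, 6.72, 6.67, 6.99,
       7.32, 6.59, 6.86, 6.02, 8.5, 7.25, 5.18, 8.85, 5.44, 6.44,
       7.85, 6.25, 9.06, 8.19, 5.08, 6.26, 8.92, 6.83, 6.5, 7.55,
       7.31, 5.83, 5.55, 4.29, 8.29, 8.72, 9.5)

d_default <- density(x)                        # adjust = 1, n = 512
d_rough   <- density(x, adjust = 0.1, n = 2^5) # less smoothing, coarser grid

length(d_default$x)          # number of x-values at which the density is evaluated
length(d_rough$x)            # follows the n argument
d_rough$bw / d_default$bw    # adjust multiplies the bandwidth
```

Each (x, y) pair in the result becomes one bar when stat = "density" is drawn with a histogram geom, which is why n controls how many bars appear.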
The default for geom_histogram is bins=30. The defaults for stat="density" are adjust=1 and n=512 (stat="density" uses the density function to generate the values). The stat="density" output is much smoother than the histogram output because of the way density chooses the bandwidth for the density estimate. Reducing the adjust argument reduces the amount of smoothing.
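For reference, the default bandwidth that density() picks is the rule of thumb implemented in bw.nrd0(). The sketch below recomputes it by hand, assuming the non-degenerate case where the spread of the data is not zero; it is meant only to show where the default smoothing comes from.

```r
# The max.diff values from the question
x <- c(6.02, 7.56, 7.79, 7.43, 7.21, 7.65, 8.1, 7.35, 7.57, 9.09,
       6.21, 8.2, 6.82, 7.18, 7.78, 8.27, 6.85, 6.72, 6.67, 6.99,
       7.32, 6.59, 6.86, 6.02, 8.5, 7.25, 5.18, 8.85, 5.44, 6.44,
       7.85, 6.25, 9.06, 8.19, 5.08, 6.26, 8.92, 6.83, 6.5, 7.55,
       7.31, 5.83, 5.55, 4.29, 8.29, 8.72, 9.5)

# Rule of thumb: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)
bw_manual <- 0.9 * min(sd(x), IQR(x) / 1.34) * length(x)^(-1/5)

all.equal(bw_manual, bw.nrd0(x))     # matches the built-in rule
all.equal(bw_manual, density(x)$bw)  # and the bandwidth density() uses by default
```

With only 47 observations this bandwidth is wide relative to a 30-bin histogram's bin width, which fits the smoother appearance of the first plot.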
The first two examples below are your plots. The next two adjust the respective parameters to get two plots that are roughly similar, though not identical, because the kernel density estimate still smooths the output. This is just for illustration: the kernel density estimate and the histogram are two different, though related, things.
ggplot(df, aes(x = max.diff)) +
  geom_histogram(stat = "density") +
  ggtitle("stat='density'; default parameters")

ggplot(df, aes(x = max.diff)) +
  geom_histogram(aes(y = ..density..), colour="white") +
  ggtitle("geom_histogram; default parameters")

ggplot(df, aes(x = max.diff)) +
  geom_histogram(stat = "density", n=2^5, adjust=0.1) +
  ggtitle("stat='density'; n=2^5; adjust=0.1")

ggplot(df, aes(x = max.diff)) +
  geom_histogram(aes(y = ..density..), bins=2^5, colour="white") +
  ggtitle("geom_histogram; bins=2^5")