
Overlaying data's density histogram with dlnorm in R, ggplot

I'm using Canada's census data, with the variable Wages on the x-axis and the density on the y-axis. I'm trying to overlay the histogram I've created with the log-normal density dlnorm, but I'm not sure what to use for the meanlog and sdlog parameters. I've tried mean(data$Wages) and sd(data$Wages), as well as the natural logarithm of each, but nothing gives me a curve remotely similar to the density histogram I have generated.

Is this because my data is not log-normal? How can I find the correct meanlog and sdlog parameters?

This is my code:

inc_plot <- data_adults %>%
  ggplot(aes(x=Wages)) +
  geom_histogram(aes(y=..density..),  bins=100,fill="transparent", colour="black")+
  scale_x_continuous(labels=scales::comma) +
  stat_function(fun = dlnorm,
      args = list(meanlog = 48637.91, sdlog = 62459.15),
      col = "red")

inc_plot

The current parameter values come from the aforementioned mean() and sd() functions.

Luke asked Nov 19 '25 07:11



1 Answer

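If you set meanlog = mean(log(your_data)) and sdlog = sd(log(your_data)), the density should closely match the histogram. Both parameters are the mean and standard deviation of the data on the log scale, not of the raw values, which is why mean(data$Wages) and sd(data$Wages) don't work.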

library(ggplot2)


df <- data.frame(x = rlnorm(1e4))

ggplot(df, aes(x)) +
  geom_histogram(
    aes(y = after_stat(density)),
    bins = 100, fill = "transparent", colour = "black"
  ) +
  stat_function(
    fun = dlnorm,
    args = list(meanlog = mean(log(df$x)), sdlog = sd(log(df$x))),
    colour = "red"
  )

Created on 2021-08-23 by the reprex package (v2.0.1)
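To illustrate why the raw mean() and sd() are so far off, here is a minimal sketch on simulated data. The vector wages and its parameters are made up as a stand-in for data_adults$Wages. It also shows the method-of-moments conversion from raw-scale mean and sd to log-scale parameters, which is a different estimator from mean(log(x)) and sd(log(x)) but should land close to it when the data really are log-normal.

```r
set.seed(1)

# Simulated stand-in for data_adults$Wages (hypothetical parameters)
wages <- rlnorm(1e4, meanlog = 10, sdlog = 0.5)

# Log-scale estimates: these are what dlnorm() expects
meanlog_hat <- mean(log(wages))
sdlog_hat   <- sd(log(wages))

# Raw-scale moments: very different numbers, not valid as meanlog/sdlog
m <- mean(wages)
s <- sd(wages)

# Method-of-moments conversion from raw-scale mean/sd to log-scale parameters
sdlog_mom   <- sqrt(log(1 + (s / m)^2))
meanlog_mom <- log(m) - sdlog_mom^2 / 2

round(c(raw_mean = m, raw_sd = s,
        meanlog_hat = meanlog_hat, sdlog_hat = sdlog_hat,
        meanlog_mom = meanlog_mom, sdlog_mom = sdlog_mom), 3)
```

Note that log(mean(x)) is not the same as mean(log(x)), so simply taking the logarithm of the raw mean and sd does not recover meanlog and sdlog either.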

An alternative would be to use ggh4x::stat_theodensity(distri = "lnorm", colour = "red"), which estimates the parameters from the data. (Disclaimer: I'm the author of ggh4x.)
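On the question of whether the data are log-normal at all, one rough check is a normal Q-Q plot of the logged values. The sketch below assumes the wage column may contain zeros, which the question does not state; the vector wages is simulated and purely illustrative. Zeros matter because log(0) is -Inf, which breaks the estimates.

```r
set.seed(42)

# Hypothetical stand-in for data_adults$Wages, with some zero wages mixed in
wages <- c(rlnorm(5000, meanlog = 10, sdlog = 0.8), rep(0, 200))

# log(0) is -Inf, so the naive estimate is unusable
naive <- mean(log(wages))

# Keep strictly positive values before estimating on the log scale
pos <- wages[wages > 0]
meanlog_hat <- mean(log(pos))
sdlog_hat   <- sd(log(pos))

# If the positive wages are log-normal, the points should lie near the line
qqnorm(log(pos))
qqline(log(pos), col = "red")
```

If the points bend away from the line in the tails, a log-normal curve will not match the histogram well no matter which parameters are used.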

teunbrand answered Nov 21 '25 21:11
