Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to draw fitted graph and actual graph of gamma distribution in one plot?

Step.1. Load the package needed.

library(ggplot2)
library(MASS)

Step.2. Generate 10,000 numbers fitted to gamma distribution.

x <- round(rgamma(100000,shape = 2,rate = 0.2),1)
x <- x[which(x>0)]

Step.3. Draw the pdf(probability density function), supposed we don't know which distribution x fitted to.

t1 <- as.data.frame(table(x))
names(t1) <- c("x","y")
t1 <- transform(t1,x=as.numeric(as.character(x)))
t1$y <- t1$y/sum(t1[,2])
ggplot() + geom_point(data = t1,aes(x = x,y = y)) + theme_classic()

enter image description here

Step.4. From the graph, we can learn that the distribution of x is quite like gamma distribution, so we use fitdistr() in package MASS to get the parameters of shape and rate of gamma distribution.

fitdistr(x,"gamma") 
##       output 
##       shape           rate    
##   2.0108224880   0.2011198260 
##  (0.0083543575) (0.0009483429)

Step.5. Draw the actual point(black dot) and fitted graph(red line) in the same plot, and here is the question, please look the plot first.

ggplot() + geom_point(data = t1,aes(x = x,y = y)) +     
geom_line(aes(x=t1[,1],y=dgamma(t1[,1],2,0.2)),color="red") + 
theme_classic()

enter image description here

Question 1: The real parameters are shape=2, rate=0.2, and the parameters I use the function fitdistr() to get are shape=2.01, rate=0.20. These two are nearly the same, but why the fitted graph don't fit the actual point well, there must be something wrong in the fitted graph, or the way I draw the fitted graph and actual points is totally wrong, what should I do?

Question 2: After I get the parameter of the model I establish, in which way I evaluate the model, something like RSS(residual square sum) for linear model, or the p-value of shapiro.test() , ks.test() and other test? I am poor in statistical knowledge, please help me out of both the question, thank you!(ps: I have search in the Google and stackoverflow many times, but it does not work, so do not vote this question is useless,Thx!)

like image 967
Ling Zhang Avatar asked Jan 27 '26 00:01

Ling Zhang


1 Answers

For some reason, your fitted line seems to be exactly 10 times to high:

ggplot() + geom_point(data = t1,aes(x = x,y = y)) +     
  geom_line(aes(x=t1[,1],y=(dgamma(t1[,1],2,0.2))/10),color="red") + 
  theme_classic()

gives a perfect fit: enter image description here

As jbaums stated this is caused by the density per dx that is 0.1 in this case.

like image 68
RHA Avatar answered Jan 28 '26 14:01

RHA