 

Use scipy lognormal distribution to fit data with small values, then show in matplotlib

I have a data set whose values range from 0 to 1e-5, and I suspect the data can be described by a lognormal distribution. So I use scipy.stats.lognorm to fit my data, and I want to plot the original data and the fitted distribution on the same figure with matplotlib.

First, I plot the sample as a histogram:
[Figure: histogram of the sample]

Then I add the fitted distribution as a line plot. However, this rescales the y-axis to very large values:
[Figure: the same histogram with the fitted distribution added; the y-axis now runs to very large values]

So the original data (the sample) can no longer be seen in the figure!

I've checked all the variables and found that pdf_fitted is very large (>1e7). I really don't understand why a simple scistats.lognorm.fit fails on a sample that was generated from the same distribution with scistats.lognorm.pdf. Here is the code that demonstrates my problem:

from matplotlib import pyplot as plt
from scipy import stats as scistats
import numpy as np

# generate a sample for x between 0 and 1e-5
x = np.linspace(0, 1e-5, num=1000)
y = scistats.lognorm.pdf(x, 3, loc=0, scale=np.exp(10))
h = plt.hist(y, bins=40) # plot the sample by histogram
# plt.show()

# fit the sample by using Log Normal distribution
param = scistats.lognorm.fit(y)
print("Log-normal distribution parameters : ", param)
pdf_fitted = scistats.lognorm.pdf(
    x, *param[:-2], loc=param[-2], scale=param[-1])
plt.plot(x, pdf_fitted, label="Fitted Lognormal distribution")
plt.ticklabel_format(style='sci', scilimits=(-3, 4), axis='x')
plt.legend()
plt.show()
asked Feb 03 '26 10:02 by yoursbh


1 Answer

The problem

The immediate problem you're having is that your fit is really, really bad. You can see this by setting both the x and y scales of the plot to log, with plt.xscale('log') and plt.yscale('log'). That lets you see both your histogram and your fitted curve on a single plot:

[Figure: histogram and fitted distribution on log-log axes]

The fit is off by many orders of magnitude along both axes.
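A quick numerical check (a sketch added here, using only the question's parameters s=3, loc=0, scale=np.exp(10)) suggests why the two curves end up so far apart: for a lognormal with loc=0 the median equals the scale, e^10 ≈ 22026, so almost none of the distribution's probability lies in the range [0, 1e-5] that the question's data occupy.

```python
import numpy as np
from scipy import stats as scistats

# Parameters taken from the question: shape s=3, loc=0, scale=e**10
dist = scistats.lognorm(3, loc=0, scale=np.exp(10))

# With loc=0, the median of a lognormal equals its scale parameter
print("median:", dist.median())

# Probability that a single draw falls inside the question's range [0, 1e-5]
p_in_range = dist.cdf(1e-5)
print("P(X <= 1e-5):", p_in_range)
```

The second number comes out on the order of 1e-13, so a sample actually drawn with these parameters would look nothing like data confined to [0, 1e-5].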

The fix

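Your whole approach to generating a sample from the probability distribution represented by stats.lognorm, and then fitting it, was wrong. scistats.lognorm.pdf evaluates the probability density at the x values you pass in, so your y is a set of density values, not a sample; scistats.lognorm.fit expects observations, i.e. random draws such as those returned by scistats.lognorm.rvs. Here's a correct way to do it, using the same loc and scale for the lognorm distribution that you supplied in your question (but with a shape parameter of 0.1 instead of 3):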

from matplotlib import pyplot as plt
from scipy import stats as scistats
import numpy as np

plt.figure(figsize=(12,7))
realparam = [.1, 0, np.exp(10)]  # [shape s, loc, scale]

# x grid for plotting the pdf, centred on the median (= the scale, e**10)
m = realparam[2]
x = np.linspace(m*.6, m*1.4, num=10000)
y = scistats.lognorm.pdf(x, *realparam)

# generate a matching random sample
sample = scistats.lognorm.rvs(*realparam, size=100000)
# plot the sample by histogram
h = plt.hist(sample, bins=100, density=True)

# fit the sample by using Log Normal distribution
param = scistats.lognorm.fit(sample)
print("Log-normal distribution parameters : ", param)
pdf_fitted = scistats.lognorm.pdf(x, *param)
plt.plot(x, pdf_fitted, lw=5, label="Fitted Lognormal distribution")
plt.legend()
plt.show()

Output:

Log-normal distribution parameters :  (0.09916091013245995, -215.9562383088556, 22245.970148671593)

[Figure: histogram of the sample with the fitted lognormal pdf overlaid]
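A note on the printed parameters: the fitted loc (about -216) is not 0 because lognorm.fit estimates loc along with the shape and scale, but the fitted median, loc + scale (about 22030), is still close to the true median e^10 ≈ 22026. If you know the data start at zero, you can hold the location fixed with the floc keyword. The sketch below does that and compares the result with the closed-form maximum-likelihood estimates for a lognormal with loc = 0 (shape = standard deviation of log(data), scale = exp(mean of log(data))). It draws its own sample from a seeded generator (the seed is an arbitrary choice, not part of the answer), so its numbers will differ slightly from the output above.

```python
import numpy as np
from scipy import stats as scistats

realparam = [.1, 0, np.exp(10)]  # [shape s, loc, scale], as in the answer
rng = np.random.default_rng(1)   # seeded generator, assumed for reproducibility
sample = scistats.lognorm.rvs(*realparam, size=100000, random_state=rng)

# Fit with the location fixed at 0; only the shape and scale are estimated
s_hat, loc_hat, scale_hat = scistats.lognorm.fit(sample, floc=0)
print("fitted (s, loc, scale):", (s_hat, loc_hat, scale_hat))

# Closed-form maximum-likelihood estimates when loc = 0
log_sample = np.log(sample)
s_closed = log_sample.std()               # population std (ddof=0)
scale_closed = np.exp(log_sample.mean())  # geometric mean of the sample
print("closed form (s, scale):", (s_closed, scale_closed))
```

Whether to fix loc is a modelling choice: it fits data known to start at zero, while the three-parameter fit in the answer makes no such assumption.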

answered Feb 05 '26 03:02 by tel


