I have 15 data sets, each of which I have fitted with a curve. Now I am trying to determine the quality of the fit with a chi-squared test; however, when I run my code:
chi, p_value = stats.chisquare(n, y)
where n is the actual data and y is the predicted data, I get the error
For each axis slice, the sum of the observed frequencies must agree with the sum of the expected frequencies to a relative tolerance of 1e-08, but the percent differences are: 0.1350785306607008
I can't seem to understand why they have to add up to the same total - are there any ways I can run a chi-squared test without muddling my data?
This chi-squared test for goodness of fit indeed requires the sums of both inputs to be (almost) the same. So, if you want to check whether your model fits the observations n well, you have to adjust the counts y of your model as described e.g. here. This could be done with a small wrapper:
from scipy.stats import chisquare
import numpy as np

def cs(n, y):
    # Rescale the model counts y so that they sum to the same total as n
    return chisquare(n, np.sum(n) / np.sum(y) * np.asarray(y))
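To see the rescaling idea in action, here is a minimal, self-contained sketch. The counts are made up for illustration, and the function name chisquare_rescaled and its n_fit_params argument are hypothetical, not part of SciPy. The only SciPy call used is chisquare with its f_exp and ddof parameters; passing the number of fitted curve parameters as ddof is an assumption about what suits a fitted model, since it reduces the degrees of freedom used for the p-value.

```python
import numpy as np
from scipy.stats import chisquare


def chisquare_rescaled(observed, predicted, n_fit_params=0):
    """Chi-squared test after rescaling predicted counts to the observed total.

    n_fit_params is a hypothetical argument: the number of curve parameters
    estimated from the data, passed on to SciPy as ddof.
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Scale the model so both arrays sum to the same total
    scaled = predicted * observed.sum() / predicted.sum()
    return chisquare(observed, f_exp=scaled, ddof=n_fit_params)


# Made-up counts, for illustration only
n = np.array([18, 25, 31, 22, 14])
y = np.array([20.5, 27.0, 33.2, 24.1, 16.3])

stat, p_value = chisquare_rescaled(n, y, n_fit_params=2)
print(stat, p_value)
```

Note that rescaling removes any difference in overall normalisation between model and data, so the test then only compares the shape of the two distributions.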
Another possibility would be to go for R and use chisq.test.