 

Chi Squared Analysis on Data sets that don't have matching frequencies

I have 15 data sets, each of which I have fitted with a curve. Now I am trying to assess the quality of each fit with a chi-squared test; however, when I run my code:

chi, p_value = stats.chisquare(n, y)

where n is the observed data and y is the data predicted by the fitted curve, I get the error:

For each axis slice, the sum of the observed frequencies must agree with the sum of the expected frequencies to a relative tolerance of 1e-08, but the percent differences are: 0.1350785306607008
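The number at the end of that message is a relative difference between the two totals rather than a literal percentage, so 0.135 means the sums of n and y differ by roughly 13.5%. As far as I can tell from the scipy source, the check divides the absolute difference of the two sums by the smaller of them; the sketch below reproduces that so the mismatch can be inspected before calling chisquare. The helper name relative_sum_difference and the counts are hypothetical, and the exact formula scipy uses may vary between versions.

```python
import numpy as np

def relative_sum_difference(observed, expected):
    """Relative difference between the totals, computed the way scipy's
    check appears to do it: |sum(obs) - sum(exp)| / min(sum(obs), sum(exp)).
    (Hypothetical helper; the scipy formula is an assumption.)"""
    obs_sum = np.sum(np.asarray(observed, dtype=float))
    exp_sum = np.sum(np.asarray(expected, dtype=float))
    return abs(obs_sum - exp_sum) / min(obs_sum, exp_sum)

# Made-up counts, not the asker's data: totals are 100 and 80
observed = [30, 40, 30]
expected = [20, 35, 25]
print(relative_sum_difference(observed, expected))  # 0.25, i.e. 20 / 80
```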

I don't understand why they have to add up to the same total. Is there any way I can run a chi-squared test without muddling my data?

bigmac42 asked Oct 19 '25 11:10



1 Answer

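This chi-squared goodness-of-fit test does indeed require the sums of both inputs to be (almost) the same. The expected frequencies are meant to describe how the same total number of observations would be distributed if the model were correct, so the test compares only the shape of the two distributions; a difference in the totals is not something the chi-squared distribution of the statistic accounts for. So, if you want to check whether your model fits the observations n well, you have to rescale the counts y of your model so that they have the same total as n, as described e.g. here. This could be done with a small wrapper: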

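from scipy.stats import chisquare
import numpy as np

def cs(n, y):
    # Convert to arrays so plain Python lists also work with the arithmetic below
    n = np.asarray(n, dtype=float)
    y = np.asarray(y, dtype=float)
    # Rescale the predicted counts y to the same total as the observed counts n
    return chisquare(n, f_exp=y * n.sum() / y.sum())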
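Written out, the wrapper rescales each predicted value y_i by the ratio of the two totals and then computes Pearson's statistic over the k bins (the symbols E_i and k are introduced here for illustration):

```latex
E_i = y_i \cdot \frac{\sum_{j=1}^{k} n_j}{\sum_{j=1}^{k} y_j},
\qquad
\chi^2 = \sum_{i=1}^{k} \frac{(n_i - E_i)^2}{E_i}
```

By construction the E_i sum to the same total as the n_i, so the statistic reflects only how the shape of the model differs from the shape of the data. The p-value then comes from a chi-squared distribution with k - 1 degrees of freedom, or fewer if model parameters were estimated from the same data.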

Another possibility would be to use R's chisq.test, whose rescale.p = TRUE argument rescales the expected probabilities p so that they sum to 1.
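For comparison with the R route, the same computation can be spelled out in Python: rescale the expected counts, compute Pearson's statistic by hand, and take the upper tail of the chi-squared distribution with scipy.stats.chi2.sf. This is a sketch with made-up counts, and rescaled_chisquare is a hypothetical name. The ddof argument is an assumption about the asker's setup: when a curve is fitted to each data set, it would normally be set to the number of fitted parameters (scipy.stats.chisquare accepts a ddof keyword with the same meaning).

```python
import numpy as np
from scipy.stats import chi2, chisquare

def rescaled_chisquare(observed, predicted, ddof=0):
    """Pearson chi-squared test after rescaling predicted to the observed total.

    ddof: number of model parameters estimated from the data (assumption:
    set it to however many parameters the fitted curve has).
    """
    n = np.asarray(observed, dtype=float)
    y = np.asarray(predicted, dtype=float)
    expected = y * n.sum() / y.sum()          # same total as the observations
    stat = np.sum((n - expected) ** 2 / expected)
    dof = n.size - 1 - ddof                   # k - 1 - ddof degrees of freedom
    p_value = chi2.sf(stat, dof)              # upper-tail probability
    return stat, p_value

# Made-up counts for one data set: observed sums to 100, predicted to 95
observed = [18, 25, 30, 27]
predicted = [20.0, 22.0, 28.0, 25.0]

stat, p = rescaled_chisquare(observed, predicted)
ref = chisquare(observed, f_exp=np.asarray(predicted) * 100 / 95)
print(stat, p)
print(ref.statistic, ref.pvalue)  # should agree with the line above
```

Looping this over the 15 data sets, each with its own observed and predicted arrays, gives one statistic and p-value per fit.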

frank answered Oct 20 '25 23:10



