Chi Squared Analysis on Data sets that don't have matching frequencies

Question

I have 15 data sets, each of which I have fitted with a curve. Now I am trying to determine the quality of each fit with a chi-squared test; however, when I run my code:

chi, p_value = stats.chisquare(n, y)

where n is the actual data and y is the predicted data, I get the error

For each axis slice, the sum of the observed frequencies must agree with the sum of the expected frequencies to a relative tolerance of 1e-08, but the percent differences are: 0.1350785306607008

I can't seem to understand why they have to add up to the same total - are there any ways I can run a chi-squared test without muddling my data?

frank · Accepted Answer

This chi-squared test for goodness of fit indeed requires the sums of both inputs to be (almost) the same, because it compares how a fixed total of observations is distributed across the bins. So, if you want to check whether your model fits the observations n well, you have to rescale the counts y of your model so that they add up to the same total as n, i.e. multiply y by sum(n)/sum(y). This could be done with a small wrapper:

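from scipy.stats import chisquare
import numpy as np

def cs(n, y):
    # Rescale the model counts y to the total of the observed counts n,
    # then run the goodness-of-fit test on the matched totals.
    return chisquare(n, np.sum(n)/np.sum(y) * y)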
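
As a rough illustration of how the wrapper behaves, here is a minimal sketch on made-up numbers. The arrays n_demo and y_demo are hypothetical and are not taken from the question; the only assumption is that both hold non-negative counts per bin.

```python
import numpy as np
from scipy.stats import chisquare

def cs(n, y):
    # Convert to float arrays so that lists work as well as arrays.
    n = np.asarray(n, dtype=float)
    y = np.asarray(y, dtype=float)
    # Rescale the model counts so their total matches the observed total.
    return chisquare(n, np.sum(n) / np.sum(y) * y)

# Hypothetical observed counts and model predictions (illustrative only).
n_demo = np.array([18.0, 25.0, 31.0, 16.0, 10.0])  # sums to 100
y_demo = np.array([20.0, 30.0, 35.0, 20.0, 10.0])  # sums to 115

stat_demo, p_demo = cs(n_demo, y_demo)
print("chi-squared statistic:", stat_demo)
print("p-value:", p_demo)
```

Note that the rescaling only changes the overall normalisation of the model; the relative shape of y across the bins, which is what the test compares against n, stays the same.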

Another possibility would be to go for R and use chisq.test, which can rescale the expected probabilities itself when called with rescale.p = TRUE.
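
Since y comes from a fitted curve, it may also be worth reducing the degrees of freedom by the number of fitted parameters. scipy.stats.chisquare has a ddof argument for this: the p-value is computed with k - 1 - ddof degrees of freedom, where k is the number of bins. The sketch below assumes a hypothetical fit with 2 free parameters and made-up counts; neither is taken from the question.

```python
import numpy as np
from scipy.stats import chisquare, chi2

def cs_fitted(n, y, n_params):
    """Rescaled goodness-of-fit test with a degrees-of-freedom correction.

    n_params is the number of curve parameters estimated from the data
    (a hypothetical argument introduced for this sketch).
    """
    n = np.asarray(n, dtype=float)
    y = np.asarray(y, dtype=float)
    y_scaled = np.sum(n) / np.sum(y) * y
    # ddof lowers the degrees of freedom to k - 1 - n_params.
    return chisquare(n, y_scaled, ddof=n_params)

# Hypothetical observed counts and fitted-curve predictions.
n_obs = np.array([18.0, 25.0, 31.0, 16.0, 10.0])
y_fit = np.array([20.0, 30.0, 35.0, 20.0, 10.0])

stat, p = cs_fitted(n_obs, y_fit, n_params=2)
dof = len(n_obs) - 1 - 2  # 5 bins, minus 1, minus 2 fitted parameters
print("statistic:", stat, "dof:", dof, "p-value:", p)
```

With 15 data sets, the same function could simply be called once per data set, using the parameter count of the curve fitted to each.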

Tags:

python

statistics

bigmac42