I want to compute binomial probabilities on python. I tried to apply the formula:
probability = scipy.misc.comb(n,k)*(p**k)*((1-p)**(n-k))
Some of the probabilities I get are infinite. I checked some values for which p=inf. For one of them, n=450,000 and k=17. This value must be greater than 1e302 which is the maximum value handled by floats.
I then tried to use sum(np.random.binomial(n,p,numberOfTrials)==valueOfInterest)/numberOfTrials
This draws numberOfTrials samples and computes the average number of times the value valueOfInterest is drawn.
This doesn't raise any infinite value. However, is this a valid way to proceed? And why this way wouldn't raise any infinite value whereas computing the probabilities does?
However, for N much larger than n, the binomial distribution remains a good approximation, and is widely used.
For large values of n, the distributions of the count X and the sample proportion are approximately normal. This result follows from the Central Limit Theorem. The mean and variance for the approximately normal distribution of X are np and np(1-p), identical to the mean and variance of the binomial(n,p) distribution.
The binomial distribution is calculated by multiplying the probability of success raised to the power of the number of successes and the probability of failure raised to the power of the difference between the number of successes and the number of trials.
The n observations will be nearly independent when the size of the population is much larger than the size of the sample. As a rule of thumb, the binomial sampling distribution for counts can be used when the population is at least 20 times as large as the sample.
Because you're using scipy I thought I would mention that scipy already has statistical distributions implemented. Also note that when n is this large the binomial distribution is well approximated by the normal distribution (or Poisson if p is very small).
n = 450000
p = .5
k = np.array([17., 225000, 226000])
b = scipy.stats.binom(n, p)
print b.pmf(k)
# array([  0.00000000e+00,   1.18941527e-03,   1.39679862e-05])
n = scipy.stats.norm(n*p, np.sqrt(n*p*(1-p)))
print n.pdf(k)
# array([  0.00000000e+00,   1.18941608e-03,   1.39680605e-05])
print b.pmf(k) - n.pdf(k)
# array([  0.00000000e+00,  -8.10313274e-10,  -7.43085142e-11])
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With