Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Numpy - correlation coefficient and related statistical functions don't give same results

For data X = [0,0,1,1,0]and Y = [1,1,0,1,1]

>> np.corrcoef(X,Y) 

returns

array([[ 1.        , -0.61237244],
       [-0.61237244,  1.        ]])

However, I cannot reproduce this result using np.var and np.cov given the equation shown in http://docs.scipy.org/doc/numpy/reference/generated/numpy.corrcoef.html:

>> np.cov([0,0,1,1,0],[1,1,0,1,1])/sqrt(np.var([0,0,1,1,0])*np.var([1,1,0,1,1]))

array([[ 1.53093109, -0.76546554],
       [-0.76546554,  1.02062073]])

What's going on here?

like image 566
neither-nor Avatar asked Aug 31 '25 22:08

neither-nor


1 Answers

This is because, np.var default delta degrees of freedom is 0, not 1.

In [57]:

X = [0,0,1,1,0]
Y = [1,1,0,1,1]
np.corrcoef(X,Y) 
Out[57]:
array([[ 1.        , -0.61237244],
       [-0.61237244,  1.        ]])
In [58]:

V = np.sqrt(np.array([np.var(X, ddof=1), np.var(Y, ddof=1)])).reshape(1,-1)
np.matrix(np.cov(X,Y))
Out[58]:
matrix([[ 0.3 , -0.15],
        [-0.15,  0.2 ]])
In [59]:

np.matrix(np.cov(X,Y))/(V*V.T)
Out[59]:
matrix([[ 1.        , -0.61237244],
        [-0.61237244,  1.        ]])

Or looks it the otherway:

In [70]:

V=np.diag(np.cov(X,Y)).reshape(1,-1) #the diagonal elements
In [71]:

np.matrix(np.cov(X,Y))/np.sqrt(V*V.T)
Out[71]:
matrix([[ 1.        , -0.61237244],
        [-0.61237244,  1.        ]])

What is really going on, np.cov(m, y=None, rowvar=1, bias=0, ddof=None), when bias and ddof both not provided, the default normalization is by N-1, N being the number of observation. So, that is equivalent to have delta degrees of freedom of 1. Unfortunately, the default for np.var(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False) has the default delta degrees of freedom of 0.

Whenever unsure, the safest way is to grab the diagonal elements of the covariance matrix rather than calculate var separately, to ensure consistent behavior.

like image 110
CT Zhu Avatar answered Sep 03 '25 14:09

CT Zhu