
Getting error while plotting the dendrogram for the spearmanr correlation

I am getting an error while plotting the dendrogram for the spearmanr correlation. Below is the code I am using:

corr = np.round(scipy.stats.spearmanr(full_data[list_of_continous]).correlation, 4)
corr_condensed = hc.distance.squareform(1-corr)
z = hc.linkage(corr_condensed, method='average')
fig = plt.figure(figsize=(20,20))
dendrogram = hc.dendrogram(z, labels=full_data[list_of_continous].columns, orientation='left', leaf_font_size=30)
plt.show()

Below is the error I am getting:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-10-9873c0be8dc7> in <module>()
      1 corr = np.round(scipy.stats.spearmanr(full_data[list_of_continous]).correlation, 4)
----> 2 corr_condensed = hc.distance.squareform(1-corr)
      3 z = hc.linkage(corr_condensed, method='average')
      4 fig = plt.figure(figsize=(20,20))
      5 dendrogram = hc.dendrogram(z, labels=full_data[list_of_continous].columns, orientation='left', leaf_font_size=30)

/usr/local/anaconda/lib/python3.6/site-packages/scipy/spatial/distance.py in squareform(X, force, checks)
   1844             raise ValueError('The matrix argument must be square.')
   1845         if checks:
-> 1846             is_valid_dm(X, throw=True, name='X')
   1847 
   1848         # One-side of the dimensions is set here.

/usr/local/anaconda/lib/python3.6/site-packages/scipy/spatial/distance.py in is_valid_dm(D, tol, throw, name, warning)
   1920                 if name:
   1921                     raise ValueError(('Distance matrix \'%s\' must be '
-> 1922                                      'symmetric.') % name)
   1923                 else:
   1924                     raise ValueError('Distance matrix must be symmetric.')

ValueError: Distance matrix 'X' must be symmetric.
ashwin g asked Sep 06 '25 23:09



2 Answers

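The variable corr might contain NaN values. NaN never compares equal to itself, so a matrix containing NaN fails the symmetry check in squareform.
Try replacing them first: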

corr = np.nan_to_num(corr)
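
To check whether NaN is really the cause, a minimal sketch along these lines may help. The 3x3 matrix is a made-up stand-in for the real corr (its NaN row and column mimic what spearmanr returns for a constant column), and zeroing the diagonal after nan_to_num is an added assumption, since a NaN on the diagonal would otherwise become a distance of 1 and trip the zero-diagonal check.

```python
import numpy as np
from scipy.spatial.distance import squareform

# Hypothetical correlation matrix; the NaN row/column mimics a constant column.
corr = np.array([
    [1.0, 0.8, np.nan],
    [0.8, 1.0, np.nan],
    [np.nan, np.nan, np.nan],
])

# Step 1: confirm that NaN values are present.
has_nan = bool(np.isnan(corr).any())

# Step 2: reproduce the failure. NaN != NaN, so the symmetry check fails.
try:
    squareform(1 - corr)
    raised = None
except ValueError as exc:
    raised = str(exc)

# Step 3: replace NaN with 0 (treated as "no correlation"), then force a
# zero diagonal on the distance matrix before condensing it.
dist = 1 - np.nan_to_num(corr)
np.fill_diagonal(dist, 0.0)
condensed = squareform(dist)
```

If has_nan is False for your data, the asymmetry has another source and this replacement will not help.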

Update:

Skipping the squareform step, i.e. leaving out

    corr_condensed = hc.distance.squareform(1-corr)

runs without any error for me.

So

corr = np.round(scipy.stats.spearmanr(full_data[list_of_continous]).correlation, 4)
z = hc.linkage(corr, method='average')
fig = plt.figure(figsize=(20,20))
dendrogram = hc.dendrogram(z, labels=full_data[list_of_continous].columns, orientation='left', leaf_font_size=30)
plt.show()

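should run for you too. Be aware that linkage treats a 2-D input as a matrix of observation vectors, so this clusters on the Euclidean distances between the rows of corr rather than on 1 - corr.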
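
A quick way to see what each call actually clusters on is to compare the linkage outputs directly. The sketch below uses random data as a hypothetical stand-in for full_data[list_of_continous]; the symmetrizing and zero-diagonal steps are added assumptions to keep squareform's checks happy, not part of the original code.

```python
import numpy as np
from scipy.stats import spearmanr
from scipy.cluster import hierarchy as hc
from scipy.spatial.distance import pdist, squareform

# Hypothetical stand-in for full_data[list_of_continous]: 50 rows, 4 columns.
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 4))

corr = spearmanr(data).correlation

# Correlation distance, made exactly symmetric with a zero diagonal.
dist = 1 - corr
dist = (dist + dist.T) / 2
np.fill_diagonal(dist, 0.0)

# Clustering on the condensed 1 - corr distances.
z_from_distances = hc.linkage(squareform(dist), method='average')

# Passing the square matrix: rows are treated as observation vectors.
z_from_rows = hc.linkage(corr, method='average')

# The same thing spelled out: Euclidean distances between the rows of corr.
z_rows_explicit = hc.linkage(pdist(corr, metric='euclidean'), method='average')
```

Here z_from_rows matches z_rows_explicit, which is why the two approaches can produce different dendrograms.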

Anirudh R answered Sep 08 '25 14:09



If you are sure the matrix is symmetric, pass checks=False to skip the validation:

corr_condensed = hc.distance.squareform(1-corr, checks=False)
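
Note that checks=False only skips the validation; it does not repair the matrix. As a rough illustration, assuming the asymmetry comes from NaN values (the 3x3 matrix below is made up), the NaN values pass straight through squareform and the error surfaces one step later in linkage instead:

```python
import numpy as np
from scipy.cluster import hierarchy as hc
from scipy.spatial.distance import squareform

# Hypothetical correlation matrix with a NaN row/column.
corr = np.array([
    [1.0, 0.8, np.nan],
    [0.8, 1.0, np.nan],
    [np.nan, np.nan, np.nan],
])

# With checks=False the conversion succeeds, but the NaN values survive.
condensed = squareform(1 - corr, checks=False)
contains_nan = bool(np.isnan(condensed).any())

# linkage requires finite distances, so it rejects the condensed vector.
try:
    hc.linkage(condensed, method='average')
    linkage_error = None
except ValueError as exc:
    linkage_error = str(exc)
```

So this option is only safe when the matrix is finite and the asymmetry is negligible.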

olubode answered Sep 08 '25 14:09
