
cosine distance between two matrices

Take two matrices, arr1 and arr2, of size mxn and pxn respectively. I'm trying to find the cosine distance between their respective rows as an mxp matrix. Essentially, I want to take the pairwise dot product of the rows, then divide by the outer product of the norms of the rows.

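import numpy as np

def cosine_distance(arr1, arr2):
    # Pairwise dot products of the rows: shape (m, p)
    numerator = np.dot(arr1, arr2.T)
    # Outer product of the row-wise L2 norms: shape (m, p)
    denominator = np.outer(
        np.sqrt(np.square(arr1).sum(1)),
        np.sqrt(np.square(arr2).sum(1)))
    # nan_to_num replaces NaN (from 0/0 on all-zero rows) with 0
    return np.nan_to_num(np.divide(numerator, denominator))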

I think this should return an mxp matrix with entries in [-1.0, 1.0], but for some reason I'm getting values outside that interval. I suspect that one of these numpy functions is doing something other than what I think it does.

asked Sep 09 '25 10:09 by Kevin Johnson



1 Answer

It sounds like you need to divide by the outer product of the L2 norms of your arrays of vectors:

arr1.dot(arr2.T) / np.outer(np.linalg.norm(arr1, axis=1),
                            np.linalg.norm(arr2, axis=1))

e.g.

In [4]: arr1 = np.array([[1., -2., 3.],
                         [0., 0.5, 2.],
                         [-1., 1.5, 1.5],
                         [2., -0.5, 0.]])

In [5]: arr2 = np.array([[0., -3., 1.],
                         [1.5, 0.25, 1.]])

In [6]: arr1.dot(arr2.T)/np.outer(np.linalg.norm(arr1, axis=1),
                                  np.linalg.norm(arr2, axis=1))
Out[6]: 
array([[ 0.76063883,  0.58737848],
       [ 0.0766965 ,  0.56635211],
       [-0.40451992,  0.08785611],
       [ 0.2300895 ,  0.7662411 ]])
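The same computation can be wrapped in a reusable function. The following is only a sketch under a few assumptions that are not part of the question or the answer above: the name cosine_similarity_matrix is made up for illustration, inputs are cast to float (which avoids integer overflow if the arrays happen to have an integer dtype), all-zero rows are given a similarity of 0, and the result is clipped to [-1, 1] to absorb floating-point round-off. Note that the quantity computed here is the cosine similarity; if a distance is wanted, the usual convention is 1 - sim.

```python
import numpy as np

def cosine_similarity_matrix(arr1, arr2):
    """Pairwise cosine similarity between rows of arr1 (m x n) and arr2 (p x n).

    Hypothetical helper for illustration; returns an (m, p) array.
    """
    # Assumption: cast to float so integer inputs cannot overflow.
    a = np.asarray(arr1, dtype=float)
    b = np.asarray(arr2, dtype=float)

    # Outer product of the row-wise L2 norms, shape (m, p).
    norms = np.outer(np.linalg.norm(a, axis=1), np.linalg.norm(b, axis=1))

    # Divide only where the norm product is non-zero;
    # pairs involving an all-zero row keep a similarity of 0.
    sim = np.zeros_like(norms)
    np.divide(a.dot(b.T), norms, out=sim, where=norms > 0)

    # Clip away round-off such as 1.0000000000000002.
    return np.clip(sim, -1.0, 1.0)

arr1 = np.array([[1., -2., 3.],
                 [0., 0.5, 2.],
                 [-1., 1.5, 1.5],
                 [2., -0.5, 0.]])
arr2 = np.array([[0., -3., 1.],
                 [1.5, 0.25, 1.]])

sim = cosine_similarity_matrix(arr1, arr2)
print(sim.shape)  # (4, 2)
```

Using the where= and out= arguments of np.divide sidesteps the divide-by-zero warning that a plain division followed by np.nan_to_num would raise for all-zero rows.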
answered Sep 10 '25 23:09 by xnx
