
Pairwise distance python (one base vector against many others)

I have a base vector (consisting of 1's and 0's) and I want to find the cosine distance to 50,000 other vectors (also consisting of 1's and 0's). I found many ways to calculate an entire matrix of pairwise distances, but I'm not interested in that. I only need the 50,000 distances between my base vector and each of the other vectors, which I then want to sort to find the top 5. What's the fastest way to achieve this?

Green asked Dec 08 '25 05:12


1 Answer

The vectorized operation is exactly the same as computing each distance individually, as long as you are careful with the axes. Here each "other" vector is stored in its own row:

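import numpy
others = numpy.random.randint(0, 2, (10, 10))  # one "other" vector per row
base = numpy.random.randint(0, 2, (10, 1))     # base vector as a column
# Cosine similarity of base against every row (cosine distance is 1 - d); row norms need axis=1
d = numpy.inner(base.T, others) / (numpy.linalg.norm(others, axis=1) * numpy.linalg.norm(base))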
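To connect this back to the question (50,000 vectors, top 5), here is a minimal sketch rather than a definitive implementation. The helper name top_k_cosine, the vector length of 100, and the choice to treat all-zero vectors as distance 1.0 are assumptions of mine, not part of the original answer. It uses numpy.argpartition so that only the k smallest distances get sorted instead of all 50,000.

```python
import numpy as np

def top_k_cosine(base, others, k=5):
    """Return (indices, distances) of the k rows of `others` closest to `base`.

    Hypothetical helper for illustration; `others` holds one vector per row.
    """
    base = np.asarray(base, dtype=np.float64).ravel()
    others = np.asarray(others, dtype=np.float64)

    dots = others @ base  # dot product of base with every row, shape (n,)
    norms = np.linalg.norm(others, axis=1) * np.linalg.norm(base)

    # Assumption: an all-zero vector has no defined cosine, so its similarity
    # is set to 0, which gives it a distance of 1.0.
    sim = np.divide(dots, norms, out=np.zeros_like(dots), where=norms > 0)
    dist = 1.0 - sim

    k = min(k, dist.size)
    # argpartition puts the k smallest distances first (in arbitrary order) ...
    idx = np.argpartition(dist, k - 1)[:k]
    # ... so only those k entries need a full sort.
    idx = idx[np.argsort(dist[idx])]
    return idx, dist[idx]

# Example data shaped like the question: 50,000 binary vectors (length 100 is assumed).
rng = np.random.default_rng(0)
others = rng.integers(0, 2, size=(50_000, 100))
base = rng.integers(0, 2, size=100)

idx, dist = top_k_cosine(base, others, k=5)
print(idx, dist)
```

If the whole distance array is needed anyway, a plain numpy.argsort(dist)[:5] gives the same indices (up to ties); the partition step only saves the cost of sorting the remaining entries.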
Benjamin answered Dec 10 '25 17:12