I have two batches of 128 embedding vectors each:
image.shape = torch.Size([128, 512])
text.shape = torch.Size([128, 512])
I want to calculate the tensor containing the cosine similarity between all pairs of elements, i.e.:
cosine.shape = torch.Size([128, 128])
where the first row is the cosine similarity between the 1st image and all 128 texts, and so on.
At the moment I'm doing the following, but the result is a one-dimensional tensor containing only 128 cosine similarities:
import torch

cosine_similarity = torch.nn.CosineSimilarity()  # default dim=1: compares row i with row i
cosine = cosine_similarity(image, text)          # shape: [128], not [128, 128]
How can I do this? I tried transposing text, but that didn't work.
A simple and elegant solution:
import torch.nn.functional as F
similarity_matrix = F.cosine_similarity(image.unsqueeze(1), text.unsqueeze(0), dim=2)
similarity_matrix has the shape 128x128.
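For a quick sanity check, here is a minimal, self-contained sketch (random tensors stand in for your embeddings; the index values are arbitrary):

import torch
import torch.nn.functional as F

image = torch.randn(128, 512)  # placeholder image embeddings
text = torch.randn(128, 512)   # placeholder text embeddings

similarity_matrix = F.cosine_similarity(image.unsqueeze(1), text.unsqueeze(0), dim=2)
print(similarity_matrix.shape)  # torch.Size([128, 128])

# Entry [i, j] is the cosine similarity between image i and text j:
i, j = 0, 5
expected = F.cosine_similarity(image[i], text[j], dim=0)
print(torch.allclose(similarity_matrix[i, j], expected))  # True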
Explanation:
As explained in its documentation, F.cosine_similarity(x1, x2, dim) returns the cosine similarity between x1 and x2 along dim, as long as x1 and x2 can be broadcast to a common shape.
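Continuing the snippet above, the shapes line up like this:

x1 = image.unsqueeze(1)  # shape: [128, 1, 512], one image per row
x2 = text.unsqueeze(0)   # shape: [1, 128, 512], one text per column
# Broadcasting expands both to [128, 128, 512]; reducing over dim=2
# leaves one cosine similarity per (image, text) pair.
out = F.cosine_similarity(x1, x2, dim=2)  # shape: [128, 128]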
Your original tensors image and text each have the shape 128x512, so after applying F.cosine_similarity on dim=1 you get a one-dimensional tensor of size 128 as output.
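That one-dimensional result is exactly the diagonal of the full similarity matrix, since dim=1 pairs row i of image with row i of text. Continuing the snippet above:

diag = F.cosine_similarity(image, text, dim=1)  # shape: [128], one value per row pair
print(torch.allclose(diag, similarity_matrix.diag()))  # True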
To get an output of shape 128x128, you must make image and text broadcastable to that shape. In other words, you must reshape image to 128x1x512 and text to 1x128x512, so that entry (i, j) of the output pairs the i-th image with the j-th text, which matches the row ordering you asked for.
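As a side note (this is an alternative, not part of the original answer): cosine similarity is just the dot product of unit vectors, so you can also L2-normalize both sets of embeddings and take a matrix product. This avoids broadcasting to a 128x128x512 intermediate, which can matter for larger batches:

sim_via_matmul = F.normalize(image, dim=1) @ F.normalize(text, dim=1).T  # shape: [128, 128]
print(torch.allclose(sim_via_matmul, similarity_matrix, atol=1e-6))  # True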
Reference:
Item 6 on this blog post