
Visualize the cosine similarity scores calculated using pretrained word embeddings in spaCy

I have used spaCy's pretrained model 'en_core_web_lg' to compute the cosine similarity between a group of values and a group of attributes. I want to visualize how close each word is to the others, much like a clustering plot.

Here is the link to the table that contains the similarity scores for each value vs. attribute.

The columns are the attributes I am computing similarity scores against, and the rows are the values I am trying to classify, i.e. for each value I want to find the attribute it is most likely to belong to.

This is the output I am trying to achieve. Please take a look at it.

Arpit Sah asked Dec 11 '25 02:12


1 Answer

If you want a plot similar to this t-SNE plot, you need to reduce the dimensionality of your word vectors to 2 dimensions.

To do that, apply a dimensionality reduction algorithm such as t-SNE (which is implemented in scikit-learn) to the word vectors you want to plot.

The similarity scores alone are not sufficient for this; you need the full word vectors.
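To illustrate why: each cell of a value-vs-attribute table collapses two 300-dimensional vectors into a single number, so the original positions cannot be recovered from the table. A minimal sketch of how such a table could be computed, assuming the value and attribute vectors are already stacked as NumPy arrays (the function name `cosine_similarity_table` is hypothetical, not part of spaCy):

```python
import numpy as np


def cosine_similarity_table(value_vecs, attr_vecs):
    """Return an (n_values, n_attributes) table of cosine similarities.

    Both inputs are 2-D arrays with one word vector per row.
    Note: an all-zero row (e.g. an out-of-vocabulary word) yields NaN.
    """
    v = np.asarray(value_vecs, dtype=np.float64)
    a = np.asarray(attr_vecs, dtype=np.float64)
    # Normalise every row to unit length, then take all pairwise dot products.
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    return v @ a.T
```

The table is useful for picking the most similar attribute per value (an `argmax` along each row), but for a 2D scatter plot you would keep the vectors themselves.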

Here is a nice Kaggle tutorial about using t-SNE to visualize word vectors. You can adapt it by selecting only the words you are interested in.
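As a rough sketch of the approach (not the tutorial's code), the snippet below looks up spaCy vectors for a chosen word list, reduces them to 2 dimensions with scikit-learn's `TSNE`, and labels each point. The helper names (`load_vectors`, `reduce_to_2d`, `plot_words`), the perplexity heuristic, and the output file name are assumptions made for illustration; it also assumes spaCy, the 'en_core_web_lg' model, scikit-learn and matplotlib are installed.

```python
import numpy as np
from sklearn.manifold import TSNE


def load_vectors(words, model_name="en_core_web_lg"):
    """Look up one spaCy vector per word and stack them into a 2-D array."""
    import spacy  # imported lazily so the other helpers work without spaCy

    nlp = spacy.load(model_name)
    return np.vstack([nlp.vocab[w].vector for w in words])


def reduce_to_2d(vectors, random_state=0):
    """Project an (n_words, n_dims) array down to (n_words, 2) with t-SNE."""
    vectors = np.asarray(vectors, dtype=np.float64)
    # scikit-learn requires perplexity < n_samples, so short word lists
    # need a small value. This heuristic is an assumption; tune it.
    perplexity = min(30.0, max(1.0, (len(vectors) - 1) / 3))
    tsne = TSNE(
        n_components=2,
        perplexity=perplexity,
        init="pca",
        random_state=random_state,
    )
    return tsne.fit_transform(vectors)


def plot_words(coords, words, path="tsne_words.png"):
    """Save a scatter plot with each point labelled by its word."""
    import matplotlib

    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(8, 6))
    ax.scatter(coords[:, 0], coords[:, 1])
    for (x, y), word in zip(coords, words):
        ax.annotate(word, (x, y), xytext=(4, 4), textcoords="offset points")
    fig.savefig(path, dpi=150)
    plt.close(fig)
```

With a list containing both your values and your attributes, calling `plot_words(reduce_to_2d(load_vectors(words)), words)` would write the plot to `tsne_words.png`. Keep in mind that t-SNE preserves local neighbourhoods rather than exact distances, so the layout shows which words group together, not the similarity scores themselves.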

Stefano Fiorucci - anakin87 answered Dec 14 '25 04:12