
Best way to handle OOV words when using pretrained embeddings in PyTorch

I am using a pretrained word2vec embedding in PyTorch (following code here). However, it does not seem to handle unseen (out-of-vocabulary) words. Is there a good way to solve this?

Mr.cysl asked Oct 29 '25 12:10



1 Answer

FastText builds character n-gram vectors as part of model training. When it encounters an OOV word, it sums the vectors of the character n-grams in that word to produce a vector for the word. You can find more detail here.
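To make the idea concrete, here is a minimal sketch of the n-gram summing step in plain Python. It is not FastText's actual implementation: the `ngram_vector` function is a hypothetical stand-in for a trained n-gram table, and `DIM`, `BUCKETS`, and the 3 to 6 character n-gram range are assumptions chosen for illustration.

```python
import zlib

DIM = 4         # toy embedding size (assumption)
BUCKETS = 1000  # toy number of hash buckets (assumption)


def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of a word wrapped in boundary markers."""
    wrapped = f"<{word}>"
    return [
        wrapped[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(wrapped) - n + 1)
    ]


def ngram_vector(ngram):
    """Hypothetical stand-in for a trained n-gram table.

    A real model would look up a learned vector; here each hash bucket
    maps to a deterministic pseudo-vector so the sketch runs on its own.
    """
    bucket = zlib.crc32(ngram.encode("utf-8")) % BUCKETS
    return [((bucket * (k + 1)) % 7 - 3) / 10.0 for k in range(DIM)]


def oov_vector(word):
    """Build a vector for an unseen word by summing its n-gram vectors."""
    vec = [0.0] * DIM
    for ngram in char_ngrams(word):
        for k, value in enumerate(ngram_vector(ngram)):
            vec[k] += value
    return vec


print(char_ngrams("cat", 3, 3))
print(oov_vector("unseenword"))
```

With a real trained model you would not write this yourself. The `fasttext` Python package exposes `get_word_vector`, which returns a vector for words outside the training vocabulary as well, and the result can be copied into the weight matrix of a PyTorch `nn.Embedding`.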

polm23 answered Nov 01 '25 06:11
