I have a multi-task encoder/decoder model in PyTorch with a (trainable) torch.nn.Embedding layer at the input.
In one particular task, I'd like to pre-train the model self-supervised (to reconstruct masked input data) and use it for inference (to fill in gaps in data).
I guess at training time I can just measure the loss as the distance between the input embedding and the output embedding... But at inference time, how do I invert an Embedding to recover the category/token an output vector corresponds to? I can't see e.g. a "nearest" function on the Embedding class...
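To make the training part concrete, here's a sketch of the loss I have in mind (shapes, names, and the MSE choice are made up for illustration):

import torch
import torch.nn.functional as F

embedding = torch.nn.Embedding(1000, 100)

# Hypothetical shapes: decoder_out is the model's vector at each masked
# position, target_ids are the token ids that were masked out
decoder_out = torch.randn(8, 100)
target_ids = torch.randint(0, 1000, (8,))

# Distance between output vectors and target embeddings; detaching the
# targets stops the loss from trivially pulling the embedding table
# toward the outputs (the table still trains via the input side)
loss = F.mse_loss(decoder_out, embedding(target_ids).detach())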
You can do it quite easily:
import torch

embeddings = torch.nn.Embedding(1000, 100)  # 1000 tokens, 100-dim
my_sample = torch.randn(1, 100)             # output vector to invert

# Euclidean distance from the sample to every row of the embedding table;
# detach() keeps the lookup out of the autograd graph
distance = torch.linalg.norm(embeddings.weight.detach() - my_sample, dim=1)
nearest = torch.argmin(distance)            # reconstructed token id
Assuming a vocabulary of 1000 tokens with embedding dimensionality 100, this returns the index of the nearest embedding by Euclidean distance. You could use other distance metrics (e.g. cosine similarity) in the same manner.
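If you need this for a whole batch of output vectors at once, the same idea works in one shot; a sketch (the batch size and metric choices are just for illustration):

import torch

embeddings = torch.nn.Embedding(1000, 100)
outputs = torch.randn(32, 100)  # a batch of decoder output vectors

# Euclidean: all pairwise distances at once, shape (32, 1000)
token_ids = torch.argmin(torch.cdist(outputs, embeddings.weight.detach()), dim=1)

# Cosine: normalize rows, then a single matmul gives all similarities
emb = torch.nn.functional.normalize(embeddings.weight.detach(), dim=1)
out = torch.nn.functional.normalize(outputs, dim=1)
token_ids_cos = torch.argmax(out @ emb.T, dim=1)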