I have a sequence of 12 words which I represent using a 12x256 matrix (using word embeddings). Let us refer to these embeddings as e_1, e_2, ..., e_12. I wish to take this as input and output a 1x256 vector. However, I don't want to use a (12x256) x 256 dense layer. Instead, I want to create the output embedding as a weighted summation of the 12 embeddings:
out = w_1 * e_1 + w_2 * e_2 + ... + w_12 * e_12
where the w_i's are scalars (so there is weight sharing).
How can I create trainable w_i's in PyTorch? I am new to this and only familiar with standard modules like nn.Linear.
You can implement this with a 1D convolution with kernel_size=1:
import torch

batch_size = 2
inputs = torch.randn(batch_size, 12, 256)

# Treat the 12 embeddings as input "channels"; a single output channel with
# kernel_size=1 computes a weighted sum across them. bias=False keeps it a
# pure weighted sum with exactly 12 parameters.
aggregation_layer = torch.nn.Conv1d(in_channels=12, out_channels=1, kernel_size=1, bias=False)
weighted_sum = aggregation_layer(inputs)  # shape: (batch_size, 1, 256)
Such a convolution has 12 parameters (one per input embedding, since bias=False). Each parameter corresponds to a w_i in the formula you provided. In other words, this convolution slides over the dimension of size 256 and, at each position, sums the 12 embeddings with learnable weights.
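To see the correspondence concretely, here is a small sketch (assumed, not part of the original answer) that checks the 12 convolution weights reproduce the explicit weighted sum:

import torch

aggregation_layer = torch.nn.Conv1d(in_channels=12, out_channels=1, kernel_size=1, bias=False)
inputs = torch.randn(2, 12, 256)

conv_out = aggregation_layer(inputs)           # (2, 1, 256)
w = aggregation_layer.weight.view(12)          # the 12 scalar weights w_i
# Explicit sum_i w_i * e_i over the 12 embeddings
manual = (inputs * w.view(1, 12, 1)).sum(dim=1, keepdim=True)

print(torch.allclose(conv_out, manual, atol=1e-6))  # True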
This should do the trick for a weighted average:
import torch
from torch import nn

class LinearWeightedAvg(nn.Module):
    def __init__(self, n_inputs):
        super(LinearWeightedAvg, self).__init__()
        # One trainable scalar weight per input embedding
        self.weights = nn.ParameterList([nn.Parameter(torch.randn(1)) for i in range(n_inputs)])

    def forward(self, input):
        # Sum each embedding scaled by its own learnable weight
        res = 0
        for emb_idx, emb in enumerate(input):
            res += emb * self.weights[emb_idx]
        return res

example_data = torch.rand(12, 256)
wa_layer = LinearWeightedAvg(12)
res = wa_layer(example_data)
print(res.shape)  # torch.Size([256])
Answer inspired by a previous answer I received on the PyTorch forums:
https://discuss.pytorch.org/t/dense-layer-with-different-inputs-for-each-neuron/47348
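If you want the 12 scalars stored in a single parameter tensor so the sum is vectorized and also works on batched input of shape (batch, 12, 256), a minimal variant (an assumed sketch, not from the linked answer) could look like this:

import torch
from torch import nn

class WeightedSum(nn.Module):
    def __init__(self, n_inputs):
        super().__init__()
        # All n_inputs scalar weights in one trainable tensor
        self.weights = nn.Parameter(torch.randn(n_inputs))

    def forward(self, x):
        # x: (..., n_inputs, emb_dim) -> weighted sum over the n_inputs axis
        return (x * self.weights.view(-1, 1)).sum(dim=-2)

layer = WeightedSum(12)
out = layer(torch.rand(2, 12, 256))
print(out.shape)  # torch.Size([2, 256])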