I am doing the following operation:
energy.masked_fill(mask == 0, float("-1e20")) 
My Python traceback is below:
    File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "seq_sum.py", line 418, in forward
    enc_src = self.encoder(src, src_mask)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "seq_sum.py", line 71, in forward
    src = layer(src, src_mask)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "seq_sum.py", line 110, in forward
    _src, _ = self.self_attention(src, src, src, src_mask)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "seq_sum.py", line 191, in forward
    energy =  energy.masked_fill(mask == 0, float("-1e20"))
RuntimeError: The size of tensor a (1024) must match the size of tensor b (512) at non-singleton dimension 3
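For context, here is a minimal standalone example that reproduces the same kind of broadcast mismatch in masked_fill (the shapes are my guesses, not taken from the model):

    import torch

    # energy with a trailing dimension of 1024 (assumed, to mirror the error)
    energy = torch.randn(2, 8, 512, 1024)
    # mask built from a source of length 512: [batch, 1, 1, src_len]
    mask = torch.ones(2, 1, 1, 512)

    try:
        # masked_fill broadcasts mask against energy; 1024 vs 512 in the last
        # (non-singleton) dimension cannot broadcast, so this raises
        energy = energy.masked_fill(mask == 0, float("-1e20"))
    except RuntimeError as e:
        print(e)  # The size of tensor a (1024) must match the size of tensor b (512) ...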
This is my attention layer's code:
    Q = self.fc_q(query)
    K = self.fc_k(key)
    V = self.fc_v(value)
    
    #Q = [batch size, query len, hid dim]
    #K = [batch size, key len, hid dim]
    #V = [batch size, value len, hid dim]
            
    # Q = Q.view(batch_size, -1, self.n_heads, self.head_dim).permute(0, 2, 1, 3)
    # K = K.view(batch_size, -1, self.n_heads, self.head_dim).permute(0, 2, 1, 3)
    # V = V.view(batch_size, -1, self.n_heads, self.head_dim).permute(0, 2, 1, 3)
    Q = Q.view(batch_size, -1, self.n_heads, self.head_dim).view(-1, 1024)
    K = K.view(batch_size, -1, self.n_heads, self.head_dim).view(-1, 1024)
    V = V.view(batch_size, -1, self.n_heads, self.head_dim).view(-1, 1024)
    energy = torch.matmul(Q, K.transpose(1,0)) / self.scale
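For comparison, the lines I commented out follow the usual multi-head layout, where energy stays 4-dimensional and the mask broadcasts over the heads and query positions. A rough sketch of the shapes that version produces (hid_dim = 512, n_heads = 8, and batch_size = 2 are assumptions):

    import torch

    batch_size, src_len, hid_dim, n_heads = 2, 512, 512, 8   # assumed values
    head_dim = hid_dim // n_heads

    Q = torch.randn(batch_size, src_len, hid_dim)
    K = torch.randn(batch_size, src_len, hid_dim)

    # split hid_dim into heads and move the head axis forward:
    # [batch, len, hid] -> [batch, n_heads, len, head_dim]
    Q = Q.view(batch_size, -1, n_heads, head_dim).permute(0, 2, 1, 3)
    K = K.view(batch_size, -1, n_heads, head_dim).permute(0, 2, 1, 3)

    scale = head_dim ** 0.5
    # energy: [batch, n_heads, query_len, key_len]
    energy = torch.matmul(Q, K.permute(0, 1, 3, 2)) / scale

    # mask: [batch, 1, 1, src_len] broadcasts against energy
    mask = torch.ones(batch_size, 1, 1, src_len, dtype=torch.bool)
    energy = energy.masked_fill(mask == 0, float("-1e20"))
    print(energy.shape)   # torch.Size([2, 8, 512, 512])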
I am following the GitHub code below (a seq2seq PyTorch implementation) to do my seq-to-seq operation. The actual testing code, which tests a sequence of 1024 to a 1024 output, is available at the location below.
In the second example I tried here, I have commented out pos_embedding due to a CUDA error with a large index (RuntimeError: cuda runtime error (59)).
I took a look at your code (which, by the way, didn't run with seq_len = 10), and the problem is that you hard-coded batch_size to 1 (line 143) in your code.
It looks like the example you are trying to run the model on has batch_size = 2.
Just uncomment the previous line where you wrote batch_size = query.shape[0] and everything runs fine.
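In other words, the layer should take the batch size from the input rather than a constant. Here is a cut-down sketch of what I mean (dropout and the output projection are omitted, and the parameter names are assumptions based on your snippet):

    import torch
    import torch.nn as nn

    class MultiHeadAttentionLayer(nn.Module):
        def __init__(self, hid_dim, n_heads):
            super().__init__()
            self.n_heads = n_heads
            self.head_dim = hid_dim // n_heads
            self.fc_q = nn.Linear(hid_dim, hid_dim)
            self.fc_k = nn.Linear(hid_dim, hid_dim)
            self.fc_v = nn.Linear(hid_dim, hid_dim)
            self.scale = self.head_dim ** 0.5

        def forward(self, query, key, value, mask=None):
            # batch_size = 1                # <- the hard-coded value that breaks broadcasting
            batch_size = query.shape[0]     # <- take it from the input instead

            # [batch, len, hid] -> [batch, n_heads, len, head_dim]
            Q = self.fc_q(query).view(batch_size, -1, self.n_heads, self.head_dim).permute(0, 2, 1, 3)
            K = self.fc_k(key).view(batch_size, -1, self.n_heads, self.head_dim).permute(0, 2, 1, 3)
            V = self.fc_v(value).view(batch_size, -1, self.n_heads, self.head_dim).permute(0, 2, 1, 3)

            # energy: [batch, n_heads, query_len, key_len]
            energy = torch.matmul(Q, K.permute(0, 1, 3, 2)) / self.scale
            if mask is not None:
                energy = energy.masked_fill(mask == 0, float("-1e20"))

            attention = torch.softmax(energy, dim=-1)
            x = torch.matmul(attention, V).permute(0, 2, 1, 3).contiguous()
            return x.view(batch_size, -1, self.n_heads * self.head_dim), attention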