
Pytorch gradients exist but weights not updating

I have a deep convolutional network with an LSTM layer. After the LSTM layer, the network splits into two branches (two separate linear layers) that compute different functions, and their results are added together to form the final network output.

To compute the gradients and update the weights, I perform a few operations on the network output and then compute the loss between the derived value and a target value calculated outside the function:

def update(output, target):
    # target output is calculated outside the function
    # operations on output
    loss(output, target).backward()
    self.optimizer.step()

The network produces a loss (sometimes very small in magnitude, sometimes several orders of magnitude larger); for example, a few of the losses:

tensor(1.00000e-04 *
   5.7420)
tensor(2.7190)
tensor(0.9684)

It also has gradients as calculated here:

for param in self.parameters():
    print(param.grad.data.sum())

Which outputs:

tensor(1.00000e-03 *
   1.9996)
tensor(1.00000e-03 *
   2.6101)
tensor(1.00000e-02 *
   -1.3879)
tensor(1.00000e-03 *
   -4.5834)
tensor(1.00000e-02 *
   2.1762)
tensor(1.00000e-03 *
   3.6246)
tensor(1.00000e-03 *
   6.6234)
tensor(1.00000e-02 *
   2.9373)
tensor(1.00000e-02 *
   1.2680)
tensor(1.00000e-03 *
   1.8791)
tensor(1.00000e-02 *
   1.7322)
tensor(1.00000e-02 *
   1.7322)
tensor(0.)
tensor(0.)
tensor(1.00000e-03 *
   -6.7885)
tensor(1.00000e-02 *
   9.7793)

And:

tensor(2.4620)
tensor(0.9544)
tensor(-26.2465)
tensor(0.2280)
tensor(-219.2602)
tensor(-2.7870)
tensor(-50.8203)
tensor(3.2548)
tensor(19.6163)
tensor(-18.6029)
tensor(3.8564)
tensor(3.8564)
tensor(0.)
tensor(0.)
tensor(0.8040)
tensor(-0.1157)

But when I compare the weights before and after running the optimizer step, I find that they are identical.

Code to see if weights change:

before = list(neuralnet.parameters())
neuralnet.update()
after = list(neuralnet.parameters())
for i in range(len(before)):
    print(torch.equal(before[i].data, after[i].data))

The above returns True for each iteration.

asked Oct 31 '25 by user10011538

2 Answers

https://discuss.pytorch.org/t/gradients-exist-but-weights-not-updating/20484/2?u=wr01 has the answer I sought. The problem was that neuralnet.parameters() does not return copies of the parameters: the list holds references to the live parameter tensors, so when the optimizer updated the weights, the tensors stored in the before variable were updated too.
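For readers who want to verify this themselves, here is a minimal, self-contained sketch (using a hypothetical toy nn.Linear model rather than the asker's network) showing that list(model.parameters()) stores references to the live tensors, while .detach().clone() stores real copies that reveal the update:

import torch
from torch import nn, optim

# Hypothetical toy model, not the asker's network.
model = nn.Linear(3, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1)

before_refs = list(model.parameters())                             # references to live tensors
before_copies = [p.detach().clone() for p in model.parameters()]   # real copies

loss = model(torch.randn(8, 3)).pow(2).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()

for ref, copy, p in zip(before_refs, before_copies, model.parameters()):
    print(torch.equal(ref, p))    # True  -> the "before" reference changed along with the weights
    print(torch.equal(copy, p))   # False -> the copy still holds the old values, so the update is visible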

answered Nov 02 '25 by user10011538


When initializing parameters, wrap them in the torch.nn.Parameter() class so the optimizer will update them. If you are using PyTorch < 0.4, use torch.autograd.Variable() instead. For example:

import torch
import torch.utils.data
from torch import nn, optim
from torch.nn import functional as F

class TEMP(nn.Module):

    # Whole architecture
    def __init__(self):
        super(TEMP, self).__init__()
        self.input = nn.Parameter(torch.ones(1, requires_grad=True))  # <---- wrap it like this

    def forward(self, x):
        wt = self.input
        y = wt * x
        return y

model = TEMP()
optimizer = optim.Adam(model.parameters(), lr=0.001)
x = torch.randn(100)
y = 5 * x
loss = torch.sum((y - model(x)).pow(2))
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(model.input)

Also note that if you are creating a plain tensor in PyTorch >= 0.4, set requires_grad=True if you want that tensor to be updated.
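As a small standalone illustration of that point (a hypothetical bare tensor, not part of the model above), a plain tensor only receives gradients and optimizer updates when requires_grad=True is set:

import torch
from torch import optim

# Hypothetical standalone sketch: optimizing a bare tensor (PyTorch >= 0.4).
# If requires_grad were False, loss.backward() would fail because no leaf
# tensor in the graph requires gradients, and w would never be updated.
w = torch.ones(1, requires_grad=True)
optimizer = optim.SGD([w], lr=0.001)

x = torch.randn(100)
y = 5 * x

loss = torch.sum((y - w * x).pow(2))
optimizer.zero_grad()
loss.backward()
optimizer.step()

print(w)       # now different from its initial value of 1.0
print(w.grad)  # the gradient that drove the update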

answered Nov 02 '25 by Gaurav Shrivastava


