 

Gradient clipping in pytorch has no effect (Gradient exploding still happens)

I have an exploding gradient problem when training for 150-200 epochs with batch size = 256 and about 30-60 minibatches (this depends on my specific config). The gradients still explode even after I added the clipping code (attached as an image). As the plots show, at around step 40k the gradients swing between roughly ±20k, ±40k and ±60k respectively. I don't know why this happens, since I use clip_grad_value_ as shown above. I am also using learning rate decay, from 0.01 down to about 0.008 at step 40k.

Do I need to update the weight parameters myself (as in the attached image)? I think optimizer.step() should do that job, and clip_grad_value_ is an in-place operation, so I don't need to use its return value. Please correct me if I did anything wrong. Thank you very much.
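For reference, a minimal sketch of how I'm using it (placeholder model, data and hyperparameters, not my exact code):

```python
import torch
import torch.nn as nn

# Simplified stand-ins for my real model, data and config.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

for step in range(100):
    inputs = torch.randn(256, 10)   # batch size = 256, as in my setup
    targets = torch.randn(256, 1)

    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    # Clip each gradient element in place before the update;
    # the return value is not needed.
    torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=100)
    optimizer.step()
```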

Best regards, Mint

Puntawat Ponglertnapakorn, asked Oct 18 '25


1 Answer

Your code looks right, but try using a smaller value for the clip-value argument. Here's the documentation for the clip_grad_value_() function you're using, which shows that each individual element of the gradient is clamped so that its magnitude does not exceed the clip value.

You have the clip value set to 100, so if you have 100 parameters then abs(gradient).sum() can still be as large as 10,000 (100 * 100).
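To make that concrete, here's a small sketch (with illustrative values, not your actual model) showing that clip_grad_value_() only clamps each gradient element to [-clip_value, clip_value], so the summed magnitude across many elements can still be large, while a smaller clip value shrinks it proportionally:

```python
import torch
from torch.nn.utils import clip_grad_value_

# A parameter with 100 elements, all given a huge gradient (illustrative values).
p = torch.zeros(100, requires_grad=True)
p.grad = torch.full((100,), 1e6)

clip_grad_value_([p], clip_value=100)
print(p.grad.abs().max().item())  # 100.0   -> every element is within the clip value
print(p.grad.abs().sum().item())  # 10000.0 -> but the total magnitude is still 100 * 100

clip_grad_value_([p], clip_value=1)
print(p.grad.abs().sum().item())  # 100.0   -> a smaller clip value shrinks the total
```

(If the goal is to bound the overall gradient magnitude rather than each element, torch.nn.utils.clip_grad_norm_() does that instead.)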

Brian Bartoldson, answered Oct 21 '25