
PyTorch - Should the backward() call go inside the batch loop or the epoch loop?

When training neural-network models with PyTorch, does it make a difference where we place the backward() call? For example, which of the two versions below is correct?

Calculate the gradient per batch:

for e in range(epochs):
    loss_sum = 0.0
    for i in batches_list:
        out = nn_model(i)
        loss = loss_function(out, actual)
        loss_sum += loss.item()
        nn_model.zero_grad()   # clear gradients from the previous batch
        loss.backward()        # gradients for this batch only
        optimizer.step()       # one parameter update per batch
    loss_list.append(loss_sum / num_train_obs)

Calculate the gradient per epoch:

for e in range(epochs):
    loss_sum = 0.0
    for i in batches_list:
        out = nn_model(i)
        loss = loss_function(out, actual)
        loss_sum = loss_sum + loss   # keep the tensor (and its graph) so backward() works
    nn_model.zero_grad()
    loss_sum.backward()              # one backward over all batches of the epoch
    optimizer.step()                 # one parameter update per epoch
    loss_list.append(loss_sum.item() / num_train_obs)
wwj123 asked Oct 28 '25


1 Answer

Both are programmatically correct.

The first one is (mini-)batch gradient descent, and the second one is full-batch gradient descent, with a single parameter update per epoch. For most problems mini-batch gradient descent is what you want, so the first version is the right approach. It is also likely to train faster, because the parameters are updated after every batch instead of once per epoch.
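For reference, here is a minimal sketch of the idiomatic per-batch update loop; the model, loss, optimizer, and data below are toy placeholders, not the asker's actual objects:

import torch

# toy placeholders so the loop is runnable on its own
model = torch.nn.Linear(10, 1)
loss_function = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
data = [(torch.randn(32, 10), torch.randn(32, 1)) for _ in range(5)]

for epoch in range(3):
    epoch_loss = 0.0
    for x, y in data:
        optimizer.zero_grad()              # clear gradients left over from the previous batch
        loss = loss_function(model(x), y)
        loss.backward()                    # gradients for this batch only; the batch's graph is freed here
        optimizer.step()                   # one parameter update per batch
        epoch_loss += loss.item()
    print(epoch, epoch_loss / len(data))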

You may use the second approach if you really want full-batch gradient descent (which is rarely preferable when mini-batch gradient descent is available). Note, however, that the second version keeps the computation graph of every batch alive until the single backward() call at the end of the epoch (and .zero_grad() is called only once), so it can easily run out of memory.
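If you do want a single update per epoch, a more memory-friendly sketch (reusing the placeholder model, loss_function, optimizer, and data from the sketch above) is to call backward() on each batch's loss and step only once per epoch: gradients accumulate in .grad across backward() calls, so the summed per-batch gradients equal the gradient of the summed loss, but each batch's graph is freed as soon as its backward() runs:

for epoch in range(3):
    optimizer.zero_grad()                  # clear accumulated gradients once per epoch
    epoch_loss = 0.0
    for x, y in data:
        loss = loss_function(model(x), y)
        loss.backward()                    # gradients add up in .grad; this batch's graph is freed right away
        epoch_loss += loss.item()
    optimizer.step()                       # a single update from the summed gradients
    print(epoch, epoch_loss / len(data))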

Umang Gupta answered Oct 30 '25


