I'm following this tutorial for implementing the Backpropagation algorithm. However, I am stuck at implementing momentum for this algorithm.
Without Momentum, this is the code for weight update method:
def update_weights(network, row, l_rate):
    for i in range(len(network)):
        inputs = row[:-1]
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j]
            neuron['weights'][-1] += l_rate * neuron['delta']
And below is my implementation:
def updateWeights(network, row, l_rate, momentum=0.5):
    for i in range(len(network)):
        inputs = row[:-1]
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i-1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                previous_weight = neuron['weights'][j]
                neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j] + momentum * previous_weight
            previous_weight = neuron['weights'][-1]
            neuron['weights'][-1] += l_rate * neuron['delta'] + momentum * previous_weight
This gives me a Mathoverflow error since the weights are exponentially becoming too large over multiple epochs. I believe my previous_weight logic is wrong for the update.
Introduction of the momentum rate allows the attenuation of oscillations in the gradient descent.
Neural network momentum is a simple technique that often improves both training speed and accuracy. Training a neural network is the process of finding values for the weights and biases so that for a given set of input values, the computed output values closely match the known, correct, target values.
Momentum in neural networks is a variant of the stochastic gradient descent. It replaces the gradient with a momentum which is an aggregate of gradients as very well explained here. It is also the common name given to the momentum factor, as in your case.
Gradient Descent with Momentum takes small steps in directions where the gradients oscillate and take large steps along the direction where the past gradients have the same direction(same sign).
I'll give you a hint. You're multiplying momentum by the previous_weight in your implementation, which is another parameter of the network on the same step. This obviously blows up quickly.
What you should do instead is remember the whole update vector,
 l_rate * neuron['delta'] * inputs[j], on the previous backpropagation step and add it up. It might look something like this:
velocity[j] = l_rate * neuron['delta'] * inputs[j] + momentum * velocity[j]
neuron['weights'][j] += velocity[j]
... where velocity is an array of the same length as network, defined with a bigger scope than updateWeights and initialized with zeros. See this post for details.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With