 

Neural networks - why so many learning rules?

I'm starting out with neural networks, currently following mostly D. Kriesel's tutorial. Right at the beginning it introduces at least three (different?) learning rules (Hebbian learning, the delta rule, backpropagation) for supervised learning.

I might be missing something, but if the goal is merely to minimize the error, why not just apply gradient descent over Error(entire_set_of_weights)?

Edit: I must admit the answers still confuse me. It would help if someone could point out the actual difference between those methods, and the difference between them and straight gradient descent.

To stress the point: these learning rules seem to take the layered structure of the network into account. On the other hand, finding the minimum of Error(W) over the entire set of weights completely ignores it. How does that fit in?
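To make that concrete, here is roughly what I have in mind (toy code I wrote for illustration, not from the tutorial): flatten all the weights into one vector W and just follow a numerical gradient of the total error, with no reference to the layers at all.

    import numpy as np

    # Tiny 2-3-1 network; all 9 weights live in one flat vector W.
    def error(W, X, y):
        W1, W2 = W[:6].reshape(2, 3), W[6:].reshape(3, 1)
        hidden = np.tanh(X @ W1)
        output = hidden @ W2
        return np.mean((output - y) ** 2)

    # Naive central-difference gradient of Error(W) -- ignores the layering entirely.
    def numerical_gradient(f, W, eps=1e-5):
        grad = np.zeros_like(W)
        for i in range(len(W)):
            step = np.zeros_like(W)
            step[i] = eps
            grad[i] = (f(W + step) - f(W - step)) / (2 * eps)
        return grad

    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = np.array([[0.], [1.], [1.], [0.]])      # XOR, just as a toy target
    W = np.random.randn(9) * 0.1
    for _ in range(5000):
        W -= 0.5 * numerical_gradient(lambda w: error(w, X, y), W)

This is slow but it does descend the error surface, so I don't see what the layer-aware rules buy me.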

asked Nov 21 '25 by sold


1 Answer

One question is how to apportion the "blame" for an error. The classic Delta Rule (or LMS rule) is essentially gradient descent. When you apply the Delta Rule to a multilayer network, you get backprop. Other rules have been created for various reasons, including the desire for faster convergence, unsupervised learning, temporal problems, models that are believed to be closer to biology, and so on.
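To make the first point concrete, here is a small sketch (my own toy code, not anything from a library): the Delta/LMS update for a single linear unit is literally a gradient-descent step on the squared error of each example.

    import numpy as np

    def delta_rule_epoch(w, X, y, lr=0.01):
        # One pass of the Delta/LMS rule over the training set.
        for x_i, y_i in zip(X, y):
            output = np.dot(w, x_i)       # linear unit
            err = y_i - output            # the "blame" for this example
            w = w + lr * err * x_i        # identical to  w -= lr * d(0.5*err**2)/dw
        return w

Backprop is what you get when you push that same error signal backwards through the layers, so the "blame" for each hidden weight can be computed.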

On your specific question of "why not just gradient descent?": gradient descent may work for some problems, but many problems have local minima, which naive gradient descent will get stuck in. The initial response to that is to add a "momentum" term, so that you might "roll out" of a local minimum; that's pretty much the classic backprop algorithm.
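For illustration, a minimal sketch of that momentum idea (grad_fn is assumed to be any function returning dError/dW; the names are mine, not from a specific library):

    import numpy as np

    def train_with_momentum(w, grad_fn, lr=0.1, momentum=0.9, steps=1000):
        velocity = np.zeros_like(w)
        for _ in range(steps):
            # Keep a fraction of the previous update; the built-up "speed"
            # can carry the weights through a shallow local minimum.
            velocity = momentum * velocity - lr * grad_fn(w)
            w = w + velocity
        return w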

answered Nov 24 '25 by Larry OBrien


