 

Neural networks - why so many learning rules?

I'm starting out with neural networks, currently following mostly D. Kriesel's tutorial. Right at the beginning it introduces at least three (different?) learning rules (Hebbian learning, the delta rule, backpropagation) for supervised learning.

I might be missing something, but if the goal is merely to minimize the error, why not just apply gradient descent over Error(entire_set_of_weights)?

Edit: I must admit the answers still confuse me. It would help if someone could point out the actual difference between those methods, and the difference between them and straight gradient descent.

To stress the point: these learning rules seem to take the layered structure of the network into account. On the other hand, finding the minimum of Error(W) over the entire set of weights completely ignores it. How does that fit in?
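To make that concrete, here is roughly what I have in mind (toy code I wrote for illustration, not from the tutorial): flatten all the weights into one vector W and just follow a numerical gradient of the total error, with no reference to the layers at all.

    import numpy as np

    # Tiny 2-3-1 network; all 9 weights live in one flat vector W.
    def error(W, X, y):
        W1, W2 = W[:6].reshape(2, 3), W[6:].reshape(3, 1)
        hidden = np.tanh(X @ W1)
        output = hidden @ W2
        return np.mean((output - y) ** 2)

    # Naive central-difference gradient of Error(W) -- ignores the layering entirely.
    def numerical_gradient(f, W, eps=1e-5):
        grad = np.zeros_like(W)
        for i in range(len(W)):
            step = np.zeros_like(W)
            step[i] = eps
            grad[i] = (f(W + step) - f(W - step)) / (2 * eps)
        return grad

    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = np.array([[0.], [1.], [1.], [0.]])      # XOR, just as a toy target
    W = np.random.randn(9) * 0.1
    for _ in range(5000):
        W -= 0.5 * numerical_gradient(lambda w: error(w, X, y), W)

This is slow but it does descend the error surface, so I don't see what the layer-aware rules buy me.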

asked Nov 21 '25 by sold


1 Answer

One question is how to apportion the "blame" for an error. The classic Delta Rule (or LMS rule) is essentially gradient descent. When you apply the Delta Rule to a multilayer network, you get backprop. Other rules have been created for various reasons, including the desire for faster convergence, unsupervised learning, temporal problems, models that are believed to be closer to biology, and so on.
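To make the first point concrete, here is a small sketch (my own toy code, not anything from a library): the Delta/LMS update for a single linear unit is literally a gradient-descent step on the squared error of each example.

    import numpy as np

    def delta_rule_epoch(w, X, y, lr=0.01):
        # One pass of the Delta/LMS rule over the training set.
        for x_i, y_i in zip(X, y):
            output = np.dot(w, x_i)       # linear unit
            err = y_i - output            # the "blame" for this example
            w = w + lr * err * x_i        # identical to  w -= lr * d(0.5*err**2)/dw
        return w

Backprop is what you get when you push that same error signal backwards through the layers, so the "blame" for each hidden weight can be computed.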

On your specific question of "why not just gradient descent?": gradient descent may work for some problems, but many problems have local minima, which naive gradient descent will get stuck in. The initial response to that is to add a "momentum" term, so that you might "roll out" of a local minimum; that's pretty much the classic backprop algorithm.
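For illustration, a minimal sketch of that momentum idea (grad_fn is assumed to be any function returning dError/dW; the names are mine, not from a specific library):

    import numpy as np

    def train_with_momentum(w, grad_fn, lr=0.1, momentum=0.9, steps=1000):
        velocity = np.zeros_like(w)
        for _ in range(steps):
            # Keep a fraction of the previous update; the built-up "speed"
            # can carry the weights through a shallow local minimum.
            velocity = momentum * velocity - lr * grad_fn(w)
            w = w + velocity
        return w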

answered Nov 24 '25 by Larry OBrien


