Why does adding more layers to this neural net make the output worse?

I am just getting started with neural networks and using Synaptic to get started (I know I know, neural networks in JavaScript, gasp!).

This is the example code given in this section for creating a neural network for learning the XOR function:

var myPerceptron = new Architect.Perceptron(2, 3, 1);
var myTrainer = new Trainer(myPerceptron);

myTrainer.XOR();

console.log(myPerceptron.activate([0, 0])); // 0.0268581547421616
console.log(myPerceptron.activate([1, 0])); // 0.9829673642853368
console.log(myPerceptron.activate([0, 1])); // 0.9831714267395621
console.log(myPerceptron.activate([1, 1])); // 0.02128894618097928

I am experimenting with adding more layers and seeing what happens. Adding one additional hidden layer doesn't have much effect, but adding 2 layers makes the output identical regardless of the input.

var myPerceptron = new Architect.Perceptron(2, 3, 3, 3, 1);
var myTrainer = new Trainer(myPerceptron);

myTrainer.XOR();

console.log(myPerceptron.activate([0, 0])); // 0.521076904986927
console.log(myPerceptron.activate([1, 0])); // 0.5210769149857782
console.log(myPerceptron.activate([0, 1])); // 0.5210769118775331
console.log(myPerceptron.activate([1, 1])); // 0.5210769209325651

Why does this happen? Is this simply because a more complex network needs a lot more training, or is it because this kind of network is intrinsically not suitable for this kind of problem?



1 Answer

I am not very familiar with Synaptic (it does look kinda cool, though), but here are some general issues you could look into:

  • Weight initialization is important. Proper weight initialization lets gradients backpropagate through the network so that learning can actually occur. Is there an option to initialize weights in your network? Common schemes are the Xavier/Glorot initialization given in Understanding the difficulty of training deep feedforward neural networks (roughly, draw initial weights with variance $2/(n_{in} + n_{out})$) and, more recently, the He initialization from Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.

  • Is your step size, aka learning rate, too large? It seems like your network is outputting constant values. If you are using saturating nonlinearities (i.e. bounded activation functions like sigmoid or tanh), a large learning rate may push your nonlinearities into saturation; learning effectively halts, and the network can end up outputting constant values.

  • Related to the previous point: what type of nonlinearities are you using in your hidden layers? Again, if it's a saturating nonlinearity, this may be hindering your training. You can try rectified linear units (ReLUs), which have the form $f(x) = \max(0,x)$. They are unbounded, so they do not saturate, and they have gradient equal to 1 when $x > 0$. They can be interpreted as "activating" when the input is greater than 0; in that case they act like a switch and allow the gradient to propagate through. (A rough Synaptic sketch of this and the learning-rate tweak follows this list.)
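
For the last two points, here is a minimal sketch of how you might try a smaller learning rate, more iterations, and ReLU hidden units. The Trainer.train options follow Synaptic's README; the layer.set({ squash: ... }) call and the hand-rolled relu function are my assumptions about the API, so double-check them against your version:

var synaptic = require('synaptic');          // if running under Node
var Architect = synaptic.Architect;
var Trainer = synaptic.Trainer;

var myPerceptron = new Architect.Perceptron(2, 3, 3, 3, 1);

// (Assumption) a custom squash function: returns the ReLU value,
// or its derivative when derivate is true
function relu(x, derivate) {
  if (derivate) return x > 0 ? 1 : 0;
  return x > 0 ? x : 0;
}

// (Assumption) layer.set swaps the activation of every neuron in a layer
myPerceptron.layers.hidden.forEach(function (layer) {
  layer.set({ squash: relu });
});

// Train on XOR explicitly instead of myTrainer.XOR(), so we can pick the options ourselves
var trainingSet = [
  { input: [0, 0], output: [0] },
  { input: [0, 1], output: [1] },
  { input: [1, 0], output: [1] },
  { input: [1, 1], output: [0] }
];

var myTrainer = new Trainer(myPerceptron);
myTrainer.train(trainingSet, {
  rate: 0.1,          // a modest, explicit learning rate
  iterations: 100000, // deeper nets usually need more passes
  error: 0.005,       // stop early if the error gets this low
  shuffle: true,
  log: 10000          // print the error every 10000 iterations
});

console.log(myPerceptron.activate([0, 1])); // should move toward 1 if training converges

If the outputs are still stuck at a constant, try lowering the rate further or dropping back to a single hidden layer to confirm the rest of the setup works.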

There might be other issues that hopefully others can comment on as well. These are the 3 that immediately come to mind for me.

I am not familiar with Synaptic, so I am not sure how much control it gives you or what its defaults and parameters are.

Hope this helps!



