I have the following neural network, written in Keras with TensorFlow as the backend, which I'm running on Python 3.5 (Anaconda) on Windows 10:
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import SGD

model = Sequential()
model.add(Dense(100, input_dim=283, init='normal', activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(150, init='normal', activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(200, init='normal', activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(200, init='normal', activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(200, init='normal', activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(4, init='normal', activation='sigmoid'))
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
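The training call itself looks roughly like the sketch below (the arrays and batch size are placeholders standing in for my actual data, which I've omitted):

import numpy as np
from keras.utils.np_utils import to_categorical

# Placeholder data standing in for the real training set:
# 6120 samples, 283 input features, 4 roughly balanced classes
# (which is why the baseline accuracy is ~0.25).
X_train = np.random.rand(6120, 283).astype('float32')
y_train = to_categorical(np.random.randint(0, 4, size=6120), 4)  # one-hot targets, shape (6120, 4)

model.fit(X_train, y_train, nb_epoch=10000, batch_size=32, verbose=1)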
I'm training on my GPU. During training (10,000 epochs), the training accuracy steadily increases from 0.25 to somewhere between 0.7 and 0.9, before suddenly dropping and sticking at 0.25:
Epoch 1/10000
6120/6120 [==============================] - 1s - loss: 1.5329 - acc: 0.2665
Epoch 2/10000
6120/6120 [==============================] - 1s - loss: 1.2985 - acc: 0.3784
Epoch 3/10000
6120/6120 [==============================] - 1s - loss: 1.2259 - acc: 0.4891
Epoch 4/10000
6120/6120 [==============================] - 1s - loss: 1.1867 - acc: 0.5208
Epoch 5/10000
6120/6120 [==============================] - 1s - loss: 1.1494 - acc: 0.5199
Epoch 6/10000
6120/6120 [==============================] - 1s - loss: 1.1042 - acc: 0.4953
Epoch 7/10000
6120/6120 [==============================] - 1s - loss: 1.0491 - acc: 0.4982
Epoch 8/10000
6120/6120 [==============================] - 1s - loss: 1.0066 - acc: 0.5065
Epoch 9/10000
6120/6120 [==============================] - 1s - loss: 0.9749 - acc: 0.5338
Epoch 10/10000
6120/6120 [==============================] - 1s - loss: 0.9456 - acc: 0.5696
Epoch 11/10000
6120/6120 [==============================] - 1s - loss: 0.9252 - acc: 0.5995
Epoch 12/10000
6120/6120 [==============================] - 1s - loss: 0.9111 - acc: 0.6106
Epoch 13/10000
6120/6120 [==============================] - 1s - loss: 0.8772 - acc: 0.6160
Epoch 14/10000
6120/6120 [==============================] - 1s - loss: 0.8517 - acc: 0.6245
Epoch 15/10000
6120/6120 [==============================] - 1s - loss: 0.8170 - acc: 0.6345
Epoch 16/10000
6120/6120 [==============================] - 1s - loss: 0.7850 - acc: 0.6428
Epoch 17/10000
6120/6120 [==============================] - 1s - loss: 0.7633 - acc: 0.6580
Epoch 18/10000
6120/6120 [==============================] - 4s - loss: 0.7375 - acc: 0.6717
Epoch 19/10000
6120/6120 [==============================] - 1s - loss: 0.7058 - acc: 0.6850
Epoch 20/10000
6120/6120 [==============================] - 1s - loss: 0.6787 - acc: 0.7018
Epoch 21/10000
6120/6120 [==============================] - 1s - loss: 0.6557 - acc: 0.7093
Epoch 22/10000
6120/6120 [==============================] - 1s - loss: 0.6304 - acc: 0.7208
Epoch 23/10000
6120/6120 [==============================] - 1s - loss: 0.6052 - acc: 0.7270
Epoch 24/10000
6120/6120 [==============================] - 1s - loss: 0.5848 - acc: 0.7371
Epoch 25/10000
6120/6120 [==============================] - 1s - loss: 0.5564 - acc: 0.7536
Epoch 26/10000
6120/6120 [==============================] - 1s - loss: 0.1787 - acc: 0.4163
Epoch 27/10000
6120/6120 [==============================] - 1s - loss: 1.1921e-07 - acc: 0.2500
Epoch 28/10000
6120/6120 [==============================] - 1s - loss: 1.1921e-07 - acc: 0.2500
Epoch 29/10000
6120/6120 [==============================] - 1s - loss: 1.1921e-07 - acc: 0.2500
Epoch 30/10000
6120/6120 [==============================] - 2s - loss: 1.1921e-07 - acc: 0.2500
Epoch 31/10000
6120/6120 [==============================] - 1s - loss: 1.1921e-07 - acc: 0.2500
Epoch 32/10000
6120/6120 [==============================] - 1s - loss: 1.1921e-07 - acc: 0.2500 ...
I'm guessing that this is due to the optimiser falling into a local minimum where it assigns all data to one category. How can I prevent it from doing this?
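One quick way to check that guess is to look at the distribution of the network's argmax predictions after the collapse; if everything really has been assigned to one category, a single class should dominate (sketch below, where X_train stands for my training features):

import numpy as np

# Count how many samples fall into each predicted class.
# If the "all data in one category" guess is right, one class should
# account for nearly all 6120 samples, matching the stuck 0.25 accuracy.
preds = model.predict(X_train)                     # shape (6120, 4)
classes, counts = np.unique(preds.argmax(axis=1), return_counts=True)
print(dict(zip(classes, counts)))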
I've tried a number of things, but none of them seemed to stop this from happening. Any other ideas why it happens and/or how to prevent it? Could it be a bug in Keras? Many thanks in advance for any suggestions.
Edit: The problem appears to have been solved by changing the final activation to softmax (from sigmoid) and adding a maxnorm(3) weight constraint to the final two hidden layers:
from keras.constraints import maxnorm

model = Sequential()
model.add(Dense(100, input_dim=npoints, init='normal', activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(150, init='normal', activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(200, init='normal', activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(200, init='normal', activation='relu', W_constraint=maxnorm(3)))
model.add(Dropout(0.2))
model.add(Dense(200, init='normal', activation='relu', W_constraint=maxnorm(3)))
model.add(Dropout(0.2))
model.add(Dense(ncat, init='normal', activation='softmax'))

sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='mean_squared_error', optimizer=sgd, metrics=['accuracy'])
Many thanks for the suggestions.
The problem lay in using a sigmoid activation in the last layer. With a sigmoid output, the values of the final layer cannot be interpreted as a probability distribution over the classes for a given example; they usually don't even sum to 1, so the optimization can behave unexpectedly. In my opinion the maxnorm constraint is not strictly necessary, but I strongly advise using categorical_crossentropy instead of an MSE loss, as cross-entropy is known to work much better for this kind of classification problem.
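To make the point concrete, here is a small NumPy sketch (not from the original code) showing that independent sigmoid outputs don't form a probability distribution, while softmax outputs do, which is what categorical_crossentropy assumes:

import numpy as np

logits = np.array([2.0, -1.0, 0.5, 0.1])            # example pre-activations of a 4-unit output layer

sigmoid = 1.0 / (1.0 + np.exp(-logits))             # each unit squashed independently
softmax = np.exp(logits) / np.exp(logits).sum()     # normalized across the 4 units

print(sigmoid, sigmoid.sum())   # sums to ~2.3 -> not a probability distribution
print(softmax, softmax.sum())   # sums to 1.0  -> valid class probabilities

# With the softmax output layer, the recommended compile call would be:
# model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])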