
Why can't my CNN learn?

I am new to deep learning. I just implemented a CNN with TensorFlow and tried it on CIFAR-10 (an object recognition benchmark in which images belong to 10 different classes).

During training, the loss decreased very fast at the beginning (from 100000 to 3), but then it got stuck at around 2.30 (which is approximately -log(1/10), i.e. log(10)). Since I use cross-entropy as the loss function, a loss of 2.30 means my model assigns roughly 10% probability to every class, exactly the same as guessing randomly (I checked the model's actual output: it is indeed almost exactly 10% for each class).
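For reference, that plateau value is exactly the cross-entropy of a completely uninformative prediction. A quick check, using plain NumPy just for illustration (not part of my training code):

import numpy as np

# Cross-entropy of a uniform prediction over 10 classes:
# the predicted probability of the true class is always 1/10,
# so the loss is -log(1/10) ~= 2.3026, matching the plateau.
uniform_probs = np.full(10, 0.1)
true_class = 3                      # any label gives the same value here
loss = -np.log(uniform_probs[true_class])
print(loss)                         # 2.302585...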

I tried increasing the size of the model to check whether my model was simply not "strong" enough to overfit. But it turns out that the training loss stops decreasing at around 2.30 no matter how much I increase or decrease the model size.

I am fairly confident that my implementation is correct, since the same code works for easier tasks such as MNIST (handwritten digit recognition). So I really wonder what the problem might be. Thanks a lot.

conv1: convolution layer with relu

pooling1: max pooling layer

fc1: fully-connected layer with relu

output: fully-connected layer with softmax

CODE:

import tensorflow as tf  # only tf.int64 is used directly; the layer classes come from my own wrapper

nn = NeuralNetwork(optimizer=Adam(0.001), log_dir='logs')
nn.add(Input('input', [32, 32, 3]))
nn.add(Convolution2D(name='conv1', filter_height=3, filter_width=3, 
                     n_output_channels=256, activation_fn='relu'))
nn.add(Pooling2D('pooling1', mode='max', pool_shape=(3, 3), padding='SAME'))
nn.add(Convolution2D(name='conv2', filter_height=3, filter_width=3, 
                     n_output_channels=128, activation_fn='relu'))
nn.add(Pooling2D('pooling2', mode='max', pool_shape=(3, 3), padding='SAME'))
nn.add(FullyConnected('fc1', 384, activation_fn='relu',
                      weight_init=truncated_normal(), bias_init=constant(0.1)))
nn.add(FullyConnected('fc2', 192, activation_fn='relu', 
                      weight_init=truncated_normal(), bias_init=constant(0.1)))
nn.add(Output(loss_fn='sparse_softmax_cross_entropy', output_fn='softmax',
              name='output', target_shape=[], target_dtype=tf.int64, 
              output_shape=10))
nn.build()

EDIT:

As I mentioned, I tried increasing the complexity of my model by adding more layers, and ended up with almost the same architecture as the one in the tutorial (conv1, pooling1, conv2, pooling2, fc1, fc2, softmax), except that for simplicity I left out the norm layers and preprocessing such as whitening, which I would not expect to hurt performance as badly as dropping from 86% to 10%.
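For what it's worth, the kind of whitening I skipped amounts to per-image standardization. A minimal sketch of what that preprocessing could look like (illustration only, not code I am actually running):

import tensorflow as tf

# Per-image standardization ("whitening"): subtract the per-image mean and
# divide by the per-image standard deviation before feeding the network.
def preprocess(image):
    image = tf.cast(image, tf.float32)
    return tf.image.per_image_standardization(image)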

Another clue that might help: I found that the output of layer fc1 is extremely sparse (almost 99% of the elements are zeros). Since I use ReLU as the activation function, this means the units in fc1 are mostly dead. Is there anything I can do about it?
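For concreteness, this is roughly how I measured that sparsity. The sketch assumes the fc1 activations for a batch are available as a NumPy array (the accessor is hypothetical, since it depends on my own wrapper):

import numpy as np

def dead_unit_fraction(fc1_activations):
    # fc1_activations: array of shape [batch_size, 384] after the ReLU.
    # A unit is considered dead if it never fires on any example in the batch.
    fired = (fc1_activations > 0).any(axis=0)
    return 1.0 - fired.mean()

# fc1_out = nn.layer_output('fc1', batch)  # hypothetical accessor on my wrapper
# print(dead_unit_fraction(fc1_out))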

1 Answer

It's possible that you're just seriously underestimating the architecture required to achieve reasonable results on this task. The model you described (input->conv1->pooling1->fc1->output) may be adequate for MNIST, but that doesn't mean that it can achieve better than random results on an image classification task involving real color images.

Realistically, you'd need to post the actual code for people to give more concrete recommendations, but based on the model you've described I would at least recommend looking at other models that can solve this problem. For example, TensorFlow comes with an example CNN that achieves about 86% accuracy on CIFAR-10, and that model is considerably more complex. Even with the additional convolutional and fully-connected layers, normalization, input pre-processing (whitening, data augmentation, etc.), and tuned hyperparameters, it still takes several hours of training on a powerful GPU to obtain good results.
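To give a rough sense of the scale involved, here is an illustrative sketch of that kind of architecture written with tf.keras (this is not the actual tutorial code; batch normalization stands in for the tutorial's local response normalization):

import tensorflow as tf
from tensorflow.keras import layers, models

# Two conv/pool/norm blocks followed by two fully-connected layers,
# roughly mirroring the shape of the TensorFlow CIFAR-10 example.
model = models.Sequential([
    layers.Conv2D(64, 5, padding='same', activation='relu',
                  input_shape=(32, 32, 3)),
    layers.MaxPooling2D(pool_size=3, strides=2, padding='same'),
    layers.BatchNormalization(),
    layers.Conv2D(64, 5, padding='same', activation='relu'),
    layers.BatchNormalization(),
    layers.MaxPooling2D(pool_size=3, strides=2, padding='same'),
    layers.Flatten(),
    layers.Dense(384, activation='relu'),
    layers.Dense(192, activation='relu'),
    layers.Dense(10),   # logits; softmax is folded into the loss below
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-3),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'],
)

On top of the architecture itself, the tutorial also relies on input pre-processing and data augmentation, which this sketch leaves out.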

Anyway, long story short, I think you should review the example model to get a feel for the kind of architecture that's needed. It's easy to underestimate how much harder it is to identify objects in arbitrary color images than in black-and-white handwritten digits.
