I am new to deep learning. I just implemented a CNN in TensorFlow and am trying it on CIFAR-10 (an object recognition benchmark where images fall into 10 different classes).
During training, the training loss decreased very quickly at the beginning (from 100000 to 3), but then it always got stuck at around 2.30, which is approximately -log(1/10). Since I use cross-entropy as the loss function, a loss of 2.30 means my model has an accuracy of around 10% -- exactly the same as guessing randomly (I checked the actual output of the model, and nearly all class probabilities are around 10%).
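For reference, the plateau value itself is easy to verify with a quick sanity check in plain Python, independent of any model:

```python
import math

# Cross-entropy loss for one example is -log(p_true). A softmax that
# outputs a uniform 10% for every class therefore gives a loss of
# -log(0.1) = log(10), no matter what the true label is -- exactly the
# plateau described above.
num_classes = 10
loss = -math.log(1.0 / num_classes)
print(loss)  # ~2.3026
```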
I tried increasing the size of the model to see whether it was simply not "strong" enough to overfit. But it turns out the training loss always stops decreasing at around 2.30, no matter how much I increase or decrease the model size.
I am fairly confident my implementation is correct, since the same model worked on easier tasks such as MNIST (handwritten digit recognition). So I really wonder what the problem might be. Thanks a lot.
conv1: convolution layer with relu
pooling1: max pooling layer
fc1: fully-connected layer with relu
output: fully-connected layer with softmax
CODE:
nn = NeuralNetwork(optimizer=Adam(0.001), log_dir='logs')
nn.add(Input('input', [32, 32, 3]))
nn.add(Convolution2D(name='conv1', filter_height=3, filter_width=3,
                     n_output_channels=256, activation_fn='relu'))
nn.add(Pooling2D('pooling1', mode='max', pool_shape=(3, 3), padding='SAME'))
nn.add(Convolution2D(name='conv2', filter_height=3, filter_width=3,
                     n_output_channels=128, activation_fn='relu'))
nn.add(Pooling2D('pooling2', mode='max', pool_shape=(3, 3), padding='SAME'))
nn.add(FullyConnected('fc1', 384, activation_fn='relu',
                      weight_init=truncated_normal(), bias_init=constant(0.1)))
nn.add(FullyConnected('fc2', 192, activation_fn='relu',
                      weight_init=truncated_normal(), bias_init=constant(0.1)))
nn.add(Output(loss_fn='sparse_softmax_cross_entropy', output_fn='softmax',
              name='output', target_shape=[], target_dtype=tf.int64,
              output_shape=10))
nn.build()
EDIT:
As I mentioned, I tried increasing the complexity of my model by adding more layers, and ended up with almost the same model as the one in the tutorial (conv1, pooling1, conv2, pooling2, fc1, fc2, softmax), except that for simplicity I left out the norm layers and the preprocessing (whitening, etc.). I would not expect those omissions to hurt performance as severely as a drop from 86% to 10%.
Another clue that I think might help: I found that the output of layer fc1 is extremely sparse (almost 99% of the elements are zero). Since I use ReLU as the activation function, this means the units in fc1 are mostly dead. Is there anything I can do about it?
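To illustrate the dying-ReLU issue outside the poster's framework, here is a minimal NumPy sketch. The -3.0 shift of the pre-activations is an illustrative assumption standing in for what a too-large update (e.g. from a high learning rate or large weight init) can do; a leaky variant keeps a small gradient alive on the negative side:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    # A small negative slope keeps some gradient flowing for z < 0,
    # so units cannot get permanently stuck at zero output.
    return np.where(z > 0, z, alpha * z)

# Simulate pre-activations pushed strongly negative, as can happen
# after a large gradient step early in training.
rng = np.random.default_rng(0)
z = rng.standard_normal(10_000) - 3.0

print(np.mean(relu(z) == 0.0))        # ~0.99: almost all outputs exactly zero
print(np.mean(leaky_relu(z) == 0.0))  # 0.0: every unit still carries signal
```

In a real network, common mitigations along these lines are a smaller weight-initialization stddev, a lower learning rate, or a leaky/parametric ReLU.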
It's possible that you're just seriously underestimating the architecture required to achieve reasonable results on this task. The model you described (input -> conv1 -> pooling1 -> fc1 -> output) may be adequate for MNIST, but that doesn't mean it can achieve better-than-random results on an image classification task involving real color images.
The reality is that you'd need to post the actual code for people to give concrete recommendations, but based on the model you've described I would at least recommend studying some other models that solve this problem. For example, TensorFlow ships with an example CNN that achieves ~86% accuracy on CIFAR-10, but that model is considerably more complex. Even with the additional convolutional and fully-connected layers, normalization, input pre-processing (whitening, data augmentation, etc.), and tuned hyperparameters, it still takes several hours of training on a powerful GPU to obtain good results.
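The whitening step in particular is cheap to add: it is essentially per-image standardization (TensorFlow provides this as `tf.image.per_image_standardization`). A minimal NumPy sketch of the same idea, written by hand so it runs without TensorFlow:

```python
import numpy as np

def per_image_standardization(img):
    """Scale a single image to roughly zero mean and unit variance.

    The denominator is floored at 1/sqrt(N) so that near-constant
    images don't blow up from division by ~0.
    """
    img = img.astype(np.float64)
    adjusted_std = max(img.std(), 1.0 / np.sqrt(img.size))
    return (img - img.mean()) / adjusted_std

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3))  # a fake 32x32 RGB image
out = per_image_standardization(img)
print(out.mean(), out.std())  # ~0.0, ~1.0
```

Per-image standardization removes brightness/contrast variation between images, which tends to make optimization noticeably easier on natural color images.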
Anyway, long story short, I think you should review the example model to get a feel for the kind of architecture that's needed. It's easy to underestimate how much harder it is to identify objects in arbitrary color images than in black-and-white images of digits.