
Overfitting on image classification

I'm working on an image classification problem with the sign language digits dataset, which has 10 categories (digits 0 to 9). My models are highly overfitting for some reason, even though I tried simple ones (like a single Conv layer), the classical ResNet50, and even the state-of-the-art NASNetMobile.

The images are colored and 100x100 in size. I tried tuning the learning rate, but it doesn't help much, although decreasing the batch size makes validation accuracy start improving earlier.

I applied augmentation to the images and it didn't help either: my train accuracy can hit 1.0 while val accuracy can't get higher than 0.6.

I looked at the data and it seems to load just fine. The distribution of classes in the validation set is fair too. I have 2062 images in total.

When I change my loss to binary_crossentropy it seems to give better results for both train and val accuracy, but that doesn't seem right.

I don't understand what's wrong. Could you please help me find out what I'm missing? Thank you.

Here's a link to my notebook: click

Asked Dec 06 '25 by notEmissary

1 Answer

This is going to be a very interesting answer. There are so many things you need to pay attention to when looking at a problem like this. Fortunately, there is a methodology (it might be vague, but it is still a methodology).

TLDR: Start your journey at the data, not the model.

Analysing the data

First, let's look at your data.


You have 10 classes. Each image is (100,100). And there are only 2062 images. There's your first problem: this is very little data compared to a standard image classification problem. Therefore, you need to make sure your data is easy to learn from without sacrificing generalizability (i.e. so that the model does well on the validation/test sets). How do we do that?

  • Understand your data
  • Normalize your data
  • Reduce the number of features

Understanding the data is a recurring theme in the sections below, so I won't have a separate section for it.

Normalizing your data

Here's the first problem. You are rescaling your data to be in [0, 1]. But you can do much better by standardizing it (i.e. (x - mean(x)) / std(x)). Here's how you do that.

import tensorflow as tf

VALIDATION_SPLIT = 0.2  # fraction held out for validation; set this to whatever split you use

def create_datagen():
    return tf.keras.preprocessing.image.ImageDataGenerator(
        samplewise_center=True,              # subtract the per-image mean
        samplewise_std_normalization=True,   # divide by the per-image std
        horizontal_flip=False,               # flips could change a sign's meaning
        rotation_range=30,
        shear_range=0.2,
        validation_split=VALIDATION_SPLIT)

Another thing you might notice is that I've set horizontal_flip=False. This brings me back to the first point: you have to make a judgement call about which augmentation techniques make sense for your data (a quick visualization sketch follows this list).

  • Brightness/ Shear - Seems okay
  • Cropping/resizing - Seems okay
  • Horizontal/Vertical flip - This is not something I'd try at the beginning. If someone shows you a hand sign in two different horizontal orientations, you might have trouble understanding some signs.
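
If you want to eyeball what an augmentation pipeline actually does before committing to it, here's a minimal sketch. It uses a visualization-only generator with just the geometric transforms (no normalization, so imshow can render the values directly), and a random array standing in for one of your real images:

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

# Visualization-only generator: geometric transforms, no normalization.
viz_gen = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=30, shear_range=0.2)

x = np.random.rand(1, 100, 100, 3).astype('float32')  # stand-in for one real image
fig, axes = plt.subplots(1, 6, figsize=(12, 2))
for ax, batch in zip(axes, viz_gen.flow(x, batch_size=1)):
    ax.imshow(batch[0])   # each batch is one randomly transformed copy
    ax.axis('off')
plt.show()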

Reducing the number of features

This is very important. You don't have much data, and you want to get the most out of it. The images have an original size of (100,100). You can do well with significantly smaller images (I tried (64,64), but you might be able to go even lower). So reduce the size of the images whenever you can.

Next, it doesn't matter whether you see a sign in RGB or grayscale: you can still recognize it. But grayscale cuts the number of input values per image by 66% compared to RGB (and combined with the resize above, (64,64,1) = 4096 values vs. the original (100,100,3) = 30000, roughly an 86% reduction). So use fewer color channels whenever you can.

This is how you do both:

def create_flow(datagen, subset, directory):
    # Resize to (64, 64) and load as grayscale to cut down the input size.
    return datagen.flow_from_directory(
        directory=directory,
        target_size=(64, 64),
        color_mode='grayscale',
        batch_size=BATCH_SIZE,
        class_mode='categorical',
        subset=subset,       # 'training' or 'validation'
        shuffle=True
    )

So, to reiterate: spend time understanding your data before you go ahead with a model. This is a bare-minimum list for this problem; feel free to try other things as well.
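
For example, here's a quick sketch to check the class balance, assuming a flow_from_directory-style layout (one subdirectory per class under a hypothetical data/ root):

from pathlib import Path

data_dir = Path('data')  # hypothetical dataset root, one subdirectory per class
for class_dir in sorted(p for p in data_dir.iterdir() if p.is_dir()):
    n_images = sum(1 for f in class_dir.iterdir() if f.is_file())
    print(f'{class_dir.name}: {n_images} images')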

Creating the model

So, here's the changes I did to the model.

  • Added padding='same' to all the convolutional layers. If you don't, it defaults to padding='valid', which results in an automatic dimensionality reduction: the deeper you go, the smaller your output gets. In the model you had, the final convolution output was of size (3,3), which is probably too small for the dense layer to make sense of. So pay attention to what the dense layer is getting (see the shape check after this list).
  • Reduced the kernel size - The kernel size is directly related to the number of parameters, so to reduce the chances of overfitting on your small dataset, go with a smaller kernel size whenever possible.
  • Removed dropout from the convolutional layers - This is something I did as a precaution. Personally, I don't know if dropout works as well with convolutional layers as it does with Dense layers, so I don't want an unknown complexity in my model at the beginning.
  • Removed the last convolutional layer - Reducing the parameters in the model to reduce the chances of overfitting.
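
As a quick sanity check, here's a minimal sketch showing the effect of the two padding modes on a dummy input:

import tensorflow as tf

x = tf.zeros((1, 64, 64, 1))  # dummy single-image batch
print(tf.keras.layers.Conv2D(8, (5, 5), padding='valid')(x).shape)  # (1, 60, 60, 8)
print(tf.keras.layers.Conv2D(8, (5, 5), padding='same')(x).shape)   # (1, 64, 64, 8)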

About the optimizer

After you make these changes, you don't need to change the learning rate of Adam. Adam works pretty well without any tuning, so that's a worry you can leave for later.

About the batch size

You were using a batch size of 8, which is not even big enough to contain one image per class in a batch. Try setting this to a higher value; I set it to 32. Increase the batch size whenever you can. Maybe not to very large values, but up to around 128 should be fine for this problem.

model = tf.keras.models.Sequential()
# Conv blocks: small kernels, 'same' padding, pooling + batch norm after each.
model.add(tf.keras.layers.Convolution2D(8, (5, 5), activation='relu', input_shape=(64, 64, 1), padding='same'))
model.add(tf.keras.layers.MaxPooling2D((2, 2)))
model.add(tf.keras.layers.BatchNormalization())

model.add(tf.keras.layers.Convolution2D(16, (3, 3), activation='relu', padding='same'))
model.add(tf.keras.layers.MaxPooling2D((2, 2)))
model.add(tf.keras.layers.BatchNormalization())

model.add(tf.keras.layers.Convolution2D(32, (3, 3), activation='relu', padding='same'))
model.add(tf.keras.layers.MaxPooling2D((2, 2)))
model.add(tf.keras.layers.BatchNormalization())

# Classifier head: dropout only on the dense layer.
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(128, activation='relu'))
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Dense(10, activation='softmax'))
model.summary()
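
To tie it all together, here's a minimal compile-and-fit sketch using the settings above: Adam with its default learning rate, and batch size 32 via the generators. The data/ path is a placeholder; substitute your own dataset directory.

BATCH_SIZE = 32  # as discussed above
datagen = create_datagen()
train_flow = create_flow(datagen, 'training', 'data/')    # hypothetical path
valid_flow = create_flow(datagen, 'validation', 'data/')

# Default Adam; categorical cross-entropy matches the 10-way softmax output.
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['categorical_accuracy'])
model.fit(train_flow, validation_data=valid_flow, epochs=10)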

Final result

By doing some groundwork before jumping into building a model, I achieved significantly better results than what you had.

Your result

Epoch 1/10
233/233 [==============================] - 37s 159ms/step - loss: 2.6027 - categorical_accuracy: 0.2218 - val_loss: 2.7203 - val_categorical_accuracy: 0.1000
Epoch 2/10
233/233 [==============================] - 37s 159ms/step - loss: 1.8627 - categorical_accuracy: 0.3711 - val_loss: 2.8415 - val_categorical_accuracy: 0.1450
Epoch 3/10
233/233 [==============================] - 37s 159ms/step - loss: 1.5608 - categorical_accuracy: 0.4689 - val_loss: 2.7879 - val_categorical_accuracy: 0.1750
Epoch 4/10
233/233 [==============================] - 37s 158ms/step - loss: 1.3778 - categorical_accuracy: 0.5145 - val_loss: 2.9411 - val_categorical_accuracy: 0.1450
Epoch 5/10
233/233 [==============================] - 38s 161ms/step - loss: 1.1507 - categorical_accuracy: 0.6090 - val_loss: 2.5648 - val_categorical_accuracy: 0.1650
Epoch 6/10
233/233 [==============================] - 38s 163ms/step - loss: 1.1377 - categorical_accuracy: 0.6042 - val_loss: 2.5416 - val_categorical_accuracy: 0.1850
Epoch 7/10
233/233 [==============================] - 37s 160ms/step - loss: 1.0224 - categorical_accuracy: 0.6472 - val_loss: 2.3338 - val_categorical_accuracy: 0.2450
Epoch 8/10
233/233 [==============================] - 37s 158ms/step - loss: 0.9198 - categorical_accuracy: 0.6788 - val_loss: 2.2660 - val_categorical_accuracy: 0.2450
Epoch 9/10
233/233 [==============================] - 37s 160ms/step - loss: 0.8494 - categorical_accuracy: 0.7111 - val_loss: 2.4924 - val_categorical_accuracy: 0.2150
Epoch 10/10
233/233 [==============================] - 37s 161ms/step - loss: 0.7699 - categorical_accuracy: 0.7417 - val_loss: 1.9339 - val_categorical_accuracy: 0.3450

My result

Epoch 1/10
59/59 [==============================] - 14s 240ms/step - loss: 1.8182 - categorical_accuracy: 0.3625 - val_loss: 2.1800 - val_categorical_accuracy: 0.1600
Epoch 2/10
59/59 [==============================] - 13s 228ms/step - loss: 1.1982 - categorical_accuracy: 0.5843 - val_loss: 2.2777 - val_categorical_accuracy: 0.1350
Epoch 3/10
59/59 [==============================] - 13s 228ms/step - loss: 0.9460 - categorical_accuracy: 0.6676 - val_loss: 2.5666 - val_categorical_accuracy: 0.1400
Epoch 4/10
59/59 [==============================] - 13s 226ms/step - loss: 0.7066 - categorical_accuracy: 0.7465 - val_loss: 2.3700 - val_categorical_accuracy: 0.2500
Epoch 5/10
59/59 [==============================] - 13s 227ms/step - loss: 0.5875 - categorical_accuracy: 0.8008 - val_loss: 2.0166 - val_categorical_accuracy: 0.3150
Epoch 6/10
59/59 [==============================] - 13s 228ms/step - loss: 0.4681 - categorical_accuracy: 0.8416 - val_loss: 1.4043 - val_categorical_accuracy: 0.4400
Epoch 7/10
59/59 [==============================] - 13s 228ms/step - loss: 0.4367 - categorical_accuracy: 0.8518 - val_loss: 1.7028 - val_categorical_accuracy: 0.4300
Epoch 8/10
59/59 [==============================] - 13s 226ms/step - loss: 0.3823 - categorical_accuracy: 0.8711 - val_loss: 1.3747 - val_categorical_accuracy: 0.5600
Epoch 9/10
59/59 [==============================] - 13s 227ms/step - loss: 0.3802 - categorical_accuracy: 0.8663 - val_loss: 1.0967 - val_categorical_accuracy: 0.6000
Epoch 10/10
59/59 [==============================] - 13s 227ms/step - loss: 0.3585 - categorical_accuracy: 0.8818 - val_loss: 1.0768 - val_categorical_accuracy: 0.5950

Note: This was a minimal effort. You can increase your accuracy further by augmenting the data more, optimizing the model structure, choosing the right batch size, etc.

Answered Dec 08 '25 by thushv89