I am implementing a convolutional neural network using transfer learning in Keras, with the pre-trained InceptionV3 model from keras.applications, as shown below:
from keras import applications
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

# Transfer learning with Inception V3
base_model = applications.InceptionV3(weights='imagenet', include_top=False, input_shape=(299, 299, 3))

# Set up the model architecture: global pooling plus a new Dense classification head
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(y_train.shape[1], activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)

# Freeze all InceptionV3 layers so that only the new head is trained at first
for layer in base_model.layers:
    layer.trainable = False

model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
model.summary()
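Then I trained just the new head for a few epochs, roughly like this (X_train stands for my preprocessed image array; the batch size and validation split below are placeholders, not my exact values):

# Phase 1: train only the new Dense head on top of the frozen base
history = model.fit(X_train, y_train,
                    batch_size=32,        # placeholder value
                    epochs=5,
                    validation_split=0.1) # placeholder value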
I was following a blog post that said the model should first be trained for a few epochs with the base model frozen, which is what the run above does. That gave me an accuracy of 0.47 after 5 epochs, and after that the accuracy didn't improve much. So I stopped training and unfroze some of the layers like this, keeping the first 172 layers frozen:
# Keep the first 172 layers frozen and unfreeze everything above them
for layer in model.layers[:172]:
    layer.trainable = False
for layer in model.layers[172:]:
    layer.trainable = True
Then I recompiled the model with SGD at a lower learning rate.
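Something like this, where lr=0.0001 with momentum 0.9 is the value from the Keras fine-tuning example rather than a rate I tuned myself:

from keras.optimizers import SGD

# Phase 2: recompile so the newly unfrozen layers are updated with a small learning rate
model.compile(loss='categorical_crossentropy',
              optimizer=SGD(lr=0.0001, momentum=0.9),
              metrics=['accuracy'])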
Was my approach of stopping training once the accuracy stopped improving with the layers frozen correct? Should I have trained longer?
How do I know the right time to stop training with the layers frozen?
IMHO, you don't have to train your randomly initialized layers until loss/accuracy stops improving.
When I used InceptionV3 for fine-tuning, I trained my additional Dense layer for just 2 epochs, even though training it for a few more epochs would most likely have given better loss/accuracy. The number of epochs for the initial training depends on your problem and data. (For me, 2 epochs reached ~40%.)
I think it's a waste of time to train only the Dense layer for too long. Train it until it is considerably better than random initialization, then unfreeze more layers and train them for longer together with your Dense layer. As soon as your Dense layer gives reasonable predictions, it's fine to train the other layers, especially since InceptionV3 uses batch normalization, which stabilizes the variance of the gradients flowing to the earlier layers.
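If you'd rather not hand-pick the number of epochs for the frozen phase, one option (a sketch, not part of my original setup) is to let an EarlyStopping callback end that phase when validation loss stops improving; X_train/y_train stand in for your training arrays, and patience=2 is an arbitrary illustrative choice:

from keras.callbacks import EarlyStopping

# Stop the head-only phase automatically when validation loss stops improving
early_stop = EarlyStopping(monitor='val_loss', patience=2)

model.fit(X_train, y_train,
          batch_size=32,
          epochs=20,               # upper bound; early stopping usually ends sooner
          validation_split=0.1,
          callbacks=[early_stop])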