 

What is the best metric to evaluate how well a CNN is trained? validation error or training loss?

I want to train a CNN, but I want to use all of the data to train the network, thus not performing validation. Is this a good choice? Am I risking overfitting my CNN if I use only the training loss as the criterion for early stopping?

In other words, what is the best 'monitor' parameter in Keras (for example) for early stopping, among the options below?

early_stopper = EarlyStopping(monitor='loss', min_delta=0.0001, patience=20)      # training loss
early_stopper = EarlyStopping(monitor='acc', min_delta=0.0001, patience=20)       # training accuracy
early_stopper = EarlyStopping(monitor='val_loss', min_delta=0.0001, patience=20)  # validation loss
early_stopper = EarlyStopping(monitor='val_acc', min_delta=0.0001, patience=20)   # validation accuracy

There is a similar discussion on Stack Overflow, Keras: Validation error is a good measure for stopping criteria or validation accuracy?, but it only discusses validation metrics. Is it better to base the early-stopping criterion on validation data or on training data?

asked Nov 29 '25 by mad

2 Answers

  1. I want to train a CNN, but I want to use all of the data to train the network, thus not performing validation. Is this a good choice? Am I risking overfitting my CNN if I use only the training loss as the criterion for early stopping?

Answer: No. Your goal is to predict well on new samples; even if you reach 100% training accuracy, you may still get bad predictions on new samples. Without a validation set, you have no way to check whether you are overfitting.
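For illustration, a minimal sketch (assuming a compiled Keras classifier named model and training arrays x_train, y_train, which are placeholders here) of how holding out a small validation split exposes overfitting that the training loss alone would hide:

# Hold out 20% of the data purely to watch for overfitting.
history = model.fit(x_train, y_train,
                    epochs=100,
                    batch_size=32,
                    validation_split=0.2,
                    verbose=0)

# Training loss keeps falling, but if validation loss starts rising,
# the network is memorizing the training set rather than generalizing.
for epoch, (loss, val_loss) in enumerate(zip(history.history['loss'],
                                             history.history['val_loss'])):
    print(f"epoch {epoch:3d}  train_loss={loss:.4f}  val_loss={val_loss:.4f}")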

  2. In other words, what is the best 'monitor' parameter in Keras (for example) for early stopping, among the options below?

Answer: It should be the criterion closest to what you actually care about in reality:

early_stopper=EarlyStopping(monitor='val_acc', min_delta=0.0001, patience=20)

In addition, you may need separate train, validation, and test data. The training set trains your model, the validation set is used to compare models and hyperparameters and select the best one, and the test set verifies your result independently (it is never used for choosing models or parameters, so it is equivalent to new samples), as in the sketch below.
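A rough sketch of that three-way split (the array names, the 70/15/15 proportions, and the assumption that model was compiled with metrics=['accuracy'] are mine, not from the question; newer tf.keras logs the metric as 'val_accuracy' rather than 'val_acc'):

from sklearn.model_selection import train_test_split
from tensorflow.keras.callbacks import EarlyStopping

# 70% train, 15% validation, 15% test (proportions are arbitrary here).
x_train, x_tmp, y_train, y_tmp = train_test_split(x, y, test_size=0.3)
x_val, x_test, y_val, y_test = train_test_split(x_tmp, y_tmp, test_size=0.5)

# Validation data drives model/parameter selection and early stopping...
early_stopper = EarlyStopping(monitor='val_accuracy', min_delta=0.0001,
                              patience=20, restore_best_weights=True)
model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=200,
          callbacks=[early_stopper])

# ...while the test set is touched only once, as a stand-in for new samples.
test_loss, test_acc = model.evaluate(x_test, y_test)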

answered Dec 01 '25 by Tin Luu

I've already up-voted Tin Luu's answer, but wanted to refine one critical, practical point: the best criterion is the one that best matches your success criteria. To wit, you have to define your practical scoring function before your question makes complete sense for us.

What is important to the application for which you're training this model? If it's nothing more than top-1 prediction accuracy, then validation accuracy (val_acc) is almost certainly your sole criterion. If you care about confidence levels (e.g. hedging your bets when there's a 48% chance it's a cat, 42% it's a wolf, and 10% it's a Ferrari), then a properly implemented error function will make validation loss (val_loss) a better choice.
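A hedged sketch of those two choices (assuming the model is compiled with a cross-entropy loss and an accuracy metric; in newer tf.keras the metric is logged as 'val_accuracy' rather than 'val_acc'):

from tensorflow.keras.callbacks import EarlyStopping

# If top-1 prediction accuracy is all that matters:
stop_on_acc = EarlyStopping(monitor='val_accuracy', mode='max',
                            min_delta=0.0001, patience=20,
                            restore_best_weights=True)

# If calibrated confidences matter, the cross-entropy loss on the
# validation set is a better proxy, since it penalizes over-confident
# wrong predictions:
stop_on_loss = EarlyStopping(monitor='val_loss', mode='min',
                             min_delta=0.0001, patience=20,
                             restore_best_weights=True)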

Finally, I stress again that the ultimate metric is actual performance according to your chosen criteria. Test data are a representative sampling of your actual input. You can use an early stopping criterion for faster training turnaround, but you're not ready for deployment until your real-world criteria are tested and satisfied.

answered Dec 01 '25 by Prune


