I'm using Keras as a submodule of TensorFlow v2, and I'm training my model with the fit_generator() method. I want to save my model every 10 epochs. How can I achieve this?
In standalone Keras (not the tf submodule), I can pass ModelCheckpoint(model_savepath, period=10). But in TF v2 this has changed to ModelCheckpoint(model_savepath, save_freq), where save_freq can be 'epoch', in which case the model is saved every epoch. If save_freq is an integer, the model is saved after that many samples (batches, in newer TF versions) have been processed. But I want it saved after every 10 epochs. How can I achieve this?
To save weights every epoch, you can use callbacks in Keras: create checkpoint = ModelCheckpoint(...) and set the argument period=1, which sets the periodicity in epochs. This should do it.
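A minimal sketch of that approach in standalone Keras (the filepath pattern is a placeholder, and model, x_train, y_train are assumed to already exist; note that period is deprecated in newer tf.keras):
from keras.callbacks import ModelCheckpoint

# Placeholder filepath pattern; {epoch:02d} is filled in by the callback.
checkpoint = ModelCheckpoint("weights-{epoch:02d}.h5",
                             save_weights_only=True,
                             period=1)  # save every epoch (deprecated in newer tf.keras)
model.fit(x_train, y_train, epochs=50, callbacks=[checkpoint])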
There are two formats you can use to save an entire model to disk: the TensorFlow SavedModel format and the older Keras H5 format. The recommended format is SavedModel; it is the default when you use model.save().
The save_weights() method saves only the weights of the layers contained in the model. For saving an entire model it is advised to use the save() method rather than save_weights(); however, weights-only H5 files can also be written with save_weights().
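A short sketch of the difference between the two formats and the two methods (model is assumed to be an already-built tf.keras model; the file names are placeholders):
# Default: SavedModel format (creates a directory on disk).
model.save("my_model")

# Older Keras H5 format (single file), selected by the .h5 extension.
model.save("my_model.h5")

# Weights only; the architecture must be rebuilt in code before load_weights().
model.save_weights("my_weights.h5")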
SavedModel is the more comprehensive save format: it stores the model architecture, the weights, and the traced TensorFlow subgraphs of the call functions. This enables Keras to restore both built-in layers and custom objects. Calling save('my_model') on a trained model creates a SavedModel folder named my_model.
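A minimal sketch of that flow (the model and the toy training data here are illustrative placeholders):
import numpy as np
import tensorflow as tf

# Create a simple model.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(3,))])
model.compile(optimizer="adam", loss="mse")

# Train the model on toy data.
x, y = np.random.rand(32, 3), np.random.rand(32, 1)
model.fit(x, y, epochs=1, verbose=0)

# Calling `save('my_model')` creates a SavedModel folder `my_model`.
model.save("my_model")

# The folder can be loaded back into an equivalent model.
restored = tf.keras.models.load_model("my_model")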
The function name is sufficient for loading, as long as it is registered as a custom object. Alternatively, it's possible to load the TensorFlow graph generated by Keras; if you do so, you won't need to provide any custom_objects. You can do so like this:
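A sketch of both options, assuming a model that uses a custom activation function (the swish function and the model are illustrative, not from the original post):
import tensorflow as tf

def swish(x):  # illustrative custom function used inside the model
    return x * tf.sigmoid(x)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation=swish, input_shape=(8,)),
    tf.keras.layers.Dense(1),
])
model.save("my_model")  # SavedModel format also traces the call graph

# Option 1: register the function under its name at load time.
restored = tf.keras.models.load_model("my_model", custom_objects={"swish": swish})

# Option 2: rely on the traced TensorFlow graph; no custom_objects needed.
restored = tf.keras.models.load_model("my_model")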
Also, saving every N epochs is not an option for me. What I am trying to do is save the model after certain specific epochs are done. Let's say, for example, after epoch 150 it will be saved as model.save('model_1.h5'), after epoch 152 as model.save('model_2.h5'), and so on for a few specific epochs.
With tf.keras.callbacks.ModelCheckpoint, use save_freq='epoch' and pass the extra argument period=10.
Although this is not documented in the official docs, that is the way to do it (the docs do show that you can pass period; they just don't explain what it does).
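A sketch of that call (the filepath is a placeholder; period is undocumented here and has been removed in newer TF releases, so treat this as version-dependent):
import tensorflow as tf

cp = tf.keras.callbacks.ModelCheckpoint(
    "model-{epoch:03d}.h5",
    save_freq="epoch",  # check at epoch boundaries
    period=10)          # only save every 10th epoch (undocumented; removed in newer TF)
model.fit(x_train, y_train, epochs=100, callbacks=[cp])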
Explicitly computing the number of batches per epoch worked for me.
import tensorflow as tf

# (model, train_images, train_labels, test_images, test_labels are assumed defined)
BATCH_SIZE = 20
STEPS_PER_EPOCH = train_labels.size // BATCH_SIZE  # batches per epoch (integer)
SAVE_PERIOD = 10  # save every 10 epochs

# Placeholder path; {epoch:04d} is filled in by the callback.
checkpoint_path = "checkpoints/cp-{epoch:04d}.ckpt"

# Create a callback that saves the model's weights every 10 epochs.
# save_freq is measured in batches, so convert epochs to batches.
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path,
    verbose=1,
    save_weights_only=True,
    save_freq=SAVE_PERIOD * STEPS_PER_EPOCH)

# Train the model with the new callback.
model.fit(train_images,
          train_labels,
          batch_size=BATCH_SIZE,
          steps_per_epoch=STEPS_PER_EPOCH,
          epochs=50,
          callbacks=[cp_callback],
          validation_data=(test_images, test_labels),
          verbose=0)
The period param mentioned in the accepted answer is no longer available.
Using the save_freq param is an alternative, but risky: if the dataset size changes, the number of batches per epoch changes and the saving interval drifts. The docs also warn that "if the saving isn't aligned to epochs, the monitored metric may potentially be less reliable."
Thus, I use a subclass as a solution:
class EpochModelCheckpoint(tf.keras.callbacks.ModelCheckpoint):
    """ModelCheckpoint that saves every `frequency` epochs instead of every epoch."""

    def __init__(self,
                 filepath,
                 frequency=1,
                 monitor='val_loss',
                 verbose=0,
                 save_best_only=False,
                 save_weights_only=False,
                 mode='auto',
                 options=None,
                 **kwargs):
        super(EpochModelCheckpoint, self).__init__(filepath=filepath,
                                                   monitor=monitor,
                                                   verbose=verbose,
                                                   save_best_only=save_best_only,
                                                   save_weights_only=save_weights_only,
                                                   mode=mode,
                                                   save_freq="epoch",
                                                   options=options,
                                                   **kwargs)
        self.epochs_since_last_save = 0
        self.frequency = frequency

    def on_epoch_end(self, epoch, logs=None):
        self.epochs_since_last_save += 1
        # pylint: disable=protected-access
        if self.epochs_since_last_save % self.frequency == 0:
            self._save_model(epoch=epoch, batch=None, logs=logs)

    def on_train_batch_end(self, batch, logs=None):
        pass  # prevent the parent class from saving at batch boundaries
Use it like this:
callbacks = [
    EpochModelCheckpoint("/your_save_location/epoch{epoch:02d}", frequency=10),
]
Note that, depending on your TF version, you may have to adjust the arguments in the call to the superclass __init__ (and the signature of _save_model), as these have changed between releases.