When using Keras fit_generator, steps_per_epoch should equal the total number of available samples divided by the batch_size.
But how does the generator, or fit_generator, react if I choose a batch_size that does not divide the number of samples evenly? Does it yield samples until it can no longer fill a whole batch, or does it just use a smaller batch for the last yield?
Why I ask: I split my data into train/validation/test sets of different sizes (different percentages), but I would like to use the same batch size for the train and validation sets, and especially for the train and test sets. Since the sets differ in size, I cannot guarantee that the batch size divides the total number of samples.
If you wrote the generator yourself with yield:
You create the generator, so you define its behavior.
If steps_per_epoch is greater than the number of batches in one pass over the data, fit_generator will not notice; it simply keeps requesting batches until it reaches the number of steps.
The only requirement is that your generator is infinite.
You can ensure this with while True: at the beginning of the generator body, for instance.
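As a minimal sketch of such an infinite generator, assuming the data sits in NumPy arrays (the name batch_generator and its parameters are hypothetical, not part of Keras), the last batch of each pass is simply allowed to be smaller:

```python
import numpy as np

def batch_generator(x, y, batch_size):
    """Yield (x, y) batches forever; the last batch of each pass may be smaller."""
    n = len(x)
    while True:  # never exhausts, so Keras can keep requesting batches
        for start in range(0, n, batch_size):
            end = min(start + batch_size, n)  # clip the final slice to the data size
            yield x[start:end], y[start:end]

# Usage: 10 samples with batch_size 4 give batches of 4, 4, 2, then wrap around.
x = np.arange(10)
y = x * 2
gen = batch_generator(x, y, batch_size=4)
sizes = [len(next(gen)[0]) for _ in range(4)]
print(sizes)  # [4, 4, 2, 4]
```

Whether to emit the short batch or drop it is your choice; this sketch emits it so that no sample is skipped.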
If the generator is from an ImageDataGenerator, it's actually a keras.utils.Sequence and it has a length: len(generatorInstance).
Then you can check yourself what happens:
remainingSamples = total_samples % batch_size  # confirm that this is greater than 0
wholeBatches = total_samples // batch_size
totalBatches = wholeBatches + 1

if len(generator) == wholeBatches:
    print("missing the last batch")
elif len(generator) == totalBatches:
    print("last batch included")
else:
    print('weird behavior')
And check the size of the last batch:
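lastBatch = generator[len(generator) - 1]
# When labels are provided, the batch is a tuple such as (x, y); inspect the inputs.
lastX = lastBatch[0] if isinstance(lastBatch, (tuple, list)) else lastBatch

if lastX.shape[0] == remainingSamples:
    print('last batch contains the remaining samples')
else:
    print('last batch is different')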
If you assign N to the parameter steps_per_epoch of fit_generator(), Keras will basically call your generator N times before considering one epoch done. It's up to your generator to yield all your samples in N batches.
Note that, since most models work fine with a different batch size in each iteration, you could set steps_per_epoch = ceil(dataset_size / batch_size) and let your generator output a smaller batch for the last samples.
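To make the arithmetic concrete, here is a small sketch (the helper name steps_and_last_batch is hypothetical, not a Keras function) that computes the number of steps and the size of the final batch:

```python
import math

def steps_and_last_batch(dataset_size, batch_size):
    """Return (steps_per_epoch, size of the final batch) for one full pass."""
    steps = math.ceil(dataset_size / batch_size)
    # The first (steps - 1) batches are full; the last one holds what is left.
    last_batch = dataset_size - (steps - 1) * batch_size
    return steps, last_batch

print(steps_and_last_batch(1050, 32))  # (33, 26)
print(steps_and_last_batch(1024, 32))  # (32, 32)
```

With 1050 samples and a batch size of 32, one epoch takes 33 steps and the last batch holds only 26 samples; when the batch size divides the dataset evenly, the last batch is a full one.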