I am training a model using Keras.
model = Sequential()
model.add(LSTM(units=300, input_shape=(timestep,103), use_bias=True, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(units=536))
model.add(Activation("sigmoid"))
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
while True:
        history = model.fit_generator( 
            generator = data_generator(x_[train_indices],
                    y_[train_indices], batch = batch, timestep=timestep),
                steps_per_epoch=(int)(train_indices.shape[0] / batch), 
                epochs=1, 
                verbose=1, 
                validation_steps=(int)(validation_indices.shape[0] / batch), 
                validation_data=data_generator(
                    x_[validation_indices],y_[validation_indices], batch=batch,timestep=timestep))
It is a multiouput classification accoriding to scikit-learn.org definition: Multioutput regression assigns each sample a set of target values.This can be thought of as predicting several properties for each data-point, such as wind direction and magnitude at a certain location.
Thus, it is a recurrent neural network I tried out different timestep sizes. But the result/problem is mostly the same.
After one epoch, my train loss is around 0.0X and my validation loss is around 0.6X. And this values keep stable for the next 10 epochs.
Dataset is around 680000 rows. Training data is 9/10 and validation data is 1/10.
I ask for intuition behind that..
High level question: Therefore it is a multioutput classification task (not multi class), I see the only way by using sigmoid an binary_crossentropy. Do you suggest an other approach?
One epoch leads to underfitting of the curve in the graph (below). As the number of epochs increases, more number of times the weight are changed in the neural network and the curve goes from underfitting to optimal to overfitting curve.
In general too many epochs may cause your model to over-fit the training data. It means that your model does not learn the data, it memorizes the data.
Why do we use multiple epochs? Researchers want to get good performance on non-training data (in practice this can be approximated with a hold-out set); usually (but not always) that takes more than one pass over the training data.
I've experienced this issue and found that the learning rate and batch size have a huge impact on the learning process. In my case, I've done two things.
Moreover, you can try the basic steps for preventing overfitting.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With