I created a convolutional autoencoder this way:
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras.models import Model

input_layer = Input((1, 200, 4))
# encoder
x = Conv2D(64, (1, 3), activation='relu', padding='same')(input_layer)
x = MaxPooling2D((1, 2), padding='same')(x)
x = Conv2D(32, (1, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((1, 2), padding='same')(x)
x = Conv2D(32, (1, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((1, 2), padding='same')(x)
# decoder
x = Conv2D(32, (1, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((1, 2))(x)
x = Conv2D(32, (1, 3), activation='relu', padding='same')(x)
x = UpSampling2D((1, 2))(x)
# padding='same' here too, so the final upsampling restores the original width of 200
x = Conv2D(64, (1, 3), activation='relu', padding='same')(x)
x = UpSampling2D((1, 2))(x)
decoded = Conv2D(4, (1, 3), activation='sigmoid', padding='same')(x)
autoencoder = Model(input_layer, decoded)
autoencoder.compile(optimizer='adam', loss='mae',
                    metrics=['mean_squared_error'])
But when I fit the model with the decoder's last activation set to sigmoid as above, the loss decreases only slightly (and then stays essentially unchanged at later epochs), and so does the mean_squared_error (using default Adam settings):
autoencoder.fit(train, train, epochs=100, batch_size=256, shuffle=True,
validation_data=(test, test), callbacks=callbacks_list)
Epoch 1/100
97/98 [============================>.] - ETA: 0s - loss: 12.3690 - mean_squared_error: 2090.8232
Epoch 00001: loss improved from inf to 12.36328, saving model to weights.best.hdf5
98/98 [==============================] - 6s 65ms/step - loss: 12.3633 - mean_squared_error: 2089.3044 - val_loss: 12.1375 - val_mean_squared_error: 2029.4445
Epoch 2/100
97/98 [============================>.] - ETA: 0s - loss: 12.3444 - mean_squared_error: 2089.8032
Epoch 00002: loss improved from 12.36328 to 12.34172, saving model to weights.best.hdf5
98/98 [==============================] - 6s 64ms/step - loss: 12.3417 - mean_squared_error: 2089.1536 - val_loss: 12.1354 - val_mean_squared_error: 2029.4530
Epoch 3/100
97/98 [============================>.] - ETA: 0s - loss: 12.3461 - mean_squared_error: 2090.5886
Epoch 00003: loss improved from 12.34172 to 12.34068, saving model to weights.best.hdf5
98/98 [==============================] - 6s 63ms/step - loss: 12.3407 - mean_squared_error: 2089.1526 - val_loss: 12.1351 - val_mean_squared_error: 2029.4374
Epoch 4/100
97/98 [============================>.] - ETA: 0s - loss: 12.3320 - mean_squared_error: 2087.0349
Epoch 00004: loss improved from 12.34068 to 12.34050, saving model to weights.best.hdf5
98/98 [==============================] - 6s 63ms/step - loss: 12.3405 - mean_squared_error: 2089.1489 - val_loss: 12.1350 - val_mean_squared_error: 2029.4448
But both the loss and the mean_squared_error decrease quickly when I change the decoder's last activation to relu.
Epoch 1/100
97/98 [============================>.] - ETA: 0s - loss: 9.8283 - mean_squared_error: 1267.3282
Epoch 00001: loss improved from inf to 9.82359, saving model to weights.best.hdf5
98/98 [==============================] - 6s 64ms/step - loss: 9.8236 - mean_squared_error: 1266.0548 - val_loss: 8.4972 - val_mean_squared_error: 971.0208
Epoch 2/100
97/98 [============================>.] - ETA: 0s - loss: 8.1906 - mean_squared_error: 910.6423
Epoch 00002: loss improved from 9.82359 to 8.19058, saving model to weights.best.hdf5
98/98 [==============================] - 6s 62ms/step - loss: 8.1906 - mean_squared_error: 910.5417 - val_loss: 7.6558 - val_mean_squared_error: 811.6011
Epoch 3/100
97/98 [============================>.] - ETA: 0s - loss: 7.3522 - mean_squared_error: 736.2031
Epoch 00003: loss improved from 8.19058 to 7.35255, saving model to weights.best.hdf5
98/98 [==============================] - 6s 61ms/step - loss: 7.3525 - mean_squared_error: 736.2403 - val_loss: 6.8044 - val_mean_squared_error: 650.5342
Epoch 4/100
97/98 [============================>.] - ETA: 0s - loss: 6.6166 - mean_squared_error: 621.1281
Epoch 00004: loss improved from 7.35255 to 6.61435, saving model to weights.best.hdf5
98/98 [==============================] - 6s 61ms/step - loss: 6.6143 - mean_squared_error: 620.6105 - val_loss: 6.2180 - val_mean_squared_error: 572.2390
I want to verify whether it is valid to use an all-relu architecture (relu in every layer, including the decoder output), as I am a novice in deep learning.
What you have asked raises another, very fundamental question. Ask yourself: "What do you actually want the model to do?" Predict an unbounded real value, or a value within a certain range? The answer follows from that.
But before that, I should give you a brief overview of what activation functions are all about and why we use them.
The main goal of activation functions is to introduce non-linearity into your model. Since a composition of linear functions is itself a linear function, a neural network without activation functions is nothing but one giant linear function, and a linear function cannot learn any non-linear behavior at all. This is the primary purpose of using an activation function.
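A minimal NumPy sketch (my own illustration, not part of the original post, with made-up sizes) of why stacking layers without activations adds no expressive power:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))    # a batch of 5 inputs with 4 features
W1 = rng.normal(size=(4, 8))   # weights of "layer 1"
W2 = rng.normal(size=(8, 3))   # weights of "layer 2"

two_linear_layers = (x @ W1) @ W2   # two layers with no activation in between
one_linear_layer = x @ (W1 @ W2)    # a single equivalent linear layer

print(np.allclose(two_linear_layers, one_linear_layer))   # True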
Another purpose is to limit the range of a neuron's output. The following image shows the Sigmoid and ReLU activation functions (the image is taken from here).
These two graphs show exactly what kind of limits they impose on the values passed through them. The Sigmoid function restricts its output to the range 0 to 1, so we can think of it as mapping an input value to a probability. Where can we use it? For binary classification, if we label the two classes 0 and 1 and put a Sigmoid function in the output layer, it gives us the probability that an input example belongs to a given class.
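For example, here is a minimal Keras sketch (my own illustration, with made-up layer sizes and feature count) of a sigmoid output unit used for binary classification:

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

inp = Input(shape=(10,))                   # 10 input features (assumed)
h = Dense(16, activation='relu')(inp)
out = Dense(1, activation='sigmoid')(h)    # output in (0, 1): probability of class 1
binary_clf = Model(inp, out)
binary_clf.compile(optimizer='adam', loss='binary_crossentropy')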
Now to ReLU. What does it do? It only lets non-negative values through. As you can see, every negative value on the horizontal axis is mapped to 0 on the vertical axis, while for positive values the 45-degree line shows that they are passed through unchanged. In other words, it zeroes out negative values and keeps non-negative values as they are. Mathematically: relu(value) = max(0, value).
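A quick NumPy check (my own illustration) of that definition:

import numpy as np

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(np.maximum(0.0, x))   # [0.  0.  0.  0.5 3. ]  negatives become 0, the rest pass through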
Now picture a situation: say you want to predict real values that can be positive, zero, or even negative. Would you use a ReLU activation in the output layer just because it looks cool? No, obviously not. If you did, the model could never predict any negative value, because all negative values are clipped to 0.
Coming to your case, I believe this model should predict values that are not limited to the range 0 to 1; it should produce real-valued predictions.
Hence, when you use a sigmoid function, you are forcing the model to output values between 0 and 1, which in most cases is not a valid prediction, and so the model produces large loss and MSE values: it is being forced to predict something that is nowhere near the actual correct output.
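To make that concrete, here is a small made-up example (my own illustration; the target values are assumptions, not your data) of how large the error stays when a sigmoid-capped output chases targets far above 1:

import numpy as np

targets = np.array([30.0, 120.0, 255.0])     # hypothetical targets well outside [0, 1]
best_sigmoid_output = np.ones_like(targets)  # a sigmoid output can at best approach 1

print(np.mean(np.abs(targets - best_sigmoid_output)))   # MAE = 134.0
print(np.mean((targets - best_sigmoid_output) ** 2))    # MSE = 26506.0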
When you use ReLU instead, it performs better, because ReLU does not change any non-negative value. The model is then free to predict any non-negative value, and there is no longer a bound preventing it from predicting values close to the actual outputs.
That said, I think your model is meant to predict intensity values, which are likely in the range 0 to 255. In that case no negative values should be coming out of your model anyway, so strictly speaking there is no need for a ReLU activation in the last layer, as it would have no negative values to filter out (if I am not mistaken). You can still use it, as the official TensorFlow documentation does, but only as a safety measure to guarantee that nothing negative comes out; ReLU does nothing to non-negative values.
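In code, the two consistent choices for the decoder's last layer would look roughly like this (my own sketch, assuming the inputs really are raw intensities in [0, 255]):

# Option A: keep the raw [0, 255] targets and use a non-negative or unbounded output
decoded = Conv2D(4, (1, 3), activation='relu', padding='same')(x)   # or activation='linear'

# Option B: rescale the data to [0, 1] first, then sigmoid becomes a natural fit
# train, test = train / 255.0, test / 255.0
# decoded = Conv2D(4, (1, 3), activation='sigmoid', padding='same')(x)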