I created a convolutional autoencoder this way:
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras.models import Model

input_layer = Input((1, 200, 4))
# encoder
x = Conv2D(64, (1, 3), activation='relu', padding='same')(input_layer)
x = MaxPooling2D((1, 2), padding='same')(x)
x = Conv2D(32, (1, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((1, 2), padding='same')(x)
x = Conv2D(32, (1, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((1, 2), padding='same')(x)
# decoder
x = Conv2D(32, (1, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((1, 2))(x)
x = Conv2D(32, (1, 3), activation='relu', padding='same')(x)
x = UpSampling2D((1, 2))(x)
# padding='same' here too, so the final upsampling restores the original width of 200
x = Conv2D(64, (1, 3), activation='relu', padding='same')(x)
x = UpSampling2D((1, 2))(x)
decoded = Conv2D(4, (1, 3), activation='sigmoid', padding='same')(x)
autoencoder = Model(input_layer, decoded)
autoencoder.compile(optimizer='adam', loss='mae',
                    metrics=['mean_squared_error'])
But when I fit the model with the decoder's last activation set to sigmoid as above, the loss decreases only slightly (and then stays essentially unchanged at later epochs), and so does the mean_squared_error (using default Adam settings):
autoencoder.fit(train, train, epochs=100, batch_size=256, shuffle=True,
validation_data=(test, test), callbacks=callbacks_list)
Epoch 1/100
97/98 [============================>.] - ETA: 0s - loss: 12.3690 - mean_squared_error: 2090.8232
Epoch 00001: loss improved from inf to 12.36328, saving model to weights.best.hdf5
98/98 [==============================] - 6s 65ms/step - loss: 12.3633 - mean_squared_error: 2089.3044 - val_loss: 12.1375 - val_mean_squared_error: 2029.4445
Epoch 2/100
97/98 [============================>.] - ETA: 0s - loss: 12.3444 - mean_squared_error: 2089.8032
Epoch 00002: loss improved from 12.36328 to 12.34172, saving model to weights.best.hdf5
98/98 [==============================] - 6s 64ms/step - loss: 12.3417 - mean_squared_error: 2089.1536 - val_loss: 12.1354 - val_mean_squared_error: 2029.4530
Epoch 3/100
97/98 [============================>.] - ETA: 0s - loss: 12.3461 - mean_squared_error: 2090.5886
Epoch 00003: loss improved from 12.34172 to 12.34068, saving model to weights.best.hdf5
98/98 [==============================] - 6s 63ms/step - loss: 12.3407 - mean_squared_error: 2089.1526 - val_loss: 12.1351 - val_mean_squared_error: 2029.4374
Epoch 4/100
97/98 [============================>.] - ETA: 0s - loss: 12.3320 - mean_squared_error: 2087.0349
Epoch 00004: loss improved from 12.34068 to 12.34050, saving model to weights.best.hdf5
98/98 [==============================] - 6s 63ms/step - loss: 12.3405 - mean_squared_error: 2089.1489 - val_loss: 12.1350 - val_mean_squared_error: 2029.4448
But both the loss and the mean_squared_error decrease quickly when I change the decoder's last activation to relu.
Epoch 1/100
97/98 [============================>.] - ETA: 0s - loss: 9.8283 - mean_squared_error: 1267.3282
Epoch 00001: loss improved from inf to 9.82359, saving model to weights.best.hdf5
98/98 [==============================] - 6s 64ms/step - loss: 9.8236 - mean_squared_error: 1266.0548 - val_loss: 8.4972 - val_mean_squared_error: 971.0208
Epoch 2/100
97/98 [============================>.] - ETA: 0s - loss: 8.1906 - mean_squared_error: 910.6423
Epoch 00002: loss improved from 9.82359 to 8.19058, saving model to weights.best.hdf5
98/98 [==============================] - 6s 62ms/step - loss: 8.1906 - mean_squared_error: 910.5417 - val_loss: 7.6558 - val_mean_squared_error: 811.6011
Epoch 3/100
97/98 [============================>.] - ETA: 0s - loss: 7.3522 - mean_squared_error: 736.2031
Epoch 00003: loss improved from 8.19058 to 7.35255, saving model to weights.best.hdf5
98/98 [==============================] - 6s 61ms/step - loss: 7.3525 - mean_squared_error: 736.2403 - val_loss: 6.8044 - val_mean_squared_error: 650.5342
Epoch 4/100
97/98 [============================>.] - ETA: 0s - loss: 6.6166 - mean_squared_error: 621.1281
Epoch 00004: loss improved from 7.35255 to 6.61435, saving model to weights.best.hdf5
98/98 [==============================] - 6s 61ms/step - loss: 6.6143 - mean_squared_error: 620.6105 - val_loss: 6.2180 - val_mean_squared_error: 572.2390
I want to verify whether it is valid to use an all-relu architecture (relu in every layer, including the decoder output), as I am a novice in deep learning.
What you have asked raises another, very fundamental question. Ask yourself: "What do you actually want the model to do?" Predict an unbounded real value, or a value within a certain range? The answer follows from that.
But before that, I should give you a brief overview of what activation functions are all about and why we use them.
The main goal of activation functions is to introduce non-linearity into your model. Since a composition of linear functions is itself a linear function, a neural network without activation functions is nothing but one giant linear function, and a linear function cannot learn any non-linear behavior at all. This is the primary purpose of using an activation function.
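A minimal NumPy sketch (my own illustration, not part of the original post, with made-up sizes) of why stacking layers without activations adds no expressive power:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))    # a batch of 5 inputs with 4 features
W1 = rng.normal(size=(4, 8))   # weights of "layer 1"
W2 = rng.normal(size=(8, 3))   # weights of "layer 2"

two_linear_layers = (x @ W1) @ W2   # two layers with no activation in between
one_linear_layer = x @ (W1 @ W2)    # a single equivalent linear layer

print(np.allclose(two_linear_layers, one_linear_layer))   # True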
Another purpose is to limit the range of a neuron's output. The following image shows the Sigmoid and ReLU activation functions (the image is taken from here).
These two graphs show exactly what kind of limits they impose on the values passed through them. The Sigmoid function restricts its output to the range 0 to 1, so we can think of it as mapping an input value to a probability. Where can we use it? For binary classification, if we label the two classes 0 and 1 and put a Sigmoid function in the output layer, it gives us the probability that an input example belongs to a given class.
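For example, here is a minimal Keras sketch (my own illustration, with made-up layer sizes and feature count) of a sigmoid output unit used for binary classification:

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

inp = Input(shape=(10,))                   # 10 input features (assumed)
h = Dense(16, activation='relu')(inp)
out = Dense(1, activation='sigmoid')(h)    # output in (0, 1): probability of class 1
binary_clf = Model(inp, out)
binary_clf.compile(optimizer='adam', loss='binary_crossentropy')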
Now to ReLU. What does it do? It only lets non-negative values through. As you can see, every negative value on the horizontal axis is mapped to 0 on the vertical axis, while for positive values the 45-degree line shows that they are passed through unchanged. In other words, it zeroes out negative values and keeps non-negative values as they are. Mathematically: relu(value) = max(0, value).
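A quick NumPy check (my own illustration) of that definition:

import numpy as np

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(np.maximum(0.0, x))   # [0.  0.  0.  0.5 3. ]  negatives become 0, the rest pass through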
Now picture a situation: say you want to predict real values that can be positive, zero, or even negative. Would you use a ReLU activation in the output layer just because it looks cool? No, obviously not. If you did, the model could never predict any negative value, because all negative values are clipped to 0.
Coming to your case, I believe this model should predict values that are not limited to the range 0 to 1; it should produce real-valued predictions.
Hence, when you use a sigmoid function, you are forcing the model to output values between 0 and 1, which in most cases is not a valid prediction, and so the model produces large loss and MSE values: it is being forced to predict something that is nowhere near the actual correct output.
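To make that concrete, here is a small made-up example (my own illustration; the target values are assumptions, not your data) of how large the error stays when a sigmoid-capped output chases targets far above 1:

import numpy as np

targets = np.array([30.0, 120.0, 255.0])     # hypothetical targets well outside [0, 1]
best_sigmoid_output = np.ones_like(targets)  # a sigmoid output can at best approach 1

print(np.mean(np.abs(targets - best_sigmoid_output)))   # MAE = 134.0
print(np.mean((targets - best_sigmoid_output) ** 2))    # MSE = 26506.0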
When you use ReLU instead, it performs better, because ReLU does not change any non-negative value. The model is then free to predict any non-negative value, and there is no longer a bound preventing it from predicting values close to the actual outputs.
That said, I think your model is meant to predict intensity values, which are likely in the range 0 to 255. In that case no negative values should be coming out of your model anyway, so strictly speaking there is no need for a ReLU activation in the last layer, as it would have no negative values to filter out (if I am not mistaken). You can still use it, as the official TensorFlow documentation does, but only as a safety measure to guarantee that nothing negative comes out; ReLU does nothing to non-negative values.
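In code, the two consistent choices for the decoder's last layer would look roughly like this (my own sketch, assuming the inputs really are raw intensities in [0, 255]):

# Option A: keep the raw [0, 255] targets and use a non-negative or unbounded output
decoded = Conv2D(4, (1, 3), activation='relu', padding='same')(x)   # or activation='linear'

# Option B: rescale the data to [0, 1] first, then sigmoid becomes a natural fit
# train, test = train / 255.0, test / 255.0
# decoded = Conv2D(4, (1, 3), activation='sigmoid', padding='same')(x)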