The output layer of my 3-layer neural network uses the sigmoid activation, which only outputs values in the range [0, 1]. However, if I want to train it for outputs beyond [0, 1], say in the thousands, what should I do?
For example, if I want to train:
input --------> output
0    0 -------> 0
0    1 -------> 1000
1000 1 -------> 1
1    1 -------> 0
My program works for AND, OR, XOR, etc., since their inputs and outputs are all binary.
There was a suggestion to use the following:
Activation:

y = lambda * abs(x) * (1 / (1 + exp(-x)))

Derivative of activation:

lambda * abs(y) * y * (1 - y)
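In Python, what I tried looks like this (the function names and `lambda_` are mine; I am assuming the multiplication signs were eaten by formatting):

```python
import numpy as np

def activation(x, lambda_=1.0):
    # Suggested activation: y = lambda * |x| * sigmoid(x)
    return lambda_ * np.abs(x) / (1.0 + np.exp(-x))

def activation_derivative(y, lambda_=1.0):
    # Suggested derivative, expressed in terms of the output y
    return lambda_ * np.abs(y) * y * (1.0 - y)
```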
This did not converge for the training pattern above (unless I have done something wrong). Are there any suggestions?
For classification problems, it is customary to use a sigmoid/logistic activation function in the output layer, so you get proper probability values in the range [0, 1]. Coupled with 1-of-N encoding for multi-class classification, each node's output represents the probability of the instance belonging to the corresponding class.
On the other hand, if you have a regression problem, there is no need to apply an additional function to the output; just take the raw linear combination. The network will learn weights that produce whatever output values you have, even in the thousands.
You should also be careful to scale the input features, for example by normalizing all features to the range [-1, 1].
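As a minimal sketch of this approach: one sigmoid hidden layer, a linear output layer, input features scaled before training, and plain gradient descent on squared error. The layer sizes, learning rate, and iteration count are illustrative, not from the question:

```python
import numpy as np

rng = np.random.default_rng(0)

# Training pairs from the question; scale input features before training
X_raw = np.array([[0, 0], [0, 1], [1000, 1], [1, 1]], dtype=float)
y = np.array([[0.0], [1000.0], [1.0], [0.0]])
X = X_raw / np.abs(X_raw).max(axis=0)   # per-feature max-abs scaling

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_hidden = 8
W1 = rng.normal(0.0, 0.5, (2, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.5, (n_hidden, 1)); b2 = np.zeros(1)

lr = 0.05
for _ in range(50_000):
    h = sigmoid(X @ W1 + b1)            # hidden layer: sigmoid
    out = h @ W2 + b2                   # output layer: raw linear combination,
                                        # free to reach values in the thousands
    err = (out - y) / len(X)            # gradient of mean squared error
    dW2 = h.T @ err;  db2 = err.sum(0)
    dh = (err @ W2.T) * h * (1.0 - h)   # backprop through the sigmoid hidden layer
    dW1 = X.T @ dh;   db1 = dh.sum(0)
    W2 -= lr * dW2;   b2 -= lr * db2
    W1 -= lr * dW1;   b1 -= lr * db1

print(np.round(sigmoid(X @ W1 + b1) @ W2 + b2, 1))
```

The key line is the output computation: `out = h @ W2 + b2` with no squashing function, so nothing limits the predictions to [0, 1].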
The obvious solutions are to scale the outputs up to the values you want, or to normalize the training targets down to the range [0, 1]. I can't think of any a priori reason the scaling needs to be linear, either (although it obviously should be monotonically increasing), so you might also tinker with log functions here.
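A sketch of the rescaling idea (the helper name is mine; for the log variant you would wrap the targets in `np.log1p` before scaling and apply `np.expm1` after inverting):

```python
import numpy as np

y_raw = np.array([0.0, 1000.0, 1.0, 0.0])   # targets from the question

# Map targets into [0, 1] so a sigmoid output layer can represent them
lo, hi = y_raw.min(), y_raw.max()
y_scaled = (y_raw - lo) / (hi - lo)

# ... train the network on y_scaled instead of y_raw ...

def to_original_range(y_pred):
    # Invert the scaling on the network's [0, 1] predictions
    return y_pred * (hi - lo) + lo
```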
What kind of problem are you working on that you have such large ranges?