When solving a binary classification problem in Caffe, I think there are two possible approaches.
The first is to use a "SigmoidCrossEntropyLossLayer" with a single output unit.
The other is to use a "SoftmaxWithLossLayer" with two output units.
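For concreteness, here is roughly how I would set up the two variants with pycaffe's NetSpec (the DummyData shapes, blob names, and fillers are just placeholders for illustration, not my actual network):

```python
import caffe
from caffe import layers as L


def sigmoid_variant():
    # Variant 1: a single output unit + SigmoidCrossEntropyLoss.
    # Labels are 0/1 values with the same shape as the score blob.
    n = caffe.NetSpec()
    n.data, n.label = L.DummyData(shape=[dict(dim=[64, 10]), dict(dim=[64, 1])],
                                  ntop=2)
    n.score = L.InnerProduct(n.data, num_output=1,
                             weight_filler=dict(type='xavier'))
    n.loss = L.SigmoidCrossEntropyLoss(n.score, n.label)
    return n.to_proto()


def softmax_variant():
    # Variant 2: two output units + SoftmaxWithLoss.
    # Labels are integer class indices (0 or 1).
    n = caffe.NetSpec()
    n.data, n.label = L.DummyData(shape=[dict(dim=[64, 10]), dict(dim=[64, 1])],
                                  ntop=2)
    n.score = L.InnerProduct(n.data, num_output=2,
                             weight_filler=dict(type='xavier'))
    n.loss = L.SoftmaxWithLoss(n.score, n.label)
    return n.to_proto()


print(sigmoid_variant())
print(softmax_variant())
```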
My question is what’s the difference between these two approaches?
Which one should I use?
Thank you very much!
If you play with the math a bit, you can map the single prediction x_i of the "Sigmoid" approach to a pair of scores: 0.5*x_i for class 1 and -0.5*x_i for class 0. The softmax over these two scores is exp(0.5*x_i) / (exp(0.5*x_i) + exp(-0.5*x_i)) = 1 / (1 + exp(-x_i)), which is exactly sigmoid(x_i). So "SoftmaxWithLoss" on the two duplicated scores amounts to "SigmoidCrossEntropyLoss" on the single output prediction x_i.
So I believe the two approaches can be regarded as equivalent for predicting binary outputs.
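As a sanity check, here is a small numpy sketch (the function names and random scores are mine, not Caffe code) that compares the two losses on a few random single-unit scores:

```python
import numpy as np


def sigmoid_xent(x, t):
    # Cross-entropy of sigmoid(x) against a binary target t
    # (what SigmoidCrossEntropyLoss computes per sample).
    p = 1.0 / (1.0 + np.exp(-x))
    return -(t * np.log(p) + (1 - t) * np.log(1 - p))


def softmax_xent(z, t):
    # Cross-entropy of softmax(z) against an integer class index t
    # (what SoftmaxWithLoss computes per sample).
    z = z - z.max()                      # shift for numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return -np.log(p[t])


rng = np.random.default_rng(0)
for x in rng.normal(size=5):             # a few random single-unit scores
    for t in (0, 1):
        a = sigmoid_xent(x, t)
        # class-0 score is -0.5*x, class-1 score is 0.5*x
        b = softmax_xent(np.array([-0.5 * x, 0.5 * x]), t)
        assert np.isclose(a, b), (a, b)
print("sigmoid cross-entropy and two-unit softmax losses match")
```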